TLB::
-----
Q: The assignment says the following about the TLB:
   pid: 6 bits; a context or address space ID that can be used to allow entries to remain in the TLB after a context switch
   Is the "address space ID" the same as the PIDs we wrote in assignment 2? If so, then it seems like we just placed a limit on how many processes can run at a given time (2^6=64), and therefore there is a hardware defined max PID table size. Is this a correct interpretation?
A: You can still have as many processes as you want and set up ASID = PID % 64. Then, when performing a context switch, you only need to invalidate the TLB entries for ASID. For example, if PID = 65, then ASID = 1. So you need to make sure when switching to PID = 65, that entries for PID = 64 * n + 1 are not present in the TLB.
--------------------------------------------------------------------------------
Q: I see duplicate entries in the TLB.
A: Make sure that as_activate is called on context switch and that it flushes the TLB.
--------------------------------------------------------------------------------

Coremap::
---------
Q: Could the OS call kmalloc during an interrupt? If so, are we supposed to disable interrupts when accessing our coremap data structure?
A: The OS161 interrupt handlers do not call kmalloc (locate the code for the interrupt handlers and see what they do). However, kfree can be called in interrupt context. Hence, you should disable interrupts when accessing the coremap.
--------------------------------------------------------------------------------
Q: Do we need to distinguish between frames allocated to the kernel versus user pages?
A: Yes, you should distinguish between frames used by the kernel (e.g., for page tables or any other memory returned by kmalloc) versus pages used for a user's address space. The reason is that you should page out user pages to swap but you should not page out kernel pages.
For example, you could have a bit in the coremap for each frame that indicates whether the page in use is a kernel page or a user page. Another option is that a kernel page will not have a virtual page number associated with it.
--------------------------------------------------------------------------------
Q: I understand that when a page fault occurs and the OS has run out of memory, then we need to page out some frames to swap, so that memory can be allocated for the faulting page. What should happen if kmalloc runs out of memory?
A: The simplest option when allocating a page for the kernel (either directly, e.g., a call to alloc_kpages to allocate memory for page tables, or indirectly via kmalloc) is to return an error (e.g., NULL) when no more memory is available.
More sophisticated operating systems will try to swap out user pages to make more memory available to the operating system. However, this needs to be done carefully because kmalloc will block (sleep) when swapping pages. So kmalloc and any caller of kmalloc need to be prepared to handle sleeping on a call to kmalloc. Also, if kmalloc can be called in an interrupt context, then it should not sleep. In this case, a parameter can be passed that indicates that kmalloc should not sleep but return an error if no more memory is available.
--------------------------------------------------------------------------------
Q: The kernel can request multiple contiguous frames. Isn't it possible that at some point, the kernel will try to do an allocation that happens to be bigger than any hole we currently have?
A: The kernel should only allocate one frame at a time for user pages. For kernel pages, it could allocate multiple contiguous frames (e.g., by calling kmalloc with a size greater than 4K), but anytime it runs out of memory or doesn't have a big enough hole, your code should return an error (see question above).
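As a rough illustration of the answer above, here is a minimal user-space sketch of a first-fit search for contiguous frames that returns an error when no hole is large enough. Everything in it is an assumption made for illustration, not the assignment's required design: the names cm_entry, cm_alloc, cm_free, NFRAMES and NO_FRAME are hypothetical, and a real coremap would be sized from the available RAM and protected by disabling interrupts.

```c
#include <assert.h>
#include <stddef.h>

#define NFRAMES 16             /* hypothetical number of physical frames */
#define NO_FRAME ((size_t)-1)  /* error value: no hole large enough */

/* Hypothetical per-frame coremap entry. */
struct cm_entry {
    unsigned char used;    /* 1 if the frame is allocated */
    unsigned char kernel;  /* 1 if the frame holds a kernel page */
    size_t run;            /* on the first frame of a block: its length in frames */
};

static struct cm_entry coremap[NFRAMES];

/* First-fit search for npages contiguous free frames.
 * Returns the index of the first frame, or NO_FRAME if no hole fits. */
static size_t cm_alloc(size_t npages, int is_kernel)
{
    size_t start = 0, len = 0;

    if (npages == 0 || npages > NFRAMES)
        return NO_FRAME;

    for (size_t i = 0; i < NFRAMES; i++) {
        if (coremap[i].used) {
            /* The current hole ended; the next candidate starts after i. */
            len = 0;
            start = i + 1;
            continue;
        }
        if (++len == npages) {
            for (size_t j = start; j <= i; j++) {
                coremap[j].used = 1;
                coremap[j].kernel = (unsigned char)(is_kernel != 0);
                coremap[j].run = 0;
            }
            coremap[start].run = npages;  /* remembered so the block can be freed */
            return start;
        }
    }
    return NO_FRAME;  /* out of memory or too fragmented: report an error */
}

/* Free the whole block that starts at frame index start. */
static void cm_free(size_t start)
{
    assert(start < NFRAMES && coremap[start].run > 0);
    size_t n = coremap[start].run;
    for (size_t j = start; j < start + n; j++) {
        coremap[j].used = 0;
        coremap[j].kernel = 0;
        coremap[j].run = 0;
    }
}
```

With 16 frames, allocating 4 frames and then 1 frame leaves an 11-frame hole, so a request for 12 contiguous frames returns NO_FRAME rather than blocking or panicking.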
--------------------------------------------------------------------------------
Q: How do we "free" a frame of memory? Do we have to zero it out, or is it as simple as marking its corresponding bitmap entry as "0" so that we know we can re-use it now?
A: You can free a frame either when free_kpages is called on a page or when you swap out a page to disk. A frame is freed by setting the corresponding bitmap entry in the coremap to 0. Then the freed frame can be used to serve a page request from another (or the same) process. Note that a page table that is using the frame will have to be updated.
--------------------------------------------------------------------------------
Q: I understand that I can figure out the size of the coremap and use ramstealmem to "allocate" memory for it, and then later on I'll change getppages to use my coremap. However, how would I know the size of the inverted page table (IPT) in the coremap in advance, since it will be changing over time? I am planning on using lists to implement the inverted page table, so that every time I make a new mapping, I will need to allocate more nodes. The Lab slides say that "Memory allocation for the coremap must be outside of the region of memory used to satisfy requests for memory from the rest of the kernel" - does this refer just to the bitmap, and not to the IPT? Because otherwise the only solution I see is to leave a big block of memory for it and hope I don't go over the boundary.
A: The memory allocation for the bitmap part of the coremap (and any other fixed-size information that you need to maintain per frame) should be done early. After that, you can implement the inverted page table by allocating memory from the coremap itself. Note that some frames are associated with kernel pages while others will be associated with user pages.
To summarize, you should preallocate the coremap. Later on (after the coremap is set up), you can allocate (pid, vaddr) pairs using malloc.
In this way, you are implementing the inverted page table by allocating memory from the coremap itself.
The hardware maps the kernel virtual addresses to physical addresses without using the TLB or any page tables. Also, kernel frames should not be paged out to disk. So you do not need any page table or inverted page table information for kernel frames. But you do need some way to distinguish kernel and user pages in the coremap.
--------------------------------------------------------------------------------
Q: I am imagining the coremap as a single data structure: an array indexed by physical address where each element stores the vaddr, pid, allocated/unallocated bit for that frame, and the size of an allocated memory region (so that kfree will know when to stop freeing). For example, if there is a 4 page block allocated, then the first physical frame allocated will set its size in the coremap to 4 and the next three frames to -1 to show that they are part of the same block. Is it a good idea to combine the bitmap and inverted page table like this? Or should we treat the bitmap and the inverted page table as separate data structures, and why?
A: You are welcome to combine the two data structures. If you plan to support page sharing then you will need to store one or more (pid, vaddr) pairs for each frame. For kernel pages, you need to store no such mappings.
--------------------------------------------------------------------------------
Q: Why does the coremap need to keep track of the size of allocated regions? Since physical memory is non-contiguous, don't you actually need to keep track of the pointers to the next frame of the allocated region?
A: alloc_kpages(npages) should allocate npages of contiguous physical memory. Hence you need to keep track of the size (in pages) of allocated regions.
--------------------------------------------------------------------------------
Q: Why does ram_getsize() set firstpaddr and lastpaddr to 0?
A: Please read the comments in arch/mips/include/vm.h.
--------------------------------------------------------------------------------

kmalloc::
---------
Q: How does kmalloc work? In particular, how does subpage_kmalloc work without knowing the information stored in our coremap?
A: kmalloc (and subpage_kmalloc) calls alloc_kpages as needed and keeps track of memory it has allocated or freed within pages obtained by alloc_kpages. When it doesn't need a whole page, it calls free_kpages. In both cases, kmalloc/kfree just use the implementation of alloc_kpages/free_kpages but don't need to know how they are implemented.
--------------------------------------------------------------------------------
Q: One of the slides in the lab 3 overview stated "You shouldn't have to call alloc_kpages except in as_create to allocate pages for your page table". We have been using kmalloc for our page tables. Is there a problem with this implementation?
A: There is no problem with using kmalloc. However, alloc_kpages is more efficient because it allocates at page granularity (if you allocate a page using kmalloc, it may allocate more than a frame to keep track of the malloc'd page).
--------------------------------------------------------------------------------
Q: I'm having difficulty understanding how the subpage allocator works. Before starting this assignment we were able to run forktest without any problems. We now find that the subpage allocator is failing, returning an error, "kmalloc: Subpage allocator couldn't get pageref", causing the kernel to crash.
A: We found the bug in our code. The subpage allocator calls alloc_kpages when it needs to grab a new page for pagerefs. Our implementation of alloc_kpages had a call to kmalloc in it, causing a recursive loop.
--------------------------------------------------------------------------------

page tables::
-------------
Q: Are page tables in userspace or kernelspace?
A: Page tables are operating system data structures (in kernel space). However, they are used to map user address spaces to physical memory. Page tables cannot be stored in user space. Otherwise, a process could change the page table entries to point them to other users' pages, etc.
--------------------------------------------------------------------------------
Q: Are we going to be creating a page table for each process and storing it in RAM? To keep track of a page table, should each address space have a pointer to its own page table? Should the OS mark these frames as taken in the bitmap?
A: Yes, Yes, Yes.
--------------------------------------------------------------------------------
Q: If we understand correctly, each process needs its own page table, and the OS needs an inverted page table for all the processes (the core map). A page table stores the physical address for each virtual page, and there are 2^19 virtual pages (0x7FFFFFFF/2^12 addresses per page), meaning that a single linear table is more than the RAM that we have. The lab handout also talks about implementing both page tables AND an inverted page table for the coremap. What are we misunderstanding?
A: You will need to implement a two-level page table per process.
--------------------------------------------------------------------------------
Q: Suppose Process 1 is running and emitting requests for virtual addresses. These get mapped to physical frames. For example, virtual page 1 gets mapped to frame 25. There is a context switch and process 2 now gets its page 2 mapped to frame 25. The old contents of frame 25 are now on swap. How does process 1 get them back when it is context switched back in?
A: When Page 1 of Process 1 was mapped to frame 25, the OS would have invalidated the page table entry associated with Page 2 of Process 2 (assuming frame 25 was originally mapped to Page 2 of Process 2). So when Process 2 starts running and accesses its Page 2, a TLB/page fault will be generated.
At this time, the OS will need to assign a frame to Page 2 of Process 2. If the frame contents are in the swap, it needs to load them from swap (the page table entry of Page 2 can keep this information). Otherwise, the frame contents may be in the executable.
Suppose no free frame is available right now. The OS may choose to unmap frame 25 from Page 1 of Process 1 (note that Process 1 is not running right now). In this case, it will invalidate the page table entry associated with Page 1 of Process 1.
Overall, the idea is that a frame is associated with only one process at a time -- unless a frame is shared by multiple pages of the same or multiple processes. The page table entry and the corresponding TLB entry should be made invalid when a frame is not associated with the page corresponding to the page table entry. An invalid/non-existent TLB entry will generate a TLB fault, at which time the work of assigning a frame to the page on which the fault occurred can be done.
--------------------------------------------------------------------------------
Q: When a context switch occurs, the OS needs to change the page table. Since the page table is stored in memory, the OS can simply keep track of the Page Table Register (the start of the page table) for each process. How should we store the previous value of this register and change this register? Where is the assembly code which does this and how do we interface with it?
A: The page table register is used in x86 hardware, where the page table is accessed by hardware. There is no page table register in MIPS because it uses a software-managed TLB and doesn't have any idea about page tables. In OS161, you can keep track of the per-process page table in the thread/process structure (you could think of the pointer that you would keep to the page table as the value that would be used to update the page table register in x86).
On a TLB fault, you can access the current thread and then get hold of the page table from the thread/process structure.
--------------------------------------------------------------------------------
Q: Should OS page tables be paged to disk?
A: We don't expect you to page operating system code and data because that can add significant complexity.
--------------------------------------------------------------------------------

demand paging::
---------------
Q: Do we need to refer to the ph struct available in load_elf when demand loading pages?
A: The ph struct should be used when defining VM regions but it should not be required when demand loading pages.
--------------------------------------------------------------------------------
Q: Can we reuse the load_segment() function for loading pages on demand?
A: You should not reuse load_segment directly for loading pages on demand. The reason is that load_segment uses user-virtual addresses for loading pages. See:
    u.uio_segflg = is_executable ? UIO_USERISPACE : UIO_USERSPACE;
After setting the user segment, load_segment uses VOP_READ, which writes to a user-virtual page, and so the hardware will perform a TLB lookup for the same entry that caused the TLB fault, causing another (recursive) TLB fault. This is hard to handle correctly.
Instead, you should load pages using kernel addresses, which will bypass the TLB. To do so, define two functions, page_read and page_write. These functions should read from disk and write to disk. They should take arguments similar to load_segment but operate on kernel virtual addresses (so the vaddr argument of load_segment should be a kernel virtual address of a frame that is assigned to a user virtual address, i.e., use PADDR_TO_KVADDR to convert a physical address to a kernel virtual address). Then use mk_kuio and use the kernel virtual address for doing the VOP_READ().
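To make the shape of such a function concrete, here is a user-space sketch of what a page_read could look like. It is only an analogue under stated assumptions: the signature is hypothetical, a FILE * stands in for the vnode, a plain buffer stands in for the kernel virtual address of the frame (PADDR_TO_KVADDR(paddr) in the kernel), and fseek/fread stand in for mk_kuio followed by VOP_READ. The zero-fill of bytes not backed by the file is an assumption about how the filesize argument would be used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

/*
 * Hypothetical user-space analogue of page_read.
 * Fills one frame with page number `page` of a segment that starts at file
 * offset `seg_off` and has `filesz` bytes backed by the file.
 * `frame` stands in for the kernel virtual address of the frame.
 * Returns 0 on success, -1 on an I/O error or short read.
 */
static int page_read(FILE *f, long seg_off, size_t filesz, size_t page,
                     unsigned char *frame)
{
    size_t start = page * PAGE_SIZE;  /* offset of this page within the segment */
    size_t nread = 0;                 /* bytes of this page that come from the file */

    assert(f != NULL && frame != NULL);

    if (start < filesz) {
        nread = filesz - start;
        if (nread > PAGE_SIZE)
            nread = PAGE_SIZE;
    }

    if (nread > 0) {
        if (fseek(f, seg_off + (long)start, SEEK_SET) != 0)
            return -1;
        if (fread(frame, 1, nread, f) != nread)
            return -1;
    }

    /* Anything not backed by the file must read as zero. */
    memset(frame + nread, 0, PAGE_SIZE - nread);
    return 0;
}
```

The destination is always the frame itself, never the faulting user virtual address, which is what lets the kernel version avoid a recursive TLB fault.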
--------------------------------------------------------------------------------
Q: The function load_segment takes a vnode, offset, vaddr, memsize, filesize and executable. The lecture notes suggest that vnode and offset be stored in the address space. vaddr can be obtained when a TLB fault occurs. memsize seems to be page size. However, how does one obtain the filesize for load_segment?
A: The vnode and offset should be stored per segment/region, not for the entire address space. filesize is obtained from the executable, see ph.p_filesz in the load_elf implementation. You should keep this value per-region when defining the regions, and read until there.
For example, say the ph.p_filesz value for the data segment is 4100, while p_memsz is 4096 * 3 = 12288. Then you can read the first page from the data section (bytes 0-4095). For the next page, you should ONLY read 4 bytes from the executable and then ZERO out the rest of the page. For the last page, you should not read from the file. Instead, you should zero out the whole page. Otherwise, you will have errors similar to "short read on segment-file truncated" in load_segment.
Similarly, if ph.p_filesz is 4096, while p_memsz is 12288, then you should fully zero out the second and third pages when loading them.
--------------------------------------------------------------------------------
Q: Is there a way to see the various segments/regions that are in the ELF executable?
A: Use cs161-readelf to inspect OS161 MIPS elf files. Example usage:
    readelf -a segments
--------------------------------------------------------------------------------

VM regions::
------------
Q: Is it true that the code section is always read only and the data section is always read/write?
A: Yes, the compiler normally adds this information in the ELF executable.
--------------------------------------------------------------------------------
Q: alloc_kpages and getppages allocate contiguous frames of physical memory.
Does that imply the stack and other virtual memory regions must all be contiguous in memory?
A: Each region is contiguous in virtual memory, but the physical pages (frames) can be discontiguous. On a page fault, a frame is mapped to a page in the region. You should call alloc_kpages(1) to allocate the frame.
--------------------------------------------------------------------------------
Q: In as_define_region(), do we assign zero pages or one page for a region initially? What about the stack, can we assign several (how many?) pages since we expect the stack to grow quickly?
A: You can allocate zero pages and they should be paged in on demand.
--------------------------------------------------------------------------------
Q: I don't understand what vbase1 and vbase2 are. What is the difference between them? Is one for data and one for text?
A: vbase1 is the text (code) region. vbase2 is the data (globals) region. They are currently initialized by load_elf() using as_define_region().
--------------------------------------------------------------------------------
Q: Are data and text segments/regions loaded one segment at a time?
A: You need to load a page at a time for any segment, not the entire segment, in the TLB fault handler.
--------------------------------------------------------------------------------
Q: The r/w/x permissions for the VM regions are available in the executable. How are we supposed to use these permissions?
A: The OS is supposed to use the permissions from the executable to set up the permissions for the VM regions.
--------------------------------------------------------------------------------
Q: It seems like we need 2 vnodes and 2 offsets, one for text and one for data sections. Is this correct? Shouldn't it be 1 vnode for the file and then an offset for each of the sections? More generally, why do we need a vnode?
Don't we just need a string that will allow us to open the executable file into a vnode every time we need to read in more of the code? If not, when does the file get closed?
A: For each region, you should maintain a pointer to the same vnode, but with different offsets. It is faster to keep an open file (a vnode is an in-kernel data structure representing an open file) around so that when a page fault occurs, the file does not have to be opened every time.
The kernel maintains vnodes using reference counts. The file is opened using vfs_open() in execv.
    result = vfs_open(filename, permissions, &v);
    // reference count for v is 1
    // execv also closes it using
    vfs_close(v);
    // decrements reference count by 1.
    // when reference count is 0, close file.
However, vfs_close() will only close the file when the reference count is 1 at the time of the call (so that the decrement brings it to 0). You can increase the vnode reference count by using
    VOP_INCREF(v);
So when you define a region and set up a pointer to the vnode, increment the reference count of the vnode. Similarly, when a region is destroyed, use VOP_DECREF(). When two regions share the same vnode (VOP_INCREF will be called twice), they will both have to be destroyed before the vnode is closed.
--------------------------------------------------------------------------------
Q: dumbvm supports three memory regions (code, data and stack). How many do we need to support?
A: You need to add a heap region.
--------------------------------------------------------------------------------
Q: In the current thread_exit implementation, as_destroy is called before entering zombie mode. According to the definition of as_destroy, we should be freeing the vm regions and pages associated with the address space, which would destroy the current thread's stack before we switch to another thread. Isn't that a problem?
A: The thread stack region that is destroyed by as_destroy is the user stack. The kernel also maintains a kernel stack per thread and executes on this stack (thread->t_stack).
The kernel stack is destroyed later.
--------------------------------------------------------------------------------

as_copy::
---------
Q: It seems tricky to copy the address contents in as_copy from the parent to the child because there does not seem to be a way for the TLB to distinguish between the two sets of virtual addresses.
A: Notice that as_copy uses kernel virtual addresses (by using PADDR_TO_KVADDR), and the TLB is not used for kernel virtual addresses, so it doesn't need to worry about distinguishing between the two sets of virtual addresses. You should allocate frames for the new address space, and then call memmove to copy the contents of the regions (text, data, etc.) by using the macro PADDR_TO_KVADDR.
--------------------------------------------------------------------------------
Q: Should as_copy make a copy of all of the parent pages? What about pages that are not loaded yet? What about pages that are in the swap?
A: The version of as_copy in dumbvm.c assumes that the parent process already has all its pages loaded in memory and simply copies them to the child. With demand paging, as_copy has to be careful because the parent's pages may not be in memory.
The new as_copy should set up the regions (as you did by copying the region values, e.g., as_vbase_text) and the page tables. You should copy all parent pages that are 1) currently in memory, or 2) in swap. You do not need to copy pages that are not in memory and have never been swapped. If these pages belong to the text or data segment, the child can load them from the executable. If they belong to the stack, they will be empty (zeroed out) when loaded.
If you plan to implement copy-on-write, then you simply need to copy the page table entries for the pages in memory (and also update the coremap to indicate shared pages) but do not need to copy the parent's pages.
For parent pages that are in swap, you should read them in, and then copy them to the child to avoid handling swapping of shared pages. If you plan to implement shared swap pages, then you can simply copy the shared swap entry.
--------------------------------------------------------------------------------
Q: When I run forktest, I get a panic in vm_fault because it gets a NULL address space. It seems my child thread is running before it gets an address space.
A: A thread doesn't run until you call make_runnable, so you should call as_copy before making a thread runnable. Also, you should check that as_copy is not returning an error.
--------------------------------------------------------------------------------

sbrk::
------
Q: Do we need to handle sbrk(size), for a non-page aligned size value?
A: Yes, assume that sbrk can be called with a non-page aligned size. Note that in the kernel, you will need to align the brk value to a page size (round it) when modifying the heap region, because you cannot provide protection at a granularity smaller than the page size. However, you will need to keep the actual (possibly non-page aligned) value of brk so that you can return the previous value on a call to sbrk().
--------------------------------------------------------------------------------
Q: Do we need to handle sbrk(size), for a negative size value?
A: The malloc implementation we have provided does not call sbrk with a negative value so you don't need to support sbrk with a negative value. However, the sbrk semantics are that it can be called with a negative value.
--------------------------------------------------------------------------------
Q: On the first call of malloc, it is supposed to run __malloc_init, where it calls sbrk(0) to initialize __heapbase and __heaptop to be equal to the base address of the heap. To ensure __malloc_init is run only on the first call to malloc, it checks a conditional (if __heapbase == 0), implying it should be zero on startup.
Our problem is that __heapbase, a static userptr_t, takes on a garbage value not equal to zero, and hence the init function is never called.
A: This problem suggests that the page loading code needs to initialize pages correctly. The problem may be that the ph.p_filesz value in a segment (in loadelf.c) is not a multiple of the page size. In that case, you need to zero out the rest of the page. Also, ph.p_filesz may be much less than ph.p_memsz, in which case you need to zero out entire pages.
--------------------------------------------------------------------------------
Q: Should it be possible for a user process to allocate contiguous frames of physical memory? Right now, our implementation of sbrk() just increases the top of the heap and then memory is allocated in vm_fault, one page at a time.
A: You should not need to allocate more than one page at a time for user memory. Now you can see why a paging-based virtual memory system is useful - fixed-size allocation is much easier to handle!
--------------------------------------------------------------------------------
Q: I have yet to see a program that uses the heap so I am not sure who is supposed to be defining the region and how we can test it.
A: At the user level, any program that calls malloc will be using the heap because malloc calls sbrk. Update any existing program or write a new program to call malloc and free.
--------------------------------------------------------------------------------

swapping::
----------
Q: When reading from an executable, we needed to take ph.p_filesz into account. Do we have to do the same for swapping?
A: When swapping, with mk_kuio, you don't need to explicitly set the uio_resid for either VOP_WRITE or VOP_READ. In either case, the length should be PAGE_SIZE, since you are always reading or writing a whole page.
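The whole-page rule can be illustrated with a small user-space sketch. It is an analogue only, under assumptions that are not part of the assignment: a FILE * stands in for the swap device, the slot layout (slot n at byte offset n * PAGE_SIZE) and the names swap_write and swap_read are hypothetical, and fseek with fwrite/fread stands in for mk_kuio with VOP_WRITE/VOP_READ.

```c
#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE 4096

/* Hypothetical layout: swap slot n occupies bytes
 * [n * PAGE_SIZE, (n + 1) * PAGE_SIZE) of the swap file. */

/* Write exactly one page from frame into the given swap slot. */
static int swap_write(FILE *swap, size_t slot, const unsigned char *frame)
{
    assert(swap != NULL && frame != NULL);
    if (fseek(swap, (long)(slot * PAGE_SIZE), SEEK_SET) != 0)
        return -1;
    return fwrite(frame, 1, PAGE_SIZE, swap) == PAGE_SIZE ? 0 : -1;
}

/* Read exactly one page from the given swap slot into frame. */
static int swap_read(FILE *swap, size_t slot, unsigned char *frame)
{
    assert(swap != NULL && frame != NULL);
    if (fseek(swap, (long)(slot * PAGE_SIZE), SEEK_SET) != 0)
        return -1;
    return fread(frame, 1, PAGE_SIZE, swap) == PAGE_SIZE ? 0 : -1;
}
```

Unlike loading from the executable, there is no partial-page case here: every transfer is a full PAGE_SIZE, so a short read or write is always an error.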
--------------------------------------------------------------------------------
Q: How can I make sure that the data I wrote to the swap is the same data I am reading from the swap?
A: Here is a way to check that the swap is working fine on a page fault: before writing to swap, take a checksum of the page (e.g., cast the page as an array of integers and take a sum of the integers), and store the checksum in the page table entry. Then read the swap page as needed on a page fault, and when returning from the page fault, take the checksum of the page that is read in, and compare with the stored checksum.
--------------------------------------------------------------------------------
Q: Do I need to worry about the interaction between swapping and the paging thread?
A: You need to make sure that a frame that is allocated to a faulting page is pinned when data is being read from the swap so that it is not chosen for eviction. A similar issue can arise whenever data is read from or written to swap (e.g., on a fork, the parent's pages might be read from swap).
--------------------------------------------------------------------------------
Q: Are concurrent reads and writes allowed to swap?
A: With shared pages, multiple threads may issue read/write to the same page in swap, and so it is a good idea to lock pages being read/written to swap. Concurrent read/write to different swap pages should be handled by the disk and do not need to be serialized. However, you can lock all swap operations with a single lock for simplicity.
--------------------------------------------------------------------------------

testing::
---------
Q: Why is this panic happening?
    "panic: Assertion failed: in_interrupt==0, at ../../thread/synch.c:140 (lock_acquire)"
A: This panic suggests that you are trying to acquire a lock in interrupt context.
If you need to synchronize a data structure that can be accessed in a system call and interrupt context, then you need to disable interrupts when accessing the structure in system call context.
--------------------------------------------------------------------------------
Q: Why would this panic happen in the mips_trap function?
    assert((vaddr_t)tf < (vaddr_t)(curthread->t_stack+STACK_SIZE));
A: This assert will fail if you overflow the kernel stack of the current thread. That could happen if you call some function in the kernel recursively many times (e.g., a bug might do that) or allocate a large array as a local variable and then an interrupt occurs while the kernel code is running.
--------------------------------------------------------------------------------
Q: What programs should I be using to test the VM assignment?
A: The Lab 2 tests should work. The rest of the tests are mentioned in the VM assignment. After you implement demand paging but before you implement swapping, matmult, palin, parallelvm, stacktest should work. If they don't work, you should see out of memory errors. Increase the size of your OS sufficiently and these should start working. The rest of the tests will work after swapping is implemented.
--------------------------------------------------------------------------------
Q: When we run malloctest, particularly options 5, 6, and 7, we experience synchronization issues between malloctest and the menu. Sometimes, the menu interrupts the malloctest thread and we come back to the OS main menu instead of terminating the malloc test and coming back to the malloctest menu.
A: Are you using thread_join after thread_fork to ensure that the main thread waits for the child thread in common_prog? Also, malloctest 6 will not work because it uses an open system call which you haven't implemented yet.
--------------------------------------------------------------------------------
Q: I am seeing "Bus error exception".
A: Make sure that you are not returning a physical frame that is greater than the total RAM region. For example, with 512K physical memory, you should not be assigning a frame with address more than 0x80000.
--------------------------------------------------------------------------------
Q: I get the following panic:
    assertion fail (vaddr_t)tf > (vaddr_t)curthread->t_stack
A: This means that the trap frame is at a lower address than the bottom of the stack. This is probably due to a serious corruption of the thread struct or the thread stack. You should first find out whether the code was running in kernel or user mode before the trap - see the value of the iskern variable in mips_trap.
If in user mode, the kernel stack is set up based on the curkstack global variable (see exception.S) and md_switch(). Use gdb to break in md_switch() and check that nu->pcb_kstack is correct (it should be at the top of the thread stack).
If the code was previously in kernel mode, put a break in gdb at the point of the assert in mips_trap and then type "bt" to see the backtrace. Is the backtrace reasonable? If it is very long, e.g., some long recursive call, then you are likely running off the one-page stack.
--------------------------------------------------------------------------------