Message ID | Pine.GSO.4.21.0010301505590.1177-100000@weyl.math.psu.edu |
---|---|
State | New, archived |
Headers | show |
Series |
|
Related | show |
On Mon, 30 Oct 2000, Alexander Viro wrote: > > Unfortunately, it doesn't fix the thing. ->sync_page() is called when we > do not own the page lock and nfs_sync_page() uses page->mapping. Yes, we > check it before calling the bloody thing, but we don't own the lock. Good catch. > Problem only for NFS, but I'm not sure what to do about it - the whole > point of ->sync_page() seems to be (if I understood Trond's intentions > right) in forcing the ->readpage() in progress. How about just changing ->sync_page() semantics to own the page lock? That sound slike the right thing anyway, no? > Another place you've missed is in read_cache_page(). That one is easy - we've > just locked the page and we should just repeat the whole thing if it's out > of cache. I didn't actually miss it, I just looked at the users and decided that it looks like they should never have this issue. But I might have missed something. As far as I can tell, "read_cache_page()" is only used for meta-data like things that cannot be truncated. But you're right, we should do it for consistency. > One more is in filemap_swapout() - dunno, I just shifted the check to > filemap_write_page(). I'd really like to do these in the thing that locks the page, and make the rule be that the page locker needs to do the work. That's why I'd prefer to let the test be in the _caller_ of filemap_write_page(), as that's the point where we got the lock. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Linus Torvalds wrote: > How about just changing ->sync_page() semantics to own the page lock? That > sound slike the right thing anyway, no? It would kill the ->sync_page(), but yes, _that_ might be the right thing ;-) > I didn't actually miss it, I just looked at the users and decided that it > looks like they should never have this issue. But I might have missed > something. As far as I can tell, "read_cache_page()" is only used for > meta-data like things that cannot be truncated. invalidate_inode_pages(). > I'd really like to do these in the thing that locks the page, and make the > rule be that the page locker needs to do the work. That's why I'd prefer > to let the test be in the _caller_ of filemap_write_page(), as that's the > point where we got the lock. Fine with me, but then we would have to do it in try_to_swap_out() and that would be Wrong Thing(tm) (e.g. because ->swapout() makes sense for anonymous pages). We could do it in filemap_swapout(), but the lock is taken by its caller, so... Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Alexander Viro wrote: > > > On Mon, 30 Oct 2000, Linus Torvalds wrote: > > > How about just changing ->sync_page() semantics to own the page lock? That > > sound slike the right thing anyway, no? > > It would kill the ->sync_page(), but yes, _that_ might be the right thing ;-) To elaborate: the thing is called if we get a contention on the page lock. Essentially, its use in NFS is renice -20 for the requests on our page wrt RPC scheduler. By the time when page gets unlocked it becomes a NOP. On local filesystems it just runs the tq_disk - nothing in common with the NFS case and IMO Trond was wrong lumping them together. In effect, we are getting run_task_queue(&tq_disk) executed _very_ often and I'm less than sure that it's a good idea. I think that ->sync_page() is not a well-defined operation and NFS scheduler should use the locking of its own, both for inavlidate_... and here. Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Alexander Viro wrote: > The last one is in deactivate_page_nolock() - there we check the > ->mapping without pagecache_lock and without page lock. Hell > knows whether it's a bug or not. Rik? Shouldn't be a problem, since we'll have the lock at a time we actually /do/ something with those pointers. In deactivate_page_nolock(), all we can modify is the list in which the page resides, the flags indicating on which list the page is and the referenced bit + page age. No other stuff is touched. Furthermore, the locking order (first pagecache lock, then the page_list_lock) would make it difficult to do this right... regards, Rik -- "What you're running that piece of shit Gnome?!?!" -- Miguel de Icaza, UKUUG 2000 http://www.conectiva.com/ http://www.surriel.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Alexander Viro wrote: > > > I didn't actually miss it, I just looked at the users and decided that it > > looks like they should never have this issue. But I might have missed > > something. As far as I can tell, "read_cache_page()" is only used for > > meta-data like things that cannot be truncated. > > invalidate_inode_pages(). Nope. It checks the page count these days, so it would never kill such a page from under us (we increment the page count while holding the pagecache lock). But yes, I'm starting to agree with you more and more.. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Alexander Viro wrote: > > [ sync_page brokenness ] > > To elaborate: the thing is called if we get a contention on the page lock. Ok, sync_page() looks like a broken design, but I suspect that for expediency the simplest fix is to just make the NFS sync_page() (re-)check for "mapping == NULL", and let it be at that. Avoid the NULL pointer dereference (very small window already). We should probably in the long run make "page->buffers" be a more generic thing, and let NFS use it as a wb-info thing, and be done with it. That's obviously not 2.4.x material, though. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Linus Torvalds wrote: > Ok, sync_page() looks like a broken design, but I suspect that for > expediency the simplest fix is to just make the NFS sync_page() (re-)check > for "mapping == NULL", and let it be at that. Avoid the NULL pointer > dereference (very small window already). Fine with me. Just let's remember that it should be revisited in 2.5. What about filemap_swapout()? If you agree with checking ->mapping there... looks like we are done with that crap for the time being. If it's OK with you I'll send such patch against vanilla -pre7. Cheers, Al - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
On Mon, 30 Oct 2000, Alexander Viro wrote: > > Fine with me. Just let's remember that it should be revisited in 2.5. > What about filemap_swapout()? If you agree with checking ->mapping > there... looks like we are done with that crap for the time being. Yup, I agree. I already applied your patch, and did the additional "mapping" check in nfs_sync_page. We should be ok for now, the only wart being the fact that sync_page() is ugly. But better ugly than broken. Linus - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org Please read the FAQ at http://www.tux.org/lkml/
--- filemap.c Mon Oct 30 18:46:17 2000 +++ filemap.c.new Mon Oct 30 18:54:05 2000 @@ -981,7 +981,7 @@ * virtual addresses, take care about potential aliasing * before reading the page on the kernel side. */ - if (page->mapping->i_mmap_shared != NULL) + if (mapping->i_mmap_shared != NULL) flush_dcache_page(page); /* @@ -1473,7 +1473,8 @@ * vma/file is guaranteed to exist in the unmap/sync cases because * mmap_sem is held. */ - return page->mapping->a_ops->writepage(file, page); + /* Nothing to do if somebody truncated the page from under us.. */ + return page->mapping?page->mapping->a_ops->writepage(file, page):0; } @@ -1544,9 +1545,7 @@ lock_page(page); error = 0; - /* Nothing to do if somebody truncated the page from under us.. */ - if (page->mapping) - error = filemap_write_page(vma->vm_file, page, 1); + error = filemap_write_page(vma->vm_file, page, 1); UnlockPage(page); page_cache_free(page); @@ -2313,13 +2312,20 @@ int (*filler)(void *,struct page*), void *data) { - struct page *page = __read_cache_page(mapping, index, filler, data); + struct page *page; +retry: + page = __read_cache_page(mapping, index, filler, data); int err; if (IS_ERR(page) || Page_Uptodate(page)) goto out; lock_page(page); + if (!page->mapping) { + UnlockPage(page); + page_cache_release(page); + goto retry; + } if (Page_Uptodate(page)) { UnlockPage(page); goto out;