Re: test10-pre7

Message ID Pine.GSO.4.21.0010301505590.1177-100000@weyl.math.psu.edu
State New, archived
Headers show
Series
  • Re: test10-pre7
Related show

Commit Message

Alexander Viro Oct. 30, 2000, 8:34 p.m. UTC
On Mon, 30 Oct 2000, Linus Torvalds wrote:

> 
> Ok, this one contains at least a preliminary fix for the problem with
> truncate together with a concurrent page access - the bug that causes
> oopses in block_read_full_page() and filemap_nopage().
> 
> This is a fairly minimal fix, and I'll still have to verify that I caught
> all the relevant places, but I wanted people who have seen this problem to
> please test this out asap - I'll make a real test10 later once I've
> integrated some further patches from Alan and Jeff, but this should fix
> the major show-stopper bug.

Unfortunately, it doesn't fix the thing. ->sync_page() is called when we
do not own the page lock and nfs_sync_page() uses page->mapping. Yes, we
check it before calling the bloody thing, but we don't own the lock.
Problem only for NFS, but I'm not sure what to do about it - the whole
point of ->sync_page() seems to be (if I understood Trond's intentions
right) in forcing the ->readpage() in progress.

Another place you've missed is in read_cache_page(). That one is easy - we've
just locked the page and we should just repeat the whole thing if it's out
of cache.

One more is in filemap_swapout() - dunno, I just shifted the check to
filemap_write_page().

One more: check in do_generic_file_read() for ->mapping->i_shared_mmap.
Fix: trivial.

The last one is in deactivate_page_nolock() - there we check the ->mapping
without pagecache_lock and without page lock. Hell knows whether it's a
bug or not. Rik?

Minimal patch (against -pre7) follows. It still leaves sync_page() problem
open - any suggestions on that one are very welcome. Other than that and
deactivate_page_nolock() we should be safe wrt. ->mapping. Please, apply -
after that we will be in sync. nfs_sync_page() is still a problem and if
somebody (Trond?) might tell WTF it is supposed to be...
							Cheers,
								Al


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Comments

Linus Torvalds Oct. 30, 2000, 9:02 p.m. UTC | #1
On Mon, 30 Oct 2000, Alexander Viro wrote:
> 
> Unfortunately, it doesn't fix the thing. ->sync_page() is called when we
> do not own the page lock and nfs_sync_page() uses page->mapping. Yes, we
> check it before calling the bloody thing, but we don't own the lock.

Good catch.

> Problem only for NFS, but I'm not sure what to do about it - the whole
> point of ->sync_page() seems to be (if I understood Trond's intentions
> right) in forcing the ->readpage() in progress.

How about just changing ->sync_page() semantics to own the page lock? That
sound slike the right thing anyway, no?

> Another place you've missed is in read_cache_page(). That one is easy - we've
> just locked the page and we should just repeat the whole thing if it's out
> of cache.

I didn't actually miss it, I just looked at the users and decided that it
looks like they should never have this issue. But I might have missed
something. As far as I can tell, "read_cache_page()" is only used for
meta-data like things that cannot be truncated.

But you're right, we should do it for consistency.

> One more is in filemap_swapout() - dunno, I just shifted the check to
> filemap_write_page().

I'd really like to do these in the thing that locks the page, and make the
rule be that the page locker needs to do the work. That's why I'd prefer
to let the test be in the _caller_ of filemap_write_page(), as that's the
point where we got the lock.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Alexander Viro Oct. 30, 2000, 9:23 p.m. UTC | #2
On Mon, 30 Oct 2000, Linus Torvalds wrote:

> How about just changing ->sync_page() semantics to own the page lock? That
> sound slike the right thing anyway, no?

It would kill the ->sync_page(), but yes, _that_ might be the right thing ;-)

> I didn't actually miss it, I just looked at the users and decided that it
> looks like they should never have this issue. But I might have missed
> something. As far as I can tell, "read_cache_page()" is only used for
> meta-data like things that cannot be truncated.

invalidate_inode_pages().

> I'd really like to do these in the thing that locks the page, and make the
> rule be that the page locker needs to do the work. That's why I'd prefer
> to let the test be in the _caller_ of filemap_write_page(), as that's the
> point where we got the lock.

Fine with me, but then we would have to do it in try_to_swap_out() and
that would be Wrong Thing(tm) (e.g. because ->swapout() makes sense for
anonymous pages).

We could do it in filemap_swapout(), but the lock is taken by its caller,
so...
							Cheers,
								Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Alexander Viro Oct. 30, 2000, 10:01 p.m. UTC | #3
On Mon, 30 Oct 2000, Alexander Viro wrote:

> 
> 
> On Mon, 30 Oct 2000, Linus Torvalds wrote:
> 
> > How about just changing ->sync_page() semantics to own the page lock? That
> > sound slike the right thing anyway, no?
> 
> It would kill the ->sync_page(), but yes, _that_ might be the right thing ;-)

To elaborate: the thing is called if we get a contention on the page lock.
Essentially, its use in NFS is renice -20 for the requests on our page
wrt RPC scheduler. By the time when page gets unlocked it becomes a NOP.
On local filesystems it just runs the tq_disk - nothing in common with
the NFS case and IMO Trond was wrong lumping them together. In effect,
we are getting run_task_queue(&tq_disk) executed _very_ often and I'm less
than sure that it's a good idea. I think that ->sync_page() is not a
well-defined operation and NFS scheduler should use the locking of its own,
both for inavlidate_... and here.
							Cheers,
								Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Rik van Riel Oct. 30, 2000, 10:06 p.m. UTC | #4
On Mon, 30 Oct 2000, Alexander Viro wrote:

> The last one is in deactivate_page_nolock() - there we check the
> ->mapping without pagecache_lock and without page lock. Hell
> knows whether it's a bug or not. Rik?

Shouldn't be a problem, since we'll have the lock at a time
we actually /do/ something with those pointers.

In deactivate_page_nolock(), all we can modify is the list
in which the page resides, the flags indicating on which
list the page is and the referenced bit + page age. No other
stuff is touched.

Furthermore, the locking order (first pagecache lock, then
the page_list_lock) would make it difficult to do this right...

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
       -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/		http://www.surriel.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Linus Torvalds Oct. 30, 2000, 10:21 p.m. UTC | #5
On Mon, 30 Oct 2000, Alexander Viro wrote:
> 
> > I didn't actually miss it, I just looked at the users and decided that it
> > looks like they should never have this issue. But I might have missed
> > something. As far as I can tell, "read_cache_page()" is only used for
> > meta-data like things that cannot be truncated.
> 
> invalidate_inode_pages().

Nope. It checks the page count these days, so it would never kill such a
page from under us (we increment the page count while holding the
pagecache lock).

But yes, I'm starting to agree with you more and more..

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Linus Torvalds Oct. 30, 2000, 11:05 p.m. UTC | #6
On Mon, 30 Oct 2000, Alexander Viro wrote:
>
> [ sync_page brokenness ]
> 
> To elaborate: the thing is called if we get a contention on the page lock.

Ok, sync_page() looks like a broken design, but I suspect that for
expediency the simplest fix is to just make the NFS sync_page() (re-)check
for "mapping == NULL", and let it be at that. Avoid the NULL pointer
dereference (very small window already).

We should probably in the long run make "page->buffers" be a more generic
thing, and let NFS use it as a wb-info thing, and be done with it. That's
obviously not 2.4.x material, though.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Alexander Viro Oct. 30, 2000, 11:14 p.m. UTC | #7
On Mon, 30 Oct 2000, Linus Torvalds wrote:

> Ok, sync_page() looks like a broken design, but I suspect that for
> expediency the simplest fix is to just make the NFS sync_page() (re-)check
> for "mapping == NULL", and let it be at that. Avoid the NULL pointer
> dereference (very small window already).

Fine with me. Just let's remember that it should be revisited in 2.5.
What about filemap_swapout()? If you agree with checking ->mapping
there... looks like we are done with that crap for the time being.
If it's OK with you I'll send such patch against vanilla -pre7.
							Cheers,
								Al

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
Linus Torvalds Oct. 30, 2000, 11:17 p.m. UTC | #8
On Mon, 30 Oct 2000, Alexander Viro wrote:
> 
> Fine with me. Just let's remember that it should be revisited in 2.5.
> What about filemap_swapout()? If you agree with checking ->mapping
> there... looks like we are done with that crap for the time being.

Yup, I agree. I already applied your patch, and did the additional
"mapping" check in nfs_sync_page. We should be ok for now, the only wart
being the fact that sync_page() is ugly.

But better ugly than broken.

		Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/

Patch

--- filemap.c	Mon Oct 30 18:46:17 2000
+++ filemap.c.new	Mon Oct 30 18:54:05 2000
@@ -981,7 +981,7 @@ 
 		 * virtual addresses, take care about potential aliasing
 		 * before reading the page on the kernel side.
 		 */
-		if (page->mapping->i_mmap_shared != NULL)
+		if (mapping->i_mmap_shared != NULL)
 			flush_dcache_page(page);
 
 		/*
@@ -1473,7 +1473,8 @@ 
 	 * vma/file is guaranteed to exist in the unmap/sync cases because
 	 * mmap_sem is held.
 	 */
-	return page->mapping->a_ops->writepage(file, page);
+	/* Nothing to do if somebody truncated the page from under us.. */
+	return page->mapping?page->mapping->a_ops->writepage(file, page):0;
 }
 
 
@@ -1544,9 +1545,7 @@ 
 	lock_page(page);
 
 	error = 0;
-	/* Nothing to do if somebody truncated the page from under us.. */
-	if (page->mapping)
-		error = filemap_write_page(vma->vm_file, page, 1);
+	error = filemap_write_page(vma->vm_file, page, 1);
 
 	UnlockPage(page);
 	page_cache_free(page);
@@ -2313,13 +2312,20 @@ 
 				int (*filler)(void *,struct page*),
 				void *data)
 {
-	struct page *page = __read_cache_page(mapping, index, filler, data);
+	struct page *page;
+retry:
+	page = __read_cache_page(mapping, index, filler, data);
 	int err;
 
 	if (IS_ERR(page) || Page_Uptodate(page))
 		goto out;
 
 	lock_page(page);
+	if (!page->mapping) {
+		UnlockPage(page);
+		page_cache_release(page);
+		goto retry;
+	}
 	if (Page_Uptodate(page)) {
 		UnlockPage(page);
 		goto out;