linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
@ 2004-08-05 17:50 Phillip Lougher
  2004-08-06  0:55 ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Phillip Lougher @ 2004-08-05 17:50 UTC (permalink / raw)
  To: linux-kernel

Hi,

There is a readahead bug in do_generic_mapping_read (filemap.c).  This
bug appears to have been introduced in 2.6.8-rc1.  Specifically the bug
is caused by an incorrect code change which causes VFS to call
readpage() for indexes beyond the end of files where the file length is
zero or a 4k multiple.

In Squashfs this causes a variety of almost immediate OOPes because
Squashfs trusts the VFS not to pass invalid index values.  For other
filesystems it may also be causing subtle bugs.  I have received
prune_dcache oopes similar to Gene Heskett's (which was also
pointer corruption), and so it may fix this and other reported
readahead bugs.

The patch is against 2.6.8-rc3.

Regards

Phillip Lougher

diff --new-file -ur linux-2.6.8-rc3-squashfs2.0-test/mm/filemap.c linux-2.6.8-rc3-squashfs2.0-patched/mm/filemap.c
--- linux-2.6.8-rc3-squashfs2.0-test/mm/filemap.c       2004-08-05 02:14:39.000000000 +0100
+++ linux-2.6.8-rc3-squashfs2.0-patched/mm/filemap.c    2004-08-05 18:15:00.000000000 +0100
@@ -674,6 +674,15 @@
                 unsigned long nr, ret;

                 cond_resched();
+
+               /* nr is the maximum number of bytes to copy from this page */
+               nr = PAGE_CACHE_SIZE;
+               if (index == end_index) {
+                       nr = isize & ~PAGE_CACHE_MASK;
+                       if (nr <= offset)
+                               goto out;
+               }
+
                 page_cache_readahead(mapping, &ra, filp, index);

  find_page:
@@ -685,15 +694,6 @@
                 if (!PageUptodate(page))
                         goto page_not_up_to_date;
  page_ok:
-               /* nr is the maximum number of bytes to copy from this page */
-               nr = PAGE_CACHE_SIZE;
-               if (index == end_index) {
-                       nr = isize & ~PAGE_CACHE_MASK;
-                       if (nr <= offset) {
-                               page_cache_release(page);
-                               goto out;
-                       }
-               }
                 nr = nr - offset;

                 /* If users can be writing to this page using arbitrary


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-05 17:50 [PATCH] VFS readahead bug in 2.6.8-rc[1-3] Phillip Lougher
@ 2004-08-06  0:55 ` Nick Piggin
  2004-08-06  2:19   ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2004-08-06  0:55 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel, Andrew Morton

Phillip Lougher wrote:

> Hi,
>
> There is a readahead bug in do_generic_mapping_read (filemap.c).  This
> bug appears to have been introduced in 2.6.8-rc1.  Specifically the bug
> is caused by an incorrect code change which causes VFS to call
> readpage() for indexes beyond the end of files where the file length is
> zero or a 4k multiple.
>
> In Squashfs this causes a variety of almost immediate OOPes because
> Squashfs trusts the VFS not to pass invalid index values.  For other
> filesystems it may also be causing subtle bugs.  I have received
> prune_dcache oopes similar to Gene Heskett's (which was also
> pointer corruption), and so it may fix this and other reported
> readahead bugs.
>
> The patch is against 2.6.8-rc3.
>

Good work - bug is mine, sorry.

You actually re-introduce a bug where read can return incorrect
data due to i_size changing from under it (I introduced this bug
while fixing that one).

My fix was to re-check i_size and update 'nr' after doing the
->readpage. You could probably fix up both problems with your
patch and also copying the hunk down to after i_size gets rechecked.
Does that sound ok?


The root of the problem is that i_size gets checked from multiple
places that it can get out of synch. A nice fix would be to snapshot
i_size once, and pass that around everywhere. Unfortunately this is
very intrusive.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06  0:55 ` Nick Piggin
@ 2004-08-06  2:19   ` Nick Piggin
  2004-08-06 16:58     ` Phillip Lougher
  0 siblings, 1 reply; 11+ messages in thread
From: Nick Piggin @ 2004-08-06  2:19 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel, Andrew Morton, viro

Nick Piggin wrote:

> Phillip Lougher wrote:
>
>> Hi,
>>
>> There is a readahead bug in do_generic_mapping_read (filemap.c).  This
>> bug appears to have been introduced in 2.6.8-rc1.  Specifically the bug
>> is caused by an incorrect code change which causes VFS to call
>> readpage() for indexes beyond the end of files where the file length is
>> zero or a 4k multiple.
>>
>> In Squashfs this causes a variety of almost immediate OOPes because
>> Squashfs trusts the VFS not to pass invalid index values.  For other
>> filesystems it may also be causing subtle bugs.  I have received
>> prune_dcache oopes similar to Gene Heskett's (which was also
>> pointer corruption), and so it may fix this and other reported
>> readahead bugs.
>>
>> The patch is against 2.6.8-rc3.
>>
>
> Good work - bug is mine, sorry.
>

On second thought, maybe not. I think your filesystem is at fault.

Firstly, there possibly is a bug in do_pagecache_readahead which allows 
it to
read off the end of the file in a completely serialised situation. But 
even so,
notice the absence of locking - i_size can change at any time can't it? Then
your fix will blow up when you race with a truncate, right?

I think you need to handle reading off the end of the file properly: I'm not
too familiar with this code, but IIRC you are allowed to setup pagecache 
past
the end of the file - it gets handled correctly in 
do_generic_mapping_read, and
ends up falling off the LRU. This should help you.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06  2:19   ` Nick Piggin
@ 2004-08-06 16:58     ` Phillip Lougher
  2004-08-06 18:58       ` Nick Piggin
  0 siblings, 1 reply; 11+ messages in thread
From: Phillip Lougher @ 2004-08-06 16:58 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Andrew Morton, viro

Nick Piggin wrote:

> On second thought, maybe not. I think your filesystem is at fault.

No I'm not wrong here. With a read-only filesystem i_size can
never change, there are no possible race conditions.  If a too
large index is passed it is a VFS bug.  Are you suggesting I should
start to code assuming the VFS is broken?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 16:58     ` Phillip Lougher
@ 2004-08-06 18:58       ` Nick Piggin
  2004-08-06 19:14         ` Phillip Lougher
                           ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Nick Piggin @ 2004-08-06 18:58 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: linux-kernel, Andrew Morton, viro

Phillip Lougher wrote:
> Nick Piggin wrote:
> 
>> On second thought, maybe not. I think your filesystem is at fault.
> 
> 
> No I'm not wrong here. With a read-only filesystem i_size can
> never change, there are no possible race conditions.  If a too
> large index is passed it is a VFS bug.  Are you suggesting I should
> start to code assuming the VFS is broken?
> 

No, I suggest you start to code assuming this interface does
what it does. I didn't say there is no bug here, but nobody
else's filesystem breaks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 18:58       ` Nick Piggin
@ 2004-08-06 19:14         ` Phillip Lougher
  2004-08-06 19:31           ` viro
  2004-08-06 19:18         ` Phillip Lougher
  2004-08-07 14:21         ` Pozsar Balazs
  2 siblings, 1 reply; 11+ messages in thread
From: Phillip Lougher @ 2004-08-06 19:14 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Andrew Morton, viro

Nick Piggin wrote:
> Phillip Lougher wrote:
> 
>> Nick Piggin wrote:
>>
>>> On second thought, maybe not. I think your filesystem is at fault.
>>
>>
>>
>> No I'm not wrong here. With a read-only filesystem i_size can
>> never change, there are no possible race conditions.  If a too
>> large index is passed it is a VFS bug.  Are you suggesting I should
>> start to code assuming the VFS is broken?
>>
> 
> No, I suggest you start to code assuming this interface does
> what it does. I didn't say there is no bug here, but nobody
> else's filesystem breaks.
> 

Point one: The interface didn't do this UNTIL you changed the code
Point two: Just because no one has reported other filesystem
breakage, it doesn't mean other filesystems have not broken.

Perhaps I should have the check in my code.  However, it is
still stupid to fix an occasional race by ensuring readpage() is
always called with an out of bounds index when files are 0 or a
4K multiple.  The filesystem check may prevent a crash, but it
is needlessly wasteful by design, not through any inadvertant
race condition.

think is it stupid to fix an o.  However, I it
stupid to fix a race co
ll think it stupid to fix a race you en


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 18:58       ` Nick Piggin
  2004-08-06 19:14         ` Phillip Lougher
@ 2004-08-06 19:18         ` Phillip Lougher
  2004-08-06 19:46           ` Andrew Morton
  2004-08-07 14:21         ` Pozsar Balazs
  2 siblings, 1 reply; 11+ messages in thread
From: Phillip Lougher @ 2004-08-06 19:18 UTC (permalink / raw)
  To: Nick Piggin; +Cc: linux-kernel, Andrew Morton, viro

Nick Piggin wrote:

> No, I suggest you start to code assuming this interface does
> what it does. I didn't say there is no bug here, but nobody
> else's filesystem breaks.
> 

To stop this silly argument from escalating, I will patch my code.

Phillip



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 19:14         ` Phillip Lougher
@ 2004-08-06 19:31           ` viro
  0 siblings, 0 replies; 11+ messages in thread
From: viro @ 2004-08-06 19:31 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: Nick Piggin, linux-kernel, Andrew Morton

On Fri, Aug 06, 2004 at 08:14:55PM +0100, Phillip Lougher wrote:
> Point one: The interface didn't do this UNTIL you changed the code
> Point two: Just because no one has reported other filesystem
> breakage, it doesn't mean other filesystems have not broken.

Point three: it was never promised *AND* never intended.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 19:18         ` Phillip Lougher
@ 2004-08-06 19:46           ` Andrew Morton
  2004-08-16  7:55             ` [PATCH] " Ram Pai
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2004-08-06 19:46 UTC (permalink / raw)
  To: Phillip Lougher; +Cc: nickpiggin, linux-kernel, viro

Phillip Lougher <phillip@lougher.demon.co.uk> wrote:
>
> Nick Piggin wrote:
> 
> > No, I suggest you start to code assuming this interface does
> > what it does. I didn't say there is no bug here, but nobody
> > else's filesystem breaks.
> > 
> 
> To stop this silly argument from escalating, I will patch my code.
> 

Well I don't think it's silly.

We are deterministically asking the fs to read a page which lies outside
EOF, and we shouldn't.  If for no other reason than that the ever-popular
"read a million 4k files" workload will consume extra CPU and twice the
pagecache.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 18:58       ` Nick Piggin
  2004-08-06 19:14         ` Phillip Lougher
  2004-08-06 19:18         ` Phillip Lougher
@ 2004-08-07 14:21         ` Pozsar Balazs
  2 siblings, 0 replies; 11+ messages in thread
From: Pozsar Balazs @ 2004-08-07 14:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Phillip Lougher, linux-kernel, Andrew Morton, viro

On Sat, Aug 07, 2004 at 04:58:21AM +1000, Nick Piggin wrote:
> Phillip Lougher wrote:
> >Nick Piggin wrote:
> >
> >>On second thought, maybe not. I think your filesystem is at fault.
> >
> >
> >No I'm not wrong here. With a read-only filesystem i_size can
> >never change, there are no possible race conditions.  If a too
> >large index is passed it is a VFS bug.  Are you suggesting I should
> >start to code assuming the VFS is broken?
> >
> 
> No, I suggest you start to code assuming this interface does
> what it does. I didn't say there is no bug here, but nobody
> else's filesystem breaks.

Well, that might not exactly be true. We have noticed the same problem 
with ext2 too. Maybe I can send an example initrd image which triggers 
the bug on Monday.


-- 
Pozsar Balazs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] Re: [PATCH] VFS readahead bug in 2.6.8-rc[1-3]
  2004-08-06 19:46           ` Andrew Morton
@ 2004-08-16  7:55             ` Ram Pai
  0 siblings, 0 replies; 11+ messages in thread
From: Ram Pai @ 2004-08-16  7:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Phillip Lougher, nickpiggin, linux-kernel, viro

[-- Attachment #1: Type: TEXT/PLAIN, Size: 755 bytes --]

On Fri, 6 Aug 2004, Andrew Morton wrote:

> Phillip Lougher <phillip@lougher.demon.co.uk> wrote:
> >
> > Nick Piggin wrote:
> > 
> > > No, I suggest you start to code assuming this interface does
> > > what it does. I didn't say there is no bug here, but nobody
> > > else's filesystem breaks.
> > > 
> > 
> > To stop this silly argument from escalating, I will patch my code.
> > 
> 
> Well I don't think it's silly.
> 
> We are deterministically asking the fs to read a page which lies outside
> EOF, and we shouldn't.  If for no other reason than that the ever-popular
> "read a million 4k files" workload will consume extra CPU and twice the
> pagecache.

Andrew,

	Enclosed a patch developed with Nick Piggin. It takes care of the
	bug. 

Thanks,
RP

[-- Attachment #2: Type: TEXT/PLAIN, Size: 611 bytes --]

--- ram/linux-2.6.8.1/mm/filemap.c	2004-08-14 03:56:25.000000000 -0700
+++ linux-2.6.8.1/mm/filemap.c	2004-08-16 07:56:31.912038720 -0700
@@ -665,14 +665,18 @@ void do_generic_mapping_read(struct addr
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
 	isize = i_size_read(inode);
-	end_index = isize >> PAGE_CACHE_SHIFT;
-	if (index > end_index)
+	if (!isize)
 		goto out;
+		
+	end_index = isize >> PAGE_CACHE_SHIFT;
 
 	for (;;) {
 		struct page *page;
 		unsigned long nr, ret;
 
+		if (index > end_index)
+			goto out;
+
 		cond_resched();
 		page_cache_readahead(mapping, &ra, filp, index);
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2004-08-16  7:56 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-08-05 17:50 [PATCH] VFS readahead bug in 2.6.8-rc[1-3] Phillip Lougher
2004-08-06  0:55 ` Nick Piggin
2004-08-06  2:19   ` Nick Piggin
2004-08-06 16:58     ` Phillip Lougher
2004-08-06 18:58       ` Nick Piggin
2004-08-06 19:14         ` Phillip Lougher
2004-08-06 19:31           ` viro
2004-08-06 19:18         ` Phillip Lougher
2004-08-06 19:46           ` Andrew Morton
2004-08-16  7:55             ` [PATCH] " Ram Pai
2004-08-07 14:21         ` Pozsar Balazs

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).