All of lore.kernel.org
 help / color / mirror / Atom feed
* [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
@ 2016-12-14 19:27 Joseph Salisbury
  2016-12-14 20:27 ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Joseph Salisbury @ 2016-12-14 19:27 UTC (permalink / raw)
  To: fangwei1
  Cc: hch, david, Alexander Viro, stable, Andrew Morton, Linus Torvalds

Hi Wei,

A kernel bug report was opened against Ubuntu [0].  It was found that
reverting the following commit resolved this bug:

commit c2a9737f45e27d8263ff9643f994bda9bac0b944
Author: Wei Fang <fangwei1@huawei.com>
Date:   Fri Oct 7 17:01:52 2016 -0700

    vfs,mm: fix a dead loop in truncate_inode_pages_range()


The regression was introduced as of v4.9-rc1.  It was also sent to
stable, and now affects 4.8.4 and newer.

I was hoping to get your feedback, since you are the patch author.  Do
you think gathering any additional data will help diagnose this issue,
or would it be best to submit a revert request?


Thanks,

Joe

[0] http://pad.lv/1649342



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 19:27 [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range() Joseph Salisbury
@ 2016-12-14 20:27 ` Linus Torvalds
  2016-12-14 20:45   ` Linus Torvalds
  2016-12-14 21:15   ` Al Viro
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2016-12-14 20:27 UTC (permalink / raw)
  To: Joseph Salisbury
  Cc: fangwei1, Christoph Hellwig, Dave Chinner, Alexander Viro,
	stable, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On Wed, Dec 14, 2016 at 11:27 AM, Joseph Salisbury
<joseph.salisbury@canonical.com> wrote:
>
> A kernel bug report was opened against Ubuntu [0].  It was found that
> reverting the following commit resolved this bug:
>
> commit c2a9737f45e27d8263ff9643f994bda9bac0b944
>
> [0] http://pad.lv/1649342

Hah.

Looks like a rather special corner case for s_maxbytes.

I think what is going on is that the read() at the end of the file
should return 0, not EINVAL.

And nobody normally notices, because nobody ever hits s_maxbytes in
any normal situation.

Anyway, does this stupid one-liner fix the problem? By definition, if
*ppos >= inode->i_sb->s_maxbytes, then we must be at or past the end
of the file, and so the read() should return 0.

The EINVAL was always wrong, I think, but it made some (bad) sense
when you think of the issue as being the result of somebody trying to
do something bad. But it isn't actually an error.

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 465 bytes --]

 mm/filemap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 69568388c699..b06517b7f97f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1637,7 +1637,7 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
 	int error = 0;
 
 	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
-		return -EINVAL;
+		return 0;
 	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
 
 	index = *ppos >> PAGE_SHIFT;

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 20:27 ` Linus Torvalds
@ 2016-12-14 20:45   ` Linus Torvalds
  2016-12-14 21:14     ` Linus Torvalds
  2016-12-19 10:45     ` Christoph Hellwig
  2016-12-14 21:15   ` Al Viro
  1 sibling, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2016-12-14 20:45 UTC (permalink / raw)
  To: Joseph Salisbury
  Cc: fangwei1, Christoph Hellwig, Dave Chinner, Alexander Viro,
	stable, Andrew Morton

On Wed, Dec 14, 2016 at 12:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The EINVAL was always wrong, I think, but it made some (bad) sense
> when you think of the issue as being the result of somebody trying to
> do something bad. But it isn't actually an error.

Here's an example program that I think shows this on pretty much any filesystem.

Run with something like

    strace ./a.out < ./a.out

and then look at the final part of the trace, which will look something like

    lseek(0, 17592186040320, SEEK_SET)      = 17592186040320
    read(0, 0x7ffe1d2df140, 10)             = -1 EINVAL (Invalid argument)
    lseek(0, 17592186040319, SEEK_SET)      = 17592186040319
    read(0, "", 10)                         = 0

and that EINVAL from the first read is just bogus and wrong. We're
clearly at a good offset - the lseek() succeeded - and we're clearly
past the end of the file, so we should be returning 0 (for that EOF).

And the second read succeeds just because we've done a seek to one byte less.

I'm not guaranteeing that I got that stupid binary search for the
filesystem s_maxsize right, but it worked for me. This is a truly
crappy test program, but whatever.

              Linus

---
#define _LARGEFILE64_SOURCE
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        char buf[10];
        off64_t large_offset = 1000000;
        off64_t max_offset = (~(unsigned long long)0)>>1;

        for (;;) {
                off64_t try_offset = (unsigned long long)(max_offset +
large_offset + 1)/2;

                if (try_offset >= max_offset)
                        break;

                if (lseek64(0, try_offset, SEEK_SET) < 0)
                        max_offset = try_offset;
                else
                        large_offset = try_offset;
        }

        lseek64(0, large_offset, SEEK_SET);
        read(0, buf, 10);

        lseek64(0, large_offset-1, SEEK_SET);
        read(0, buf, 10);

        return 0;
}

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 20:45   ` Linus Torvalds
@ 2016-12-14 21:14     ` Linus Torvalds
  2016-12-19 10:45     ` Christoph Hellwig
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2016-12-14 21:14 UTC (permalink / raw)
  To: Joseph Salisbury
  Cc: fangwei1, Christoph Hellwig, Dave Chinner, Alexander Viro,
	stable, Andrew Morton

On Wed, Dec 14, 2016 at 12:45 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> and then look at the final part of the trace, which will look something like
>
>     lseek(0, 17592186040320, SEEK_SET)      = 17592186040320
>     read(0, 0x7ffe1d2df140, 10)             = -1 EINVAL (Invalid argument)
>     lseek(0, 17592186040319, SEEK_SET)      = 17592186040319
>     read(0, "", 10)                         = 0
>
> and that EINVAL from the first read is just bogus and wrong.

Trivially tested that yes, my suggested patch does actually fix it,
and with it you get

  lseek(0, 17592186040320, SEEK_SET)      = 17592186040320
  read(0, "", 10)                         = 0
  lseek(0, 17592186040319, SEEK_SET)      = 17592186040319
  read(0, "", 10)                         = 0

instead. Commited and pushed out, and marked for stable.

I didn't actually test the md5 case that was in the original report,
but it seems "obviously the same thing".

Famous last words.

             Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 20:27 ` Linus Torvalds
  2016-12-14 20:45   ` Linus Torvalds
@ 2016-12-14 21:15   ` Al Viro
  2016-12-14 21:27     ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Al Viro @ 2016-12-14 21:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joseph Salisbury, fangwei1, Christoph Hellwig, Dave Chinner,
	stable, Andrew Morton

On Wed, Dec 14, 2016 at 12:27:41PM -0800, Linus Torvalds wrote:

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 69568388c699..b06517b7f97f 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1637,7 +1637,7 @@ static ssize_t do_generic_file_read(struct file *filp, loff_t *ppos,
>  	int error = 0;
>  
>  	if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
> -		return -EINVAL;
> +		return 0;
>  	iov_iter_truncate(iter, inode->i_sb->s_maxbytes);

BTW, shouldn't we truncate to *ppos - inode->i_sb->s_maxbytes here as well?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 21:15   ` Al Viro
@ 2016-12-14 21:27     ` Linus Torvalds
  2016-12-14 21:30       ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2016-12-14 21:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Joseph Salisbury, fangwei1, Christoph Hellwig, Dave Chinner,
	stable, Andrew Morton

On Wed, Dec 14, 2016 at 1:15 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>>       if (unlikely(*ppos >= inode->i_sb->s_maxbytes))
>> -             return -EINVAL;
>> +             return 0;
>>       iov_iter_truncate(iter, inode->i_sb->s_maxbytes);
>
> BTW, shouldn't we truncate to *ppos - inode->i_sb->s_maxbytes here as well?

You're right, that looks iffy, and I think we should.

That said, I wonder if we instead should get rid of that truncation
entirely, and just do the s_maxbytes test inside the loop? Or perhaps
just modify the last_index calculations - screw the "s_maxbytes" part,
it's really the last index that matters.

           Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 21:27     ` Linus Torvalds
@ 2016-12-14 21:30       ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2016-12-14 21:30 UTC (permalink / raw)
  To: Al Viro
  Cc: Joseph Salisbury, fangwei1, Christoph Hellwig, Dave Chinner,
	stable, Andrew Morton

On Wed, Dec 14, 2016 at 1:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> That said, I wonder if we instead should get rid of that truncation
> entirely, and just do the s_maxbytes test inside the loop? Or perhaps
> just modify the last_index calculations - screw the "s_maxbytes" part,
> it's really the last index that matters.

Actually, let's ignore s_maxbytes *entirely*, and just check against
i_size_read() before each page.

Yes,. we need to check it *after* getting the page too (like we
already do), because of races. But if somebody is reading past the end
of the file, why bother creating the empty page cache entry?

What do you think?

                Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-14 20:45   ` Linus Torvalds
  2016-12-14 21:14     ` Linus Torvalds
@ 2016-12-19 10:45     ` Christoph Hellwig
  2016-12-19 15:56       ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2016-12-19 10:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joseph Salisbury, fangwei1, Christoph Hellwig, Dave Chinner,
	Alexander Viro, stable, Andrew Morton

Linus,

can we add the testcase to xfstests?  For that we'll just need
some license on it (BSD, GPLv2, your choice)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range()
  2016-12-19 10:45     ` Christoph Hellwig
@ 2016-12-19 15:56       ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2016-12-19 15:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joseph Salisbury, fangwei1, Dave Chinner, Alexander Viro, stable,
	Andrew Morton

On Mon, Dec 19, 2016 at 2:45 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> can we add the testcase to xfstests?  For that we'll just need
> some license on it (BSD, GPLv2, your choice)

Sure. For stuff I don't care about, I use the BSD 3-clause thing. I'm
assuming you'll fix the stupid  thing to actually report the error
rather than having to strace it.

    Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2016-12-19 15:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-14 19:27 [v4.9 Regression] vfs,mm: fix a dead loop in truncate_inode_pages_range() Joseph Salisbury
2016-12-14 20:27 ` Linus Torvalds
2016-12-14 20:45   ` Linus Torvalds
2016-12-14 21:14     ` Linus Torvalds
2016-12-19 10:45     ` Christoph Hellwig
2016-12-19 15:56       ` Linus Torvalds
2016-12-14 21:15   ` Al Viro
2016-12-14 21:27     ` Linus Torvalds
2016-12-14 21:30       ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.