* fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF
From: Brian Foster @ 2013-06-10 13:17 UTC
  To: xfs

Hi guys,

I wanted to get this onto the list... I suspect this could be
similar/related to the issue reported here:

http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

While running xfstests, the only apparent regression I hit from 3.9.0
was generic/263. This test fails due to the following command (and
resulting output):

# fsx -N 10000 -o 128000 -l 500000 -r 4096 -t 512 -w 512 -Z /mnt/junk
truncating to largest ever: 0x12a00
truncating to largest ever: 0x75400
fallocating to largest ever: 0x7a120
Mapped Read: non-zero data past EOF (0x79dff) page offset 0xe00 is 0xe927
	offset = 0x78000, size = 0x1220
LOG DUMP (7966 total operations):
...
7959( 23 mod 256): TRUNCATE DOWN	from 0x2d200 to 0x1e200
7960( 24 mod 256): MAPWRITE 0x54800 thru 0x655e7	(0x10de8 bytes)
7961( 25 mod 256): FALLOC   0x448b4 thru 0x5835d	(0x13aa9 bytes) INTERIOR
7962( 26 mod 256): WRITE    0x8200 thru 0xb7ff	(0x3600 bytes)
7963( 27 mod 256): READ     0x61000 thru 0x64fff	(0x4000 bytes)
7964( 28 mod 256): MAPREAD  0x6000 thru 0xe5fe	(0x85ff bytes)
7965( 29 mod 256): WRITE    0x6ca00 thru 0x79dff	(0xd400 bytes) HOLE
7966( 30 mod 256): MAPREAD  0x78000 thru 0x7921f	(0x1220 bytes)
Correct content saved for comparison
(maybe hexdump "/mnt/junk" vs "/mnt/junk.fsxgood")

So if I'm following that correctly, we truncate the file down to
0x1e200, extend it with an mmap write out to 0x655e7, do a couple of
internal reads/writes, extend again to 0x79dff with a direct write,
and then hit stale data on an mmap read at EOF.
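
FWIW, the failing sequence distills down to something like the
following standalone sketch (hypothetical and untested, not the fsx
code itself -- the offsets are from the log above, the interior ops
that don't move EOF are omitted, and it assumes a 4k page size and an
XFS mount at /mnt):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	off_t trunc_to = 0x1e200;	/* 7959: TRUNCATE DOWN target */
	off_t mw_end = 0x655e8;		/* 7960: end of MAPWRITE range */
	off_t eof = 0x79e00;		/* 7965: EOF after the direct WRITE */
	size_t wlen = 0xd400;		/* 7965: direct write length */
	char *buf, *p;
	off_t i;
	int fd;

	/* error handling mostly elided for brevity */
	fd = open("/mnt/junk", O_CREAT | O_RDWR | O_DIRECT, 0644);
	if (fd < 0 || posix_memalign((void **)&buf, 512, wlen))
		exit(1);

	ftruncate(fd, trunc_to);	/* 7959: TRUNCATE DOWN */

	/* 7960: MAPWRITE -- grow the file, then dirty it through mmap */
	ftruncate(fd, mw_end);
	p = mmap(NULL, mw_end, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	memset(p + 0x54800, 0xaa, mw_end - 0x54800);
	munmap(p, mw_end);

	/* 7965: direct WRITE extending the file across a hole */
	memset(buf, 0xbb, wlen);
	pwrite(fd, buf, wlen, eof - wlen);

	/* 7966: MAPREAD of the EOF page -- 0x79e00..0x79fff must be zero */
	p = mmap(NULL, eof, PROT_READ, MAP_SHARED, fd, 0);
	for (i = eof; i & 0xfff; i++)
		if (p[i])
			printf("non-zero data past EOF at 0x%llx: 0x%x\n",
			       (long long)i, p[i] & 0xff);
	munmap(p, eof);
	close(fd);
	return 0;
}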

Post-mortem on the file:

# stat /mnt/junk
  File: `/mnt/junk'
  Size: 499200    	Blocks: 704        IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 131         Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-05 11:04:04.968000000 -0400
Modify: 2013-06-05 11:04:04.967000000 -0400
Change: 2013-06-05 11:04:04.967000000 -0400
 Birth: -

# xfs_bmap -v /mnt/junk
junk:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
   0: [0..31]:         96..127           0 (96..127)           32
   1: [32..127]:       256..351          0 (256..351)          96
   2: [128..223]:      896..991          0 (896..991)          96
   3: [224..543]:      hole                                   320
   4: [544..1023]:     1312..1791        0 (1312..1791)       480 10000

(Note the 10000 flag on that last extent -- per xfs_bmap(8), that
marks it as an unwritten preallocation.)

I ran a bisect between tot and 3.9 and narrowed it down to:

e114b5fc xfs: increase prealloc size to double that of the previous extent

... though IIRC, this was an additive fix on top of the recent sparse
speculative prealloc updates, so it might not be much more than a data
point:

a1e16c26 xfs: limit speculative prealloc size on sparse files

This is interesting from a release perspective in that the latter
change is included in 3.9 while the former fix is not. I therefore
went back to 3.8 and found I can reproduce the failure there as well,
under similar circumstances (with speculative prealloc, or with
allocsize >= 131072). FWIW, I can also reproduce it on tot with
allocsize=131072, though there it appears to be intermittent. Given
all that, my speculation at this point is that the recent prealloc
changes probably don't introduce the core issue, but rather alter the
behavior just enough to determine whether this test case triggers it.

Brian

P.S., I also came across the following thread which, if related,
suggests this might be known/understood to a degree:

http://oss.sgi.com/archives/xfs/2012-04/msg00703.html


* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF
From: Dave Chinner @ 2013-06-10 21:31 UTC
  To: Brian Foster; +Cc: xfs

On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
> Hi guys,
> 
> I wanted to get this onto the list... I suspect this could be
> similar/related to the issue reported here:
> 
> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html

Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's
problem has neither...

> While running xfstests, the only apparent regression I hit from 3.9.0
> was generic/263. This test fails due to the following command (and
> resulting output):

Not a regression - 263 has been failing ever since it was introduced
in 2011 by:

commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
Author: Christoph Hellwig <hch@infradead.org>
Date:   Mon Oct 10 18:22:16 2011 +0000

    split mapped writes vs direct I/O tests from 091
    
    This effectively reverts
    
        xfstests: add mapped write fsx operations to 091
    
    and adds a new test case for it.  It tests something slightly
    different, and regressions in existing tests due to new features
    are pretty nasty in a test suite.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Dave Chinner <dchinner@redhat.com>
    Signed-off-by: Alex Elder <aelder@sgi.com>

It is testing mmap() writes vs direct IO, something that is known to
be fundamentally broken (i.e. racy), as the mmap() page fault path
does not hold the XFS_IOLOCK or i_mutex in any way.  The direct IO
path tries to work around this by flushing and invalidating cached
pages before IO submission, but the lack of locking in the page fault
path means we can't avoid the race entirely.
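
To make the window concrete: nothing prevents a pattern like the
untested sketch below (hypothetical -- the file, sizes and alignment
are arbitrary) from faulting pages back in between the DIO
flush/invalidate and the IO itself:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define SZ	(1 << 20)

static int fd;

static void *dio_writer(void *unused)
{
	char *buf;

	if (posix_memalign((void **)&buf, 4096, SZ))
		return NULL;
	memset(buf, 0xab, SZ);
	for (;;)
		/* DIO path: flushes + invalidates cached pages, then
		 * submits the IO - but nothing pins that state */
		pwrite(fd, buf, SZ, 0);
}

static void *mmap_reader(void *unused)
{
	volatile char *p = mmap(NULL, SZ, PROT_READ, MAP_SHARED, fd, 0);
	size_t i;

	for (;;) {
		/* fault path: re-instantiates page cache pages at any
		 * time, holding neither XFS_IOLOCK nor i_mutex */
		for (i = 0; i < SZ; i += 4096)
			(void)p[i];
		madvise((void *)p, SZ, MADV_DONTNEED);	/* refault next pass */
	}
}

int main(void)
{
	pthread_t a, b;

	fd = open("/mnt/junk", O_CREAT | O_RDWR | O_DIRECT, 0644);
	ftruncate(fd, SZ);
	pthread_create(&a, NULL, dio_writer, NULL);
	pthread_create(&b, NULL, mmap_reader, NULL);
	pthread_join(a, NULL);		/* runs until interrupted */
	return 0;
}

Build with -pthread; whether the reader ever observes inconsistent
data is purely timing dependent, which is exactly the problem.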

> P.S., I also came across the following thread which, if related,
> suggests this might be known/understood to a degree:
> 
> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html

Yup, that's potentially one aspect of it. However, have you run the
test code on ext3/4? It works just fine there - it's only XFS that
has problems with this case, so it's not clear that this is a DIO
problem. I was never able to work out where ext3/ext4 were zeroing
the part of the page beyond EOF, and I couldn't ever make the DIO
code reliably do the right thing. It's one of the reasons that led
to this discussion at LSFMM:

http://lwn.net/Articles/548351/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF
From: Brian Foster @ 2013-06-10 23:17 UTC
  To: Dave Chinner; +Cc: xfs

On 06/10/2013 05:31 PM, Dave Chinner wrote:
> On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
>> Hi guys,
>>
>> I wanted to get this onto the list... I suspect this could be
>> similar/related to the issue reported here:
>>
>> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
> 
> Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's
> problem has neither...
> 

Oh, OK. I didn't look at that one closely enough then.

>> While running xfstests, the only apparent regression I hit from 3.9.0
>> was generic/263. This test fails due to the following command (and
>> resulting output):
> 
> Not a regression - 263 has been failing ever since it was introduced
> in 2011 by:
> 
> commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
...
> 
> It is testing mmap() writes vs direct IO, something that is known to
> be fundamentally broken (i.e. racy), as the mmap() page fault path
> does not hold the XFS_IOLOCK or i_mutex in any way.  The direct IO
> path tries to work around this by flushing and invalidating cached
> pages before IO submission, but the lack of locking in the page fault
> path means we can't avoid the race entirely.
> 

Thanks for the explanation.

>> P.S., I also came across the following thread which, if related,
>> suggests this might be known/understood to a degree:
>>
>> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html
> 
> Yup, that's potentially one aspect of it. However, have you run the
> test code on ext3/4? It works just fine there - it's only XFS that
> has problems with this case, so it's not clear that this is a DIO
> problem. I was never able to work out where ext3/ext4 were zeroing
> the part of the page beyond EOF, and I couldn't ever make the DIO
> code reliably do the right thing. It's one of the reasons that led
> to this discussion at LSFMM:
> 
> http://lwn.net/Articles/548351/
> 

Interesting, thanks again. I did happen to run the script and the fsx
test on the ext4 rootfs of my VM and observed the expected behavior.

Note that, as I mentioned, this was harder to reproduce with fixed
allocation sizes below 128k or so. I don't believe ext4 does any kind
of speculative preallocation in the manner that XFS does. Perhaps that
is a factor?

Brian

> Cheers,
> 
> Dave.
> 


* Re: fsx failure on 3.10.0-rc1+ (xfstests 263) -- Mapped Read: non-zero data past EOF
From: Dave Chinner @ 2013-06-10 23:42 UTC
  To: Brian Foster; +Cc: xfs

On Mon, Jun 10, 2013 at 07:17:22PM -0400, Brian Foster wrote:
> On 06/10/2013 05:31 PM, Dave Chinner wrote:
> > On Mon, Jun 10, 2013 at 09:17:31AM -0400, Brian Foster wrote:
> >> Hi guys,
> >>
> >> I wanted to get this onto the list... I suspect this could be
> >> similar/related to the issue reported here:
> >>
> >> http://oss.sgi.com/archives/xfs/2013-06/msg00066.html
> > 
> > Unlikely - generic/263 tests mmap IO vs direct IO, and Sage's
> > problem has neither...
> > 
> 
> Oh, OK. I didn't look at that one closely enough then.
> 
> >> While running xfstests, the only apparent regression I hit from 3.9.0
> >> was generic/263. This test fails due to the following command (and
> >> resulting output):
> > 
> > Not a regression - 263 has been failing ever since it was introduced
> > in 2011 by:
> > 
> > commit 0d69e10ed15b01397e8c6fd7833fa3c2970ec024
> ...
> > 
> > It is testing mmap() writes vs direct IO, something that is known to
> > be fundamentally broken (i.e. racy), as the mmap() page fault path
> > does not hold the XFS_IOLOCK or i_mutex in any way.  The direct IO
> > path tries to work around this by flushing and invalidating cached
> > pages before IO submission, but the lack of locking in the page fault
> > path means we can't avoid the race entirely.
> > 
> 
> Thanks for the explanation.
> 
> >> P.S., I also came across the following thread which, if related,
> >> suggests this might be known/understood to a degree:
> >>
> >> http://oss.sgi.com/archives/xfs/2012-04/msg00703.html
> > 
> > Yup, that's potentially one aspect of it. However, have you run the
> > test code on ext3/4? It works just fine there - it's only XFS that
> > has problems with this case, so it's not clear that this is a DIO
> > problem. I was never able to work out where ext3/ext4 were zeroing
> > the part of the page beyond EOF, and I couldn't ever make the DIO
> > code reliably do the right thing. It's one of the reasons that led
> > to this discussion at LSFMM:
> > 
> > http://lwn.net/Articles/548351/
> > 
> 
> Interesting, thanks again. I did happen to run the script and the fsx
> test on the ext4 rootfs of my VM and observed the expected behavior.
> 
> Note that, as I mentioned, this was harder to reproduce with fixed
> allocation sizes below 128k or so. I don't believe ext4 does any kind
> of speculative preallocation in the manner that XFS does. Perhaps that
> is a factor?

Oh, it most likely is, but XFS has done speculative prealloc since,
well, forever, so this isn't a regression as such.  FWIW, the old
default for speculative prealloc was XFS_WRITEIO_LOG_LARGE (16
filesystem blocks), so this test would have failed before any of the
dynamic speculative alloc changes were made....

Indeed, if you mount with -o allocsize=4k, you'll find the test case
no longer fails - it requires allocsize=32k (or larger) to fail here.
That's not surprising, given that the test is writing across a
16k-beyond-EOF boundary when it triggers the problem, and so needs a
prealloc size of more than 16k to trigger it...
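
If you want to see how far the speculative prealloc reaches past EOF
for a given allocsize, a quick FIEMAP check along these lines should
show it (an untested sketch -- FIEMAP's reporting of not-yet-allocated
delalloc extents can vary, so treat xfs_bmap as the authoritative
view):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	struct stat st;
	struct fiemap *fm;
	unsigned int i;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;

	/* room for up to 32 extents in a single call */
	fm = calloc(1, sizeof(*fm) + 32 * sizeof(struct fiemap_extent));
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_extent_count = 32;
	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("fiemap");
		return 1;
	}

	/* report any extent that runs beyond i_size */
	for (i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];
		unsigned long long end = fe->fe_logical + fe->fe_length;

		if (end > (unsigned long long)st.st_size)
			printf("extent at %llu runs %llu bytes past EOF%s\n",
			       (unsigned long long)fe->fe_logical,
			       end - (unsigned long long)st.st_size,
			       (fe->fe_flags & FIEMAP_EXTENT_UNWRITTEN) ?
					" (unwritten)" : "");
	}
	return 0;
}

Run it against the fsx file under different allocsize mounts and the
post-EOF allocation should track the mount option.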

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

