All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug 209243] New: fsx IO_URING reading get BAD DATA
@ 2020-09-12  5:46 bugzilla-daemon
  2020-09-12  5:50 ` [Bug 209243] " bugzilla-daemon
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-12  5:46 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

            Bug ID: 209243
           Summary: fsx IO_URING reading get BAD DATA
           Product: File System
           Version: 2.5
    Kernel Version: 5.9.0-rc4 with xfs-5.10-merge-1
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: XFS
          Assignee: filesystem_xfs@kernel-bugs.kernel.org
          Reporter: zlang@redhat.com
        Regression: No

Description of problem:

I've maken xfstests' fsstress and fsx support IO_URING read/write, refer to
below patchset:
https://marc.info/?l=fstests&m=159980858202447&w=2

But the fsx io_uring test on XFS failed:

# ./ltp/fsx -U -R -W -l $((128000*5)) -o 128000 /mnt/scratch/foo
truncating to largest ever: 0x8d1ef
copying to largest ever: 0x94e29
zero_range to largest ever: 0x9c400
READ BAD DATA: offset = 0xadf4, size = 0x1de1d, fname = /mnt/scratch/foo
OFFSET  GOOD    BAD     RANGE
0x17000 0x7e57  0x0000  0x00000
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17001 0x577e  0x0000  0x00001
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17002 0x7e81  0x0000  0x00002
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17003 0x817e  0x0000  0x00003
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17004 0x7ee5  0x0000  0x00004
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17005 0xe57e  0x0000  0x00005
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17006 0x7ec3  0x0000  0x00006
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17007 0xc37e  0x0000  0x00007
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17008 0x7e60  0x0000  0x00008
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x17009 0x607e  0x0000  0x00009
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700a 0x7ee4  0x0000  0x0000a
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700b 0xe47e  0x0000  0x0000b
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700c 0x7e16  0x0000  0x0000c
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700d 0x167e  0x0000  0x0000d
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700e 0x7ebb  0x0000  0x0000e
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
0x1700f 0xbb7e  0x0000  0x0000f
operation# (mod 256) for the bad data unknown, check HOLE and EXTEND ops
LOG DUMP (6784 total operations):
1(  1 mod 256): TRUNCATE UP     from 0x0 to 0x8d1ef     ******WWWW
2(  2 mod 256): CLONE 0x74000 thru 0x7ffff      (0xc000 bytes) to 0x11000 thru
0x1cfff  ******JJJJ
3(  3 mod 256): COPY 0x896df thru 0x8d1ee       (0x3b10 bytes) to 0x39fd3 thru
0x3dae2
4(  4 mod 256): READ     0x5c4d2 thru 0x72c7d   (0x167ac bytes)
5(  5 mod 256): COPY 0x4fad2 thru 0x676c1       (0x17bf0 bytes) to 0x213f7 thru
0x38fe6
...
...
6775(119 mod 256): WRITE    0x42e2b thru 0x546bf        (0x11895 bytes)
6776(120 mod 256): TRUNCATE DOWN        from 0x7ebed to 0x4a2e5
6777(121 mod 256): READ     0x1c41 thru 0x10ef9 (0xf2b9 bytes)
6778(122 mod 256): PUNCH    0x2ba67 thru 0x466da        (0x1ac74 bytes)
6779(123 mod 256): INSERT 0x3c000 thru 0x59fff  (0x1e000 bytes)
6780(124 mod 256): ZERO     0x2648e thru 0x3bdfc        (0x1596f bytes)
6781(125 mod 256): WRITE    0x2532d thru 0x3b5ed        (0x162c1 bytes)
6782(126 mod 256): WRITE    0xc049 thru 0x20ea9 (0x14e61 bytes) ***WWWW
6783(127 mod 256): FALLOC   0x2420 thru 0x19c4f (0x1782f bytes) INTERIOR
6784(128 mod 256): READ     0xadf4 thru 0x28c10 (0x1de1d bytes) ***RRRR***
Log of operations saved to "/mnt/scratch/foo.fsxops"; replay with --replay-ops
Correct content saved for comparison
(maybe hexdump "/mnt/scratch/foo" vs "/mnt/scratch/foo.fsxgood")

I'll upload foo foo.fsxgood, foo.fsxops and foo.fsxlog files later.

It's easily to reproduce on my system, by the arguments I used above. But same
test can't reproduce this error on ext4.

Version-Release number of selected component (if applicable):
Fedora with upstream xfs-linux for-next HEAD = commit
18a1031619dece8f46b7deb2da477c8134d0bf89 xfs-5.10-merge-1

How reproducible:
100%

Steps to Reproduce:
1. mkfs.xfs -f /dev/mapper/testvg-scratchdev
meta-data=/dev/mapper/testvg-scratchdev isize=512    agcount=4, agsize=1966080
blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=7864320, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=3840, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
2. git clone latest xfstests-dev, and merge the patchset
https://marc.info/?l=fstests&m=159980858202447&w=2
3. build xfstests
4. mount /dev/mapper/testvg-scratchdev /mnt/scratch
5. ./xfstests/ltp/fsx -U -R -W -l $((128000*5)) -o 128000 /mnt/scratch/foo

Additional info:
1) To avoid the same mistake of
https://bugzilla.kernel.org/show_bug.cgi?id=208827, I've tried to deal with the
short read/write of io_uring in fsx. 

2) General fsx test(fsx -S 0 -U /mnt/scratch/foo) doesn't fail, but only when I
specify the "-o" option of fsx, the test fails.

3) Can't reproduce same failure on ext4 by keep running same test on ext4 one
hour.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
@ 2020-09-12  5:50 ` bugzilla-daemon
  2020-09-13  7:59 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-12  5:50 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #1 from Zorro Lang (zlang@redhat.com) ---
Created attachment 292481
  --> https://bugzilla.kernel.org/attachment.cgi?id=292481&action=edit
fsx log files

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
  2020-09-12  5:50 ` [Bug 209243] " bugzilla-daemon
@ 2020-09-13  7:59 ` bugzilla-daemon
  2020-09-13 15:00 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-13  7:59 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #2 from Zorro Lang (zlang@redhat.com) ---
This failure can't be reproduced on older kernel, likes 5.8-rc4 (I have a
ready-made one, so just tested on it). It looks like a regression, need more
test.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
  2020-09-12  5:50 ` [Bug 209243] " bugzilla-daemon
  2020-09-13  7:59 ` bugzilla-daemon
@ 2020-09-13 15:00 ` bugzilla-daemon
  2020-09-13 18:05 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-13 15:00 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #3 from Zorro Lang (zlang@redhat.com) ---
FYI, looks like LVM is needed, at least I haven't reproduced it on general disk
partition or loop device.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (2 preceding siblings ...)
  2020-09-13 15:00 ` bugzilla-daemon
@ 2020-09-13 18:05 ` bugzilla-daemon
  2020-09-14  7:07 ` [Bug 209243] [regression] " bugzilla-daemon
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-13 18:05 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #4 from Zorro Lang (zlang@redhat.com) ---
FYI, This mainline linux v5.8 can't reproduce this bug, but v5.9-rc1 can... The
v5.9-rc1 contains lots of changes of xfs and io_uring, I'm still not sure if
it's a xfs issue or io_uring issue for now.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (3 preceding siblings ...)
  2020-09-13 18:05 ` bugzilla-daemon
@ 2020-09-14  7:07 ` bugzilla-daemon
  2020-09-14 13:39 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14  7:07 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

Zorro Lang (zlang@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |axboe@kernel.dk
            Summary|fsx IO_URING reading get    |[regression] fsx IO_URING
                   |BAD DATA                    |reading get BAD DATA

--- Comment #5 from Zorro Lang (zlang@redhat.com) ---
Finally, I find the first commit which can reproduce this failure:

commit c3cf992c25c7ff04bfc4dec5c916705a5332320e
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri May 22 09:24:42 2020 -0600

    io_uring: support true async buffered reads, if file provides it

    If the file is flagged with FMODE_BUF_RASYNC, then we don't have to punt
    the buffered read to an io-wq worker. Instead we can rely on page
    unlocking callbacks to support retry based async IO. This is a lot more
    efficient than doing async thread offload.

    The retry is done similarly to how we handle poll based retry. From
    the unlock callback, we simply queue the retry to a task_work based
    handler.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

So CC the author of this patch to get more review.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (4 preceding siblings ...)
  2020-09-14  7:07 ` [Bug 209243] [regression] " bugzilla-daemon
@ 2020-09-14 13:39 ` bugzilla-daemon
  2020-09-14 15:04 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14 13:39 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #6 from Jens Axboe (axboe@kernel.dk) ---
I'll take a look. Do you have a repo of fsstress with your io_uring changes
included?

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (5 preceding siblings ...)
  2020-09-14 13:39 ` bugzilla-daemon
@ 2020-09-14 15:04 ` bugzilla-daemon
  2020-09-14 15:27 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14 15:04 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #7 from Zorro Lang (zlang@redhat.com) ---
(In reply to Jens Axboe from comment #6)
> I'll take a look. Do you have a repo of fsstress with your io_uring changes
> included?

Thanks for your reply. Sorry, I don't prepare a public xfstests git repo. But
you can:

1) git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
2) merget below 5 patches into above xfstests:
https://patchwork.kernel.org/patch/11769857/
https://patchwork.kernel.org/patch/11769849/
https://patchwork.kernel.org/patch/11769851/
https://patchwork.kernel.org/patch/11769853/
https://patchwork.kernel.org/patch/11769855/

3) build xfstests with liburing install
4) make a XFS filesystem on a LVM device
5) mount the xfs filesystem
6) run xfstests/ltp/fsx -U -R -W -o 128000 $mnt/foo

Thanks,
Zorro

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (6 preceding siblings ...)
  2020-09-14 15:04 ` bugzilla-daemon
@ 2020-09-14 15:27 ` bugzilla-daemon
  2020-09-14 18:17 ` bugzilla-daemon
  2020-09-14 19:49 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14 15:27 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #8 from Jens Axboe (axboe@kernel.dk) ---
No problem, that's what I ended up doing. Can you try the patch I replied on
the list with?

https://lore.kernel.org/io-uring/9d3c38bc-302c-5eb6-c772-7072a75eaf74@kernel.dk/T/#m669bfdc265c53e165bc2bfd4c3484243fa0ac33d

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (7 preceding siblings ...)
  2020-09-14 15:27 ` bugzilla-daemon
@ 2020-09-14 18:17 ` bugzilla-daemon
  2020-09-14 19:49 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14 18:17 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #9 from Zorro Lang (zlang@redhat.com) ---
(In reply to Jens Axboe from comment #8)
> No problem, that's what I ended up doing. Can you try the patch I replied on
> the list with?
> 
> https://lore.kernel.org/io-uring/9d3c38bc-302c-5eb6-c772-7072a75eaf74@kernel.
> dk/T/#m669bfdc265c53e165bc2bfd4c3484243fa0ac33d

By a simple test, I can't reproduce this bug after merge above patch.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Bug 209243] [regression] fsx IO_URING reading get BAD DATA
  2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
                   ` (8 preceding siblings ...)
  2020-09-14 18:17 ` bugzilla-daemon
@ 2020-09-14 19:49 ` bugzilla-daemon
  9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2020-09-14 19:49 UTC (permalink / raw)
  To: linux-xfs

https://bugzilla.kernel.org/show_bug.cgi?id=209243

--- Comment #10 from Jens Axboe (axboe@kernel.dk) ---
Great, thanks for testing! I'll add your reported-by and tested-by.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-09-14 19:49 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-12  5:46 [Bug 209243] New: fsx IO_URING reading get BAD DATA bugzilla-daemon
2020-09-12  5:50 ` [Bug 209243] " bugzilla-daemon
2020-09-13  7:59 ` bugzilla-daemon
2020-09-13 15:00 ` bugzilla-daemon
2020-09-13 18:05 ` bugzilla-daemon
2020-09-14  7:07 ` [Bug 209243] [regression] " bugzilla-daemon
2020-09-14 13:39 ` bugzilla-daemon
2020-09-14 15:04 ` bugzilla-daemon
2020-09-14 15:27 ` bugzilla-daemon
2020-09-14 18:17 ` bugzilla-daemon
2020-09-14 19:49 ` bugzilla-daemon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.