From: bugzilla-daemon@bugzilla.kernel.org
To: linux-xfs@vger.kernel.org
Subject: [Bug 208827] [fio io_uring] io_uring write data crc32c verify failed
Date: Fri, 07 Aug 2020 03:12:03 +0000	[thread overview]
Message-ID: <bug-208827-201763-ubSctIQBY4@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-208827-201763@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=208827

--- Comment #1 from Dave Chinner (david@fromorbit.com) ---
On Thu, Aug 06, 2020 at 04:57:58AM +0000, bugzilla-daemon@bugzilla.kernel.org
wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=208827
> 
>             Bug ID: 208827
>            Summary: [fio io_uring] io_uring write data crc32c verify
>                     failed
>            Product: File System
>            Version: 2.5
>     Kernel Version: xfs-linux xfs-5.9-merge-7 + v5.8-rc4
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: XFS
>           Assignee: filesystem_xfs@kernel-bugs.kernel.org
>           Reporter: zlang@redhat.com
>         Regression: No
> 
> Description of problem:
> Our fio io_uring test failed as below:
> 
> # fio io_uring.fio
> uring_w: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB,
> (T)
> 64.0KiB-64.0KiB, ioengine=io_uring, iodepth=16
> uring_sqt_w: (g=0): rw=randwrite, bs=(R) 64.0KiB-64.0KiB, (W)
> 64.0KiB-64.0KiB,
> (T) 64.0KiB-64.0KiB, ioengine=io_uring, iodepth=16
> uring_rw: (g=0): rw=randrw, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB, (T)
> 64.0KiB-64.0KiB, ioengine=io_uring, iodepth=16
> uring_sqt_rw: (g=0): rw=randrw, bs=(R) 64.0KiB-64.0KiB, (W) 64.0KiB-64.0KiB,
> (T) 64.0KiB-64.0KiB, ioengine=io_uring, iodepth=16
> fio-3.21-39-g87622
> Starting 4 threads
> uring_w: Laying out IO file (1 file / 256MiB)
> uring_sqt_w: Laying out IO file (1 file / 256MiB)
> uring_rw: Laying out IO file (1 file / 256MiB)
> uring_sqt_rw: Laying out IO file (1 file / 256MiB)
> crc32c: verify failed at file /mnt/fio/uring_rw.0.0 offset 265289728, length
> 65536 (requested block: offset=265289728, length=65536)
>        Expected CRC: e8f1ef35
>        Received CRC: 9dd0deae
> fio: pid=46530, err=84/file:io_u.c:2108, func=io_u_queued_complete,

This looks like it's either a short read or it's page cache
corruption. I've confirmed that the data on disk is correct when the
validation fails, but the data in the page cache is not correct.

That is, when the fio verification fails, the second 32kB of the
64kB data block that is returned does not match the expected data.
Using the options:

verify_fatal=1
verify_dump=1

and getting rid of the "unlink=1" option from the config file
confirms that reading the data using xfs_io -c "pread -v <off> 64k"
returns the bad data.
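
For example, with the offset and file name taken from the failure
above, the check is simply:

  # re-read the failed 64k block through the page cache
  xfs_io -c "pread -v 265289728 64k" /mnt/fio/uring_rw.0.0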

Unmounting the filesystem and mounting it again (or using direct IO
to bypass the page cache), and repeating the xfs_io read returns
64kB of data identical to the expected data dump except for 16 bytes
in the block header that have some minor differences. I'm not sure
whether this is expected or not, but we can ignore it to begin with because
it is clear that there's exactly 8 pages of bad data in the page
cache in this range.
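
The comparison is essentially the following, with -d forcing an
O_DIRECT read that bypasses the page cache (same offset and file as
above):

  # buffered read - served from the (bad) page cache
  xfs_io -c "pread -v 265289728 64k" /mnt/fio/uring_rw.0.0

  # direct IO read - goes to disk and returns the good data
  xfs_io -d -c "pread -v 265289728 64k" /mnt/fio/uring_rw.0.0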

So, add:

verify=pattern
verify_pattern=%o

to have the buffer stamped with its file offset rather than random
data. It turns out that the bad half of the buffer has an incorrect
file offset stamped in it, while the offsets across the entire range
on disk are correct.
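
Putting the debug knobs together, the verify-related part of the job
file ends up looking something like this (everything else unchanged
from the original job):

  # stamp each block with its file offset instead of random data
  verify=pattern
  verify_pattern=%o
  # stop on the first failure and dump the expected/received buffers
  verify_fatal=1
  verify_dump=1
  # unlink=1 removed so the files survive for post-mortem with xfs_io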

Ok, so now I have confirmed that the data is valid on disk, but
incorrect in cache. That means the buffered write did contain
correct data in the cache, and that it was written to disk
correctly. So some time between the writeback completing and the
data being read, we've ended up with stale data in the page
cache....

This corruption only appears to happen with io_uring based buffered
IO - syscall based buffered IO and buffered IO with AIO don't
trigger it at all. Hence I suspect there is a bug somewhere in the
io_uring code, or in a code path that only the io_uring code path
tickles.
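
For reference, the comparison is just the same job re-run with the
engine switched - psync and libaio here are simply stand-ins for the
syscall and AIO buffered submission paths:

  # buffered IO in all cases (direct=0 is the fio default); only the
  # submission path changes. Corruption only reproduces with:
  ioengine=io_uring
  # no corruption seen with either of:
  # ioengine=psync
  # ioengine=libaio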

I can't really make head or tail of the io_uring code and there's no
obvious way to debug exactly what the user application is asking the
filesystem to do or what the kernel is returning to the filesystem
(e.g. strace doesn't work). Hence I suspect that this needs the
io_uring folk to look at it and isolate it down to the operation
that is corrupting the page cache. I'd love to know how we
can triage issues like this in the field, given that the tools we
normally use to triage and debug data corruption issues seem to be
largely useless...

Cheers,

Dave.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Thread overview: 49+ messages
2020-08-06  4:57 [Bug 208827] New: [fio io_uring] io_uring write data crc32c verify failed bugzilla-daemon
2020-08-07  2:42 ` Dave Chinner
2020-08-07  3:12 ` bugzilla-daemon [this message]
2020-08-10  0:09   ` [Bug 208827] " Dave Chinner
2020-08-10  3:56     ` Dave Chinner
2020-08-10  7:08       ` Dave Chinner
2020-08-10  9:08         ` Dave Chinner
2020-08-11  1:15           ` Jens Axboe
2020-08-11  1:50             ` Jens Axboe
2020-08-11  2:01               ` Jens Axboe
2020-08-11  3:01                 ` Jens Axboe
2020-08-11 20:56                 ` Jeff Moyer
2020-08-11 22:09                   ` Dave Chinner
2020-08-12 15:13                     ` Jens Axboe
2020-08-12 15:24                       ` Jeff Moyer
2020-08-12 15:26                         ` Jens Axboe
2020-08-11  2:00           ` Dave Chinner
2020-08-11  2:19             ` Jens Axboe
2020-08-11  5:53               ` Dave Chinner
2020-08-11  7:05               ` Dave Chinner
2020-08-11 13:10                 ` Jens Axboe
2020-08-11 21:59                   ` Dave Chinner
2020-08-11 23:00                     ` Dave Chinner
2020-08-12 15:19                       ` Jens Axboe
2020-08-11  1:07         ` Jens Axboe
2020-08-10  0:09 ` bugzilla-daemon
2020-08-10  3:56 ` bugzilla-daemon
2020-08-10  7:08 ` bugzilla-daemon
2020-08-10  9:09 ` bugzilla-daemon
2020-08-11  1:07 ` bugzilla-daemon
2020-08-11  1:15 ` bugzilla-daemon
2020-08-11  1:50 ` bugzilla-daemon
2020-08-11  2:00 ` bugzilla-daemon
2020-08-11  2:01 ` bugzilla-daemon
2020-08-11  2:20 ` bugzilla-daemon
2020-08-11  3:01 ` bugzilla-daemon
2020-08-11  5:53 ` bugzilla-daemon
2020-08-11  7:05 ` bugzilla-daemon
2020-08-11 13:10 ` bugzilla-daemon
2020-08-11 16:16 ` bugzilla-daemon
2020-08-11 20:56 ` bugzilla-daemon
2020-08-11 21:59 ` bugzilla-daemon
2020-08-11 22:09 ` bugzilla-daemon
2020-08-11 23:00 ` bugzilla-daemon
2020-08-12  3:15 ` bugzilla-daemon
2020-08-12 15:14 ` bugzilla-daemon
2020-08-12 15:19 ` bugzilla-daemon
2020-08-12 15:24 ` bugzilla-daemon
2020-08-12 15:26 ` bugzilla-daemon
