All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] io_uring: fix short read slow path corruptions
@ 2022-06-29  4:49 Dominique Martinet
  2022-06-29  5:23 ` Dominique Martinet
  2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
  0 siblings, 2 replies; 17+ messages in thread
From: Dominique Martinet @ 2022-06-29  4:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Dominique Martinet

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but apparently btrfs with O_DIRECT (cache=none) does.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---

I just spent a couple of days on this bug, will follow up with kernel to
see if we can also not get rid of the short read but perhaps a warning
should be added the first time we get a short read, as it's not supposed
to happen?
Well, slow path now seems to work (at least my VM now boots fine), but
if the code clearly states it should never be used I assume there might
be other bugs laying there as it's not tested... That this one was easy
enough to spot once I noticed the short reads was its only grace...

Thanks!

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..d58aff9615ce 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH] io_uring: fix short read slow path corruptions
  2022-06-29  4:49 [PATCH] io_uring: fix short read slow path corruptions Dominique Martinet
@ 2022-06-29  5:23 ` Dominique Martinet
  2022-06-29  8:46   ` Kevin Wolf
  2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
  1 sibling, 1 reply; 17+ messages in thread
From: Dominique Martinet @ 2022-06-29  5:23 UTC (permalink / raw)
  To: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi, Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, qemu-block, qemu-devel, Dominique Martinet

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but apparently btrfs with O_DIRECT (cache=none) does.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
Forgive the double mail if it gets to you twice: I missed Ccs on the first
try, I should have known better...

I just spent a couple of days on this bug, will follow up with kernel to
see if we can also not get rid of the short read but perhaps a warning
should be added the first time we get a short read, as it's not supposed
to happen?
Well, slow path now seems to work (at least my VM now boots fine), but
if the code clearly states it should never be used I assume there might
be other bugs laying there as it's not tested... That this one was easy
enough to spot once I noticed the short reads was its only grace...

Thanks!

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..d58aff9615ce 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] io_uring: fix short read slow path corruptions
  2022-06-29  5:23 ` Dominique Martinet
@ 2022-06-29  8:46   ` Kevin Wolf
  2022-06-29 10:22     ` Dominique Martinet
  0 siblings, 1 reply; 17+ messages in thread
From: Kevin Wolf @ 2022-06-29  8:46 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi,
	Stefano Garzarella, Hanna Reitz, qemu-block, qemu-devel

Am 29.06.2022 um 07:23 hat Dominique Martinet geschrieben:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
> 
> Normally recent versions of linux will not issue short reads,
> but apparently btrfs with O_DIRECT (cache=none) does.
> 
> This lead to weird image corruptions when short read happened
> 
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> Forgive the double mail if it gets to you twice: I missed Ccs on the first
> try, I should have known better...
> 
> I just spent a couple of days on this bug, will follow up with kernel to
> see if we can also not get rid of the short read but perhaps a warning
> should be added the first time we get a short read, as it's not supposed
> to happen?
> Well, slow path now seems to work (at least my VM now boots fine), but
> if the code clearly states it should never be used I assume there might
> be other bugs laying there as it's not tested... That this one was easy
> enough to spot once I noticed the short reads was its only grace...
> 
> Thanks!
> 
>  block/io_uring.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index d48e472e74cb..d58aff9615ce 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
>                        remaining);
>  
>      /* Update sqe */
> -    luringcb->sqeq.off = nread;
> +    luringcb->sqeq.off += nread;
>      luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
>      luringcb->sqeq.len = luringcb->resubmit_qiov.niov;

I see this a few lines above:

    /* Update read position */
    luringcb->total_read = nread;

Doesn't it have the same problem? Though maybe getting two short reads
is more of a theoretical case.

Kevin



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] io_uring: fix short read slow path corruptions
  2022-06-29  8:46   ` Kevin Wolf
@ 2022-06-29 10:22     ` Dominique Martinet
  0 siblings, 0 replies; 17+ messages in thread
From: Dominique Martinet @ 2022-06-29 10:22 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi,
	Stefano Garzarella, Hanna Reitz, qemu-block, qemu-devel

Kevin Wolf wrote on Wed, Jun 29, 2022 at 10:46:08AM +0200:
> I see this a few lines above:
> 
>     /* Update read position */
>     luringcb->total_read = nread;
> 
> Doesn't it have the same problem? Though maybe getting two short reads
> is more of a theoretical case.

Good catch, I'll send a v2 with that one adjusted as well tomorrow -- I
had logs and can confirm I didn't get two short reads in a row...

I wonder if there'd be a way to test? E.g. act like we've requested n
bytes but actually only fill in a bit less in the iov?

-- 
Dominique


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2] io_uring: fix short read slow path
  2022-06-29  4:49 [PATCH] io_uring: fix short read slow path corruptions Dominique Martinet
  2022-06-29  5:23 ` Dominique Martinet
@ 2022-06-30  1:01 ` Dominique Martinet
  2022-06-30 15:43   ` Hanna Reitz
                     ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Dominique Martinet @ 2022-06-30  1:01 UTC (permalink / raw)
  To: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi, Stefano Garzarella
  Cc: Kevin Wolf, Hanna Reitz, qemu-block, qemu-devel, Filipe Manana,
	Dominique Martinet

sqeq.off here is the offset to read within the disk image, so obviously
not 'nread' (the amount we just read), but as the author meant to write
its current value incremented by the amount we just read.

Normally recent versions of linux will not issue short reads,
but it can happen so we should fix this.

This lead to weird image corruptions when short read happened

Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
---
v1 -> v2: also updated total_read to use += as suggested by Kevin,
thank you!

I've tested this quickly by making short reads "recursives", e.g. added
the following to luring_resubmit_short_read() after setting 'remaining':
    if (remaining > 4096) remaining -= 4096;

so when we ask for more we issue an extra short reads, making sure we go
through the two short reads path.
(Unfortunately I wasn't quite sure what to fiddle with to issue short
reads in the first place, I tried cutting one of the iovs short in
luring_do_submit() but I must not have been doing it properly as I ended
up with 0 return values which are handled by filling in with 0 (reads
after eof) and that didn't work well)

Anyway, this looks OK to me now.

Thanks,
Dominique

 block/io_uring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..b238661740f5 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -89,7 +89,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
     trace_luring_resubmit_short_read(s, luringcb, nread);
 
     /* Update read position */
-    luringcb->total_read = nread;
+    luringcb->total_read += nread;
     remaining = luringcb->qiov->size - luringcb->total_read;
 
     /* Shorten qiov */
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-- 
2.35.1



^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
@ 2022-06-30 15:43   ` Hanna Reitz
  2022-06-30 15:49   ` Stefano Garzarella
  2022-07-05 13:34   ` Stefan Hajnoczi
  2 siblings, 0 replies; 17+ messages in thread
From: Hanna Reitz @ 2022-06-30 15:43 UTC (permalink / raw)
  To: Dominique Martinet, Aarushi Mehta, Julia Suvorova,
	Stefan Hajnoczi, Stefano Garzarella
  Cc: Kevin Wolf, qemu-block, qemu-devel, Filipe Manana

On 30.06.22 03:01, Dominique Martinet wrote:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
>
> Normally recent versions of linux will not issue short reads,
> but it can happen so we should fix this.
>
> This lead to weird image corruptions when short read happened
>
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> v1 -> v2: also updated total_read to use += as suggested by Kevin,
> thank you!
>
> I've tested this quickly by making short reads "recursives", e.g. added
> the following to luring_resubmit_short_read() after setting 'remaining':
>      if (remaining > 4096) remaining -= 4096;
>
> so when we ask for more we issue an extra short reads, making sure we go
> through the two short reads path.
> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> reads in the first place, I tried cutting one of the iovs short in
> luring_do_submit() but I must not have been doing it properly as I ended
> up with 0 return values which are handled by filling in with 0 (reads
> after eof) and that didn't work well)
>
> Anyway, this looks OK to me now.
>
> Thanks,
> Dominique
>
>   block/io_uring.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Hanna Reitz <hreitz@redhat.com>



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
  2022-06-30 15:43   ` Hanna Reitz
@ 2022-06-30 15:49   ` Stefano Garzarella
  2022-06-30 22:52     ` Dominique Martinet
  2022-07-05 13:34   ` Stefan Hajnoczi
  2 siblings, 1 reply; 17+ messages in thread
From: Stefano Garzarella @ 2022-06-30 15:49 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana

On Thu, Jun 30, 2022 at 10:01:37AM +0900, Dominique Martinet wrote:
>sqeq.off here is the offset to read within the disk image, so obviously
>not 'nread' (the amount we just read), but as the author meant to write
>its current value incremented by the amount we just read.
>
>Normally recent versions of linux will not issue short reads,
>but it can happen so we should fix this.
>
>This lead to weird image corruptions when short read happened
>
>Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
>Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
>Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>

Thanks for fixing this issue!

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>

>---
>v1 -> v2: also updated total_read to use += as suggested by Kevin,
>thank you!
>
>I've tested this quickly by making short reads "recursives", e.g. added
>the following to luring_resubmit_short_read() after setting 'remaining':
>    if (remaining > 4096) remaining -= 4096;
>
>so when we ask for more we issue an extra short reads, making sure we go
>through the two short reads path.
>(Unfortunately I wasn't quite sure what to fiddle with to issue short
>reads in the first place, I tried cutting one of the iovs short in
>luring_do_submit() but I must not have been doing it properly as I ended
>up with 0 return values which are handled by filling in with 0 (reads
>after eof) and that didn't work well)

Do you remember the kernel version where you first saw these problems?

Thanks,
Stefano



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30 15:49   ` Stefano Garzarella
@ 2022-06-30 22:52     ` Dominique Martinet
  2022-07-01  1:33       ` Dominique Martinet
  2022-07-05 13:28       ` Stefan Hajnoczi
  0 siblings, 2 replies; 17+ messages in thread
From: Dominique Martinet @ 2022-06-30 22:52 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana

Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > so when we ask for more we issue an extra short reads, making sure we go
> > through the two short reads path.
> > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > reads in the first place, I tried cutting one of the iovs short in
> > luring_do_submit() but I must not have been doing it properly as I ended
> > up with 0 return values which are handled by filling in with 0 (reads
> > after eof) and that didn't work well)
> 
> Do you remember the kernel version where you first saw these problems?

Since you're quoting my paragraph about testing two short reads, I've
never seen any that I know of; but there's also no reason these couldn't
happen.

Single short reads have been happening for me with O_DIRECT (cache=none)
on btrfs for a while, but unfortunately I cannot remember which was the
first kernel I've seen this on -- I think rather than a kernel update it
was due to file manipulations that made the file eligible for short
reads in the first place (I started running deduplication on the backing
file)

The older kernel I have installed right now is 5.16 and that can
reproduce it --  I'll give my laptop some work over the weekend to test
still maintained stable branches if that's useful.

-- 
Dominique


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30 22:52     ` Dominique Martinet
@ 2022-07-01  1:33       ` Dominique Martinet
  2022-07-05 13:28       ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Dominique Martinet @ 2022-07-01  1:33 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Aarushi Mehta, Julia Suvorova, Stefan Hajnoczi, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana

Dominique Martinet wrote on Fri, Jul 01, 2022 at 07:52:31AM +0900:
> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > > so when we ask for more we issue an extra short reads, making sure we go
> > > through the two short reads path.
> > > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > > reads in the first place, I tried cutting one of the iovs short in
> > > luring_do_submit() but I must not have been doing it properly as I ended
> > > up with 0 return values which are handled by filling in with 0 (reads
> > > after eof) and that didn't work well)
> > 
> > Do you remember the kernel version where you first saw these problems?
> 
> Since you're quoting my paragraph about testing two short reads, I've
> never seen any that I know of; but there's also no reason these couldn't
> happen.
> 
> Single short reads have been happening for me with O_DIRECT (cache=none)
> on btrfs for a while, but unfortunately I cannot remember which was the
> first kernel I've seen this on -- I think rather than a kernel update it
> was due to file manipulations that made the file eligible for short
> reads in the first place (I started running deduplication on the backing
> file)
> 
> The older kernel I have installed right now is 5.16 and that can
> reproduce it --  I'll give my laptop some work over the weekend to test
> still maintained stable branches if that's useful.

I can confirm the same behavior happens with 5.4.202
I'm not sure about older kernels as my io_uring test that reproduces
short reads just doesn't start on these, but for all intent and purposes
we can probably say this has just always been possible.

The fix Filipe sent me unfortunately doesn't apply as far back as 5.4
(btrfs_get_blocks_direct had no flags argument back then, that got
introduced with conversion to dio in 5.8), but should probably work as
is for 5.10/15 kernels.

-- 
Dominique


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30 22:52     ` Dominique Martinet
  2022-07-01  1:33       ` Dominique Martinet
@ 2022-07-05 13:28       ` Stefan Hajnoczi
  2022-07-05 19:23         ` Jens Axboe
  2022-07-05 22:52         ` Dominique Martinet
  1 sibling, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-07-05 13:28 UTC (permalink / raw)
  To: Dominique Martinet, Jens Axboe
  Cc: Stefano Garzarella, Aarushi Mehta, Julia Suvorova, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana, io-uring

[-- Attachment #1: Type: text/plain, Size: 2007 bytes --]

On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> > > so when we ask for more we issue an extra short reads, making sure we go
> > > through the two short reads path.
> > > (Unfortunately I wasn't quite sure what to fiddle with to issue short
> > > reads in the first place, I tried cutting one of the iovs short in
> > > luring_do_submit() but I must not have been doing it properly as I ended
> > > up with 0 return values which are handled by filling in with 0 (reads
> > > after eof) and that didn't work well)
> > 
> > Do you remember the kernel version where you first saw these problems?
> 
> Since you're quoting my paragraph about testing two short reads, I've
> never seen any that I know of; but there's also no reason these couldn't
> happen.
> 
> Single short reads have been happening for me with O_DIRECT (cache=none)
> on btrfs for a while, but unfortunately I cannot remember which was the
> first kernel I've seen this on -- I think rather than a kernel update it
> was due to file manipulations that made the file eligible for short
> reads in the first place (I started running deduplication on the backing
> file)
> 
> The older kernel I have installed right now is 5.16 and that can
> reproduce it --  I'll give my laptop some work over the weekend to test
> still maintained stable branches if that's useful.

Hi Dominique,
Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
async context"). The comment above QEMU's luring_resubmit_short_read()
claims that short reads are a bug that was fixed by Linux commit
9d93a3f5a0c.

If the comment is inaccurate it needs to be fixed. Maybe short writes
need to be handled too.

I have CCed Jens and the io_uring mailing list to clarify:
1. Are short IORING_OP_READV reads possible on files/block devices?
2. Are short IORING_OP_WRITEV writes possible on files/block devices?

Thanks,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
  2022-06-30 15:43   ` Hanna Reitz
  2022-06-30 15:49   ` Stefano Garzarella
@ 2022-07-05 13:34   ` Stefan Hajnoczi
  2 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-07-05 13:34 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Aarushi Mehta, Julia Suvorova, Stefano Garzarella, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana

[-- Attachment #1: Type: text/plain, Size: 1683 bytes --]

On Thu, Jun 30, 2022 at 10:01:37AM +0900, Dominique Martinet wrote:
> sqeq.off here is the offset to read within the disk image, so obviously
> not 'nread' (the amount we just read), but as the author meant to write
> its current value incremented by the amount we just read.
> 
> Normally recent versions of linux will not issue short reads,
> but it can happen so we should fix this.
> 
> This lead to weird image corruptions when short read happened
> 
> Fixes: 6663a0a33764 ("block/io_uring: implements interfaces for io_uring")
> Link: https://lkml.kernel.org/r/YrrFGO4A1jS0GI0G@atmark-techno.com
> Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
> ---
> v1 -> v2: also updated total_read to use += as suggested by Kevin,
> thank you!
> 
> I've tested this quickly by making short reads "recursives", e.g. added
> the following to luring_resubmit_short_read() after setting 'remaining':
>     if (remaining > 4096) remaining -= 4096;
> 
> so when we ask for more we issue an extra short reads, making sure we go
> through the two short reads path.
> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> reads in the first place, I tried cutting one of the iovs short in
> luring_do_submit() but I must not have been doing it properly as I ended
> up with 0 return values which are handled by filling in with 0 (reads
> after eof) and that didn't work well)
> 
> Anyway, this looks OK to me now.
> 
> Thanks,
> Dominique
> 
>  block/io_uring.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-05 13:28       ` Stefan Hajnoczi
@ 2022-07-05 19:23         ` Jens Axboe
  2022-07-06  7:16           ` Stefan Hajnoczi
  2022-07-05 22:52         ` Dominique Martinet
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2022-07-05 19:23 UTC (permalink / raw)
  To: Stefan Hajnoczi, Dominique Martinet
  Cc: Stefano Garzarella, Aarushi Mehta, Julia Suvorova, Kevin Wolf,
	Hanna Reitz, qemu-block, qemu-devel, Filipe Manana, io-uring

On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
>> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
>>>> so when we ask for more we issue an extra short reads, making sure we go
>>>> through the two short reads path.
>>>> (Unfortunately I wasn't quite sure what to fiddle with to issue short
>>>> reads in the first place, I tried cutting one of the iovs short in
>>>> luring_do_submit() but I must not have been doing it properly as I ended
>>>> up with 0 return values which are handled by filling in with 0 (reads
>>>> after eof) and that didn't work well)
>>>
>>> Do you remember the kernel version where you first saw these problems?
>>
>> Since you're quoting my paragraph about testing two short reads, I've
>> never seen any that I know of; but there's also no reason these couldn't
>> happen.
>>
>> Single short reads have been happening for me with O_DIRECT (cache=none)
>> on btrfs for a while, but unfortunately I cannot remember which was the
>> first kernel I've seen this on -- I think rather than a kernel update it
>> was due to file manipulations that made the file eligible for short
>> reads in the first place (I started running deduplication on the backing
>> file)
>>
>> The older kernel I have installed right now is 5.16 and that can
>> reproduce it --  I'll give my laptop some work over the weekend to test
>> still maintained stable branches if that's useful.
> 
> Hi Dominique,
> Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> async context"). The comment above QEMU's luring_resubmit_short_read()
> claims that short reads are a bug that was fixed by Linux commit
> 9d93a3f5a0c.
> 
> If the comment is inaccurate it needs to be fixed. Maybe short writes
> need to be handled too.
> 
> I have CCed Jens and the io_uring mailing list to clarify:
> 1. Are short IORING_OP_READV reads possible on files/block devices?
> 2. Are short IORING_OP_WRITEV writes possible on files/block devices?

In general we try very hard to avoid them, but if eg we get a short read
or write from blocking context (eg io-wq), then io_uring does return
that. There's really not much we can do here, it seems futile to retry
IO which was issued just like it would've been from a normal blocking
syscall yet it is still short.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-05 13:28       ` Stefan Hajnoczi
  2022-07-05 19:23         ` Jens Axboe
@ 2022-07-05 22:52         ` Dominique Martinet
  2022-07-06  7:17           ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Dominique Martinet @ 2022-07-05 22:52 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jens Axboe, Stefano Garzarella, Aarushi Mehta, Julia Suvorova,
	Kevin Wolf, Hanna Reitz, qemu-block, qemu-devel, Filipe Manana,
	io-uring

Stefan Hajnoczi wrote on Tue, Jul 05, 2022 at 02:28:08PM +0100:
> > The older kernel I have installed right now is 5.16 and that can
> > reproduce it --  I'll give my laptop some work over the weekend to test
> > still maintained stable branches if that's useful.
> 
> Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> async context"). The comment above QEMU's luring_resubmit_short_read()
> claims that short reads are a bug that was fixed by Linux commit
> 9d93a3f5a0c.
> 
> If the comment is inaccurate it needs to be fixed. Maybe short writes
> need to be handled too.
> 
> I have CCed Jens and the io_uring mailing list to clarify:
> 1. Are short IORING_OP_READV reads possible on files/block devices?
> 2. Are short IORING_OP_WRITEV writes possible on files/block devices?

Jens replied before me, so I won't be adding much (I agree with his
reply -- linux tries hard to avoid short reads but we should assume they
can happen)

In this particular case it was another btrfs bug with O_DIRECT and mixed
compression in a file, that's been fixed by this patch:
https://lore.kernel.org/all/20220630151038.GA459423@falcondesktop/

queued here:
https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=dio_fixes&id=b3864441547e49a69d45c7771aa8cc5e595d18fc

It should be backported to 5.10, but the problem will likely persist in
5.4 kernels if anyone runs on that as the code changed enough to make
backporting non-trivial.


So, WRT that comment, we probably should remove the reference to that
commit and leave in that they should be very rare but we need to handle
them anyway.


For writes in particular, I haven't seen any and looking at the code
qemu would blow up that storage (IO treated as ENOSPC would likely mark
disk read-only?)
It might make sense to add some warning message that it's what happened
so it'll be obvious what needs doing in case anyone falls on that but I
think the status-quo is good enough here.

-- 
Dominique

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-05 19:23         ` Jens Axboe
@ 2022-07-06  7:16           ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-07-06  7:16 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stefan Hajnoczi, Dominique Martinet, Stefano Garzarella,
	Aarushi Mehta, Julia Suvorova, Kevin Wolf, Hanna Reitz,
	qemu block, qemu-devel, Filipe Manana, io-uring

On Tue, 5 Jul 2022 at 20:26, Jens Axboe <axboe@kernel.dk> wrote:
>
> On 7/5/22 7:28 AM, Stefan Hajnoczi wrote:
> > On Fri, Jul 01, 2022 at 07:52:31AM +0900, Dominique Martinet wrote:
> >> Stefano Garzarella wrote on Thu, Jun 30, 2022 at 05:49:21PM +0200:
> >>>> so when we ask for more we issue an extra short reads, making sure we go
> >>>> through the two short reads path.
> >>>> (Unfortunately I wasn't quite sure what to fiddle with to issue short
> >>>> reads in the first place, I tried cutting one of the iovs short in
> >>>> luring_do_submit() but I must not have been doing it properly as I ended
> >>>> up with 0 return values which are handled by filling in with 0 (reads
> >>>> after eof) and that didn't work well)
> >>>
> >>> Do you remember the kernel version where you first saw these problems?
> >>
> >> Since you're quoting my paragraph about testing two short reads, I've
> >> never seen any that I know of; but there's also no reason these couldn't
> >> happen.
> >>
> >> Single short reads have been happening for me with O_DIRECT (cache=none)
> >> on btrfs for a while, but unfortunately I cannot remember which was the
> >> first kernel I've seen this on -- I think rather than a kernel update it
> >> was due to file manipulations that made the file eligible for short
> >> reads in the first place (I started running deduplication on the backing
> >> file)
> >>
> >> The older kernel I have installed right now is 5.16 and that can
> >> reproduce it --  I'll give my laptop some work over the weekend to test
> >> still maintained stable branches if that's useful.
> >
> > Hi Dominique,
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> In general we try very hard to avoid them, but if eg we get a short read
> or write from blocking context (eg io-wq), then io_uring does return
> that. There's really not much we can do here, it seems futile to retry
> IO which was issued just like it would've been from a normal blocking
> syscall yet it is still short.

Thanks! QEMU's short I/O handling is spotty - some code paths handle
it while others don't. For the io_uring QEMU block driver we'll try to
handle short all I/Os.

Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-05 22:52         ` Dominique Martinet
@ 2022-07-06  7:17           ` Stefan Hajnoczi
  2022-07-06  7:26             ` Dominique Martinet
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-07-06  7:17 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Stefan Hajnoczi, Jens Axboe, Stefano Garzarella, Aarushi Mehta,
	Julia Suvorova, Kevin Wolf, Hanna Reitz, qemu block, qemu-devel,
	Filipe Manana, io-uring

On Tue, 5 Jul 2022 at 23:53, Dominique Martinet
<dominique.martinet@atmark-techno.com> wrote:
>
> Stefan Hajnoczi wrote on Tue, Jul 05, 2022 at 02:28:08PM +0100:
> > > The older kernel I have installed right now is 5.16 and that can
> > > reproduce it --  I'll give my laptop some work over the weekend to test
> > > still maintained stable branches if that's useful.
> >
> > Linux 5.16 contains commit 9d93a3f5a0c ("io_uring: punt short reads to
> > async context"). The comment above QEMU's luring_resubmit_short_read()
> > claims that short reads are a bug that was fixed by Linux commit
> > 9d93a3f5a0c.
> >
> > If the comment is inaccurate it needs to be fixed. Maybe short writes
> > need to be handled too.
> >
> > I have CCed Jens and the io_uring mailing list to clarify:
> > 1. Are short IORING_OP_READV reads possible on files/block devices?
> > 2. Are short IORING_OP_WRITEV writes possible on files/block devices?
>
> Jens replied before me, so I won't be adding much (I agree with his
> reply -- linux tries hard to avoid short reads but we should assume they
> can happen)
>
> In this particular case it was another btrfs bug with O_DIRECT and mixed
> compression in a file, that's been fixed by this patch:
> https://lore.kernel.org/all/20220630151038.GA459423@falcondesktop/
>
> queued here:
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=dio_fixes&id=b3864441547e49a69d45c7771aa8cc5e595d18fc
>
> It should be backported to 5.10, but the problem will likely persist in
> 5.4 kernels if anyone runs on that as the code changed enough to make
> backporting non-trivial.
>
>
> So, WRT that comment, we probably should remove the reference to that
> commit and leave in that they should be very rare but we need to handle
> them anyway.
>
>
> For writes in particular, I haven't seen any and looking at the code
> qemu would blow up that storage (IO treated as ENOSPC would likely mark
> disk read-only?)
> It might make sense to add some warning message that it's what happened
> so it'll be obvious what needs doing in case anyone falls on that but I
> think the status-quo is good enough here.

Great! I've already queued your fix.

Do you want to send a follow-up that updates the comment?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-06  7:17           ` Stefan Hajnoczi
@ 2022-07-06  7:26             ` Dominique Martinet
  2022-07-06  7:51               ` Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Dominique Martinet @ 2022-07-06  7:26 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Stefan Hajnoczi, Jens Axboe, Stefano Garzarella, Aarushi Mehta,
	Julia Suvorova, Kevin Wolf, Hanna Reitz, qemu block, qemu-devel,
	Filipe Manana, io-uring

Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> Great! I've already queued your fix.

Thanks!

> Do you want to send a follow-up that updates the comment?

I don't think I'd add much value at this point, leaving it to you unless
you really would prefer me to send it.


Cheers,
-- 
Dominique

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2] io_uring: fix short read slow path
  2022-07-06  7:26             ` Dominique Martinet
@ 2022-07-06  7:51               ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2022-07-06  7:51 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Stefan Hajnoczi, Jens Axboe, Stefano Garzarella, Aarushi Mehta,
	Julia Suvorova, Kevin Wolf, Hanna Reitz, qemu block, qemu-devel,
	Filipe Manana, io-uring

[-- Attachment #1: Type: text/plain, Size: 435 bytes --]

On Wed, Jul 06, 2022 at 04:26:59PM +0900, Dominique Martinet wrote:
> Stefan Hajnoczi wrote on Wed, Jul 06, 2022 at 08:17:42AM +0100:
> > Great! I've already queued your fix.
> 
> Thanks!
> 
> > Do you want to send a follow-up that updates the comment?
> 
> I don't think I'd add much value at this point, leaving it to you unless
> you really would prefer me to send it.

That's fine. I'll send a patch. Thanks!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2022-07-06  7:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-29  4:49 [PATCH] io_uring: fix short read slow path corruptions Dominique Martinet
2022-06-29  5:23 ` Dominique Martinet
2022-06-29  8:46   ` Kevin Wolf
2022-06-29 10:22     ` Dominique Martinet
2022-06-30  1:01 ` [PATCH v2] io_uring: fix short read slow path Dominique Martinet
2022-06-30 15:43   ` Hanna Reitz
2022-06-30 15:49   ` Stefano Garzarella
2022-06-30 22:52     ` Dominique Martinet
2022-07-01  1:33       ` Dominique Martinet
2022-07-05 13:28       ` Stefan Hajnoczi
2022-07-05 19:23         ` Jens Axboe
2022-07-06  7:16           ` Stefan Hajnoczi
2022-07-05 22:52         ` Dominique Martinet
2022-07-06  7:17           ` Stefan Hajnoczi
2022-07-06  7:26             ` Dominique Martinet
2022-07-06  7:51               ` Stefan Hajnoczi
2022-07-05 13:34   ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.