All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] file-posix: alignment probing improvements
@ 2022-11-01 19:00 Stefan Hajnoczi
  2022-11-01 19:00 ` [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned Stefan Hajnoczi
  2022-11-01 19:00 ` [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support Stefan Hajnoczi
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-01 19:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: hreitz, qemu-block, Kevin Wolf, nsoffer, Stefan Hajnoczi

These patches fix alignment probing with dm-crypt and add support for the new
Linux statx(STATX_DIOALIGN) interface.

Stefan Hajnoczi (2):
  file-posix: fix Linux alignment probing when EIO is returned
  file-posix: add statx(STATX_DIOALIGN) support

 block/file-posix.c | 96 +++++++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 44 deletions(-)

-- 
2.38.1



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-01 19:00 [PATCH 0/2] file-posix: alignment probing improvements Stefan Hajnoczi
@ 2022-11-01 19:00 ` Stefan Hajnoczi
  2022-11-02  2:27   ` Eric Biggers
  2022-11-01 19:00 ` [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support Stefan Hajnoczi
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-01 19:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: hreitz, qemu-block, Kevin Wolf, nsoffer, Stefan Hajnoczi

Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
Alignment probing fails on dm-crypt devices because the code expects
EINVAL.

Treating any errno as an "unaligned" indicator would be easy, but breaks
commit 22d182e82b4b ("block/raw-posix: fix launching with failed
disks"). Offline disks return EIO for correctly aligned requests and
EINVAL for unaligned requests.

It's possible to make both dm-crypt and offline disks work: look for the
transition from EINVAL to EIO instead of for a single EINVAL value.

Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
Fixes: 22d182e82b4b ("block/raw-posix: fix launching with failed disks")
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 42 +++++++++++++++---------------------------
 1 file changed, 15 insertions(+), 27 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b9647c5ffc..b9d62f52fe 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -355,31 +355,6 @@ static bool raw_needs_alignment(BlockDriverState *bs)
     return s->force_alignment;
 }
 
-/* Check if read is allowed with given memory buffer and length.
- *
- * This function is used to check O_DIRECT memory buffer and request alignment.
- */
-static bool raw_is_io_aligned(int fd, void *buf, size_t len)
-{
-    ssize_t ret = pread(fd, buf, len, 0);
-
-    if (ret >= 0) {
-        return true;
-    }
-
-#ifdef __linux__
-    /* The Linux kernel returns EINVAL for misaligned O_DIRECT reads.  Ignore
-     * other errors (e.g. real I/O error), which could happen on a failed
-     * drive, since we only care about probing alignment.
-     */
-    if (errno != EINVAL) {
-        return true;
-    }
-#endif
-
-    return false;
-}
-
 static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 {
     BDRVRawState *s = bs->opaque;
@@ -426,34 +401,47 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
      * try to detect buf_align, which cannot be detected in some cases (e.g.
      * Gluster). If buf_align cannot be detected, we fallback to the value of
      * request_alignment.
+     *
+     * The probing loop keeps track of the last errno so that the alignment of
+     * offline disks can be probed. On Linux pread(2) returns with errno EINVAL
+     * for most file descriptors when O_DIRECT alignment constraints are unmet.
+     * Offline disks fail correctly aligned pread(2) with EIO. Therefore it's
+     * possible to detect alignment on offline disks by observing when the
+     * errno changes from EINVAL to something else.
      */
 
     if (!bs->bl.request_alignment) {
+        int last_errno = 0;
         int i;
         size_t align;
         buf = qemu_memalign(max_align, max_align);
         for (i = 0; i < ARRAY_SIZE(alignments); i++) {
             align = alignments[i];
-            if (raw_is_io_aligned(fd, buf, align)) {
+            if (pread(fd, buf, align, 0) >= 0 ||
+                (errno != EINVAL && last_errno == EINVAL)) {
                 /* Fallback to safe value. */
                 bs->bl.request_alignment = (align != 1) ? align : max_align;
                 break;
             }
+            last_errno = errno;
         }
         qemu_vfree(buf);
     }
 
     if (!s->buf_align) {
+        int last_errno = 0;
         int i;
         size_t align;
         buf = qemu_memalign(max_align, 2 * max_align);
         for (i = 0; i < ARRAY_SIZE(alignments); i++) {
             align = alignments[i];
-            if (raw_is_io_aligned(fd, buf + align, max_align)) {
+            if (pread(fd, buf + align, max_align, 0) >= 0 ||
+                (errno != EINVAL && last_errno == EINVAL)) {
                 /* Fallback to request_alignment. */
                 s->buf_align = (align != 1) ? align : bs->bl.request_alignment;
                 break;
             }
+            last_errno = errno;
         }
         qemu_vfree(buf);
     }
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support
  2022-11-01 19:00 [PATCH 0/2] file-posix: alignment probing improvements Stefan Hajnoczi
  2022-11-01 19:00 ` [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned Stefan Hajnoczi
@ 2022-11-01 19:00 ` Stefan Hajnoczi
  2022-11-02  3:32   ` Eric Biggers
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-01 19:00 UTC (permalink / raw)
  To: qemu-devel
  Cc: hreitz, qemu-block, Kevin Wolf, nsoffer, Stefan Hajnoczi, Eric Biggers

Linux commit 825cf206ed51 ("statx: add direct I/O alignment
information") added an interface to fetch O_DIRECT alignment values for
block devices and file systems.

Prefer STATX_DIOALIGN to older interfaces and probing, but keep them as
fallbacks in case STATX_DIOALIGN cannot provide the information.

Testing shows the status of STATX_DIOALIGN support in Linux 6.1-rc3
appears to be:
- btrfs: no
- ext4: yes
- XFS: yes
- NVMe block devices: yes
- dm-crypt: yes

Cc: Eric Biggers <ebiggers@google.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 54 +++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index b9d62f52fe..00d8191880 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -372,28 +372,48 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
 
     bs->bl.request_alignment = 0;
     s->buf_align = 0;
+
+#if defined(__linux__) && defined(STATX_DIOALIGN)
+    struct statx stx;
+
+    /*
+     * Linux 6.1 introduced an interface for both block devices and file
+     * systems. The system call returns with the STATX_DIOALIGN bit cleared
+     * when the information is unavailable.
+     */
+    if (statx(fd, "", AT_EMPTY_PATH, STATX_DIOALIGN, &stx) == 0 &&
+        (stx.stx_mask & STATX_DIOALIGN)) {
+        bs->bl.request_alignment = stx.stx_dio_offset_align;
+        s->buf_align = stx.stx_dio_mem_align;
+    }
+#endif /* defined(__linux__) && defined(STATX_DIOALIGN) */
+
     /* Let's try to use the logical blocksize for the alignment. */
-    if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
-        bs->bl.request_alignment = 0;
+    if (!bs->bl.request_alignment) {
+        if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
+            bs->bl.request_alignment = 0;
+        }
     }
 
 #ifdef __linux__
-    /*
-     * The XFS ioctl definitions are shipped in extra packages that might
-     * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
-     * here, we simply use our own definition instead:
-     */
-    struct xfs_dioattr {
-        uint32_t d_mem;
-        uint32_t d_miniosz;
-        uint32_t d_maxiosz;
-    } da;
-    if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
-        bs->bl.request_alignment = da.d_miniosz;
-        /* The kernel returns wrong information for d_mem */
-        /* s->buf_align = da.d_mem; */
+    if (!bs->bl.request_alignment) {
+        /*
+         * The XFS ioctl definitions are shipped in extra packages that might
+         * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
+         * here, we simply use our own definition instead:
+         */
+        struct xfs_dioattr {
+            uint32_t d_mem;
+            uint32_t d_miniosz;
+            uint32_t d_maxiosz;
+        } da;
+        if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
+            bs->bl.request_alignment = da.d_miniosz;
+            /* The kernel returns wrong information for d_mem */
+            /* s->buf_align = da.d_mem; */
+        }
     }
-#endif
+#endif /* __linux__ */
 
     /*
      * If we could not get the sizes so far, we can only guess them. First try
-- 
2.38.1



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-01 19:00 ` [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned Stefan Hajnoczi
@ 2022-11-02  2:27   ` Eric Biggers
  2022-11-02  2:49     ` Eric Biggers
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2022-11-02  2:27 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, hreitz, qemu-block, Kevin Wolf, nsoffer

On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.

Citation needed.  For direct I/O to block devices, the kernel's block layer
checks the alignment before the I/O is actually submitted to the underlying
block device.  See
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306

> Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290

That "bug" seems to be based on a misunderstanding of the kernel source code,
and not any actual testing.

I just tested it, and the error code is EINVAL.

- Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-02  2:27   ` Eric Biggers
@ 2022-11-02  2:49     ` Eric Biggers
  2022-11-02 18:50       ` Stefan Hajnoczi
  2022-11-03  9:52       ` Kevin Wolf
  0 siblings, 2 replies; 13+ messages in thread
From: Eric Biggers @ 2022-11-02  2:49 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, hreitz, qemu-block, Kevin Wolf, nsoffer

On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> 
> Citation needed.  For direct I/O to block devices, the kernel's block layer
> checks the alignment before the I/O is actually submitted to the underlying
> block device.  See
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> 
> > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> 
> That "bug" seems to be based on a misunderstanding of the kernel source code,
> and not any actual testing.
> 
> I just tested it, and the error code is EINVAL.
> 

I think I see what's happening.  The kernel code was broken just a few months
ago, in v6.0 by the commit "block: relax direct io memory alignment"
(https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
through when the user buffer is only aligned to the device's dma_alignment.  But
a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
(and thus also the logical block size) is 4096.  So there is now a case where
misaligned DIO can reach dm-crypt, when that shouldn't be possible.

It also means that STATX_DIOALIGN will give the wrong value for
stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
STATX_DIOALIGN for block devices relies on the dma_alignment.

I'll raise this on the linux-block and dm-devel mailing lists.  It would be nice
if people reported kernel bugs instead of silently working around them...

- Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support
  2022-11-01 19:00 ` [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support Stefan Hajnoczi
@ 2022-11-02  3:32   ` Eric Biggers
  2022-11-02 18:53     ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2022-11-02  3:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, hreitz, qemu-block, Kevin Wolf, nsoffer

On Tue, Nov 01, 2022 at 03:00:31PM -0400, Stefan Hajnoczi wrote:
>      /* Let's try to use the logical blocksize for the alignment. */
> -    if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
> -        bs->bl.request_alignment = 0;
> +    if (!bs->bl.request_alignment) {
> +        if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
> +            bs->bl.request_alignment = 0;
> +        }
>      }
>  
>  #ifdef __linux__
> -    /*
> -     * The XFS ioctl definitions are shipped in extra packages that might
> -     * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
> -     * here, we simply use our own definition instead:
> -     */
> -    struct xfs_dioattr {
> -        uint32_t d_mem;
> -        uint32_t d_miniosz;
> -        uint32_t d_maxiosz;
> -    } da;
> -    if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
> -        bs->bl.request_alignment = da.d_miniosz;
> -        /* The kernel returns wrong information for d_mem */
> -        /* s->buf_align = da.d_mem; */
> +    if (!bs->bl.request_alignment) {

This patch changes the fallback code to make the request_alignment value from
probe_logical_blocksize() override the value from XFS_IOC_DIOINFO.  Is that
intentional?

> +        if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
> +            bs->bl.request_alignment = da.d_miniosz;
> +            /* The kernel returns wrong information for d_mem */
> +            /* s->buf_align = da.d_mem; */

Has this bug been reported to the XFS developers (linux-xfs@vger.kernel.org)?

- Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-02  2:49     ` Eric Biggers
@ 2022-11-02 18:50       ` Stefan Hajnoczi
  2022-11-03  9:52       ` Kevin Wolf
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-02 18:50 UTC (permalink / raw)
  To: Eric Biggers; +Cc: qemu-devel, hreitz, qemu-block, Kevin Wolf, nsoffer

[-- Attachment #1: Type: text/plain, Size: 1848 bytes --]

On Tue, Nov 01, 2022 at 07:49:20PM -0700, Eric Biggers wrote:
> On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > 
> > Citation needed.  For direct I/O to block devices, the kernel's block layer
> > checks the alignment before the I/O is actually submitted to the underlying
> > block device.  See
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > 
> > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > 
> > That "bug" seems to be based on a misunderstanding of the kernel source code,
> > and not any actual testing.
> > 
> > I just tested it, and the error code is EINVAL.
> > 
> 
> I think I see what's happening.  The kernel code was broken just a few months
> ago, in v6.0 by the commit "block: relax direct io memory alignment"
> (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
> through when the user buffer is only aligned to the device's dma_alignment.  But
> a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
> (and thus also the logical block size) is 4096.  So there is now a case where
> misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> 
> It also means that STATX_DIOALIGN will give the wrong value for
> stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> STATX_DIOALIGN for block devices relies on the dma_alignment.
> 
> I'll raise this on the linux-block and dm-devel mailing lists.  It would be nice
> if people reported kernel bugs instead of silently working around them...

Thanks! You have completed the picture of what's going on here.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support
  2022-11-02  3:32   ` Eric Biggers
@ 2022-11-02 18:53     ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-02 18:53 UTC (permalink / raw)
  To: Eric Biggers, pbonzini
  Cc: qemu-devel, hreitz, qemu-block, Kevin Wolf, nsoffer

[-- Attachment #1: Type: text/plain, Size: 1965 bytes --]

On Tue, Nov 01, 2022 at 08:32:30PM -0700, Eric Biggers wrote:
> On Tue, Nov 01, 2022 at 03:00:31PM -0400, Stefan Hajnoczi wrote:
> >      /* Let's try to use the logical blocksize for the alignment. */
> > -    if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
> > -        bs->bl.request_alignment = 0;
> > +    if (!bs->bl.request_alignment) {
> > +        if (probe_logical_blocksize(fd, &bs->bl.request_alignment) < 0) {
> > +            bs->bl.request_alignment = 0;
> > +        }
> >      }
> >  
> >  #ifdef __linux__
> > -    /*
> > -     * The XFS ioctl definitions are shipped in extra packages that might
> > -     * not always be available. Since we just need the XFS_IOC_DIOINFO ioctl
> > -     * here, we simply use our own definition instead:
> > -     */
> > -    struct xfs_dioattr {
> > -        uint32_t d_mem;
> > -        uint32_t d_miniosz;
> > -        uint32_t d_maxiosz;
> > -    } da;
> > -    if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
> > -        bs->bl.request_alignment = da.d_miniosz;
> > -        /* The kernel returns wrong information for d_mem */
> > -        /* s->buf_align = da.d_mem; */
> > +    if (!bs->bl.request_alignment) {
> 
> This patch changes the fallback code to make the request_alignment value from
> probe_logical_blocksize() override the value from XFS_IOC_DIOINFO.  Is that
> intentional?

Thanks for pointing out the bug. That was not intentional. Will fix.

> > +        if (ioctl(fd, _IOR('X', 30, struct xfs_dioattr), &da) >= 0) {
> > +            bs->bl.request_alignment = da.d_miniosz;
> > +            /* The kernel returns wrong information for d_mem */
> > +            /* s->buf_align = da.d_mem; */
> 
> Has this bug been reported to the XFS developers (linux-xfs@vger.kernel.org)?

Paolo: Do you remember if you reported this when you wrote commit
c25f53b06eba ("raw: Probe required direct I/O alignment")?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-02  2:49     ` Eric Biggers
  2022-11-02 18:50       ` Stefan Hajnoczi
@ 2022-11-03  9:52       ` Kevin Wolf
  2022-11-03 13:57         ` Stefan Hajnoczi
  2022-11-03 16:26         ` Eric Biggers
  1 sibling, 2 replies; 13+ messages in thread
From: Kevin Wolf @ 2022-11-03  9:52 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Stefan Hajnoczi, qemu-devel, hreitz, qemu-block, nsoffer

Am 02.11.2022 um 03:49 hat Eric Biggers geschrieben:
> On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > 
> > Citation needed.  For direct I/O to block devices, the kernel's block layer
> > checks the alignment before the I/O is actually submitted to the underlying
> > block device.  See
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > 
> > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > 
> > That "bug" seems to be based on a misunderstanding of the kernel source code,
> > and not any actual testing.
> > 
> > I just tested it, and the error code is EINVAL.
> > 
> 
> I think I see what's happening.  The kernel code was broken just a few months
> ago, in v6.0 by the commit "block: relax direct io memory alignment"
> (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
> through when the user buffer is only aligned to the device's dma_alignment.  But
> a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
> (and thus also the logical block size) is 4096.  So there is now a case where
> misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> 
> It also means that STATX_DIOALIGN will give the wrong value for
> stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> STATX_DIOALIGN for block devices relies on the dma_alignment.

In other words, STATX_DIOALIGN is unusable from the start because we
don't know whether the information it returns is actually correct? :-/

I guess we could still use the value returned by STATX_DIOALIGN as a
preferred value that we'll use if it survives probing, and otherwise
fall back to the same probing we've always been doing because there was
no (or no sane) way to get the information from the kernel.

> I'll raise this on the linux-block and dm-devel mailing lists.  It
> would be nice if people reported kernel bugs instead of silently
> working around them...

I wasn't involved in this specific one, but in case you're wondering why
I wouldn't have reported it either...

On one hand, I agree with you because I want bugs in my code reported,
too, but on the other hand, it has also happened to me before that
you're treated as the stupid userland developer who doesn't know how
the kernel works and who should better have kept his ideas of how it
should work to himself - which is not exactly encouraging to report
things when you can just deal with the way they are. I wouldn't have
been able to tell whether in the mind of the respective maintainers,
-EINVAL is required behaviour or whether that was just a totally
unreasonable assumption on our side. Erring on the safe side, I'll give
up an assumption that obviously doesn't match reality.

And even a kernel fix now doesn't change that there are broken kernels
out there and we need to work with them. So reporting it doesn't even
solve our problem, it's just additional effort with limited expectations
of success.

Kevin



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-03  9:52       ` Kevin Wolf
@ 2022-11-03 13:57         ` Stefan Hajnoczi
  2022-11-03 16:26         ` Eric Biggers
  1 sibling, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-03 13:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Eric Biggers, qemu-devel, hreitz, qemu-block, nsoffer

[-- Attachment #1: Type: text/plain, Size: 2394 bytes --]

On Thu, Nov 03, 2022 at 10:52:43AM +0100, Kevin Wolf wrote:
> Am 02.11.2022 um 03:49 hat Eric Biggers geschrieben:
> > On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > > 
> > > Citation needed.  For direct I/O to block devices, the kernel's block layer
> > > checks the alignment before the I/O is actually submitted to the underlying
> > > block device.  See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > > 
> > > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > > 
> > > That "bug" seems to be based on a misunderstanding of the kernel source code,
> > > and not any actual testing.
> > > 
> > > I just tested it, and the error code is EINVAL.
> > > 
> > 
> > I think I see what's happening.  The kernel code was broken just a few months
> > ago, in v6.0 by the commit "block: relax direct io memory alignment"
> > (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
> > through when the user buffer is only aligned to the device's dma_alignment.  But
> > a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
> > (and thus also the logical block size) is 4096.  So there is now a case where
> > misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> > 
> > It also means that STATX_DIOALIGN will give the wrong value for
> > stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> > STATX_DIOALIGN for block devices relies on the dma_alignment.
> 
> In other words, STATX_DIOALIGN is unusable from the start because we
> don't know whether the information it returns is actually correct? :-/
> 
> I guess we could still use the value returned by STATX_DIOALIGN as a
> preferred value that we'll use if it survives probing, and otherwise
> fall back to the same probing we've always been doing because there was
> no (or no sane) way to get the information from the kernel.

Yes, it seems probing is required to verify the values returned by
STATX_DIOALIGN. At least until enough time passes so we can say
"STATX_DIOALIGN has been correct for X years and no one is running those
old kernels anymore".

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-03  9:52       ` Kevin Wolf
  2022-11-03 13:57         ` Stefan Hajnoczi
@ 2022-11-03 16:26         ` Eric Biggers
  2022-11-03 16:54           ` Eric Biggers
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2022-11-03 16:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, hreitz, qemu-block, nsoffer

On Thu, Nov 03, 2022 at 10:52:43AM +0100, Kevin Wolf wrote:
> Am 02.11.2022 um 03:49 hat Eric Biggers geschrieben:
> > On Tue, Nov 01, 2022 at 07:27:16PM -0700, Eric Biggers wrote:
> > > On Tue, Nov 01, 2022 at 03:00:30PM -0400, Stefan Hajnoczi wrote:
> > > > Linux dm-crypt returns errno EIO from unaligned O_DIRECT pread(2) calls.
> > > 
> > > Citation needed.  For direct I/O to block devices, the kernel's block layer
> > > checks the alignment before the I/O is actually submitted to the underlying
> > > block device.  See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/fops.c?h=v6.1-rc3#n306
> > > 
> > > > Buglink: https://gitlab.com/qemu-project/qemu/-/issues/1290
> > > 
> > > That "bug" seems to be based on a misunderstanding of the kernel source code,
> > > and not any actual testing.
> > > 
> > > I just tested it, and the error code is EINVAL.
> > > 
> > 
> > I think I see what's happening.  The kernel code was broken just a few months
> > ago, in v6.0 by the commit "block: relax direct io memory alignment"
> > (https://git.kernel.org/linus/b1a000d3b8ec582d).  Now the block layer lets DIO
> > through when the user buffer is only aligned to the device's dma_alignment.  But
> > a dm-crypt device has a dma_alignment of 512 even when the crypto sector size
> > (and thus also the logical block size) is 4096.  So there is now a case where
> > misaligned DIO can reach dm-crypt, when that shouldn't be possible.
> > 
> > It also means that STATX_DIOALIGN will give the wrong value for
> > stx_dio_mem_align in the above case, 512 instead of 4096.  This is because
> > STATX_DIOALIGN for block devices relies on the dma_alignment.
> 
> In other words, STATX_DIOALIGN is unusable from the start because we
> don't know whether the information it returns is actually correct? :-/

That's a silly point of view.  STATX_DIOALIGN has only been in a released kernel
for a few weeks (v6.0), the bug is only in one edge case, and it will get fixed
quickly and backported to v6.0.y where users of 6.0 will get it.

Basically every Linux API has been broken at one point or the other, but things
get fixed.  Direct I/O itself has been totally broken on some filesystems, so by
this argument why are you even using direct I/O?  Actually, why are you even
using Linux?  Just use one of those operating systems with no bugs instead.

Also note that your alternative is probing, which is super broken because many
filesystems fall back to buffered I/O for misaligned direct I/O instead of
returning an error.

So please cooperate with getting things fixed properly instead of continuing to
use broken workarounds.

> > I'll raise this on the linux-block and dm-devel mailing lists.  It
> > would be nice if people reported kernel bugs instead of silently
> > working around them...
> 
> I wasn't involved in this specific one, but in case you're wondering why
> I wouldn't have reported it either...
> 
> On one hand, I agree with you because I want bugs in my code reported,
> too, but on the other hand, it has also happened to me before that
> you're treated as the stupid userland developer who doesn't know how
> the kernel works and who should better have kept his ideas of how it
> should work to himself - which is not exactly encouraging to report
> things when you can just deal with the way they are. I wouldn't have
> been able to tell whether in the mind of the respective maintainers,
> -EINVAL is required behaviour or whether that was just a totally
> unreasonable assumption on our side. Erring on the safe side, I'll give
> up an assumption that obviously doesn't match reality.

The error code is documented in the read(2) and write(2) man pages.  So clearly
either the kernel was wrong or the man page was wrong.  Either way it needed to
be reported.

- Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-03 16:26         ` Eric Biggers
@ 2022-11-03 16:54           ` Eric Biggers
  2022-11-03 17:54             ` Stefan Hajnoczi
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Biggers @ 2022-11-03 16:54 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, hreitz, qemu-block, nsoffer

On Thu, Nov 03, 2022 at 04:26:14PM +0000, Eric Biggers wrote:
> > In other words, STATX_DIOALIGN is unusable from the start because we
> > don't know whether the information it returns is actually correct? :-/
> 
> That's a silly point of view.  STATX_DIOALIGN has only been in a released kernel
> for a few weeks (v6.0), the bug is only in one edge case, and it will get fixed
> quickly and backported to v6.0.y where users of 6.0 will get it.

Actually, scratch that.  STATX_DIOALIGN is in 6.1, not 6.0.  So it hasn't even
been released yet.  Upstream is currently on v6.1-rc3.

So thank you for reporting (or for not reporting?) this.  We'll make sure it
gets fixed before release :-)

- Eric


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned
  2022-11-03 16:54           ` Eric Biggers
@ 2022-11-03 17:54             ` Stefan Hajnoczi
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Hajnoczi @ 2022-11-03 17:54 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Kevin Wolf, qemu-devel, hreitz, qemu-block, nsoffer

[-- Attachment #1: Type: text/plain, Size: 825 bytes --]

On Thu, Nov 03, 2022 at 04:54:09PM +0000, Eric Biggers wrote:
> On Thu, Nov 03, 2022 at 04:26:14PM +0000, Eric Biggers wrote:
> > > In other words, STATX_DIOALIGN is unusable from the start because we
> > > don't know whether the information it returns is actually correct? :-/
> > 
> > That's a silly point of view.  STATX_DIOALIGN has only been in a released kernel
> > for a few weeks (v6.0), the bug is only in one edge case, and it will get fixed
> > quickly and backported to v6.0.y where users of 6.0 will get it.
> 
> Actually, scratch that.  STATX_DIOALIGN is in 6.1, not 6.0.  So it hasn't even
> been released yet.  Upstream is currently on v6.1-rc3.
> 
> So thank you for reporting (or for not reporting?) this.  We'll make sure it
> gets fixed before release :-)

That's great. Thanks!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-11-03 17:55 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-01 19:00 [PATCH 0/2] file-posix: alignment probing improvements Stefan Hajnoczi
2022-11-01 19:00 ` [PATCH 1/2] file-posix: fix Linux alignment probing when EIO is returned Stefan Hajnoczi
2022-11-02  2:27   ` Eric Biggers
2022-11-02  2:49     ` Eric Biggers
2022-11-02 18:50       ` Stefan Hajnoczi
2022-11-03  9:52       ` Kevin Wolf
2022-11-03 13:57         ` Stefan Hajnoczi
2022-11-03 16:26         ` Eric Biggers
2022-11-03 16:54           ` Eric Biggers
2022-11-03 17:54             ` Stefan Hajnoczi
2022-11-01 19:00 ` [PATCH 2/2] file-posix: add statx(STATX_DIOALIGN) support Stefan Hajnoczi
2022-11-02  3:32   ` Eric Biggers
2022-11-02 18:53     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.