* [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP
@ 2014-11-12 19:27 Markus Armbruster
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-12 19:27 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha, mreitz

See PATCH 2/2 for rationale.

Would you like this included in 2.2?

Markus Armbruster (2):
  raw-posix: Fix comment for raw_co_get_block_status()
  raw-posix: SEEK_HOLE suffices, get rid of FIEMAP

 block/raw-posix.c | 132 ++++++++++++++++++++----------------------------------
 1 file changed, 48 insertions(+), 84 deletions(-)

-- 
1.9.3


* [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status()
  2014-11-12 19:27 [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP Markus Armbruster
@ 2014-11-12 19:27 ` Markus Armbruster
  2014-11-12 23:18   ` Eric Blake
                     ` (2 more replies)
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
  2014-11-12 22:14 ` [Qemu-devel] [PATCH 0/2] raw-posix: Get " Paolo Bonzini
  2 siblings, 3 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-12 19:27 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha, mreitz

Missed in commit 705be72.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index e100ae2..706d3c0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
 }
 
 /*
- * Returns true iff the specified sector is present in the disk image. Drivers
- * not implementing the functionality are assumed to not support backing files,
- * hence all their sectors are reported as allocated.
+ * Returns the allocation status of the specified sectors.
  *
  * If 'sector_num' is beyond the end of the disk image the return value is 0
  * and 'pnum' is set to 0.
-- 
1.9.3


* [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 19:27 [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP Markus Armbruster
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
@ 2014-11-12 19:27 ` Markus Armbruster
  2014-11-12 23:25   ` Eric Blake
                     ` (2 more replies)
  2014-11-12 22:14 ` [Qemu-devel] [PATCH 0/2] raw-posix: Get " Paolo Bonzini
  2 siblings, 3 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-12 19:27 UTC
  To: qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha, mreitz

Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
follows:

1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl

2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()

3. Else pretend there are no holes

Later on, raw_co_is_allocated() was generalized to
raw_co_get_block_status().

Commit 4f11aa8 (May 2014) changed it to try the three methods in order
until success, because "there may be implementations which support
[SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
versa."

Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
speed hit, the next commit 38c4d0a put SEEK_HOLE/SEEK_DATA first.

As you see, the obvious use of FIEMAP is wrong, and the correct use is
slow.  I guess this puts it somewhere between -7 "The obvious use is
wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
to Misuse scale[*].

"Fortunately", the FIEMAP code is used only when

* SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is

  Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
  was introduced for ext4 and btrfs) and 2012.

* SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails

  Unlikely.

Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
bugs.  Worse, bugs hiding there can theoretically bite even on a host
that has SEEK_HOLE/SEEK_DATA.

I don't want to worry about this crap, not even theoretically.  Get
rid of it, then clean up the mess, including spotty error checking.

[*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html

Signed-off-by: Markus Armbruster <armbru@redhat.com>
---
 block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
 1 file changed, 47 insertions(+), 81 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 706d3c0..d16764c 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -60,9 +60,6 @@
 #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
 #endif
 #endif
-#ifdef CONFIG_FIEMAP
-#include <linux/fiemap.h>
-#endif
 #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
 #include <linux/falloc.h>
 #endif
@@ -1481,77 +1478,56 @@ out:
     return result;
 }
 
-static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
-                      off_t *hole, int nb_sectors)
+/*
+ * Find allocation range in @bs around offset @start.
+ * If @start is in a hole, store @start in @hole and the end of the
+ * hole in @data.
+ * If @start is in a data, store @start to @data, and the end of the
+ * data to @hole.
+ * If we can't find out, pretend there are no holes.
+ */
+static void find_allocation(BlockDriverState *bs, off_t start,
+                            off_t *data, off_t *hole)
 {
-#ifdef CONFIG_FIEMAP
+#if defined(SEEK_DATA) && defined(SEEK_HOLE)
     BDRVRawState *s = bs->opaque;
-    int ret = 0;
-    struct {
-        struct fiemap fm;
-        struct fiemap_extent fe;
-    } f;
+    off_t offs;
 
-    if (s->skip_fiemap) {
-        return -ENOTSUP;
+    offs = lseek(s->fd, start, SEEK_HOLE);
+    if (offs < 0) {
+        goto dunno;
     }
+    assert(offs >= start);
 
-    f.fm.fm_start = start;
-    f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
-    f.fm.fm_flags = FIEMAP_FLAG_SYNC;
-    f.fm.fm_extent_count = 1;
-    f.fm.fm_reserved = 0;
-    if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
-        s->skip_fiemap = true;
-        return -errno;
-    }
-
-    if (f.fm.fm_mapped_extents == 0) {
-        /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
-         * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
-         */
-        off_t length = lseek(s->fd, 0, SEEK_END);
-        *hole = f.fm.fm_start;
-        *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
-    } else {
-        *data = f.fe.fe_logical;
-        *hole = f.fe.fe_logical + f.fe.fe_length;
-        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
-            ret |= BDRV_BLOCK_ZERO;
-        }
-    }
-
-    return ret;
-#else
-    return -ENOTSUP;
-#endif
-}
-
-static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
-                         off_t *hole)
-{
-#if defined SEEK_HOLE && defined SEEK_DATA
-    BDRVRawState *s = bs->opaque;
-
-    *hole = lseek(s->fd, start, SEEK_HOLE);
-    if (*hole == -1) {
-        return -errno;
-    }
-
-    if (*hole > start) {
+    if (offs > start) {
+        /* in data, next hole at offs */
         *data = start;
-    } else {
-        /* On a hole.  We need another syscall to find its end.  */
-        *data = lseek(s->fd, start, SEEK_DATA);
-        if (*data == -1) {
-            *data = lseek(s->fd, 0, SEEK_END);
-        }
+        *hole = offs;
+        return;
     }
 
-    return 0;
-#else
-    return -ENOTSUP;
+    /* in hole, end not yet known */
+    offs = lseek(s->fd, start, SEEK_DATA);
+    if (offs < 0) {
+        /* no idea where the hole ends, give up (unlikely to happen) */
+        goto dunno;
+    }
+    assert(offs >= start);
+    *hole = start;
+    *data = offs;
+    return;
+
+dunno:
 #endif
+    /* assume all data */
+    offs = lseek(s->fd, 0, SEEK_END);
+    if (offs < 0) {
+        /* now that's *really* unexpected */
+        offs = (off_t)1 << (sizeof(off_t) * 8 - 1);
+        offs += offs - 1;
+    }
+    *data = start;
+    *hole = offs;
 }
 
 /*
@@ -1591,28 +1567,18 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
         nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
     }
 
-    ret = try_seek_hole(bs, start, &data, &hole);
-    if (ret < 0) {
-        ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
-        if (ret < 0) {
-            /* Assume everything is allocated. */
-            data = 0;
-            hole = start + nb_sectors * BDRV_SECTOR_SIZE;
-            ret = 0;
-        }
-    }
-
-    assert(ret >= 0);
-
-    if (data <= start) {
+    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+    find_allocation(bs, start, &data, &hole);
+    if (data == start) {
         /* On a data extent, compute sectors to the end of the extent.  */
         *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
     } else {
         /* On a hole, compute sectors to the beginning of the next extent.  */
+        assert(hole == start);
         *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
-        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
+        ret |= BDRV_BLOCK_ZERO;
     }
+    return ret;
 }
 
 static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
-- 
1.9.3
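
The lseek() probing that find_allocation() performs in the patch above
can be tried outside QEMU.  What follows is a minimal sketch, not the
patch's code: probe_extent() and make_one_byte_file() are hypothetical
names, the function takes a plain file descriptor instead of a
BlockDriverState, and it reports failure instead of falling back to
"assume all data".

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical standalone cousin of the patch's find_allocation().
 * Returns 0 and fills *data and *hole, or -1 if the extent around
 * @start cannot be determined.
 */
static int probe_extent(int fd, off_t start, off_t *data, off_t *hole)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
    off_t offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -1;
    }
    if (offs > start) {
        /* @start is in data, the next hole begins at offs */
        *data = start;
        *hole = offs;
        return 0;
    }

    /* @start is in a hole, a second syscall finds its end */
    offs = lseek(fd, start, SEEK_DATA);
    if (offs < 0) {
        return -1;
    }
    *hole = start;
    *data = offs;
    return 0;
#else
    (void)fd; (void)start; (void)data; (void)hole;
    return -1;
#endif
}

/* Hypothetical helper: unlinked temporary file holding one data byte. */
static int make_one_byte_file(void)
{
    char path[] = "/tmp/probe-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Probing offset 0 of the one-byte file should report data starting at 0
and a hole no earlier than offset 1, since lseek(2) treats the end of
the file as an implicit hole.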


* Re: [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP
  2014-11-12 19:27 [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP Markus Armbruster
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-12 22:14 ` Paolo Bonzini
  2014-11-13  8:53   ` Markus Armbruster
  2 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2014-11-12 22:14 UTC
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, tony, stefanha, mreitz



On 12/11/2014 20:27, Markus Armbruster wrote:
> See PATCH 2/2 for rationale.
> 
> Would you like this included in 2.2?
> 
> Markus Armbruster (2):
>   raw-posix: Fix comment for raw_co_get_block_status()
>   raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
> 
>  block/raw-posix.c | 132 ++++++++++++++++++++----------------------------------
>  1 file changed, 48 insertions(+), 84 deletions(-)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

though I'm not sure (and for once _I_ can say this to you :)) why you
did a bigger change than just removing the call to try_fiemap and the
definition of the function.

Paolo


* Re: [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status()
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
@ 2014-11-12 23:18   ` Eric Blake
  2014-11-13  1:46   ` Fam Zheng
  2014-11-13  8:39   ` Max Reitz
  2 siblings, 0 replies; 20+ messages in thread
From: Eric Blake @ 2014-11-12 23:18 UTC
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha, mreitz


On 11/12/2014 01:27 PM, Markus Armbruster wrote:
> Missed in commit 705be72.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index e100ae2..706d3c0 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>  }
>  
>  /*
> - * Returns true iff the specified sector is present in the disk image. Drivers
> - * not implementing the functionality are assumed to not support backing files,
> - * hence all their sectors are reported as allocated.
> + * Returns the allocation status of the specified sectors.
>   *
>   * If 'sector_num' is beyond the end of the disk image the return value is 0
>   * and 'pnum' is set to 0.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
@ 2014-11-12 23:25   ` Eric Blake
  2014-11-13  8:53     ` Markus Armbruster
  2014-11-13 11:40     ` Kevin Wolf
  2014-11-13  2:21   ` Fam Zheng
  2014-11-13  8:39   ` Max Reitz
  2 siblings, 2 replies; 20+ messages in thread
From: Eric Blake @ 2014-11-12 23:25 UTC
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha, mreitz


On 11/12/2014 01:27 PM, Markus Armbruster wrote:
> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
> follows:
> 

> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
> bugs.  Worse, bugs hiding there can theoretically bite even on a host
> that has SEEK_HOLE/SEEK_DATA.
> 
> I don't want to worry about this crap, not even theoretically.  Get
> rid of it, then clean up the mess, including spotty error checking.

Sounds reasonable to me.  It's rather a big patch (both nuking a bad
interface and rewriting the use of the good interface) that might have
been better as two commits, but I can live with it.

> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
>  1 file changed, 47 insertions(+), 81 deletions(-)
> 

> +/*
> + * Find allocation range in @bs around offset @start.
> + * If @start is in a hole, store @start in @hole and the end of the
> + * hole in @data.
> + * If @start is in a data, store @start to @data, and the end of the
> + * data to @hole.
> + * If we can't find out, pretend there are no holes.
> + */
> +static void find_allocation(BlockDriverState *bs, off_t start,
> +                            off_t *data, off_t *hole)

Sounds like a good contract interface.

> +    /* in hole, end not yet known */
> +    offs = lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        /* no idea where the hole ends, give up (unlikely to happen) */
> +        goto dunno;
> +    }
> +    assert(offs >= start);
> +    *hole = start;
> +    *data = offs;

This assertion feels like an off-by-one.  The same offset cannot be both
a hole and data (except in some racy situation where some other process
is writing data to that offset in between our two lseek calls, but
that's already in no-man's land because no one else should be writing
the file while qemu has it open).  Is it worth using 'assert(offs >
start)' instead?
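
To make the stricter check concrete, here is a small sketch.  It is not
part of the patch: hole_length_at() and make_sparse_file() are
hypothetical helpers, and the assert() inside encodes the
'offs > start' variant suggested above, which assumes nobody else
writes to the file between the two lseek() calls.

```c
#define _GNU_SOURCE             /* glibc: expose SEEK_DATA and SEEK_HOLE */
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: length of the hole beginning at @start.
 * Returns 0 if @start is in data, -1 if the answer is unknown.
 */
static off_t hole_length_at(int fd, off_t start)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
    off_t offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -1;
    }
    if (offs > start) {
        return 0;               /* @start is in data */
    }

    offs = lseek(fd, start, SEEK_DATA);
    if (offs < 0) {
        /* per lseek(2), e.g. ENXIO when the hole runs to end of file */
        return -1;
    }
    /* SEEK_HOLE just said @start is a hole, so data must come later */
    assert(offs > start);
    return offs - start;
#else
    (void)fd; (void)start;
    return -1;
#endif
}

/* Hypothetical helper: temp file with one data byte at @data_at. */
static int make_sparse_file(off_t data_at)
{
    char path[] = "/tmp/sparse-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (pwrite(fd, "x", 1, data_at) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

On a filesystem that tracks holes, probing offset 0 of a file whose only
byte sits at 1 MiB should yield a positive length; on one that does not,
the whole file looks like data and the result is 0.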

> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +    find_allocation(bs, start, &data, &hole);
> +    if (data == start) {
>          /* On a data extent, compute sectors to the end of the extent.  */
>          *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>      } else {
>          /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
>          *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
> +        ret |= BDRV_BLOCK_ZERO;
>      }
> +    return ret;

The old code omits BDRV_BLOCK_DATA on a hole.  Why are you adding it
here, and why are you not mentioning it in the commit message?
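
The difference can be spelled out with a small sketch.  The flag values
below are illustrative assumptions rather than values copied from
QEMU's headers, and hole_status_old() / hole_status_new() are
hypothetical names that only mirror the two return expressions for the
hole branch in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative flag values; check QEMU's headers for the real ones. */
#define BDRV_BLOCK_DATA         0x01
#define BDRV_BLOCK_ZERO         0x02
#define BDRV_BLOCK_OFFSET_VALID 0x04

/* Hole branch before the patch: ret was 0 when try_seek_hole() succeeded. */
static int64_t hole_status_old(int64_t start)
{
    int64_t ret = 0;

    return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
}

/* Hole branch after the patch: ret is preset with BDRV_BLOCK_DATA. */
static int64_t hole_status_new(int64_t start)
{
    int64_t ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;

    ret |= BDRV_BLOCK_ZERO;
    return ret;
}
```

For a sector-aligned start, the two results differ only in the
BDRV_BLOCK_DATA bit, which is the change being asked about.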

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status()
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
  2014-11-12 23:18   ` Eric Blake
@ 2014-11-13  1:46   ` Fam Zheng
  2014-11-13  8:39   ` Max Reitz
  2 siblings, 0 replies; 20+ messages in thread
From: Fam Zheng @ 2014-11-13  1:46 UTC
  To: Markus Armbruster; +Cc: kwolf, tony, qemu-devel, mreitz, stefanha, pbonzini

On Wed, 11/12 20:27, Markus Armbruster wrote:
> Missed in commit 705be72.
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index e100ae2..706d3c0 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>  }
>  
>  /*
> - * Returns true iff the specified sector is present in the disk image. Drivers
> - * not implementing the functionality are assumed to not support backing files,
> - * hence all their sectors are reported as allocated.
> + * Returns the allocation status of the specified sectors.
>   *
>   * If 'sector_num' is beyond the end of the disk image the return value is 0
>   * and 'pnum' is set to 0.
> -- 
> 1.9.3
> 
> 

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
  2014-11-12 23:25   ` Eric Blake
@ 2014-11-13  2:21   ` Fam Zheng
  2014-11-13  8:26     ` Markus Armbruster
  2014-11-13  8:39   ` Max Reitz
  2 siblings, 1 reply; 20+ messages in thread
From: Fam Zheng @ 2014-11-13  2:21 UTC
  To: Markus Armbruster; +Cc: kwolf, tony, qemu-devel, mreitz, stefanha, pbonzini

On Wed, 11/12 20:27, Markus Armbruster wrote:
> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
> follows:
> 
> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
> 
> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
> 
> 3. Else pretend there are no holes
> 
> Later on, raw_co_is_allocated() was generalized to
> raw_co_get_block_status().
> 
> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
> until success, because "there may be implementations which support
> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
> versa."
> 
> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
> speed hit, the next commit 38c4d0a put SEEK_HOLE/SEEK_DATA first.

s/38c4d0a/7c159037/

> 
> As you see, the obvious use of FIEMAP is wrong, and the correct use is
> slow.  I guess this puts it somewhere between -7 "The obvious use is
> wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
> to Misuse scale[*].

Nice reading :)

> 
> "Fortunately", the FIEMAP code is used only when
> 
> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
> 
>   Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>   was introduced for ext4 and btrfs) and 2012.
> 
> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
> 
>   Unlikely.
> 
> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
> bugs.  Worse, bugs hiding there can theoretically bite even on a host
> that has SEEK_HOLE/SEEK_DATA.
> 
> I don't want to worry about this crap, not even theoretically.  Get
> rid of it, then clean up the mess, including spotty error checking.
> 
> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
> 
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>  block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
>  1 file changed, 47 insertions(+), 81 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 706d3c0..d16764c 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -60,9 +60,6 @@
>  #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
>  #endif
>  #endif
> -#ifdef CONFIG_FIEMAP
> -#include <linux/fiemap.h>
> -#endif
>  #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>  #include <linux/falloc.h>
>  #endif
> @@ -1481,77 +1478,56 @@ out:
>      return result;
>  }
>  
> -static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
> -                      off_t *hole, int nb_sectors)
> +/*
> + * Find allocation range in @bs around offset @start.
> + * If @start is in a hole, store @start in @hole and the end of the
> + * hole in @data.
> + * If @start is in a data, store @start to @data, and the end of the
> + * data to @hole.
> + * If we can't find out, pretend there are no holes.
> + */
> +static void find_allocation(BlockDriverState *bs, off_t start,
> +                            off_t *data, off_t *hole)
>  {
> -#ifdef CONFIG_FIEMAP
> +#if defined(SEEK_DATA) && defined(SEEK_HOLE)
>      BDRVRawState *s = bs->opaque;
> -    int ret = 0;
> -    struct {
> -        struct fiemap fm;
> -        struct fiemap_extent fe;
> -    } f;
> +    off_t offs;
>  
> -    if (s->skip_fiemap) {
> -        return -ENOTSUP;
> +    offs = lseek(s->fd, start, SEEK_HOLE);
> +    if (offs < 0) {
> +        goto dunno;
>      }
> +    assert(offs >= start);
>  
> -    f.fm.fm_start = start;
> -    f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
> -    f.fm.fm_flags = FIEMAP_FLAG_SYNC;
> -    f.fm.fm_extent_count = 1;
> -    f.fm.fm_reserved = 0;
> -    if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
> -        s->skip_fiemap = true;
> -        return -errno;
> -    }
> -
> -    if (f.fm.fm_mapped_extents == 0) {
> -        /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
> -         * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
> -         */
> -        off_t length = lseek(s->fd, 0, SEEK_END);
> -        *hole = f.fm.fm_start;
> -        *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
> -    } else {
> -        *data = f.fe.fe_logical;
> -        *hole = f.fe.fe_logical + f.fe.fe_length;
> -        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
> -            ret |= BDRV_BLOCK_ZERO;
> -        }
> -    }
> -
> -    return ret;
> -#else
> -    return -ENOTSUP;
> -#endif
> -}
> -
> -static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
> -                         off_t *hole)
> -{
> -#if defined SEEK_HOLE && defined SEEK_DATA
> -    BDRVRawState *s = bs->opaque;
> -
> -    *hole = lseek(s->fd, start, SEEK_HOLE);
> -    if (*hole == -1) {
> -        return -errno;
> -    }
> -
> -    if (*hole > start) {
> +    if (offs > start) {
> +        /* in data, next hole at offs */
>          *data = start;
> -    } else {
> -        /* On a hole.  We need another syscall to find its end.  */
> -        *data = lseek(s->fd, start, SEEK_DATA);
> -        if (*data == -1) {
> -            *data = lseek(s->fd, 0, SEEK_END);
> -        }
> +        *hole = offs;
> +        return;
>      }
>  
> -    return 0;
> -#else
> -    return -ENOTSUP;
> +    /* in hole, end not yet known */
> +    offs = lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        /* no idea where the hole ends, give up (unlikely to happen) */
> +        goto dunno;
> +    }
> +    assert(offs >= start);
> +    *hole = start;
> +    *data = offs;
> +    return;
> +
> +dunno:
>  #endif
> +    /* assume all data */
> +    offs = lseek(s->fd, 0, SEEK_END);
> +    if (offs < 0) {
> +        /* now that's *really* unexpected */
> +        offs = (off_t)1 << (sizeof(off_t) * 8 - 1);
> +        offs += offs - 1;
> +    }
> +    *data = start;
> +    *hole = offs;
>  }
>  
>  /*
> @@ -1591,28 +1567,18 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>          nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>      }
>  
> -    ret = try_seek_hole(bs, start, &data, &hole);
> -    if (ret < 0) {
> -        ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
> -        if (ret < 0) {
> -            /* Assume everything is allocated. */
> -            data = 0;
> -            hole = start + nb_sectors * BDRV_SECTOR_SIZE;
> -            ret = 0;
> -        }
> -    }
> -
> -    assert(ret >= 0);
> -
> -    if (data <= start) {
> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +    find_allocation(bs, start, &data, &hole);
> +    if (data == start) {
>          /* On a data extent, compute sectors to the end of the extent.  */
>          *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>      } else {
>          /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
>          *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
> +        ret |= BDRV_BLOCK_ZERO;
>      }
> +    return ret;
>  }
>  
>  static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
> -- 
> 1.9.3
> 
> 

Other than the wrong commit id in message,

Reviewed-by: Fam Zheng <famz@redhat.com>


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13  2:21   ` Fam Zheng
@ 2014-11-13  8:26     ` Markus Armbruster
  0 siblings, 0 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-13  8:26 UTC
  To: Fam Zheng; +Cc: kwolf, qemu-devel, tony, mreitz, stefanha, pbonzini

Fam Zheng <famz@redhat.com> writes:

> On Wed, 11/12 20:27, Markus Armbruster wrote:
>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>> follows:
>> 
>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>> 
>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>> 
>> 3. Else pretend there are no holes
>> 
>> Later on, raw_co_is_allocated() was generalized to
>> raw_co_get_block_status().
>> 
>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>> until success, because "there may be implementations which support
>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>> versa."
>> 
>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
>> speed hit, the next commit 38c4d0a put SEEK_HOLE/SEEK_DATA first.
>
> s/38c4d0a/7c159037/

Fixing...

>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>> slow.  I guess this puts it somewhere between -7 "The obvious use is
>> wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
>> to Misuse scale[*].
>
> Nice reading :)

Adapted from a comment Paolo made in a discussion preceding this patch
:)

[...]
> Other than the wrong commit id in message,
>
> Reviewed-by: Fam Zheng <famz@redhat.com>

Thanks!


* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP Markus Armbruster
  2014-11-12 23:25   ` Eric Blake
  2014-11-13  2:21   ` Fam Zheng
@ 2014-11-13  8:39   ` Max Reitz
  2014-11-13  9:25     ` Markus Armbruster
  2 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2014-11-13  8:39 UTC
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha

On 2014-11-12 at 20:27, Markus Armbruster wrote:
> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
> follows:
>
> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>
> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>
> 3. Else pretend there are no holes
>
> Later on, raw_co_is_allocated() was generalized to
> raw_co_get_block_status().
>
> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
> until success, because "there may be implementations which support
> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
> versa."
>
> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
> speed hit, the next commit 38c4d0a put SEEK_HOLE/SEEK_DATA first.
>
> As you see, the obvious use of FIEMAP is wrong, and the correct use is
> slow.  I guess this puts it somewhere between -7 "The obvious use is
> wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
> to Misuse scale[*].
>
> "Fortunately", the FIEMAP code is used only when
>
> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>
>    Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>    was introduced for ext4 and btrfs) and 2012.
>
> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>
>    Unlikely.
>
> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
> bugs.  Worse, bugs hiding there can theoretically bite even on a host
> that has SEEK_HOLE/SEEK_DATA.
>
> I don't want to worry about this crap, not even theoretically.  Get
> rid of it, then clean up the mess, including spotty error checking.
>
> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
>   1 file changed, 47 insertions(+), 81 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 706d3c0..d16764c 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -60,9 +60,6 @@
>   #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
>   #endif
>   #endif
> -#ifdef CONFIG_FIEMAP
> -#include <linux/fiemap.h>
> -#endif
>   #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>   #include <linux/falloc.h>
>   #endif
> @@ -1481,77 +1478,56 @@ out:
>       return result;
>   }
>   
> -static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
> -                      off_t *hole, int nb_sectors)
> +/*
> + * Find allocation range in @bs around offset @start.
> + * If @start is in a hole, store @start in @hole and the end of the
> + * hole in @data.
> + * If @start is in a data, store @start to @data, and the end of the
> + * data to @hole.
> + * If we can't find out, pretend there are no holes.
> + */
> +static void find_allocation(BlockDriverState *bs, off_t start,
> +                            off_t *data, off_t *hole)
>   {
> -#ifdef CONFIG_FIEMAP
> +#if defined(SEEK_DATA) && defined(SEEK_HOLE)
>       BDRVRawState *s = bs->opaque;
> -    int ret = 0;
> -    struct {
> -        struct fiemap fm;
> -        struct fiemap_extent fe;
> -    } f;
> +    off_t offs;
>   
> -    if (s->skip_fiemap) {
> -        return -ENOTSUP;
> +    offs = lseek(s->fd, start, SEEK_HOLE);
> +    if (offs < 0) {
> +        goto dunno;
>       }
> +    assert(offs >= start);
>   
> -    f.fm.fm_start = start;
> -    f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
> -    f.fm.fm_flags = FIEMAP_FLAG_SYNC;
> -    f.fm.fm_extent_count = 1;
> -    f.fm.fm_reserved = 0;
> -    if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
> -        s->skip_fiemap = true;
> -        return -errno;
> -    }
> -
> -    if (f.fm.fm_mapped_extents == 0) {
> -        /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
> -         * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
> -         */
> -        off_t length = lseek(s->fd, 0, SEEK_END);
> -        *hole = f.fm.fm_start;
> -        *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
> -    } else {
> -        *data = f.fe.fe_logical;
> -        *hole = f.fe.fe_logical + f.fe.fe_length;
> -        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
> -            ret |= BDRV_BLOCK_ZERO;
> -        }
> -    }
> -
> -    return ret;
> -#else
> -    return -ENOTSUP;
> -#endif
> -}
> -
> -static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
> -                         off_t *hole)
> -{
> -#if defined SEEK_HOLE && defined SEEK_DATA
> -    BDRVRawState *s = bs->opaque;
> -
> -    *hole = lseek(s->fd, start, SEEK_HOLE);
> -    if (*hole == -1) {
> -        return -errno;
> -    }
> -
> -    if (*hole > start) {
> +    if (offs > start) {
> +        /* in data, next hole at offs */
>           *data = start;
> -    } else {
> -        /* On a hole.  We need another syscall to find its end.  */
> -        *data = lseek(s->fd, start, SEEK_DATA);
> -        if (*data == -1) {
> -            *data = lseek(s->fd, 0, SEEK_END);
> -        }
> +        *hole = offs;
> +        return;
>       }
>   
> -    return 0;
> -#else
> -    return -ENOTSUP;
> +    /* in hole, end not yet known */
> +    offs = lseek(s->fd, start, SEEK_DATA);
> +    if (offs < 0) {
> +        /* no idea where the hole ends, give up (unlikely to happen) */
> +        goto dunno;
> +    }
> +    assert(offs >= start);
> +    *hole = start;
> +    *data = offs;
> +    return;
> +
> +dunno:
>   #endif
> +    /* assume all data */
> +    offs = lseek(s->fd, 0, SEEK_END);

Why are you calling lseek() here at all? Just set offs to the maximum 
value and let the MIN() in the caller handle the rest.

> +    if (offs < 0) {
> +        /* now that's *really* unexpected */
> +        offs = (off_t)1 << (sizeof(off_t) * 8 - 1);
> +        offs += offs - 1;
> +    }
> +    *data = start;
> +    *hole = offs;
>   }
>   
>   /*
> @@ -1591,28 +1567,18 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>           nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>       }
>   
> -    ret = try_seek_hole(bs, start, &data, &hole);
> -    if (ret < 0) {
> -        ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
> -        if (ret < 0) {
> -            /* Assume everything is allocated. */
> -            data = 0;
> -            hole = start + nb_sectors * BDRV_SECTOR_SIZE;
> -            ret = 0;
> -        }
> -    }
> -
> -    assert(ret >= 0);
> -
> -    if (data <= start) {
> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +    find_allocation(bs, start, &data, &hole);
> +    if (data == start) {
>           /* On a data extent, compute sectors to the end of the extent.  */
>           *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>       } else {
>           /* On a hole, compute sectors to the beginning of the next extent.  */
> +        assert(hole == start);
>           *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
> -        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
> +        ret |= BDRV_BLOCK_ZERO;

As Eric already said, this changes the behavior (might even break some 
tests, I'm not sure). It seems fine to me, though. Whether DATA should 
be included on holes in the file or not is a question which I don't have 
an answer to, so I'm fine with either; but you may want to mention it in the
commit message.

>       }
> +    return ret;
>   }
>   
>   static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,

Because nothing is strictly* wrong (except the ID in the commit 
message), have another R-b (there seem to be plenty of them today):

Reviewed-by: Max Reitz <mreitz@redhat.com>

*with "not strictly wrong" I'm referring to the DATA+ZERO change.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status()
  2014-11-12 19:27 ` [Qemu-devel] [PATCH 1/2] raw-posix: Fix comment for raw_co_get_block_status() Markus Armbruster
  2014-11-12 23:18   ` Eric Blake
  2014-11-13  1:46   ` Fam Zheng
@ 2014-11-13  8:39   ` Max Reitz
  2 siblings, 0 replies; 20+ messages in thread
From: Max Reitz @ 2014-11-13  8:39 UTC (permalink / raw)
  To: Markus Armbruster, qemu-devel; +Cc: kwolf, pbonzini, tony, stefanha

On 2014-11-12 at 20:27, Markus Armbruster wrote:
> Missed in commit 705be72.
>
> Signed-off-by: Markus Armbruster <armbru@redhat.com>
> ---
>   block/raw-posix.c | 4 +---
>   1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index e100ae2..706d3c0 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1555,9 +1555,7 @@ static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>   }
>   
>   /*
> - * Returns true iff the specified sector is present in the disk image. Drivers
> - * not implementing the functionality are assumed to not support backing files,
> - * hence all their sectors are reported as allocated.
> + * Returns the allocation status of the specified sectors.
>    *
>    * If 'sector_num' is beyond the end of the disk image the return value is 0
>    * and 'pnum' is set to 0.

Reviewed-by: Max Reitz <mreitz@redhat.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 23:25   ` Eric Blake
@ 2014-11-13  8:53     ` Markus Armbruster
  2014-11-13 11:40     ` Kevin Wolf
  1 sibling, 0 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-13  8:53 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, tony, qemu-devel, mreitz, stefanha, pbonzini

Eric Blake <eblake@redhat.com> writes:

> On 11/12/2014 01:27 PM, Markus Armbruster wrote:
>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>> follows:
>> 
>
>> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
>> bugs.  Worse, bugs hiding there can theoretically bite even on a host
>> that has SEEK_HOLE/SEEK_DATA.
>> 
>> I don't want to worry about this crap, not even theoretically.  Get
>> rid of it, then clean up the mess, including spotty error checking.
>
> Sounds reasonable to me.  It's rather a big patch (both nuking a bad
> interface and rewriting the use of the good interface) that might have
> been better as two commits, but I can live with it.
>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>  block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
>>  1 file changed, 47 insertions(+), 81 deletions(-)
>> 
>
>> +/*
>> + * Find allocation range in @bs around offset @start.
>> + * If @start is in a hole, store @start in @hole and the end of the
>> + * hole in @data.
>> + * If @start is in a data, store @start to @data, and the end of the
>> + * data to @hole.
>> + * If we can't find out, pretend there are no holes.
>> + */
>> +static void find_allocation(BlockDriverState *bs, off_t start,
>> +                            off_t *data, off_t *hole)
>
> Sounds like a good contract interface.
>
>> +    /* in hole, end not yet known */
>> +    offs = lseek(s->fd, start, SEEK_DATA);
>> +    if (offs < 0) {
>> +        /* no idea where the hole ends, give up (unlikely to happen) */
>> +        goto dunno;
>> +    }
>> +    assert(offs >= start);
>> +    *hole = start;
>> +    *data = offs;
>
> This assertion feels like an off-by-one.  The same offset cannot be both
> a hole and data (except in some racy situation where some other process
> is writing data to that offset in between our two lseek calls, but
> that's already in no-man's land because no one else should be writing
> the file while qemu has it open).  Is it worth using 'assert(offs >
> start)' instead?

Yes.  Fixing...

>> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>> +    find_allocation(bs, start, &data, &hole);
>> +    if (data == start) {
>>          /* On a data extent, compute sectors to the end of the extent.  */
>>          *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
>> -        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>>      } else {
>>          /* On a hole, compute sectors to the beginning of the next extent.  */
>> +        assert(hole == start);
>>          *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
>> -        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
>> +        ret |= BDRV_BLOCK_ZERO;
>>      }
>> +    return ret;
>
> The old code omits BDRV_BLOCK_DATA on a hole.  Why are you adding it
> here, and why are you not mentioning it in the commit message?

I got confused.

Here's how block.h explains the allocation flags:

 * DATA ZERO OFFSET_VALID
 *  t    t        t       sectors read as zero, bs->file is zero at offset
 *  t    f        t       sectors read as valid from bs->file at offset
 *  f    t        t       sectors preallocated, read as zero, bs->file not
 *                        necessarily zero at offset
 *  f    f        t       sectors preallocated but read from backing_hd,
 *                        bs->file contains garbage at offset

Should a hole in a bdrv_file bs have status DATA | ZERO (first row) or
just ZERO (third row)?

First row:

* "sectors read as zero": certainly true in a hole.

* "bs->file is zero at offset": not sure what that's supposed to mean.
  bs->file is null.

Third row:

* "sectors preallocated": not sure what that's supposed to mean.
  Probably preallocation != off.  If that's what it means, then it's
  false in a hole.

* "read as zero": certainly true in a hole.

* "bs->file not necessarily zero at offset": not sure what that's
  supposed to mean.  bs->file is null.

Now you're probably confused, too.

Anyway, I shouldn't make such a change in a cleanup patch!  v2 will
stick to the old flags.
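
To make "the old flags" concrete, here is a minimal sketch of how the
pre-patch code composes the status word when SEEK_HOLE works.  The
SKETCH_* values and block_status_sketch() are made up for illustration;
the real flags are defined in block.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values for illustration only; the real flags live in block.h. */
#define SKETCH_BLOCK_DATA         0x01
#define SKETCH_BLOCK_ZERO         0x02
#define SKETCH_BLOCK_OFFSET_VALID 0x04

/*
 * Compose a status word the way the pre-patch code does:
 * DATA for a data extent, ZERO for a hole, never both.
 * @start is a byte offset assumed to be a multiple of 512, so it
 * cannot collide with the flag bits.
 */
static int64_t block_status_sketch(bool in_hole, int64_t start)
{
    int64_t ret = SKETCH_BLOCK_OFFSET_VALID | start;

    ret |= in_hole ? SKETCH_BLOCK_ZERO : SKETCH_BLOCK_DATA;
    return ret;
}
```

The v1 patch would additionally set DATA in the hole case, which is the
behavior change Eric pointed out.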

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 0/2] raw-posix: Get rid of FIEMAP
  2014-11-12 22:14 ` [Qemu-devel] [PATCH 0/2] raw-posix: Get " Paolo Bonzini
@ 2014-11-13  8:53   ` Markus Armbruster
  0 siblings, 0 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-13  8:53 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, mreitz, qemu-devel, stefanha, tony

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 12/11/2014 20:27, Markus Armbruster wrote:
>> See PATCH 2/2 for rationale.
>> 
>> Would you like this included in 2.2?
>> 
>> Markus Armbruster (2):
>>   raw-posix: Fix comment for raw_co_get_block_status()
>>   raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
>> 
>>  block/raw-posix.c | 132 ++++++++++++++++++++----------------------------------
>>  1 file changed, 48 insertions(+), 84 deletions(-)
>> 
>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> though I'm not sure (and for once _I_ can say this to you :)) why you
> did a bigger change than just removing the call to try_fiemap and the
> definition of the function.

You and the other reviewers are right: I should split up 2/2 into a
minimal removal of FIEMAP and the cleanup.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13  8:39   ` Max Reitz
@ 2014-11-13  9:25     ` Markus Armbruster
  0 siblings, 0 replies; 20+ messages in thread
From: Markus Armbruster @ 2014-11-13  9:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: kwolf, pbonzini, qemu-devel, stefanha, tony

Max Reitz <mreitz@redhat.com> writes:

> On 2014-11-12 at 20:27, Markus Armbruster wrote:
>> Commit 5500316 (May 2012) implemented raw_co_is_allocated() as
>> follows:
>>
>> 1. If defined(CONFIG_FIEMAP), use the FS_IOC_FIEMAP ioctl
>>
>> 2. Else if defined(SEEK_HOLE) && defined(SEEK_DATA), use lseek()
>>
>> 3. Else pretend there are no holes
>>
>> Later on, raw_co_is_allocated() was generalized to
>> raw_co_get_block_status().
>>
>> Commit 4f11aa8 (May 2014) changed it to try the three methods in order
>> until success, because "there may be implementations which support
>> [SEEK_HOLE/SEEK_DATA] but not [FIEMAP] (e.g., NFSv4.2) as well as vice
>> versa."
>>
>> Unfortunately, we used FIEMAP incorrectly: we lacked FIEMAP_FLAG_SYNC.
>> Commit 38c4d0a (Sep 2014) added it.  Because that's a significant
>> speed hit, the next commit 38c4d0a put SEEK_HOLE/SEEK_DATA first.
>>
>> As you see, the obvious use of FIEMAP is wrong, and the correct use is
>> slow.  I guess this puts it somewhere between -7 "The obvious use is
>> wrong" and -10 "It's impossible to get right" on Rusty Russell's Hard
>> to Misuse scale[*].
>>
>> "Fortunately", the FIEMAP code is used only when
>>
>> * SEEK_HOLE/SEEK_DATA aren't defined, but CONFIG_FIEMAP is
>>
>>    Uncommon.  SEEK_HOLE had no XFS implementation between 2011 (when it
>>    was introduced for ext4 and btrfs) and 2012.
>>
>> * SEEK_HOLE/SEEK_DATA and CONFIG_FIEMAP are defined, but lseek() fails
>>
>>    Unlikely.
>>
>> Thus, the FIEMAP code executes rarely.  Makes it a nice hidey-hole for
>> bugs.  Worse, bugs hiding there can theoretically bite even on a host
>> that has SEEK_HOLE/SEEK_DATA.
>>
>> I don't want to worry about this crap, not even theoretically.  Get
>> rid of it, then clean up the mess, including spotty error checking.
>>
>> [*] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
>>
>> Signed-off-by: Markus Armbruster <armbru@redhat.com>
>> ---
>>   block/raw-posix.c | 128 ++++++++++++++++++++----------------------------------
>>   1 file changed, 47 insertions(+), 81 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 706d3c0..d16764c 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -60,9 +60,6 @@
>>   #define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
>>   #endif
>>   #endif
>> -#ifdef CONFIG_FIEMAP
>> -#include <linux/fiemap.h>
>> -#endif
>>   #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>   #include <linux/falloc.h>
>>   #endif
>> @@ -1481,77 +1478,56 @@ out:
>>       return result;
>>   }
>>   
>> -static int try_fiemap(BlockDriverState *bs, off_t start, off_t *data,
>> -                      off_t *hole, int nb_sectors)
>> +/*
>> + * Find allocation range in @bs around offset @start.
>> + * If @start is in a hole, store @start in @hole and the end of the
>> + * hole in @data.
>> + * If @start is in a data, store @start to @data, and the end of the
>> + * data to @hole.
>> + * If we can't find out, pretend there are no holes.
>> + */
>> +static void find_allocation(BlockDriverState *bs, off_t start,
>> +                            off_t *data, off_t *hole)
>>   {
>> -#ifdef CONFIG_FIEMAP
>> +#if defined(SEEK_DATA) && defined(SEEK_HOLE)
>>       BDRVRawState *s = bs->opaque;
>> -    int ret = 0;
>> -    struct {
>> -        struct fiemap fm;
>> -        struct fiemap_extent fe;
>> -    } f;
>> +    off_t offs;
>>   
>> -    if (s->skip_fiemap) {
>> -        return -ENOTSUP;
>> +    offs = lseek(s->fd, start, SEEK_HOLE);
>> +    if (offs < 0) {
>> +        goto dunno;
>>       }
>> +    assert(offs >= start);
>>   
>> -    f.fm.fm_start = start;
>> -    f.fm.fm_length = (int64_t)nb_sectors * BDRV_SECTOR_SIZE;
>> -    f.fm.fm_flags = FIEMAP_FLAG_SYNC;
>> -    f.fm.fm_extent_count = 1;
>> -    f.fm.fm_reserved = 0;
>> -    if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
>> -        s->skip_fiemap = true;
>> -        return -errno;
>> -    }
>> -
>> -    if (f.fm.fm_mapped_extents == 0) {
>> -        /* No extents found, data is beyond f.fm.fm_start + f.fm.fm_length.
>> -         * f.fm.fm_start + f.fm.fm_length must be clamped to the file size!
>> -         */
>> -        off_t length = lseek(s->fd, 0, SEEK_END);
>> -        *hole = f.fm.fm_start;
>> -        *data = MIN(f.fm.fm_start + f.fm.fm_length, length);
>> -    } else {
>> -        *data = f.fe.fe_logical;
>> -        *hole = f.fe.fe_logical + f.fe.fe_length;
>> -        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
>> -            ret |= BDRV_BLOCK_ZERO;
>> -        }
>> -    }
>> -
>> -    return ret;
>> -#else
>> -    return -ENOTSUP;
>> -#endif
>> -}
>> -
>> -static int try_seek_hole(BlockDriverState *bs, off_t start, off_t *data,
>> -                         off_t *hole)
>> -{
>> -#if defined SEEK_HOLE && defined SEEK_DATA
>> -    BDRVRawState *s = bs->opaque;
>> -
>> -    *hole = lseek(s->fd, start, SEEK_HOLE);
>> -    if (*hole == -1) {
>> -        return -errno;
>> -    }
>> -
>> -    if (*hole > start) {
>> +    if (offs > start) {
>> +        /* in data, next hole at offs */
>>           *data = start;
>> -    } else {
>> -        /* On a hole.  We need another syscall to find its end.  */
>> -        *data = lseek(s->fd, start, SEEK_DATA);
>> -        if (*data == -1) {
>> -            *data = lseek(s->fd, 0, SEEK_END);
>> -        }
>> +        *hole = offs;
>> +        return;
>>       }
>>   
>> -    return 0;
>> -#else
>> -    return -ENOTSUP;
>> +    /* in hole, end not yet known */
>> +    offs = lseek(s->fd, start, SEEK_DATA);
>> +    if (offs < 0) {
>> +        /* no idea where the hole ends, give up (unlikely to happen) */
>> +        goto dunno;
>> +    }
>> +    assert(offs >= start);
>> +    *hole = start;
>> +    *data = offs;
>> +    return;
>> +
>> +dunno:
>>   #endif
>> +    /* assume all data */
>> +    offs = lseek(s->fd, 0, SEEK_END);
>
> Why are you calling lseek() here at all? Just set offs to the maximum
> value and let the MIN() in the caller handle the rest.

You're right.

Furthermore, making up a value for *hole here that the caller will clamp
to nb_sectors feels stupid.  I'll simplify in v2.
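
For illustration, Max's suggestion could look roughly like the sketch
below.  This is not necessarily what v2 will do; assume_all_data(),
sectors_in_extent() and the SKETCH_* macros are hypothetical names, and
int64_t stands in for off_t so that INT64_MAX can serve as "the maximum
value" without any shift arithmetic.

```c
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Fallback when the allocation cannot be determined: pretend @start
 * is in data and the next hole is infinitely far away.  No lseek()
 * to SEEK_END is needed.
 */
static void assume_all_data(int64_t start, int64_t *data, int64_t *hole)
{
    *data = start;
    *hole = INT64_MAX;
}

/*
 * Caller side: the MIN() clamps the extent to the request, so the
 * exact value of @hole does not matter once it is large enough.
 * Assumes 0 <= start <= hole, so the subtraction cannot overflow.
 */
static int64_t sectors_in_extent(int64_t start, int64_t hole,
                                 int64_t nb_sectors)
{
    return SKETCH_MIN(nb_sectors, (hole - start) / SKETCH_SECTOR_SIZE);
}
```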

>> +    if (offs < 0) {
>> +        /* now that's *really* unexpected */
>> +        offs = (off_t)1 << (sizeof(off_t) * 8 - 1);
>> +        offs += offs - 1;
>> +    }
>> +    *data = start;
>> +    *hole = offs;
>>   }
>>   
>>   /*
>> @@ -1591,28 +1567,18 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>>           nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>>       }
>>   
>> -    ret = try_seek_hole(bs, start, &data, &hole);
>> -    if (ret < 0) {
>> -        ret = try_fiemap(bs, start, &data, &hole, nb_sectors);
>> -        if (ret < 0) {
>> -            /* Assume everything is allocated. */
>> -            data = 0;
>> -            hole = start + nb_sectors * BDRV_SECTOR_SIZE;
>> -            ret = 0;
>> -        }
>> -    }
>> -
>> -    assert(ret >= 0);
>> -
>> -    if (data <= start) {
>> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>> +    find_allocation(bs, start, &data, &hole);
>> +    if (data == start) {
>>           /* On a data extent, compute sectors to the end of the extent.  */
>>           *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
>> -        return ret | BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
>>       } else {
>>           /* On a hole, compute sectors to the beginning of the next extent.  */
>> +        assert(hole == start);
>>           *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
>> -        return ret | BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID | start;
>> +        ret |= BDRV_BLOCK_ZERO;
>
> As Eric already said, this changes the behavior (might even break some
> tests, I'm not sure). It seems fine to me, though. Whether DATA should
> be included on holes in the file or not is a question which I don't
> have an answer to, so I'm fine with either; but you may want to mention it
> in the commit message.

See my reply to Eric.

>>       }
>> +    return ret;
>>   }
>>   
>>   static coroutine_fn BlockAIOCB *raw_aio_discard(BlockDriverState *bs,
>
> Because nothing is strictly* wrong (except the ID in the commit
> message), have another R-b (there seem to be plenty of them today):
>
> Reviewed-by: Max Reitz <mreitz@redhat.com>
>
> *with "not strictly wrong" I'm referring to the DATA+ZERO change.

Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-12 23:25   ` Eric Blake
  2014-11-13  8:53     ` Markus Armbruster
@ 2014-11-13 11:40     ` Kevin Wolf
  2014-11-13 11:45       ` Max Reitz
  1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2014-11-13 11:40 UTC (permalink / raw)
  To: Eric Blake
  Cc: tony, qemu-devel, Markus Armbruster, mreitz, stefanha, pbonzini

On 13.11.2014 at 00:25, Eric Blake wrote:
> On 11/12/2014 01:27 PM, Markus Armbruster wrote:
> > +    /* in hole, end not yet known */
> > +    offs = lseek(s->fd, start, SEEK_DATA);
> > +    if (offs < 0) {
> > +        /* no idea where the hole ends, give up (unlikely to happen) */
> > +        goto dunno;
> > +    }
> > +    assert(offs >= start);
> > +    *hole = start;
> > +    *data = offs;
> 
> This assertion feels like an off-by-one.  The same offset cannot be both
> a hole and data (except in some racy situation where some other process
> is writing data to that offset in between our two lseek calls, but
> that's already in no-man's land because no one else should be writing
> the file while qemu has it open).  Is it worth using 'assert(offs >
> start)' instead?

As soon as you say "except", it's wrong to assert this at all. We can't
guarantee that the condition is true and it's not a programming error
in qemu if it's false. Sounds to me as if it should be a normal error
check rather than an assertion.

Also, what happens after EOF? I haven't read the patch yet, maybe it
handles the situation already earlier, but if it doesn't, won't we get
offset == start then?

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13 11:40     ` Kevin Wolf
@ 2014-11-13 11:45       ` Max Reitz
  2014-11-13 12:00         ` Kevin Wolf
  0 siblings, 1 reply; 20+ messages in thread
From: Max Reitz @ 2014-11-13 11:45 UTC (permalink / raw)
  To: Kevin Wolf, Eric Blake
  Cc: pbonzini, stefanha, Markus Armbruster, tony, qemu-devel

On 2014-11-13 at 12:40, Kevin Wolf wrote:
> On 13.11.2014 at 00:25, Eric Blake wrote:
>> On 11/12/2014 01:27 PM, Markus Armbruster wrote:
>>> +    /* in hole, end not yet known */
>>> +    offs = lseek(s->fd, start, SEEK_DATA);
>>> +    if (offs < 0) {
>>> +        /* no idea where the hole ends, give up (unlikely to happen) */
>>> +        goto dunno;
>>> +    }
>>> +    assert(offs >= start);
>>> +    *hole = start;
>>> +    *data = offs;
>> This assertion feels like an off-by-one.  The same offset cannot be both
>> a hole and data (except in some racy situation where some other process
>> is writing data to that offset in between our two lseek calls, but
>> that's already in no-man's land because no one else should be writing
>> the file while qemu has it open).  Is it worth using 'assert(offs >
>> start)' instead?
> As soon as you say "except", it's wrong to assert this at all. We can't
> guarantee that the condition is true and it's not a programming error
> in qemu if it's false. Sounds to me as if it should be a normal error
> check rather than an assertion.
>
> Also, what happens after EOF? I haven't read the patch yet, maybe it
> handles the situation already earlier, but if it doesn't, won't we get
> offset == start then?

raw_co_get_block_status() already bails out if start is at or beyond EOF.

Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13 11:45       ` Max Reitz
@ 2014-11-13 12:00         ` Kevin Wolf
  2014-11-13 12:05           ` Max Reitz
  2014-11-13 12:38           ` Markus Armbruster
  0 siblings, 2 replies; 20+ messages in thread
From: Kevin Wolf @ 2014-11-13 12:00 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Markus Armbruster, tony, stefanha, pbonzini

On 13.11.2014 at 12:45, Max Reitz wrote:
> On 2014-11-13 at 12:40, Kevin Wolf wrote:
> >On 13.11.2014 at 00:25, Eric Blake wrote:
> >>On 11/12/2014 01:27 PM, Markus Armbruster wrote:
> >>>+    /* in hole, end not yet known */
> >>>+    offs = lseek(s->fd, start, SEEK_DATA);
> >>>+    if (offs < 0) {
> >>>+        /* no idea where the hole ends, give up (unlikely to happen) */
> >>>+        goto dunno;
> >>>+    }
> >>>+    assert(offs >= start);
> >>>+    *hole = start;
> >>>+    *data = offs;
> >>This assertion feels like an off-by-one.  The same offset cannot be both
> >>a hole and data (except in some racy situation where some other process
> >>is writing data to that offset in between our two lseek calls, but
> >>that's already in no-man's land because no one else should be writing
> >>the file while qemu has it open).  Is it worth using 'assert(offs >
> >>start)' instead?
> >As soon as you say "except", it's wrong to assert this at all. We can't
> >guarantee that the condition is true and it's not a programming error
> >in qemu if it's false. Sounds to me as if it should be a normal error
> >check rather than an assertion.
> >
> >Also, what happens after EOF? I haven't read the patch yet, maybe it
> >handles the situation already earlier, but if it doesn't, won't we get
> >offset == start then?
> 
> raw_co_get_block_status() already bails out if start is at or beyond EOF.

Okay, so that's basically the same "except" as above.

Except that the window for the race is much larger because the
raw_co_get_block_status() check uses the cached value, so any file size
change in the background after qemu has opened the image would trigger
an assertion failure.

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13 12:00         ` Kevin Wolf
@ 2014-11-13 12:05           ` Max Reitz
  2014-11-13 12:38           ` Markus Armbruster
  1 sibling, 0 replies; 20+ messages in thread
From: Max Reitz @ 2014-11-13 12:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Markus Armbruster, tony, stefanha, pbonzini

On 2014-11-13 at 13:00, Kevin Wolf wrote:
> On 13.11.2014 at 12:45, Max Reitz wrote:
>> On 2014-11-13 at 12:40, Kevin Wolf wrote:
>>> On 13.11.2014 at 00:25, Eric Blake wrote:
>>>> On 11/12/2014 01:27 PM, Markus Armbruster wrote:
>>>>> +    /* in hole, end not yet known */
>>>>> +    offs = lseek(s->fd, start, SEEK_DATA);
>>>>> +    if (offs < 0) {
>>>>> +        /* no idea where the hole ends, give up (unlikely to happen) */
>>>>> +        goto dunno;
>>>>> +    }
>>>>> +    assert(offs >= start);
>>>>> +    *hole = start;
>>>>> +    *data = offs;
>>>> This assertion feels like an off-by-one.  The same offset cannot be both
>>>> a hole and data (except in some racy situation where some other process
>>>> is writing data to that offset in between our two lseek calls, but
>>>> that's already in no-man's land because no one else should be writing
>>>> the file while qemu has it open).  Is it worth using 'assert(offs >
>>>> start)' instead?
>>> As soon as you say "except", it's wrong to assert this at all. We can't
>>> guarantee that the condition is true and it's not a programming error
>>> in qemu if it's false. Sounds to me as if it should be a normal error
>>> check rather than an assertion.
>>>
>>> Also, what happens after EOF? I haven't read the patch yet, maybe it
>>> handles the situation already earlier, but if it doesn't, won't we get
>>> offset == start then?
>> raw_co_get_block_status() already bails out if start is at or beyond EOF.
> Okay, so that's basically the same "except" as above.
>
> Except that the window for the race is much larger because the
> raw_co_get_block_status() check uses the cached value, so any file size
> change in the background after qemu has opened the image would trigger
> an assertion failure.

Well, iotest 102 tests exactly that.

Max

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13 12:00         ` Kevin Wolf
  2014-11-13 12:05           ` Max Reitz
@ 2014-11-13 12:38           ` Markus Armbruster
  2014-11-13 13:10             ` Kevin Wolf
  1 sibling, 1 reply; 20+ messages in thread
From: Markus Armbruster @ 2014-11-13 12:38 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, stefanha, qemu-devel, tony, Max Reitz

Kevin Wolf <kwolf@redhat.com> writes:

> On 13.11.2014 at 12:45, Max Reitz wrote:
>> On 2014-11-13 at 12:40, Kevin Wolf wrote:
>> >On 13.11.2014 at 00:25, Eric Blake wrote:
>> >>On 11/12/2014 01:27 PM, Markus Armbruster wrote:
>> >>>+    /* in hole, end not yet known */
>> >>>+    offs = lseek(s->fd, start, SEEK_DATA);
>> >>>+    if (offs < 0) {
>> >>>+        /* no idea where the hole ends, give up (unlikely to happen) */
>> >>>+        goto dunno;
>> >>>+    }
>> >>>+    assert(offs >= start);
>> >>>+    *hole = start;
>> >>>+    *data = offs;
>> >>This assertion feels like an off-by-one.  The same offset cannot be both
>> >>a hole and data (except in some racy situation where some other process
>> >>is writing data to that offset in between our two lseek calls, but
>> >>that's already in no-man's land because no one else should be writing
>> >>the file while qemu has it open).  Is it worth using 'assert(offs >
>> >>start)' instead?
>> >As soon as you say "except", it's wrong to assert this at all. We can't
>> >guarantee that the condition is true and it's not a programming error
>> >in qemu if it's false. Sounds to me as if it should be a normal error
>> >check rather than an assertion.

You're right, it's not necessarily a programming error; it could also be
caused by another process filling in holes behind our back.  We need to
handle offs == start some other way.  We could start over, but I figure
returning -EBUSY is simpler and good enough for this corner case.
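
A minimal sketch of that idea, with the assertion replaced by a normal
error check.  find_allocation_sketch() is a hypothetical stand-in that
takes a plain file descriptor instead of a BlockDriverState; returning
-errno where the patch does "goto dunno" is an assumption of this sketch,
not something decided in this thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Find the allocation range around @start in the file open on @fd.
 * In data: store @start in @data and the next hole in @hole.
 * In a hole: store @start in @hole and the next data in @data.
 * Returns 0 on success, -errno if lseek() fails (for example -ENXIO
 * when @start is at or beyond EOF, or when a hole runs to EOF),
 * -EBUSY if the layout changed between the two lseek() calls, and
 * -ENOTSUP if SEEK_HOLE/SEEK_DATA are not available at build time.
 */
static int find_allocation_sketch(int fd, off_t start,
                                  off_t *data, off_t *hole)
{
#if defined(SEEK_DATA) && defined(SEEK_HOLE)
    off_t offs;

    offs = lseek(fd, start, SEEK_HOLE);
    if (offs < 0) {
        return -errno;
    }
    if (offs > start) {
        /* in data, next hole at offs */
        *data = start;
        *hole = offs;
        return 0;
    }

    /* in hole, end not yet known */
    offs = lseek(fd, start, SEEK_DATA);
    if (offs < 0) {
        return -errno;
    }
    if (offs <= start) {
        /* someone filled in the hole behind our back */
        return -EBUSY;
    }
    *hole = start;
    *data = offs;
    return 0;
#else
    return -ENOTSUP;
#endif
}
```

The caller would then decide whether to treat an error as "assume all
data" or to propagate it.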

>> >Also, what happens after EOF? I haven't read the patch yet, maybe it
>> >handles the situation already earlier, but if it doesn't, won't we get
>> >offset == start then?
>> 
>> raw_co_get_block_status() already bails out if start is at or beyond EOF.
>
> Okay, so that's basically the same "except" as above.
>
> Except that the window for the race is much larger because the
> raw_co_get_block_status() check uses the cached value, so any file size
> change in the background after qemu has opened the image would trigger
> an assertion failure.

Bails out like this:

    total_size = bdrv_getlength(bs);
    if (total_size < 0) {
        return total_size;

Can't actually happen, because bdrv_nb_sectors() can fail only if
!bs->drv (surely false here), or drv->has_variable_length (also false
here).

    } else if (start >= total_size) {
        *pnum = 0;
        return 0;

If something else has lengthened the file, we simply refuse to notice.

    } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
        nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
    }

Likewise.
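
Pulled out into a standalone function, the three branches above amount to
the sketch below.  clamp_request() and the SKETCH_* macros are
hypothetical names, and @total_size stands for the cached length that
bdrv_getlength() returns.

```c
#include <stdint.h>

#define SKETCH_SECTOR_SIZE 512
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Clamp a request against the cached image size.
 * Returns a negative value if @total_size is itself an error code,
 * 0 if @start is at or beyond the cached EOF (*nb_sectors is set
 * to 0, where the real code sets *pnum), and 1 if the query should
 * go ahead, with *nb_sectors trimmed so it does not run past EOF.
 */
static int clamp_request(int64_t total_size, int64_t start, int *nb_sectors)
{
    if (total_size < 0) {
        return (int)total_size;
    }
    if (start >= total_size) {
        *nb_sectors = 0;
        return 0;
    }
    if (start + (int64_t)*nb_sectors * SKETCH_SECTOR_SIZE > total_size) {
        *nb_sectors = (int)SKETCH_DIV_ROUND_UP(total_size - start,
                                               SKETCH_SECTOR_SIZE);
    }
    return 1;
}
```

Since only the cached size is consulted, a file that grew or shrank in
the background is never noticed here.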

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Qemu-devel] [PATCH 2/2] raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
  2014-11-13 12:38           ` Markus Armbruster
@ 2014-11-13 13:10             ` Kevin Wolf
  0 siblings, 0 replies; 20+ messages in thread
From: Kevin Wolf @ 2014-11-13 13:10 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: pbonzini, stefanha, qemu-devel, tony, Max Reitz

Am 13.11.2014 um 13:38 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > On 13.11.2014 at 12:45, Max Reitz wrote:
> >> On 2014-11-13 at 12:40, Kevin Wolf wrote:
> >> >On 13.11.2014 at 00:25, Eric Blake wrote:
> >> >>On 11/12/2014 01:27 PM, Markus Armbruster wrote:
> >> >>>+    /* in hole, end not yet known */
> >> >>>+    offs = lseek(s->fd, start, SEEK_DATA);
> >> >>>+    if (offs < 0) {
> >> >>>+        /* no idea where the hole ends, give up (unlikely to happen) */
> >> >>>+        goto dunno;
> >> >>>+    }
> >> >>>+    assert(offs >= start);
> >> >>>+    *hole = start;
> >> >>>+    *data = offs;
> >> >>This assertion feels like an off-by-one.  The same offset cannot be both
> >> >>a hole and data (except in some racy situation where some other process
> >> >>is writing data to that offset in between our two lseek calls, but
> >> >>that's already in no-man's land because no one else should be writing
> >> >>the file while qemu has it open).  Is it worth using 'assert(offs >
> >> >>start)' instead?
> >> >As soon as you say "except", it's wrong to assert this at all. We can't
> >> >guarantee that the condition is true and it's not a programming error
> >> >in qemu if it's false. Sounds to me as if it should be a normal error
> >> >check rather than an assertion.
> 
> You're right, it's not necessarily a programming error, it could also be
> caused by another process filling in holes behind our back.  We need to
> handle offs == start some other way.  We could start over, but I figure
> returning -EBUSY is simpler and good enough for this corner case.
> 
> >> >Also, what happens after EOF? I haven't read the patch yet, maybe it
> >> >handles the situation already earlier, but if it doesn't, won't we get
> >> >offset == start then?
> >> 
> >> raw_co_get_block_status() already bails out if start is at or beyond EOF.
> >
> > Okay, so that's basically the same "except" as above.
> >
> > Except that the window for the race is much larger because the
> > raw_co_get_block_status() check uses the cached value, so any file size
> > change in the background after qemu has opened the image would trigger
> > an assertion failure.
> 
> Bails out like this:
> 
>     total_size = bdrv_getlength(bs);
>     if (total_size < 0) {
>         return total_size;
> 
> Can't actually happen, because bdrv_nb_sectors() can fail only if
> !bs->drv (surely false here), or drv->has_variable_length (also false
> here).
> 
>     } else if (start >= total_size) {
>         *pnum = 0;
>         return 0;
> 
> If something else has lengthened the file, we simply refuse to notice.
> 
>     } else if (start + nb_sectors * BDRV_SECTOR_SIZE > total_size) {
>         nb_sectors = DIV_ROUND_UP(total_size - start, BDRV_SECTOR_SIZE);
>     }
> 
> Likewise.

If something has shortened the file outside of qemu, total_size still
has the old larger size and we don't restrict nb_sectors to an area
before EOF.

I was however confused about the lseek() behaviour after EOF and assumed
that we would get offs == start and an assertion failure. In fact,
however, we get -ENXIO, which is probably good enough for this case.

So with your code, the problem only exists for external modification
between our two lseek() calls, not for any resize after opening the
image.
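
The lseek() behaviour at EOF can be checked with a small standalone
program along these lines.  This is only a sketch and not part of the
patch: seek_data_at_eof_errno() is a hypothetical name, the temporary
file path is arbitrary, and the SEEK_DATA fallback value assumes Linux.

```c
#define _GNU_SOURCE             /* exposes SEEK_DATA on glibc */
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3             /* Linux value; assumption for this sketch */
#endif

/*
 * Hypothetical demo: write 4096 bytes to a temporary file, then call
 * lseek(SEEK_DATA) with the offset at EOF.
 * Returns the errno from that lseek(), 0 if the seek unexpectedly
 * succeeds, or -1 if setting up the file fails.
 */
static int seek_data_at_eof_errno(void)
{
    char path[] = "/tmp/eof-XXXXXX";
    char buf[4096] = { 0 };
    int fd = mkstemp(path);
    int ret = -1;

    if (fd < 0) {
        return -1;
    }
    unlink(path);
    if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
        /* offset == file size: there is no data at or after it */
        off_t offs = lseek(fd, (off_t)sizeof(buf), SEEK_DATA);
        ret = offs < 0 ? errno : 0;
    }
    close(fd);
    return ret;
}
```

On Linux the function is expected to return ENXIO, i.e. lseek() fails
instead of returning offs == start.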

Kevin

