* [Qemu-devel] qemu-img convert cache mode for source
@ 2014-02-26 10:14 Peter Lieven
  2014-02-26 15:41 ` Stefan Hajnoczi
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-02-26 10:14 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf, Stefan Hajnoczi, qemu-devel

Hi,

I was wondering if it would be a good idea to set the O_DIRECT mode for the source
files of a qemu-img convert process if the source is a host_device?

Currently the backup of a host device is polluting the page cache.

Peter


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-26 10:14 [Qemu-devel] qemu-img convert cache mode for source Peter Lieven
@ 2014-02-26 15:41 ` Stefan Hajnoczi
  2014-02-26 15:54   ` Eric Blake
                     ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-02-26 15:41 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> files of a qemu-img convert process if the source is a host_device?
> 
> Currently the backup of a host device is polluting the page cache.

Points to consider:

1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
   the file.  A fallback is necessary.

2. O_DIRECT has no readahead so performance could actually decrease.
   The question is, how important is readahead versus polluting page
   cache?

3. For raw files it would make sense to tell the kernel that access is
   sequential and data will be used only once.  Then we can get the best
   of both worlds (avoid polluting page cache but still get readahead).
   This is done using posix_fadvise(2).

   The problem is what to do for image formats.  An image file can be
   very fragmented so the readahead might not be a win.  Does this mean
   that for image formats we should tell the kernel access will be
   random?

   Furthermore, maybe it's best to do readahead inside QEMU so that even
   network protocols (nbd, iscsi, etc) can get good performance.  They
   act like O_DIRECT is always on.
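
As a minimal sketch of points 1 and 3 (plain C with a hypothetical
helper name, not QEMU code): try O_DIRECT first, fall back to buffered
I/O where it is rejected, and otherwise declare sequential one-pass
access:

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <errno.h>
#include <fcntl.h>

/* Open a source for a one-pass sequential copy.  Falls back to
 * buffered I/O when O_DIRECT is rejected (e.g. EINVAL on tmpfs),
 * then hints the kernel; the hints are advisory only. */
static int open_source(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL) {
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
            /* len 0 means "to end of file" */
            posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
            posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
        }
    }
    return fd;
}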

It seems reasonable to investigate this stuff more.  Please run
benchmarks so we have justification to merge patches.

Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-26 15:41 ` Stefan Hajnoczi
@ 2014-02-26 15:54   ` Eric Blake
  2014-02-26 16:01   ` Peter Lieven
  2014-02-27  1:10   ` Fam Zheng
  2 siblings, 0 replies; 29+ messages in thread
From: Eric Blake @ 2014-02-26 15:54 UTC (permalink / raw)
  To: Stefan Hajnoczi, Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi


On 02/26/2014 08:41 AM, Stefan Hajnoczi wrote:
> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>> files of a qemu-img convert process if the source is a host_device?
>>
>> Currently the backup of a host device is polluting the page cache.
> 
> Points to consider:
> 
> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>    the file.  A fallback is necessary.
> 
> 2. O_DIRECT has no readahead so performance could actually decrease.
>    The question is, how important is readahead versus polluting page
>    cache?
> 
> 3. For raw files it would make sense to tell the kernel that access is
>    sequential and data will be used only once.  Then we can get the best
>    of both worlds (avoid polluting page cache but still get readahead).
>    This is done using posix_fadvise(2).

Except that posix_fadvise is advisory only (the kernel is free to ignore
it), and currently not stateful enough inside the kernel to be useful
when handing fds between processes.  For several years now, I've asked
if the kernel could provide better guarantees about what posix_fadvise
can actually do, and expose user-space introspection of those guarantees
through procfs and/or fpathconf.

See https://bugzilla.redhat.com/show_bug.cgi?id=634653 for some
backstory on libvirt's dealings with O_DIRECT. I'd really like to ditch
libvirt's use of O_DIRECT in favor of posix_fadvise for avoiding page
cache pollution, but the kernel isn't at a point yet that lets libvirt
do that.  I suppose that if the kernel ever does improve posix_fadvise,
then both libvirt and qemu would benefit from it.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-26 15:41 ` Stefan Hajnoczi
  2014-02-26 15:54   ` Eric Blake
@ 2014-02-26 16:01   ` Peter Lieven
  2014-02-27  8:57     ` Stefan Hajnoczi
  2014-02-27  1:10   ` Fam Zheng
  2 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-02-26 16:01 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 26.02.2014 16:41, Stefan Hajnoczi wrote:
> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>> files of a qemu-img convert process if the source is a host_device?
>>
>> Currently the backup of a host device is polluting the page cache.
> Points to consider:
>
> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>     the file.  A fallback is necessary.
>
> 2. O_DIRECT has no readahead so performance could actually decrease.
>     The question is, how important is readahead versus polluting page
>     cache?
>
> 3. For raw files it would make sense to tell the kernel that access is
>     sequential and data will be used only once.  Then we can get the best
>     of both worlds (avoid polluting page cache but still get readahead).
>     This is done using posix_fadvise(2).
>
>     The problem is what to do for image formats.  An image file can be
>     very fragmented so the readahead might not be a win.  Does this mean
>     that for image formats we should tell the kernel access will be
>     random?
>
>     Furthermore, maybe it's best to do readahead inside QEMU so that even
>     network protocols (nbd, iscsi, etc) can get good performance.  They
>     act like O_DIRECT is always on.
your comments are regarding qemu-img convert, right?
How would you implement this? A new open flag, because
the fadvise would have to go inside the protocol driver.

I would start with host_devices first and see how it performs there.

For qemu-img convert I would issue a FADV_DONTNEED after
a write for the bytes that have been written
(I have tested this with Linux and it seems to work quite well).

Question is, what is the right parameter for reads? Also FADV_DONTNEED?
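
As an illustration only (a rough standalone sketch, not the actual
patch), the read side of what I have in mind looks like this:

#include <fcntl.h>
#include <unistd.h>

/* Sketch: sequential copy loop that drops the page cache right behind
 * each read.  After a write, FADV_DONTNEED only frees pages that have
 * already been written back, which is why the write side is harder. */
static void copy_and_drop(int src, int dst)
{
    char buf[64 * 1024];
    off_t off = 0;
    ssize_t n;

    while ((n = pread(src, buf, sizeof(buf), off)) > 0) {
        if (pwrite(dst, buf, n, off) != n) {
            break;              /* sketch: no real error handling */
        }
        posix_fadvise(src, off, n, POSIX_FADV_DONTNEED);
        off += n;
    }
}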

Peter
>
> It seems reasonable to investigate this stuff more.  Please run
> benchmarks so we have justification to merge patches.
>
> Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-26 15:41 ` Stefan Hajnoczi
  2014-02-26 15:54   ` Eric Blake
  2014-02-26 16:01   ` Peter Lieven
@ 2014-02-27  1:10   ` Fam Zheng
  2014-02-27 11:07     ` Kevin Wolf
  2 siblings, 1 reply; 29+ messages in thread
From: Fam Zheng @ 2014-02-27  1:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Peter Lieven, qemu-devel, Stefan Hajnoczi

On Wed, 02/26 16:41, Stefan Hajnoczi wrote:
> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> > I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> > files of a qemu-img convert process if the source is a host_device?
> > 
> > Currently the backup of a host device is polluting the page cache.

Peter, can you give some more detailed explanation of the issue? An example or
benchmark will help a lot. As I understand it, a one-time scan of a file doesn't
promote its pages to the active list, so it probably won't hurt the really hot cache
at all, and the pages will get replaced very soon.

Considering readahead and the page cache for metadata, I'm not sure that forcing
O_DIRECT is a good idea.

>    The problem is what to do for image formats.  An image file can be
>    very fragmented so the readahead might not be a win.  Does this mean
>    that for image formats we should tell the kernel access will be
>    random?
> 
>    Furthermore, maybe it's best to do readahead inside QEMU so that even
>    network protocols (nbd, iscsi, etc) can get good performance.  They
>    act like O_DIRECT is always on.

Also, the experience of booting a network-backed guest can be greatly improved,
because sometimes the BIOS and bootloader are simple-minded and load a kernel or
initrd by sending thousands of 1-sector requests with iodepth=1, which means
the latency of network-based I/O hurts a lot.

Doing readahead in QEMU makes sense for image formats as well, because we know
where the next data cluster is better than the kernel does. But again, we would be
replicating kernel functionality.
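
For local files, the cheapest form of application-driven readahead is
just a hint to the kernel (a sketch; next_off/next_len are hypothetical,
and a network protocol driver would instead have to read ahead into a
buffer of its own):

#include <fcntl.h>

/* Sketch: the format layer knows the offset of the next data cluster,
 * so it can ask the kernel to start reading it in the background. */
static void prefetch_next_cluster(int fd, off_t next_off, off_t next_len)
{
    posix_fadvise(fd, next_off, next_len, POSIX_FADV_WILLNEED);
}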


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-26 16:01   ` Peter Lieven
@ 2014-02-27  8:57     ` Stefan Hajnoczi
  2014-02-28 14:35       ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-02-27  8:57 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Paolo Bonzini

On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
> >On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> >>I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> >>files of a qemu-img convert process if the source is a host_device?
> >>
> >>Currently the backup of a host device is polluting the page cache.
> >Points to consider:
> >
> >1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
> >    the file.  A fallback is necessary.
> >
> >2. O_DIRECT has no readahead so performance could actually decrease.
> >    The question is, how important is readahead versus polluting page
> >    cache?
> >
> >3. For raw files it would make sense to tell the kernel that access is
> >    sequential and data will be used only once.  Then we can get the best
> >    of both worlds (avoid polluting page cache but still get readahead).
> >    This is done using posix_fadvise(2).
> >
> >    The problem is what to do for image formats.  An image file can be
> >    very fragmented so the readahead might not be a win.  Does this mean
> >    that for image formats we should tell the kernel access will be
> >    random?
> >
> >    Furthermore, maybe it's best to do readahead inside QEMU so that even
> >    network protocols (nbd, iscsi, etc) can get good performance.  They
> >    act like O_DIRECT is always on.
> your comments are regarding qemu-img convert, right?
> How would you implement this? A new open flag, because
> the fadvise would have to go inside the protocol driver.
> 
> I would start with host_devices first and see how it performs there.
> 
> For qemu-img convert I would issue a FADV_DONTNEED after
> a write for the bytes that have been written
> (I have tested this with Linux and it seems to work quite well).
> 
> Question is, what is the right parameter for reads? Also FADV_DONTNEED?

I think so but this should be justified with benchmark results.

Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-27  1:10   ` Fam Zheng
@ 2014-02-27 11:07     ` Kevin Wolf
  2014-02-27 16:12       ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2014-02-27 11:07 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Stefan Hajnoczi, Peter Lieven, qemu-devel, Stefan Hajnoczi,
	Paolo Bonzini

On 27.02.2014 at 02:10, Fam Zheng wrote:
> On Wed, 02/26 16:41, Stefan Hajnoczi wrote:
> > On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> > > I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> > > files of a qemu-img convert process if the source is a host_device?
> > > 
> > > Currently the backup of a host device is polluting the page cache.
> 
> Peter, can you give some more detailed explanation of the issue? An example or
> benchmark will help a lot. As I understand it, a one-time scan of a file doesn't
> promote its pages to the active list, so it probably won't hurt the really hot cache
> at all, and the pages will get replaced very soon.
> 
> Considering readahead and the page cache for metadata, I'm not sure that forcing
> O_DIRECT is a good idea.
> 
> >    The problem is what to do for image formats.  An image file can be
> >    very fragmented so the readahead might not be a win.  Does this mean
> >    that for image formats we should tell the kernel access will be
> >    random?
> > 
> >    Furthermore, maybe it's best to do readahead inside QEMU so that even
> >    network protocols (nbd, iscsi, etc) can get good performance.  They
> >    act like O_DIRECT is always on.
> 
> Also, the experience of booting a network-backed guest can be greatly improved,
> because sometimes the BIOS and bootloader are simple-minded and load a kernel or
> initrd by sending thousands of 1-sector requests with iodepth=1, which means
> the latency of network-based I/O hurts a lot.

I think I mentioned it a while ago, but our IDE emulation plays a role
in this as well. PIO requests are always handled sector by sector, no
matter how big the request was that we got from the BIOS.

Kevin

> Doing readahead in QEMU makes sense for image formats as well, because we know
> where the next data cluster is better than the kernel does. But again, we would be
> replicating kernel functionality.


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-27 11:07     ` Kevin Wolf
@ 2014-02-27 16:12       ` Peter Lieven
  2014-03-03 10:40         ` Kevin Wolf
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-02-27 16:12 UTC (permalink / raw)
  To: Kevin Wolf, Fam Zheng
  Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 27.02.2014 12:07, Kevin Wolf wrote:
> On 27.02.2014 at 02:10, Fam Zheng wrote:
>> On Wed, 02/26 16:41, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>> files of a qemu-img convert process if the source is a host_device?
>>>>
>>>> Currently the backup of a host device is polluting the page cache.
>> Peter, can you give some more detailed explanation of the issue? An example or
>> benchmark will help a lot. As I understand it, a one-time scan of a file doesn't
>> promote its pages to the active list, so it probably won't hurt the really hot cache
>> at all, and the pages will get replaced very soon.
>>
>> Considering readahead and the page cache for metadata, I'm not sure that forcing
>> O_DIRECT is a good idea.
>>
>>>    The problem is what to do for image formats.  An image file can be
>>>    very fragmented so the readahead might not be a win.  Does this mean
>>>    that for image formats we should tell the kernel access will be
>>>    random?
>>>
>>>    Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>    network protocols (nbd, iscsi, etc) can get good performance.  They
>>>    act like O_DIRECT is always on.
>> Also, the experience of booting a network-backed guest can be greatly improved,
>> because sometimes the BIOS and bootloader are simple-minded and load a kernel or
>> initrd by sending thousands of 1-sector requests with iodepth=1, which means
>> the latency of network-based I/O hurts a lot.
> I think I mentioned it a while ago, but our IDE emulation plays a role
> in this as well. PIO requests are always handled sector by sector, no
> matter how big the request was that we got from the BIOS.
Yes, you have pointed that out before. How complicated is it to fix this?

Peter
>
> Kevin
>
>> Doing readahead in QEMU makes sense for image formats as well, because we know
>> where the next data cluster is better than the kernel does. But again, we would be
>> replicating kernel functionality.


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-27  8:57     ` Stefan Hajnoczi
@ 2014-02-28 14:35       ` Peter Lieven
  2014-03-03 10:38         ` Kevin Wolf
  2014-03-03 12:03         ` Stefan Hajnoczi
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Lieven @ 2014-02-28 14:35 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Paolo Bonzini

On 27.02.2014 09:57, Stefan Hajnoczi wrote:
> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>> files of a qemu-img convert process if the source is a host_device?
>>>>
>>>> Currently the backup of a host device is polluting the page cache.
>>> Points to consider:
>>>
>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>     the file.  A fallback is necessary.
>>>
>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>     The question is, how important is readahead versus polluting page
>>>     cache?
>>>
>>> 3. For raw files it would make sense to tell the kernel that access is
>>>     sequential and data will be used only once.  Then we can get the best
>>>     of both worlds (avoid polluting page cache but still get readahead).
>>>     This is done using posix_fadvise(2).
>>>
>>>     The problem is what to do for image formats.  An image file can be
>>>     very fragmented so the readahead might not be a win.  Does this mean
>>>     that for image formats we should tell the kernel access will be
>>>     random?
>>>
>>>     Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>     network protocols (nbd, iscsi, etc) can get good performance.  They
>>>     act like O_DIRECT is always on.
>> your comments are regarding qemu-img convert, right?
>> How would you implement this? A new open flag, because
>> the fadvise would have to go inside the protocol driver.
>>
>> I would start with host_devices first and see how it performs there.
>>
>> For qemu-img convert I would issue a FADV_DONTNEED after
>> a write for the bytes that have been written
>> (I have tested this with Linux and it seems to work quite well).
>>
>> Question is, what is the right parameter for reads? Also FADV_DONTNEED?
> I think so but this should be justified with benchmark results.

I ran some benchmarks and found that a FADV_DONTNEED issued after
a read does not hurt performance. But it avoids the buffers
increasing while I read from a host_device or raw file.
As for writing, it only works if I issue a fdatasync after each write, but
this should be equivalent to O_DIRECT. So I would keep the patch
to support qemu-img convert sources if they are a host_device or file.

Here is a proposal for a patch:

diff --git a/block.c b/block.c
index 2fd5482..2445433 100644
--- a/block.c
+++ b/block.c
@@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
              qemu_aio_wait();
          }
      }
+
+#ifdef POSIX_FADV_DONTNEED
+    if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
+        bs->drv->bdrv_fadvise && !is_write) {
+        bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
+    }
+#endif
+
      return rwco.ret;
  }

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 161ea14..d8d78d8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1397,6 +1397,12 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
      return 0;
  }

+static int raw_fadvise(BlockDriverState *bs, off_t offset, off_t len, int advise)
+{
+    BDRVRawState *s = bs->opaque;
+    return posix_fadvise(s->fd, offset, len, advise);
+}
+
  static QEMUOptionParameter raw_create_options[] = {
      {
          .name = BLOCK_OPT_SIZE,
@@ -1433,6 +1439,7 @@ static BlockDriver bdrv_file = {
      .bdrv_get_info = raw_get_info,
      .bdrv_get_allocated_file_size
                          = raw_get_allocated_file_size,
+    .bdrv_fadvise = raw_fadvise,

      .create_options = raw_create_options,
  };
@@ -1811,6 +1818,7 @@ static BlockDriver bdrv_host_device = {
      .bdrv_get_info = raw_get_info,
      .bdrv_get_allocated_file_size
                          = raw_get_allocated_file_size,
+    .bdrv_fadvise = raw_fadvise,

      /* generic scsi device */
  #ifdef __linux__
diff --git a/block/raw_bsd.c b/block/raw_bsd.c
index 01ea692..f09bc70 100644
--- a/block/raw_bsd.c
+++ b/block/raw_bsd.c
@@ -171,6 +171,15 @@ static int raw_probe(const uint8_t *buf, int buf_size, const char *filename)
      return 1;
  }

+static int raw_fadvise(BlockDriverState *bs, off_t offset, off_t len, int advise)
+{
+    if (bs->file->drv->bdrv_fadvise) {
+        return bs->file->drv->bdrv_fadvise(bs->file, offset, len, advise);
+    }
+    return 0;
+}
+
+
  static BlockDriver bdrv_raw = {
      .format_name          = "raw",
      .bdrv_probe           = &raw_probe,
@@ -195,7 +204,8 @@ static BlockDriver bdrv_raw = {
      .bdrv_ioctl           = &raw_ioctl,
      .bdrv_aio_ioctl       = &raw_aio_ioctl,
      .create_options       = &raw_create_options[0],
-    .bdrv_has_zero_init   = &raw_has_zero_init
+    .bdrv_has_zero_init   = &raw_has_zero_init,
+    .bdrv_fadvise         = &raw_fadvise,
  };

  static void bdrv_raw_init(void)
diff --git a/include/block/block.h b/include/block/block.h
index 780f48b..a4dcc3c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -105,6 +105,9 @@ typedef enum {
  #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
                                        select an appropriate protocol driver,
                                        ignoring the format layer */
+#define BDRV_O_SEQUENTIAL 0x10000  /* open device for sequential read/write */
+
+

  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0bcf1c9..7efad55 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -246,6 +246,8 @@ struct BlockDriver {
       * zeros, 0 otherwise.
       */
      int (*bdrv_has_zero_init)(BlockDriverState *bs);
+
+    int (*bdrv_fadvise)(BlockDriverState *bs, off_t offset, off_t len, int advise);

      QLIST_ENTRY(BlockDriver) list;
  };
diff --git a/qemu-img.c b/qemu-img.c
index 78fc868..2b900d0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1298,7 +1298,8 @@ static int img_convert(int argc, char **argv)

      total_sectors = 0;
      for (bs_i = 0; bs_i < bs_n; bs_i++) {
-        bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BDRV_O_FLAGS, true,
+        bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt,
+                                 BDRV_O_FLAGS | BDRV_O_SEQUENTIAL, true,
                                   quiet);
          if (!bs[bs_i]) {
              error_report("Could not open '%s'", argv[optind + bs_i]);


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-28 14:35       ` Peter Lieven
@ 2014-03-03 10:38         ` Kevin Wolf
  2014-03-03 11:20           ` Peter Lieven
  2014-03-03 12:03         ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: Kevin Wolf @ 2014-03-03 10:38 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 28.02.2014 at 15:35, Peter Lieven wrote:
> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
> >On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
> >>On 26.02.2014 16:41, Stefan Hajnoczi wrote:
> >>>On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> >>>>I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> >>>>files of a qemu-img convert process if the source is a host_device?
> >>>>
> >>>>Currently the backup of a host device is polluting the page cache.
> >>>Points to consider:
> >>>
> >>>1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
> >>>    the file.  A fallback is necessary.
> >>>
> >>>2. O_DIRECT has no readahead so performance could actually decrease.
> >>>    The question is, how important is readahead versus polluting page
> >>>    cache?
> >>>
> >>>3. For raw files it would make sense to tell the kernel that access is
> >>>    sequential and data will be used only once.  Then we can get the best
> >>>    of both worlds (avoid polluting page cache but still get readahead).
> >>>    This is done using posix_fadvise(2).
> >>>
> >>>    The problem is what to do for image formats.  An image file can be
> >>>    very fragmented so the readahead might not be a win.  Does this mean
> >>>    that for image formats we should tell the kernel access will be
> >>>    random?
> >>>
> >>>    Furthermore, maybe it's best to do readahead inside QEMU so that even
> >>>    network protocols (nbd, iscsi, etc) can get good performance.  They
> >>>    act like O_DIRECT is always on.
> >>your comments are regarding qemu-img convert, right?
> >>How would you implement this? A new open flag, because
> >>the fadvise would have to go inside the protocol driver.
> >>
> >>I would start with host_devices first and see how it performs there.
> >>
> >>For qemu-img convert I would issue a FADV_DONTNEED after
> >>a write for the bytes that have been written
> >>(I have tested this with Linux and it seems to work quite well).
> >>
> >>Question is, what is the right parameter for reads? Also FADV_DONTNEED?
> >I think so but this should be justified with benchmark results.
> 
> I ran some benchmarks and found that a FADV_DONTNEED issued after
> a read does not hurt performance. But it avoids the buffers
> increasing while I read from a host_device or raw file.

Okay, sounds reasonable.

> As for writing, it only works if I issue a fdatasync after each write, but
> this should be equivalent to O_DIRECT. So I would keep the patch
> to support qemu-img convert sources if they are a host_device or file.

Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
at all).

> Here is a proposal for a patch:
> 
> diff --git a/block.c b/block.c
> index 2fd5482..2445433 100644
> --- a/block.c
> +++ b/block.c
> @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
>              qemu_aio_wait();
>          }
>      }
> +
> +#ifdef POSIX_FADV_DONTNEED
> +    if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
> +        bs->drv->bdrv_fadvise && !is_write) {
> +        bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
> +    }
> +#endif
> +

This #ifdef should be in the raw-posix driver. Please try to keep the
qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
friends as an implementation detail of block drivers.
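
I.e. something along these lines in raw-posix.c (a sketch only;
raw_drop_cache is a made-up name), so that block.c never sees the
POSIX constant:

/* Sketch: keep the #ifdef inside raw-posix.c; the generic block layer
 * would only invoke a driver hook after a sequential read completes. */
static void raw_drop_cache(BDRVRawState *s, off_t offset, off_t len)
{
#ifdef POSIX_FADV_DONTNEED
    posix_fadvise(s->fd, offset, len, POSIX_FADV_DONTNEED);
#endif
}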

> diff --git a/include/block/block.h b/include/block/block.h
> index 780f48b..a4dcc3c 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -105,6 +105,9 @@ typedef enum {
>  #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
>                                        select an appropriate protocol driver,
>                                        ignoring the format layer */
> +#define BDRV_O_SEQUENTIAL 0x10000  /* open device for sequential read/write */
> +
> +
> 
>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)

Why two additional newlines?

BDRV_O_SEQUENTIAL works for me as the external interface.

Kevin


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-27 16:12       ` Peter Lieven
@ 2014-03-03 10:40         ` Kevin Wolf
  0 siblings, 0 replies; 29+ messages in thread
From: Kevin Wolf @ 2014-03-03 10:40 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Stefan Hajnoczi, Fam Zheng, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 27.02.2014 at 17:12, Peter Lieven wrote:
> On 27.02.2014 12:07, Kevin Wolf wrote:
> > On 27.02.2014 at 02:10, Fam Zheng wrote:
> >> On Wed, 02/26 16:41, Stefan Hajnoczi wrote:
> >>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> >>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> >>>> files of a qemu-img convert process if the source is a host_device?
> >>>>
> >>>> Currently the backup of a host device is polluting the page cache.
> >> Peter, can you give some more detailed explanation of the issue? An example or
> >> benchmark will help a lot. As I understand it, a one-time scan of a file doesn't
> >> promote its pages to the active list, so it probably won't hurt the really hot cache
> >> at all, and the pages will get replaced very soon.
> >>
> >> Considering readahead and the page cache for metadata, I'm not sure that forcing
> >> O_DIRECT is a good idea.
> >>
> >>>    The problem is what to do for image formats.  An image file can be
> >>>    very fragmented so the readahead might not be a win.  Does this mean
> >>>    that for image formats we should tell the kernel access will be
> >>>    random?
> >>>
> >>>    Furthermore, maybe it's best to do readahead inside QEMU so that even
> >>>    network protocols (nbd, iscsi, etc) can get good performance.  They
> >>>    act like O_DIRECT is always on.
> >> Also, the experience of booting a network-backed guest can be greatly improved,
> >> because sometimes the BIOS and bootloader are simple-minded and load a kernel or
> >> initrd by sending thousands of 1-sector requests with iodepth=1, which means
> >> the latency of network-based I/O hurts a lot.
> > I think I mentioned it a while ago, but our IDE emulation plays a role
> > in this as well. PIO requests are always handled sector by sector, no
> > matter how big the request was that we got from the BIOS.
> Yes, you have pointed that out before. How complicated is it to fix this?

I don't think it should be very complicated, but it basically means a
rewrite of the PIO read/write code, so there is a chance of introducing
new bugs.

Kevin


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-03 10:38         ` Kevin Wolf
@ 2014-03-03 11:20           ` Peter Lieven
  2014-03-03 12:59             ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-03-03 11:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 03.03.2014 11:38, Kevin Wolf wrote:
> On 28.02.2014 at 15:35, Peter Lieven wrote:
>> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>>>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>>>> files of a qemu-img convert process if the source is a host_device?
>>>>>>
>>>>>> Currently the backup of a host device is polluting the page cache.
>>>>> Points to consider:
>>>>>
>>>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>>>     the file.  A fallback is necessary.
>>>>>
>>>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>>>     The question is, how important is readahead versus polluting page
>>>>>     cache?
>>>>>
>>>>> 3. For raw files it would make sense to tell the kernel that access is
>>>>>     sequential and data will be used only once.  Then we can get the best
>>>>>     of both worlds (avoid polluting page cache but still get readahead).
>>>>>     This is done using posix_fadvise(2).
>>>>>
>>>>>     The problem is what to do for image formats.  An image file can be
>>>>>     very fragmented so the readahead might not be a win.  Does this mean
>>>>>     that for image formats we should tell the kernel access will be
>>>>>     random?
>>>>>
>>>>>     Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>>>     network protocols (nbd, iscsi, etc) can get good performance.  They
>>>>>     act like O_DIRECT is always on.
>>>> your comments are regarding qemu-img convert, right?
>>>> How would you implement this? A new open flag, because
>>>> the fadvise would have to go inside the protocol driver.
>>>>
>>>> I would start with host_devices first and see how it performs there.
>>>>
>>>> For qemu-img convert I would issue a FADV_DONTNEED after
>>>> a write for the bytes that have been written
>>>> (I have tested this with Linux and it seems to work quite well).
>>>>
>>>> Question is, what is the right parameter for reads? Also FADV_DONTNEED?
>>> I think so but this should be justified with benchmark results.
>> I ran some benchmarks and found that a FADV_DONTNEED issued after
>> a read does not hurt performance. But it avoids the buffers
>> increasing while I read from a host_device or raw file.
> Okay, sounds reasonable.
>
>> As for writing, it only works if I issue a fdatasync after each write, but
>> this should be equivalent to O_DIRECT. So I would keep the patch
>> to support qemu-img convert sources if they are a host_device or file.
> Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
> at all).
Of course, fdatasync is not an option. I just wanted to point out that one
should use O_DIRECT if the page cache should be disabled for the output
file.
>
>> Here is a proposal for a patch:
>>
>> diff --git a/block.c b/block.c
>> index 2fd5482..2445433 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
>>               qemu_aio_wait();
>>           }
>>       }
>> +
>> +#ifdef POSIX_FADV_DONTNEED
>> +    if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
>> +        bs->drv->bdrv_fadvise && !is_write) {
>> +        bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
>> +    }
>> +#endif
>> +
> This #ifdef should be in the raw-posix driver. Please try to keep the
> qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
> friends as an implementation detail of block drivers.
I had the same idea, but as far as I can see the completion callback for the
request is handled in block.c. raw-posix uses the aio interface and
not coroutines.
>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 780f48b..a4dcc3c 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -105,6 +105,9 @@ typedef enum {
>>   #define BDRV_O_PROTOCOL    0x8000  /* if no block driver is explicitly given:
>>                                         select an appropriate protocol driver,
>>                                         ignoring the format layer */
>> +#define BDRV_O_SEQUENTIAL 0x10000  /* open device for sequential read/write */
>> +
>> +
>>
>>   #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
> Why two additional newlines?
Typo. This patch was basically an RFC :-)
>
> BDRV_O_SEQUENTIAL works for me as the external interface.
>
> Kevin

Peter


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-02-28 14:35       ` Peter Lieven
  2014-03-03 10:38         ` Kevin Wolf
@ 2014-03-03 12:03         ` Stefan Hajnoczi
  2014-03-03 12:20           ` Peter Lieven
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-03-03 12:03 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Fri, Feb 28, 2014 at 03:35:05PM +0100, Peter Lieven wrote:
> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
> >On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
> >>On 26.02.2014 16:41, Stefan Hajnoczi wrote:
> >>>On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
> >>>>I was wondering if it would be a good idea to set the O_DIRECT mode for the source
> >>>>files of a qemu-img convert process if the source is a host_device?
> >>>>
> >>>>Currently the backup of a host device is polluting the page cache.
> >>>Points to consider:
> >>>
> >>>1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
> >>>    the file.  A fallback is necessary.
> >>>
> >>>2. O_DIRECT has no readahead so performance could actually decrease.
> >>>    The question is, how important is readahead versus polluting page
> >>>    cache?
> >>>
> >>>3. For raw files it would make sense to tell the kernel that access is
> >>>    sequential and data will be used only once.  Then we can get the best
> >>>    of both worlds (avoid polluting page cache but still get readahead).
> >>>    This is done using posix_fadvise(2).
> >>>
> >>>    The problem is what to do for image formats.  An image file can be
> >>>    very fragmented so the readahead might not be a win.  Does this mean
> >>>    that for image formats we should tell the kernel access will be
> >>>    random?
> >>>
> >>>    Furthermore, maybe it's best to do readahead inside QEMU so that even
> >>>    network protocols (nbd, iscsi, etc) can get good performance.  They
> >>>    act like O_DIRECT is always on.
> >>your comments are regarding qemu-img convert, right?
> >>How would you implement this? A new open flag, because
> >>the fadvise would have to go inside the protocol driver.
> >>
> >>I would start with host_devices first and see how it performs there.
> >>
> >>For qemu-img convert I would issue a FADV_DONTNEED after
> >>a write for the bytes that have been written
> >>(I have tested this with Linux and it seems to work quite well).
> >>
> >>Question is, what is the right parameter for reads? Also FADV_DONTNEED?
> >I think so but this should be justified with benchmark results.
> 
> I ran some benchmarks and found that a FADV_DONTNEED issued after
> a read does not hurt performance. But it avoids the buffers
> increasing while I read from a host_device or raw file.

It was mentioned in this thread that a sequential read shouldn't promote the
pages anyway - they should be dropped by the kernel if there is memory
pressure.

So what is the actual performance problem you are trying to solve and
what benchmark output are you getting when you compare with
FADV_DONTNEED against without FADV_DONTNEED?

I think there's a danger that the discussion will go around in circles.
Please post the performance results that kicked off this whole effort
and let's focus on the data.  That way it's much easier to evaluate what
changes to QEMU are a win and which are not necessary.

> As for writing, it only works if I issue a fdatasync after each write, but
> this should be equivalent to O_DIRECT. So I would keep the patch
> to support qemu-img convert sources if they are a host_device or file.

fdatasync(2) is much more heavy-weight than writing out pages because
it sends a disk write cache flush command and waits for it to complete.
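
(As an aside, and only a sketch of an alternative not pursued in this
thread: on Linux a lighter-weight write-side pattern is to start
writeback for a range without forcing a device cache flush, and only
then drop the pages; writeback_and_drop is a hypothetical name.)

#define _GNU_SOURCE             /* for sync_file_range() */
#include <fcntl.h>

/* Sketch: unlike fdatasync(2), SYNC_FILE_RANGE_WRITE only initiates
 * writeback and does not flush the disk write cache.  DONTNEED skips
 * pages that are still dirty or under writeback, so a later pass (or
 * SYNC_FILE_RANGE_WAIT_BEFORE) may be needed to drop everything. */
static void writeback_and_drop(int fd, off_t off, off_t len)
{
    sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE);
    posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
}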

Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-03 12:03         ` Stefan Hajnoczi
@ 2014-03-03 12:20           ` Peter Lieven
  2014-03-04  9:24             ` Stefan Hajnoczi
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-03-03 12:20 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 03.03.2014 13:03, Stefan Hajnoczi wrote:
> On Fri, Feb 28, 2014 at 03:35:05PM +0100, Peter Lieven wrote:
>> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>>>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>>>> files of a qemu-img convert process if the source is a host_device?
>>>>>>
>>>>>> Currently the backup of a host device is polluting the page cache.
>>>>> Points to consider:
>>>>>
>>>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>>>     the file.  A fallback is necessary.
>>>>>
>>>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>>>     The question is, how important is readahead versus polluting page
>>>>>     cache?
>>>>>
>>>>> 3. For raw files it would make sense to tell the kernel that access is
>>>>>     sequential and data will be used only once.  Then we can get the best
>>>>>     of both worlds (avoid polluting page cache but still get readahead).
>>>>>     This is done using posix_fadvise(2).
>>>>>
>>>>>     The problem is what to do for image formats.  An image file can be
>>>>>     very fragmented so the readahead might not be a win.  Does this mean
>>>>>     that for image formats we should tell the kernel access will be
>>>>>     random?
>>>>>
>>>>>     Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>>>     network protocols (nbd, iscsi, etc) can get good performance.  They
>>>>>     act like O_DIRECT is always on.
>>>> your comments are regarding qemu-img convert, right?
>>>> How would you implement this? A new open flag, because
>>>> the fadvise would have to go inside the protocol driver.
>>>>
>>>> I would start with host_devices first and see how it performs there.
>>>>
>>>> For qemu-img convert I would issue a FADV_DONTNEED after
>>>> a write for the bytes that have been written
>>>> (I have tested this with Linux and it seems to work quite well).
>>>>
>>>> Question is, what is the right parameter for reads? Also FADV_DONTNEED?
>>> I think so but this should be justified with benchmark results.
>> I ran some benchmarks and found that a FADV_DONTNEED issued after
>> a read does not hurt performance. But it avoids the buffers
>> increasing while I read from a host_device or raw file.
> It was mentioned in this thread that a sequential read shouldn't promote the
> pages anyway - they should be dropped by the kernel if there is memory
> pressure.
Yes, but this costs CPU time in spikes and the page cache is polluted
with data that is definitely not needed.
>
> So what is the actual performance problem you are trying to solve and
> what benchmark output are you getting when you compare with
> FADV_DONTNEED against without FADV_DONTNEED?
I found the performance to be identical. For the problem, please see below.
>
> I think there's a danger that the discussion will go around in circles.
> Please post the performance results that kicked off this whole effort
> and let's focus on the data.  That way it's much easier to evaluate what
> changes to QEMU are a win and which are not necessary.
I found that under memory pressure the increasing buffers
lead to vserver memory being swapped out. This caused trouble
especially in overcommit scenarios (where all memory is backed by
swap).
>
>> As for writing, it only works if I issue a fdatasync after each write, but
>> this should be equivalent to O_DIRECT. So I would keep the patch
>> to support qemu-img convert sources if they are a host_device or file.
> fdatasync(2) is much more heavy-weight than writing out pages because
> it sends a disk write cache flush command and waits for it to complete.
As mentioned before, for the write path the
FADV_DONTNEED stuff doesn't work.

Peter


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-03 11:20           ` Peter Lieven
@ 2014-03-03 12:59             ` Paolo Bonzini
  2014-03-03 13:07               ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-03-03 12:59 UTC (permalink / raw)
  To: Peter Lieven, Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

On 03/03/2014 12:20, Peter Lieven wrote:
>>>
>> This #ifdef should be in the raw-posix driver. Please try to keep the
>> qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
>> friends as an implementation detail of block drivers.
> I had the same idea, but as far as I can see the completion callback for the
> request is handled in block.c. raw-posix uses the aio interface and
> not coroutines.

You can put it in the QEMU_AIO_READ case of aio_worker.

linux-aio.c is only for O_DIRECT, so you do not need it there.

BTW, perhaps you need to exclude the fadvise if O_DIRECT is in use?
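
Roughly like this (a sketch of both points, not a tested change; the
surrounding aio_worker() code is abbreviated):

case QEMU_AIO_READ:
    ret = handle_aiocb_rw(aiocb);
#ifdef POSIX_FADV_DONTNEED
    /* sketch: skip the fadvise when O_DIRECT (BDRV_O_NOCACHE) is in
     * use, since there is no page cache to drop in that case */
    if (ret >= 0 && (aiocb->bs->open_flags & BDRV_O_SEQUENTIAL) &&
        !(aiocb->bs->open_flags & BDRV_O_NOCACHE)) {
        posix_fadvise(aiocb->aio_fildes, aiocb->aio_offset,
                      aiocb->aio_nbytes, POSIX_FADV_DONTNEED);
    }
#endif
    break;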

Paolo


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-03 12:59             ` Paolo Bonzini
@ 2014-03-03 13:07               ` Peter Lieven
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Lieven @ 2014-03-03 13:07 UTC (permalink / raw)
  To: Paolo Bonzini, Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi

On 03.03.2014 13:59, Paolo Bonzini wrote:
> On 03/03/2014 12:20, Peter Lieven wrote:
>>>>
>>> This #ifdef should be in the raw-posix driver. Please try to keep the
>>> qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
>>> friends as an implementation detail of block drivers.
>> I had the same idea, but as far as I can see the completion callback for the
>> request is handled in block.c. raw-posix uses the aio interface and
>> not coroutines.
>
> You can put it in the QEMU_AIO_READ case of aio_worker.
Thanks for the pointer.
>
> linux-aio.c is only for O_DIRECT, so you do not need it there.
>
> BTW, perhaps you need to exclude the fadvise if O_DIRECT is in use?
Good point!
>
> Paolo

Peter


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-03 12:20           ` Peter Lieven
@ 2014-03-04  9:24             ` Stefan Hajnoczi
  2014-03-05 14:44               ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-03-04  9:24 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Mon, Mar 03, 2014 at 01:20:21PM +0100, Peter Lieven wrote:
> On 03.03.2014 13:03, Stefan Hajnoczi wrote:
> >So what is the actual performance problem you are trying to solve and
> >what benchmark output are you getting when you compare with
> >FADV_DONTNEED against without FADV_DONTNEED?
> I found the performance to be identical. For the problem, please see below.
> >
> >I think there's a danger that the discussion will go around in circles.
> >Please post the performance results that kicked off this whole effort
> >and let's focus on the data.  That way it's much easier to evaluate what
> >changes to QEMU are a win and which are not necessary.
> I found that under memory pressure the increasing buffers
> lead to vserver memory being swapped out. This caused trouble
> especially in overcommit scenarios (where all memory is backed by
> swap).

I think the general idea is qemu-img should not impact running guests,
even on a heavily loaded machine.  But again, this needs to be discussed
using concrete benchmarks with configurations and results posted to the
list.

Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-04  9:24             ` Stefan Hajnoczi
@ 2014-03-05 14:44               ` Peter Lieven
  2014-03-05 15:20                 ` Marcus
  2014-03-06 10:29                 ` Stefan Hajnoczi
  0 siblings, 2 replies; 29+ messages in thread
From: Peter Lieven @ 2014-03-05 14:44 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On 04.03.2014 10:24, Stefan Hajnoczi wrote:
> On Mon, Mar 03, 2014 at 01:20:21PM +0100, Peter Lieven wrote:
>> On 03.03.2014 13:03, Stefan Hajnoczi wrote:
>>> So what is the actual performance problem you are trying to solve and
>>> what benchmark output are you getting when you compare with
>>> FADV_DONTNEED against without FADV_DONTNEED?
>> I found the performance to be identical. For the problem, please see below.
>>> I think there's a danger that the discussion will go around in circles.
>>> Please post the performance results that kicked off this whole effort
>>> and let's focus on the data.  That way it's much easier to evaluate what
>>> changes to QEMU are a win and which are not necessary.
>> I found that under memory pressure the increasing buffers
>> lead to vserver memory being swapped out. This caused trouble
>> especially in overcommit scenarios (where all memory is backed by
>> swap).
> I think the general idea is qemu-img should not impact running guests,
> even on a heavily loaded machine.  But again, this needs to be discussed
> using concrete benchmarks with configurations and results posted to the
> list.
Sure, this is why I started to look at this. I found that under high memory
pressure a backup (local storage -> NFS) causes swapping. I started to
use libnfs as the destination to avoid the influence of the kernel NFS client. But
I saw that the buffers still increase while a backup is running. With the
proposed patch I sent recently

[PATCH] block: introduce BDRV_O_SEQUENTIAL

I don't see this behaviour, and I have not yet observed a performance penalty.

Peter
>
> Stefan


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 14:44               ` Peter Lieven
@ 2014-03-05 15:20                 ` Marcus
  2014-03-05 15:53                   ` Peter Lieven
  2014-03-06 10:29                 ` Stefan Hajnoczi
  1 sibling, 1 reply; 29+ messages in thread
From: Marcus @ 2014-03-05 15:20 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

I think this is a more generic sysadmin problem.  I've seen the same
thing in the past with simply snapshotting a logical volume or zfs
zvol and copying it off somewhere. Page cache bloats, the system
starts swapping. To avoid it, we wrote a small C program that calls
FADV_DONTNEED on a file, and we fork off a process to call it on the
source file every X seconds in our backup scripts.
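
(Roughly like this, as a self-contained sketch of such a helper:)

#include <fcntl.h>
#include <stdio.h>

/* Sketch of a tiny standalone cache dropper in the spirit of the
 * helper described above: drop all cached pages of the given file. */
int main(int argc, char **argv)
{
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* offset 0, len 0 means "the whole file" */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) ? 1 : 0;
}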

It's a little strange to me to have qemu-img do this, just like it
would be strange if 'cp' did it, but I can see it as a very useful
shortcut if it's an optional flag.  qemu-img to me is just an admin
tool, and the admin should decide if they want their tool's reads
cached.  Some additional things that come to mind:

* If you are running qemu-img on a running VM's source file,
FADV_DONTNEED may ruin the cache you wanted if the VM is not running
cache=none.

* O_DIRECT I think will cause unexpected problems, for example the
zfsonlinux guys (and tmpfs as mentioned) don't yet support it. If it
is used, there has to be a fallback or a way to turn it off.

On Wed, Mar 5, 2014 at 7:44 AM, Peter Lieven <pl@kamp.de> wrote:
> On 04.03.2014 10:24, Stefan Hajnoczi wrote:
>> On Mon, Mar 03, 2014 at 01:20:21PM +0100, Peter Lieven wrote:
>>> On 03.03.2014 13:03, Stefan Hajnoczi wrote:
>>>> So what is the actual performance problem you are trying to solve and
>>>> what benchmark output are you getting when you compare with
>>>> FADV_DONTNEED against without FADV_DONTNEED?
>>> I found the performance to be identical. For the problem, please see below.
>>>> I think there's a danger that the discussion will go around in circles.
>>>> Please post the performance results that kicked off this whole effort
>>>> and let's focus on the data.  That way it's much easier to evaluate what
>>>> changes to QEMU are a win and which are not necessary.
>>> I found that under memory pressure the increasing buffers
>>> lead to vserver memory being swapped out. This caused trouble
>>> especially in overcommit scenarios (where all memory is backed by
>>> swap).
>> I think the general idea is qemu-img should not impact running guests,
>> even on a heavily loaded machine.  But again, this needs to be discussed
>> using concrete benchmarks with configurations and results posted to the
>> list.
> Sure, this is why I started to look at this. I found that under high memory
> pressure a backup (local storage -> NFS) causes swapping. I started to
> use libnfs as the destination to avoid the influence of the kernel NFS client. But
> I saw that the buffers still increase while a backup is running. With the
> proposed patch I sent recently
>
> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>
> I don't see this behaviour, and I have not yet observed a performance penalty.
>
> Peter
>>
>> Stefan
>
>


* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 15:20                 ` Marcus
@ 2014-03-05 15:53                   ` Peter Lieven
  2014-03-05 17:38                     ` Marcus
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-03-05 15:53 UTC (permalink / raw)
  To: Marcus
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 05.03.2014 16:20, Marcus wrote:
> I think this is a more generic sysadmin problem.  I've seen the same
> thing in the past with simply snapshotting a logical volume or zfs
> zvol and copying it off somewhere. Page cache bloats, the system
> starts swapping. To avoid it, we wrote a small C program that calls
> FADV_DONTNEED on a file, and we fork off a process to call it on the
> source file every X seconds in our backup scripts.
I do not call FADV_DONTNEED on the whole file, but only
on the block that has just been read.
>
> It's a little strange to me to have qemu-img do this, just like it
> would be strange if 'cp' did it, but I can see it as a very useful
> shortcut if it's an optional flag.  qemu-img to me is just an admin
> tool, and the admin should decide if they want their tool's reads
> cached.  Some additional things that come to mind:
>
> * If you are running qemu-img on a running VM's source file,
> FADV_DONTNEED may ruin the cache you wanted if the VM is not running
> cache=none.
You would normally not run it on the source directly. In my case
I run it on a snapshot of a logical volume, but I see your point.

So you can confirm my observations and would be happy if
this behaviour could be toggled with a cmdline switch?
>
> * O_DIRECT I think will cause unexpected problems, for example the
> zfsonlinux guys (and tmpfs as mentioned) don't yet support it. If it
> is used, there has to be a fallback or a way to turn it off.
I don't use O_DIRECT. It's an option for the destination file only at the
moment. You can set it with -t none as a qemu-img argument.

Peter
>
> On Wed, Mar 5, 2014 at 7:44 AM, Peter Lieven <pl@kamp.de> wrote:
>> On 04.03.2014 10:24, Stefan Hajnoczi wrote:
>>> On Mon, Mar 03, 2014 at 01:20:21PM +0100, Peter Lieven wrote:
>>>> On 03.03.2014 13:03, Stefan Hajnoczi wrote:
>>>>> So what is the actual performance problem you are trying to solve and
>>>>> what benchmark output are you getting when you compare with
>>>>> FADV_DONTNEED against without FADV_DONTNEED?
>>>> I found the performance to be identical. For the problem, please see below.
>>>>> I think there's a danger that the discussion will go around in circles.
>>>>> Please post the performance results that kicked off this whole effort
>>>>> and let's focus on the data.  That way it's much easier to evaluate what
>>>>> changes to QEMU are a win and which are not necessary.
>>>> I found that under memory pressure the increasing buffers
>>>> lead to vserver memory being swapped out. This caused trouble
>>>> especially in overcommit scenarios (where all memory is backed by
>>>> swap).
>>> I think the general idea is qemu-img should not impact running guests,
>>> even on a heavily loaded machine.  But again, this needs to be discussed
>>> using concrete benchmarks with configurations and results posted to the
>>> list.
>> Sure, this is why I started to look at this. I found that under high memory
>> pressure a backup (local storage -> NFS) causes swapping. I started to
>> use libnfs as the destination to avoid the influence of the kernel NFS client. But
>> I saw that the buffers still increase while a backup is running. With the
>> proposed patch I sent recently
>>
>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>>
>> I don't see this behaviour, and I have not yet observed a performance penalty.
>>
>> Peter
>>> Stefan
>>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 15:53                   ` Peter Lieven
@ 2014-03-05 17:38                     ` Marcus
  2014-03-05 18:09                       ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Marcus @ 2014-03-05 17:38 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Wed, Mar 5, 2014 at 8:53 AM, Peter Lieven <pl@kamp.de> wrote:
> On 05.03.2014 16:20, Marcus wrote:
>> I think this is a more generic sysadmin problem.  I've seen the same
>> thing in the past with simply snapshotting a logical volume or zfs
>> zvol and copying it off somewhere. Page cache bloats, the system
>> starts swapping. To avoid it, we wrote a small C program that calls
>> FADV_DONTNEED on a file, and we fork off a process to call it on the
>> source file every X seconds in our backup scripts.
> I do not call FADV_DONTNEED on the whole file, but only
> on the block that has just been read.

Yes, I suppose that's one of the advantages of having it integrated
into the reader.

>>
>> It's a little strange to me to have qemu-img do this, just like it
>> would be strange if 'cp' did it, but I can see it as a very useful
>> shortcut if it's an optional flag.  qemu-img to me is just an admin
>> tool, and the admin should decide if they want their tool's reads
>> cached.  Some additional things that come to mind:
>>
>> * If you are running qemu-img on a running VM's source file,
>> FADV_DONTNEED may ruin the cache you wanted if the VM is not running
>> cache=none.
> You would normally not run it on the source directly. In my case
> I run it on a snapshot of a logical volume, but I see your point.

Totally depends on the situation, just thought it was worth consideration.

>
> So you can confirm my observations and would be happy if
> this behaviour could be toggled with a cmdline switch?

Yes, I've seen the same behavior you mention just with 'cp'. It was
with a version of the CentOS 6.2 kernel, at least, before we added
FADV_DONTNEED into the backup scripts.
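Such a helper is tiny; a minimal sketch, assuming Linux, with argument
handling kept to the bare minimum (the program is illustrative, not the
one from our scripts):

    /* drop a whole file's pages from the page cache */
    #define _XOPEN_SOURCE 600
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int fd, err;

        if (argc != 2) {
            fprintf(stderr, "usage: %s FILE\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        /* offset 0, length 0 means "to the end of the file" */
        err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
        if (err) {
            fprintf(stderr, "posix_fadvise failed: %d\n", err);
        }
        close(fd);
        return err ? 1 : 0;
    }

A backup script can then fork this off against the source file every few
seconds, as described above.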

>>
>> * O_DIRECT I think will cause unexpected problems, for example the
>> zfsonlinux guys (and tmpfs as mentioned) don't yet support it. If it
>> is used, there has to be a fallback or a way to turn it off.
>> I don't use O_DIRECT. It's an option for the destination file only at the
> moment. You can set it with -t none as qemu-img argument.

I just mentioned it because setting it on the source was suggested
originally and subsequently discussed.

>
> Peter
>>
>> On Wed, Mar 5, 2014 at 7:44 AM, Peter Lieven <pl@kamp.de> wrote:
>>> On 04.03.2014 10:24, Stefan Hajnoczi wrote:
>>>> On Mon, Mar 03, 2014 at 01:20:21PM +0100, Peter Lieven wrote:
>>>>> On 03.03.2014 13:03, Stefan Hajnoczi wrote:
>>>>>> So what is the actual performance problem you are trying to solve and
>>>>>> what benchmark output are you getting when you compare with
>>>>>> FADV_DONTNEED against without FADV_DONTNEED?
>>>>> I found the performance to be identical. For the problem, please see below.
>>>>>> I think there's a danger that the discussion will go around in circles.
>>>>>> Please post the performance results that kicked off this whole effort
>>>>>> and let's focus on the data.  That way it's much easier to evaluate what
>>>>>> changes to QEMU are a win and which are not necessary.
>>>>> I found that under memory pressure the increasing buffers
>>>>> lead to vserver memory being swapped out. This caused trouble
>>>>> especially in overcommit scenarios (where all memory is backed by
>>>>> swap).
>>>> I think the general idea is qemu-img should not impact running guests,
>>>> even on a heavily loaded machine.  But again, this needs to be discussed
>>>> using concrete benchmarks with configurations and results posted to the
>>>> list.
>>> Sure, this is why I started to look at this. I found that under high memory
>>> pressure a backup (local storage -> NFS) causes swapping. I started to
>>> use libnfs as the destination to avoid the influence of the kernel NFS client. But
>>> I saw that the buffers still increase while a backup is running. With the
>>> proposed patch I sent recently
>>>
>>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>>>
>>> I don't see this behaviour, and I have not yet observed a performance penalty.
>>>
>>> Peter
>>>> Stefan
>>>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 17:38                     ` Marcus
@ 2014-03-05 18:09                       ` Peter Lieven
  2014-03-06 10:41                         ` Stefan Hajnoczi
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Lieven @ 2014-03-05 18:09 UTC (permalink / raw)
  To: Marcus
  Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 05.03.2014 18:38, Marcus wrote:
> On Wed, Mar 5, 2014 at 8:53 AM, Peter Lieven <pl@kamp.de> wrote:
>> On 05.03.2014 16:20, Marcus wrote:
>>> I think this is a more generic sysadmin problem.  I've seen the same
>>> thing in the past with simply snapshotting a logical volume or zfs
>>> zvol and copying it off somewhere. Page cache bloats, the system
>>> starts swapping. To avoid it, we wrote a small C program that calls
>>> FADV_DONTNEED on a file, and we fork off a process to call it on the
>>> source file every X seconds in our backup scripts.
>> I do not call FADV_DONTNEED on the whole file, but only
>> on the block that has just been read.
> Yes, I suppose that's one of the advantages of having it integrated
> into the reader.
>
>>> It's a little strange to me to have qemu-img do this, just like it
>>> would be strange if 'cp' did it, but I can see it as a very useful
>>> shortcut if it's an optional flag.  qemu-img to me is just an admin
>>> tool, and the admin should decide if they want their tool's reads
>>> cached.  Some additional things that come to mind:
>>>
>>> * If you are running qemu-img on a running VM's source file,
>>> FADV_DONTNEED may ruin the cache you wanted if the VM is not running
>>> cache=none.
>> You would normally not run it on the source directly. In my case
>> I run it on a snapshot of a logical volume, but I see your point.
> Totally depends on the situation, just thought it was worth consideration.
Yes, and it was a good remark.
>
>> So you can confirm my observations and would be happy if
>> this behaviour could be toggled with a cmdline switch?
> Yes, I've seen the same behavior you mention just with 'cp'. It was
> with a version of the CentOS 6.2 kernel, at least, before we added
> FADV_DONTNEED into the backup scripts.
Ok, Stefan, would you be happy with it?
>
>>> * O_DIRECT I think will cause unexpected problems, for example the
>>> zfsonlinux guys (and tmpfs as mentioned) don't yet support it. If it
>>> is used, there has to be a fallback or a way to turn it off.
>> I don't use O_DIRECT. It's an option for the destination file only at the
>> moment. You can set it with -t none as qemu-img argument.
> I just mentioned it because setting it on the source was suggested
> originally and subsequently discussed.
Yes, but it would break readahead and would not work on tmpfs
and many other things we don't yet know about.
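If a source-side O_DIRECT mode were added after all, it would need the
fallback discussed earlier in the thread. A minimal sketch, assuming
Linux (the helper is illustrative and ignores O_DIRECT's buffer
alignment requirements):

    /* try O_DIRECT first, fall back to a buffered open on EINVAL */
    #define _GNU_SOURCE /* for O_DIRECT */
    #include <errno.h>
    #include <fcntl.h>

    static int open_maybe_direct(const char *path)
    {
        int fd = open(path, O_RDONLY | O_DIRECT);
        if (fd < 0 && errno == EINVAL) {
            /* e.g. tmpfs or zfsonlinux: O_DIRECT unsupported */
            fd = open(path, O_RDONLY);
        }
        return fd;
    }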

Peter

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 14:44               ` Peter Lieven
  2014-03-05 15:20                 ` Marcus
@ 2014-03-06 10:29                 ` Stefan Hajnoczi
  2014-03-06 11:29                   ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-03-06 10:29 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, Anthony Liguori

On Wed, Mar 05, 2014 at 03:44:17PM +0100, Peter Lieven wrote:
> [PATCH] block: introduce BDRV_O_SEQUENTIAL

It hasn't shown up on the mailing list yet.

I received it privately because I am CCed but it needs to be on the list
in order to get review and be merged.

Anthony: how can Peter troubleshoot a patch email that isn't appearing
on the mailing list?

Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-05 18:09                       ` Peter Lieven
@ 2014-03-06 10:41                         ` Stefan Hajnoczi
  2014-03-06 18:58                           ` Peter Lieven
  0 siblings, 1 reply; 29+ messages in thread
From: Stefan Hajnoczi @ 2014-03-06 10:41 UTC (permalink / raw)
  To: Peter Lieven
  Cc: Marcus, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On Wed, Mar 05, 2014 at 07:09:13PM +0100, Peter Lieven wrote:
> On 05.03.2014 18:38, Marcus wrote:
> > On Wed, Mar 5, 2014 at 8:53 AM, Peter Lieven <pl@kamp.de> wrote:
> >> So you can confirm my observations and would be happy if
> >> this behaviour could be toggled with a cmdline switch?
> > Yes, I've seen the same behavior you mention just with 'cp'. It was
> > with a version of the CentOS 6.2 kernel, at least, before we added
> > FADV_DONTNEED into the backup scripts.
> Ok, Stefan, would you be happy with it?

I'm happy with it.

But I'm a little frustrated that I've asked multiple times for a
concrete benchmark and it hasn't been provided.

In order to review patches that improve performance, we need a benchmark
that can be reproduced.  And the commit description must include results
that quantify the improvement.

Otherwise it's too easy to merge patches that sound reasonable but don't
have the effect that everyone thought they had.

I see two possible benchmarks:

1. Create memory pressure so that a benchmark performs worse unless we
   slim down our page cache usage.

2. Compare mm statistics before and after qemu-img to prove it no longer
   bloats the page cache.

#2 doesn't prove that there is a practical performance issue; it just
optimizes the mm statistics.  But at least it quantifies that and serves
as a test case.

Either would be okay.
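A minimal way to capture #2, assuming a Linux host, is to sample the
page-cache counters from /proc/meminfo immediately before and after the
qemu-img run; the sampler below is an illustrative sketch:

    /* print the page-cache counters from /proc/meminfo */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        char line[256];

        if (!f) {
            perror("fopen");
            return 1;
        }
        while (fgets(line, sizeof(line), f)) {
            if (!strncmp(line, "Buffers:", 8) ||
                !strncmp(line, "Cached:", 7)) {
                fputs(line, stdout);
            }
        }
        fclose(f);
        return 0;
    }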

Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-06 10:29                 ` Stefan Hajnoczi
@ 2014-03-06 11:29                   ` Paolo Bonzini
  2014-03-06 14:19                     ` Liguori, Anthony
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2014-03-06 11:29 UTC (permalink / raw)
  To: Stefan Hajnoczi, Peter Lieven
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi, Anthony Liguori

On 06/03/2014 11:29, Stefan Hajnoczi wrote:
> On Wed, Mar 05, 2014 at 03:44:17PM +0100, Peter Lieven wrote:
>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>
> It hasn't shown up on the mailing list yet.
>
> I received it privately because I am CCed but it needs to be on the list
> in order to get review and be merged.
>
> Anthony: how can Peter troubleshoot a patch email that isn't appearing
> on the mailing list?

Maybe it's been caught by the Savannah spam filter.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-06 11:29                   ` Paolo Bonzini
@ 2014-03-06 14:19                     ` Liguori, Anthony
  2014-03-06 18:07                       ` Peter Lieven
  2014-03-07  8:03                       ` Peter Lieven
  0 siblings, 2 replies; 29+ messages in thread
From: Liguori, Anthony @ 2014-03-06 14:19 UTC (permalink / raw)
  To: Paolo Bonzini, Stefan Hajnoczi, Peter Lieven
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

We can check the moderation queue although it's usually empty.  Savannah has a pretty aggressive spam filter and mail delivery isn't always reliable.

Regards,

Anthony Liguori

________________________________________
From: Paolo Bonzini [paolo.bonzini@gmail.com] on behalf of Paolo Bonzini [pbonzini@redhat.com]
Sent: Thursday, March 06, 2014 3:29 AM
To: Stefan Hajnoczi; Peter Lieven
Cc: Kevin Wolf; qemu-devel@nongnu.org; Stefan Hajnoczi; Liguori, Anthony
Subject: Re: qemu-img convert cache mode for source

On 06/03/2014 11:29, Stefan Hajnoczi wrote:
> On Wed, Mar 05, 2014 at 03:44:17PM +0100, Peter Lieven wrote:
>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>
> It hasn't shown up on the mailing list yet.
>
> I received it privately because I am CCed but it needs to be on the list
> in order to get review and be merged.
>
> Anthony: how can Peter troubleshoot a patch email that isn't appearing
> on the mailing list?

Maybe it's been caught by the Savannah spam filter.

Paolo


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-06 14:19                     ` Liguori, Anthony
@ 2014-03-06 18:07                       ` Peter Lieven
  2014-03-07  8:03                       ` Peter Lieven
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Lieven @ 2014-03-06 18:07 UTC (permalink / raw)
  To: Liguori, Anthony, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Hi Anthony,

On 06.03.2014 15:19, Liguori, Anthony wrote:
> We can check the moderation queue although it's usually empty.  Savannah has a pretty aggressive spam filter and mail delivery isn't always reliable.
It would be good to know why these mails got lost. It seems to have happened occasionally in the past as well.

Thanks,
Peter
>
> Regards,
>
> Anthony Liguori
>
> ________________________________________
> From: Paolo Bonzini [paolo.bonzini@gmail.com] on behalf of Paolo Bonzini [pbonzini@redhat.com]
> Sent: Thursday, March 06, 2014 3:29 AM
> To: Stefan Hajnoczi; Peter Lieven
> Cc: Kevin Wolf; qemu-devel@nongnu.org; Stefan Hajnoczi; Liguori, Anthony
> Subject: Re: qemu-img convert cache mode for source
>
> On 06/03/2014 11:29, Stefan Hajnoczi wrote:
>> On Wed, Mar 05, 2014 at 03:44:17PM +0100, Peter Lieven wrote:
>>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>> It hasn't shown up on the mailing list yet.
>>
>> I received it privately because I am CCed but it needs to be on the list
>> in order to get review and be merged.
>>
>> Anthony: how can Peter troubleshoot a patch email that isn't appearing
>> on the mailing list?
> Maybe it's been caught by the Savannah spam filter.
>
> Paolo
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-06 10:41                         ` Stefan Hajnoczi
@ 2014-03-06 18:58                           ` Peter Lieven
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Lieven @ 2014-03-06 18:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Marcus, Kevin Wolf, qemu-devel, Stefan Hajnoczi, Paolo Bonzini

On 06.03.2014 11:41, Stefan Hajnoczi wrote:
> On Wed, Mar 05, 2014 at 07:09:13PM +0100, Peter Lieven wrote:
>> On 05.03.2014 18:38, Marcus wrote:
>>> On Wed, Mar 5, 2014 at 8:53 AM, Peter Lieven <pl@kamp.de> wrote:
>>>> So you can confirm my observations and would be happy if
>>>> this behaviour could be toggled with a cmdline switch?
>>> Yes, I've seen the same behavior you mention just with 'cp'. It was
>>> with a version of the CentOS 6.2 kernel, at least, before we added
>>> FADV_DONTNEED into the backup scripts.
>> Ok, Stefan, would you be happy with it?
> I'm happy with it.
>
> But I'm a little frustrated that I've asked multiple times for a
> concrete benchmark and it hasn't been provided.
I did not want to frustrate you with that. I thought my abstract
description of my observations in the commit message would be sufficient,
especially since Marcus confirmed them.

I will try to set up a test case.

Peter
>
> In order to review patches that improve performance, we need a benchmark
> that can be reproduced.  And the commit description must include results
> that quantify the improvement.
>
> Otherwise it's too easy to merge patches that sound reasonable but don't
> have the effect that everyone thought they had.
>
> I see two possible benchmarks:
>
> 1. Create memory pressure so that a benchmark performs worse unless we
>    slim down our page cache usage.
>
> 2. Compare mm statistics before and after qemu-img to prove it no longer
>    bloats the page cache.
>
> #2 doesn't prove that there is a practical performance issue; it just
> optimizes the mm statistics.  But at least it quantifies that and serves
> as a test case.
>
> Either would be okay.
>
> Stefan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] qemu-img convert cache mode for source
  2014-03-06 14:19                     ` Liguori, Anthony
  2014-03-06 18:07                       ` Peter Lieven
@ 2014-03-07  8:03                       ` Peter Lieven
  1 sibling, 0 replies; 29+ messages in thread
From: Peter Lieven @ 2014-03-07  8:03 UTC (permalink / raw)
  To: Liguori, Anthony, Paolo Bonzini, Stefan Hajnoczi
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 06.03.2014 15:19, Liguori, Anthony wrote:
> We can check the moderation queue although it's usually empty.  Savannah has a pretty aggressive spam filter and mail delivery isn't always reliable.
I think I got it. I accidentally had a typo (missing space) in the git send-email command which left the envelope-sender empty.

git send-email --envelope-sender=pl@kamp.de--cc=pbonzini@redhat.com --cc=kwolf@redhat.com --cc=stefanha@redhat.com 0001-block-introduce-BDRV_O_SEQUENTIAL.patch
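For comparison, the intended invocation presumably just needs the
missing space after the envelope-sender address:

    git send-email --envelope-sender=pl@kamp.de --cc=pbonzini@redhat.com --cc=kwolf@redhat.com --cc=stefanha@redhat.com 0001-block-introduce-BDRV_O_SEQUENTIAL.patch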

Peter

>
> Regards,
>
> Anthony Liguori
>
> ________________________________________
> From: Paolo Bonzini [paolo.bonzini@gmail.com] on behalf of Paolo Bonzini [pbonzini@redhat.com]
> Sent: Thursday, March 06, 2014 3:29 AM
> To: Stefan Hajnoczi; Peter Lieven
> Cc: Kevin Wolf; qemu-devel@nongnu.org; Stefan Hajnoczi; Liguori, Anthony
> Subject: Re: qemu-img convert cache mode for source
>
> On 06/03/2014 11:29, Stefan Hajnoczi wrote:
>> On Wed, Mar 05, 2014 at 03:44:17PM +0100, Peter Lieven wrote:
>>> [PATCH] block: introduce BDRV_O_SEQUENTIAL
>> It hasn't shown up on the mailing list yet.
>>
>> I received it privately because I am CCed but it needs to be on the list
>> in order to get review and be merged.
>>
>> Anthony: how can Peter troubleshoot a patch email that isn't appearing
>> on the mailing list?
> Maybe it's been caught by the Savannah spam filter.
>
> Paolo
>


-- 

Kind regards

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Managing Directors: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   VAT ID No.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2014-03-07  8:03 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-26 10:14 [Qemu-devel] qemu-img convert cache mode for source Peter Lieven
2014-02-26 15:41 ` Stefan Hajnoczi
2014-02-26 15:54   ` Eric Blake
2014-02-26 16:01   ` Peter Lieven
2014-02-27  8:57     ` Stefan Hajnoczi
2014-02-28 14:35       ` Peter Lieven
2014-03-03 10:38         ` Kevin Wolf
2014-03-03 11:20           ` Peter Lieven
2014-03-03 12:59             ` Paolo Bonzini
2014-03-03 13:07               ` Peter Lieven
2014-03-03 12:03         ` Stefan Hajnoczi
2014-03-03 12:20           ` Peter Lieven
2014-03-04  9:24             ` Stefan Hajnoczi
2014-03-05 14:44               ` Peter Lieven
2014-03-05 15:20                 ` Marcus
2014-03-05 15:53                   ` Peter Lieven
2014-03-05 17:38                     ` Marcus
2014-03-05 18:09                       ` Peter Lieven
2014-03-06 10:41                         ` Stefan Hajnoczi
2014-03-06 18:58                           ` Peter Lieven
2014-03-06 10:29                 ` Stefan Hajnoczi
2014-03-06 11:29                   ` Paolo Bonzini
2014-03-06 14:19                     ` Liguori, Anthony
2014-03-06 18:07                       ` Peter Lieven
2014-03-07  8:03                       ` Peter Lieven
2014-02-27  1:10   ` Fam Zheng
2014-02-27 11:07     ` Kevin Wolf
2014-02-27 16:12       ` Peter Lieven
2014-03-03 10:40         ` Kevin Wolf
