All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean
@ 2016-07-04 14:38 Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 1/4] ide: refactor retry_unit set and clear into separate function Denis V. Lunev
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-07-04 14:38 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

Changes from v4:
- Moved to write generation scheme instead of dirty flag
- Added retry setup to IDE PIO and FLUSH requests

Changes from v3:
- Fixed a typo in commit message
- Rebased on Kevin'n origin/block

Changes from v2:
- Better comments
- Rebased on latest master

Changes from v1:
- Flush requests that should be skipped will now wait for completion
of any previous requests already in flight
- Fixed IDE and AHCI tests to dirty media for new flush behaviour
- Fixed a problem in IDE CMD_FLUSH_CACHE failure handling

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>

Evgeny Yakovlev (4):
  ide: refactor retry_unit set and clear into separate function
  ide: set retry_unit for PIO and FLUSH requests
  tests: in IDE and AHCI tests perform DMA write before flushing
  block: ignore flush requests when storage is clean

 block.c                   |  3 +++
 block/io.c                | 18 ++++++++++++++++++
 hw/ide/core.c             | 24 ++++++++++++++++++------
 include/block/block_int.h |  5 +++++
 tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 119 insertions(+), 8 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v5 1/4] ide: refactor retry_unit set and clear into separate function
  2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
@ 2016-07-04 14:38 ` Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 2/4] ide: set retry_unit for PIO and FLUSH requests Denis V. Lunev
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-07-04 14:38 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Code to set and clear state associated with retry in moved into
ide_set_retry and ide_clear_retry to make adding retry setups easier.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 hw/ide/core.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 029f6b9..b72346e 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -466,6 +466,20 @@ void ide_abort_command(IDEState *s)
     s->error = ABRT_ERR;
 }
 
+static void ide_set_retry(IDEState *s)
+{
+    s->bus->retry_unit = s->unit;
+    s->bus->retry_sector_num = ide_get_sector(s);
+    s->bus->retry_nsector = s->nsector;
+}
+
+static void ide_clear_retry(IDEState *s)
+{
+    s->bus->retry_unit = -1;
+    s->bus->retry_sector_num = 0;
+    s->bus->retry_nsector = 0;
+}
+
 /* prepare data transfer and tell what to do after */
 void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
                         EndTransferFunc *end_transfer_func)
@@ -756,9 +770,7 @@ void dma_buf_commit(IDEState *s, uint32_t tx_bytes)
 void ide_set_inactive(IDEState *s, bool more)
 {
     s->bus->dma->aiocb = NULL;
-    s->bus->retry_unit = -1;
-    s->bus->retry_sector_num = 0;
-    s->bus->retry_nsector = 0;
+    ide_clear_retry(s);
     if (s->bus->dma->ops->set_inactive) {
         s->bus->dma->ops->set_inactive(s->bus->dma, more);
     }
@@ -914,9 +926,7 @@ static void ide_sector_start_dma(IDEState *s, enum ide_dma_cmd dma_cmd)
 void ide_start_dma(IDEState *s, BlockCompletionFunc *cb)
 {
     s->io_buffer_index = 0;
-    s->bus->retry_unit = s->unit;
-    s->bus->retry_sector_num = ide_get_sector(s);
-    s->bus->retry_nsector = s->nsector;
+    ide_set_retry(s);
     if (s->bus->dma->ops->start_dma) {
         s->bus->dma->ops->start_dma(s->bus->dma, s, cb);
     }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v5 2/4] ide: set retry_unit for PIO and FLUSH requests
  2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 1/4] ide: refactor retry_unit set and clear into separate function Denis V. Lunev
@ 2016-07-04 14:38 ` Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 3/4] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-07-04 14:38 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush

When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.

This patch restricts retry unit assertion only to ops that actually use
retry logic.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 hw/ide/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index b72346e..14f03d2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -487,6 +487,7 @@ void ide_transfer_start(IDEState *s, uint8_t *buf, int size,
     s->end_transfer_func = end_transfer_func;
     s->data_ptr = buf;
     s->data_end = buf + size;
+    ide_set_retry(s);
     if (!(s->status & ERR_STAT)) {
         s->status |= DRQ_STAT;
     }
@@ -1056,6 +1057,7 @@ static void ide_flush_cache(IDEState *s)
     }
 
     s->status |= BUSY_STAT;
+    ide_set_retry(s);
     block_acct_start(blk_get_stats(s->blk), &s->acct, 0, BLOCK_ACCT_FLUSH);
     s->pio_aiocb = blk_aio_flush(s->blk, ide_flush_cb, s);
 }
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v5 3/4] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 1/4] ide: refactor retry_unit set and clear into separate function Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 2/4] ide: set retry_unit for PIO and FLUSH requests Denis V. Lunev
@ 2016-07-04 14:38 ` Denis V. Lunev
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-07-04 14:53 ` [Qemu-devel] [PATCH v5 0/4] " Paolo Bonzini
  4 siblings, 0 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-07-04 14:38 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Due to changes in flush behaviour clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands
started to fail.

This change adds additional DMA writes to affected tests before sending
flush commands so that bdrv_flush actually generates flush_to_disk event.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 57dc44c..d707714 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
     g_free(tx);
 }
 
+/*
+ * Write sector 0 with random data to make AHCI storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(AHCIQState* ahci, uint8_t port)
+{
+    uint64_t ptr;
+    unsigned bufsize = 512;
+
+    ptr = ahci_alloc(ahci, bufsize);
+    g_assert(ptr);
+
+    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
+    ahci_free(ahci, ptr);
+}
+
 static void test_flush(void)
 {
     AHCIQState *ahci;
+    uint8_t port;
 
     ahci = ahci_boot_and_enable(NULL);
+
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    make_dirty(ahci, port);
+
     ahci_test_flush(ahci);
     ahci_shutdown(ahci);
 }
@@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
                                 debug_path,
                                 tmp_path, imgfmt);
 
-    /* Issue Flush Command and wait for error */
     port = ahci_port_select(ahci);
     ahci_port_clear(ahci, port);
 
+    /* Issue write so that flush actually goes to disk */
+    make_dirty(ahci, port);
+
+    /* Issue Flush Command and wait for error */
     cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
     ahci_guest_io_resume(ahci, cmd);
 
@@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
 
     set_context(src->parent);
 
-    /* Issue Flush Command */
     px = ahci_port_select(src);
     ahci_port_clear(src, px);
+
+    /* Dirty device so that flush reaches disk */
+    make_dirty(src, px);
+
+    /* Issue Flush Command */
     cmd = ahci_command_create(CMD_FLUSH_CACHE);
     ahci_command_commit(src, cmd, px);
     ahci_command_issue_async(src, cmd);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index fed1b2e..8466d0f 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -499,6 +499,39 @@ static void test_identify(void)
     ide_test_quit();
 }
 
+/*
+ * Write sector 0 with random data to make IDE storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(uint8_t device)
+{
+    uint8_t status;
+    size_t len = 512;
+    uintptr_t guest_buf;
+    void* buf;
+
+    guest_buf = guest_alloc(guest_malloc, len);
+    buf = g_malloc(len);
+    g_assert(guest_buf);
+    g_assert(buf);
+
+    memwrite(guest_buf, buf, len);
+
+    PrdtEntry prdt[] = {
+        {
+            .addr = cpu_to_le32(guest_buf),
+            .size = cpu_to_le32(len | PRDT_EOT),
+        },
+    };
+
+    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
+                              ARRAY_SIZE(prdt), NULL);
+    g_assert_cmphex(status, ==, BM_STS_INTR);
+    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
+
+    g_free(buf);
+}
+
 static void test_flush(void)
 {
     uint8_t data;
@@ -507,6 +540,11 @@ static void test_flush(void)
         "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
         tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* Delay the completion of the flush request until we explicitly do it */
     g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
 
@@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
         "rerror=stop,werror=stop",
         debug_path, tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* FLUSH CACHE command on device 0*/
     outb(IDE_BASE + reg_device, 0);
     outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
                   ` (2 preceding siblings ...)
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 3/4] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
@ 2016-07-04 14:38 ` Denis V. Lunev
  2016-07-07 23:04   ` Eric Blake
  2016-07-08 18:44   ` John Snow
  2016-07-04 14:53 ` [Qemu-devel] [PATCH v5 0/4] " Paolo Bonzini
  4 siblings, 2 replies; 13+ messages in thread
From: Denis V. Lunev @ 2016-07-04 14:38 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 block.c                   |  3 +++
 block/io.c                | 18 ++++++++++++++++++
 include/block/block_int.h |  5 +++++
 3 files changed, 26 insertions(+)

diff --git a/block.c b/block.c
index f4648e9..366fad6 100644
--- a/block.c
+++ b/block.c
@@ -234,6 +234,8 @@ BlockDriverState *bdrv_new(void)
     bs->refcnt = 1;
     bs->aio_context = qemu_get_aio_context();
 
+    qemu_co_queue_init(&bs->flush_queue);
+
     QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
 
     return bs;
@@ -2582,6 +2584,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
         ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
         bdrv_dirty_bitmap_truncate(bs);
         bdrv_parent_cb_resize(bs);
+        ++bs->write_gen;
     }
     return ret;
 }
diff --git a/block/io.c b/block/io.c
index 7cf3645..a5451b6 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     }
     bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
 
+    ++bs->write_gen;
     bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
 
     if (bs->wr_highest_offset < offset + bytes) {
@@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 {
     int ret;
     BdrvTrackedRequest req;
+    int current_gen = bs->write_gen;
 
     if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
         bdrv_is_sg(bs)) {
@@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 
     tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
 
+    /* Wait until any previous flushes are completed */
+    while (bs->flush_started_gen != bs->flushed_gen) {
+        qemu_co_queue_wait(&bs->flush_queue);
+    }
+    bs->flush_started_gen = current_gen;
+
     /* Write back all layers by calling one driver function */
     if (bs->drv->bdrv_co_flush) {
         ret = bs->drv->bdrv_co_flush(bs);
@@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
+    /* Check if we really need to flush anything */
+    if (bs->flushed_gen == current_gen) {
+        goto flush_parent;
+    }
+
     BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
     if (bs->drv->bdrv_co_flush_to_disk) {
         ret = bs->drv->bdrv_co_flush_to_disk(bs);
@@ -2279,6 +2292,10 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
 flush_parent:
     ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
 out:
+    /* Notify any pending flushes that we have completed */
+    bs->flushed_gen = current_gen;
+    qemu_co_queue_restart_all(&bs->flush_queue);
+
     tracked_request_end(&req);
     return ret;
 }
@@ -2402,6 +2419,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
     }
     ret = 0;
 out:
+    ++bs->write_gen;
     bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
                    req.bytes >> BDRV_SECTOR_BITS);
     tracked_request_end(&req);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 2057156..8543daf 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -420,6 +420,11 @@ struct BlockDriverState {
                          note this is a reference count */
     bool probed;
 
+    CoQueue flush_queue;            /* Serializing flush queue */
+    unsigned int write_gen;         /* Current data generation */
+    unsigned int flush_started_gen; /* Generation for which flush has started */
+    unsigned int flushed_gen;       /* Flushed write generation */
+
     BlockDriver *drv; /* NULL means no media */
     void *opaque;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean
  2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
                   ` (3 preceding siblings ...)
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean Denis V. Lunev
@ 2016-07-04 14:53 ` Paolo Bonzini
  2016-07-04 15:48   ` Evgeny Yakovlev
  2016-07-07 22:06   ` John Snow
  4 siblings, 2 replies; 13+ messages in thread
From: Paolo Bonzini @ 2016-07-04 14:53 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow

On 04/07/2016 16:38, Denis V. Lunev wrote:
> Changes from v4:
> - Moved to write generation scheme instead of dirty flag
> - Added retry setup to IDE PIO and FLUSH requests
> 
> Changes from v3:
> - Fixed a typo in commit message
> - Rebased on Kevin'n origin/block
> 
> Changes from v2:
> - Better comments
> - Rebased on latest master
> 
> Changes from v1:
> - Flush requests that should be skipped will now wait for completion
> of any previous requests already in flight
> - Fixed IDE and AHCI tests to dirty media for new flush behaviour
> - Fixed a problem in IDE CMD_FLUSH_CACHE failure handling
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> 
> Evgeny Yakovlev (4):
>   ide: refactor retry_unit set and clear into separate function
>   ide: set retry_unit for PIO and FLUSH requests
>   tests: in IDE and AHCI tests perform DMA write before flushing
>   block: ignore flush requests when storage is clean
> 
>  block.c                   |  3 +++
>  block/io.c                | 18 ++++++++++++++++++
>  hw/ide/core.c             | 24 ++++++++++++++++++------
>  include/block/block_int.h |  5 +++++
>  tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
>  tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 119 insertions(+), 8 deletions(-)
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean
  2016-07-04 14:53 ` [Qemu-devel] [PATCH v5 0/4] " Paolo Bonzini
@ 2016-07-04 15:48   ` Evgeny Yakovlev
  2016-07-07 22:06   ` John Snow
  1 sibling, 0 replies; 13+ messages in thread
From: Evgeny Yakovlev @ 2016-07-04 15:48 UTC (permalink / raw)
  To: Paolo Bonzini, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi, John Snow



On 04.07.2016 17:53, Paolo Bonzini wrote:
> On 04/07/2016 16:38, Denis V. Lunev wrote:
>> Changes from v4:
>> - Moved to write generation scheme instead of dirty flag
>> - Added retry setup to IDE PIO and FLUSH requests
>>
>> Changes from v3:
>> - Fixed a typo in commit message
>> - Rebased on Kevin'n origin/block
>>
>> Changes from v2:
>> - Better comments
>> - Rebased on latest master
>>
>> Changes from v1:
>> - Flush requests that should be skipped will now wait for completion
>> of any previous requests already in flight
>> - Fixed IDE and AHCI tests to dirty media for new flush behaviour
>> - Fixed a problem in IDE CMD_FLUSH_CACHE failure handling
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>>
>> Evgeny Yakovlev (4):
>>    ide: refactor retry_unit set and clear into separate function
>>    ide: set retry_unit for PIO and FLUSH requests
>>    tests: in IDE and AHCI tests perform DMA write before flushing
>>    block: ignore flush requests when storage is clean
>>
>>   block.c                   |  3 +++
>>   block/io.c                | 18 ++++++++++++++++++
>>   hw/ide/core.c             | 24 ++++++++++++++++++------
>>   include/block/block_int.h |  5 +++++
>>   tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
>>   tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   6 files changed, 119 insertions(+), 8 deletions(-)
>>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>
> Thanks!

You're welcome!

>
> Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean
  2016-07-04 14:53 ` [Qemu-devel] [PATCH v5 0/4] " Paolo Bonzini
  2016-07-04 15:48   ` Evgeny Yakovlev
@ 2016-07-07 22:06   ` John Snow
  1 sibling, 0 replies; 13+ messages in thread
From: John Snow @ 2016-07-07 22:06 UTC (permalink / raw)
  To: Paolo Bonzini, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi, Fam Zheng



On 07/04/2016 10:53 AM, Paolo Bonzini wrote:
> On 04/07/2016 16:38, Denis V. Lunev wrote:
>> Changes from v4:
>> - Moved to write generation scheme instead of dirty flag
>> - Added retry setup to IDE PIO and FLUSH requests
>>
>> Changes from v3:
>> - Fixed a typo in commit message
>> - Rebased on Kevin'n origin/block
>>
>> Changes from v2:
>> - Better comments
>> - Rebased on latest master
>>
>> Changes from v1:
>> - Flush requests that should be skipped will now wait for completion
>> of any previous requests already in flight
>> - Fixed IDE and AHCI tests to dirty media for new flush behaviour
>> - Fixed a problem in IDE CMD_FLUSH_CACHE failure handling
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>>
>> Evgeny Yakovlev (4):
>>   ide: refactor retry_unit set and clear into separate function
>>   ide: set retry_unit for PIO and FLUSH requests
>>   tests: in IDE and AHCI tests perform DMA write before flushing
>>   block: ignore flush requests when storage is clean
>>
>>  block.c                   |  3 +++
>>  block/io.c                | 18 ++++++++++++++++++
>>  hw/ide/core.c             | 24 ++++++++++++++++++------
>>  include/block/block_int.h |  5 +++++
>>  tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
>>  tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 119 insertions(+), 8 deletions(-)
>>
> 
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Thanks!
> 
> Paolo
> 

I'll stage and send tomorrow, thank you.

--js

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean Denis V. Lunev
@ 2016-07-07 23:04   ` Eric Blake
  2016-07-08 15:19     ` Evgeny Yakovlev
  2016-07-08 18:44   ` John Snow
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Blake @ 2016-07-07 23:04 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz,
	Stefan Hajnoczi, John Snow

[-- Attachment #1: Type: text/plain, Size: 3169 bytes --]

On 07/04/2016 08:38 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a write generation scheme in BlockDriverState.
> Current write generation is checked against last flushed generation to
> avoid unnessesary flushes.
> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 

> +++ b/block/io.c
> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>      }
>      bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>  
> +    ++bs->write_gen;

Why pre-increment?  Most code uses post-increment when done as a
statement in isolation.

>      bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>  
>      if (bs->wr_highest_offset < offset + bytes) {
> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  {
>      int ret;
>      BdrvTrackedRequest req;
> +    int current_gen = bs->write_gen;
>  
>      if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>          bdrv_is_sg(bs)) {
> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  
>      tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>  
> +    /* Wait until any previous flushes are completed */
> +    while (bs->flush_started_gen != bs->flushed_gen) {

Should this be an inequality, as in s/!=/</, in case several flushes can
be started in parallel and where the later flush ends up finishing
before the earlier flush?

> +        qemu_co_queue_wait(&bs->flush_queue);
> +    }
> +    bs->flush_started_gen = current_gen;
> +
>      /* Write back all layers by calling one driver function */
>      if (bs->drv->bdrv_co_flush) {
>          ret = bs->drv->bdrv_co_flush(bs);
> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>          goto flush_parent;
>      }
>  
> +    /* Check if we really need to flush anything */
> +    if (bs->flushed_gen == current_gen) {

Likewise, if you are tracking generations, should this be s/==/<=/ (am I
getting the direction correct)?

> +++ b/include/block/block_int.h
> @@ -420,6 +420,11 @@ struct BlockDriverState {
>                           note this is a reference count */
>      bool probed;
>  
> +    CoQueue flush_queue;            /* Serializing flush queue */
> +    unsigned int write_gen;         /* Current data generation */
> +    unsigned int flush_started_gen; /* Generation for which flush has started */
> +    unsigned int flushed_gen;       /* Flushed write generation */

Should these be 64-bit integers to avoid risk of overflow after just
2^32 flush attempts?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-07 23:04   ` Eric Blake
@ 2016-07-08 15:19     ` Evgeny Yakovlev
  0 siblings, 0 replies; 13+ messages in thread
From: Evgeny Yakovlev @ 2016-07-08 15:19 UTC (permalink / raw)
  To: Eric Blake, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi, John Snow



On 08.07.2016 02:04, Eric Blake wrote:
> On 07/04/2016 08:38 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a write generation scheme in BlockDriverState.
>> Current write generation is checked against last flushed generation to
>> avoid unnessesary flushes.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> +++ b/block/io.c
>> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>>       }
>>       bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>>   
>> +    ++bs->write_gen;
> Why pre-increment?  Most code uses post-increment when done as a
> statement in isolation.

Just a habit of mine, from C++ days, where you can never be sure if 
someone overloaded post-increment operator and it will then generate a 
temporary object because post-increment is defined to return previous 
value. Now it's just a muscle memory :)

>
>>       bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>>   
>>       if (bs->wr_highest_offset < offset + bytes) {
>> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   {
>>       int ret;
>>       BdrvTrackedRequest req;
>> +    int current_gen = bs->write_gen;
>>   
>>       if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>>           bdrv_is_sg(bs)) {
>> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   
>>       tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>>   
>> +    /* Wait until any previous flushes are completed */
>> +    while (bs->flush_started_gen != bs->flushed_gen) {
> Should this be an inequality, as in s/!=/</, in case several flushes can
> be started in parallel and where the later flush ends up finishing
> before the earlier flush?

flush_started_gen is always ahead of flushed_gen or is equal to it no 
matter how many requests we have in flight. using != allows us to ignore 
checking for unsigned overflow (which you also mention in this email).

>
>> +        qemu_co_queue_wait(&bs->flush_queue);
>> +    }
>> +    bs->flush_started_gen = current_gen;
>> +
>>       /* Write back all layers by calling one driver function */
>>       if (bs->drv->bdrv_co_flush) {
>>           ret = bs->drv->bdrv_co_flush(bs);
>> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>           goto flush_parent;
>>       }
>>   
>> +    /* Check if we really need to flush anything */
>> +    if (bs->flushed_gen == current_gen) {
> Likewise, if you are tracking generations, should this be s/==/<=/ (am I
> getting the direction correct)?

Same here - '==' is so that we don't have to worry about unsigned overflow.

>
>> +++ b/include/block/block_int.h
>> @@ -420,6 +420,11 @@ struct BlockDriverState {
>>                            note this is a reference count */
>>       bool probed;
>>   
>> +    CoQueue flush_queue;            /* Serializing flush queue */
>> +    unsigned int write_gen;         /* Current data generation */
>> +    unsigned int flush_started_gen; /* Generation for which flush has started */
>> +    unsigned int flushed_gen;       /* Flushed write generation */
> Should these be 64-bit integers to avoid risk of overflow after just
> 2^32 flush attempts?
>

We don't have to care about unsigned overflow. It has a well defined 
behavior and we only check if generations are equal or not.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-07-07 23:04   ` Eric Blake
@ 2016-07-08 18:44   ` John Snow
  2016-07-11 10:12     ` Evgeny Yakovlev
  1 sibling, 1 reply; 13+ messages in thread
From: John Snow @ 2016-07-08 18:44 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi



On 07/04/2016 10:38 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a write generation scheme in BlockDriverState.
> Current write generation is checked against last flushed generation to
> avoid unnessesary flushes.
> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---
>  block.c                   |  3 +++
>  block/io.c                | 18 ++++++++++++++++++
>  include/block/block_int.h |  5 +++++
>  3 files changed, 26 insertions(+)
> 
> diff --git a/block.c b/block.c
> index f4648e9..366fad6 100644
> --- a/block.c
> +++ b/block.c
> @@ -234,6 +234,8 @@ BlockDriverState *bdrv_new(void)
>      bs->refcnt = 1;
>      bs->aio_context = qemu_get_aio_context();
>  
> +    qemu_co_queue_init(&bs->flush_queue);
> +
>      QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
>  
>      return bs;
> @@ -2582,6 +2584,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>          bdrv_dirty_bitmap_truncate(bs);
>          bdrv_parent_cb_resize(bs);
> +        ++bs->write_gen;
>      }
>      return ret;
>  }
> diff --git a/block/io.c b/block/io.c
> index 7cf3645..a5451b6 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>      }
>      bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>  
> +    ++bs->write_gen;
>      bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>  
>      if (bs->wr_highest_offset < offset + bytes) {
> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  {
>      int ret;
>      BdrvTrackedRequest req;
> +    int current_gen = bs->write_gen;
>  
>      if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>          bdrv_is_sg(bs)) {
> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  
>      tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>  
> +    /* Wait until any previous flushes are completed */
> +    while (bs->flush_started_gen != bs->flushed_gen) {
> +        qemu_co_queue_wait(&bs->flush_queue);
> +    }
> +    bs->flush_started_gen = current_gen;
> +
>      /* Write back all layers by calling one driver function */
>      if (bs->drv->bdrv_co_flush) {
>          ret = bs->drv->bdrv_co_flush(bs);
> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>          goto flush_parent;
>      }
>  
> +    /* Check if we really need to flush anything */
> +    if (bs->flushed_gen == current_gen) {
> +        goto flush_parent;
> +    }
> +
>      BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>      if (bs->drv->bdrv_co_flush_to_disk) {
>          ret = bs->drv->bdrv_co_flush_to_disk(bs);
> @@ -2279,6 +2292,10 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>  flush_parent:
>      ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
>  out:
> +    /* Notify any pending flushes that we have completed */
> +    bs->flushed_gen = current_gen;
> +    qemu_co_queue_restart_all(&bs->flush_queue);
> +
>      tracked_request_end(&req);
>      return ret;
>  }
> @@ -2402,6 +2419,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>      }
>      ret = 0;
>  out:
> +    ++bs->write_gen;
>      bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
>                     req.bytes >> BDRV_SECTOR_BITS);
>      tracked_request_end(&req);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 2057156..8543daf 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -420,6 +420,11 @@ struct BlockDriverState {
>                           note this is a reference count */
>      bool probed;
>  
> +    CoQueue flush_queue;            /* Serializing flush queue */
> +    unsigned int write_gen;         /* Current data generation */
> +    unsigned int flush_started_gen; /* Generation for which flush has started */
> +    unsigned int flushed_gen;       /* Flushed write generation */
> +
>      BlockDriver *drv; /* NULL means no media */
>      void *opaque;
>  
> 

Breaks qcow2 iotests 026 089 141 144

--js

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-08 18:44   ` John Snow
@ 2016-07-11 10:12     ` Evgeny Yakovlev
  2016-07-11 21:01       ` John Snow
  0 siblings, 1 reply; 13+ messages in thread
From: Evgeny Yakovlev @ 2016-07-11 10:12 UTC (permalink / raw)
  To: John Snow, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi



On 08.07.2016 21:44, John Snow wrote:
>
> On 07/04/2016 10:38 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a write generation scheme in BlockDriverState.
>> Current write generation is checked against last flushed generation to
>> avoid unnessesary flushes.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>   block.c                   |  3 +++
>>   block/io.c                | 18 ++++++++++++++++++
>>   include/block/block_int.h |  5 +++++
>>   3 files changed, 26 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index f4648e9..366fad6 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -234,6 +234,8 @@ BlockDriverState *bdrv_new(void)
>>       bs->refcnt = 1;
>>       bs->aio_context = qemu_get_aio_context();
>>   
>> +    qemu_co_queue_init(&bs->flush_queue);
>> +
>>       QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
>>   
>>       return bs;
>> @@ -2582,6 +2584,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>           bdrv_dirty_bitmap_truncate(bs);
>>           bdrv_parent_cb_resize(bs);
>> +        ++bs->write_gen;
>>       }
>>       return ret;
>>   }
>> diff --git a/block/io.c b/block/io.c
>> index 7cf3645..a5451b6 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -1294,6 +1294,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
>>       }
>>       bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>>   
>> +    ++bs->write_gen;
>>       bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>>   
>>       if (bs->wr_highest_offset < offset + bytes) {
>> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   {
>>       int ret;
>>       BdrvTrackedRequest req;
>> +    int current_gen = bs->write_gen;
>>   
>>       if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>>           bdrv_is_sg(bs)) {
>> @@ -2219,6 +2221,12 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   
>>       tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>>   
>> +    /* Wait until any previous flushes are completed */
>> +    while (bs->flush_started_gen != bs->flushed_gen) {
>> +        qemu_co_queue_wait(&bs->flush_queue);
>> +    }
>> +    bs->flush_started_gen = current_gen;
>> +
>>       /* Write back all layers by calling one driver function */
>>       if (bs->drv->bdrv_co_flush) {
>>           ret = bs->drv->bdrv_co_flush(bs);
>> @@ -2239,6 +2247,11 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>           goto flush_parent;
>>       }
>>   
>> +    /* Check if we really need to flush anything */
>> +    if (bs->flushed_gen == current_gen) {
>> +        goto flush_parent;
>> +    }
>> +
>>       BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>>       if (bs->drv->bdrv_co_flush_to_disk) {
>>           ret = bs->drv->bdrv_co_flush_to_disk(bs);
>> @@ -2279,6 +2292,10 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>   flush_parent:
>>       ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
>>   out:
>> +    /* Notify any pending flushes that we have completed */
>> +    bs->flushed_gen = current_gen;
>> +    qemu_co_queue_restart_all(&bs->flush_queue);
>> +
>>       tracked_request_end(&req);
>>       return ret;
>>   }
>> @@ -2402,6 +2419,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>>       }
>>       ret = 0;
>>   out:
>> +    ++bs->write_gen;
>>       bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
>>                      req.bytes >> BDRV_SECTOR_BITS);
>>       tracked_request_end(&req);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 2057156..8543daf 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -420,6 +420,11 @@ struct BlockDriverState {
>>                            note this is a reference count */
>>       bool probed;
>>   
>> +    CoQueue flush_queue;            /* Serializing flush queue */
>> +    unsigned int write_gen;         /* Current data generation */
>> +    unsigned int flush_started_gen; /* Generation for which flush has started */
>> +    unsigned int flushed_gen;       /* Flushed write generation */
>> +
>>       BlockDriver *drv; /* NULL means no media */
>>       void *opaque;
>>   
>>
> Breaks qcow2 iotests 026 089 141 144

Sorry, didn't knew those tests existed, only ran make check previously.
Looking at 026, looks like it is the same problem as in IDE and AHCI. 
Test case injects blkdebug write errors which should be triggered by 
flushes and expects to see them in output. However those flushes are now 
skipped and no events are generated. Otherwise resulting image looks 
consistent, all data was flushed. Expect the same problem to be with 
other tests, but maybe test case is incorrect now?

>
> --js

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean
  2016-07-11 10:12     ` Evgeny Yakovlev
@ 2016-07-11 21:01       ` John Snow
  0 siblings, 0 replies; 13+ messages in thread
From: John Snow @ 2016-07-11 21:01 UTC (permalink / raw)
  To: Evgeny Yakovlev, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi



On 07/11/2016 06:12 AM, Evgeny Yakovlev wrote:
> 
> 
> On 08.07.2016 21:44, John Snow wrote:
>>
>> On 07/04/2016 10:38 AM, Denis V. Lunev wrote:
>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>
>>> Some guests (win2008 server for example) do a lot of unnecessary
>>> flushing when underlying media has not changed. This adds additional
>>> overhead on host when calling fsync/fdatasync.
>>>
>>> This change introduces a write generation scheme in BlockDriverState.
>>> Current write generation is checked against last flushed generation to
>>> avoid unnessesary flushes.
>>>
>>> The problem with excessive flushing was found by a performance test
>>> which does parallel directory tree creation (from 2 processes).
>>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>>> Each loop creates 10^3 directories with 10 files in each.
>>>
>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <famz@redhat.com>
>>> CC: John Snow <jsnow@redhat.com>
>>> ---
>>>   block.c                   |  3 +++
>>>   block/io.c                | 18 ++++++++++++++++++
>>>   include/block/block_int.h |  5 +++++
>>>   3 files changed, 26 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index f4648e9..366fad6 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -234,6 +234,8 @@ BlockDriverState *bdrv_new(void)
>>>       bs->refcnt = 1;
>>>       bs->aio_context = qemu_get_aio_context();
>>>   +    qemu_co_queue_init(&bs->flush_queue);
>>> +
>>>       QTAILQ_INSERT_TAIL(&all_bdrv_states, bs, bs_list);
>>>         return bs;
>>> @@ -2582,6 +2584,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t
>>> offset)
>>>           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>>           bdrv_dirty_bitmap_truncate(bs);
>>>           bdrv_parent_cb_resize(bs);
>>> +        ++bs->write_gen;
>>>       }
>>>       return ret;
>>>   }
>>> diff --git a/block/io.c b/block/io.c
>>> index 7cf3645..a5451b6 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -1294,6 +1294,7 @@ static int coroutine_fn
>>> bdrv_aligned_pwritev(BlockDriverState *bs,
>>>       }
>>>       bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE);
>>>   +    ++bs->write_gen;
>>>       bdrv_set_dirty(bs, start_sector, end_sector - start_sector);
>>>         if (bs->wr_highest_offset < offset + bytes) {
>>> @@ -2211,6 +2212,7 @@ int coroutine_fn bdrv_co_flush(BlockDriverState
>>> *bs)
>>>   {
>>>       int ret;
>>>       BdrvTrackedRequest req;
>>> +    int current_gen = bs->write_gen;
>>>         if (!bs || !bdrv_is_inserted(bs) || bdrv_is_read_only(bs) ||
>>>           bdrv_is_sg(bs)) {
>>> @@ -2219,6 +2221,12 @@ int coroutine_fn
>>> bdrv_co_flush(BlockDriverState *bs)
>>>         tracked_request_begin(&req, bs, 0, 0, BDRV_TRACKED_FLUSH);
>>>   +    /* Wait until any previous flushes are completed */
>>> +    while (bs->flush_started_gen != bs->flushed_gen) {
>>> +        qemu_co_queue_wait(&bs->flush_queue);
>>> +    }
>>> +    bs->flush_started_gen = current_gen;
>>> +
>>>       /* Write back all layers by calling one driver function */
>>>       if (bs->drv->bdrv_co_flush) {
>>>           ret = bs->drv->bdrv_co_flush(bs);
>>> @@ -2239,6 +2247,11 @@ int coroutine_fn
>>> bdrv_co_flush(BlockDriverState *bs)
>>>           goto flush_parent;
>>>       }
>>>   +    /* Check if we really need to flush anything */
>>> +    if (bs->flushed_gen == current_gen) {
>>> +        goto flush_parent;
>>> +    }
>>> +
>>>       BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>>>       if (bs->drv->bdrv_co_flush_to_disk) {
>>>           ret = bs->drv->bdrv_co_flush_to_disk(bs);
>>> @@ -2279,6 +2292,10 @@ int coroutine_fn
>>> bdrv_co_flush(BlockDriverState *bs)
>>>   flush_parent:
>>>       ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
>>>   out:
>>> +    /* Notify any pending flushes that we have completed */
>>> +    bs->flushed_gen = current_gen;
>>> +    qemu_co_queue_restart_all(&bs->flush_queue);
>>> +
>>>       tracked_request_end(&req);
>>>       return ret;
>>>   }
>>> @@ -2402,6 +2419,7 @@ int coroutine_fn
>>> bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
>>>       }
>>>       ret = 0;
>>>   out:
>>> +    ++bs->write_gen;
>>>       bdrv_set_dirty(bs, req.offset >> BDRV_SECTOR_BITS,
>>>                      req.bytes >> BDRV_SECTOR_BITS);
>>>       tracked_request_end(&req);
>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>> index 2057156..8543daf 100644
>>> --- a/include/block/block_int.h
>>> +++ b/include/block/block_int.h
>>> @@ -420,6 +420,11 @@ struct BlockDriverState {
>>>                            note this is a reference count */
>>>       bool probed;
>>>   +    CoQueue flush_queue;            /* Serializing flush queue */
>>> +    unsigned int write_gen;         /* Current data generation */
>>> +    unsigned int flush_started_gen; /* Generation for which flush
>>> has started */
>>> +    unsigned int flushed_gen;       /* Flushed write generation */
>>> +
>>>       BlockDriver *drv; /* NULL means no media */
>>>       void *opaque;
>>>  
>> Breaks qcow2 iotests 026 089 141 144
> 
> Sorry, didn't knew those tests existed, only ran make check previously.
> Looking at 026, looks like it is the same problem as in IDE and AHCI.
> Test case injects blkdebug write errors which should be triggered by
> flushes and expects to see them in output. However those flushes are now
> skipped and no events are generated. Otherwise resulting image looks
> consistent, all data was flushed. Expect the same problem to be with
> other tests, but maybe test case is incorrect now?
> 

No problem -- these tests don't *always* get run before merge but I like
to enforce it for my tree.

Yes, it just looks like most of the test output has to be updated.

In the case of 026, it looks like most test cases that are testing for
failures have other error text to confirm that the error did indeed
happen, and that the flush messages are just "extraneous" failure
messages -- i.e. not the primary effect being tested for, so it should
be safe to just update the reference output.

I suspect the other tests are similar.

Thank you,
--John

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-07-11 21:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-04 14:38 [Qemu-devel] [PATCH v5 0/4] block: ignore flush requests when storage is clean Denis V. Lunev
2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 1/4] ide: refactor retry_unit set and clear into separate function Denis V. Lunev
2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 2/4] ide: set retry_unit for PIO and FLUSH requests Denis V. Lunev
2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 3/4] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
2016-07-04 14:38 ` [Qemu-devel] [PATCH v5 4/4] block: ignore flush requests when storage is clean Denis V. Lunev
2016-07-07 23:04   ` Eric Blake
2016-07-08 15:19     ` Evgeny Yakovlev
2016-07-08 18:44   ` John Snow
2016-07-11 10:12     ` Evgeny Yakovlev
2016-07-11 21:01       ` John Snow
2016-07-04 14:53 ` [Qemu-devel] [PATCH v5 0/4] " Paolo Bonzini
2016-07-04 15:48   ` Evgeny Yakovlev
2016-07-07 22:06   ` John Snow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.