All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/3] block: ignore flush requests when storage is clean
@ 2016-06-27 14:47 Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 1/3] " Denis V. Lunev
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-27 14:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

Changes from v3:
- Fixed a typo in commit message
- Rebased on Kevin'n origin/block

Changes from v2:
- Better comments
- Rebased on latest master

Changes from v1:
- Flush requests that should be skipped will now wait for completion
  of any previous requests already in flight
- Fixed IDE and AHCI tests to dirty media for new flush behaviour
- Fixed a problem in IDE CMD_FLUSH_CACHE failure handling

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>

Evgeny Yakovlev (3):
  block: ignore flush requests when storage is clean
  ide: ignore retry_unit check for non-retry operation
  tests: in IDE and AHCI tests perform DMA write before flushing

 block.c                   |  1 +
 block/dirty-bitmap.c      |  3 +++
 block/io.c                | 19 +++++++++++++++++++
 hw/ide/core.c             |  3 ++-
 include/block/block_int.h |  1 +
 tests/ahci-test.c         | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c          | 43 +++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 101 insertions(+), 3 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-27 14:47 [Qemu-devel] [PATCH v4 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
@ 2016-06-27 14:47 ` Denis V. Lunev
  2016-06-28  1:27   ` Fam Zheng
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 2/3] ide: ignore retry_unit check for non-retry operation Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
  2 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-27 14:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a dirty flag in BlockDriverState which is set
in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
avoid unnecessary flushing when storage is clean.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 block.c                   |  1 +
 block/dirty-bitmap.c      |  3 +++
 block/io.c                | 19 +++++++++++++++++++
 include/block/block_int.h |  1 +
 4 files changed, 24 insertions(+)

diff --git a/block.c b/block.c
index 947df29..68ae3a0 100644
--- a/block.c
+++ b/block.c
@@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
         ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
         bdrv_dirty_bitmap_truncate(bs);
         bdrv_parent_cb_resize(bs);
+        bs->dirty = true; /* file node sync is needed after truncate */
     }
     return ret;
 }
diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
index 4902ca5..54e0413 100644
--- a/block/dirty-bitmap.c
+++ b/block/dirty-bitmap.c
@@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
         }
         hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
     }
+
+    /* Set global block driver dirty flag even if bitmap is disabled */
+    bs->dirty = true;
 }
 
 /**
diff --git a/block/io.c b/block/io.c
index b9e53e3..152f5a9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
         goto flush_parent;
     }
 
+    /* Check if storage is actually dirty before flushing to disk */
+    if (!bs->dirty) {
+        /* Flush requests are appended to tracked request list in order so that
+         * most recent request is at the head of the list. Following code uses
+         * this ordering to wait for the most recent flush request to complete
+         * to ensure that requests return in order */
+        BdrvTrackedRequest *prev_req;
+        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
+            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
+                continue;
+            }
+
+            qemu_co_queue_wait(&prev_req->wait_queue);
+            break;
+        }
+        goto flush_parent;
+    }
+    bs->dirty = false;
+
     BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
     if (bs->drv->bdrv_co_flush_to_disk) {
         ret = bs->drv->bdrv_co_flush_to_disk(bs);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0432ba5..59a7def 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -435,6 +435,7 @@ struct BlockDriverState {
     bool valid_key; /* if true, a valid encryption key has been set */
     bool sg;        /* if true, the device is a /dev/sg* */
     bool probed;    /* if true, format was probed rather than specified */
+    bool dirty;     /* if true, media is dirty and should be flushed */
 
     int copy_on_read; /* if nonzero, copy read backing sectors into image.
                          note this is a reference count */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v4 2/3] ide: ignore retry_unit check for non-retry operation
  2016-06-27 14:47 [Qemu-devel] [PATCH v4 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 1/3] " Denis V. Lunev
@ 2016-06-27 14:47 ` Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
  2 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-27 14:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush

When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.

This patch restricts retry unit assertion only to ops that actually use
retry logic.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 hw/ide/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 029f6b9..17f884b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -779,7 +779,8 @@ int ide_handle_rw_error(IDEState *s, int error, int op)
     BlockErrorAction action = blk_get_error_action(s->blk, is_read, error);
 
     if (action == BLOCK_ERROR_ACTION_STOP) {
-        assert(s->bus->retry_unit == s->unit);
+        assert(!(IS_IDE_RETRY_DMA(op) || IS_IDE_RETRY_PIO(op))
+                || s->bus->retry_unit == s->unit);
         s->bus->error_status = op;
     } else if (action == BLOCK_ERROR_ACTION_REPORT) {
         block_acct_failed(blk_get_stats(s->blk), &s->acct);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-27 14:47 [Qemu-devel] [PATCH v4 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 1/3] " Denis V. Lunev
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 2/3] ide: ignore retry_unit check for non-retry operation Denis V. Lunev
@ 2016-06-27 14:47 ` Denis V. Lunev
  2016-06-27 23:19   ` John Snow
  2 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-27 14:47 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: den, Evgeny Yakovlev, Kevin Wolf, Max Reitz, Stefan Hajnoczi,
	Fam Zheng, John Snow

From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>

Due to changes in flush behaviour clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands
started to fail.

This change adds additional DMA writes to affected tests before sending
flush commands so that bdrv_flush actually generates flush_to_disk event.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
---
 tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
 tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tests/ahci-test.c b/tests/ahci-test.c
index 57dc44c..d707714 100644
--- a/tests/ahci-test.c
+++ b/tests/ahci-test.c
@@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
     g_free(tx);
 }
 
+/*
+ * Write sector 0 with random data to make AHCI storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(AHCIQState* ahci, uint8_t port)
+{
+    uint64_t ptr;
+    unsigned bufsize = 512;
+
+    ptr = ahci_alloc(ahci, bufsize);
+    g_assert(ptr);
+
+    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
+    ahci_free(ahci, ptr);
+}
+
 static void test_flush(void)
 {
     AHCIQState *ahci;
+    uint8_t port;
 
     ahci = ahci_boot_and_enable(NULL);
+
+    port = ahci_port_select(ahci);
+    ahci_port_clear(ahci, port);
+
+    make_dirty(ahci, port);
+
     ahci_test_flush(ahci);
     ahci_shutdown(ahci);
 }
@@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
                                 debug_path,
                                 tmp_path, imgfmt);
 
-    /* Issue Flush Command and wait for error */
     port = ahci_port_select(ahci);
     ahci_port_clear(ahci, port);
 
+    /* Issue write so that flush actually goes to disk */
+    make_dirty(ahci, port);
+
+    /* Issue Flush Command and wait for error */
     cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
     ahci_guest_io_resume(ahci, cmd);
 
@@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
 
     set_context(src->parent);
 
-    /* Issue Flush Command */
     px = ahci_port_select(src);
     ahci_port_clear(src, px);
+
+    /* Dirty device so that flush reaches disk */
+    make_dirty(src, px);
+
+    /* Issue Flush Command */
     cmd = ahci_command_create(CMD_FLUSH_CACHE);
     ahci_command_commit(src, cmd, px);
     ahci_command_issue_async(src, cmd);
diff --git a/tests/ide-test.c b/tests/ide-test.c
index fed1b2e..8466d0f 100644
--- a/tests/ide-test.c
+++ b/tests/ide-test.c
@@ -499,6 +499,39 @@ static void test_identify(void)
     ide_test_quit();
 }
 
+/*
+ * Write sector 0 with random data to make IDE storage dirty
+ * Needed for flush tests so that flushes actually go though the block layer
+ */
+static void make_dirty(uint8_t device)
+{
+    uint8_t status;
+    size_t len = 512;
+    uintptr_t guest_buf;
+    void* buf;
+
+    guest_buf = guest_alloc(guest_malloc, len);
+    buf = g_malloc(len);
+    g_assert(guest_buf);
+    g_assert(buf);
+
+    memwrite(guest_buf, buf, len);
+
+    PrdtEntry prdt[] = {
+        {
+            .addr = cpu_to_le32(guest_buf),
+            .size = cpu_to_le32(len | PRDT_EOT),
+        },
+    };
+
+    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
+                              ARRAY_SIZE(prdt), NULL);
+    g_assert_cmphex(status, ==, BM_STS_INTR);
+    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
+
+    g_free(buf);
+}
+
 static void test_flush(void)
 {
     uint8_t data;
@@ -507,6 +540,11 @@ static void test_flush(void)
         "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
         tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* Delay the completion of the flush request until we explicitly do it */
     g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
 
@@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
         "rerror=stop,werror=stop",
         debug_path, tmp_path);
 
+    qtest_irq_intercept_in(global_qtest, "ioapic");
+
+    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
+    make_dirty(0);
+
     /* FLUSH CACHE command on device 0*/
     outb(IDE_BASE + reg_device, 0);
     outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
@ 2016-06-27 23:19   ` John Snow
  2016-06-28  9:11     ` Denis V. Lunev
  2016-06-28  9:21     ` Evgeny Yakovlev
  0 siblings, 2 replies; 14+ messages in thread
From: John Snow @ 2016-06-27 23:19 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi



On 06/27/2016 10:47 AM, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Due to changes in flush behaviour clean disks stopped generating
> flush_to_disk events and IDE and AHCI tests that test flush commands
> started to fail.
> 
> This change adds additional DMA writes to affected tests before sending
> flush commands so that bdrv_flush actually generates flush_to_disk event.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---
>  tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
>  tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 75 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
> index 57dc44c..d707714 100644
> --- a/tests/ahci-test.c
> +++ b/tests/ahci-test.c
> @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
>      g_free(tx);
>  }
>  
> +/*
> + * Write sector 0 with random data to make AHCI storage dirty
> + * Needed for flush tests so that flushes actually go though the block layer
> + */
> +static void make_dirty(AHCIQState* ahci, uint8_t port)
> +{
> +    uint64_t ptr;
> +    unsigned bufsize = 512;
> +
> +    ptr = ahci_alloc(ahci, bufsize);
> +    g_assert(ptr);
> +
> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
> +    ahci_free(ahci, ptr);
> +}
> +
>  static void test_flush(void)
>  {
>      AHCIQState *ahci;
> +    uint8_t port;
>  
>      ahci = ahci_boot_and_enable(NULL);
> +
> +    port = ahci_port_select(ahci);
> +    ahci_port_clear(ahci, port);
> +
> +    make_dirty(ahci, port);
> +
>      ahci_test_flush(ahci);
>      ahci_shutdown(ahci);
>  }
> @@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
>                                  debug_path,
>                                  tmp_path, imgfmt);
>  
> -    /* Issue Flush Command and wait for error */
>      port = ahci_port_select(ahci);
>      ahci_port_clear(ahci, port);
>  
> +    /* Issue write so that flush actually goes to disk */
> +    make_dirty(ahci, port);
> +
> +    /* Issue Flush Command and wait for error */
>      cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
>      ahci_guest_io_resume(ahci, cmd);
>  
> @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
>  
>      set_context(src->parent);
>  
> -    /* Issue Flush Command */
>      px = ahci_port_select(src);
>      ahci_port_clear(src, px);
> +
> +    /* Dirty device so that flush reaches disk */
> +    make_dirty(src, px);
> +
> +    /* Issue Flush Command */
>      cmd = ahci_command_create(CMD_FLUSH_CACHE);
>      ahci_command_commit(src, cmd, px);
>      ahci_command_issue_async(src, cmd);
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index fed1b2e..8466d0f 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -499,6 +499,39 @@ static void test_identify(void)
>      ide_test_quit();
>  }
>  
> +/*
> + * Write sector 0 with random data to make IDE storage dirty
> + * Needed for flush tests so that flushes actually go though the block layer
> + */
> +static void make_dirty(uint8_t device)
> +{
> +    uint8_t status;
> +    size_t len = 512;
> +    uintptr_t guest_buf;
> +    void* buf;
> +
> +    guest_buf = guest_alloc(guest_malloc, len);
> +    buf = g_malloc(len);
> +    g_assert(guest_buf);
> +    g_assert(buf);
> +
> +    memwrite(guest_buf, buf, len);
> +
> +    PrdtEntry prdt[] = {
> +        {
> +            .addr = cpu_to_le32(guest_buf),
> +            .size = cpu_to_le32(len | PRDT_EOT),
> +        },
> +    };
> +
> +    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
> +                              ARRAY_SIZE(prdt), NULL);
> +    g_assert_cmphex(status, ==, BM_STS_INTR);
> +    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
> +
> +    g_free(buf);
> +}
> +
>  static void test_flush(void)
>  {
>      uint8_t data;
> @@ -507,6 +540,11 @@ static void test_flush(void)
>          "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
>          tmp_path);
>  
> +    qtest_irq_intercept_in(global_qtest, "ioapic");
> +
> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
> +    make_dirty(0);
> +
>      /* Delay the completion of the flush request until we explicitly do it */
>      g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
>  
> @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
>          "rerror=stop,werror=stop",
>          debug_path, tmp_path);
>  
> +    qtest_irq_intercept_in(global_qtest, "ioapic");
> +
> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
> +    make_dirty(0);
> +
>      /* FLUSH CACHE command on device 0*/
>      outb(IDE_BASE + reg_device, 0);
>      outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
> 

Reviewed-by: John Snow <jsnow@redhat.com>

However, the patch-set needs to be re-ordered as 3-1-2. Intermediate
patch commits cannot have failing tests.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 1/3] " Denis V. Lunev
@ 2016-06-28  1:27   ` Fam Zheng
  2016-06-28  9:10     ` Denis V. Lunev
  0 siblings, 1 reply; 14+ messages in thread
From: Fam Zheng @ 2016-06-28  1:27 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: qemu-block, qemu-devel, Evgeny Yakovlev, Kevin Wolf, Max Reitz,
	Stefan Hajnoczi, John Snow

On Mon, 06/27 17:47, Denis V. Lunev wrote:
> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> 
> Some guests (win2008 server for example) do a lot of unnecessary
> flushing when underlying media has not changed. This adds additional
> overhead on host when calling fsync/fdatasync.
> 
> This change introduces a dirty flag in BlockDriverState which is set
> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> avoid unnecessary flushing when storage is clean.
> 
> The problem with excessive flushing was found by a performance test
> which does parallel directory tree creation (from 2 processes).
> Results improved from 0.424 loops/sec to 0.432 loops/sec.
> Each loop creates 10^3 directories with 10 files in each.
> 
> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <famz@redhat.com>
> CC: John Snow <jsnow@redhat.com>
> ---
>  block.c                   |  1 +
>  block/dirty-bitmap.c      |  3 +++
>  block/io.c                | 19 +++++++++++++++++++
>  include/block/block_int.h |  1 +
>  4 files changed, 24 insertions(+)
> 
> diff --git a/block.c b/block.c
> index 947df29..68ae3a0 100644
> --- a/block.c
> +++ b/block.c
> @@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>          ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>          bdrv_dirty_bitmap_truncate(bs);
>          bdrv_parent_cb_resize(bs);
> +        bs->dirty = true; /* file node sync is needed after truncate */
>      }
>      return ret;
>  }
> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> index 4902ca5..54e0413 100644
> --- a/block/dirty-bitmap.c
> +++ b/block/dirty-bitmap.c
> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>          }
>          hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>      }
> +
> +    /* Set global block driver dirty flag even if bitmap is disabled */
> +    bs->dirty = true;
>  }
>  
>  /**
> diff --git a/block/io.c b/block/io.c
> index b9e53e3..152f5a9 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>          goto flush_parent;
>      }
>  
> +    /* Check if storage is actually dirty before flushing to disk */
> +    if (!bs->dirty) {
> +        /* Flush requests are appended to tracked request list in order so that
> +         * most recent request is at the head of the list. Following code uses
> +         * this ordering to wait for the most recent flush request to complete
> +         * to ensure that requests return in order */
> +        BdrvTrackedRequest *prev_req;
> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
> +                continue;
> +            }
> +
> +            qemu_co_queue_wait(&prev_req->wait_queue);
> +            break;
> +        }
> +        goto flush_parent;

Should we check bs->dirty again after qemu_co_queue_wait()? I think another
write request could sneak in while this coroutine yields.

> +    }
> +    bs->dirty = false;
> +
>      BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>      if (bs->drv->bdrv_co_flush_to_disk) {
>          ret = bs->drv->bdrv_co_flush_to_disk(bs);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 0432ba5..59a7def 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -435,6 +435,7 @@ struct BlockDriverState {
>      bool valid_key; /* if true, a valid encryption key has been set */
>      bool sg;        /* if true, the device is a /dev/sg* */
>      bool probed;    /* if true, format was probed rather than specified */
> +    bool dirty;     /* if true, media is dirty and should be flushed */

How about renaming this to "need_flush"? The one "dirty" we had is set by
bdrv_set_dirty, and cleared by bdrv_reset_dirty_bitmap. I'd avoid the
confusion between the two concepts.

Fam

>  
>      int copy_on_read; /* if nonzero, copy read backing sectors into image.
>                           note this is a reference count */
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-28  1:27   ` Fam Zheng
@ 2016-06-28  9:10     ` Denis V. Lunev
  2016-06-29  1:12       ` Fam Zheng
  0 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-28  9:10 UTC (permalink / raw)
  To: Fam Zheng
  Cc: qemu-block, qemu-devel, Evgeny Yakovlev, Kevin Wolf, Max Reitz,
	Stefan Hajnoczi, John Snow

On 06/28/2016 04:27 AM, Fam Zheng wrote:
> On Mon, 06/27 17:47, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Some guests (win2008 server for example) do a lot of unnecessary
>> flushing when underlying media has not changed. This adds additional
>> overhead on host when calling fsync/fdatasync.
>>
>> This change introduces a dirty flag in BlockDriverState which is set
>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>> avoid unnecessary flushing when storage is clean.
>>
>> The problem with excessive flushing was found by a performance test
>> which does parallel directory tree creation (from 2 processes).
>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>> Each loop creates 10^3 directories with 10 files in each.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>   block.c                   |  1 +
>>   block/dirty-bitmap.c      |  3 +++
>>   block/io.c                | 19 +++++++++++++++++++
>>   include/block/block_int.h |  1 +
>>   4 files changed, 24 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index 947df29..68ae3a0 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>           bdrv_dirty_bitmap_truncate(bs);
>>           bdrv_parent_cb_resize(bs);
>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>       }
>>       return ret;
>>   }
>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>> index 4902ca5..54e0413 100644
>> --- a/block/dirty-bitmap.c
>> +++ b/block/dirty-bitmap.c
>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>           }
>>           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>       }
>> +
>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>> +    bs->dirty = true;
>>   }
>>   
>>   /**
>> diff --git a/block/io.c b/block/io.c
>> index b9e53e3..152f5a9 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>           goto flush_parent;
>>       }
>>   
>> +    /* Check if storage is actually dirty before flushing to disk */
>> +    if (!bs->dirty) {
>> +        /* Flush requests are appended to tracked request list in order so that
>> +         * most recent request is at the head of the list. Following code uses
>> +         * this ordering to wait for the most recent flush request to complete
>> +         * to ensure that requests return in order */
>> +        BdrvTrackedRequest *prev_req;
>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>> +                continue;
>> +            }
>> +
>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>> +            break;
>> +        }
>> +        goto flush_parent;
> Should we check bs->dirty again after qemu_co_queue_wait()? I think another
> write request could sneak in while this coroutine yields.
no, we do not care. Any subsequent to FLUSH write does not guaranteed to
be flushed. We have the warranty only that all write requests completed
prior to this flush are really flushed.



>> +    }
>> +    bs->dirty = false;
>> +
>>       BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
>>       if (bs->drv->bdrv_co_flush_to_disk) {
>>           ret = bs->drv->bdrv_co_flush_to_disk(bs);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 0432ba5..59a7def 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -435,6 +435,7 @@ struct BlockDriverState {
>>       bool valid_key; /* if true, a valid encryption key has been set */
>>       bool sg;        /* if true, the device is a /dev/sg* */
>>       bool probed;    /* if true, format was probed rather than specified */
>> +    bool dirty;     /* if true, media is dirty and should be flushed */
> How about renaming this to "need_flush"? The one "dirty" we had is set by
> bdrv_set_dirty, and cleared by bdrv_reset_dirty_bitmap. I'd avoid the
> confusion between the two concepts.
>
> Fam

can be

>>   
>>       int copy_on_read; /* if nonzero, copy read backing sectors into image.
>>                            note this is a reference count */
>> -- 
>> 2.1.4
>>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-27 23:19   ` John Snow
@ 2016-06-28  9:11     ` Denis V. Lunev
  2016-06-28  9:21     ` Evgeny Yakovlev
  1 sibling, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-28  9:11 UTC (permalink / raw)
  To: John Snow, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Evgeny Yakovlev, Max Reitz, Stefan Hajnoczi

On 06/28/2016 02:19 AM, John Snow wrote:
>
> On 06/27/2016 10:47 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Due to changes in flush behaviour clean disks stopped generating
>> flush_to_disk events and IDE and AHCI tests that test flush commands
>> started to fail.
>>
>> This change adds additional DMA writes to affected tests before sending
>> flush commands so that bdrv_flush actually generates flush_to_disk event.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>   tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
>>   tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 75 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
>> index 57dc44c..d707714 100644
>> --- a/tests/ahci-test.c
>> +++ b/tests/ahci-test.c
>> @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
>>       g_free(tx);
>>   }
>>   
>> +/*
>> + * Write sector 0 with random data to make AHCI storage dirty
>> + * Needed for flush tests so that flushes actually go though the block layer
>> + */
>> +static void make_dirty(AHCIQState* ahci, uint8_t port)
>> +{
>> +    uint64_t ptr;
>> +    unsigned bufsize = 512;
>> +
>> +    ptr = ahci_alloc(ahci, bufsize);
>> +    g_assert(ptr);
>> +
>> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
>> +    ahci_free(ahci, ptr);
>> +}
>> +
>>   static void test_flush(void)
>>   {
>>       AHCIQState *ahci;
>> +    uint8_t port;
>>   
>>       ahci = ahci_boot_and_enable(NULL);
>> +
>> +    port = ahci_port_select(ahci);
>> +    ahci_port_clear(ahci, port);
>> +
>> +    make_dirty(ahci, port);
>> +
>>       ahci_test_flush(ahci);
>>       ahci_shutdown(ahci);
>>   }
>> @@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
>>                                   debug_path,
>>                                   tmp_path, imgfmt);
>>   
>> -    /* Issue Flush Command and wait for error */
>>       port = ahci_port_select(ahci);
>>       ahci_port_clear(ahci, port);
>>   
>> +    /* Issue write so that flush actually goes to disk */
>> +    make_dirty(ahci, port);
>> +
>> +    /* Issue Flush Command and wait for error */
>>       cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
>>       ahci_guest_io_resume(ahci, cmd);
>>   
>> @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
>>   
>>       set_context(src->parent);
>>   
>> -    /* Issue Flush Command */
>>       px = ahci_port_select(src);
>>       ahci_port_clear(src, px);
>> +
>> +    /* Dirty device so that flush reaches disk */
>> +    make_dirty(src, px);
>> +
>> +    /* Issue Flush Command */
>>       cmd = ahci_command_create(CMD_FLUSH_CACHE);
>>       ahci_command_commit(src, cmd, px);
>>       ahci_command_issue_async(src, cmd);
>> diff --git a/tests/ide-test.c b/tests/ide-test.c
>> index fed1b2e..8466d0f 100644
>> --- a/tests/ide-test.c
>> +++ b/tests/ide-test.c
>> @@ -499,6 +499,39 @@ static void test_identify(void)
>>       ide_test_quit();
>>   }
>>   
>> +/*
>> + * Write sector 0 with random data to make IDE storage dirty
>> + * Needed for flush tests so that flushes actually go though the block layer
>> + */
>> +static void make_dirty(uint8_t device)
>> +{
>> +    uint8_t status;
>> +    size_t len = 512;
>> +    uintptr_t guest_buf;
>> +    void* buf;
>> +
>> +    guest_buf = guest_alloc(guest_malloc, len);
>> +    buf = g_malloc(len);
>> +    g_assert(guest_buf);
>> +    g_assert(buf);
>> +
>> +    memwrite(guest_buf, buf, len);
>> +
>> +    PrdtEntry prdt[] = {
>> +        {
>> +            .addr = cpu_to_le32(guest_buf),
>> +            .size = cpu_to_le32(len | PRDT_EOT),
>> +        },
>> +    };
>> +
>> +    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
>> +                              ARRAY_SIZE(prdt), NULL);
>> +    g_assert_cmphex(status, ==, BM_STS_INTR);
>> +    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
>> +
>> +    g_free(buf);
>> +}
>> +
>>   static void test_flush(void)
>>   {
>>       uint8_t data;
>> @@ -507,6 +540,11 @@ static void test_flush(void)
>>           "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
>>           tmp_path);
>>   
>> +    qtest_irq_intercept_in(global_qtest, "ioapic");
>> +
>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>> +    make_dirty(0);
>> +
>>       /* Delay the completion of the flush request until we explicitly do it */
>>       g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
>>   
>> @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
>>           "rerror=stop,werror=stop",
>>           debug_path, tmp_path);
>>   
>> +    qtest_irq_intercept_in(global_qtest, "ioapic");
>> +
>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>> +    make_dirty(0);
>> +
>>       /* FLUSH CACHE command on device 0*/
>>       outb(IDE_BASE + reg_device, 0);
>>       outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
>>
> Reviewed-by: John Snow <jsnow@redhat.com>
>
> However, the patch-set needs to be re-ordered as 3-1-2. Intermediate
> patch commits cannot have failing tests.
ok

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-27 23:19   ` John Snow
  2016-06-28  9:11     ` Denis V. Lunev
@ 2016-06-28  9:21     ` Evgeny Yakovlev
  2016-06-28 16:37       ` John Snow
  1 sibling, 1 reply; 14+ messages in thread
From: Evgeny Yakovlev @ 2016-06-28  9:21 UTC (permalink / raw)
  To: John Snow, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi


On 28.06.2016 02:19, John Snow wrote:
>
> On 06/27/2016 10:47 AM, Denis V. Lunev wrote:
>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>
>> Due to changes in flush behaviour clean disks stopped generating
>> flush_to_disk events and IDE and AHCI tests that test flush commands
>> started to fail.
>>
>> This change adds additional DMA writes to affected tests before sending
>> flush commands so that bdrv_flush actually generates flush_to_disk event.
>>
>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <famz@redhat.com>
>> CC: John Snow <jsnow@redhat.com>
>> ---
>>   tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
>>   tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>   2 files changed, 75 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
>> index 57dc44c..d707714 100644
>> --- a/tests/ahci-test.c
>> +++ b/tests/ahci-test.c
>> @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
>>       g_free(tx);
>>   }
>>   
>> +/*
>> + * Write sector 0 with random data to make AHCI storage dirty
>> + * Needed for flush tests so that flushes actually go though the block layer
>> + */
>> +static void make_dirty(AHCIQState* ahci, uint8_t port)
>> +{
>> +    uint64_t ptr;
>> +    unsigned bufsize = 512;
>> +
>> +    ptr = ahci_alloc(ahci, bufsize);
>> +    g_assert(ptr);
>> +
>> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
>> +    ahci_free(ahci, ptr);
>> +}
>> +
>>   static void test_flush(void)
>>   {
>>       AHCIQState *ahci;
>> +    uint8_t port;
>>   
>>       ahci = ahci_boot_and_enable(NULL);
>> +
>> +    port = ahci_port_select(ahci);
>> +    ahci_port_clear(ahci, port);
>> +
>> +    make_dirty(ahci, port);
>> +
>>       ahci_test_flush(ahci);
>>       ahci_shutdown(ahci);
>>   }
>> @@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
>>                                   debug_path,
>>                                   tmp_path, imgfmt);
>>   
>> -    /* Issue Flush Command and wait for error */
>>       port = ahci_port_select(ahci);
>>       ahci_port_clear(ahci, port);
>>   
>> +    /* Issue write so that flush actually goes to disk */
>> +    make_dirty(ahci, port);
>> +
>> +    /* Issue Flush Command and wait for error */
>>       cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
>>       ahci_guest_io_resume(ahci, cmd);
>>   
>> @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
>>   
>>       set_context(src->parent);
>>   
>> -    /* Issue Flush Command */
>>       px = ahci_port_select(src);
>>       ahci_port_clear(src, px);
>> +
>> +    /* Dirty device so that flush reaches disk */
>> +    make_dirty(src, px);
>> +
>> +    /* Issue Flush Command */
>>       cmd = ahci_command_create(CMD_FLUSH_CACHE);
>>       ahci_command_commit(src, cmd, px);
>>       ahci_command_issue_async(src, cmd);
>> diff --git a/tests/ide-test.c b/tests/ide-test.c
>> index fed1b2e..8466d0f 100644
>> --- a/tests/ide-test.c
>> +++ b/tests/ide-test.c
>> @@ -499,6 +499,39 @@ static void test_identify(void)
>>       ide_test_quit();
>>   }
>>   
>> +/*
>> + * Write sector 0 with random data to make IDE storage dirty
>> + * Needed for flush tests so that flushes actually go though the block layer
>> + */
>> +static void make_dirty(uint8_t device)
>> +{
>> +    uint8_t status;
>> +    size_t len = 512;
>> +    uintptr_t guest_buf;
>> +    void* buf;
>> +
>> +    guest_buf = guest_alloc(guest_malloc, len);
>> +    buf = g_malloc(len);
>> +    g_assert(guest_buf);
>> +    g_assert(buf);
>> +
>> +    memwrite(guest_buf, buf, len);
>> +
>> +    PrdtEntry prdt[] = {
>> +        {
>> +            .addr = cpu_to_le32(guest_buf),
>> +            .size = cpu_to_le32(len | PRDT_EOT),
>> +        },
>> +    };
>> +
>> +    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
>> +                              ARRAY_SIZE(prdt), NULL);
>> +    g_assert_cmphex(status, ==, BM_STS_INTR);
>> +    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
>> +
>> +    g_free(buf);
>> +}
>> +
>>   static void test_flush(void)
>>   {
>>       uint8_t data;
>> @@ -507,6 +540,11 @@ static void test_flush(void)
>>           "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
>>           tmp_path);
>>   
>> +    qtest_irq_intercept_in(global_qtest, "ioapic");
>> +
>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>> +    make_dirty(0);
>> +
>>       /* Delay the completion of the flush request until we explicitly do it */
>>       g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
>>   
>> @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
>>           "rerror=stop,werror=stop",
>>           debug_path, tmp_path);
>>   
>> +    qtest_irq_intercept_in(global_qtest, "ioapic");
>> +
>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>> +    make_dirty(0);
>> +
>>       /* FLUSH CACHE command on device 0*/
>>       outb(IDE_BASE + reg_device, 0);
>>       outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
>>
> Reviewed-by: John Snow <jsnow@redhat.com>
>
> However, the patch-set needs to be re-ordered as 3-1-2. Intermediate
> patch commits cannot have failing tests.

Sure, although 2-3-1 would probably be better, since 2 fixes an assert 
in IDE that is triggered by new tests in 3?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-28  9:21     ` Evgeny Yakovlev
@ 2016-06-28 16:37       ` John Snow
  2016-06-29 17:40         ` John Snow
  0 siblings, 1 reply; 14+ messages in thread
From: John Snow @ 2016-06-28 16:37 UTC (permalink / raw)
  To: Evgeny Yakovlev, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi



On 06/28/2016 05:21 AM, Evgeny Yakovlev wrote:
> 
> On 28.06.2016 02:19, John Snow wrote:
>>
>> On 06/27/2016 10:47 AM, Denis V. Lunev wrote:
>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>
>>> Due to changes in flush behaviour clean disks stopped generating
>>> flush_to_disk events and IDE and AHCI tests that test flush commands
>>> started to fail.
>>>
>>> This change adds additional DMA writes to affected tests before sending
>>> flush commands so that bdrv_flush actually generates flush_to_disk
>>> event.
>>>
>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <famz@redhat.com>
>>> CC: John Snow <jsnow@redhat.com>
>>> ---
>>>   tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
>>>   tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>   2 files changed, 75 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
>>> index 57dc44c..d707714 100644
>>> --- a/tests/ahci-test.c
>>> +++ b/tests/ahci-test.c
>>> @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
>>>       g_free(tx);
>>>   }
>>>   +/*
>>> + * Write sector 0 with random data to make AHCI storage dirty
>>> + * Needed for flush tests so that flushes actually go though the
>>> block layer
>>> + */
>>> +static void make_dirty(AHCIQState* ahci, uint8_t port)
>>> +{
>>> +    uint64_t ptr;
>>> +    unsigned bufsize = 512;
>>> +
>>> +    ptr = ahci_alloc(ahci, bufsize);
>>> +    g_assert(ptr);
> With no link to allow us to view the keynote from another room or our
> desks, most of us had not choice but to abandon our attempt to actually
> listen Jim talk about the future of Red Hat. 
>>> +
>>> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
>>> +    ahci_free(ahci, ptr);
>>> +}
>>> +
>>>   static void test_flush(void)
>>>   {
>>>       AHCIQState *ahci;
>>> +    uint8_t port;
>>>         ahci = ahci_boot_and_enable(NULL);
>>> +
>>> +    port = ahci_port_select(ahci);
>>> +    ahci_port_clear(ahci, port);
>>> +
>>> +    make_dirty(ahci, port);
>>> +
>>>       ahci_test_flush(ahci);
>>>       ahci_shutdown(ahci);
>>>   }
>>> @@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
>>>                                   debug_path,
>>>                                   tmp_path, imgfmt);
>>>   -    /* Issue Flush Command and wait for error */
>>>       port = ahci_port_select(ahci);
>>>       ahci_port_clear(ahci, port);
>>>   +    /* Issue write so that flush actually goes to disk */
>>> +    make_dirty(ahci, port);
>>> +
>>> +    /* Issue Flush Command and wait for error */
>>>       cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
>>>       ahci_guest_io_resume(ahci, cmd);
>>>   @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
>>>         set_context(src->parent);
>>>   -    /* Issue Flush Command */
>>>       px = ahci_port_select(src);
>>>       ahci_port_clear(src, px);
>>> +
>>> +    /* Dirty device so that flush reaches disk */
>>> +    make_dirty(src, px);
>>> +
>>> +    /* Issue Flush Command */
>>>       cmd = ahci_command_create(CMD_FLUSH_CACHE);
>>>       ahci_command_commit(src, cmd, px);
>>>       ahci_command_issue_async(src, cmd);
>>> diff --git a/tests/ide-test.c b/tests/ide-test.c
>>> index fed1b2e..8466d0f 100644
>>> --- a/tests/ide-test.c
>>> +++ b/tests/ide-test.c
>>> @@ -499,6 +499,39 @@ static void test_identify(void)
>>>       ide_test_quit();
>>>   }
>>>   +/*
>>> + * Write sector 0 with random data to make IDE storage dirty
>>> + * Needed for flush tests so that flushes actually go though the
>>> block layer
>>> + */
>>> +static void make_dirty(uint8_t device)
>>> +{
>>> +    uint8_t status;
>>> +    size_t len = 512;
>>> +    uintptr_t guest_buf;
>>> +    void* buf;
>>> +
> With no link to allow us to view the keynote from another room or our
> desks, most of us had not choice but to abandon our attempt to actually
> listen Jim talk about the future of Red Hat. 
>>> +    guest_buf = guest_alloc(guest_malloc, len);
>>> +    buf = g_malloc(len);
>>> +    g_assert(guest_buf);
>>> +    g_assert(buf);
>>> +
>>> +    memwrite(guest_buf, buf, len);
>>> +
>>> +    PrdtEntry prdt[] = {
>>> +        {
>>> +            .addr = cpu_to_le32(guest_buf),
>>> +            .size = cpu_to_le32(len | PRDT_EOT),
>>> +        },
>>> +    };
>>> +
>>> +    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
>>> +                              ARRAY_SIZE(prdt), NULL);
>>> +    g_assert_cmphex(status, ==, BM_STS_INTR);
>>> +    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
>>> +
>>> +    g_free(buf);
>>> +}
>>> +
>>>   static void test_flush(void)
>>>   {
>>>       uint8_t data;
>>> @@ -507,6 +540,11 @@ static void test_flush(void)
>>>           "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
>>>           tmp_path);
>>>   +    qtest_irq_intercept_in(global_qtest, "ioapic");
>>> +
>>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>>> +    make_dirty(0);
>>> +
>>>       /* Delay the completion of the flush request until we
>>> explicitly do it */
>>>       g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
>>>   @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
>>>           "rerror=stop,werror=stop",
>>>           debug_path, tmp_path);
>>>   +    qtest_irq_intercept_in(global_qtest, "ioapic");
>>> +
>>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>>> +    make_dirty(0);
>>> +
>>>       /* FLUSH CACHE command on device 0*/
>>>       outb(IDE_BASE + reg_device, 0);
>>>       outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
>>>
>> Reviewed-by: John Snow <jsnow@redhat.com>
>>
>> However, the patch-set needs to be re-ordered as 3-1-2. Intermediate
>> patch commits cannot have failing tests.
> 
> Sure, although 2-3-1 would probably be better, since 2 fixes an assert
> in IDE that is triggered by new tests in 3?
> 

:)

Yes, thanks!

--js

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-28  9:10     ` Denis V. Lunev
@ 2016-06-29  1:12       ` Fam Zheng
  2016-06-29  8:30         ` Denis V. Lunev
  2016-06-29  9:09         ` Stefan Hajnoczi
  0 siblings, 2 replies; 14+ messages in thread
From: Fam Zheng @ 2016-06-29  1:12 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Kevin Wolf, Evgeny Yakovlev, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, John Snow

On Tue, 06/28 12:10, Denis V. Lunev wrote:
> On 06/28/2016 04:27 AM, Fam Zheng wrote:
> > On Mon, 06/27 17:47, Denis V. Lunev wrote:
> > > From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> > > 
> > > Some guests (win2008 server for example) do a lot of unnecessary
> > > flushing when underlying media has not changed. This adds additional
> > > overhead on host when calling fsync/fdatasync.
> > > 
> > > This change introduces a dirty flag in BlockDriverState which is set
> > > in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> > > avoid unnecessary flushing when storage is clean.
> > > 
> > > The problem with excessive flushing was found by a performance test
> > > which does parallel directory tree creation (from 2 processes).
> > > Results improved from 0.424 loops/sec to 0.432 loops/sec.
> > > Each loop creates 10^3 directories with 10 files in each.
> > > 
> > > Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > CC: Kevin Wolf <kwolf@redhat.com>
> > > CC: Max Reitz <mreitz@redhat.com>
> > > CC: Stefan Hajnoczi <stefanha@redhat.com>
> > > CC: Fam Zheng <famz@redhat.com>
> > > CC: John Snow <jsnow@redhat.com>
> > > ---
> > >   block.c                   |  1 +
> > >   block/dirty-bitmap.c      |  3 +++
> > >   block/io.c                | 19 +++++++++++++++++++
> > >   include/block/block_int.h |  1 +
> > >   4 files changed, 24 insertions(+)
> > > 
> > > diff --git a/block.c b/block.c
> > > index 947df29..68ae3a0 100644
> > > --- a/block.c
> > > +++ b/block.c
> > > @@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
> > >           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
> > >           bdrv_dirty_bitmap_truncate(bs);
> > >           bdrv_parent_cb_resize(bs);
> > > +        bs->dirty = true; /* file node sync is needed after truncate */
> > >       }
> > >       return ret;
> > >   }
> > > diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> > > index 4902ca5..54e0413 100644
> > > --- a/block/dirty-bitmap.c
> > > +++ b/block/dirty-bitmap.c
> > > @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> > >           }
> > >           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> > >       }
> > > +
> > > +    /* Set global block driver dirty flag even if bitmap is disabled */
> > > +    bs->dirty = true;
> > >   }
> > >   /**
> > > diff --git a/block/io.c b/block/io.c
> > > index b9e53e3..152f5a9 100644
> > > --- a/block/io.c
> > > +++ b/block/io.c
> > > @@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
> > >           goto flush_parent;
> > >       }
> > > +    /* Check if storage is actually dirty before flushing to disk */
> > > +    if (!bs->dirty) {
> > > +        /* Flush requests are appended to tracked request list in order so that
> > > +         * most recent request is at the head of the list. Following code uses
> > > +         * this ordering to wait for the most recent flush request to complete
> > > +         * to ensure that requests return in order */
> > > +        BdrvTrackedRequest *prev_req;
> > > +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
> > > +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
> > > +                continue;
> > > +            }
> > > +
> > > +            qemu_co_queue_wait(&prev_req->wait_queue);
> > > +            break;
> > > +        }
> > > +        goto flush_parent;
> > Should we check bs->dirty again after qemu_co_queue_wait()? I think another
> > write request could sneak in while this coroutine yields.
> no, we do not care. Any subsequent to FLUSH write does not guaranteed to
> be flushed. We have the warranty only that all write requests completed
> prior to this flush are really flushed.

I'm not worried about subsequent requests.

A prior request can be already in progress or be waiting when we check
bs->dirty, though it would be false there, but it will become true soon --
bdrv_set_dirty is only called when a request is completing.

Fam

> 
> 
> 
> > > +    }
> > > +    bs->dirty = false;
> > > +
> > >       BLKDBG_EVENT(bs->file, BLKDBG_FLUSH_TO_DISK);
> > >       if (bs->drv->bdrv_co_flush_to_disk) {
> > >           ret = bs->drv->bdrv_co_flush_to_disk(bs);
> > > diff --git a/include/block/block_int.h b/include/block/block_int.h
> > > index 0432ba5..59a7def 100644
> > > --- a/include/block/block_int.h
> > > +++ b/include/block/block_int.h
> > > @@ -435,6 +435,7 @@ struct BlockDriverState {
> > >       bool valid_key; /* if true, a valid encryption key has been set */
> > >       bool sg;        /* if true, the device is a /dev/sg* */
> > >       bool probed;    /* if true, format was probed rather than specified */
> > > +    bool dirty;     /* if true, media is dirty and should be flushed */
> > How about renaming this to "need_flush"? The one "dirty" we had is set by
> > bdrv_set_dirty, and cleared by bdrv_reset_dirty_bitmap. I'd avoid the
> > confusion between the two concepts.
> > 
> > Fam
> 
> can be
> 
> > >       int copy_on_read; /* if nonzero, copy read backing sectors into image.
> > >                            note this is a reference count */
> > > -- 
> > > 2.1.4
> > > 
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-29  1:12       ` Fam Zheng
@ 2016-06-29  8:30         ` Denis V. Lunev
  2016-06-29  9:09         ` Stefan Hajnoczi
  1 sibling, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2016-06-29  8:30 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Kevin Wolf, Evgeny Yakovlev, qemu-block, qemu-devel, Max Reitz,
	Stefan Hajnoczi, John Snow

On 06/29/2016 04:12 AM, Fam Zheng wrote:
> On Tue, 06/28 12:10, Denis V. Lunev wrote:
>> On 06/28/2016 04:27 AM, Fam Zheng wrote:
>>> On Mon, 06/27 17:47, Denis V. Lunev wrote:
>>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>>
>>>> Some guests (win2008 server for example) do a lot of unnecessary
>>>> flushing when underlying media has not changed. This adds additional
>>>> overhead on host when calling fsync/fdatasync.
>>>>
>>>> This change introduces a dirty flag in BlockDriverState which is set
>>>> in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
>>>> avoid unnecessary flushing when storage is clean.
>>>>
>>>> The problem with excessive flushing was found by a performance test
>>>> which does parallel directory tree creation (from 2 processes).
>>>> Results improved from 0.424 loops/sec to 0.432 loops/sec.
>>>> Each loop creates 10^3 directories with 10 files in each.
>>>>
>>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Kevin Wolf <kwolf@redhat.com>
>>>> CC: Max Reitz <mreitz@redhat.com>
>>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>>> CC: Fam Zheng <famz@redhat.com>
>>>> CC: John Snow <jsnow@redhat.com>
>>>> ---
>>>>    block.c                   |  1 +
>>>>    block/dirty-bitmap.c      |  3 +++
>>>>    block/io.c                | 19 +++++++++++++++++++
>>>>    include/block/block_int.h |  1 +
>>>>    4 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 947df29..68ae3a0 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
>>>>            ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
>>>>            bdrv_dirty_bitmap_truncate(bs);
>>>>            bdrv_parent_cb_resize(bs);
>>>> +        bs->dirty = true; /* file node sync is needed after truncate */
>>>>        }
>>>>        return ret;
>>>>    }
>>>> diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
>>>> index 4902ca5..54e0413 100644
>>>> --- a/block/dirty-bitmap.c
>>>> +++ b/block/dirty-bitmap.c
>>>> @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
>>>>            }
>>>>            hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
>>>>        }
>>>> +
>>>> +    /* Set global block driver dirty flag even if bitmap is disabled */
>>>> +    bs->dirty = true;
>>>>    }
>>>>    /**
>>>> diff --git a/block/io.c b/block/io.c
>>>> index b9e53e3..152f5a9 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>> @@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
>>>>            goto flush_parent;
>>>>        }
>>>> +    /* Check if storage is actually dirty before flushing to disk */
>>>> +    if (!bs->dirty) {
>>>> +        /* Flush requests are appended to tracked request list in order so that
>>>> +         * most recent request is at the head of the list. Following code uses
>>>> +         * this ordering to wait for the most recent flush request to complete
>>>> +         * to ensure that requests return in order */
>>>> +        BdrvTrackedRequest *prev_req;
>>>> +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
>>>> +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
>>>> +                continue;
>>>> +            }
>>>> +
>>>> +            qemu_co_queue_wait(&prev_req->wait_queue);
>>>> +            break;
>>>> +        }
>>>> +        goto flush_parent;
>>> Should we check bs->dirty again after qemu_co_queue_wait()? I think another
>>> write request could sneak in while this coroutine yields.
>> no, we do not care. Any subsequent to FLUSH write does not guaranteed to
>> be flushed. We have the warranty only that all write requests completed
>> prior to this flush are really flushed.
> I'm not worried about subsequent requests.
>
> A prior request can be already in progress or be waiting when we check
> bs->dirty, though it would be false there, but it will become true soon --
> bdrv_set_dirty is only called when a request is completing.
>
> Fam
I have written specifically about this situation. FLUSH in the
controller does not guarantee that requests which are in the
progress at the moment when the flush is initiated will be
flushed.

It guarantees that requests, which are completed, i.e. which
status 'COMPLETED' was returned to the guest, will  be flushed.

Den

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 1/3] block: ignore flush requests when storage is clean
  2016-06-29  1:12       ` Fam Zheng
  2016-06-29  8:30         ` Denis V. Lunev
@ 2016-06-29  9:09         ` Stefan Hajnoczi
  1 sibling, 0 replies; 14+ messages in thread
From: Stefan Hajnoczi @ 2016-06-29  9:09 UTC (permalink / raw)
  To: Fam Zheng
  Cc: Denis V. Lunev, Kevin Wolf, Evgeny Yakovlev, qemu-block,
	qemu-devel, Max Reitz, John Snow

[-- Attachment #1: Type: text/plain, Size: 4630 bytes --]

On Wed, Jun 29, 2016 at 09:12:41AM +0800, Fam Zheng wrote:
> On Tue, 06/28 12:10, Denis V. Lunev wrote:
> > On 06/28/2016 04:27 AM, Fam Zheng wrote:
> > > On Mon, 06/27 17:47, Denis V. Lunev wrote:
> > > > From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> > > > 
> > > > Some guests (win2008 server for example) do a lot of unnecessary
> > > > flushing when underlying media has not changed. This adds additional
> > > > overhead on host when calling fsync/fdatasync.
> > > > 
> > > > This change introduces a dirty flag in BlockDriverState which is set
> > > > in bdrv_set_dirty and is checked in bdrv_co_flush. This allows us to
> > > > avoid unnecessary flushing when storage is clean.
> > > > 
> > > > The problem with excessive flushing was found by a performance test
> > > > which does parallel directory tree creation (from 2 processes).
> > > > Results improved from 0.424 loops/sec to 0.432 loops/sec.
> > > > Each loop creates 10^3 directories with 10 files in each.
> > > > 
> > > > Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
> > > > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > > > CC: Kevin Wolf <kwolf@redhat.com>
> > > > CC: Max Reitz <mreitz@redhat.com>
> > > > CC: Stefan Hajnoczi <stefanha@redhat.com>
> > > > CC: Fam Zheng <famz@redhat.com>
> > > > CC: John Snow <jsnow@redhat.com>
> > > > ---
> > > >   block.c                   |  1 +
> > > >   block/dirty-bitmap.c      |  3 +++
> > > >   block/io.c                | 19 +++++++++++++++++++
> > > >   include/block/block_int.h |  1 +
> > > >   4 files changed, 24 insertions(+)
> > > > 
> > > > diff --git a/block.c b/block.c
> > > > index 947df29..68ae3a0 100644
> > > > --- a/block.c
> > > > +++ b/block.c
> > > > @@ -2581,6 +2581,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset)
> > > >           ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
> > > >           bdrv_dirty_bitmap_truncate(bs);
> > > >           bdrv_parent_cb_resize(bs);
> > > > +        bs->dirty = true; /* file node sync is needed after truncate */
> > > >       }
> > > >       return ret;
> > > >   }
> > > > diff --git a/block/dirty-bitmap.c b/block/dirty-bitmap.c
> > > > index 4902ca5..54e0413 100644
> > > > --- a/block/dirty-bitmap.c
> > > > +++ b/block/dirty-bitmap.c
> > > > @@ -370,6 +370,9 @@ void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector,
> > > >           }
> > > >           hbitmap_set(bitmap->bitmap, cur_sector, nr_sectors);
> > > >       }
> > > > +
> > > > +    /* Set global block driver dirty flag even if bitmap is disabled */
> > > > +    bs->dirty = true;
> > > >   }
> > > >   /**
> > > > diff --git a/block/io.c b/block/io.c
> > > > index b9e53e3..152f5a9 100644
> > > > --- a/block/io.c
> > > > +++ b/block/io.c
> > > > @@ -2247,6 +2247,25 @@ int coroutine_fn bdrv_co_flush(BlockDriverState *bs)
> > > >           goto flush_parent;
> > > >       }
> > > > +    /* Check if storage is actually dirty before flushing to disk */
> > > > +    if (!bs->dirty) {
> > > > +        /* Flush requests are appended to tracked request list in order so that
> > > > +         * most recent request is at the head of the list. Following code uses
> > > > +         * this ordering to wait for the most recent flush request to complete
> > > > +         * to ensure that requests return in order */
> > > > +        BdrvTrackedRequest *prev_req;
> > > > +        QLIST_FOREACH(prev_req, &bs->tracked_requests, list) {
> > > > +            if (prev_req == &req || prev_req->type != BDRV_TRACKED_FLUSH) {
> > > > +                continue;
> > > > +            }
> > > > +
> > > > +            qemu_co_queue_wait(&prev_req->wait_queue);
> > > > +            break;
> > > > +        }
> > > > +        goto flush_parent;
> > > Should we check bs->dirty again after qemu_co_queue_wait()? I think another
> > > write request could sneak in while this coroutine yields.
> > no, we do not care. Any subsequent to FLUSH write does not guaranteed to
> > be flushed. We have the warranty only that all write requests completed
> > prior to this flush are really flushed.
> 
> I'm not worried about subsequent requests.
> 
> A prior request can be already in progress or be waiting when we check
> bs->dirty, though it would be false there, but it will become true soon --
> bdrv_set_dirty is only called when a request is completing.

Flush only guarantees that already completed writes are persistent.  It
is not a barrier operation.  It does not wait for in-flight writes and
makes no guarantee regarding them.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing
  2016-06-28 16:37       ` John Snow
@ 2016-06-29 17:40         ` John Snow
  0 siblings, 0 replies; 14+ messages in thread
From: John Snow @ 2016-06-29 17:40 UTC (permalink / raw)
  To: Evgeny Yakovlev, Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Max Reitz, Stefan Hajnoczi



On 06/28/2016 12:37 PM, John Snow wrote:
> 
> 
> On 06/28/2016 05:21 AM, Evgeny Yakovlev wrote:
>>
>> On 28.06.2016 02:19, John Snow wrote:
>>>
>>> On 06/27/2016 10:47 AM, Denis V. Lunev wrote:
>>>> From: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>>
>>>> Due to changes in flush behaviour clean disks stopped generating
>>>> flush_to_disk events and IDE and AHCI tests that test flush commands
>>>> started to fail.
>>>>
>>>> This change adds additional DMA writes to affected tests before sending
>>>> flush commands so that bdrv_flush actually generates flush_to_disk
>>>> event.
>>>>
>>>> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
>>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>>> CC: Kevin Wolf <kwolf@redhat.com>
>>>> CC: Max Reitz <mreitz@redhat.com>
>>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>>> CC: Fam Zheng <famz@redhat.com>
>>>> CC: John Snow <jsnow@redhat.com>
>>>> ---
>>>>   tests/ahci-test.c | 34 ++++++++++++++++++++++++++++++++--
>>>>   tests/ide-test.c  | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>   2 files changed, 75 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/tests/ahci-test.c b/tests/ahci-test.c
>>>> index 57dc44c..d707714 100644
>>>> --- a/tests/ahci-test.c
>>>> +++ b/tests/ahci-test.c
>>>> @@ -1063,11 +1063,34 @@ static void test_dma_fragmented(void)
>>>>       g_free(tx);
>>>>   }
>>>>   +/*
>>>> + * Write sector 0 with random data to make AHCI storage dirty
>>>> + * Needed for flush tests so that flushes actually go though the
>>>> block layer
>>>> + */
>>>> +static void make_dirty(AHCIQState* ahci, uint8_t port)
>>>> +{
>>>> +    uint64_t ptr;
>>>> +    unsigned bufsize = 512;
>>>> +
>>>> +    ptr = ahci_alloc(ahci, bufsize);
>>>> +    g_assert(ptr);
>> With no link to allow us to view the keynote from another room or our
>> desks, most of us had not choice but to abandon our attempt to actually
>> listen Jim talk about the future of Red Hat. 

Apologies, random clipboard detritus escaped into my emails again.
(Red Hat summit is ongoing right now and some people had concerns about
access to the live feeds. Apologies.)

Please pardon the noise.

>>>> +
>>>> +    ahci_guest_io(ahci, port, CMD_WRITE_DMA, ptr, bufsize, 0);
>>>> +    ahci_free(ahci, ptr);
>>>> +}
>>>> +
>>>>   static void test_flush(void)
>>>>   {
>>>>       AHCIQState *ahci;
>>>> +    uint8_t port;
>>>>         ahci = ahci_boot_and_enable(NULL);
>>>> +
>>>> +    port = ahci_port_select(ahci);
>>>> +    ahci_port_clear(ahci, port);
>>>> +
>>>> +    make_dirty(ahci, port);
>>>> +
>>>>       ahci_test_flush(ahci);
>>>>       ahci_shutdown(ahci);
>>>>   }
>>>> @@ -1087,10 +1110,13 @@ static void test_flush_retry(void)
>>>>                                   debug_path,
>>>>                                   tmp_path, imgfmt);
>>>>   -    /* Issue Flush Command and wait for error */
>>>>       port = ahci_port_select(ahci);
>>>>       ahci_port_clear(ahci, port);
>>>>   +    /* Issue write so that flush actually goes to disk */
>>>> +    make_dirty(ahci, port);
>>>> +
>>>> +    /* Issue Flush Command and wait for error */
>>>>       cmd = ahci_guest_io_halt(ahci, port, CMD_FLUSH_CACHE, 0, 0, 0);
>>>>       ahci_guest_io_resume(ahci, cmd);
>>>>   @@ -1343,9 +1369,13 @@ static void test_flush_migrate(void)
>>>>         set_context(src->parent);
>>>>   -    /* Issue Flush Command */
>>>>       px = ahci_port_select(src);
>>>>       ahci_port_clear(src, px);
>>>> +
>>>> +    /* Dirty device so that flush reaches disk */
>>>> +    make_dirty(src, px);
>>>> +
>>>> +    /* Issue Flush Command */
>>>>       cmd = ahci_command_create(CMD_FLUSH_CACHE);
>>>>       ahci_command_commit(src, cmd, px);
>>>>       ahci_command_issue_async(src, cmd);
>>>> diff --git a/tests/ide-test.c b/tests/ide-test.c
>>>> index fed1b2e..8466d0f 100644
>>>> --- a/tests/ide-test.c
>>>> +++ b/tests/ide-test.c
>>>> @@ -499,6 +499,39 @@ static void test_identify(void)
>>>>       ide_test_quit();
>>>>   }
>>>>   +/*
>>>> + * Write sector 0 with random data to make IDE storage dirty
>>>> + * Needed for flush tests so that flushes actually go though the
>>>> block layer
>>>> + */
>>>> +static void make_dirty(uint8_t device)
>>>> +{
>>>> +    uint8_t status;
>>>> +    size_t len = 512;
>>>> +    uintptr_t guest_buf;
>>>> +    void* buf;
>>>> +
>> With no link to allow us to view the keynote from another room or our
>> desks, most of us had not choice but to abandon our attempt to actually
>> listen Jim talk about the future of Red Hat. 

My clipboard must have been really angry.

>>>> +    guest_buf = guest_alloc(guest_malloc, len);
>>>> +    buf = g_malloc(len);
>>>> +    g_assert(guest_buf);
>>>> +    g_assert(buf);
>>>> +
>>>> +    memwrite(guest_buf, buf, len);
>>>> +
>>>> +    PrdtEntry prdt[] = {
>>>> +        {
>>>> +            .addr = cpu_to_le32(guest_buf),
>>>> +            .size = cpu_to_le32(len | PRDT_EOT),
>>>> +        },
>>>> +    };
>>>> +
>>>> +    status = send_dma_request(CMD_WRITE_DMA, 0, 1, prdt,
>>>> +                              ARRAY_SIZE(prdt), NULL);
>>>> +    g_assert_cmphex(status, ==, BM_STS_INTR);
>>>> +    assert_bit_clear(inb(IDE_BASE + reg_status), DF | ERR);
>>>> +
>>>> +    g_free(buf);
>>>> +}
>>>> +
>>>>   static void test_flush(void)
>>>>   {
>>>>       uint8_t data;
>>>> @@ -507,6 +540,11 @@ static void test_flush(void)
>>>>           "-drive file=blkdebug::%s,if=ide,cache=writeback,format=raw",
>>>>           tmp_path);
>>>>   +    qtest_irq_intercept_in(global_qtest, "ioapic");
>>>> +
>>>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>>>> +    make_dirty(0);
>>>> +
>>>>       /* Delay the completion of the flush request until we
>>>> explicitly do it */
>>>>       g_free(hmp("qemu-io ide0-hd0 \"break flush_to_os A\""));
>>>>   @@ -549,6 +587,11 @@ static void test_retry_flush(const char *machine)
>>>>           "rerror=stop,werror=stop",
>>>>           debug_path, tmp_path);
>>>>   +    qtest_irq_intercept_in(global_qtest, "ioapic");
>>>> +
>>>> +    /* Dirty media so that CMD_FLUSH_CACHE will actually go to disk */
>>>> +    make_dirty(0);
>>>> +
>>>>       /* FLUSH CACHE command on device 0*/
>>>>       outb(IDE_BASE + reg_device, 0);
>>>>       outb(IDE_BASE + reg_command, CMD_FLUSH_CACHE);
>>>>
>>> Reviewed-by: John Snow <jsnow@redhat.com>
>>>
>>> However, the patch-set needs to be re-ordered as 3-1-2. Intermediate
>>> patch commits cannot have failing tests.
>>
>> Sure, although 2-3-1 would probably be better, since 2 fixes an assert
>> in IDE that is triggered by new tests in 3?
>>
> 
> :)
> 
> Yes, thanks!
> 
> --js
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-06-29 17:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-27 14:47 [Qemu-devel] [PATCH v4 0/3] block: ignore flush requests when storage is clean Denis V. Lunev
2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 1/3] " Denis V. Lunev
2016-06-28  1:27   ` Fam Zheng
2016-06-28  9:10     ` Denis V. Lunev
2016-06-29  1:12       ` Fam Zheng
2016-06-29  8:30         ` Denis V. Lunev
2016-06-29  9:09         ` Stefan Hajnoczi
2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 2/3] ide: ignore retry_unit check for non-retry operation Denis V. Lunev
2016-06-27 14:47 ` [Qemu-devel] [PATCH v4 3/3] tests: in IDE and AHCI tests perform DMA write before flushing Denis V. Lunev
2016-06-27 23:19   ` John Snow
2016-06-28  9:11     ` Denis V. Lunev
2016-06-28  9:21     ` Evgeny Yakovlev
2016-06-28 16:37       ` John Snow
2016-06-29 17:40         ` John Snow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.