All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v4 0/8] discard blockstats
@ 2018-08-21  9:46 Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 1/8] qapi: group BlockDeviceStats fields Anton Nefedov
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

new in v4:
    - patch 7: discard and write-zeroes code paths had been separated in
      34fa110e: file-posix: Fix write_zeroes with unmap on block devices.
      This patch now only accounts discards that come explicitly
      through .bdrv_co_pdiscard handler.
    - qapi 'Since' clauses changed 3.0 -> 3.1

v3: http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg03688.html

----

qmp query-blockstats provides stats info for write/read/flush ops.

Patches 1-6 implement the similar for discard (unmap) command for scsi
and ide disks.
Discard stat "unmap_ops / unmap_bytes" is supposed to account the ops that
have completed without an error.

However, discard operation is advisory. Specifically,
 - common block layer ignores ENOTSUP error code.
   That might be returned if the block driver does not support discard,
   or discard has been configured to be ignored.
 - format drivers such as qcow2 may ignore discard if they were configured
   to ignore that, or if the corresponding area is already marked unused
   (unallocated / zero clusters).

And what is actually useful is the number of bytes actually discarded
down on the host filesystem.
To achieve that, driver-specific statistics has been added to blockstats
(patch 8).
With patch 7, file-posix driver accounts discard operations on its level too.

query-blockstat result:

(note the difference between blockdevice unmap and file discard stats. qcow2
sends fewer ops down to the file as the clusters are actually unallocated
on qcow2 level)

{
  "return": [
    {
      "device": "drive-scsi0-0-0-0",
      "parent": {
>       "discard-bytes-ok": 262144,
>       "discard-nb-ok": 4,
        "stats": {
>         "unmap_operations": 0,
>         "unmap_merged": 0,
          "flush_total_time_ns": 0,
          "wr_highest_offset": 8111718400,
          "wr_total_time_ns": 0,
          "failed_wr_operations": 0,
          "failed_rd_operations": 0,
          "wr_merged": 0,
          "wr_bytes": 0,
          "timed_stats": [
            
          ],
>         "failed_unmap_operations": 0,
          "failed_flush_operations": 0,
          "account_invalid": false,
          "rd_total_time_ns": 0,
>         "invalid_unmap_operations": 0,
          "flush_operations": 0,
          "wr_operations": 0,
>         "unmap_bytes": 0,
          "rd_merged": 0,
          "rd_bytes": 0,
>         "unmap_total_time_ns": 0,
          "invalid_flush_operations": 0,
          "account_failed": false,
          "rd_operations": 0,
          "invalid_wr_operations": 0,
          "invalid_rd_operations": 0
        },
        "node-name": "#block012",
>       "driver": "file",
>       "discard-nb-failed": 0
      },
      "stats": {
>       "unmap_operations": 860,
>       "unmap_merged": 0,
        "flush_total_time_ns": 21506733,
        "wr_highest_offset": 13411741696,
        "wr_total_time_ns": 2212749334,
        "failed_wr_operations": 0,
        "failed_rd_operations": 0,
        "wr_merged": 0,
        "wr_bytes": 3426304,
        "timed_stats": [
          
        ],
>       "failed_unmap_operations": 0,
        "failed_flush_operations": 0,
        "account_invalid": true,
        "rd_total_time_ns": 3617478206,
>       "invalid_unmap_operations": 0,
        "flush_operations": 24,
        "wr_operations": 309,
>       "unmap_bytes": 11949633536,
        "rd_merged": 0,
        "rd_bytes": 141967360,
>       "unmap_total_time_ns": 14871816,
        [..]

Anton Nefedov (8):
  qapi: group BlockDeviceStats fields
  qapi: add unmap to BlockDeviceStats
  ide: account UNMAP (TRIM) operations
  scsi: store unmap offset and nb_sectors in request struct
  scsi: move unmap error checking to the complete callback
  scsi: account unmap operations
  file-posix: account discard operations
  qapi: query-blockstat: add driver specific file-posix stats

 qapi/block-core.json       | 82 +++++++++++++++++++++++++++++++++++++++-------
 include/block/accounting.h |  1 +
 include/block/block.h      |  1 +
 include/block/block_int.h  |  1 +
 block.c                    |  9 +++++
 block/file-posix.c         | 45 +++++++++++++++++++++++--
 block/qapi.c               | 11 +++++++
 hw/ide/core.c              | 12 +++++++
 hw/scsi/scsi-disk.c        | 29 +++++++++-------
 tests/qemu-iotests/227.out | 18 ++++++++++
 10 files changed, 184 insertions(+), 25 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 1/8] qapi: group BlockDeviceStats fields
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 2/8] qapi: add unmap to BlockDeviceStats Anton Nefedov
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

Make the stat fields definition slightly more readable.
Also reorder total_time_ns stats read-write-flush as done elsewhere.
Cosmetic change only.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 qapi/block-core.json | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 4c7a37a..44d992f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -784,12 +784,12 @@
 # @flush_operations: The number of cache flush operations performed by the
 #                    device (since 0.15.0)
 #
-# @flush_total_time_ns: Total time spend on cache flushes in nano-seconds
-#                       (since 0.15.0).
+# @rd_total_time_ns: Total time spent on reads in nanoseconds (since 0.15.0).
 #
-# @wr_total_time_ns: Total time spend on writes in nano-seconds (since 0.15.0).
+# @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15.0).
 #
-# @rd_total_time_ns: Total_time_spend on reads in nano-seconds (since 0.15.0).
+# @flush_total_time_ns: Total time spent on cache flushes in nanoseconds
+#                       (since 0.15.0).
 #
 # @wr_highest_offset: The offset after the greatest byte written to the
 #                     device.  The intended use of this information is for
@@ -842,14 +842,18 @@
 # Since: 0.14.0
 ##
 { 'struct': 'BlockDeviceStats',
-  'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'rd_operations': 'int',
-           'wr_operations': 'int', 'flush_operations': 'int',
-           'flush_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-           'rd_total_time_ns': 'int', 'wr_highest_offset': 'int',
-           'rd_merged': 'int', 'wr_merged': 'int', '*idle_time_ns': 'int',
+  'data': {'rd_bytes': 'int', 'wr_bytes': 'int',
+           'rd_operations': 'int', 'wr_operations': 'int',
+           'flush_operations': 'int',
+           'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
+           'flush_total_time_ns': 'int',
+           'wr_highest_offset': 'int',
+           'rd_merged': 'int', 'wr_merged': 'int',
+           '*idle_time_ns': 'int',
            'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
-           'failed_flush_operations': 'int', 'invalid_rd_operations': 'int',
-           'invalid_wr_operations': 'int', 'invalid_flush_operations': 'int',
+           'failed_flush_operations': 'int',
+           'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
+           'invalid_flush_operations': 'int',
            'account_invalid': 'bool', 'account_failed': 'bool',
            'timed_stats': ['BlockDeviceTimedStats'],
            '*x_rd_latency_histogram': 'BlockLatencyHistogramInfo',
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 2/8] qapi: add unmap to BlockDeviceStats
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 1/8] qapi: group BlockDeviceStats fields Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations Anton Nefedov
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 qapi/block-core.json       | 29 +++++++++++++++++++++++------
 include/block/accounting.h |  1 +
 block/qapi.c               |  6 ++++++
 3 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 44d992f..19d4743 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -777,6 +777,8 @@
 #
 # @wr_bytes:      The number of bytes written by the device.
 #
+# @unmap_bytes: The number of bytes unmapped by the device (Since 3.1)
+#
 # @rd_operations: The number of read operations performed by the device.
 #
 # @wr_operations: The number of write operations performed by the device.
@@ -784,6 +786,9 @@
 # @flush_operations: The number of cache flush operations performed by the
 #                    device (since 0.15.0)
 #
+# @unmap_operations: The number of unmap operations performed by the device
+#                    (Since 3.1)
+#
 # @rd_total_time_ns: Total time spent on reads in nanoseconds (since 0.15.0).
 #
 # @wr_total_time_ns: Total time spent on writes in nanoseconds (since 0.15.0).
@@ -791,6 +796,9 @@
 # @flush_total_time_ns: Total time spent on cache flushes in nanoseconds
 #                       (since 0.15.0).
 #
+# @unmap_total_time_ns: Total time spent on unmap operations in nanoseconds
+#                       (Since 3.1)
+#
 # @wr_highest_offset: The offset after the greatest byte written to the
 #                     device.  The intended use of this information is for
 #                     growable sparse files (like qcow2) that are used on top
@@ -802,6 +810,9 @@
 # @wr_merged: Number of write requests that have been merged into another
 #             request (Since 2.3).
 #
+# @unmap_merged: Number of unmap requests that have been merged into another
+#                request (Since 3.1)
+#
 # @idle_time_ns: Time since the last I/O operation, in
 #                nanoseconds. If the field is absent it means that
 #                there haven't been any operations yet (Since 2.5).
@@ -815,6 +826,9 @@
 # @failed_flush_operations: The number of failed flush operations
 #                           performed by the device (Since 2.5)
 #
+# @failed_unmap_operations: The number of failed unmap operations performed
+#                           by the device (Since 3.1)
+#
 # @invalid_rd_operations: The number of invalid read operations
 #                          performed by the device (Since 2.5)
 #
@@ -824,6 +838,9 @@
 # @invalid_flush_operations: The number of invalid flush operations
 #                            performed by the device (Since 2.5)
 #
+# @invalid_unmap_operations: The number of invalid unmap operations performed
+#                            by the device (Since 3.1)
+#
 # @account_invalid: Whether invalid operations are included in the
 #                   last access statistics (Since 2.5)
 #
@@ -842,18 +859,18 @@
 # Since: 0.14.0
 ##
 { 'struct': 'BlockDeviceStats',
-  'data': {'rd_bytes': 'int', 'wr_bytes': 'int',
+  'data': {'rd_bytes': 'int', 'wr_bytes': 'int', 'unmap_bytes' : 'int',
            'rd_operations': 'int', 'wr_operations': 'int',
-           'flush_operations': 'int',
+           'flush_operations': 'int', 'unmap_operations': 'int',
            'rd_total_time_ns': 'int', 'wr_total_time_ns': 'int',
-           'flush_total_time_ns': 'int',
+           'flush_total_time_ns': 'int', 'unmap_total_time_ns': 'int',
            'wr_highest_offset': 'int',
-           'rd_merged': 'int', 'wr_merged': 'int',
+           'rd_merged': 'int', 'wr_merged': 'int', 'unmap_merged': 'int',
            '*idle_time_ns': 'int',
            'failed_rd_operations': 'int', 'failed_wr_operations': 'int',
-           'failed_flush_operations': 'int',
+           'failed_flush_operations': 'int', 'failed_unmap_operations': 'int',
            'invalid_rd_operations': 'int', 'invalid_wr_operations': 'int',
-           'invalid_flush_operations': 'int',
+           'invalid_flush_operations': 'int', 'invalid_unmap_operations': 'int',
            'account_invalid': 'bool', 'account_failed': 'bool',
            'timed_stats': ['BlockDeviceTimedStats'],
            '*x_rd_latency_histogram': 'BlockLatencyHistogramInfo',
diff --git a/include/block/accounting.h b/include/block/accounting.h
index d1f67b1..ba8b04d 100644
--- a/include/block/accounting.h
+++ b/include/block/accounting.h
@@ -36,6 +36,7 @@ enum BlockAcctType {
     BLOCK_ACCT_READ,
     BLOCK_ACCT_WRITE,
     BLOCK_ACCT_FLUSH,
+    BLOCK_ACCT_UNMAP,
     BLOCK_MAX_IOTYPE,
 };
 
diff --git a/block/qapi.c b/block/qapi.c
index c66f949..df31f35 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -432,24 +432,30 @@ static void bdrv_query_blk_stats(BlockDeviceStats *ds, BlockBackend *blk)
 
     ds->rd_bytes = stats->nr_bytes[BLOCK_ACCT_READ];
     ds->wr_bytes = stats->nr_bytes[BLOCK_ACCT_WRITE];
+    ds->unmap_bytes = stats->nr_bytes[BLOCK_ACCT_UNMAP];
     ds->rd_operations = stats->nr_ops[BLOCK_ACCT_READ];
     ds->wr_operations = stats->nr_ops[BLOCK_ACCT_WRITE];
+    ds->unmap_operations = stats->nr_ops[BLOCK_ACCT_UNMAP];
 
     ds->failed_rd_operations = stats->failed_ops[BLOCK_ACCT_READ];
     ds->failed_wr_operations = stats->failed_ops[BLOCK_ACCT_WRITE];
     ds->failed_flush_operations = stats->failed_ops[BLOCK_ACCT_FLUSH];
+    ds->failed_unmap_operations = stats->failed_ops[BLOCK_ACCT_UNMAP];
 
     ds->invalid_rd_operations = stats->invalid_ops[BLOCK_ACCT_READ];
     ds->invalid_wr_operations = stats->invalid_ops[BLOCK_ACCT_WRITE];
     ds->invalid_flush_operations =
         stats->invalid_ops[BLOCK_ACCT_FLUSH];
+    ds->invalid_unmap_operations = stats->invalid_ops[BLOCK_ACCT_UNMAP];
 
     ds->rd_merged = stats->merged[BLOCK_ACCT_READ];
     ds->wr_merged = stats->merged[BLOCK_ACCT_WRITE];
+    ds->unmap_merged = stats->merged[BLOCK_ACCT_UNMAP];
     ds->flush_operations = stats->nr_ops[BLOCK_ACCT_FLUSH];
     ds->wr_total_time_ns = stats->total_time_ns[BLOCK_ACCT_WRITE];
     ds->rd_total_time_ns = stats->total_time_ns[BLOCK_ACCT_READ];
     ds->flush_total_time_ns = stats->total_time_ns[BLOCK_ACCT_FLUSH];
+    ds->unmap_total_time_ns = stats->total_time_ns[BLOCK_ACCT_UNMAP];
 
     ds->has_idle_time_ns = stats->last_access_time_ns > 0;
     if (ds->has_idle_time_ns) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 1/8] qapi: group BlockDeviceStats fields Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 2/8] qapi: add unmap to BlockDeviceStats Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-10-04 15:33   ` Kevin Wolf
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 4/8] scsi: store unmap offset and nb_sectors in request struct Anton Nefedov
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 hw/ide/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 2c62efc..352429b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
     TrimAIOCB *iocb = opaque;
     IDEState *s = iocb->s;
 
+    if (iocb->i >= 0) {
+        if (ret >= 0) {
+            block_acct_done(blk_get_stats(s->blk), &s->acct);
+        } else {
+            block_acct_failed(blk_get_stats(s->blk), &s->acct);
+        }
+    }
+
     if (ret >= 0) {
         while (iocb->j < iocb->qiov->niov) {
             int j = iocb->j;
@@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
                     goto done;
                 }
 
+                block_acct_start(blk_get_stats(s->blk), &s->acct,
+                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
+
                 /* Got an entry! Submit and exit.  */
                 iocb->aiocb = blk_aio_pdiscard(s->blk,
                                                sector << BDRV_SECTOR_BITS,
@@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
     }
 
     if (ret == -EINVAL) {
+        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
         ide_dma_error(s);
         return;
     }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 4/8] scsi: store unmap offset and nb_sectors in request struct
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
                   ` (2 preceding siblings ...)
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 5/8] scsi: move unmap error checking to the complete callback Anton Nefedov
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

it allows to report it in the error handler

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 hw/scsi/scsi-disk.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 5ae7baa..d41eea1 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1653,8 +1653,6 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
 {
     SCSIDiskReq *r = data->r;
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
-    uint64_t sector_num;
-    uint32_t nb_sectors;
 
     assert(r->req.aiocb == NULL);
     if (scsi_disk_req_check_error(r, ret, false)) {
@@ -1662,16 +1660,16 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
     }
 
     if (data->count > 0) {
-        sector_num = ldq_be_p(&data->inbuf[0]);
-        nb_sectors = ldl_be_p(&data->inbuf[8]) & 0xffffffffULL;
-        if (!check_lba_range(s, sector_num, nb_sectors)) {
+        r->sector = ldq_be_p(&data->inbuf[0]);
+        r->sector_count = ldl_be_p(&data->inbuf[8]) & 0xffffffffULL;
+        if (!check_lba_range(s, r->sector, r->sector_count)) {
             scsi_check_condition(r, SENSE_CODE(LBA_OUT_OF_RANGE));
             goto done;
         }
 
         r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
-                                        sector_num * s->qdev.blocksize,
-                                        nb_sectors * s->qdev.blocksize,
+                                        r->sector * s->qdev.blocksize,
+                                        r->sector_count * s->qdev.blocksize,
                                         scsi_unmap_complete, data);
         data->count--;
         data->inbuf += 16;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 5/8] scsi: move unmap error checking to the complete callback
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
                   ` (3 preceding siblings ...)
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 4/8] scsi: store unmap offset and nb_sectors in request struct Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations Anton Nefedov
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

This will help to account the operation in the following commit.

The difference is that we don't call scsi_disk_req_check_error() before
the 1st discard iteration anymore. That function also checks if
the request is cancelled, however it shouldn't get canceled until it
yields in blk_aio() functions anyway.
Same approach is already used for emulate_write_same.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 hw/scsi/scsi-disk.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index d41eea1..9d10daf 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1655,9 +1655,6 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
     SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
     assert(r->req.aiocb == NULL);
-    if (scsi_disk_req_check_error(r, ret, false)) {
-        goto done;
-    }
 
     if (data->count > 0) {
         r->sector = ldq_be_p(&data->inbuf[0]);
@@ -1693,7 +1690,12 @@ static void scsi_unmap_complete(void *opaque, int ret)
     r->req.aiocb = NULL;
 
     aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-    scsi_unmap_complete_noio(data, ret);
+    if (scsi_disk_req_check_error(r, ret, false)) {
+        scsi_req_unref(&r->req);
+        g_free(data);
+    } else {
+        scsi_unmap_complete_noio(data, ret);
+    }
     aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
                   ` (4 preceding siblings ...)
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 5/8] scsi: move unmap error checking to the complete callback Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-10-04 15:47   ` Kevin Wolf
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations Anton Nefedov
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
 hw/scsi/scsi-disk.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 9d10daf..0aac137 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1664,6 +1664,10 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
             goto done;
         }
 
+        block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
+                         r->sector_count * s->qdev.blocksize,
+                         BLOCK_ACCT_UNMAP);
+
         r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
                                         r->sector * s->qdev.blocksize,
                                         r->sector_count * s->qdev.blocksize,
@@ -1690,10 +1694,11 @@ static void scsi_unmap_complete(void *opaque, int ret)
     r->req.aiocb = NULL;
 
     aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
-    if (scsi_disk_req_check_error(r, ret, false)) {
+    if (scsi_disk_req_check_error(r, ret, true)) {
         scsi_req_unref(&r->req);
         g_free(data);
     } else {
+        block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
         scsi_unmap_complete_noio(data, ret);
     }
     aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
@@ -1740,10 +1745,12 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
     return;
 
 invalid_param_len:
+    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
     scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
     return;
 
 invalid_field:
+    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
     scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
                   ` (5 preceding siblings ...)
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
  2018-10-04 15:52   ` Kevin Wolf
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 8/8] qapi: query-blockstat: add driver specific file-posix stats Anton Nefedov
       [not found] ` <6d0b2d5b-f0a4-b62c-3dc4-e8d92eeb76b2@virtuozzo.com>
  8 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

This will help to identify how many of the user-issued discard operations
(accounted on a device level) have actually suceeded down on the host file
(even though the numbers will not be exactly the same if non-raw format
driver is used (e.g. qcow2 sending metadata discards)).

Note that these numbers will not include discards triggered by
write-zeroes + MAY_UNMAP calls.

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 block/file-posix.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index fe83cbf..c420f76 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -163,6 +163,11 @@ typedef struct BDRVRawState {
     bool has_fallocate;
     bool needs_alignment;
     bool check_cache_dropped;
+    struct {
+        int64_t discard_nb_ok;
+        int64_t discard_nb_failed;
+        int64_t discard_bytes_ok;
+    } stats;
 
     PRManager *pr_mgr;
 } BDRVRawState;
@@ -2575,12 +2580,25 @@ static void coroutine_fn raw_co_invalidate_cache(BlockDriverState *bs,
 #endif /* !__linux__ */
 }
 
+static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
+{
+    if (ret) {
+        s->stats.discard_nb_failed++;
+    } else {
+        s->stats.discard_nb_ok++;
+        s->stats.discard_bytes_ok += nbytes;
+    }
+}
+
 static coroutine_fn int
 raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
 {
     BDRVRawState *s = bs->opaque;
+    int ret;
 
-    return paio_submit_co(bs, s->fd, offset, NULL, bytes, QEMU_AIO_DISCARD);
+    ret = paio_submit_co(bs, s->fd, offset, NULL, bytes, QEMU_AIO_DISCARD);
+    raw_account_discard(s, bytes, ret);
+    return ret;
 }
 
 static int coroutine_fn raw_co_pwrite_zeroes(
@@ -3077,10 +3095,14 @@ hdev_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
 
     ret = fd_open(bs);
     if (ret < 0) {
+        raw_account_discard(s, bytes, ret);
         return ret;
     }
-    return paio_submit_co(bs, s->fd, offset, NULL, bytes,
-                          QEMU_AIO_DISCARD | QEMU_AIO_BLKDEV);
+
+    ret = paio_submit_co(bs, s->fd, offset, NULL, bytes,
+                         QEMU_AIO_DISCARD | QEMU_AIO_BLKDEV);
+    raw_account_discard(s, bytes, ret);
+    return ret;
 }
 
 static coroutine_fn int hdev_co_pwrite_zeroes(BlockDriverState *bs,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Qemu-devel] [PATCH v4 8/8] qapi: query-blockstat: add driver specific file-posix stats
  2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
                   ` (6 preceding siblings ...)
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations Anton Nefedov
@ 2018-08-21  9:46 ` Anton Nefedov
       [not found] ` <6d0b2d5b-f0a4-b62c-3dc4-e8d92eeb76b2@virtuozzo.com>
  8 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-08-21  9:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	den, berto, Anton Nefedov

A block driver can provide a callback to report driver-specific
statistics.

file-posix driver now reports discard statistics

Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
 qapi/block-core.json       | 39 +++++++++++++++++++++++++++++++++++++++
 include/block/block.h      |  1 +
 include/block/block_int.h  |  1 +
 block.c                    |  9 +++++++++
 block/file-posix.c         | 17 +++++++++++++++++
 block/qapi.c               |  5 +++++
 tests/qemu-iotests/227.out | 18 ++++++++++++++++++
 7 files changed, 90 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 19d4743..9cac2aa 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -878,6 +878,42 @@
            '*x_flush_latency_histogram': 'BlockLatencyHistogramInfo' } }
 
 ##
+# @BlockStatsSpecificFile:
+#
+# File driver statistics
+#
+# @discard-nb-ok: The number of succeeded discard operations performed by
+#                 the driver.
+#
+# @discard-nb-failed: The number of failed discard operations performed by
+#                     the driver.
+#
+# @discard-bytes-ok: The number of bytes discarded by the driver.
+#
+# Since 3.1
+##
+{ 'struct': 'BlockStatsSpecificFile',
+  'data': {
+      'discard-nb-ok': 'int',
+      'discard-nb-failed': 'int',
+      'discard-bytes-ok': 'int'
+  } }
+
+##
+# @BlockStatsSpecific:
+#
+# Block driver specific statistics
+#
+# Since: 3.1
+##
+{ 'union': 'BlockStatsSpecific',
+  'base': { 'driver': 'BlockdevDriver' },
+  'discriminator': 'driver',
+  'data': {
+      'file': 'BlockStatsSpecificFile'
+  } }
+
+##
 # @BlockStats:
 #
 # Statistics of a virtual block device or a block backing device.
@@ -892,6 +928,8 @@
 #
 # @stats:  A @BlockDeviceStats for the device.
 #
+# @driver-specific: Optional driver-specific stats. (Since 3.1)
+#
 # @parent: This describes the file block device if it has one.
 #          Contains recursively the statistics of the underlying
 #          protocol (e.g. the host file for a qcow2 image). If there is
@@ -905,6 +943,7 @@
 { 'struct': 'BlockStats',
   'data': {'*device': 'str', '*qdev': 'str', '*node-name': 'str',
            'stats': 'BlockDeviceStats',
+           '*driver-specific': 'BlockStatsSpecific',
            '*parent': 'BlockStats',
            '*backing': 'BlockStats'} }
 
diff --git a/include/block/block.h b/include/block/block.h
index 4e0871a..cbfd831 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -478,6 +478,7 @@ const char *bdrv_get_device_or_node_name(const BlockDriverState *bs);
 int bdrv_get_flags(BlockDriverState *bs);
 int bdrv_get_info(BlockDriverState *bs, BlockDriverInfo *bdi);
 ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs);
+BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs);
 void bdrv_round_to_clusters(BlockDriverState *bs,
                             int64_t offset, int64_t bytes,
                             int64_t *cluster_offset,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 903b9c1..3ffe2e1 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -320,6 +320,7 @@ struct BlockDriver {
                                   Error **errp);
     int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
     ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs);
+    BlockStatsSpecific *(*bdrv_get_specific_stats)(BlockDriverState *bs);
 
     int coroutine_fn (*bdrv_save_vmstate)(BlockDriverState *bs,
                                           QEMUIOVector *qiov,
diff --git a/block.c b/block.c
index 6161dbe..0ce4f4d 100644
--- a/block.c
+++ b/block.c
@@ -4200,6 +4200,15 @@ ImageInfoSpecific *bdrv_get_specific_info(BlockDriverState *bs)
     return NULL;
 }
 
+BlockStatsSpecific *bdrv_get_specific_stats(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+    if (!drv || !drv->bdrv_get_specific_stats) {
+        return NULL;
+    }
+    return drv->bdrv_get_specific_stats(bs);
+}
+
 void bdrv_debug_event(BlockDriverState *bs, BlkdebugEvent event)
 {
     if (!bs || !bs->drv || !bs->drv->bdrv_debug_event) {
diff --git a/block/file-posix.c b/block/file-posix.c
index c420f76..cc46ef8 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2623,6 +2623,21 @@ static int raw_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
     return 0;
 }
 
+static BlockStatsSpecific *raw_get_specific_stats(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    BlockStatsSpecific *stats = g_new(BlockStatsSpecific, 1);
+
+    stats->driver = BLOCKDEV_DRIVER_FILE;
+    stats->u.file = (BlockStatsSpecificFile){
+        .discard_nb_ok = s->stats.discard_nb_ok,
+        .discard_nb_failed = s->stats.discard_nb_failed,
+        .discard_bytes_ok = s->stats.discard_bytes_ok,
+    };
+
+    return stats;
+}
+
 static QemuOptsList raw_create_opts = {
     .name = "raw-create-opts",
     .head = QTAILQ_HEAD_INITIALIZER(raw_create_opts.head),
@@ -2734,6 +2749,7 @@ BlockDriver bdrv_file = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
+    .bdrv_get_specific_stats = raw_get_specific_stats,
     .bdrv_check_perm = raw_check_perm,
     .bdrv_set_perm   = raw_set_perm,
     .bdrv_abort_perm_update = raw_abort_perm_update,
@@ -3219,6 +3235,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_get_info = raw_get_info,
     .bdrv_get_allocated_file_size
                         = raw_get_allocated_file_size,
+    .bdrv_get_specific_stats = raw_get_specific_stats,
     .bdrv_check_perm = raw_check_perm,
     .bdrv_set_perm   = raw_set_perm,
     .bdrv_abort_perm_update = raw_abort_perm_update,
diff --git a/block/qapi.c b/block/qapi.c
index df31f35..74f762e 100644
--- a/block/qapi.c
+++ b/block/qapi.c
@@ -535,6 +535,11 @@ static BlockStats *bdrv_query_bds_stats(BlockDriverState *bs,
 
     s->stats->wr_highest_offset = stat64_get(&bs->wr_highest_offset);
 
+    s->driver_specific = bdrv_get_specific_stats(bs);
+    if (s->driver_specific) {
+        s->has_driver_specific = true;
+    }
+
     if (bs->file) {
         s->has_parent = true;
         s->parent = bdrv_query_bds_stats(bs->file->bs, blk_level);
diff --git a/tests/qemu-iotests/227.out b/tests/qemu-iotests/227.out
index 736f2e3..62a9dba 100644
--- a/tests/qemu-iotests/227.out
+++ b/tests/qemu-iotests/227.out
@@ -15,6 +15,8 @@ Testing: -drive driver=null-co,if=virtio
         {
             "device": "virtio0",
             "stats": {
+                "unmap_operations": 0,
+                "unmap_merged": 0,
                 "flush_total_time_ns": 0,
                 "wr_highest_offset": 0,
                 "wr_total_time_ns": 0,
@@ -24,13 +26,17 @@ Testing: -drive driver=null-co,if=virtio
                 "wr_bytes": 0,
                 "timed_stats": [
                 ],
+                "failed_unmap_operations": 0,
                 "failed_flush_operations": 0,
                 "account_invalid": true,
                 "rd_total_time_ns": 0,
+                "invalid_unmap_operations": 0,
                 "flush_operations": 0,
                 "wr_operations": 0,
+                "unmap_bytes": 0,
                 "rd_merged": 0,
                 "rd_bytes": 0,
+                "unmap_total_time_ns": 0,
                 "invalid_flush_operations": 0,
                 "account_failed": true,
                 "rd_operations": 0,
@@ -73,6 +79,8 @@ Testing: -drive driver=null-co,if=none
         {
             "device": "none0",
             "stats": {
+                "unmap_operations": 0,
+                "unmap_merged": 0,
                 "flush_total_time_ns": 0,
                 "wr_highest_offset": 0,
                 "wr_total_time_ns": 0,
@@ -82,13 +90,17 @@ Testing: -drive driver=null-co,if=none
                 "wr_bytes": 0,
                 "timed_stats": [
                 ],
+                "failed_unmap_operations": 0,
                 "failed_flush_operations": 0,
                 "account_invalid": true,
                 "rd_total_time_ns": 0,
+                "invalid_unmap_operations": 0,
                 "flush_operations": 0,
                 "wr_operations": 0,
+                "unmap_bytes": 0,
                 "rd_merged": 0,
                 "rd_bytes": 0,
+                "unmap_total_time_ns": 0,
                 "invalid_flush_operations": 0,
                 "account_failed": true,
                 "rd_operations": 0,
@@ -160,6 +172,8 @@ Testing: -blockdev driver=null-co,node-name=null -device virtio-blk,drive=null,i
         {
             "device": "",
             "stats": {
+                "unmap_operations": 0,
+                "unmap_merged": 0,
                 "flush_total_time_ns": 0,
                 "wr_highest_offset": 0,
                 "wr_total_time_ns": 0,
@@ -169,13 +183,17 @@ Testing: -blockdev driver=null-co,node-name=null -device virtio-blk,drive=null,i
                 "wr_bytes": 0,
                 "timed_stats": [
                 ],
+                "failed_unmap_operations": 0,
                 "failed_flush_operations": 0,
                 "account_invalid": false,
                 "rd_total_time_ns": 0,
+                "invalid_unmap_operations": 0,
                 "flush_operations": 0,
                 "wr_operations": 0,
+                "unmap_bytes": 0,
                 "rd_merged": 0,
                 "rd_bytes": 0,
+                "unmap_total_time_ns": 0,
                 "invalid_flush_operations": 0,
                 "account_failed": false,
                 "rd_operations": 0,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/8] discard blockstats
       [not found] ` <6d0b2d5b-f0a4-b62c-3dc4-e8d92eeb76b2@virtuozzo.com>
@ 2018-10-04 14:04   ` Anton Nefedov
  0 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-10-04 14:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, kwolf, mreitz, armbru, jsnow, pbonzini, famz, eblake,
	Denis Lunev, berto

ping-2

On 18/9/2018 11:12 AM, Anton Nefedov wrote:
> ping
> 
> do you think we might proceed with this? or is there any general doubt
> about the idea?
> 
> thanks,
> 
> On 21/8/2018 12:46 PM, Anton Nefedov wrote:
>> new in v4:
>>      - patch 7: discard and write-zeroes code paths had been separated in
>>        34fa110e: file-posix: Fix write_zeroes with unmap on block 
>> devices.
>>        This patch now only accounts discards that come explicitly
>>        through .bdrv_co_pdiscard handler.
>>      - qapi 'Since' clauses changed 3.0 -> 3.1
>>
>> v3: http://lists.nongnu.org/archive/html/qemu-devel/2018-06/msg03688.html
>>
>> ----
>>
>> qmp query-blockstats provides stats info for write/read/flush ops.
>>
>> Patches 1-6 implement the similar for discard (unmap) command for scsi
>> and ide disks.
>> Discard stat "unmap_ops / unmap_bytes" is supposed to account the ops 
>> that
>> have completed without an error.
>>
>> However, discard operation is advisory. Specifically,
>>   - common block layer ignores ENOTSUP error code.
>>     That might be returned if the block driver does not support discard,
>>     or discard has been configured to be ignored.
>>   - format drivers such as qcow2 may ignore discard if they were 
>> configured
>>     to ignore that, or if the corresponding area is already marked unused
>>     (unallocated / zero clusters).
>>
>> And what is actually useful is the number of bytes actually discarded
>> down on the host filesystem.
>> To achieve that, driver-specific statistics has been added to blockstats
>> (patch 8).
>> With patch 7, file-posix driver accounts discard operations on its 
>> level too.
>>
>> query-blockstat result:
>>
>> (note the difference between blockdevice unmap and file discard stats. 
>> qcow2
>> sends fewer ops down to the file as the clusters are actually unallocated
>> on qcow2 level)
>>
>> {
>>    "return": [
>>      {
>>        "device": "drive-scsi0-0-0-0",
>>        "parent": {
>>>        "discard-bytes-ok": 262144,
>>>        "discard-nb-ok": 4,
>>          "stats": {
>>>          "unmap_operations": 0,
>>>          "unmap_merged": 0,
>>            "flush_total_time_ns": 0,
>>            "wr_highest_offset": 8111718400,
>>            "wr_total_time_ns": 0,
>>            "failed_wr_operations": 0,
>>            "failed_rd_operations": 0,
>>            "wr_merged": 0,
>>            "wr_bytes": 0,
>>            "timed_stats": [
>>            ],
>>>          "failed_unmap_operations": 0,
>>            "failed_flush_operations": 0,
>>            "account_invalid": false,
>>            "rd_total_time_ns": 0,
>>>          "invalid_unmap_operations": 0,
>>            "flush_operations": 0,
>>            "wr_operations": 0,
>>>          "unmap_bytes": 0,
>>            "rd_merged": 0,
>>            "rd_bytes": 0,
>>>          "unmap_total_time_ns": 0,
>>            "invalid_flush_operations": 0,
>>            "account_failed": false,
>>            "rd_operations": 0,
>>            "invalid_wr_operations": 0,
>>            "invalid_rd_operations": 0
>>          },
>>          "node-name": "#block012",
>>>        "driver": "file",
>>>        "discard-nb-failed": 0
>>        },
>>        "stats": {
>>>        "unmap_operations": 860,
>>>        "unmap_merged": 0,
>>          "flush_total_time_ns": 21506733,
>>          "wr_highest_offset": 13411741696,
>>          "wr_total_time_ns": 2212749334,
>>          "failed_wr_operations": 0,
>>          "failed_rd_operations": 0,
>>          "wr_merged": 0,
>>          "wr_bytes": 3426304,
>>          "timed_stats": [
>>          ],
>>>        "failed_unmap_operations": 0,
>>          "failed_flush_operations": 0,
>>          "account_invalid": true,
>>          "rd_total_time_ns": 3617478206,
>>>        "invalid_unmap_operations": 0,
>>          "flush_operations": 24,
>>          "wr_operations": 309,
>>>        "unmap_bytes": 11949633536,
>>          "rd_merged": 0,
>>          "rd_bytes": 141967360,
>>>        "unmap_total_time_ns": 14871816,
>>          [..]
>>
>> Anton Nefedov (8):
>>    qapi: group BlockDeviceStats fields
>>    qapi: add unmap to BlockDeviceStats
>>    ide: account UNMAP (TRIM) operations
>>    scsi: store unmap offset and nb_sectors in request struct
>>    scsi: move unmap error checking to the complete callback
>>    scsi: account unmap operations
>>    file-posix: account discard operations
>>    qapi: query-blockstat: add driver specific file-posix stats
>>
>>   qapi/block-core.json       | 82 
>> +++++++++++++++++++++++++++++++++++++++-------
>>   include/block/accounting.h |  1 +
>>   include/block/block.h      |  1 +
>>   include/block/block_int.h  |  1 +
>>   block.c                    |  9 +++++
>>   block/file-posix.c         | 45 +++++++++++++++++++++++--
>>   block/qapi.c               | 11 +++++++
>>   hw/ide/core.c              | 12 +++++++
>>   hw/scsi/scsi-disk.c        | 29 +++++++++-------
>>   tests/qemu-iotests/227.out | 18 ++++++++++
>>   10 files changed, 184 insertions(+), 25 deletions(-)
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations Anton Nefedov
@ 2018-10-04 15:33   ` Kevin Wolf
  2018-10-08 14:38     ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-04 15:33 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, den, berto

Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  hw/ide/core.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 2c62efc..352429b 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>      TrimAIOCB *iocb = opaque;
>      IDEState *s = iocb->s;
>  
> +    if (iocb->i >= 0) {
> +        if (ret >= 0) {
> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> +        } else {
> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> +        }
> +    }
> +
>      if (ret >= 0) {
>          while (iocb->j < iocb->qiov->niov) {
>              int j = iocb->j;
> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>                      goto done;
>                  }
>  
> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> +
>                  /* Got an entry! Submit and exit.  */
>                  iocb->aiocb = blk_aio_pdiscard(s->blk,
>                                                 sector << BDRV_SECTOR_BITS,
> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>      }
>  
>      if (ret == -EINVAL) {
> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);

This looks wrong to me, ide_dma_cb() is not only called for unmap, but
also for reads and writes, and each of them could return -EINVAL.

Also, -EINVAL doesn't necessarily mean that the guest driver did
something wrong, it could also be the result of a host problem.
Therefore, it isn't right to call block_acct_invalid() here - especially
since the request may already have been accounted for as either done or
failed in ide_issue_trim_cb().

Instead, I think it would be better to immediately account for invalid
requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
know for sure that indeed !ide_sect_range_ok() is the cause for the
-EINVAL return code.

>          ide_dma_error(s);
>          return;
>      }

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations Anton Nefedov
@ 2018-10-04 15:47   ` Kevin Wolf
  2018-10-08 14:43     ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-04 15:47 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, den, berto

Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> ---
>  hw/scsi/scsi-disk.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
> index 9d10daf..0aac137 100644
> --- a/hw/scsi/scsi-disk.c
> +++ b/hw/scsi/scsi-disk.c
> @@ -1664,6 +1664,10 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
>              goto done;
>          }
>  
> +        block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
> +                         r->sector_count * s->qdev.blocksize,
> +                         BLOCK_ACCT_UNMAP);

If the check just above this (check_lba_range) fails, we should account
for an invalid request.

>          r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
>                                          r->sector * s->qdev.blocksize,
>                                          r->sector_count * s->qdev.blocksize,
> @@ -1690,10 +1694,11 @@ static void scsi_unmap_complete(void *opaque, int ret)
>      r->req.aiocb = NULL;
>  
>      aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
> -    if (scsi_disk_req_check_error(r, ret, false)) {
> +    if (scsi_disk_req_check_error(r, ret, true)) {
>          scsi_req_unref(&r->req);
>          g_free(data);
>      } else {
> +        block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
>          scsi_unmap_complete_noio(data, ret);
>      }
>      aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
> @@ -1740,10 +1745,12 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
>      return;
>  
>  invalid_param_len:
> +    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
>      scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
>      return;
>  
>  invalid_field:
> +    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
>      scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
>  }

What about the blk_is_read_only() case which directly returns without
jumping to one of the error labels?

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations
  2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations Anton Nefedov
@ 2018-10-04 15:52   ` Kevin Wolf
  2018-10-08 13:47     ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-04 15:52 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, den, berto

Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> This will help to identify how many of the user-issued discard operations
> (accounted on a device level) have actually suceeded down on the host file
> (even though the numbers will not be exactly the same if non-raw format
> driver is used (e.g. qcow2 sending metadata discards)).
> 
> Note that these numbers will not include discards triggered by
> write-zeroes + MAY_UNMAP calls.
> 
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>

Why not implement accounting at the BDS level for all drivers? Then we
can also reuse the existing BlockStats fields instead of duplicating
them into driver-specific new ones.

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations
  2018-10-04 15:52   ` Kevin Wolf
@ 2018-10-08 13:47     ` Anton Nefedov
  0 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-10-08 13:47 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto



On 4/10/2018 6:52 PM, Kevin Wolf wrote:
> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>> This will help to identify how many of the user-issued discard operations
>> (accounted on a device level) have actually suceeded down on the host file
>> (even though the numbers will not be exactly the same if non-raw format
>> driver is used (e.g. qcow2 sending metadata discards)).
>>
>> Note that these numbers will not include discards triggered by
>> write-zeroes + MAY_UNMAP calls.
>>
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> 
> Why not implement accounting at the BDS level for all drivers? Then we
> can also reuse the existing BlockStats fields instead of duplicating
> them into driver-specific new ones.
> 
> Kevin
> 

I just wonder how useful is that?
Discards are interesting to see on the BDS level as they are optional.
Maybe one can be curious how much data is being read from backing
images. Anything else?

It feels like BDS level is complex enough as it is.
Accounting means we should keep that in mind in all the error paths,
with a risk to not account or to account twice (even more complicated
with bounce buffer fallback paths).

We could probably choose some anchor point to bind to, like

   - account at the very bottom only, i.e. right after drv->bdrv_x()
       (-) missing the cases where earlier sanity checks fired
       (-) easy to double-account fallback scenarios

   - account at tracked_request_begin/end
       (-) missing the cases where earlier sanity checks fired
       (-) accounting blk layer requests, and not the ones actually
           passed to the driver (which can be smaller in size due to
           BlockLimits - and can fail partially in discard case)

Maybe another option would be to keep BlockStats on BDS but still let
the drivers update them?

/Anton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-04 15:33   ` Kevin Wolf
@ 2018-10-08 14:38     ` Anton Nefedov
  2018-10-08 15:03       ` Kevin Wolf
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-10-08 14:38 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto



On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   hw/ide/core.c | 12 ++++++++++++
>>   1 file changed, 12 insertions(+)
>>
>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>> index 2c62efc..352429b 100644
>> --- a/hw/ide/core.c
>> +++ b/hw/ide/core.c
>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>       TrimAIOCB *iocb = opaque;
>>       IDEState *s = iocb->s;
>>   
>> +    if (iocb->i >= 0) {
>> +        if (ret >= 0) {
>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>> +        } else {
>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>> +        }
>> +    }
>> +
>>       if (ret >= 0) {
>>           while (iocb->j < iocb->qiov->niov) {
>>               int j = iocb->j;
>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>                       goto done;
>>                   }
>>   
>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>> +
>>                   /* Got an entry! Submit and exit.  */
>>                   iocb->aiocb = blk_aio_pdiscard(s->blk,
>>                                                  sector << BDRV_SECTOR_BITS,
>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>       }
>>   
>>       if (ret == -EINVAL) {
>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> 
> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> also for reads and writes, and each of them could return -EINVAL.
> 

Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(

> Also, -EINVAL doesn't necessarily mean that the guest driver did
> something wrong, it could also be the result of a host problem.
> Therefore, it isn't right to call block_acct_invalid() here - especially
> since the request may already have been accounted for as either done or
> failed in ide_issue_trim_cb().
> 

Couldn't be accounted done with such retcode;
and it seems I shouldnt do block_acct_failed() there anyway - or it's
accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()

But if EINVAL (from further layers) should not be accounted as an
invalid op, then it should be accounted failed instead, the thing that
current code does not do.
(and which brings us back to possible double-accounting if we account
invalid in ide_issue_trim_cb() )

> Instead, I think it would be better to immediately account for invalid
> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> know for sure that indeed !ide_sect_range_ok() is the cause for the
> -EINVAL return code.
> 
So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
acct_failed there, and filter off TRIM commands in the common
accounting.

/Anton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations
  2018-10-04 15:47   ` Kevin Wolf
@ 2018-10-08 14:43     ` Anton Nefedov
  0 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-10-08 14:43 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto



On 4/10/2018 6:47 PM, Kevin Wolf wrote:
> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   hw/scsi/scsi-disk.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
>> index 9d10daf..0aac137 100644
>> --- a/hw/scsi/scsi-disk.c
>> +++ b/hw/scsi/scsi-disk.c
>> @@ -1664,6 +1664,10 @@ static void scsi_unmap_complete_noio(UnmapCBData *data, int ret)
>>               goto done;
>>           }
>>   
>> +        block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
>> +                         r->sector_count * s->qdev.blocksize,
>> +                         BLOCK_ACCT_UNMAP);
> 
> If the check just above this (check_lba_range) fails, we should account
> for an invalid request.
> 

Done.

>>           r->req.aiocb = blk_aio_pdiscard(s->qdev.conf.blk,
>>                                           r->sector * s->qdev.blocksize,
>>                                           r->sector_count * s->qdev.blocksize,
>> @@ -1690,10 +1694,11 @@ static void scsi_unmap_complete(void *opaque, int ret)
>>       r->req.aiocb = NULL;
>>   
>>       aio_context_acquire(blk_get_aio_context(s->qdev.conf.blk));
>> -    if (scsi_disk_req_check_error(r, ret, false)) {
>> +    if (scsi_disk_req_check_error(r, ret, true)) {
>>           scsi_req_unref(&r->req);
>>           g_free(data);
>>       } else {
>> +        block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
>>           scsi_unmap_complete_noio(data, ret);
>>       }
>>       aio_context_release(blk_get_aio_context(s->qdev.conf.blk));
>> @@ -1740,10 +1745,12 @@ static void scsi_disk_emulate_unmap(SCSIDiskReq *r, uint8_t *inbuf)
>>       return;
>>   
>>   invalid_param_len:
>> +    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
>>       scsi_check_condition(r, SENSE_CODE(INVALID_PARAM_LEN));
>>       return;
>>   
>>   invalid_field:
>> +    block_acct_invalid(blk_get_stats(s->qdev.conf.blk), BLOCK_ACCT_UNMAP);
>>       scsi_check_condition(r, SENSE_CODE(INVALID_FIELD));
>>   }
> 
> What about the blk_is_read_only() case which directly returns without
> jumping to one of the error labels?
> 

So basically anything we don't bring to blk layer is invalid req.
Fixed.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 14:38     ` Anton Nefedov
@ 2018-10-08 15:03       ` Kevin Wolf
  2018-10-08 15:25         ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-08 15:03 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto

Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> > Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >> ---
> >>   hw/ide/core.c | 12 ++++++++++++
> >>   1 file changed, 12 insertions(+)
> >>
> >> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >> index 2c62efc..352429b 100644
> >> --- a/hw/ide/core.c
> >> +++ b/hw/ide/core.c
> >> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>       TrimAIOCB *iocb = opaque;
> >>       IDEState *s = iocb->s;
> >>   
> >> +    if (iocb->i >= 0) {
> >> +        if (ret >= 0) {
> >> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >> +        } else {
> >> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >> +        }
> >> +    }
> >> +
> >>       if (ret >= 0) {
> >>           while (iocb->j < iocb->qiov->niov) {
> >>               int j = iocb->j;
> >> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>                       goto done;
> >>                   }
> >>   
> >> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >> +
> >>                   /* Got an entry! Submit and exit.  */
> >>                   iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>                                                  sector << BDRV_SECTOR_BITS,
> >> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>       }
> >>   
> >>       if (ret == -EINVAL) {
> >> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> > 
> > This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> > also for reads and writes, and each of them could return -EINVAL.
> > 
> 
> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> 
> > Also, -EINVAL doesn't necessarily mean that the guest driver did
> > something wrong, it could also be the result of a host problem.
> > Therefore, it isn't right to call block_acct_invalid() here - especially
> > since the request may already have been accounted for as either done or
> > failed in ide_issue_trim_cb().
> > 
> 
> Couldn't be accounted done with such retcode;
> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> 
> But if EINVAL (from further layers) should not be accounted as an
> invalid op, then it should be accounted failed instead, the thing that
> current code does not do.
> (and which brings us back to possible double-accounting if we account
> invalid in ide_issue_trim_cb() )

Yes, commit caeadbc8ba4 was already wrong in assuming that there is
only one possible source for -EINVAL.

> > Instead, I think it would be better to immediately account for invalid
> > requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> > know for sure that indeed !ide_sect_range_ok() is the cause for the
> > -EINVAL return code.
> > 
> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> acct_failed there, and filter off TRIM commands in the common
> accounting.

blk_aio_discard() can fail with -EINVAL, too, so getting this error code
from a TRIM command doesn't mean anything. It can still have multiple
possible sources.

Maybe we just need to remember somewhere whether we already accounted
for a request (maybe an additional field in BlockAcctCookie? Or change
the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
block_account_one_io() call a no-op for such requests.

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 15:03       ` Kevin Wolf
@ 2018-10-08 15:25         ` Anton Nefedov
  2018-10-08 15:46           ` Kevin Wolf
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-10-08 15:25 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto



On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>> ---
>>>>    hw/ide/core.c | 12 ++++++++++++
>>>>    1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>> index 2c62efc..352429b 100644
>>>> --- a/hw/ide/core.c
>>>> +++ b/hw/ide/core.c
>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>        TrimAIOCB *iocb = opaque;
>>>>        IDEState *s = iocb->s;
>>>>    
>>>> +    if (iocb->i >= 0) {
>>>> +        if (ret >= 0) {
>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>> +        } else {
>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>> +        }
>>>> +    }
>>>> +
>>>>        if (ret >= 0) {
>>>>            while (iocb->j < iocb->qiov->niov) {
>>>>                int j = iocb->j;
>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>                        goto done;
>>>>                    }
>>>>    
>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>> +
>>>>                    /* Got an entry! Submit and exit.  */
>>>>                    iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>                                                   sector << BDRV_SECTOR_BITS,
>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>        }
>>>>    
>>>>        if (ret == -EINVAL) {
>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>
>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>> also for reads and writes, and each of them could return -EINVAL.
>>>
>>
>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>
>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>> something wrong, it could also be the result of a host problem.
>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>> since the request may already have been accounted for as either done or
>>> failed in ide_issue_trim_cb().
>>>
>>
>> Couldn't be accounted done with such retcode;
>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>
>> But if EINVAL (from further layers) should not be accounted as an
>> invalid op, then it should be accounted failed instead, the thing that
>> current code does not do.
>> (and which brings us back to possible double-accounting if we account
>> invalid in ide_issue_trim_cb() )
> 
> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> only one possible source for -EINVAL.
> 
>>> Instead, I think it would be better to immediately account for invalid
>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>> -EINVAL return code.
>>>
>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>> acct_failed there, and filter off TRIM commands in the common
>> accounting.
> 
> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> from a TRIM command doesn't mean anything. It can still have multiple
> possible sources.
> 

I meant that common ide_dma_cb() should account EINVAL (along with other
errors) as failed, unless it's TRIM, which means it's already
accounted (either invalid or failed)

> Maybe we just need to remember somewhere whether we already accounted
> for a request (maybe an additional field in BlockAcctCookie? Or change
> the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
> block_account_one_io() call a no-op for such requests.
>  > Kevin
> 

Maybe even resetting to BLOCK_ACCT_NONE == 0. It should also protect
from accounting uninitialized cookie.

/Anton

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 15:25         ` Anton Nefedov
@ 2018-10-08 15:46           ` Kevin Wolf
  2018-10-08 16:04             ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-08 15:46 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto

Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
> 
> 
> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> > Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> >> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> >>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >>>> ---
> >>>>    hw/ide/core.c | 12 ++++++++++++
> >>>>    1 file changed, 12 insertions(+)
> >>>>
> >>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >>>> index 2c62efc..352429b 100644
> >>>> --- a/hw/ide/core.c
> >>>> +++ b/hw/ide/core.c
> >>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>        TrimAIOCB *iocb = opaque;
> >>>>        IDEState *s = iocb->s;
> >>>>    
> >>>> +    if (iocb->i >= 0) {
> >>>> +        if (ret >= 0) {
> >>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>>> +        } else {
> >>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >>>> +        }
> >>>> +    }
> >>>> +
> >>>>        if (ret >= 0) {
> >>>>            while (iocb->j < iocb->qiov->niov) {
> >>>>                int j = iocb->j;
> >>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>                        goto done;
> >>>>                    }
> >>>>    
> >>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >>>> +
> >>>>                    /* Got an entry! Submit and exit.  */
> >>>>                    iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>>>                                                   sector << BDRV_SECTOR_BITS,
> >>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>>>        }
> >>>>    
> >>>>        if (ret == -EINVAL) {
> >>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> >>>
> >>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> >>> also for reads and writes, and each of them could return -EINVAL.
> >>>
> >>
> >> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> >>
> >>> Also, -EINVAL doesn't necessarily mean that the guest driver did
> >>> something wrong, it could also be the result of a host problem.
> >>> Therefore, it isn't right to call block_acct_invalid() here - especially
> >>> since the request may already have been accounted for as either done or
> >>> failed in ide_issue_trim_cb().
> >>>
> >>
> >> Couldn't be accounted done with such retcode;
> >> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> >> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> >>
> >> But if EINVAL (from further layers) should not be accounted as an
> >> invalid op, then it should be accounted failed instead, the thing that
> >> current code does not do.
> >> (and which brings us back to possible double-accounting if we account
> >> invalid in ide_issue_trim_cb() )
> > 
> > Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> > only one possible source for -EINVAL.
> > 
> >>> Instead, I think it would be better to immediately account for invalid
> >>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> >>> know for sure that indeed !ide_sect_range_ok() is the cause for the
> >>> -EINVAL return code.
> >>>
> >> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> >> acct_failed there, and filter off TRIM commands in the common
> >> accounting.
> > 
> > blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> > from a TRIM command doesn't mean anything. It can still have multiple
> > possible sources.
> > 
> 
> I meant that common ide_dma_cb() should account EINVAL (along with other
> errors) as failed, unless it's TRIM, which means it's already
> accounted (either invalid or failed)

Oh, you would already account for failure in ide_issue_trim_cb(), too,
but only if it's EINVAL? That feels like it would complicate the code
quite a bit.

And actually, didn't commit caeadbc8ba4 break werror=stop for requests
returning -EINVAL because we don't call ide_handle_rw_error() any more?

> > Maybe we just need to remember somewhere whether we already accounted
> > for a request (maybe an additional field in BlockAcctCookie? Or change
> > the type to BLOCK_ACCT_ALREADY_ACCOUNTED?) and then make an additional
> > block_account_one_io() call a no-op for such requests.
> 
> Maybe even resetting to BLOCK_ACCT_NONE == 0. It should also protect
> from accounting uninitialized cookie.

That sounds good to me.

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 15:46           ` Kevin Wolf
@ 2018-10-08 16:04             ` Anton Nefedov
  2018-10-08 16:43               ` Kevin Wolf
  0 siblings, 1 reply; 22+ messages in thread
From: Anton Nefedov @ 2018-10-08 16:04 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto



On 8/10/2018 6:46 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
>>
>>
>> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>>>> ---
>>>>>>     hw/ide/core.c | 12 ++++++++++++
>>>>>>     1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>>>> index 2c62efc..352429b 100644
>>>>>> --- a/hw/ide/core.c
>>>>>> +++ b/hw/ide/core.c
>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>         TrimAIOCB *iocb = opaque;
>>>>>>         IDEState *s = iocb->s;
>>>>>>     
>>>>>> +    if (iocb->i >= 0) {
>>>>>> +        if (ret >= 0) {
>>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>> +        } else {
>>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>>>> +        }
>>>>>> +    }
>>>>>> +
>>>>>>         if (ret >= 0) {
>>>>>>             while (iocb->j < iocb->qiov->niov) {
>>>>>>                 int j = iocb->j;
>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>                         goto done;
>>>>>>                     }
>>>>>>     
>>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>>>> +
>>>>>>                     /* Got an entry! Submit and exit.  */
>>>>>>                     iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>>>                                                    sector << BDRV_SECTOR_BITS,
>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>>>         }
>>>>>>     
>>>>>>         if (ret == -EINVAL) {
>>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>>>
>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>>>> also for reads and writes, and each of them could return -EINVAL.
>>>>>
>>>>
>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>>>
>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>>>> something wrong, it could also be the result of a host problem.
>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>>>> since the request may already have been accounted for as either done or
>>>>> failed in ide_issue_trim_cb().
>>>>>
>>>>
>>>> Couldn't be accounted done with such retcode;
>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>>>
>>>> But if EINVAL (from further layers) should not be accounted as an
>>>> invalid op, then it should be accounted failed instead, the thing that
>>>> current code does not do.
>>>> (and which brings us back to possible double-accounting if we account
>>>> invalid in ide_issue_trim_cb() )
>>>
>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
>>> only one possible source for -EINVAL.
>>>
>>>>> Instead, I think it would be better to immediately account for invalid
>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>>>> -EINVAL return code.
>>>>>
>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>>>> acct_failed there, and filter off TRIM commands in the common
>>>> accounting.
>>>
>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
>>> from a TRIM command doesn't mean anything. It can still have multiple
>>> possible sources.
>>>
>>
>> I meant that common ide_dma_cb() should account EINVAL (along with other
>> errors) as failed, unless it's TRIM, which means it's already
>> accounted (either invalid or failed)
> 
> Oh, you would already account for failure in ide_issue_trim_cb(), too,
> but only if it's EINVAL? That feels like it would complicate the code
> quite a bit.
> 

No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
for TRIM.
Then common path (ide_dma_cb()) does not account TRIM operations at all
regardless of the error code. No need to check for TRIM specifically if
we have BLOCK_ACCT_NONE.

> And actually, didn't commit caeadbc8ba4 break werror=stop for requests
> returning -EINVAL because we don't call ide_handle_rw_error() any more?
> 

Yes.

Read/write ignore werror=stop for invalid range case (not for EINVAL).
I wonder if it's crucial to ignore it for TRIM too, otherwise we could
just remove this chunk

      if (ret == -EINVAL) {
          ide_dma_error(s);
          return;
      }


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 16:04             ` Anton Nefedov
@ 2018-10-08 16:43               ` Kevin Wolf
  2018-10-17 15:32                 ` Anton Nefedov
  0 siblings, 1 reply; 22+ messages in thread
From: Kevin Wolf @ 2018-10-08 16:43 UTC (permalink / raw)
  To: Anton Nefedov
  Cc: qemu-devel, qemu-block, mreitz, armbru, jsnow, pbonzini, famz,
	eblake, Denis Lunev, berto

Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben:
> 
> 
> On 8/10/2018 6:46 PM, Kevin Wolf wrote:
> > Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
> >>
> >>
> >> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
> >>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
> >>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
> >>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
> >>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> >>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
> >>>>>> ---
> >>>>>>     hw/ide/core.c | 12 ++++++++++++
> >>>>>>     1 file changed, 12 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
> >>>>>> index 2c62efc..352429b 100644
> >>>>>> --- a/hw/ide/core.c
> >>>>>> +++ b/hw/ide/core.c
> >>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>>>         TrimAIOCB *iocb = opaque;
> >>>>>>         IDEState *s = iocb->s;
> >>>>>>     
> >>>>>> +    if (iocb->i >= 0) {
> >>>>>> +        if (ret >= 0) {
> >>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
> >>>>>> +        } else {
> >>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
> >>>>>> +        }
> >>>>>> +    }
> >>>>>> +
> >>>>>>         if (ret >= 0) {
> >>>>>>             while (iocb->j < iocb->qiov->niov) {
> >>>>>>                 int j = iocb->j;
> >>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
> >>>>>>                         goto done;
> >>>>>>                     }
> >>>>>>     
> >>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
> >>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
> >>>>>> +
> >>>>>>                     /* Got an entry! Submit and exit.  */
> >>>>>>                     iocb->aiocb = blk_aio_pdiscard(s->blk,
> >>>>>>                                                    sector << BDRV_SECTOR_BITS,
> >>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
> >>>>>>         }
> >>>>>>     
> >>>>>>         if (ret == -EINVAL) {
> >>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
> >>>>>
> >>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
> >>>>> also for reads and writes, and each of them could return -EINVAL.
> >>>>>
> >>>>
> >>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
> >>>>
> >>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
> >>>>> something wrong, it could also be the result of a host problem.
> >>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
> >>>>> since the request may already have been accounted for as either done or
> >>>>> failed in ide_issue_trim_cb().
> >>>>>
> >>>>
> >>>> Couldn't be accounted done with such retcode;
> >>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
> >>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
> >>>>
> >>>> But if EINVAL (from further layers) should not be accounted as an
> >>>> invalid op, then it should be accounted failed instead, the thing that
> >>>> current code does not do.
> >>>> (and which brings us back to possible double-accounting if we account
> >>>> invalid in ide_issue_trim_cb() )
> >>>
> >>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
> >>> only one possible source for -EINVAL.
> >>>
> >>>>> Instead, I think it would be better to immediately account for invalid
> >>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
> >>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
> >>>>> -EINVAL return code.
> >>>>>
> >>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
> >>>> acct_failed there, and filter off TRIM commands in the common
> >>>> accounting.
> >>>
> >>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
> >>> from a TRIM command doesn't mean anything. It can still have multiple
> >>> possible sources.
> >>>
> >>
> >> I meant that common ide_dma_cb() should account EINVAL (along with other
> >> errors) as failed, unless it's TRIM, which means it's already
> >> accounted (either invalid or failed)
> > 
> > Oh, you would already account for failure in ide_issue_trim_cb(), too,
> > but only if it's EINVAL? That feels like it would complicate the code
> > quite a bit.
> > 
> 
> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
> for TRIM.
> Then common path (ide_dma_cb()) does not account TRIM operations at all
> regardless of the error code. No need to check for TRIM specifically if
> we have BLOCK_ACCT_NONE.
> 
> > And actually, didn't commit caeadbc8ba4 break werror=stop for requests
> > returning -EINVAL because we don't call ide_handle_rw_error() any more?
> > 
> 
> Yes.
> 
> Read/write ignore werror=stop for invalid range case (not for EINVAL).
> I wonder if it's crucial to ignore it for TRIM too, otherwise we could
> just remove this chunk
> 
>       if (ret == -EINVAL) {
>           ide_dma_error(s);
>           return;
>       }

Ah, right, I forgot about this.

It is actually desirable to avoid stopping for invalid requests because
we should only stop for host errors. An invalid request is a guest error
and stopping the VM will do nothing to fix it. (Resuming the VM would
immediately fail again, so the VM would be locked in paused state.)

Kevin

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations
  2018-10-08 16:43               ` Kevin Wolf
@ 2018-10-17 15:32                 ` Anton Nefedov
  0 siblings, 0 replies; 22+ messages in thread
From: Anton Nefedov @ 2018-10-17 15:32 UTC (permalink / raw)
  To: Kevin Wolf, jsnow
  Cc: qemu-devel, qemu-block, mreitz, armbru, pbonzini, famz, eblake,
	Denis Lunev, berto



On 8/10/2018 7:43 PM, Kevin Wolf wrote:
> Am 08.10.2018 um 18:04 hat Anton Nefedov geschrieben:
>>
>>
>> On 8/10/2018 6:46 PM, Kevin Wolf wrote:
>>> Am 08.10.2018 um 17:25 hat Anton Nefedov geschrieben:
>>>>
>>>>
>>>> On 8/10/2018 6:03 PM, Kevin Wolf wrote:
>>>>> Am 08.10.2018 um 16:38 hat Anton Nefedov geschrieben:
>>>>>> On 4/10/2018 6:33 PM, Kevin Wolf wrote:
>>>>>>> Am 21.08.2018 um 11:46 hat Anton Nefedov geschrieben:
>>>>>>>> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
>>>>>>>> Reviewed-by: Alberto Garcia <berto@igalia.com>
>>>>>>>> ---
>>>>>>>>      hw/ide/core.c | 12 ++++++++++++
>>>>>>>>      1 file changed, 12 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/ide/core.c b/hw/ide/core.c
>>>>>>>> index 2c62efc..352429b 100644
>>>>>>>> --- a/hw/ide/core.c
>>>>>>>> +++ b/hw/ide/core.c
>>>>>>>> @@ -440,6 +440,14 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>>>          TrimAIOCB *iocb = opaque;
>>>>>>>>          IDEState *s = iocb->s;
>>>>>>>>      
>>>>>>>> +    if (iocb->i >= 0) {
>>>>>>>> +        if (ret >= 0) {
>>>>>>>> +            block_acct_done(blk_get_stats(s->blk), &s->acct);
>>>>>>>> +        } else {
>>>>>>>> +            block_acct_failed(blk_get_stats(s->blk), &s->acct);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>          if (ret >= 0) {
>>>>>>>>              while (iocb->j < iocb->qiov->niov) {
>>>>>>>>                  int j = iocb->j;
>>>>>>>> @@ -461,6 +469,9 @@ static void ide_issue_trim_cb(void *opaque, int ret)
>>>>>>>>                          goto done;
>>>>>>>>                      }
>>>>>>>>      
>>>>>>>> +                block_acct_start(blk_get_stats(s->blk), &s->acct,
>>>>>>>> +                                 count << BDRV_SECTOR_BITS, BLOCK_ACCT_UNMAP);
>>>>>>>> +
>>>>>>>>                      /* Got an entry! Submit and exit.  */
>>>>>>>>                      iocb->aiocb = blk_aio_pdiscard(s->blk,
>>>>>>>>                                                     sector << BDRV_SECTOR_BITS,
>>>>>>>> @@ -845,6 +856,7 @@ static void ide_dma_cb(void *opaque, int ret)
>>>>>>>>          }
>>>>>>>>      
>>>>>>>>          if (ret == -EINVAL) {
>>>>>>>> +        block_acct_invalid(blk_get_stats(s->blk), BLOCK_ACCT_UNMAP);
>>>>>>>
>>>>>>> This looks wrong to me, ide_dma_cb() is not only called for unmap, but
>>>>>>> also for reads and writes, and each of them could return -EINVAL.
>>>>>>>
>>>>>>
>>>>>> Stating here BLOCK_ACCT_UNMAP is definitely a blunder :(
>>>>>>
>>>>>>> Also, -EINVAL doesn't necessarily mean that the guest driver did
>>>>>>> something wrong, it could also be the result of a host problem.
>>>>>>> Therefore, it isn't right to call block_acct_invalid() here - especially
>>>>>>> since the request may already have been accounted for as either done or
>>>>>>> failed in ide_issue_trim_cb().
>>>>>>>
>>>>>>
>>>>>> Couldn't be accounted done with such retcode;
>>>>>> and it seems I shouldnt do block_acct_failed() there anyway - or it's
>>>>>> accounted twice: there and in ide_dma_cb()->ide_handle_rw_error()
>>>>>>
>>>>>> But if EINVAL (from further layers) should not be accounted as an
>>>>>> invalid op, then it should be accounted failed instead, the thing that
>>>>>> current code does not do.
>>>>>> (and which brings us back to possible double-accounting if we account
>>>>>> invalid in ide_issue_trim_cb() )
>>>>>
>>>>> Yes, commit caeadbc8ba4 was already wrong in assuming that there is
>>>>> only one possible source for -EINVAL.
>>>>>
>>>>>>> Instead, I think it would be better to immediately account for invalid
>>>>>>> requests in ide_issue_trim_cb() where iocb->ret = -EINVAL is set and we
>>>>>>> know for sure that indeed !ide_sect_range_ok() is the cause for the
>>>>>>> -EINVAL return code.
>>>>>>>
>>>>>> So I guess yes, move acct_invalid in ide_issue_trim_cb() and leave
>>>>>> acct_failed there, and filter off TRIM commands in the common
>>>>>> accounting.
>>>>>
>>>>> blk_aio_discard() can fail with -EINVAL, too, so getting this error code
>>>>> from a TRIM command doesn't mean anything. It can still have multiple
>>>>> possible sources.
>>>>>
>>>>
>>>> I meant that common ide_dma_cb() should account EINVAL (along with other
>>>> errors) as failed, unless it's TRIM, which means it's already
>>>> accounted (either invalid or failed)
>>>
>>> Oh, you would already account for failure in ide_issue_trim_cb(), too,
>>> but only if it's EINVAL? That feels like it would complicate the code
>>> quite a bit.
>>>
>>
>> No, no :) ide_issue_trim_cb does the proper accounting (failed/invalid)
>> for TRIM.
>> Then common path (ide_dma_cb()) does not account TRIM operations at all
>> regardless of the error code. No need to check for TRIM specifically if
>> we have BLOCK_ACCT_NONE.
>>
>>> And actually, didn't commit caeadbc8ba4 break werror=stop for requests
>>> returning -EINVAL because we don't call ide_handle_rw_error() any more?
>>>
>>
>> Yes.
>>
>> Read/write ignore werror=stop for invalid range case (not for EINVAL).
>> I wonder if it's crucial to ignore it for TRIM too, otherwise we could
>> just remove this chunk
>>
>>        if (ret == -EINVAL) {
>>            ide_dma_error(s);
>>            return;
>>        }
> 
> Ah, right, I forgot about this.
> 
> It is actually desirable to avoid stopping for invalid requests because
> we should only stop for host errors. An invalid request is a guest error
> and stopping the VM will do nothing to fix it. (Resuming the VM would
> immediately fail again, so the VM would be locked in paused state.)
> 
> Kevin
> 

I can't come up with a perfect solution of this.

It's hard to reach TRIM sector-ranges from common ide_dma_cb() (which
only has QEMUSGList) and it's hard to accurately report range invalidity
to ide_dma_cb() with a single retval.

I see the following options for invalid range trim:

   1. distinguish range invalidity in ide_dma_cb() with some unused
      errcode instead of -EINVAL
      - does one exist?
   2. use some new global flag on IDEState
      - every callback (ide_dma_cb, pmac_ide_transfer_cb) must check
        and clear that
   3. parse SGList and check range invalidity separately in ide_dma_cb()
      - somewhat too intrusive change

or put up with it:
   4. ignore (account as invalid but do not abort)
   5. treat as a host error i.e. call ide_handle_rw_error() and stop VM
      if werror=stop
   6. keep -EINVAL (as done now) and potentially ignore configured error
      action for the host returned -EINVAL

What do you think?

/Anton

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2018-10-17 15:32 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-21  9:46 [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 1/8] qapi: group BlockDeviceStats fields Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 2/8] qapi: add unmap to BlockDeviceStats Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 3/8] ide: account UNMAP (TRIM) operations Anton Nefedov
2018-10-04 15:33   ` Kevin Wolf
2018-10-08 14:38     ` Anton Nefedov
2018-10-08 15:03       ` Kevin Wolf
2018-10-08 15:25         ` Anton Nefedov
2018-10-08 15:46           ` Kevin Wolf
2018-10-08 16:04             ` Anton Nefedov
2018-10-08 16:43               ` Kevin Wolf
2018-10-17 15:32                 ` Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 4/8] scsi: store unmap offset and nb_sectors in request struct Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 5/8] scsi: move unmap error checking to the complete callback Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 6/8] scsi: account unmap operations Anton Nefedov
2018-10-04 15:47   ` Kevin Wolf
2018-10-08 14:43     ` Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 7/8] file-posix: account discard operations Anton Nefedov
2018-10-04 15:52   ` Kevin Wolf
2018-10-08 13:47     ` Anton Nefedov
2018-08-21  9:46 ` [Qemu-devel] [PATCH v4 8/8] qapi: query-blockstat: add driver specific file-posix stats Anton Nefedov
     [not found] ` <6d0b2d5b-f0a4-b62c-3dc4-e8d92eeb76b2@virtuozzo.com>
2018-10-04 14:04   ` [Qemu-devel] [PATCH v4 0/8] discard blockstats Anton Nefedov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.