qemu-devel.nongnu.org archive mirror
* [PATCH v3 0/4] support NVMe smart critial warning injection
@ 2021-01-14  7:22 zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 1/4] block/nvme: introduce bit 5 for critical warning zhenwei pi
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: zhenwei pi @ 2021-01-14  7:22 UTC
  To: kbusch, its, kwolf, mreitz; +Cc: zhenwei pi, philmd, qemu-devel, qemu-block

v2 -> v3:
- Introduce critical warning bit 5, "Persistent Memory Region has
  become read-only or unreliable".

- Fix overwritten bar.cap.

- Validate the smart critical warning value set via QOM.

- Trigger an asynchronous event when a smart warning is injected.

v1 -> v2:
- As suggested by Philippe and Klaus, set/get smart_critical_warning
  via QMP.

v1:
- Add a smart_critical_warning property to the nvme device, settable
  from the QEMU command line, to emulate hardware errors.

Zhenwei Pi (4):
  block/nvme: introduce bit 5 for critical warning
  hw/block/nvme: fix overwritten bar.cap
  hw/block/nvme: add smart_critical_warning property
  hw/blocl/nvme: trigger async event during injecting smart warning

 hw/block/nvme.c      | 86 ++++++++++++++++++++++++++++++++++++++++----
 hw/block/nvme.h      |  1 +
 include/block/nvme.h |  1 +
 3 files changed, 81 insertions(+), 7 deletions(-)

-- 
2.25.1




* [PATCH v3 1/4] block/nvme: introduce bit 5 for critical warning
  2021-01-14  7:22 [PATCH v3 0/4] support NVMe smart critial warning injection zhenwei pi
@ 2021-01-14  7:22 ` zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 2/4] hw/block/nvme: fix overwritten bar.cap zhenwei pi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: zhenwei pi @ 2021-01-14  7:22 UTC
  To: kbusch, its, kwolf, mreitz; +Cc: zhenwei pi, philmd, qemu-devel, qemu-block

According to NVMe spec 1.4, section
<SMART / Health Information (Log Identifier 02h)>, introduce bit 5 of
the Critical Warning field: "Persistent Memory Region has become
read-only or unreliable".

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 include/block/nvme.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/block/nvme.h b/include/block/nvme.h
index 3e02d9ca98..f68a88c712 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -749,6 +749,7 @@ enum NvmeSmartWarn {
     NVME_SMART_RELIABILITY            = 1 << 2,
     NVME_SMART_MEDIA_READ_ONLY        = 1 << 3,
     NVME_SMART_FAILED_VOLATILE_MEDIA  = 1 << 4,
+    NVME_SMART_PMR_UNRELIABLE         = 1 << 5,
 };
 
 enum NvmeLogIdentifier {
-- 
2.25.1
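To make the Critical Warning layout that this patch completes easier to see, here is a minimal, self-contained C sketch that maps each bit to a short description and reports which set bits are reserved. The enum mirrors NvmeSmartWarn; the values of bits 0 and 1 (spare, temperature) are assumed, since the diff context does not show them. The helper names and description strings are hypothetical paraphrases of the NVMe 1.4 definitions, not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors enum NvmeSmartWarn; bits 0 and 1 are assumed, not shown in the diff. */
enum SmartWarn {
    SMART_SPARE                 = 1 << 0,
    SMART_TEMPERATURE           = 1 << 1,
    SMART_RELIABILITY           = 1 << 2,
    SMART_MEDIA_READ_ONLY       = 1 << 3,
    SMART_FAILED_VOLATILE_MEDIA = 1 << 4,
    SMART_PMR_UNRELIABLE        = 1 << 5, /* the bit added by this patch */
};

/* Hypothetical helper: text for one Critical Warning bit, NULL otherwise. */
static const char *smart_warn_describe(uint8_t bit)
{
    switch (bit) {
    case SMART_SPARE:
        return "available spare below threshold";
    case SMART_TEMPERATURE:
        return "temperature outside threshold";
    case SMART_RELIABILITY:
        return "NVM subsystem reliability degraded";
    case SMART_MEDIA_READ_ONLY:
        return "media placed in read-only mode";
    case SMART_FAILED_VOLATILE_MEDIA:
        return "volatile memory backup device has failed";
    case SMART_PMR_UNRELIABLE:
        return "Persistent Memory Region read-only or unreliable";
    default:
        return NULL; /* reserved bits 6-7, or not a single bit */
    }
}

/* Hypothetical helper: the set bits of a Critical Warning byte that are reserved. */
static uint8_t smart_warn_reserved_bits(uint8_t critical_warning)
{
    uint8_t reserved = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t bit = (uint8_t)(1u << i);
        if ((critical_warning & bit) && smart_warn_describe(bit) == NULL) {
            reserved |= bit;
        }
    }
    return reserved;
}
```

For example, 0x30 (bits 4 and 5) contains no reserved bits, while 0xC1 yields 0xC0.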




* [PATCH v3 2/4] hw/block/nvme: fix overwritten bar.cap
  2021-01-14  7:22 [PATCH v3 0/4] support NVMe smart critial warning injection zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 1/4] block/nvme: introduce bit 5 for critical warning zhenwei pi
@ 2021-01-14  7:22 ` zhenwei pi
  2021-01-14  8:24   ` Klaus Jensen
  2021-01-14  7:22 ` [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning zhenwei pi
  3 siblings, 1 reply; 12+ messages in thread
From: zhenwei pi @ 2021-01-14  7:22 UTC
  To: kbusch, its, kwolf, mreitz; +Cc: zhenwei pi, philmd, qemu-devel, qemu-block

bar.cap must not be cleared in nvme_init_ctrl(), which runs after PMR
initialization; otherwise the PMR capability would always be disabled.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 hw/block/nvme.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 27d2c72716..f361103bb4 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -2745,7 +2745,6 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
     id->psd[0].enlat = cpu_to_le32(0x10);
     id->psd[0].exlat = cpu_to_le32(0x4);
 
-    n->bar.cap = 0;
     NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
     NVME_CAP_SET_CQR(n->bar.cap, 1);
     NVME_CAP_SET_TO(n->bar.cap, 0xf);
-- 
2.25.1
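A minimal C sketch of why the removed `n->bar.cap = 0;` loses the PMR bit. It assumes, for illustration only, the NVMe 1.4 CAP layout (MQES at bit 0, CQR at bit 16, TO at bit 24, PMR-supported at bit 56), and that PMR setup writes CAP before the nvme_init_ctrl() code shown in the diff. `pmr_init`, `init_ctrl` and `cap_pmrs` are hypothetical stand-ins, not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed CAP field positions (NVMe 1.4; QEMU's CAP_*_SHIFT values at the time). */
#define CAP_MQES_SHIFT 0
#define CAP_CQR_SHIFT  16
#define CAP_TO_SHIFT   24
#define CAP_PMRS_SHIFT 56

/* Hypothetical stand-in for PMR setup: advertises PMR support in CAP. */
static void pmr_init(uint64_t *cap)
{
    *cap |= (uint64_t)1 << CAP_PMRS_SHIFT;
}

/*
 * Hypothetical stand-in for the CAP writes in nvme_init_ctrl(). Like the
 * NVME_CAP_SET_* macros, it only ORs fields in, so bits set earlier survive
 * unless the register is cleared first.
 */
static void init_ctrl(uint64_t *cap, bool clear_first)
{
    if (clear_first) {
        *cap = 0; /* the line removed by this patch */
    }
    *cap |= (uint64_t)0x7ff << CAP_MQES_SHIFT;
    *cap |= (uint64_t)1 << CAP_CQR_SHIFT;
    *cap |= (uint64_t)0xf << CAP_TO_SHIFT;
}

/* True if the PMR-supported bit is set. */
static bool cap_pmrs(uint64_t cap)
{
    return (cap >> CAP_PMRS_SHIFT) & 1;
}
```

Calling pmr_init() and then init_ctrl(&cap, true) leaves cap_pmrs(cap) false, while init_ctrl(&cap, false) keeps it true. Dropping the clear is only safe if the register starts out zeroed, which the sketch assumes.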




* [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property
  2021-01-14  7:22 [PATCH v3 0/4] support NVMe smart critial warning injection zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 1/4] block/nvme: introduce bit 5 for critical warning zhenwei pi
  2021-01-14  7:22 ` [PATCH v3 2/4] hw/block/nvme: fix overwritten bar.cap zhenwei pi
@ 2021-01-14  7:22 ` zhenwei pi
  2021-01-14  8:29   ` Klaus Jensen
                     ` (2 more replies)
  2021-01-14  7:22 ` [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning zhenwei pi
  3 siblings, 3 replies; 12+ messages in thread
From: zhenwei pi @ 2021-01-14  7:22 UTC
  To: kbusch, its, kwolf, mreitz; +Cc: zhenwei pi, philmd, qemu-devel, qemu-block

Physical NVMe disks only rarely report a hardware critical warning,
which makes it hard to write and test a monitoring agent service.

For debugging purposes, add a new 'smart_critical_warning' property
to emulate this situation.

The original version of this change added a fixed property that
could only be initialized from the QEMU command line. As suggested
by Philippe and Klaus, it has been reworked into a property that
can be read and set at runtime via QMP.

Test with this patch:
1. Set the smart_critical_warning property of a running VM (16 is
   bit 4, "volatile memory backup device has failed"):
 #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
  "arguments": { "path": "/machine/peripheral-anon/device[0]",
  "property": "smart_critical_warning", "value":16 } }'
2. Run smartctl in the guest:
 #smartctl -H -l error /dev/nvme0n1

  === START OF SMART DATA SECTION ===
  SMART overall-health self-assessment test result: FAILED!
  - volatile memory backup device has failed

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 hw/block/nvme.c | 40 ++++++++++++++++++++++++++++++++++++++++
 hw/block/nvme.h |  1 +
 2 files changed, 41 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index f361103bb4..ce9a9c9023 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1214,6 +1214,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
     }
 
     trans_len = MIN(sizeof(smart) - off, buf_len);
+    smart.critical_warning = n->smart_critical_warning;
 
     smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
                                                         1000));
@@ -2826,6 +2827,41 @@ static Property nvme_props[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+
+static void nvme_get_smart_warning(Object *obj, Visitor *v, const char *name,
+                                   void *opaque, Error **errp)
+{
+    NvmeCtrl *s = NVME(obj);
+    uint8_t value = s->smart_critical_warning;
+
+    visit_type_uint8(v, name, &value, errp);
+}
+
+static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
+                                   void *opaque, Error **errp)
+{
+    NvmeCtrl *s = NVME(obj);
+    uint8_t value, cap = 0;
+    uint64_t pmr_cap = CAP_PMR_MASK;
+
+    if (!visit_type_uint8(v, name, &value, errp)) {
+        return;
+    }
+
+    cap = NVME_SMART_SPARE | NVME_SMART_TEMPERATURE | NVME_SMART_RELIABILITY
+          | NVME_SMART_MEDIA_READ_ONLY | NVME_SMART_FAILED_VOLATILE_MEDIA;
+    if (s->bar.cap & (pmr_cap << CAP_PMR_SHIFT)) {
+        cap |= NVME_SMART_PMR_UNRELIABLE;
+    }
+
+    if ((value & cap) != value) {
+        error_setg(errp, "unsupported smart critical warning value");
+        return;
+    }
+
+    s->smart_critical_warning = value;
+}
+
 static const VMStateDescription nvme_vmstate = {
     .name = "nvme",
     .unmigratable = 1,
@@ -2856,6 +2892,10 @@ static void nvme_instance_init(Object *obj)
                                       "bootindex", "/namespace@1,0",
                                       DEVICE(obj));
     }
+
+    object_property_add(obj, "smart_critical_warning", "uint8",
+                        nvme_get_smart_warning,
+                        nvme_set_smart_warning, NULL, NULL);
 }
 
 static const TypeInfo nvme_info = {
diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index e080a2318a..64e3497244 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -139,6 +139,7 @@ typedef struct NvmeCtrl {
     uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
     uint64_t    starttime_ms;
     uint16_t    temperature;
+    uint8_t     smart_critical_warning;
 
     HostMemoryBackend *pmrdev;
 
-- 
2.25.1
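The check in nvme_set_smart_warning() above reduces to building a mask of supported bits and rejecting any value that has bits outside it. A minimal standalone C sketch of just that logic follows; the helper names are hypothetical, the bit values are assumed to match NvmeSmartWarn, and the `pmr_supported` flag stands in for the CAP_PMR test on bar.cap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match enum NvmeSmartWarn in include/block/nvme.h. */
#define SMART_SPARE                 (1u << 0)
#define SMART_TEMPERATURE           (1u << 1)
#define SMART_RELIABILITY           (1u << 2)
#define SMART_MEDIA_READ_ONLY       (1u << 3)
#define SMART_FAILED_VOLATILE_MEDIA (1u << 4)
#define SMART_PMR_UNRELIABLE        (1u << 5)

/* Bits the emulated controller may report; bit 5 only when a PMR exists. */
static uint8_t smart_warn_supported_mask(bool pmr_supported)
{
    uint8_t mask = SMART_SPARE | SMART_TEMPERATURE | SMART_RELIABILITY |
                   SMART_MEDIA_READ_ONLY | SMART_FAILED_VOLATILE_MEDIA;

    if (pmr_supported) {
        mask |= SMART_PMR_UNRELIABLE;
    }
    return mask;
}

/* Same test as the patch: accept value only if it has no bits outside the mask. */
static bool smart_warn_valid(uint8_t value, bool pmr_supported)
{
    uint8_t mask = smart_warn_supported_mask(pmr_supported);

    return (value & mask) == value;
}
```

Without a PMR, the value 16 from the test above is accepted, while 0x20 (bit 5) is rejected. Bits 6 and 7 are always rejected.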




* [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning
  2021-01-14  7:22 [PATCH v3 0/4] support NVMe smart critial warning injection zhenwei pi
                   ` (2 preceding siblings ...)
  2021-01-14  7:22 ` [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property zhenwei pi
@ 2021-01-14  7:22 ` zhenwei pi
  2021-01-14  8:23   ` Klaus Jensen
                     ` (2 more replies)
  3 siblings, 3 replies; 12+ messages in thread
From: zhenwei pi @ 2021-01-14  7:22 UTC
  To: kbusch, its, kwolf, mreitz; +Cc: zhenwei pi, philmd, qemu-devel, qemu-block


When a smart critical warning is injected by setting the property via
a QMP command, also try to trigger an asynchronous event.

Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
---
 hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index ce9a9c9023..1feb603471 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -847,6 +847,36 @@ static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
     nvme_process_aers(n);
 }
 
+static void nvme_enqueue_smart_event(NvmeCtrl *n, uint8_t event)
+{
+    uint8_t aer_info;
+
+    if (!(NVME_AEC_SMART(n->features.async_config) & event)) {
+        return;
+    }
+
+    /* Ref SPEC <Asynchronous Event Information – SMART / Health Status> */
+    switch (event) {
+    case NVME_SMART_SPARE:
+        aer_info = NVME_AER_INFO_SMART_SPARE_THRESH;
+        break;
+    case NVME_SMART_TEMPERATURE:
+        aer_info = NVME_AER_INFO_SMART_TEMP_THRESH;
+        break;
+    case NVME_SMART_RELIABILITY:
+    case NVME_SMART_MEDIA_READ_ONLY:
+    case NVME_SMART_FAILED_VOLATILE_MEDIA:
+        aer_info = NVME_AER_INFO_SMART_RELIABILITY;
+        break;
+    case NVME_SMART_PMR_UNRELIABLE:
+        /* TODO if NVME_SMART_PMR_UNRELIABLE is defined in future */
+    default:
+        return;
+    }
+
+    nvme_enqueue_event(n, NVME_AER_TYPE_SMART, aer_info, NVME_LOG_SMART_INFO);
+}
+
 static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
 {
     n->aer_mask &= ~(1 << event_type);
@@ -1824,12 +1854,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
             return NVME_INVALID_FIELD | NVME_DNR;
         }
 
-        if (((n->temperature >= n->features.temp_thresh_hi) ||
-             (n->temperature <= n->features.temp_thresh_low)) &&
-            NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
-            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
-                               NVME_AER_INFO_SMART_TEMP_THRESH,
-                               NVME_LOG_SMART_INFO);
+        if ((n->temperature >= n->features.temp_thresh_hi) ||
+            (n->temperature <= n->features.temp_thresh_low)) {
+            nvme_enqueue_smart_event(n, NVME_SMART_TEMPERATURE);
         }
 
         break;
@@ -2841,7 +2868,7 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
                                    void *opaque, Error **errp)
 {
     NvmeCtrl *s = NVME(obj);
-    uint8_t value, cap = 0;
+    uint8_t value, cap = 0, event;
     uint64_t pmr_cap = CAP_PMR_MASK;
 
     if (!visit_type_uint8(v, name, &value, errp)) {
@@ -2860,6 +2887,12 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
     }
 
     s->smart_critical_warning = value;
+
+    /* test each bit of uint8_t for smart.critical_warning */
+    for (event = 0; event < 8; event++) {
+        if (value & (1 << event))
+            nvme_enqueue_smart_event(s, 1 << event);
+    }
 }
 
 static const VMStateDescription nvme_vmstate = {
-- 
2.25.1
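The per-bit dispatch added above can be modelled outside QEMU as a pure function from one critical-warning bit to an Asynchronous Event Information value, gated by the SMART bits of the Asynchronous Event Configuration. This is a minimal C sketch under stated assumptions: the AER information values (reliability 0h, temperature threshold 1h, spare below threshold 2h) are taken from the NVMe 1.4 SMART / Health Status table, the warning-bit values are assumed to match NvmeSmartWarn, and `smart_aer_info` with its -1 "no event" return is hypothetical. Like the patch, it raises no event for the PMR bit.

```c
#include <stdint.h>

/* Assumed to match enum NvmeSmartWarn in include/block/nvme.h. */
#define SMART_SPARE                 (1u << 0)
#define SMART_TEMPERATURE           (1u << 1)
#define SMART_RELIABILITY           (1u << 2)
#define SMART_MEDIA_READ_ONLY       (1u << 3)
#define SMART_FAILED_VOLATILE_MEDIA (1u << 4)
#define SMART_PMR_UNRELIABLE        (1u << 5)

/* AER information values for the SMART / Health Status event type (NVMe 1.4). */
#define AER_INFO_SMART_RELIABILITY  0
#define AER_INFO_SMART_TEMP_THRESH  1
#define AER_INFO_SMART_SPARE_THRESH 2

/*
 * Hypothetical helper: AER information for one warning bit, or -1 when no
 * event should be raised (bit disabled in aec_smart, or no mapping defined).
 * aec_smart plays the role of NVME_AEC_SMART(n->features.async_config).
 */
static int smart_aer_info(uint8_t aec_smart, uint8_t event)
{
    if (!(aec_smart & event)) {
        return -1;
    }

    switch (event) {
    case SMART_SPARE:
        return AER_INFO_SMART_SPARE_THRESH;
    case SMART_TEMPERATURE:
        return AER_INFO_SMART_TEMP_THRESH;
    case SMART_RELIABILITY:
    case SMART_MEDIA_READ_ONLY:
    case SMART_FAILED_VOLATILE_MEDIA:
        return AER_INFO_SMART_RELIABILITY;
    default:
        return -1; /* includes SMART_PMR_UNRELIABLE, as in the patch */
    }
}
```

Because the helper takes a warning bit, a caller that passes an AER information constant instead would get the wrong event. NVME_AER_INFO_SMART_TEMP_THRESH is 1, the value of the spare bit, so it selects the spare-threshold event rather than the temperature one.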




* Re: [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning
  2021-01-14  7:22 ` [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning zhenwei pi
@ 2021-01-14  8:23   ` Klaus Jensen
  2021-01-14 15:57   ` Philippe Mathieu-Daudé
  2021-01-14 22:29   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Klaus Jensen @ 2021-01-14  8:23 UTC
  To: zhenwei pi; +Cc: kwolf, qemu-block, qemu-devel, mreitz, kbusch, philmd


On Jan 14 15:22, zhenwei pi wrote:
> When a smart critical warning is injected by setting the property via
> a QMP command, also try to trigger an asynchronous event.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index ce9a9c9023..1feb603471 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -847,6 +847,36 @@ static void nvme_enqueue_event(NvmeCtrl *n, uint8_t event_type,
>      nvme_process_aers(n);
>  }
>  
> +static void nvme_enqueue_smart_event(NvmeCtrl *n, uint8_t event)

Maybe rename it to just nvme_smart_event, since whether it enqueues
anything is conditional.

> +{
> +    uint8_t aer_info;
> +
> +    if (!(NVME_AEC_SMART(n->features.async_config) & event)) {
> +        return;
> +    }
> +
> +    /* Ref SPEC <Asynchronous Event Information – SMART / Health Status> */
> +    switch (event) {
> +    case NVME_SMART_SPARE:
> +        aer_info = NVME_AER_INFO_SMART_SPARE_THRESH;
> +        break;
> +    case NVME_SMART_TEMPERATURE:
> +        aer_info = NVME_AER_INFO_SMART_TEMP_THRESH;
> +        break;
> +    case NVME_SMART_RELIABILITY:
> +    case NVME_SMART_MEDIA_READ_ONLY:
> +    case NVME_SMART_FAILED_VOLATILE_MEDIA:
> +        aer_info = NVME_AER_INFO_SMART_RELIABILITY;
> +        break;
> +    case NVME_SMART_PMR_UNRELIABLE:
> +        /* TODO if NVME_SMART_PMR_UNRELIABLE is defined in future */

Doesn't NVME_SMART_PMR_UNRELIABLE fall under the
NVME_AER_INFO_SMART_RELIABILITY SMART/Health information group? The spec
says that the PMR becoming unreliable can cause an AEN, so I think that
is the only group that is usable.

> +    default:
> +        return;
> +    }
> +
> +    nvme_enqueue_event(n, NVME_AER_TYPE_SMART, aer_info, NVME_LOG_SMART_INFO);
> +}
> +
>  static void nvme_clear_events(NvmeCtrl *n, uint8_t event_type)
>  {
>      n->aer_mask &= ~(1 << event_type);
> @@ -1824,12 +1854,9 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeRequest *req)
>              return NVME_INVALID_FIELD | NVME_DNR;
>          }
>  
> -        if (((n->temperature >= n->features.temp_thresh_hi) ||
> -             (n->temperature <= n->features.temp_thresh_low)) &&
> -            NVME_AEC_SMART(n->features.async_config) & NVME_SMART_TEMPERATURE) {
> -            nvme_enqueue_event(n, NVME_AER_TYPE_SMART,
> -                               NVME_AER_INFO_SMART_TEMP_THRESH,
> -                               NVME_LOG_SMART_INFO);
> +        if ((n->temperature >= n->features.temp_thresh_hi) ||
> +            (n->temperature <= n->features.temp_thresh_low)) {
> +            nvme_enqueue_smart_event(n, NVME_SMART_TEMPERATURE);
>          }
>  
>          break;
> @@ -2841,7 +2868,7 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>                                     void *opaque, Error **errp)
>  {
>      NvmeCtrl *s = NVME(obj);
> -    uint8_t value, cap = 0;
> +    uint8_t value, cap = 0, event;
>      uint64_t pmr_cap = CAP_PMR_MASK;
>  
>      if (!visit_type_uint8(v, name, &value, errp)) {
> @@ -2860,6 +2887,12 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>      }
>  
>      s->smart_critical_warning = value;
> +
> +    /* test each bit of uint8_t for smart.critical_warning */
> +    for (event = 0; event < 8; event++) {
> +        if (value & (1 << event))
> +            nvme_enqueue_smart_event(s, 1 << event);
> +    }

I suggest you add an NVME_SMART_WARN_MAX to the NvmeSmartWarn enum with
value '6' and use that instead of the literal '8'.
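That suggestion bounds the loop by the number of defined warning bits rather than the width of the byte. A minimal C sketch, in which SMART_WARN_MAX is a hypothetical stand-in for the proposed NVME_SMART_WARN_MAX and `smart_warn_collect` is illustrative only (in the patch, each event would be passed to nvme_enqueue_smart_event() instead of being collected):

```c
#include <stdint.h>

/* Hypothetical stand-in for the proposed NVME_SMART_WARN_MAX (one past bit 5). */
#define SMART_WARN_MAX 6

/*
 * Collect each defined warning bit (0 .. SMART_WARN_MAX - 1) set in value
 * into events[], in ascending order; returns how many were found.
 */
static int smart_warn_collect(uint8_t value, uint8_t events[SMART_WARN_MAX])
{
    int n = 0;

    for (int bit = 0; bit < SMART_WARN_MAX; bit++) {
        uint8_t event = (uint8_t)(1u << bit);
        if (value & event) {
            events[n++] = event;
        }
    }
    return n;
}
```

Reserved bits 6 and 7 are never visited, so even 0xff produces at most six events.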

>  }
>  
>  static const VMStateDescription nvme_vmstate = {
> -- 
> 2.25.1
> 
> 

-- 
One of us - No more doubt, silence or taboo about mental illness.



* Re: [PATCH v3 2/4] hw/block/nvme: fix overwritten bar.cap
  2021-01-14  7:22 ` [PATCH v3 2/4] hw/block/nvme: fix overwritten bar.cap zhenwei pi
@ 2021-01-14  8:24   ` Klaus Jensen
  0 siblings, 0 replies; 12+ messages in thread
From: Klaus Jensen @ 2021-01-14  8:24 UTC
  To: zhenwei pi; +Cc: kwolf, qemu-block, qemu-devel, mreitz, kbusch, philmd


On Jan 14 15:22, zhenwei pi wrote:
> bar.cap must not be cleared in nvme_init_ctrl(), which runs after PMR
> initialization; otherwise the PMR capability would always be disabled.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 27d2c72716..f361103bb4 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -2745,7 +2745,6 @@ static void nvme_init_ctrl(NvmeCtrl *n, PCIDevice *pci_dev)
>      id->psd[0].enlat = cpu_to_le32(0x10);
>      id->psd[0].exlat = cpu_to_le32(0x4);
>  
> -    n->bar.cap = 0;
>      NVME_CAP_SET_MQES(n->bar.cap, 0x7ff);
>      NVME_CAP_SET_CQR(n->bar.cap, 1);
>      NVME_CAP_SET_TO(n->bar.cap, 0xf);
> -- 
> 2.25.1
> 
> 

Good fix, but it looks like you are based on master rather than
nvme-next [1]? The same fix is already staged there.

  [1]: http://git.infradead.org/qemu-nvme.git/shortlog/refs/heads/nvme-next



* Re: [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property
  2021-01-14  7:22 ` [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property zhenwei pi
@ 2021-01-14  8:29   ` Klaus Jensen
  2021-01-14 15:55   ` Philippe Mathieu-Daudé
  2021-01-14 22:23   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Klaus Jensen @ 2021-01-14  8:29 UTC
  To: zhenwei pi; +Cc: kwolf, qemu-block, qemu-devel, mreitz, kbusch, philmd


On Jan 14 15:22, zhenwei pi wrote:
> Physical NVMe disks only rarely report a hardware critical warning,
> which makes it hard to write and test a monitoring agent service.
> 
> For debugging purposes, add a new 'smart_critical_warning' property
> to emulate this situation.
> 
> The original version of this change added a fixed property that
> could only be initialized from the QEMU command line. As suggested
> by Philippe and Klaus, it has been reworked into a property that
> can be read and set at runtime via QMP.
> 
> Test with this patch:
> 1. Set the smart_critical_warning property of a running VM (16 is
>    bit 4, "volatile memory backup device has failed"):
>  #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
>   "arguments": { "path": "/machine/peripheral-anon/device[0]",
>   "property": "smart_critical_warning", "value":16 } }'
> 2. Run smartctl in the guest:
>  #smartctl -H -l error /dev/nvme0n1
> 
>   === START OF SMART DATA SECTION ===
>   SMART overall-health self-assessment test result: FAILED!
>   - volatile memory backup device has failed
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  hw/block/nvme.h |  1 +
>  2 files changed, 41 insertions(+)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index f361103bb4..ce9a9c9023 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1214,6 +1214,7 @@ static uint16_t nvme_smart_info(NvmeCtrl *n, uint8_t rae, uint32_t buf_len,
>      }
>  
>      trans_len = MIN(sizeof(smart) - off, buf_len);
> +    smart.critical_warning = n->smart_critical_warning;
>  
>      smart.data_units_read[0] = cpu_to_le64(DIV_ROUND_UP(stats.units_read,
>                                                          1000));
> @@ -2826,6 +2827,41 @@ static Property nvme_props[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +
> +static void nvme_get_smart_warning(Object *obj, Visitor *v, const char *name,
> +                                   void *opaque, Error **errp)
> +{
> +    NvmeCtrl *s = NVME(obj);
> +    uint8_t value = s->smart_critical_warning;
> +
> +    visit_type_uint8(v, name, &value, errp);
> +}
> +
> +static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
> +                                   void *opaque, Error **errp)
> +{
> +    NvmeCtrl *s = NVME(obj);
> +    uint8_t value, cap = 0;
> +    uint64_t pmr_cap = CAP_PMR_MASK;
> +
> +    if (!visit_type_uint8(v, name, &value, errp)) {
> +        return;
> +    }
> +
> +    cap = NVME_SMART_SPARE | NVME_SMART_TEMPERATURE | NVME_SMART_RELIABILITY
> +          | NVME_SMART_MEDIA_READ_ONLY | NVME_SMART_FAILED_VOLATILE_MEDIA;
> +    if (s->bar.cap & (pmr_cap << CAP_PMR_SHIFT)) {
> +        cap |= NVME_SMART_PMR_UNRELIABLE;
> +    }

Looks like an NVME_CAP_PMRS(cap) macro is missing in
include/block/nvme.h. I have added it in another PMR series under
review, but you can add it here as well instead of manually doing the
shift and check.
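A getter along those lines might look like the following minimal sketch. The name NVME_CAP_PMRS comes from the comment above, but this definition is an assumption modelled on the existing NVME_CAP_* getter pattern. CAP_PMR_SHIFT (56) and CAP_PMR_MASK (0x1) are the values assumed to be in include/block/nvme.h at the time.

```c
#include <stdint.h>

/* Assumed values from include/block/nvme.h (the PMR-supported bit is CAP bit 56). */
#define CAP_PMR_SHIFT 56
#define CAP_PMR_MASK  0x1

/* Sketch of the suggested getter: 1 if the controller advertises a PMR. */
#define NVME_CAP_PMRS(cap) (((cap) >> CAP_PMR_SHIFT) & CAP_PMR_MASK)
```

With it, the manual test becomes `if (NVME_CAP_PMRS(n->bar.cap))`, and the local `pmr_cap` variable is no longer needed.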

> +
> +    if ((value & cap) != value) {
> +        error_setg(errp, "unsupported smart critical warning value");
> +        return;
> +    }
> +
> +    s->smart_critical_warning = value;
> +}
> +
>  static const VMStateDescription nvme_vmstate = {
>      .name = "nvme",
>      .unmigratable = 1,
> @@ -2856,6 +2892,10 @@ static void nvme_instance_init(Object *obj)
>                                        "bootindex", "/namespace@1,0",
>                                        DEVICE(obj));
>      }
> +
> +    object_property_add(obj, "smart_critical_warning", "uint8",
> +                        nvme_get_smart_warning,
> +                        nvme_set_smart_warning, NULL, NULL);
>  }
>  
>  static const TypeInfo nvme_info = {
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index e080a2318a..64e3497244 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -139,6 +139,7 @@ typedef struct NvmeCtrl {
>      uint64_t    timestamp_set_qemu_clock_ms;    /* QEMU clock time */
>      uint64_t    starttime_ms;
>      uint16_t    temperature;
> +    uint8_t     smart_critical_warning;
>  
>      HostMemoryBackend *pmrdev;
>  
> -- 
> 2.25.1
> 
> 



* Re: [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property
  2021-01-14  7:22 ` [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property zhenwei pi
  2021-01-14  8:29   ` Klaus Jensen
@ 2021-01-14 15:55   ` Philippe Mathieu-Daudé
  2021-01-14 22:23   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-14 15:55 UTC
  To: zhenwei pi, kbusch, its, kwolf, mreitz; +Cc: qemu-devel, qemu-block

On 1/14/21 8:22 AM, zhenwei pi wrote:
> Physical NVMe disks only rarely report a hardware critical warning,
> which makes it hard to write and test a monitoring agent service.
> 
> For debugging purposes, add a new 'smart_critical_warning' property
> to emulate this situation.
> 
> The original version of this change added a fixed property that
> could only be initialized from the QEMU command line. As suggested
> by Philippe and Klaus, it has been reworked into a property that
> can be read and set at runtime via QMP.
> 
> Test with this patch:
> 1. Set the smart_critical_warning property of a running VM (16 is
>    bit 4, "volatile memory backup device has failed"):
>  #virsh qemu-monitor-command nvme-upstream '{ "execute": "qom-set",
>   "arguments": { "path": "/machine/peripheral-anon/device[0]",
>   "property": "smart_critical_warning", "value":16 } }'
> 2. Run smartctl in the guest:
>  #smartctl -H -l error /dev/nvme0n1
> 
>   === START OF SMART DATA SECTION ===
>   SMART overall-health self-assessment test result: FAILED!
>   - volatile memory backup device has failed
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  hw/block/nvme.h |  1 +
>  2 files changed, 41 insertions(+)
...

> +static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
> +                                   void *opaque, Error **errp)
> +{
> +    NvmeCtrl *s = NVME(obj);
> +    uint8_t value, cap = 0;
> +    uint64_t pmr_cap = CAP_PMR_MASK;
> +
> +    if (!visit_type_uint8(v, name, &value, errp)) {
> +        return;
> +    }
> +
> +    cap = NVME_SMART_SPARE | NVME_SMART_TEMPERATURE | NVME_SMART_RELIABILITY
> +          | NVME_SMART_MEDIA_READ_ONLY | NVME_SMART_FAILED_VOLATILE_MEDIA;
> +    if (s->bar.cap & (pmr_cap << CAP_PMR_SHIFT)) {
> +        cap |= NVME_SMART_PMR_UNRELIABLE;
> +    }
> +
> +    if ((value & cap) != value) {
> +        error_setg(errp, "unsupported smart critical warning value");

More useful:

           error_setg(errp,
                      "unsupported smart critical warning bits: 0x%x",
                      value & ~cap);
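The effect of that suggestion can be checked outside QEMU with a small sketch. Here snprintf() stands in for QEMU's error_setg(), and `format_unsupported_bits` is a hypothetical name. Because `value` is at most 0xff, `value & ~cap` stays within one byte even though `~cap` is computed as an int.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the bits of value not covered by cap and,
 * when there are any, writes the suggested error message into buf.
 */
static unsigned format_unsupported_bits(char *buf, size_t len,
                                        uint8_t value, uint8_t cap)
{
    unsigned bad = (unsigned)(value & ~cap);

    if (bad) {
        snprintf(buf, len, "unsupported smart critical warning bits: 0x%x", bad);
    }
    return bad;
}
```

With cap = 0x1f (no PMR) and value = 0x60, the message reports 0x60, which points the user directly at bits 5 and 6 instead of only saying that the value is unsupported.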

Regardless:
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Thanks!




* Re: [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning
  2021-01-14  7:22 ` [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning zhenwei pi
  2021-01-14  8:23   ` Klaus Jensen
@ 2021-01-14 15:57   ` Philippe Mathieu-Daudé
  2021-01-14 22:29   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-01-14 15:57 UTC
  To: zhenwei pi, kbusch, its, kwolf, mreitz; +Cc: qemu-devel, qemu-block

On 1/14/21 8:22 AM, zhenwei pi wrote:
> When a smart critical warning is injected by setting the property via
> a QMP command, also try to trigger an asynchronous event.
> 
> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
> ---
>  hw/block/nvme.c | 47 ++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 40 insertions(+), 7 deletions(-)
...
> +static void nvme_enqueue_smart_event(NvmeCtrl *n, uint8_t event)
> +{
> +    uint8_t aer_info;
> +
> +    if (!(NVME_AEC_SMART(n->features.async_config) & event)) {
> +        return;
> +    }
> +
> +    /* Ref SPEC <Asynchronous Event Information – SMART / Health Status> */

Mojibake UTF-8 encoding problem?




* Re: [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property
  2021-01-14  7:22 ` [PATCH v3 3/4] hw/block/nvme: add smart_critical_warning property zhenwei pi
  2021-01-14  8:29   ` Klaus Jensen
  2021-01-14 15:55   ` Philippe Mathieu-Daudé
@ 2021-01-14 22:23   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2021-01-14 22:23 UTC
  To: zhenwei pi; +Cc: kwolf, qemu-block, qemu-devel, mreitz, its, philmd

On Thu, Jan 14, 2021 at 03:22:50PM +0800, zhenwei pi wrote:
> +static void nvme_get_smart_warning(Object *obj, Visitor *v, const char *name,
> +                                   void *opaque, Error **errp)
> +{
> +    NvmeCtrl *s = NVME(obj);

With only one exception, all variables of type 'NvmeCtrl' in this
program are called 'n', so let's keep that consistency please.
Otherwise, this looks fine.

> +    uint8_t value = s->smart_critical_warning;
> +
> +    visit_type_uint8(v, name, &value, errp);
> +}
> +
> +static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
> +                                   void *opaque, Error **errp)
> +{
> +    NvmeCtrl *s = NVME(obj);



* Re: [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning
  2021-01-14  7:22 ` [PATCH v3 4/4] hw/blocl/nvme: trigger async event during injecting smart warning zhenwei pi
  2021-01-14  8:23   ` Klaus Jensen
  2021-01-14 15:57   ` Philippe Mathieu-Daudé
@ 2021-01-14 22:29   ` Keith Busch
  2 siblings, 0 replies; 12+ messages in thread
From: Keith Busch @ 2021-01-14 22:29 UTC
  To: zhenwei pi; +Cc: kwolf, qemu-block, qemu-devel, mreitz, its, philmd

On Thu, Jan 14, 2021 at 03:22:51PM +0800, zhenwei pi wrote:
> @@ -2860,6 +2887,12 @@ static void nvme_set_smart_warning(Object *obj, Visitor *v, const char *name,
>      }
>  
>      s->smart_critical_warning = value;
> +
> +    /* test each bit of uint8_t for smart.critical_warning */
> +    for (event = 0; event < 8; event++) {
> +        if (value & (1 << event))
> +            nvme_enqueue_smart_event(s, 1 << event);

I think you need to save the events that have already been raised with
the host so that you don't send duplicate responses every time a new
event is added to the 'critical_warning'.
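A minimal sketch of one way to do that, assuming a hypothetical per-controller `raised` mask that is not an existing NvmeCtrl field. Only bits that are newly set and enabled in the SMART bits of the Asynchronous Event Configuration produce an event, and a bit that is cleared may fire again when it is set later. When the host should be able to re-arm a bit that is still set (for example after reading the SMART log) is left open. The struct and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical per-controller state for injected SMART warnings. */
typedef struct {
    uint8_t critical_warning; /* value reported in the SMART log */
    uint8_t aec_smart;        /* SMART bits of Asynchronous Event Configuration */
    uint8_t raised;           /* bits already signalled to the host via AER */
} SmartState;

/*
 * Store a new warning value and return the bits that need an asynchronous
 * event now: set in value, enabled in aec_smart, and not yet raised.
 */
static uint8_t smart_set_warning(SmartState *s, uint8_t value)
{
    uint8_t to_raise;

    s->critical_warning = value;
    s->raised &= value; /* cleared bits may be raised again later */
    to_raise = value & s->aec_smart & (uint8_t)~s->raised;
    s->raised |= to_raise;
    return to_raise;
}
```

Setting 0x10 and then 0x14 raises bit 4 once and then only bit 2; repeating 0x14 raises nothing.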


