qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
To: Klaus Jensen <its@irrelevant.dk>
Cc: "Fam Zheng" <fam@euphon.net>, "Kevin Wolf" <kwolf@redhat.com>,
	"Damien Le Moal" <Damien.LeMoal@wdc.com>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"Niklas Cassel" <Niklas.Cassel@wdc.com>,
	"Klaus Jensen" <k.jensen@samsung.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Maxim Levitsky" <mlevitsk@redhat.com>,
	"Alistair Francis" <Alistair.Francis@wdc.com>,
	"Keith Busch" <kbusch@kernel.org>,
	"Philippe Mathieu-Daudé" <philmd@redhat.com>,
	"Matias Bjorling" <Matias.Bjorling@wdc.com>
Subject: RE: [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace Command Set
Date: Mon, 5 Oct 2020 00:53:22 +0000	[thread overview]
Message-ID: <MN2PR04MB595121FF05A5E22B027F962CE10C0@MN2PR04MB5951.namprd04.prod.outlook.com> (raw)
In-Reply-To: <20200928104258.GB33043@apples.localdomain>

> -----Original Message-----
> From: Klaus Jensen <its@irrelevant.dk>
> Sent: Monday, September 28, 2020 6:43 AM
> To: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
> Cc: Keith Busch <kbusch@kernel.org>; Klaus Jensen
> <k.jensen@samsung.com>; Kevin Wolf <kwolf@redhat.com>; Philippe
> Mathieu-Daudé <philmd@redhat.com>; Maxim Levitsky
> <mlevitsk@redhat.com>; Fam Zheng <fam@euphon.net>; Niklas Cassel
> <Niklas.Cassel@wdc.com>; Damien Le Moal <Damien.LeMoal@wdc.com>;
> qemu-block@nongnu.org; qemu-devel@nongnu.org; Alistair Francis
> <Alistair.Francis@wdc.com>; Matias Bjorling <Matias.Bjorling@wdc.com>
> Subject: Re: [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace
> Command Set
> 
> On Sep 28 11:35, Dmitry Fomichev wrote:
> > The emulation code has been changed to advertise NVM Command Set
> when
> > "zoned" device property is not set (default) and Zoned Namespace
> > Command Set otherwise.
> >
> > Handlers for three new NVMe commands introduced in Zoned Namespace
> > Command Set specification are added, namely for Zone Management
> > Receive, Zone Management Send and Zone Append.
> >
> > Device initialization code has been extended to create a proper
> > configuration for zoned operation using device properties.
> >
> > Read/Write command handler is modified to only allow writes at the
> > write pointer if the namespace is zoned. For Zone Append command,
> > writes implicitly happen at the write pointer and the starting write
> > pointer value is returned as the result of the command. Write Zeroes
> > handler is modified to add zoned checks that are identical to those
> > done as a part of Write flow.
> >
> > The code to support for Zone Descriptor Extensions is not included in
> > this commit and ZDES 0 is always reported. A later commit in this
> > series will add ZDE support.
> >
> > This commit doesn't yet include checks for active and open zone
> > limits. It is assumed that there are no limits on either active or
> > open zones.
> >
> 
> I think the fill_pattern feature stands separate, so it would be nice to
> extract that to a patch on its own.
> 
> > Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
> > Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
> > Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com>
> > Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> > Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com>
> > Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com>
> > Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com>
> > Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > ---
> >  block/nvme.c         |   2 +-
> >  hw/block/nvme-ns.c   | 185 ++++++++-
> >  hw/block/nvme-ns.h   |   6 +-
> >  hw/block/nvme.c      | 872
> +++++++++++++++++++++++++++++++++++++++++--
> >  include/block/nvme.h |   6 +-
> >  5 files changed, 1033 insertions(+), 38 deletions(-)
> >
> > diff --git a/hw/block/nvme-ns.h b/hw/block/nvme-ns.h
> > index 04172f083e..daa13546c4 100644
> > --- a/hw/block/nvme-ns.h
> > +++ b/hw/block/nvme-ns.h
> > @@ -38,7 +38,6 @@ typedef struct NvmeZoneList {
> >
> >  typedef struct NvmeNamespaceParams {
> >      uint32_t nsid;
> > -    uint8_t  csi;
> >      bool     attached;
> >      QemuUUID uuid;
> >
> > @@ -52,6 +51,7 @@ typedef struct NvmeNamespace {
> >      DeviceState  parent_obj;
> >      BlockConf    blkconf;
> >      int32_t      bootindex;
> > +    uint8_t      csi;
> >      int64_t      size;
> >      NvmeIdNs     id_ns;
> 
> This should be squashed into the namespace types patch.
> 

Yes, thanks.

> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 63ad03d6d6..38e25a4d1f 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -54,6 +54,7 @@
> >  #include "qemu/osdep.h"
> >  #include "qemu/units.h"
> >  #include "qemu/error-report.h"
> > +#include "crypto/random.h"
> 
> I think this is not used until the offline/read-only zones injection
> patch, right?
> 

Indeed, will move.

> > +static bool nvme_finalize_zoned_write(NvmeNamespace *ns,
> NvmeRequest *req,
> > +                                      bool failed)
> > +{
> > +    NvmeRwCmd *rw = (NvmeRwCmd *)&req->cmd;
> > +    NvmeZone *zone;
> > +    uint64_t slba, start_wp = req->cqe.result64;
> > +    uint32_t nlb, zone_idx;
> > +    uint8_t zs;
> > +
> > +    if (rw->opcode != NVME_CMD_WRITE &&
> > +        rw->opcode != NVME_CMD_ZONE_APPEND &&
> > +        rw->opcode != NVME_CMD_WRITE_ZEROES) {
> > +        return false;
> > +    }
> > +
> > +    slba = le64_to_cpu(rw->slba);
> > +    nlb = le16_to_cpu(rw->nlb) + 1;
> > +    zone_idx = nvme_zone_idx(ns, slba);
> > +    assert(zone_idx < ns->num_zones);
> > +    zone = &ns->zone_array[zone_idx];
> > +
> > +    if (!failed && zone->w_ptr < start_wp + nlb) {
> > +        /*
> > +         * A preceding queued write to the zone has failed,
> > +         * now this write is not at the WP, fail it too.
> > +         */
> > +        failed = true;
> > +    }
> > +
> > +    if (failed) {
> > +        if (zone->w_ptr > start_wp) {
> > +            zone->w_ptr = start_wp;
> > +        }
> 
> It is possible (though unlikely) that you already posted the CQE for the
> write that moved the WP to w_ptr - and now you are reverting it.  This
> looks like a recipe for data corruption to me.
> 
> Take this example. I use append, because if you have multiple regular
> writes in queue you're screwed anyway.
> 
>   w_ptr = 0, d.wp = 0
>   append 1 lba  -> w_ptr = 1, start_wp = 0, issues aio A
>   append 2 lbas -> w_ptr = 3, start_wp = 1, issues aio B
> 
>   aio B success -> d.wp = 2 (since you are adding nlb),
> 
> Now, I totally do the same. Even though the zone descriptor write
> pointer gets "out of sync", it will be reconciled in the absence of
> failures and its fair to define that the host cannot expect a consistent
> view of the write pointer without quescing I/O.
> 
> The problem is if a write then fails:
> 
>   aio A fails   -> w_ptr > start_wp (3 > 1), so you revert to w_ptr = 1
> 
> That looks bad to me. I dont think this is ever reconciled? If another
> append then comes in:
> 
>   append 1 lba -> w_ptr = 2, start_wp = 1, issues aio C and overwrites
>                                            the second append from before.
>   aio C success -> d.wp = 3 (but it should be 2)
> 

Right, need to sync w_ptr and d.wp here. Good find!

> > @@ -1513,11 +2267,16 @@ static uint16_t nvme_identify_ctrl(NvmeCtrl
> *n, NvmeRequest *req)
> >  static uint16_t nvme_identify_ctrl_csi(NvmeCtrl *n, NvmeRequest *req)
> >  {
> >      NvmeIdentify *c = (NvmeIdentify *)&req->cmd;
> > +    NvmeIdCtrlZoned id = {};
> >
> >      trace_pci_nvme_identify_ctrl_csi(c->csi);
> >
> >      if (c->csi == NVME_CSI_NVM) {
> >          return nvme_rpt_empty_id_struct(n, req);
> > +    } else if (c->csi == NVME_CSI_ZONED) {
> > +        id.zasl = n->zasl;
> 
> I dont think it should overwrite the zasl value specified by the user.
> If the user specified 0, then it should return 0 for zasl here.

Not sure if I get this. The value of n->zasl is calculated based on the
setting given by the user (or by default).

> 
> > @@ -2310,16 +3086,28 @@ static int nvme_start_ctrl(NvmeCtrl *n)
> >              continue;
> >          }
> >          ns->params.attached = false;
> > -        switch (ns->params.csi) {
> > +        switch (ns->csi) {
> >          case NVME_CSI_NVM:
> >              if (NVME_CC_CSS(n->bar.cc) == CSS_NVM_ONLY ||
> >                  NVME_CC_CSS(n->bar.cc) == CSS_CSI) {
> >                  ns->params.attached = true;
> >              }
> >              break;
> > +        case NVME_CSI_ZONED:
> > +            if (NVME_CC_CSS(n->bar.cc) == CSS_CSI) {
> > +                ns->params.attached = true;
> > +            }
> > +            break;
> >          }
> >      }
> >
> > +    if (!n->zasl_bs) {
> > +        assert(n->params.mdts);
> 
> A value of 0 for MDTS is perfectly valid.

Ok, need to remove this assert.
 
> 
> > @@ -2382,10 +3170,11 @@ static void nvme_write_bar(NvmeCtrl *n,
> hwaddr offset, uint64_t data,
> >                  case CSS_NVM_ONLY:
> >                      trace_pci_nvme_css_nvm_cset_selected_by_host(data &
> >                                                                   0xffffffff);
> > -                    break;
> > +                break;
> 
> Spurious misaligned break here.

Nice catch! It's misaligned by 4, so checkpatch doesn't complain about it :)

  parent reply	other threads:[~2020-10-05  0:55 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-28  2:35 [PATCH v5 00/14] hw/block/nvme: Support Namespace Types and Zoned Namespace Command Set Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 01/14] hw/block/nvme: Report actual LBA data shift in LBAF Dmitry Fomichev
2020-09-28  8:51   ` Klaus Jensen
2020-09-28  2:35 ` [PATCH v5 02/14] hw/block/nvme: Add Commands Supported and Effects log Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 03/14] hw/block/nvme: Introduce the Namespace Types definitions Dmitry Fomichev
2020-09-30  8:08   ` Klaus Jensen
2020-09-30 15:21   ` Keith Busch
2020-09-28  2:35 ` [PATCH v5 04/14] hw/block/nvme: Define trace events related to NS Types Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 05/14] hw/block/nvme: Add support for Namespace Types Dmitry Fomichev
2020-09-30  8:15   ` Klaus Jensen
2020-09-30 12:47   ` Niklas Cassel
2020-10-01 11:22   ` Niklas Cassel
2020-10-01 15:29     ` Keith Busch
2020-10-01 15:50       ` Niklas Cassel
2020-10-01 15:59         ` Keith Busch
2020-10-01 16:23           ` Niklas Cassel
2020-10-01 17:08             ` Keith Busch
2020-10-01 22:15   ` Klaus Jensen
2020-10-01 22:30     ` Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 06/14] hw/block/nvme: Add support for active/inactive namespaces Dmitry Fomichev
2020-09-30 13:50   ` Niklas Cassel
2020-10-04 23:54     ` Dmitry Fomichev
2020-10-05 11:26       ` Niklas Cassel
2020-09-28  2:35 ` [PATCH v5 07/14] hw/block/nvme: Make Zoned NS Command Set definitions Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 08/14] hw/block/nvme: Define Zoned NS Command Set trace events Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 09/14] hw/block/nvme: Support Zoned Namespace Command Set Dmitry Fomichev
2020-09-28  6:44   ` Klaus Jensen
2020-09-28 10:42   ` Klaus Jensen
2020-09-30  5:20     ` Klaus Jensen
2020-10-05  0:53     ` Dmitry Fomichev [this message]
2020-09-30  5:59   ` Klaus Jensen
2020-10-04 23:48     ` Dmitry Fomichev
2020-09-30 14:50   ` Niklas Cassel
2020-09-30 18:23     ` Klaus Jensen
2020-10-04 23:57     ` Dmitry Fomichev
2020-10-05 11:41       ` Niklas Cassel
2020-10-05 23:08         ` Dmitry Fomichev
2020-09-30 15:12   ` Niklas Cassel
2020-09-28  2:35 ` [PATCH v5 10/14] hw/block/nvme: Introduce max active and open zone limits Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 11/14] hw/block/nvme: Support Zone Descriptor Extensions Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 12/14] hw/block/nvme: Add injection of Offline/Read-Only zones Dmitry Fomichev
2020-09-28  2:35 ` [PATCH v5 13/14] hw/block/nvme: Use zone metadata file for persistence Dmitry Fomichev
2020-09-28  7:51   ` Klaus Jensen
2020-09-29 15:43     ` Dmitry Fomichev
2020-09-29 16:46       ` Klaus Jensen
2020-09-28  2:35 ` [PATCH v5 14/14] hw/block/nvme: Document zoned parameters in usage text Dmitry Fomichev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MN2PR04MB595121FF05A5E22B027F962CE10C0@MN2PR04MB5951.namprd04.prod.outlook.com \
    --to=dmitry.fomichev@wdc.com \
    --cc=Alistair.Francis@wdc.com \
    --cc=Damien.LeMoal@wdc.com \
    --cc=Matias.Bjorling@wdc.com \
    --cc=Niklas.Cassel@wdc.com \
    --cc=fam@euphon.net \
    --cc=its@irrelevant.dk \
    --cc=k.jensen@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=kwolf@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=philmd@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).