All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] hw/block/nvme: fix zone write finalize
@ 2021-01-12  9:42 Klaus Jensen
  2021-01-14 23:03 ` Keith Busch
  2021-01-20  9:14 ` Klaus Jensen
  0 siblings, 2 replies; 4+ messages in thread
From: Klaus Jensen @ 2021-01-12  9:42 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Dmitry Fomichev, Klaus Jensen, Max Reitz,
	Klaus Jensen, Keith Busch

From: Klaus Jensen <k.jensen@samsung.com>

The zone write pointer is unconditionally advanced, even for write
faults. Make sure that the zone is always transitioned to Full if the
write pointer reaches zone capacity.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
---
 hw/block/nvme.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 0854ee307227..280b31b4459d 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
     nlb = le16_to_cpu(rw->nlb) + 1;
     zone = nvme_get_zone_by_slba(ns, slba);
 
+    zone->d.wp += nlb;
+
     if (failed) {
         res->slba = 0;
-        zone->d.wp += nlb;
-    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
+    }
+
+    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
         switch (nvme_get_zone_state(zone)) {
         case NVME_ZONE_STATE_IMPLICITLY_OPEN:
         case NVME_ZONE_STATE_EXPLICITLY_OPEN:
@@ -1288,9 +1291,6 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
         default:
             assert(false);
         }
-        zone->d.wp = zone->w_ptr;
-    } else {
-        zone->d.wp += nlb;
     }
 }
 
-- 
2.30.0



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] hw/block/nvme: fix zone write finalize
  2021-01-12  9:42 [PATCH] hw/block/nvme: fix zone write finalize Klaus Jensen
@ 2021-01-14 23:03 ` Keith Busch
  2021-01-15  6:01   ` Klaus Jensen
  2021-01-20  9:14 ` Klaus Jensen
  1 sibling, 1 reply; 4+ messages in thread
From: Keith Busch @ 2021-01-14 23:03 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: Kevin Wolf, qemu-block, Dmitry Fomichev, Klaus Jensen,
	qemu-devel, Max Reitz

On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The zone write pointer is unconditionally advanced, even for write
> faults. Make sure that the zone is always transitioned to Full if the
> write pointer reaches zone capacity.

Looks like some spec weirdness. It says we can transition to full:

  b) as a result of successful completion of a write operation that
     writes one or more logical blocks that causes the zone to reach its
     writeable zone capacity;

But a failed write also advances the write pointer as you've pointed
out, so they might want to strike "successful".

Looks fine to me.

Reviewed-by: Keith Busch <kbusch@kernel.org>

> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
>  hw/block/nvme.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 0854ee307227..280b31b4459d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>      nlb = le16_to_cpu(rw->nlb) + 1;
>      zone = nvme_get_zone_by_slba(ns, slba);
>  
> +    zone->d.wp += nlb;
> +
>      if (failed) {
>          res->slba = 0;
> -        zone->d.wp += nlb;
> -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> +    }
> +
> +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {

The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
As far as I can tell, this difference will mean the zone won't finalize
until the last write completes, where before it could finalize after the
zone's last write is submitted. Either way looks okay, but I think these
two values ought to always be in sync.

>          switch (nvme_get_zone_state(zone)) {
>          case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>          case NVME_ZONE_STATE_EXPLICITLY_OPEN:
> @@ -1288,9 +1291,6 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>          default:
>              assert(false);
>          }
> -        zone->d.wp = zone->w_ptr;
> -    } else {
> -        zone->d.wp += nlb;
>      }
>  }


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] hw/block/nvme: fix zone write finalize
  2021-01-14 23:03 ` Keith Busch
@ 2021-01-15  6:01   ` Klaus Jensen
  0 siblings, 0 replies; 4+ messages in thread
From: Klaus Jensen @ 2021-01-15  6:01 UTC (permalink / raw)
  To: Keith Busch; +Cc: Kevin Wolf, Klaus Jensen, qemu-devel, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 3430 bytes --]

On Jan 14 15:03, Keith Busch wrote:
> On Tue, Jan 12, 2021 at 10:42:35AM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > The zone write pointer is unconditionally advanced, even for write
> > faults. Make sure that the zone is always transitioned to Full if the
> > write pointer reaches zone capacity.
> 
> Looks like some spec weirdness. It says we can transition to full:
> 
>   b) as a result of successful completion of a write operation that
>      writes one or more logical blocks that causes the zone to reach its
>      writeable zone capacity;
> 
> But a failed write also advances the write pointer as you've pointed
> out, so they might want to strike "successful".
> 

Yes, its slightly inconsistent.

Empty/Closed to Opened is also kinda fuzzy in the spec. The wording is
slightly different when transitioning from Empty and Closed to
Implicitly Opened. ZSE->ZSIO just says "and a write operation writes on
or more logical blocks of that zone". ZSC->ZSIO says "and a write
operation that writes one or more logical blocks to the zone completes
succesfully". This has given me some headaches on if the transition
should occur when "preparing/submitting" the write or when the write
completes. But I guess this is pretty much an implementation detail of
the device.

I should mention this to my representatives ;)

> Looks fine to me.
> 
> Reviewed-by: Keith Busch <kbusch@kernel.org>
> 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > ---
> >  hw/block/nvme.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index 0854ee307227..280b31b4459d 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
> >      nlb = le16_to_cpu(rw->nlb) + 1;
> >      zone = nvme_get_zone_by_slba(ns, slba);
> >  
> > +    zone->d.wp += nlb;
> > +
> >      if (failed) {
> >          res->slba = 0;
> > -        zone->d.wp += nlb;
> > -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> > +    }
> > +
> > +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
> 
> The previous check was using 'zone->w_ptr', but now it's 'zone->d.wp'.
> As far as I can tell, this difference will mean the zone won't finalize
> until the last write completes, where before it could finalize after the
> zone's last write is submitted. Either way looks okay, but I think these
> two values ought to always be in sync.
> 

Right - the zone will transition to Full on the last completed write. I
really not sure if this models devices better or not. But I would argue
that a real device would not relinquish resources until the last write
completes.

An issue is that for QD>1 (zone append), we can easily end up with
appends A and B completing in reversed order such that d.wp is advanced
by B_nlb, but we left a "hole" of A_nlb in the zone because that write
has not completed yet.

But - I think *some* inconsistency on the write pointer value is to be
expected when using Zone Append. The host can only ever depend on a
consistent value of the write pointer by quiescing I/O to the zone and
then getting a zone report - and d.wp and d.w_ptr will always
converge in that case.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] hw/block/nvme: fix zone write finalize
  2021-01-12  9:42 [PATCH] hw/block/nvme: fix zone write finalize Klaus Jensen
  2021-01-14 23:03 ` Keith Busch
@ 2021-01-20  9:14 ` Klaus Jensen
  1 sibling, 0 replies; 4+ messages in thread
From: Klaus Jensen @ 2021-01-20  9:14 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Klaus Jensen, Keith Busch, qemu-block, Max Reitz

[-- Attachment #1: Type: text/plain, Size: 1510 bytes --]

On Jan 12 10:42, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> The zone write pointer is unconditionally advanced, even for write
> faults. Make sure that the zone is always transitioned to Full if the
> write pointer reaches zone capacity.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> Cc: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> ---
>  hw/block/nvme.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 0854ee307227..280b31b4459d 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -1268,10 +1268,13 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>      nlb = le16_to_cpu(rw->nlb) + 1;
>      zone = nvme_get_zone_by_slba(ns, slba);
>  
> +    zone->d.wp += nlb;
> +
>      if (failed) {
>          res->slba = 0;
> -        zone->d.wp += nlb;
> -    } else if (zone->w_ptr == nvme_zone_wr_boundary(zone)) {
> +    }
> +
> +    if (zone->d.wp == nvme_zone_wr_boundary(zone)) {
>          switch (nvme_get_zone_state(zone)) {
>          case NVME_ZONE_STATE_IMPLICITLY_OPEN:
>          case NVME_ZONE_STATE_EXPLICITLY_OPEN:
> @@ -1288,9 +1291,6 @@ static void nvme_finalize_zoned_write(NvmeNamespace *ns, NvmeRequest *req,
>          default:
>              assert(false);
>          }
> -        zone->d.wp = zone->w_ptr;
> -    } else {
> -        zone->d.wp += nlb;
>      }
>  }
>  

Applied to nvme-next.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-01-20  9:20 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-12  9:42 [PATCH] hw/block/nvme: fix zone write finalize Klaus Jensen
2021-01-14 23:03 ` Keith Busch
2021-01-15  6:01   ` Klaus Jensen
2021-01-20  9:14 ` Klaus Jensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.