All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
@ 2022-10-30 16:29 Christoph Hellwig
  2022-10-31  9:01 ` Guixin Liu
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-10-30 16:29 UTC (permalink / raw)
  To: kbusch, sagi; +Cc: linux-nvme

While the specification allows devices to either deallocate data
or to actually write zeroes on any Write Zeroes command, many SSDs
only do the sensible thing and deallocate data when the DEAC bit
is specific.  Set it when it is suppored and the caller doesn't
explicitly opt out of deallocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---

BTW, I'm tempted to slap a

Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command")

onto this.  While it doesn't fix the Write Zeroes support per se,
it should help us with a lot of the devices timing out on Write Zeroes.

 drivers/nvme/host/core.c | 6 +++++-
 drivers/nvme/host/nvme.h | 1 +
 include/linux/nvme.h     | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0090dc0b3ae6f..14568ae308fba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -850,8 +850,11 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 	cmnd->write_zeroes.length =
 		cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
 
+	if (!(req->cmd_flags & REQ_NOUNMAP) && ns->dlfeat & (1 << 3))
+		cmnd->write_zeroes.control |= cpu_to_le16(NVME_WZ_DEAC);
+
 	if (nvme_ns_has_pi(ns)) {
-		cmnd->write_zeroes.control = cpu_to_le16(NVME_RW_PRINFO_PRACT);
+		cmnd->write_zeroes.control |= cpu_to_le16(NVME_RW_PRINFO_PRACT);
 
 		switch (ns->pi_type) {
 		case NVME_NS_DPS_PI_TYPE1:
@@ -2004,6 +2007,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 		}
 	}
 
+	ns->dlfeat = id->dlfeat;
 	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
 	set_bit(NVME_NS_READY, &ns->flags);
 	blk_mq_unfreeze_queue(ns->disk->queue);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a29877217ee65..00d3b438efe3f 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -477,6 +477,7 @@ struct nvme_ns {
 	u32 sws;
 	u8 pi_type;
 	u8 guard_type;
+	u8 dlfeat;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u64 zsze;
 #endif
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 050d7d0cd81b0..c96930b2c28fe 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -963,6 +963,7 @@ enum {
 	NVME_RW_PRINFO_PRCHK_GUARD	= 1 << 12,
 	NVME_RW_PRINFO_PRACT		= 1 << 13,
 	NVME_RW_DTYPE_STREAMS		= 1 << 4,
+	NVME_WZ_DEAC			= 1 << 9,
 };
 
 struct nvme_dsm_cmd {
-- 
2.30.2



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
@ 2022-10-31  9:01 ` Guixin Liu
  2022-10-31 14:43 ` Pankaj Raghav
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Guixin Liu @ 2022-10-31  9:01 UTC (permalink / raw)
  To: Christoph Hellwig, kbusch, sagi; +Cc: linux-nvme


在 2022/10/31 00:29, Christoph Hellwig 写道:
> While the specification allows devices to either deallocate data
> or to actually write zeroes on any Write Zeroes command, many SSDs
> only do the sensible thing and deallocate data when the DEAC bit
> is specific.  Set it when it is suppored and the caller doesn't
> explicitly opt out of deallocation.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>
> BTW, I'm tempted to slap a
>
> Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command")
>
> onto this.  While it doesn't fix the Write Zeroes support per se,
> it should help us with a lot of the devices timing out on Write Zeroes.
>
>   drivers/nvme/host/core.c | 6 +++++-
>   drivers/nvme/host/nvme.h | 1 +
>   include/linux/nvme.h     | 1 +
>   3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 0090dc0b3ae6f..14568ae308fba 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -850,8 +850,11 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
>   	cmnd->write_zeroes.length =
>   		cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
>   
> +	if (!(req->cmd_flags & REQ_NOUNMAP) && ns->dlfeat & (1 << 3))

Maybe we can add a series of macros to make the code easier to read.

Best regards,

Guixin Liu

> +		cmnd->write_zeroes.control |= cpu_to_le16(NVME_WZ_DEAC);
> +
>   	if (nvme_ns_has_pi(ns)) {
> -		cmnd->write_zeroes.control = cpu_to_le16(NVME_RW_PRINFO_PRACT);
> +		cmnd->write_zeroes.control |= cpu_to_le16(NVME_RW_PRINFO_PRACT);
>   
>   		switch (ns->pi_type) {
>   		case NVME_NS_DPS_PI_TYPE1:
> @@ -2004,6 +2007,7 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
>   		}
>   	}
>   
> +	ns->dlfeat = id->dlfeat;
>   	set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
>   	set_bit(NVME_NS_READY, &ns->flags);
>   	blk_mq_unfreeze_queue(ns->disk->queue);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index a29877217ee65..00d3b438efe3f 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -477,6 +477,7 @@ struct nvme_ns {
>   	u32 sws;
>   	u8 pi_type;
>   	u8 guard_type;
> +	u8 dlfeat;
>   #ifdef CONFIG_BLK_DEV_ZONED
>   	u64 zsze;
>   #endif
> diff --git a/include/linux/nvme.h b/include/linux/nvme.h
> index 050d7d0cd81b0..c96930b2c28fe 100644
> --- a/include/linux/nvme.h
> +++ b/include/linux/nvme.h
> @@ -963,6 +963,7 @@ enum {
>   	NVME_RW_PRINFO_PRCHK_GUARD	= 1 << 12,
>   	NVME_RW_PRINFO_PRACT		= 1 << 13,
>   	NVME_RW_DTYPE_STREAMS		= 1 << 4,
> +	NVME_WZ_DEAC			= 1 << 9,
>   };
>   
>   struct nvme_dsm_cmd {


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
  2022-10-31  9:01 ` Guixin Liu
@ 2022-10-31 14:43 ` Pankaj Raghav
  2022-10-31 14:54 ` Martin K. Petersen
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Pankaj Raghav @ 2022-10-31 14:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme, Pankaj Raghav

On Sun, Oct 30, 2022 at 05:29:06PM +0100, Christoph Hellwig wrote:
> While the specification allows devices to either deallocate data
> or to actually write zeroes on any Write Zeroes command, many SSDs
> only do the sensible thing and deallocate data when the DEAC bit
> is specific.  Set it when it is suppored and the caller doesn't
> explicitly opt out of deallocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
  2022-10-31  9:01 ` Guixin Liu
  2022-10-31 14:43 ` Pankaj Raghav
@ 2022-10-31 14:54 ` Martin K. Petersen
  2022-10-31 15:07   ` Keith Busch
  2022-11-02 10:46 ` Niklas Cassel
  2022-11-02 15:56 ` Chaitanya Kulkarni
  4 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2022-10-31 14:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme


Hi Christoph!

> +	if (!(req->cmd_flags & REQ_NOUNMAP) && ns->dlfeat & (1 << 3))
> +		cmnd->write_zeroes.control |= cpu_to_le16(NVME_WZ_DEAC);

Shouldn't we also verify that DLFEAT bits 2:0 are equal to 1 to ensure
that deallocated blocks actually return zeroes?

Looks good otherwise.

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-31 14:54 ` Martin K. Petersen
@ 2022-10-31 15:07   ` Keith Busch
  2022-10-31 15:48     ` Martin K. Petersen
  2022-11-01  9:59     ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Keith Busch @ 2022-10-31 15:07 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, sagi, linux-nvme

On Mon, Oct 31, 2022 at 10:54:40AM -0400, Martin K. Petersen wrote:
> 
> Hi Christoph!
> 
> > +	if (!(req->cmd_flags & REQ_NOUNMAP) && ns->dlfeat & (1 << 3))
> > +		cmnd->write_zeroes.control |= cpu_to_le16(NVME_WZ_DEAC);
> 
> Shouldn't we also verify that DLFEAT bits 2:0 are equal to 1 to ensure
> that deallocated blocks actually return zeroes?

The device must return all 0's for the range covered by a write zeroes
command no matter what. The controller shall not deallocate the logical
blocks if it can't guarantee reading deallocated blocks will return 0's,
so I think it's okay to skip checking the lower dlfeat bits.

Minor nit on the patch: I think seeing the dlfeat value as an enum would
help.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-31 15:07   ` Keith Busch
@ 2022-10-31 15:48     ` Martin K. Petersen
  2022-11-01  9:58       ` Christoph Hellwig
  2022-11-01  9:59     ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2022-10-31 15:48 UTC (permalink / raw)
  To: Keith Busch; +Cc: Martin K. Petersen, Christoph Hellwig, sagi, linux-nvme


Keith,

> The device must return all 0's for the range covered by a write zeroes
> command no matter what. The controller shall not deallocate the
> logical blocks if it can't guarantee reading deallocated blocks will
> return 0's, so I think it's okay to skip checking the lower dlfeat
> bits.

The spec is sufficiently vague for the does-not-return-zeroes case that
I'm concerned about "shall not deallocate" being interpreted by vendors
as "free to ignore" based on past ATA/SCSI experience.

But we can always quirk it. Devices that return all-Fs are rare.

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-31 15:48     ` Martin K. Petersen
@ 2022-11-01  9:58       ` Christoph Hellwig
  2022-11-01 15:49         ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2022-11-01  9:58 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Keith Busch, Christoph Hellwig, sagi, linux-nvme

On Mon, Oct 31, 2022 at 11:48:08AM -0400, Martin K. Petersen wrote:
> 
> Keith,
> 
> > The device must return all 0's for the range covered by a write zeroes
> > command no matter what. The controller shall not deallocate the
> > logical blocks if it can't guarantee reading deallocated blocks will
> > return 0's, so I think it's okay to skip checking the lower dlfeat
> > bits.
> 
> The spec is sufficiently vague for the does-not-return-zeroes case that
> I'm concerned about "shall not deallocate" being interpreted by vendors
> as "free to ignore" based on past ATA/SCSI experience.

I think SCSI actually has the same semantics as NVMe here.  WRITE SAME
must always ensure reads return the data written using it, just like
for Write Zeros in NVMe except that NVMe forces the 0s while SCSI
is more flexible.  UNMAP in SCSI and DSM Deallocate in NVMe can also
return other values depending on the conditions.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-31 15:07   ` Keith Busch
  2022-10-31 15:48     ` Martin K. Petersen
@ 2022-11-01  9:59     ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-11-01  9:59 UTC (permalink / raw)
  To: Keith Busch; +Cc: Martin K. Petersen, Christoph Hellwig, sagi, linux-nvme

On Mon, Oct 31, 2022 at 09:07:47AM -0600, Keith Busch wrote:
> Minor nit on the patch: I think seeing the dlfeat value as an enum would
> help.

The issue here is that the bit doesn't have a name in the NVMe spec,
so making up one would be a bit odd.  But Mike is set out to eventually
name all the bits, and if that happens I'd be glad to add a name.

That's also an argument against the check if the lower flfeat bits
that Martin suggested, which shouldn't be needed but is otherwise
harmless: due to the lack of naming in dlfeat it would be really hard
to read..


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-11-01  9:58       ` Christoph Hellwig
@ 2022-11-01 15:49         ` Martin K. Petersen
  2022-11-04  7:06           ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2022-11-01 15:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Martin K. Petersen, Keith Busch, sagi, linux-nvme


Christoph,

> I think SCSI actually has the same semantics as NVMe here.  WRITE SAME
> must always ensure reads return the data written using it,

Yep, but many device vendors messed that up due to the initial ambiguity
in the spec wrt. how to handle the UNMAP flag. Based on experience with
the SCSI UNMAP command being able to ignore blocks and ATA DSM TRIM
being non-deterministic, several vendors ended up not treating WRITE
SAME with the UNMAP flag set as a data integrity operation.

I was just concerned that this practice would carry over to NVMe. Bits
2:0 being set to 1 is an indication that somebody at least thought about
this scenario during device implementation. But again, devices that
return non-zero blocks after deallocate are rare and we can always quirk
them if there's a problem.

Anyway. Not a big issue as far as I'm concerned. Just tainted by the
twisted heuristics I had to deal with in SCSI...

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
                   ` (2 preceding siblings ...)
  2022-10-31 14:54 ` Martin K. Petersen
@ 2022-11-02 10:46 ` Niklas Cassel
  2022-11-02 15:56 ` Chaitanya Kulkarni
  4 siblings, 0 replies; 12+ messages in thread
From: Niklas Cassel @ 2022-11-02 10:46 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: kbusch, sagi, linux-nvme

On Sun, Oct 30, 2022 at 05:29:06PM +0100, Christoph Hellwig wrote:
> While the specification allows devices to either deallocate data
> or to actually write zeroes on any Write Zeroes command, many SSDs
> only do the sensible thing and deallocate data when the DEAC bit
> is specific.  Set it when it is suppored and the caller doesn't

Nit: supported

> explicitly opt out of deallocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
                   ` (3 preceding siblings ...)
  2022-11-02 10:46 ` Niklas Cassel
@ 2022-11-02 15:56 ` Chaitanya Kulkarni
  4 siblings, 0 replies; 12+ messages in thread
From: Chaitanya Kulkarni @ 2022-11-02 15:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-nvme, sagi, kbusch

On 10/30/22 09:29, Christoph Hellwig wrote:
> While the specification allows devices to either deallocate data
> or to actually write zeroes on any Write Zeroes command, many SSDs
> only do the sensible thing and deallocate data when the DEAC bit
> is specific.  Set it when it is suppored and the caller doesn't

's/suppored/supported'

> explicitly opt out of deallocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> 
> BTW, I'm tempted to slap a
> 
> Fixes: 6e02318eaea5 ("nvme: add support for the Write Zeroes command")
> 
> onto this.  While it doesn't fix the Write Zeroes support per se,
> it should help us with a lot of the devices timing out on Write Zeroes.
> 

Indeed it doesn't fix 6e02318eaea5 but I think with appropriate
description it should help.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] nvme: implement the DEAC bit for the Write Zeroes command
  2022-11-01 15:49         ` Martin K. Petersen
@ 2022-11-04  7:06           ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-11-04  7:06 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Christoph Hellwig, Keith Busch, sagi, linux-nvme

On Tue, Nov 01, 2022 at 11:49:06AM -0400, Martin K. Petersen wrote:
> Yep, but many device vendors messed that up due to the initial ambiguity
> in the spec wrt. how to handle the UNMAP flag. Based on experience with
> the SCSI UNMAP command being able to ignore blocks and ATA DSM TRIM
> being non-deterministic, several vendors ended up not treating WRITE
> SAME with the UNMAP flag set as a data integrity operation.

Uh.

> I was just concerned that this practice would carry over to NVMe. Bits
> 2:0 being set to 1 is an indication that somebody at least thought about
> this scenario during device implementation. But again, devices that
> return non-zero blocks after deallocate are rare and we can always quirk
> them if there's a problem.

I'm fine with adding the dlfeat check - it is trivial and if the bits
aren't set, sending the DEAC bit would be pointless anyway.  It'll just
make the people asking for names for the unnamed bits even more unhappy..


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-11-04  7:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-30 16:29 [PATCH] nvme: implement the DEAC bit for the Write Zeroes command Christoph Hellwig
2022-10-31  9:01 ` Guixin Liu
2022-10-31 14:43 ` Pankaj Raghav
2022-10-31 14:54 ` Martin K. Petersen
2022-10-31 15:07   ` Keith Busch
2022-10-31 15:48     ` Martin K. Petersen
2022-11-01  9:58       ` Christoph Hellwig
2022-11-01 15:49         ` Martin K. Petersen
2022-11-04  7:06           ` Christoph Hellwig
2022-11-01  9:59     ` Christoph Hellwig
2022-11-02 10:46 ` Niklas Cassel
2022-11-02 15:56 ` Chaitanya Kulkarni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.