Linux-NVME Archive on lore.kernel.org
From: James Smart <james.smart@broadcom.com>
To: Max Gurtovoy <maxg@mellanox.com>,
	linux-nvme@lists.infradead.org, kbusch@kernel.org, hch@lst.de,
	sagi@grimberg.me, martin.petersen@oracle.com
Cc: axboe@kernel.dk, vladimirk@mellanox.com, idanb@mellanox.com,
	israelr@mellanox.com, shlomin@mellanox.com, oren@mellanox.com
Subject: Re: [PATCH 02/15] nvme: Enforce extended LBA format for fabrics metadata
Date: Tue, 21 Jan 2020 09:40:29 -0800
Message-ID: <82e2093a-5fcd-0731-7ee3-22405cfb31f6@broadcom.com> (raw)
In-Reply-To: <d7b94f4e-4a75-941f-3cf6-22001c1850a3@mellanox.com>



On 1/19/2020 3:20 AM, Max Gurtovoy wrote:
>
> On 1/17/2020 1:53 AM, James Smart wrote:
>>
>>
>> On 1/6/2020 5:37 AM, Max Gurtovoy wrote:
>>> An extended LBA is a larger LBA that is created when metadata associated
>>> with the LBA is transferred contiguously with the LBA data (AKA
>>> interleaved). The metadata may be either transferred as part of the LBA
>>> (creating an extended LBA) or it may be transferred as a separate
>>> contiguous buffer of data. According to the NVMeoF spec, a fabrics ctrl
>>> supports only an Extended LBA format. Fail revalidation in case we have a
>>> spec violation. Also initialize the integrity profile for the block device
>>> for fabrics ctrl.
>>>
>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>> Signed-off-by: Israel Rukshin <israelr@mellanox.com>
>>> ---
>>>   drivers/nvme/host/core.c | 25 +++++++++++++++++++++----
>>>   1 file changed, 21 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>> index d98eb48..089cdc3c 100644
>>> --- a/drivers/nvme/host/core.c
>>> +++ b/drivers/nvme/host/core.c
>>> @@ -1818,7 +1818,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>>>       blk_mq_unfreeze_queue(disk->queue);
>>>   }
>>>
>>> -static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>> +static int __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>>   {
>>>       struct nvme_ns *ns = disk->private_data;
>>>
>>> @@ -1846,11 +1846,21 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id)
>>>               ns->features |= NVME_NS_EXT_LBAS;
>>>             /*
>>> +         * For Fabrics, only metadata as part of extended data LBA is
>>> +         * supported. Fail in case of a spec violation.
>>> +         */
>>> +        if (ns->ctrl->ops->flags & NVME_F_FABRICS) {
>>> +            if (WARN_ON_ONCE(!(ns->features & NVME_NS_EXT_LBAS)))
>>> +                return -EINVAL;
>>> +        }
>>> +
>>> +        /*
>>>            * For PCI, Extended logical block will be generated by the
>>>            * controller.
>>>            */
>>>           if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>>> -            if (!(ns->features & NVME_NS_EXT_LBAS))
>>> +            if (ns->ctrl->ops->flags & NVME_F_FABRICS ||
>>> +                !(ns->features & NVME_NS_EXT_LBAS))
>>>                   ns->features |= NVME_NS_DIX_SUPPORTED;
>>
>> This last change seems odd - why is DIX set if NVME_F_FABRICS ?
>>
>> Per patch description above, Fabrics spec requires metadata as an 
>> extended LBA, thus it doesn't support DIX.
>
> we refer to DIX as memory domain metadata.

That's fine. But somewhere we need to be clear that "DIX" as it's
referenced here is relative to the OS-to-host-port interface and is not
"DIX" as per the NVMe standard's definition. The flag you are setting on
the ns features is *not* a ns attribute as read from the controller. A
comment should be added somewhere.


>
>>
>> Which touches on a lot of odd things with the nvme spec, as it's
>> certainly possible, within the os host implementation, to have
>> the host transmitting engine convert an OS separate DIF buf to an
>> extended lba transmission on the wire and as presented to the
>> controller.  Transports can certainly help make this happen - and add
>> egress checking as the data leaves the host.  Which means - I'm not
>> sure this hard DIX definition being implemented this way is the way
>> to go.
>
> The RDMA transport converts the separate SGLs (non-extended mode) that
> are sent by the block layer to extended mode.
>
> The idea here is to define under which conditions we'll ask the block
> layer to set up its metadata infrastructure.
>
> for PCI - only in case of non-extended mode (in extended mode the
> block layer will not set integrity, and the nvme driver will set the
> PRACT/PRCHK if needed) since there is no conversion to extended mode
> in the nvme driver.
>
> for fabrics - always ask for blk integrity setting since the transport
> (RDMA only for now) is responsible for transferring it to extended
> mode on the wire.

Yep, agreed. But there should be a comment on what is happening within
the OS-to-host-port interface vs. what is happening per the NVMe
standard. The two seem to get muddled.
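
To make the distinction concrete, here is a rough standalone sketch of the
decision as discussed above, with the kind of comment being asked for. It is
not the patch itself: the helper name nvme_sketch_set_dix() and the flag bit
values are made up for illustration, and only the flag names are taken from
the quoted diff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag names follow the quoted diff; the bit values are illustrative only. */
#define NVME_F_FABRICS            (1u << 0)
#define NVME_F_METADATA_SUPPORTED (1u << 1)
#define NVME_NS_EXT_LBAS          (1u << 0)
#define NVME_NS_DIX_SUPPORTED     (1u << 1)

/*
 * Hypothetical helper mirroring the logic in the quoted hunk.
 *
 * NVME_NS_DIX_SUPPORTED describes the OS-to-host-port interface: it means
 * the block layer hands the driver metadata in a separate buffer. It is
 * NOT a namespace attribute read from the controller, and it says nothing
 * about the format used on the wire.
 *
 * Returns 0 on success, or -EINVAL when a fabrics namespace does not
 * report the extended LBA format.
 */
static int nvme_sketch_set_dix(unsigned int ctrl_flags, unsigned int *features)
{
	assert(features != NULL);

	/* Fabrics: only the extended LBA format is allowed on the wire. */
	if ((ctrl_flags & NVME_F_FABRICS) && !(*features & NVME_NS_EXT_LBAS))
		return -EINVAL;

	if (!(ctrl_flags & NVME_F_METADATA_SUPPORTED))
		return 0;

	/*
	 * PCI: ask for block-layer integrity only in non-extended mode.
	 * Fabrics: always ask, since the transport converts the separate
	 * metadata buffer to extended mode on the wire.
	 */
	if ((ctrl_flags & NVME_F_FABRICS) || !(*features & NVME_NS_EXT_LBAS))
		*features |= NVME_NS_DIX_SUPPORTED;

	return 0;
}
```

With these stand-in values, a PCI namespace in extended mode leaves the DIX
bit clear, while a fabrics namespace in extended mode gets it set.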

-- james



