From: Max Gurtovoy <maxg@mellanox.com>
To: James Smart <james.smart@broadcom.com>,
	linux-nvme@lists.infradead.org, kbusch@kernel.org, hch@lst.de,
	sagi@grimberg.me, martin.petersen@oracle.com,
	jsmart2021@gmail.com, axboe@kernel.dk
Cc: vladimirk@mellanox.com, idanb@mellanox.com, israelr@mellanox.com,
	shlomin@mellanox.com, oren@mellanox.com, nitzanc@mellanox.com
Subject: Re: [PATCH 03/16] nvme: introduce NVME_NS_METADATA_SUPPORTED flag
Date: Thu, 7 May 2020 12:02:14 +0300
Message-ID: <1b03c314-2e57-5135-6875-a2d37ecf5e20@mellanox.com>
In-Reply-To: <62322680-afeb-142e-c10b-b4f2d4c419a3@broadcom.com>


On 5/6/2020 11:44 PM, James Smart wrote:
>
>
> On 5/6/2020 1:39 AM, Max Gurtovoy wrote:
>>
>> On 5/6/2020 2:33 AM, James Smart wrote:
>>>
>>>
>>> On 5/4/2020 8:57 AM, Max Gurtovoy wrote:
>>>> This is a preparation for adding support for metadata in fabric
>>>> controllers. New flag will imply that NVMe namespace supports getting
>>>> metadata that was originally generated by host's block layer.
>>>>
>>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>> Reviewed-by: Israel Rukshin <israelr@mellanox.com>
>>>> Reviewed-by: Christoph Hellwig <hch@lst.de>
>>>> ---
>>>>   drivers/nvme/host/core.c | 41 
>>>> ++++++++++++++++++++++++++++++++++-------
>>>>   drivers/nvme/host/nvme.h |  1 +
>>>>   2 files changed, 35 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
>>>> index 1d226cc..4b7faf9 100644
>>>> --- a/drivers/nvme/host/core.c
>>>> +++ b/drivers/nvme/host/core.c
>>>> @@ -1882,13 +1882,27 @@ static void nvme_update_disk_info(struct 
>>>> gendisk *disk,
>>>>       blk_queue_io_min(disk->queue, phys_bs);
>>>>       blk_queue_io_opt(disk->queue, io_opt);
>>>>   -    if (ns->ms && !(ns->features & NVME_NS_EXT_LBAS) &&
>>>> -        (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>>>> -        nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>>> -    if ((ns->ms && !nvme_ns_has_pi(ns) && 
>>>> !blk_get_integrity(disk)) ||
>>>> -        ns->lba_shift > PAGE_SHIFT)
>>>> +    /*
>>>> +     * The block layer can't support LBA sizes larger than the 
>>>> page size
>>>> +     * yet, so catch this early and don't allow block I/O.
>>>> +     */
>>>> +    if (ns->lba_shift > PAGE_SHIFT)
>>>>           capacity = 0;
>>>>   +    /*
>>>> +     * Register a metadata profile for PI, or the plain 
>>>> non-integrity NVMe
>>>> +     * metadata masquerading as Type 0 if supported, otherwise 
>>>> reject block
>>>> +     * I/O to namespaces with metadata except when the namespace 
>>>> supports
>>>> +     * PI, as it can strip/insert in that case.
>>>> +     */
>>>> +    if (ns->ms) {
>>>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>>>> +            (ns->features & NVME_NS_METADATA_SUPPORTED))
>>>> +            nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>>> +        else if (!nvme_ns_has_pi(ns))
>>>> +            capacity = 0;
>>>> +    }
>>>> +
>>> Look below for how I interpret the meaning of the 
>>> NVME_NS_METADATA_SUPPORTED flag. It's a rollup of several 
>>> conditions. Not all of those conditions are considered in the else 
>>> clause.
>>
>> NVME_NS_METADATA_SUPPORTED has 1 meaning:
>>
>> support getting metadata from the block layer.
>
> Well I disagree with you as several other conditions had to be true in 
> order for it to be set.
>
>>
>> Linux block is supplying only 2 separate pointers for data/metadata 
>> (aka Non-Extended mode in NVMe).
>>
>> So drivers that can't convert between the 2 will set this flag only 
>> in case their controller support separate buffer mode.
>
> Maybe I'm splitting hairs... but..  your 1 meaning is "blk sends 
> separate & transport supports separate buffer" and you missed the "& 
> controller requires separate".
>
>> It doesn't mean the controller can't generate the metadata by himself...
>
> I didn't say that it did.  And I don't see how this case would be 
> covered by this flag unless there's lots of assumptions. Minimally the 
> "& controller requires separate" would likely be false - the 
> controller would require extended LBA.  And we would need something 
> else to indicate "blk doesn't have to send me separate meta, yet I can 
> still do pi".  I know we're going outside the scope of this patch, 
> probably into patch 6.
>
>
>>
>> Maybe we move the IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) to the place 
>> where we suppose to set the NVME_NS_METADATA_SUPPORTED.
>>
>> will it make life easier:
>>
>> if (ns->features & NVME_NS_METADATA_SUPPORTED)
>>
>>     nvme_init_integrity(disk, ns->ms, ns->pi_type);
>>
>> else if (!nvme_ns_has_pi(ns))
>>
>>     capacity = 0;
>
> I think you still missed it.
>
> Let me reword this snippet with what the flag really means:
>
> if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && (ns->ctrl->ops->flags & 
> NVME_F_METADATA_SUPPORTED) && !(ns->features & NVME_NS_EXT_LBAS))
>   nvme_init_integrity(...,  ns->pi_type)
> else if (!nvme_ns_has_pi(ns))
>    capacity=0.
>
> This leaves the cases where capacity is not zero'd, thus there may be 
> io attempted to the ns:
> a kernel w/o CONFIG_BLK_DEV_INTEGRITY enabled, and the ns was 
> formatted for pi.

In that case the controller will generate the PI itself.


> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, but 
> the transport has no idea about a separate buffer.

If the transport supports metadata, it must know about both the 
separate-buffer and the non-separate (extended) modes.


> a kernel w/ CONFIG_BLK_DEV_INTEGRITY, the ns was formatted for pi, the 
> transport knows how to 2 do buffers, but the controller requires 
> extended LBAS.

At this point in the series (patch 3/16) only the PCI transport supports 
the metadata feature, and it can't convert from 2 buffers to extended mode.

So if you get a write/read command from the block layer (without 
metadata, of course), the core layer will see that the ns "has pi" and 
will set the PRACT bit so that the SSD controller generates/strips 
the metadata.
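
To make that path concrete, here is a minimal standalone sketch (userspace 
C, not the actual core.c code). struct ns_model and setup_rw_control() are 
made-up names for illustration, and has_pi only stands in for 
nvme_ns_has_pi(); the PRACT bit value (1 << 13) is the one from 
include/linux/nvme.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* PRACT bit in the read/write control field, as in include/linux/nvme.h. */
#define NVME_RW_PRINFO_PRACT (1u << 13)

/* Hypothetical, simplified view of a namespace (illustration only). */
struct ns_model {
	uint16_t ms;     /* metadata bytes per LBA, 0 if none */
	bool     has_pi; /* stands in for nvme_ns_has_pi(ns) */
};

/*
 * Build the control bits for a read/write command.
 * Returns 0 on success, -1 if the request cannot be served.
 */
static int setup_rw_control(const struct ns_model *ns, bool rq_has_integrity,
			    uint16_t *control)
{
	assert(ns != NULL && control != NULL);
	*control = 0;

	/* Namespace has metadata but the block layer sent no integrity buffer. */
	if (ns->ms && !rq_has_integrity) {
		/* Plain (non-PI) metadata cannot be made up by the controller. */
		if (!ns->has_pi)
			return -1;
		/* Ask the controller to insert the PI on write, strip it on read. */
		*control |= NVME_RW_PRINFO_PRACT;
	}
	return 0;
}
```

With a PI-formatted namespace and no integrity payload on the request, the 
sketch sets only PRACT; with a payload attached, it leaves the bit clear.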


>
> The 1st and last lines can be cases with pcie drives (you would hope 
> you couldn't format for pi w/o having support for it, hope no one 
> plugs a pre-formatted drive in) .
> The 1st and middle lines can be cases with fabric-attached subsystems.

So for PCI we're OK?

For fabrics, the conditions are different and are not supported at this stage.
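
For reference, the three cases listed above can be walked through with a 
small standalone model of the patch 3/16 logic. This is only a sketch: 
struct ns_case, enum disk_outcome and classify() are hypothetical names, the 
boolean fields stand in for the real kernel checks named in the comments, 
and the separate lba_shift > PAGE_SHIFT check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Possible results of the metadata handling in nvme_update_disk_info(). */
enum disk_outcome {
	DISK_NO_METADATA,          /* ns->ms == 0, nothing to decide */
	DISK_INTEGRITY_REGISTERED, /* nvme_init_integrity() would be called */
	DISK_CONTROLLER_PI,        /* no profile; controller inserts/strips PI */
	DISK_CAPACITY_ZERO,        /* block I/O rejected */
};

/* Hypothetical inputs; each field mirrors one check from the patch. */
struct ns_case {
	bool blk_integrity_enabled; /* IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) */
	bool transport_metadata;    /* ops->flags & NVME_F_METADATA_SUPPORTED */
	bool ext_lbas;              /* id->flbas & NVME_NS_FLBAS_META_EXT */
	bool has_pi;                /* nvme_ns_has_pi(ns) */
	uint16_t ms;                /* metadata bytes per LBA */
};

/* Models when the patch sets NVME_NS_METADATA_SUPPORTED at revalidate time. */
static bool ns_metadata_supported(const struct ns_case *c)
{
	return c->ms && c->transport_metadata && !c->ext_lbas;
}

/* Models the if/else-if added to nvme_update_disk_info(). */
static enum disk_outcome classify(const struct ns_case *c)
{
	assert(c != NULL);
	if (!c->ms)
		return DISK_NO_METADATA;
	if (c->blk_integrity_enabled && ns_metadata_supported(c))
		return DISK_INTEGRITY_REGISTERED;
	if (!c->has_pi)
		return DISK_CAPACITY_ZERO;
	return DISK_CONTROLLER_PI;
}
```

Under this model all three cases (no CONFIG_BLK_DEV_INTEGRITY, a transport 
without separate-buffer support, or a namespace that requires extended LBAs) 
end up in DISK_CONTROLLER_PI when the namespace is formatted for PI, and in 
DISK_CAPACITY_ZERO when the metadata is not PI.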



>
> Rather than resolving it in this patch, let's defer the conversation 
> to patch 6, where the snippet is modified for fabrics.  I commented 
> as, if the 2 patches were ever separated, this patch would leave holes.
>
>
>>
>>
>>
>>>
>>> The "else if" clause looks too light to address all the cases where 
>>> capacity should be set to 0. Probably shouldn't be an else.
>>> Examples:
>>> - ! IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) & meta is pi (aka 
>>> nvme_hs_has_pi)
>>> - meta is not pi (thus pi_type=0 in call to nvme_init_integrity()), 
>>> which results in !blk_get_integrity(disk) which is not checked.
>>
>> This will set dummy nop_profile and blk_get_integrity(disk) will not 
>> return NULL.
>>
>> If we init the integrity in nvme_init_integrity it will not return 
>> NULL also for type 0.
>
> Good - I was under the impression it would have.
>
>
>>
>>
>>> - meta is pi and:
>>>   - !ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED
>>>   - !ns->features & NVME_NS_EXT_LBAS
>>>
>>> may be a couple others.
>>>
>>>> set_capacity_revalidate_and_notify(disk, capacity, false);
>>>>         nvme_config_discard(disk, ns);
>>>> @@ -1923,14 +1937,27 @@ static void __nvme_revalidate_disk(struct 
>>>> gendisk *disk, struct nvme_id_ns *id)
>>>>         ns->features = 0;
>>>>       ns->ms = le16_to_cpu(id->lbaf[id->flbas & 
>>>> NVME_NS_FLBAS_LBA_MASK].ms);
>>>> -    if (ns->ms && (id->flbas & NVME_NS_FLBAS_META_EXT))
>>>> -        ns->features |= NVME_NS_EXT_LBAS;
>>>>       /* the PI implementation requires metadata equal t10 pi tuple 
>>>> size */
>>>>       if (ns->ms == sizeof(struct t10_pi_tuple))
>>>>           ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>>>>       else
>>>>           ns->pi_type = 0;
>>>>   +    if (ns->ms) {
>>>> +        if (id->flbas & NVME_NS_FLBAS_META_EXT)
>>>> +            ns->features |= NVME_NS_EXT_LBAS;
>>>> +
>>>> +        /*
>>>> +         * For PCI, Extended logical block will be generated by the
>>>> +         * controller. Non-extended format can be generated by the
>>>> +         * block layer.
>>>> +         */
>>>> +        if (ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>>>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>>>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>>
>> and here we can do:
>>
>> +        if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) && 
>> ns->ctrl->ops->flags & NVME_F_METADATA_SUPPORTED) {
>> +            if (!(ns->features & NVME_NS_EXT_LBAS))
>> +                ns->features |= NVME_NS_METADATA_SUPPORTED;
>
> it probably is better to apply it here, but it didn't change the 
> discussion above.
>
> -- james
>
>

