linux-nvme.lists.infradead.org archive mirror
From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Max Gurtovoy <maxg@mellanox.com>
Cc: axboe@kernel.dk, keith.busch@intel.com, sagi@grimberg.me,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	israelr@mellanox.com, linux-nvme@lists.infradead.org,
	linux-block@vger.kernel.org, shlomin@mellanox.com, hch@lst.de
Subject: Re: [PATCH v4 1/3] block: centralize PI remapping logic to the block layer
Date: Mon, 09 Sep 2019 22:29:34 -0400
Message-ID: <yq1d0g8hoj5.fsf@oracle.com>
In-Reply-To: <d6cfe6e5-508a-f01c-267d-c8009fafc571@mellanox.com> (Max Gurtovoy's message of "Mon, 9 Sep 2019 16:55:57 +0300")


Max,

> maybe we can add profiles to type0 and type2 in the future and have
> more readable code.

It's a deliberate feature that we treat DIX Type 0, 1, and 2 the
same. It's very common to mix and match legacy drives, T10 PI Type 1,
and T10 PI Type 2 devices in a system. In order for MD/DM stacking,
multipathing, etc. to work, it is important that all devices share the
same protection format, interpretation of the tags, etc.

Type 2, where the ref tag can be different from the LBA, was designed
exclusively for use inside disk arrays where the array firmware is the
sole entity accessing blocks on media and thus always knows what the
expected ref tag should be for a given LBA (typically the LUN LBA as
seen by the host interface and not the target LBA on the back-end
drive).

For Linux, however, where we need to support dd'ing from the device node
without any knowledge of the PI an application or filesystem may have
written, it's imperative that the reference tag is something
predictable. Therefore it is deliberate that we always use the LBA (or
a derivative thereof for the smaller intervals) for the reference tag,
even though T10 PI Type 2 in theory allows the tag to be an arbitrary
number. Linux is a general-purpose OS and not array controller
firmware, so we can't really leverage that capability.
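
To illustrate what "the LBA or a derivative thereof" can look like, here
is a minimal userspace sketch. It assumes the block layer's 512-byte
sector unit and a protection interval given as a power of two; the
helper name pi_ref_tag() and its parameters are hypothetical and are not
the kernel's actual interface.

```c
#include <stdint.h>

/* Linux block-layer sectors are 512 bytes (2^9). */
#define SECTOR_SHIFT 9

/*
 * Hypothetical helper: derive the 32-bit reference tag from a
 * 512-byte sector number. interval_exp is log2 of the protection
 * interval in bytes (9 for 512-byte, 12 for 4096-byte intervals).
 */
static uint32_t pi_ref_tag(uint64_t sector, unsigned int interval_exp)
{
	/* Convert sectors to protection intervals, keep the low 32 bits. */
	return (uint32_t)((sector >> (interval_exp - SECTOR_SHIFT)) & 0xffffffffu);
}
```

With this sketch, sector 8 on a device with 4096-byte intervals maps to
ref tag 1, and anyone reading the device node can recompute the expected
tag from the offset alone.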

Also, take MD, for instance. The same I/O could be going to a mirror of
Type 1 and Type 2 devices. We obviously can't have two different types
of PI hanging off a bio. Nor do we have the capability to handle
arbitrary MD/DM stacking with PI format properties potentially changing
many times within the block range constituting a single I/O.

That's why we have the integrity profile which describes a common block
layer PI format that's somewhat orthogonal to how the underlying device
is formatted.
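
As a rough illustration of what such a common format implies, the sketch
below models one 8-byte T10 PI tuple and a simplified Type 1 style check:
the guard tag must match the CRC16 (polynomial 0x8bb7) of the data
interval and the ref tag must match the value expected for that
interval. The struct and function names are hypothetical, the fields are
kept in host byte order for brevity (on the wire they are big-endian),
and the app tag escape handling is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* One protection tuple per interval: guard, application, reference tag. */
struct pi_tuple {
	uint16_t guard_tag;	/* CRC of the data interval */
	uint16_t app_tag;	/* opaque to this sketch */
	uint32_t ref_tag;	/* low 32 bits of the interval number */
};

/* Bitwise CRC16 with the T10 DIF polynomial 0x8bb7, initial value 0. */
static uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8bb7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Returns 0 on success, -1 on guard mismatch, -2 on ref tag mismatch. */
static int pi_type1_verify(const struct pi_tuple *t, const uint8_t *data,
			   size_t len, uint32_t expected_ref)
{
	if (t->guard_tag != crc16_t10dif(data, len))
		return -1;
	if (t->ref_tag != expected_ref)
		return -2;
	return 0;
}
```

Because the check depends only on the data and the expected ref tag, the
same tuple can be verified regardless of which leg of a mirror the I/O
ends up on.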

There are a couple of warts in that department. One is the IP checksum,
which is now mostly a legacy thing and not implemented/relevant for
NVMe. The other is Type 3 devices that need special care and
feeding. But Type 3 does not appear to be actively used by anyone
anymore. We recently discovered that it's completely broken in the NVMe
spec and nobody ever noticed. And I don't think it was ever used
as-written in SCSI (Type 3 was an attempt to standardize a particular
vendor's existing, proprietary format).

Anyway. So my take on all this is that the T10-DIF-TYPE1-CRC profile is
"it" and everything else is legacy.

> I think I'll prepare dummy/empty callbacks for type3 and for nop
> profiles instead of setting it to NULL.
>
> agreed ?

Sure. Whatever works.
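
For reference, the empty-callback approach could look roughly like the
sketch below. All names here (struct pi_profile, pi_nop_prepare(),
pi_submit() and so on) are hypothetical and only illustrate the idea
that callers can invoke the callbacks unconditionally instead of
checking for NULL.

```c
#include <stddef.h>

/* Hypothetical profile with per-request hooks. */
struct pi_profile {
	const char *name;
	void (*prepare)(void *req);
	void (*complete)(void *req, unsigned int nr_bytes);
};

/* Empty callbacks: nothing to remap for this profile. */
static void pi_nop_prepare(void *req)
{
	(void)req;
}

static void pi_nop_complete(void *req, unsigned int nr_bytes)
{
	(void)req;
	(void)nr_bytes;
}

static const struct pi_profile pi_nop_profile = {
	.name = "nop",
	.prepare = pi_nop_prepare,
	.complete = pi_nop_complete,
};

/* The caller needs no NULL check because every profile fills in both hooks. */
static void pi_submit(const struct pi_profile *p, void *req)
{
	p->prepare(req);
}
```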

-- 
Martin K. Petersen	Oracle Linux Engineering
