qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Klaus Jensen <its@irrelevant.dk>
To: Keith Busch <kbusch@kernel.org>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu-block@nongnu.org, Klaus Jensen <k.jensen@samsung.com>,
	qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection support
Date: Fri, 18 Dec 2020 10:39:01 +0100	[thread overview]
Message-ID: <X9x4tUI+QLDNTBtS@apples.localdomain> (raw)
In-Reply-To: <20201217211440.GA502315@dhcp-10-100-145-180.wdc.com>

[-- Attachment #1: Type: text/plain, Size: 3930 bytes --]

On Dec 17 13:14, Keith Busch wrote:
> On Thu, Dec 17, 2020 at 10:02:19PM +0100, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > This series adds support for extended LBAs and end-to-end data
> > protection. Marked RFC, since there are a bunch of issues that could use
> > some discussion.
> > 
> > Storing metadata bytes contiguously with the logical block data and
> > creating a physically extended logical block basically breaks the DULBE
> > and deallocation support I just added. Formatting a namespace with
> > protection information requires the app- and reftags of deallocated or
> > unwritten blocks to be 0xffff and 0xffffffff respectively; this could be
> > used to reintroduce DULBE support in that case, albeit at a somewhat
> > higher cost than the block status flag-based approach.
> > 
> > There is basically three ways of storing metadata (and maybe a forth,
> > but that is probably quite the endeavour):
> > 
> >   1. Storing metadata as extended blocks directly on the blockdev. That
> >      is the approach used in this RFC.
> > 
> >   2. Use a separate blockdev. Incidentially, this is also the easiest
> >      and most straightforward solution to support MPTR-based "separate
> >      metadata". This also allows DULBE and block deallocation to be
> >      supported using the existing approach.
> > 
> >   3. A hybrid of 1 and 2 where the metadata is stored contiguously at
> >     the end of the nvme-ns blockdev.
> > 
> > Option 1 obviously works well with DIF-based protection information and
> > extended LBAs since it maps one to one. Option 2 works flawlessly with
> > MPTR-based metadata, but extended LBAs can be "emulated" at the cost of
> > a bunch of scatter/gather operations.
> 
> Are there any actual users of extended metadata that we care about? I'm
> aware of only a few niche places that can even access an extended
> metadata format. There's not kernel support in any major OS that I know
> of.
> 

Yes, there are definitely actual users in enterprise storage. But the
main use case here is testing (using extended LBAs with SPDK for
instance).

> Option 2 sounds fine.
> 
> If option 3 means that you're still using MPTR, but just sequester space
> at the end of the backing block device for meta-data purposes, then that
> is fine too. You can even resize it dynamically if you want to support
> different metadata sizes.

Heh, I tend to think that my English vocabulary is pretty decent but I
had to look up 'sequester'. I just learned a new word today \o/

Yes. I actually also like option 3. Technically option 2 does not break
image interoperability between devices (ide, virtio), but you would
leave out a bunch of metadata that your application might depend on, so
I don't see any way to not break interoperability really.

And I would then be just fine with "emulating" extended LBAs at the cost
of more I/O. Because I would like the device to support that mode of
operation as well. We have not implemented this, but my gut feeling says
that it can be done.

> 
> > The 4th option is extending an existing image format (QCOW2) or create
> > something on top of RAW to supports metadata bytes per block. But both
> > approaches require full API support through the block layer. And
> > probably a lot of other stuff that I did not think about.
> 
> It definitely sounds appealing to push the feature to a lower level if
> you're really willing to see that through.
> 

Yes, its super appealing and I would like to have some input from the
block layer guys on this. That is, if anyone has ever explored it?

> In any case, calculating T10 CRCs is *really* slow unless you have
> special hardware and software support for it.
> 

Yeah. I know this is super slow. For for emulation and testing purposes
I think it is a nice feature for the device to optionally offer.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2020-12-18  9:44 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 21:02 [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection support Klaus Jensen
2020-12-17 21:02 ` [PATCH RFC 1/3] nvme: add support for extended LBAs Klaus Jensen
2020-12-17 21:02 ` [PATCH RFC 2/3] hw/block/nvme: refactor nvme_dma Klaus Jensen
2020-12-17 21:02 ` [PATCH RFC 3/3] hw/block/nvme: end-to-end data protection Klaus Jensen
2020-12-18 18:08   ` Keith Busch
2020-12-18 18:24     ` Klaus Jensen
2020-12-17 21:14 ` [PATCH RFC 0/3] hw/block/nvme: dif-based end-to-end data protection support Keith Busch
2020-12-18  9:39   ` Klaus Jensen [this message]
2020-12-18 17:50     ` Keith Busch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X9x4tUI+QLDNTBtS@apples.localdomain \
    --to=its@irrelevant.dk \
    --cc=fam@euphon.net \
    --cc=k.jensen@samsung.com \
    --cc=kbusch@kernel.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).