From: sagi@grimberg.me (Sagi Grimberg)
Subject: [RFC PATCH] nvme-pci: Bounce buffer for interleaved metadata
Date: Sun, 25 Feb 2018 19:30:48 +0200	[thread overview]
Message-ID: <5662c6d9-0c87-6074-12b8-39db53ce3c7f@grimberg.me> (raw)
In-Reply-To: <20180224000547.7252-1-keith.busch@intel.com>

Hi Keith,

> NVMe namespace formats allow the possibility for metadata as extended
> LBAs. These require the memory interleave block and metadata in a single
> virtually contiguous buffer.
> 
> The Linux block layer, however, maintains metadata and data in separate
> buffers, which is unusable for NVMe drives using interleaved metadata
> formats.

That's not specific to NVMe; I vaguely recall we had this discussion
for passthru scsi devices (in scsi target context) 5 years ago...
It makes sense for FC (and a few RDMA devices) that already get
interleaved metadata from the wire to keep it as is instead of
scattering it, if the backend nvme device supports interleaved mode...

I would say that support for this is something that belongs in the
block layer. IIRC mkp also expressed interest in using preadv2/pwritev2
for user-space to use DIF with some accounting on the iovec, so maybe
we can add a flag for interleaved metadata.

> This patch will enable such formats by allocating a bounce buffer that
> interleaves the block and metadata, copying everything into the
> buffer for writes, or from it for reads.
> 
> I dislike this feature intensely. It is incredibly slow and has enough
> memory overhead to make it not very useful for reclaim, but it's possible
> people will leave me alone if the Linux nvme driver accommodated this
> format.

Not only will it be non-useful, it will probably be unusable. Once upon
a time iSER did bounce buffering with large contiguous atomic
allocations, and it just doesn't work... especially with nvme's large
number of deep queues, each of which can host commands of MDTS bytes.

If we end up keeping it private to nvme, the first comment I'd give you
is to avoid high-order allocations; you'll see lots of bug reports
otherwise...
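For reference, the copy the patch has to perform is just the extended-LBA
layout: data and metadata interleaved per logical block. A minimal
userspace sketch of the write-side interleave (hypothetical helper name
and sizes, not the actual patch code):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interleave a data buffer and a separate metadata buffer into
 * extended-LBA format, one logical block at a time.
 *
 *   bounce: lba_count * (lba_size + ms) bytes
 *   data:   lba_count * lba_size bytes
 *   meta:   lba_count * ms bytes
 */
static void interleave_for_write(uint8_t *bounce, const uint8_t *data,
				 const uint8_t *meta, size_t lba_count,
				 size_t lba_size, size_t ms)
{
	size_t i;

	for (i = 0; i < lba_count; i++) {
		/* data block first, then its metadata, back to back */
		memcpy(bounce + i * (lba_size + ms),
		       data + i * lba_size, lba_size);
		memcpy(bounce + i * (lba_size + ms) + lba_size,
		       meta + i * ms, ms);
	}
}
```

The read side is the same loop with the memcpy directions reversed. Note
the per-block copy only needs the bounce buffer to be virtually
contiguous from the device's point of view; it doesn't by itself require
a single high-order allocation.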


Thread overview: 9+ messages
2018-02-24  0:05 [RFC PATCH] nvme-pci: Bounce buffer for interleaved metadata Keith Busch
2018-02-25 17:30 ` Sagi Grimberg [this message]
2018-02-26 16:49   ` Keith Busch
2018-02-28  3:46   ` Martin K. Petersen
2018-03-01  9:22     ` Sagi Grimberg
2018-02-28  3:42 ` Martin K. Petersen
2018-02-28 16:35   ` Keith Busch
2018-02-28 16:37     ` Christoph Hellwig
2018-02-28 19:54       ` Keith Busch