Linux-PCI Archive on
From: Bjorn Helgaas <>
To: Daniel Drake <>
Cc: Jens Axboe <>, Keith Busch <>,
	Christoph Hellwig <>,
	Linux PCI <>,
	Linux Upstreaming Team <>,
	Sagi Grimberg <>,
	linux-nvme <>,
	Dan Williams <>,
	Alex Williamson <>
Subject: Re: [PATCH] PCI: Add Intel remapped NVMe device support
Date: Tue, 11 Jun 2019 14:52:55 -0500
Message-ID: <> (raw)
In-Reply-To: <>

On Tue, Jun 11, 2019 at 11:25:55AM +0800, Daniel Drake wrote:
> On Tue, Jun 11, 2019 at 5:16 AM Bjorn Helgaas <> wrote:
> > Ugh.  Is there a spec that details what's actually going on here?
> Unfortunately there isn't a great spec to go on.
> has some details on the VS_CAP register (section 14.2.10).
> Beyond that, Intel contributed patches to enable support for these
> devices previously:
> and stated that "The patch contents are [the spec]".

It also said (three years ago) that there was some hope of opening the
specs.  But I guess that hasn't happened.

I'd much prefer links, but unfortunately lore doesn't
seem to have linux-ide archives.  If is the best we can do,
maybe at least include Message-IDs so there's some useful info in the
event disappears.

> > I think this creates a fake PCI host bridge, but not an actual PCIe
> > Root Port, right?  I.e., "lspci" doesn't show a new Root Port device,
> > does it?
> > ...
> I appreciate your input here as I don't frequently go down to this
> level of detail with PCI. I'm trying to follow the previous
> suggestions from Christoph Hellwig, and further clarification on the
> most appropriate way to do this would be appreciated:
> "implementing a bridge driver like VMD"
> "The right way to do this would be to expose a fake PCIe root port
> that both the AHCI and NVMe driver bind to."
> I'm not completely clear regarding the difference between a PCI host
> bridge and a PCIe root port, but indeed, after my patch, when running
> lspci, you see:
> 1. The original RAID controller, now claimed by this new intel-nvme-remap driver
> 0000:00:17.0 RAID bus controller: Intel Corporation 82801 Mobile SATA
> Controller [RAID mode] (rev 30)
>     Memory at b4390000 (32-bit, non-prefetchable) [size=32K]

> 2. The RAID controller presented by intel-nvme-remap on a new bus,
> with the cfg space tweaked in a way that it gets probed & accepted by
> the ahci driver:
> 10000:00:00.0 SATA controller: Intel Corporation 82801 Mobile SATA
> Controller [RAID mode] (rev 30) (prog-if 01 [AHCI 1.0])
>     Memory at b4390000 (32-bit, non-prefetchable) [size=32K]

Exposing the same device in two different places (0000:00:17.0 and
10000:00:00.0) is definitely an architectural issue.  Logically we're
saying that accesses to b4390000 are claimed by two different devices.
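The aliasing is visible directly in the lspci output quoted above:
both entries report the same 32 KiB window at b4390000.  A trivial
check, using the addresses exactly as quoted:

```python
def ranges_overlap(start_a, size_a, start_b, size_b):
    """Return True if two MMIO windows intersect."""
    return start_a < start_b + size_b and start_b < start_a + size_a

ahci_orig = (0xb4390000, 32 * 1024)   # 0000:00:17.0, as enumerated by firmware
ahci_remap = (0xb4390000, 32 * 1024)  # 10000:00:00.0, re-exposed by intel-nvme-remap

print(ranges_overlap(*ahci_orig, *ahci_remap))  # → True
```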

> 3. The (previously inaccessible) NVMe device as presented on the new
> bus by intel-nvme-remap, probed by the nvme driver
> 10000:00:01.0 Non-Volatile memory controller: Intel Corporation Device
> 0000 (prog-if 02 [NVM Express])
>     Memory at b430c000 (64-bit, non-prefetchable) [size=16K]

From a hardware point of view, I think it *was* previously accessible.
Maybe not in a convenient, driver-bindable way, but I don't think your
patch flips any PCI_COMMAND or similar register enable bits.
Everything should have been accessible before if you knew where to
look.
> I think Christoph's suggestion does ultimately require us to do some
> PCI pretending in some form, but let me know if there are more
> accepable ways to do this. If you'd like to see this appear more like
> a PCIe root port then I guess I can use pci-bridge-emul.c to do this,
> although having a fake root bridge appear in lspci output feels like
> I'd be doing even more pretending.

Maybe exposing a Root Port would help rationalize some of the issues,
but I wasn't suggesting that you *need* to expose a Root Port.  I was
just trying to point out that the comment inaccurately claimed you
were creating a Root Port.

> Also happy to experiment with alternative approaches if you have any
> suggestions? 

Why do you need these to be PCI devices?  It looks like the main thing
you get is a hook to bind the driver to.  Could you accomplish
something similar by doing some coordination between the ahci and nvme
drivers directly, without involving PCI?

I assume that whatever magic Intel is doing with this "RST Optane"
mode, the resulting platform topology is at least compliant with the
PCI spec, so all the standard things in the spec like AER, DPC, power
management, etc, still work.

> With the decreasing cost of NVMe SSDs, we're seeing an
> influx of upcoming consumer PC products that will ship with the NVMe
> disk being the only storage device, combined with the BIOS default of
> "RST Optane" mode which will prevent Linux from seeing it at all, 
> so I'm really keen to swiftly find a way forward here.

This all sounds urgent, but without details of what this "RST Optane"
mode actually means, I don't know what to do with it.  I want to
avoid the voodoo programming of "we don't know *why* we're doing this,
but it seems to work."
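[For what it's worth, the earlier Intel patches suggest the hidden
NVMe devices are advertised one-bit-per-device in a vendor-specific
remap capability.  Absent a public spec, the register layout in this
sketch is purely an assumption, illustrating the kind of decode a
driver would do:]

```python
def count_remapped_devices(remap_cap, max_slots=3):
    """Count set bits in the low bits of the (assumed) remap-capability
    register; one bit per remapped NVMe device.  Both the bit layout
    and the slot count are hypothetical, not from a published spec."""
    return sum(1 for i in range(max_slots) if remap_cap & (1 << i))

print(count_remapped_devices(0b101))  # → 2
```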


Thread overview: 19+ messages
2019-06-10  7:44 Daniel Drake
2019-06-10 16:00 ` Keith Busch
2019-06-11  2:46   ` Daniel Drake
2019-06-12 14:32     ` Keith Busch
2019-06-13  8:54       ` Christoph Hellwig
2019-06-14  2:26         ` Daniel Drake
2019-06-14 19:36           ` Keith Busch
2019-06-14 20:05             ` Bjorn Helgaas
2019-06-14 21:05               ` Keith Busch
2019-06-18  7:48                 ` Hannes Reinecke
2019-06-18  7:46           ` Hannes Reinecke
2019-06-18  8:06             ` Daniel Drake
2019-06-18 15:15               ` Hannes Reinecke
2019-06-19 13:52                 ` Bjorn Helgaas
2019-06-10 21:16 ` Bjorn Helgaas
2019-06-11  3:25   ` Daniel Drake
2019-06-11 19:52     ` Bjorn Helgaas [this message]
2019-06-12  3:16       ` Daniel Drake
2019-06-12 13:49         ` Bjorn Helgaas
