qemu-devel.nongnu.org archive mirror
From: Markus Armbruster <armbru@redhat.com>
To: Klaus Jensen <its@irrelevant.dk>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	stefanha@redhat.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, mst@redhat.com
Subject: Re: making a qdev bus available from a (non-qtree?) device
Date: Fri, 21 May 2021 09:33:46 +0200
Message-ID: <878s48pmlh.fsf@dusky.pond.sub.org>
In-Reply-To: <YKIQsI4F49R4hEmd@apples.localdomain> (Klaus Jensen's message of "Mon, 17 May 2021 08:44:00 +0200")

I'm about to drop off for two weeks of much-needed vacation.  I meant to
study your explanation and give design advice before I leave, but I'm
out of time.  Regrettable.  I hope Stefan can help you.  Or perhaps
Paolo.  If you still have questions when I'm back, feel free to contact
me again.

Klaus Jensen <its@irrelevant.dk> writes:

> On May 12 14:02, Markus Armbruster wrote:
>>Klaus Jensen <its@irrelevant.dk> writes:
>>
>>> Hi all,
>>>
>>> I need some help with grok'ing qdev busses. Stefan, Michael - David
>>> suggested on IRC that I CC'ed you guys since you might have solved a
>>> similar issue with virtio devices. I've tried to study how that works,
>>> but I'm not exactly sure how to apply it to the issue I'm having.
>>>
>>> Currently, to support multiple namespaces on the emulated nvme device,
>>> one can do something like this:
>>>
>>>   -device nvme,id=nvme-ctrl-0,serial=foo,...
>>>   -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
>>>   -device nvme-ns,id=nvme-ns-1,bus=nvme-ctrl-0,...
>>>
>>> The nvme device creates an 'nvme-bus' and the nvme-ns devices have
>>> dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
>>> nice overview in `info qtree`:
>>>
>>>   bus: main-system-bus
>>>   type System
>>>     ...
>>>     dev: q35-pcihost, id ""
>>>       ..
>>>       bus: pcie.0
>>> 	type PCIE
>>> 	..
>>> 	dev: nvme, id "nvme-ctrl-0"
>>> 	  ..
>>> 	  bus: nvme-ctrl-0
>>> 	    type nvme-bus
>>> 	    dev: nvme-ns, id "nvme-ns-0"
>>> 	      ..
>>> 	    dev: nvme-ns, id "nvme-ns-1"
>>> 	      ..
>>>
>>>
>>> Nice and qdevy.
>>>
>>> We have since introduced support for NVM Subsystems through an
>>> nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
>>> does not show in `info qtree`
>>
>>Yes.
>>
>>Most devices plug into a bus.  DeviceClass member @bus_type specifies
>>the type of bus they plug into, and DeviceState member @parent_bus
>>points to the actual BusState.  Example: PCI devices plug into a PCI
>>bus, and have ->bus_type = TYPE_PCI_BUS.
>>
>>Some devices don't.  @bus_type and @parent_bus are NULL then.
>>
>>Most buses are provided by a device.  BusState member @parent points to
>>the device.
>>
>>The main-system-bus isn't.  Its @parent is null.
>>
>>"info qtree" only shows the qtree rooted at main-system-bus.  It doesn't
>>show qtrees rooted at bus-less devices or device-less buses other than
>>main-system-bus.  I doubt such buses exist.
>>
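
To make the relationships above concrete, here is a minimal sketch in C.
ToyBus, ToyDevice, toy_bus_init(), toy_device_init() and toy_walk() are
made-up names for illustration only; they are not QEMU's BusState /
DeviceState API.  The walk merely mimics the idea behind "info qtree":
start at a root bus and follow bus -> device -> bus links, so a bus-less
device and everything below it is never visited.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX 8   /* arbitrary limit, for the sketch only */

typedef struct ToyBus ToyBus;
typedef struct ToyDevice ToyDevice;

/* Toy stand-in for a device: plugs into at most one bus, may provide buses. */
struct ToyDevice {
    const char *name;
    ToyBus *parent_bus;             /* NULL for a bus-less device */
    ToyBus *child_bus[TOY_MAX];     /* buses provided by this device */
    int n_child_bus;
};

/* Toy stand-in for a bus: provided by a device, except for the root bus. */
struct ToyBus {
    const char *name;
    ToyDevice *parent;              /* NULL for the root bus */
    ToyDevice *children[TOY_MAX];   /* devices plugged into this bus */
    int n_children;
};

void toy_bus_init(ToyBus *bus, const char *name, ToyDevice *parent)
{
    bus->name = name;
    bus->parent = parent;
    bus->n_children = 0;
    if (parent) {
        assert(parent->n_child_bus < TOY_MAX);
        parent->child_bus[parent->n_child_bus++] = bus;
    }
}

void toy_device_init(ToyDevice *dev, const char *name, ToyBus *bus)
{
    dev->name = name;
    dev->parent_bus = bus;
    dev->n_child_bus = 0;
    if (bus) {
        assert(bus->n_children < TOY_MAX);
        bus->children[bus->n_children++] = dev;
    }
}

/* Print the tree below @bus and return the number of devices visited. */
int toy_walk(const ToyBus *bus, int indent)
{
    int count = 0;

    printf("%*sbus: %s\n", indent, "", bus->name);
    for (int i = 0; i < bus->n_children; i++) {
        const ToyDevice *dev = bus->children[i];

        printf("%*sdev: %s\n", indent + 2, "", dev->name);
        count++;
        for (int j = 0; j < dev->n_child_bus; j++) {
            count += toy_walk(dev->child_bus[j], indent + 4);
        }
    }
    return count;
}
```

In this toy model, a device initialized with a NULL bus can still
provide a bus with devices on it, but a walk from the root bus never
reaches any of them.
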
>
> Makes sense.
>
>>>                               (I wonder if this should actually just
>>> have been an -object?).
>>
>>Does nvme-subsys expose virtual hardware to the guest?  Memory, IRQs,
>>...
>>
>>If yes, it needs to be a device.
>>
>>If no, object may be more appropriate.  Tell us more about what it does.
>>
>
> It does not expose any virtual hardware. See below.
>
>>
>>>                         Anyway. The nvme device has a 'subsys' link
>>> parameter and we use this to manage the namespaces across the
>>> subsystem that may contain several nvme devices (controllers). The
>>> problem is that this doesn't work too well with unplugging since if the
>>> nvme device is `device_del`'ed, the nvme-ns devices on the nvme-bus
>>> are unrealized which is not what we want. We really want the
>>> namespaces to linger, preferably on an nvme-bus of the nvme-subsys
>>> device so they can be attached to other nvme devices that may show up
>>> (or already exist) in the subsystem.
>>>
>>> The core problem I'm having is that I can't seem to create an nvme-bus
>>> from the nvme-subsys device and make it available to the nvme-ns
>>> device on the command line:
>>>
>>>   -device nvme-subsys,id=nvme-subsys-0,...
>>>   -device nvme-ns,bus=nvme-subsys-0
>>>
>>> The above results in "No 'nvme-bus' bus found for device 'nvme-ns'",
>>> even though I do `qbus_create_inplace()` just like the nvme
>>> device. However, I *can* reparent the nvme-ns device in its realize()
>>> method, so if I instead define it like so:
>>>
>>>   -device nvme-subsys,id=nvme-subsys-0,...
>>>   -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
>>>   -device nvme-ns,bus=nvme-ctrl-0
>>>
>>> I can then call `qdev_set_parent_bus()` and set the parent bus to the
>>> bus created in the nvme-subsys device. This solves the problem since
>>> the namespaces are not "garbage collected" when the nvme device is
>>> removed, but it just feels wrong, you know? Also, if possible, I'd of
>>> course really like to retain the nice entries in `info qtree`.
>>
>>I'm afraid I'm too ignorant on NVME to give useful advice.
>>
>>Can you give us a brief primer on the aspects of physical NVME devices
>>you'd like to model in QEMU?  What are "controllers", "namespaces", and
>>"subsystems", and how do they work together?
>>
>>Once we understand the relevant aspects of physical devices, we can
>>discuss how to best model them in QEMU.
>>
>
> An "NVM Subsystem" is basically just a term to talk about a collection
> of controllers and namespaces. A namespace is just a quantity of 
> non-volatile memory that the controller can use to store stuff on.
>
> Only the controller is a piece of virtual hardware. An example
> subsystem looks like this:
>
>
>           +------------------+     +-----------------+
>           |   controller A   |     |   controller B  |
>           +------------------+     +-----------------+
>           +--------++--------+     +--------++-------+
>           | NSID 1 || NSID 2 |     | NSID 3 | NSID 2 |
>           +--------++--------+     +--------++-------+
>           +--------+    |          +--------+    |
>           |  NS A  |    |          |  NS C  |    |
>           +--------+    |          +--------+    |
>                         |                        |
>                         +------------------------+
>                                      |
>                                  +--------+
>                                  |  NS B  |
>                                  +--------+
>
>
> This is the example in Figure 5 in the NVMe v1.4 specification. Here,
> we have two controllers (that we model with the 'nvme' pci-based
> device). Each controller has one "private" namespace (NS A and NS C)
> and shares one namespace (NS B). The namespace IDs are unique across
> the subsystem and are assigned by the controller when attached to a
> namespace.
>
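
The figure can be restated as a small data model.  This is only a sketch
under my own assumptions: ToySubsys, ToyCtrl, toy_attach() and
toy_attach_count() are hypothetical names, not the types used by the
nvme device model.  The subsystem owns the namespaces, indexed by NSID
(unique across the subsystem), and each controller merely records which
NSIDs it has attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NS 4    /* valid NSIDs are 1..TOY_MAX_NS in this sketch */

/* The subsystem owns the namespaces; the array index is the NSID. */
typedef struct ToySubsys {
    const char *ns[TOY_MAX_NS + 1];     /* NULL means "no such namespace" */
} ToySubsys;

/* A controller only records which of the subsystem's NSIDs it attached. */
typedef struct ToyCtrl {
    ToySubsys *subsys;
    bool attached[TOY_MAX_NS + 1];
} ToyCtrl;

/* Attach namespace @nsid to @ctrl; fails if the subsystem has no such NSID. */
bool toy_attach(ToyCtrl *ctrl, unsigned nsid)
{
    assert(ctrl && ctrl->subsys);
    if (nsid == 0 || nsid > TOY_MAX_NS || !ctrl->subsys->ns[nsid]) {
        return false;
    }
    ctrl->attached[nsid] = true;
    return true;
}

/* Number of controllers in @ctrls that have @nsid attached. */
int toy_attach_count(const ToyCtrl *ctrls, size_t n, unsigned nsid)
{
    int count = 0;

    assert(nsid >= 1 && nsid <= TOY_MAX_NS);
    for (size_t i = 0; i < n; i++) {
        if (ctrls[i].attached[nsid]) {
            count++;
        }
    }
    return count;
}
```

With NS A, NS B and NS C stored under NSIDs 1, 2 and 3, controller A
attaches 1 and 2, controller B attaches 3 and 2, and only NSID 2 is
shared.  Because the namespaces live in the subsystem, clearing a
controller's attachments leaves them in place.
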
> We use the 'nvme-ns' device (TYPE_DEVICE) to model the namespaces, and
> I guess this could also just have been an -object, not sure if
> we can change that now. The 'nvme-ns' device mostly exists to hold the
> block backend configuration and related namespace-only
> parameters. Prior to the introduction of subsystems, while we could
> have multiple controllers on the PCI bus, they could not share
> namespaces. To support this we introduced the 'nvme-subsys' device to
> allow the namespaces to be shared. This support is considered
> experimental, so I think we can get away with changing this to be an
> object.
>
> As I explained in my first mail, we attach namespaces to controllers
> through a bus. This means that even in the absence of an explicit 
> "bus=..." parameter on the nvme-ns device, it will "connect" on the
> most recently defined "nvme-bus" (of the most recently defined
> controller). With subsystems we would also like to model "unattached"
> namespaces that exist solely in the subsystem (i.e. NOT attached to
> any controllers). That is why I was trying to get the nvme-ns devices
> to attach to a bus created by the "non-bus-attached" subsystem
> device. And that is what I can't do. We could add a link property to
> the nvme-ns device instead, but then the bus magic in qemu would still
> happen and the namespace would end up "attached" (in qemu terms) to a
> controller anyway - and it would complain if we defined the namespace
> device prior to defining any controller devices since no usable bus
> exists.
>
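
The default bus selection described here can be sketched as follows.
This follows the description in the paragraph above, not QEMU's actual
lookup code; ToyBus and toy_pick_bus() are hypothetical names.  Without
an explicit bus id, the most recently defined bus of the wanted type
wins, and the lookup fails when no bus of that type exists yet.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A defined bus, in command-line order (index 0 was defined first). */
typedef struct ToyBus {
    const char *id;
    const char *type;
} ToyBus;

/*
 * Pick the bus for a new device of bus type @type.
 * With @bus_id set, only a bus with that id matches.
 * With @bus_id NULL, the most recently defined bus of @type matches.
 * Returns NULL when there is no usable bus.
 */
const ToyBus *toy_pick_bus(const ToyBus *buses, size_t n,
                           const char *type, const char *bus_id)
{
    assert(type != NULL);
    for (size_t i = n; i-- > 0; ) {     /* scan newest first */
        if (strcmp(buses[i].type, type) != 0) {
            continue;
        }
        if (bus_id == NULL || strcmp(buses[i].id, bus_id) == 0) {
            return &buses[i];
        }
    }
    return NULL;
}
```

In this sketch, a namespace defined before any controller finds no bus
of type "nvme-bus" at all, which corresponds to the complaint about no
usable bus.
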
> Thanks for helping out with this!



