From: Markus Armbruster <armbru@redhat.com>
To: Klaus Jensen <its@irrelevant.dk>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
stefanha@redhat.com, qemu-devel@nongnu.org,
qemu-block@nongnu.org, mst@redhat.com
Subject: Re: making a qdev bus available from a (non-qtree?) device
Date: Fri, 21 May 2021 09:33:46 +0200
Message-ID: <878s48pmlh.fsf@dusky.pond.sub.org>
In-Reply-To: <YKIQsI4F49R4hEmd@apples.localdomain> (Klaus Jensen's message of "Mon, 17 May 2021 08:44:00 +0200")
I'm about to drop off for two weeks of much-needed vacation. I meant to
study your explanation and give design advice before I leave, but I'm
out of time. Regrettable. I hope Stefan can help you. Or perhaps
Paolo. If you still have questions when I'm back, feel free to contact
me again.
Klaus Jensen <its@irrelevant.dk> writes:
> On May 12 14:02, Markus Armbruster wrote:
>>Klaus Jensen <its@irrelevant.dk> writes:
>>
>>> Hi all,
>>>
>>> I need some help with grok'ing qdev busses. Stefan, Michael - David
>>> suggested on IRC that I CC you guys since you might have solved a
>>> similar issue with virtio devices. I've tried to study how that works,
>>> but I'm not exactly sure how to apply it to the issue I'm having.
>>>
>>> Currently, to support multiple namespaces on the emulated nvme device,
>>> one can do something like this:
>>>
>>> -device nvme,id=nvme-ctrl-0,serial=foo,...
>>> -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
>>> -device nvme-ns,id=nvme-ns-1,bus=nvme-ctrl-0,...
>>>
>>> The nvme device creates an 'nvme-bus' and the nvme-ns devices have
>>> dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
>>> nice overview in `info qtree`:
>>>
>>>   bus: main-system-bus
>>>     type System
>>>     ...
>>>     dev: q35-pcihost, id ""
>>>       ..
>>>       bus: pcie.0
>>>         type PCIE
>>>         ..
>>>         dev: nvme, id "nvme-ctrl-0"
>>>           ..
>>>           bus: nvme-ctrl-0
>>>             type nvme-bus
>>>             dev: nvme-ns, id "nvme-ns-0"
>>>               ..
>>>             dev: nvme-ns, id "nvme-ns-1"
>>>               ..
>>>
>>>
>>> Nice and qdevy.
>>>
>>> We have since introduced support for NVM Subsystems through an
>>> nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
>>> does not show in `info qtree`
>>
>>Yes.
>>
>>Most devices plug into a bus. DeviceClass member @bus_type specifies
>>the type of bus they plug into, and DeviceState member @parent_bus
>>points to the actual BusState. Example: PCI devices plug into a PCI
>>bus, and have ->bus_type = TYPE_PCI_BUS.
>>
>>Some devices don't. @bus_type and @parent_bus are NULL then.
>>
>>Most buses are provided by a device. BusState member @parent points to
>>the device.
>>
>>The main-system-bus isn't. Its @parent is null.
>>
>>"info qtree" only shows the qtree rooted at main-system-bus. It doesn't
>>show qtrees rooted at bus-less devices or device-less buses other than
>>main-system-bus. I doubt such buses exist.
>>
>
> Makes sense.
>
>>> (I wonder if this should actually just
>>> have been an -object?).
>>
>>Does nvme-subsys expose virtual hardware to the guest? Memory, IRQs,
>>...
>>
>>If yes, it needs to be a device.
>>
>>If no, object may be more appropriate. Tell us more about what it does.
>>
>
> It does not expose any virtual hardware. See below.
>
>>
>>> Anyway. The nvme device has a 'subsys' link
>>> parameter and we use this to manage the namespaces across the
>>> subsystem that may contain several nvme devices (controllers). The
>>> problem is that this doesn't work too well with unplugging since if the
>>> nvme device is `device_del`'ed, the nvme-ns devices on the nvme-bus
>>> are unrealized which is not what we want. We really want the
>>> namespaces to linger, preferably on an nvme-bus of the nvme-subsys
>>> device so they can be attached to other nvme devices that may show up
>>> (or already exist) in the subsystem.
>>>
>>> The core problem I'm having is that I can't seem to create an nvme-bus
>>> from the nvme-subsys device and make it available to the nvme-ns
>>> device on the command line:
>>>
>>> -device nvme-subsys,id=nvme-subsys-0,...
>>> -device nvme-ns,bus=nvme-subsys-0
>>>
>>> The above results in "No 'nvme-bus' bus found for device 'nvme-ns'",
>>> even though I do `qbus_create_inplace()` just like the nvme
>>> device. However, I *can* reparent the nvme-ns device in its realize()
>>> method, so if I instead define it like so:
>>>
>>> -device nvme-subsys,id=nvme-subsys-0,...
>>> -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
>>> -device nvme-ns,bus=nvme-ctrl-0
>>>
>>> I can then call `qdev_set_parent_bus()` and set the parent bus to the
>>> bus created in the nvme-subsys device. This solves the problem since
>>> the namespaces are not "garbage collected" when the nvme device is
>>> removed, but it just feels wrong, you know? Also, if possible, I'd of
>>> course really like to retain the nice entries in `info qtree`.
>>
>>I'm afraid I'm too ignorant on NVME to give useful advice.
>>
>>Can you give us a brief primer on the aspects of physical NVME devices
>>you'd like to model in QEMU? What are "controllers", "namespaces", and
>>"subsystems", and how do they work together?
>>
>>Once we understand the relevant aspects of physical devices, we can
>>discuss how to best model them in QEMU.
>>
>
> An "NVM Subsystem" is basically just a term to talk about a collection
> of controllers and namespaces. A namespace is just a quantity of
> non-volatile memory that the controller can use to store stuff on.
>
> Only the controller is a piece of virtual hardware. An example
> subsystem looks like this:
>
>
>   +------------------+     +-----------------+
>   |   controller A   |     |  controller B   |
>   +------------------+     +-----------------+
>   +--------++--------+     +--------++-------+
>   | NSID 1 || NSID 2 |     | NSID 3 | NSID 2 |
>   +--------++--------+     +--------++-------+
>   +--------+    |          +--------+    |
>   |  NS A  |    |          |  NS C  |    |
>   +--------+    |          +--------+    |
>                 |                        |
>                 +------------------------+
>                             |
>                         +--------+
>                         |  NS B  |
>                         +--------+
>
>
> This is the example in Figure 5 in the NVMe v1.4 specification. Here,
> we have two controllers (that we model with the 'nvme' pci-based
> device). Each controller has one "private" namespace (NS A and NS C)
> and shares one namespace (NS B). The namespace IDs are unique across
> the subsystem and are assigned by the controller when attached to a
> namespace.
>
> We use the 'nvme-ns' device (TYPE_DEVICE) to model the namespaces, and
> I guess this could also just have been an -object, not sure if
> we can change that now. The 'nvme-ns' device mostly exists to hold the
> block backend configuration and related namespace-only
> parameters. Prior to the introduction of subsystems, while we could
> have multiple controllers on the PCI bus, they could not share
> namespaces. To support this we introduced the 'nvme-subsys' device to
> allow the namespaces to be shared. This support is considered
> experimental, so I think we can get away with changing this to be an
> object.
>
> As I explained in my first mail, we attach namespaces to controllers
> through a bus. This means that even in the absence of an explicit
> "bus=..." parameter on the nvme-ns device, it will "connect" on the
> most recently defined "nvme-bus" (of the most recently defined
> controller). With subsystems we would also like to model "unattached"
> namespaces that exist solely in the subsystem (i.e. NOT attached to
> any controllers). That is why I was trying to get the nvme-ns devices
> to attach to a bus created by the "non-bus-attached" subsystem
> device. And that is what I can't do. We could add a link property to
> the nvme-ns device instead, but then the bus magic in qemu would still
> happen and the namespace would end up "attached" (in qemu terms) to a
> controller anyway - and it would complain if we defined the namespace
> device prior to defining any controller devices since no usable bus
> exists.
>
> Thanks for helping out with this!
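
For anyone following along, here is a rough sketch of the relationships
described above (@parent_bus on devices, @parent on buses, and "info qtree"
walking only from main-system-bus). It is a toy model in plain C, not QEMU
code: the struct layouts and the helpers device_init(), bus_init(),
find_bus() and demo() are made up for illustration, and giving the
nvme-subsys bus the name "nvme-subsys-0" is an assumption. It only shows
that a bus provided by a bus-less device cannot be reached by a walk that
starts at the root bus. Whether that is what the "bus=" lookup on the
command line trips over is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

struct Bus;

/* Toy stand-in for DeviceState (hypothetical, not QEMU's struct). */
typedef struct Device {
    const char *id;
    struct Bus *parent_bus;               /* NULL for a bus-less device */
    struct Bus *child_bus[MAX_CHILDREN];  /* buses this device provides */
    int num_child_bus;
} Device;

/* Toy stand-in for BusState (hypothetical, not QEMU's struct). */
typedef struct Bus {
    const char *name;
    Device *parent;                       /* NULL for a device-less bus */
    Device *children[MAX_CHILDREN];       /* devices plugged into this bus */
    int num_children;
} Bus;

/* Initialize a device and, if a bus is given, plug the device into it. */
static void device_init(Device *dev, const char *id, Bus *parent_bus)
{
    memset(dev, 0, sizeof(*dev));
    dev->id = id;
    dev->parent_bus = parent_bus;
    if (parent_bus && parent_bus->num_children < MAX_CHILDREN) {
        parent_bus->children[parent_bus->num_children++] = dev;
    }
}

/* Initialize a bus and, if a provider is given, register the bus with it. */
static void bus_init(Bus *bus, const char *name, Device *provider)
{
    memset(bus, 0, sizeof(*bus));
    bus->name = name;
    bus->parent = provider;
    if (provider && provider->num_child_bus < MAX_CHILDREN) {
        provider->child_bus[provider->num_child_bus++] = bus;
    }
}

/* Depth-first search for a bus by name, starting at the given bus. */
static Bus *find_bus(Bus *bus, const char *name)
{
    if (strcmp(bus->name, name) == 0) {
        return bus;
    }
    for (int i = 0; i < bus->num_children; i++) {
        Device *dev = bus->children[i];
        for (int j = 0; j < dev->num_child_bus; j++) {
            Bus *hit = find_bus(dev->child_bus[j], name);
            if (hit) {
                return hit;
            }
        }
    }
    return NULL;
}

/*
 * Build the topology from the thread and return 1 if the controller's
 * bus is reachable from the root bus while the subsystem's bus is not.
 */
static int demo(void)
{
    static Bus root, pcie, ctrl_bus, subsys_bus;
    static Device host, ctrl, subsys;

    bus_init(&root, "main-system-bus", NULL);
    device_init(&host, "q35-pcihost", &root);
    bus_init(&pcie, "pcie.0", &host);
    device_init(&ctrl, "nvme-ctrl-0", &pcie);
    bus_init(&ctrl_bus, "nvme-ctrl-0", &ctrl);

    /* The subsystem device plugs into no bus at all. */
    device_init(&subsys, "nvme-subsys-0", NULL);
    bus_init(&subsys_bus, "nvme-subsys-0", &subsys);
    assert(subsys.parent_bus == NULL);

    return find_bus(&root, "nvme-ctrl-0") == &ctrl_bus &&
           find_bus(&root, "nvme-subsys-0") == NULL;
}
```

In this model the subsystem's bus exists and has a provider, but nothing
links the provider back to the tree under main-system-bus, so any walk
from the root misses it.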