From: Klaus Jensen <its@irrelevant.dk>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	stefanha@redhat.com, mst@redhat.com
Subject: Re: making a qdev bus available from a (non-qtree?) device
Date: Mon, 17 May 2021 08:44:00 +0200
Message-ID: <YKIQsI4F49R4hEmd@apples.localdomain>
In-Reply-To: <87im3o2m8l.fsf@dusky.pond.sub.org>

On May 12 14:02, Markus Armbruster wrote:
>Klaus Jensen <its@irrelevant.dk> writes:
>
>> Hi all,
>>
>> I need some help with grok'ing qdev busses. Stefan, Michael - David
>> suggested on IRC that I CC'ed you guys since you might have solved a
>> similar issue with virtio devices. I've tried to study how that works,
>> but I'm not exactly sure how to apply it to the issue I'm having.
>>
>> Currently, to support multiple namespaces on the emulated nvme device,
>> one can do something like this:
>>
>>   -device nvme,id=nvme-ctrl-0,serial=foo,...
>>   -device nvme-ns,id=nvme-ns-0,bus=nvme-ctrl-0,...
>>   -device nvme-ns,id=nvme-ns-1,bus=nvme-ctrl-0,...
>>
>> The nvme device creates an 'nvme-bus' and the nvme-ns devices have
>> dc->bus_type = TYPE_NVME_BUS. This all works very well and provides a
>> nice overview in `info qtree`:
>>
>>   bus: main-system-bus
>>   type System
>>     ...
>>     dev: q35-pcihost, id ""
>>       ..
>>       bus: pcie.0
>> 	type PCIE
>> 	..
>> 	dev: nvme, id "nvme-ctrl-0"
>> 	  ..
>> 	  bus: nvme-ctrl-0
>> 	    type nvme-bus
>> 	    dev: nvme-ns, id "nvme-ns-0"
>> 	      ..
>> 	    dev: nvme-ns, id "nvme-ns-1"
>> 	      ..
>>
>>
>> Nice and qdevy.
>>
>> We have since introduced support for NVM Subsystems through an
>> nvme-subsys device. The nvme-subsys device is just a TYPE_DEVICE and
>> does not show in `info qtree`
>
>Yes.
>
>Most devices plug into a bus.  DeviceClass member @bus_type specifies
>the type of bus they plug into, and DeviceState member @parent_bus
>points to the actual BusState.  Example: PCI devices plug into a PCI
>bus, and have ->bus_type = TYPE_PCI_BUS.
>
>Some devices don't.  @bus_type and @parent_bus are NULL then.
>
>Most buses are provided by a device.  BusState member @parent points to
>the device.
>
>The main-system-bus isn't.  Its @parent is null.
>
>"info qtree" only shows the qtree rooted at main-system-bus.  It doesn't
>show qtrees rooted at bus-less devices or device-less buses other than
>main-system-bus.  I doubt such buses exist.
>

Makes sense.
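
To check my understanding, here is a toy model in plain Python. It is
not QEMU code: the class and function names are made up for
illustration and only mirror the @bus_type, @parent_bus and @parent
members described above. It assumes that both `info qtree` and a
"bus=..." lookup walk the tree from main-system-bus, which would explain
why a bus created by a bus-less device is never found.

```python
class Bus:
    def __init__(self, name, bus_type, parent=None):
        self.name = name
        self.bus_type = bus_type
        self.parent = parent      # providing device; None only for the root bus
        self.children = []        # devices plugged into this bus


class Device:
    def __init__(self, type_name, dev_id="", bus_type=None):
        self.type_name = type_name
        self.dev_id = dev_id
        self.bus_type = bus_type  # None for a bus-less device
        self.parent_bus = None
        self.child_buses = []

    def create_bus(self, name, bus_type):
        bus = Bus(name, bus_type, parent=self)
        self.child_buses.append(bus)
        return bus

    def plug(self, bus):
        if self.bus_type != bus.bus_type:
            raise ValueError(f"{self.type_name} does not plug into {bus.bus_type}")
        self.parent_bus = bus
        bus.children.append(self)
        return self


def walk_buses(bus):
    """Yield every bus reachable from `bus`, depth first."""
    yield bus
    for dev in bus.children:
        for child in dev.child_buses:
            yield from walk_buses(child)


def find_bus(root, name):
    """Look a bus up by name, starting at the root bus only."""
    return next((b for b in walk_buses(root) if b.name == name), None)


def build_example():
    root = Bus("main-system-bus", "System")
    host = Device("q35-pcihost", bus_type="System").plug(root)
    pcie = host.create_bus("pcie.0", "PCIE")
    ctrl = Device("nvme", "nvme-ctrl-0", bus_type="PCIE").plug(pcie)
    ctrl.create_bus("nvme-ctrl-0", "nvme-bus")
    # nvme-subsys is bus-less: it creates a bus but is plugged in nowhere.
    subsys = Device("nvme-subsys", "nvme-subsys-0")
    subsys.create_bus("nvme-subsys-0", "nvme-bus")
    return root, subsys
```

In this model, find_bus(root, "nvme-ctrl-0") returns the controller's
bus, while find_bus(root, "nvme-subsys-0") returns None even though the
bus object exists.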

>>                               (I wonder if this should actually just
>> have been an -object?).
>
>Does nvme-subsys expose virtual hardware to the guest?  Memory, IRQs,
>...
>
>If yes, it needs to be a device.
>
>If no, object may be more appropriate.  Tell us more about what it does.
>

It does not expose any virtual hardware. See below.

>
>>                         Anyway. The nvme device has a 'subsys' link
>> parameter and we use this to manage the namespaces across the
>> subsystem that may contain several nvme devices (controllers). The
>> problem is that this doesn't work too well with unplugging since if the
>> nvme device is `device_del`'ed, the nvme-ns devices on the nvme-bus
>> are unrealized which is not what we want. We really want the
>> namespaces to linger, preferably on an nvme-bus of the nvme-subsys
>> device so they can be attached to other nvme devices that may show up
>> (or already exist) in the subsystem.
>>
>> The core problem I'm having is that I can't seem to create an nvme-bus
>> from the nvme-subsys device and make it available to the nvme-ns
>> device on the command line:
>>
>>   -device nvme-subsys,id=nvme-subsys-0,...
>>   -device nvme-ns,bus=nvme-subsys-0
>>
>> The above results in "No 'nvme-bus' bus found for device 'nvme-ns'",
>> even though I do `qbus_create_inplace()` just like the nvme
>> device. However, I *can* reparent the nvme-ns device in its realize()
>> method, so if I instead define it like so:
>>
>>   -device nvme-subsys,id=nvme-subsys-0,...
>>   -device nvme,id=nvme-ctrl-0,subsys=nvme-subsys-0
>>   -device nvme-ns,bus=nvme-ctrl-0
>>
>> I can then call `qdev_set_parent_bus()` and set the parent bus to the
>> bus created in the nvme-subsys device. This solves the problem since
>> the namespaces are not "garbage collected" when the nvme device is
>> removed, but it just feels wrong you know? Also, if possible, I'd of
>> course really like to retain the nice entries in `info qtree`.
>
>I'm afraid I'm too ignorant on NVME to give useful advice.
>
>Can you give us a brief primer on the aspects of physical NVME devices
>you'd like to model in QEMU?  What are "controllers", "namespaces", and
>"subsystems", and how do they work together?
>
>Once we understand the relevant aspects of physical devices, we can
>discuss how to best model them in QEMU.
>

An "NVM Subsystem" is basically just a term to talk about a collection 
of controllers and namespaces. A namespace is just a quantity of 
non-volatile memory that the controller can use to store stuff on.

Only the controller is a piece of virtual hardware. An example subsystem 
looks like this:


           +------------------+     +-----------------+
           |   controller A   |     |   controller B  |
           +------------------+     +-----------------+
           +--------++--------+     +--------++-------+
           | NSID 1 || NSID 2 |     | NSID 3 | NSID 2 |
           +--------++--------+     +--------++-------+
           +--------+    |          +--------+    |
           |  NS A  |    |          |  NS C  |    |
           +--------+    |          +--------+    |
                         |                        |
                         +------------------------+
                                      |
                                  +--------+
                                  |  NS B  |
                                  +--------+


This is the example in Figure 5 in the NVMe v1.4 specification. Here, we 
have two controllers (that we model with the 'nvme' pci-based device). 
Each controller has one "private" namespace (NS A and NS C) and shares 
one namespace (NS B). The namespace IDs are unique across the subsystem 
and are assigned by the controller when attached to a namespace.
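
The same layout as a small Python sketch, in case that is easier to
reason about than the drawing. This is only an illustration, not how
hw/nvme is structured; the Subsystem class and its methods are
hypothetical, and NSIDs are simply passed in by the caller. The point is
that namespaces are owned by the subsystem and controllers only hold
references to them.

```python
class Subsystem:
    def __init__(self):
        self.namespaces = {}    # NSID -> namespace name, unique per subsystem
        self.controllers = {}   # controller name -> set of attached NSIDs

    def add_namespace(self, nsid, name):
        if nsid in self.namespaces:
            raise ValueError(f"NSID {nsid} already in use")
        self.namespaces[nsid] = name

    def add_controller(self, name):
        self.controllers[name] = set()

    def attach(self, ctrl, nsid):
        if nsid not in self.namespaces:
            raise KeyError(nsid)
        self.controllers[ctrl].add(nsid)

    def remove_controller(self, ctrl):
        # Namespaces belong to the subsystem, so they survive the controller.
        del self.controllers[ctrl]

    def unattached(self):
        attached = set().union(*self.controllers.values())
        return sorted(set(self.namespaces) - attached)


def figure_example():
    """Build the two-controller, three-namespace layout drawn above."""
    s = Subsystem()
    for nsid, name in [(1, "NS A"), (2, "NS B"), (3, "NS C")]:
        s.add_namespace(nsid, name)
    s.add_controller("A")
    s.add_controller("B")
    for ctrl, nsid in [("A", 1), ("A", 2), ("B", 3), ("B", 2)]:
        s.attach(ctrl, nsid)
    return s
```

Removing controller A from this model leaves NSID 1 unattached but still
present in the subsystem, which is the behaviour we want from
`device_del`.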

We use the 'nvme-ns' device (TYPE_DEVICE) to model the namespaces, and I 
guess this could also just have been an -object; I'm not sure if we 
can change that now. The 'nvme-ns' device mostly exists to hold the block 
backend configuration and related namespace-only parameters. Prior to 
the introduction of subsystems, while we could have multiple controllers 
on the PCI bus, they could not share namespaces. To support this we 
introduced the 'nvme-subsys' device to allow the namespaces to be 
shared. This support is considered experimental, so I think we can get 
away with changing this to be an object.

As I explained in my first mail, we attach namespaces to controllers 
through a bus. This means that even in the absence of an explicit 
"bus=..." parameter on the nvme-ns device, it will "connect" on the most 
recently defined "nvme-bus" (of the most recently defined controller). 
With subsystems we would also like to model "unattached" namespaces that 
exist solely in the subsystem (i.e. NOT attached to any controllers). 
That is why I was trying to get the nvme-ns devices to attach to a bus 
created by the "non-bus-attached" subsystem device. And that is what I 
can't do. We could add a link property to the nvme-ns device instead, 
but then the bus magic in qemu would still happen and the namespace 
would end up "attached" (in qemu terms) to a controller anyway - and it 
would complain if we defined the namespace device prior to defining any 
controller devices since no usable bus exists.
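
A rough Python sketch of the bus selection as I described it above, to
show the ordering problem. It is a simplification based on my own
description and not the actual qdev lookup code; resolve_bus and
NoBusError are hypothetical names.

```python
class NoBusError(Exception):
    pass


def resolve_bus(defined_buses, bus_type, bus_id=None):
    """Pick the bus a new device lands on.

    defined_buses: list of (bus_id, bus_type) tuples in definition order.
    """
    candidates = [b for b in defined_buses if b[1] == bus_type]
    if bus_id is not None:
        # An explicit "bus=..." narrows the choice to that one bus.
        candidates = [b for b in candidates if b[0] == bus_id]
    if not candidates:
        raise NoBusError(f"No '{bus_type}' bus found")
    # Without "bus=...", the most recently defined matching bus wins.
    return candidates[-1][0]
```

With two controllers defined, a namespace without "bus=..." ends up on
the second one; with no controller defined yet, the lookup raises. There
is no outcome in which the namespace stays unattached.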

Thanks for helping out with this!
