On 07.07.23 16:42, Roger Pau Monné wrote:
> On Fri, Jul 07, 2023 at 04:10:14PM +0200, Juergen Gross wrote:
>> On 07.07.23 11:50, Roger Pau Monné wrote:
>>> On Fri, Jul 07, 2023 at 06:38:48AM +0200, Juergen Gross wrote:
>>>> On 06.07.23 23:49, Stefano Stabellini wrote:
>>>>> On Thu, 6 Jul 2023, Roger Pau Monné wrote:
>>>>>> On Wed, Jul 05, 2023 at 03:41:10PM -0700, Stefano Stabellini wrote:
>>>>>>> On Wed, 5 Jul 2023, Roger Pau Monné wrote:
>>>>>>>> On Tue, Jul 04, 2023 at 08:14:59PM +0300, Oleksandr Tyshchenko wrote:
>>>>>>>>> Part 2 (clarification):
>>>>>>>>>
>>>>>>>>> I think using a special config space register in the root complex
>>>>>>>>> would not be terrible in terms of guest changes, because it is
>>>>>>>>> easy to introduce a new root complex driver in Linux and other
>>>>>>>>> OSes. The root complex would still be ECAM compatible, so the
>>>>>>>>> regular ECAM driver would still work. A new driver would only be
>>>>>>>>> necessary if you want to access the special config space register.
>>>>>>>>
>>>>>>>> I'm slightly worried about this approach: we would end up modifying
>>>>>>>> a root complex emulation in order to avoid modifying a PCI device
>>>>>>>> emulation in QEMU. I'm not sure that's a good trade-off.
>>>>>>>>
>>>>>>>> Note also that different architectures will likely have different
>>>>>>>> root complexes, so you might need to modify several of them, and
>>>>>>>> then arrange the PCI layout correctly in order to have the proper
>>>>>>>> hierarchy, so that devices belonging to different driver domains
>>>>>>>> are assigned to different bridges.
>>>>>>>
>>>>>>> I do think that adding something to a PCI config register somewhere
>>>>>>> is the best option, because it is dependent neither on ACPI nor on
>>>>>>> xenstore, both of which are very undesirable.
>>>>>>>
>>>>>>> I am not sure where specifically is the best place. These are 4
>>>>>>> ideas we came up with:
>>>>>>> 1. PCI root complex
>>>>>>> 2. a register on the device itself
>>>>>>> 3. a new capability of the device
>>>>>>> 4. one extra dummy PCI device added for the sole purpose of
>>>>>>>    exposing the grants capability
>>>>>>>
>>>>>>> Looking at the spec, there is a way to add a vendor-specific
>>>>>>> capability (cap_vndr = 0x9). Could we use that? It doesn't look
>>>>>>> like it is used today; Linux doesn't parse it.
>>>>>>
>>>>>> I did wonder the same from a quick look at the spec. There's however
>>>>>> text in the specification that says:
>>>>>>
>>>>>> "The driver SHOULD NOT use the Vendor data capability except for
>>>>>> debugging and reporting purposes."
>>>>>>
>>>>>> So we would at least need to change that, because the capability
>>>>>> would then be used for purposes other than debugging and reporting.
>>>>>>
>>>>>> Seems like a minor adjustment, so it might be worth asking upstream
>>>>>> for their opinion and getting a conversation started.
>>>>>
>>>>> Wait, wouldn't this use-case fall under "reporting"? It is exactly
>>>>> what we are doing, right?
>>>>
>>>> I'd understand "reporting" as e.g. logging, transferring statistics,
>>>> ...
>>>>
>>>> We'd like to use it for configuration purposes.
>>>
>>> I've also read it that way.
>>>
>>>> Another idea would be to enhance the virtio IOMMU device to suit our
>>>> needs: we could add the domid as another virtio IOMMU device
>>>> capability and (for now) use bypass mode for all "productive" devices.
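Concretely, the config space could look something like the sketch below.
The existing fields match Linux's include/uapi/linux/virtio_iommu.h; the
VIRTIO_IOMMU_F_XEN_GRANTS feature bit and the xen_domid field are
hypothetical and would have to go through the virtio spec process first:

```c
/* A sketch only, not a spec proposal: the virtio-iommu config space with
 * a hypothetical Xen backend-domid field appended.  The existing fields
 * follow Linux's include/uapi/linux/virtio_iommu.h. */
#include <linux/types.h>

struct virtio_iommu_range_64 {
	__le64	start;
	__le64	end;
};

struct virtio_iommu_range_32 {
	__le32	start;
	__le32	end;
};

struct virtio_iommu_config {
	__le64	page_size_mask;			/* supported page sizes */
	struct virtio_iommu_range_64 input_range;  /* supported IOVA range */
	struct virtio_iommu_range_32 domain_range; /* domain ID range */
	__le32	probe_size;			/* probe buffer size */
	__u8	bypass;		/* (for now) "productive" devices stay in bypass */
	__u8	reserved[3];
	/* Hypothetical addition: only valid if the (made-up) feature bit
	 * VIRTIO_IOMMU_F_XEN_GRANTS has been negotiated. */
	__le16	xen_domid;	/* backend domid to grant pages to */
};
```

That would hand the guest the one piece of information it cannot
currently discover without ACPI or xenstore: which domid to issue
grants for.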
>>>
>>> If we have to start adding capabilities, won't it be easier to just
>>> add it to each device instead of adding it to the virtio IOMMU? Or is
>>> the parsing of capabilities device specific, meaning we would have to
>>> implement such parsing for each device? I would expect some
>>> capabilities to be shared between all devices, and a Xen capability
>>> could be one of those.
>>
>> Have a look at [1], which describes the common device config layout.
>> The problem here is that we'd need to add the domid after the queue
>> specific data, resulting in a mess if further queue fields were added
>> later.
>>
>> We could try that, of course.
>
> Right, we must make it part of the standard if we modify
> virtio_pci_common_cfg, or else newly added fields would overlap the
> Xen specific one.
>
> Would it be possible to signal Xen-grants support in the
> `device_feature` field, and then expose it from a vendor capability?
> IOW, would it be possible to add a Xen-specific hook in the parsing of
> virtio_pci_common_cfg that would then fetch additional data from a
> capability?

TBH, I don't know. It might require some changes in the central parsing
logic, but this shouldn't be too hard to do.

> That would likely be less intrusive than adding a new Xen-specific
> field to virtio_pci_common_cfg while still allowing us to do Xen
> specific configuration for all VirtIO devices.

In case we want to go that route, this should go into a new "platform
config" capability, which might be just another form of a vendor
capability.
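Roughly along these lines: struct virtio_pci_common_cfg is as laid out
in the virtio spec (1.1 layout shown), while the VIRTIO_F_XEN_GRANTS
feature bit, the VIRTIO_PCI_CAP_PLATFORM_CFG cfg_type and struct
virtio_pci_platform_cfg are made-up placeholders for illustration:

```c
/* A sketch only.  struct virtio_pci_common_cfg follows the virtio spec;
 * the feature bit, cfg_type value and platform config struct below are
 * hypothetical. */
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t le16;	/* little-endian, as in the spec's notation */
typedef uint32_t le32;
typedef uint64_t le64;

struct virtio_pci_common_cfg {
	/* About the whole device. */
	le32 device_feature_select;	/* read-write */
	le32 device_feature;		/* read-only for driver */
	le32 driver_feature_select;	/* read-write */
	le32 driver_feature;		/* read-write */
	le16 msix_config;		/* read-write */
	le16 num_queues;		/* read-only for driver */
	u8 device_status;		/* read-write */
	u8 config_generation;		/* read-only for driver */

	/* About a specific virtqueue. */
	le16 queue_select;		/* read-write */
	le16 queue_size;		/* read-write */
	le16 queue_msix_vector;		/* read-write */
	le16 queue_enable;		/* read-write */
	le16 queue_notify_off;		/* read-only for driver */
	le64 queue_desc;		/* read-write */
	le64 queue_driver;		/* read-write */
	le64 queue_device;		/* read-write */
	/* virtio 1.2 already appended queue_notify_data and queue_reset
	 * here, so a Xen domid bolted on at the end would collide with
	 * exactly this kind of growth. */
};

/* Hypothetical: a feature bit signals Xen-grants support ... */
#define VIRTIO_F_XEN_GRANTS		41	/* made-up bit number */

/* ... and a separate "platform config" capability carries the data.
 * cfg_type 9 (VIRTIO_PCI_CAP_VENDOR_CFG) is the vendor data capability;
 * 10 is a made-up value for illustration. */
#define VIRTIO_PCI_CAP_PLATFORM_CFG	10

struct virtio_pci_platform_cfg {
	u8   cap_vndr;		/* PCI_CAP_ID_VNDR (0x09) */
	u8   cap_next;		/* next capability pointer */
	u8   cap_len;		/* capability length */
	u8   cfg_type;		/* VIRTIO_PCI_CAP_PLATFORM_CFG */
	le16 xen_domid;		/* backend domid for grant mappings */
	le16 padding;		/* pad to a multiple of 4 bytes */
};
```

The driver would then only look for the capability after having seen the
feature bit, leaving the common config layout untouched.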
>>>
>>>> Later we could even add grant-V3 support to Xen and to the virtio
>>>> IOMMU device (see my design session at last year's Xen Summit). This
>>>> could be usable for disaggregated KVM setups, too, so I believe there
>>>> is a chance to get this accepted.
>>>>
>>>>>>>>> **********
>>>>>>>>> What do you think about it? Are there any pitfalls, etc? This
>>>>>>>>> also requires system changes, but at least without virtio spec
>>>>>>>>> changes.
>>>>>>>>
>>>>>>>> Why are we so reluctant to add spec changes? I understand this
>>>>>>>> might take time and effort, but it's the only way IMO to build a
>>>>>>>> sustainable VirtIO Xen implementation. Did we already attempt to
>>>>>>>> negotiate Xen-related spec changes with Oasis, and were those
>>>>>>>> refused?
>>>>>>>
>>>>>>> That's because spec changes can be very slow. This is a bug that
>>>>>>> we need a relatively quick solution for, and waiting 12-24 months
>>>>>>> for a spec update is not realistic.
>>>>>>>
>>>>>>> I think a spec change would be best as a long term solution. We
>>>>>>> also need a short term solution. The short term solution doesn't
>>>>>>> have to be ideal, but it has to work now.
>>>>>>
>>>>>> My fear with such an approach is that once a bodge is in place
>>>>>> people move on to other stuff and this never gets properly fixed.
>>>>>>
>>>>>> I know this might not be a well received opinion, but it would be
>>>>>> better if such a bodge were kept in each interested party's patch
>>>>>> queue for the time being, until a proper solution is implemented.
>>>>>> That way the parties have an interest in properly fixing it
>>>>>> upstream.
>>>>>
>>>>> Unfortunately we are in the situation where we have an outstanding
>>>>> upstream bug, so we have to take action one way or the other.
>>>>
>>>> The required virtio IOMMU device modification would be rather small,
>>>> so adding it, maybe under a CONFIG option defaulting to off, might be
>>>> acceptable.
>>>
>>> Would you then do the grant allocation as part of the virtio IOMMU?
>>
>> Long term, maybe. Do you remember my Grant-V3 design session last year?
>> Being able to reuse the same layout for the virtio IOMMU was one of the
>> basic ideas for that layout (this would need some heavy work on the
>> virtio IOMMU frontend and backend, of course).
>
> While this might well be the best option, do we have anyone with the
> time and expertise to work on this? I might be wrong, but it seems like
> a huge task.

As a background project I'd like to pursue it. OTOH I'm not sure how much
time I could spend on it.


Juergen