On 07.07.23 16:42, Roger Pau Monné wrote:
> On Fri, Jul 07, 2023 at 04:10:14PM +0200, Juergen Gross wrote:
>> On 07.07.23 11:50, Roger Pau Monné wrote:
>>> On Fri, Jul 07, 2023 at 06:38:48AM +0200, Juergen Gross wrote:
>>>> On 06.07.23 23:49, Stefano Stabellini wrote:
>>>>> On Thu, 6 Jul 2023, Roger Pau Monné wrote:
>>>>>> On Wed, Jul 05, 2023 at 03:41:10PM -0700, Stefano Stabellini wrote:
>>>>>>> On Wed, 5 Jul 2023, Roger Pau Monné wrote:
>>>>>>>> On Tue, Jul 04, 2023 at 08:14:59PM +0300, Oleksandr Tyshchenko wrote:
>>>>>>>>> Part 2 (clarification):
>>>>>>>>>
>>>>>>>>> I think using a special config space register in the root complex
>>>>>>>>> would not be terrible in terms of guest changes, because it is
>>>>>>>>> easy to introduce a new root complex driver in Linux and other
>>>>>>>>> OSes. The root complex would still be ECAM compatible, so the
>>>>>>>>> regular ECAM driver would still work. A new driver would only be
>>>>>>>>> necessary if you want to access the special config space register.
>>>>>>>>
>>>>>>>> I'm slightly worried about this approach: we would end up modifying
>>>>>>>> a root complex emulation in order to avoid modifying a PCI device
>>>>>>>> emulation in QEMU. I'm not sure that's a good trade-off.
>>>>>>>>
>>>>>>>> Note also that different architectures will likely have different
>>>>>>>> root complexes, so you might need to modify several of them, and
>>>>>>>> then arrange the PCI layout correctly in order to have the proper
>>>>>>>> hierarchy, so that devices belonging to different driver domains
>>>>>>>> are assigned to different bridges.
>>>>>>>
>>>>>>> I do think that adding something to a PCI config register somewhere
>>>>>>> is the best option, because it is dependent neither on ACPI nor on
>>>>>>> xenstore, both of which are very undesirable.
>>>>>>>
>>>>>>> I am not sure where specifically is the best place. These are 4
>>>>>>> ideas we came up with:
>>>>>>> 1. PCI root complex
>>>>>>> 2. a register on the device itself
>>>>>>> 3. a new capability of the device
>>>>>>> 4. one extra dummy PCI device added for the sole purpose of
>>>>>>>    exposing the grants capability
>>>>>>>
>>>>>>> Looking at the spec, there is a way to add a vendor-specific
>>>>>>> capability (cap_vndr = 0x9). Could we use that? It doesn't look
>>>>>>> like it is used today; Linux doesn't parse it.
>>>>>>
>>>>>> I did wonder the same from a quick look at the spec. There's however
>>>>>> text in the specification that says:
>>>>>>
>>>>>> "The driver SHOULD NOT use the Vendor data capability except for
>>>>>> debugging and reporting purposes."
>>>>>>
>>>>>> So we would at least need to change that, because the capability
>>>>>> would then be used for purposes other than debugging and reporting.
>>>>>>
>>>>>> Seems like a minor adjustment, so it might be worth asking upstream
>>>>>> for their opinion and getting a conversation started.
>>>>>
>>>>> Wait, wouldn't this use-case fall under "reporting"? It is exactly
>>>>> what we are doing, right?
>>>>
>>>> I'd understand "reporting" as e.g. logging, transferring statistics,
>>>> ...
>>>>
>>>> We'd like to use it for configuration purposes.
>>>
>>> I've also read it that way.
>>>
>>>> Another idea would be to enhance the virtio IOMMU device to suit our
>>>> needs: we could add the domid as another virtio IOMMU device
>>>> capability and (for now) use bypass mode for all "productive" devices.
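Concretely, the config space could look something like the sketch below.
The existing fields match Linux's include/uapi/linux/virtio_iommu.h; the
VIRTIO_IOMMU_F_XEN_GRANTS feature bit and the xen_domid field are
hypothetical and would have to go through the virtio spec process first:

```c
/* A sketch only, not a spec proposal: the virtio-iommu config space with
 * a hypothetical Xen backend-domid field appended.  The existing fields
 * follow Linux's include/uapi/linux/virtio_iommu.h. */
#include <linux/types.h>

struct virtio_iommu_range_64 {
	__le64	start;
	__le64	end;
};

struct virtio_iommu_range_32 {
	__le32	start;
	__le32	end;
};

struct virtio_iommu_config {
	__le64	page_size_mask;			/* supported page sizes */
	struct virtio_iommu_range_64 input_range;  /* supported IOVA range */
	struct virtio_iommu_range_32 domain_range; /* domain ID range */
	__le32	probe_size;			/* probe buffer size */
	__u8	bypass;		/* (for now) "productive" devices stay in bypass */
	__u8	reserved[3];
	/* Hypothetical addition: only valid if the (made-up) feature bit
	 * VIRTIO_IOMMU_F_XEN_GRANTS has been negotiated. */
	__le16	xen_domid;	/* backend domid to grant pages to */
};
```

That would hand the guest the one piece of information it cannot
currently discover without ACPI or xenstore: which domid to issue
grants for.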
>>>
>>> If we have to start adding capabilities, won't it be easier to just
>>> add it to each device instead of adding it to the virtio IOMMU? Or is
>>> the parsing of capabilities device specific, meaning we would have to
>>> implement such parsing for each device? I would expect some
>>> capabilities to be shared between all devices, and a Xen capability
>>> could be one of those.
>>
>> Have a look at [1], which describes the common device config layout.
>> The problem here is that we'd need to add the domid after the queue
>> specific data, resulting in a mess if further queue fields were added
>> later.
>>
>> We could try that, of course.
>
> Right, we must make it part of the standard if we modify
> virtio_pci_common_cfg, or else newly added fields would overlap the
> Xen specific one.
>
> Would it be possible to signal Xen-grants support in the
> `device_feature` field, and then expose it from a vendor capability?
> IOW, would it be possible to add a Xen-specific hook in the parsing of
> virtio_pci_common_cfg that would then fetch additional data from a
> capability?

TBH, I don't know. It might require some changes in the central parsing
logic, but this shouldn't be too hard to do.

> That would likely be less intrusive than adding a new Xen-specific
> field to virtio_pci_common_cfg while still allowing us to do Xen
> specific configuration for all VirtIO devices.

In case we want to go that route, this should go into a new "platform
config" capability, which might be just another form of a vendor
capability.
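Roughly along these lines: struct virtio_pci_common_cfg is as laid out
in the virtio spec (1.1 layout shown), while the VIRTIO_F_XEN_GRANTS
feature bit, the VIRTIO_PCI_CAP_PLATFORM_CFG cfg_type and struct
virtio_pci_platform_cfg are made-up placeholders for illustration:

```c
/* A sketch only.  struct virtio_pci_common_cfg follows the virtio spec;
 * the feature bit, cfg_type value and platform config struct below are
 * hypothetical. */
#include <stdint.h>

typedef uint8_t  u8;
typedef uint16_t le16;	/* little-endian, as in the spec's notation */
typedef uint32_t le32;
typedef uint64_t le64;

struct virtio_pci_common_cfg {
	/* About the whole device. */
	le32 device_feature_select;	/* read-write */
	le32 device_feature;		/* read-only for driver */
	le32 driver_feature_select;	/* read-write */
	le32 driver_feature;		/* read-write */
	le16 msix_config;		/* read-write */
	le16 num_queues;		/* read-only for driver */
	u8 device_status;		/* read-write */
	u8 config_generation;		/* read-only for driver */

	/* About a specific virtqueue. */
	le16 queue_select;		/* read-write */
	le16 queue_size;		/* read-write */
	le16 queue_msix_vector;		/* read-write */
	le16 queue_enable;		/* read-write */
	le16 queue_notify_off;		/* read-only for driver */
	le64 queue_desc;		/* read-write */
	le64 queue_driver;		/* read-write */
	le64 queue_device;		/* read-write */
	/* virtio 1.2 already appended queue_notify_data and queue_reset
	 * here, so a Xen domid bolted on at the end would collide with
	 * exactly this kind of growth. */
};

/* Hypothetical: a feature bit signals Xen-grants support ... */
#define VIRTIO_F_XEN_GRANTS		41	/* made-up bit number */

/* ... and a separate "platform config" capability carries the data.
 * cfg_type 9 (VIRTIO_PCI_CAP_VENDOR_CFG) is the vendor data capability;
 * 10 is a made-up value for illustration. */
#define VIRTIO_PCI_CAP_PLATFORM_CFG	10

struct virtio_pci_platform_cfg {
	u8   cap_vndr;		/* PCI_CAP_ID_VNDR (0x09) */
	u8   cap_next;		/* next capability pointer */
	u8   cap_len;		/* capability length */
	u8   cfg_type;		/* VIRTIO_PCI_CAP_PLATFORM_CFG */
	le16 xen_domid;		/* backend domid for grant mappings */
	le16 padding;		/* pad to a multiple of 4 bytes */
};
```

The driver would then only look for the capability after having seen the
feature bit, leaving the common config layout untouched.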
>>>
>>>> Later we could even add grant-V3 support to Xen and to the virtio
>>>> IOMMU device (see my design session at last year's Xen Summit). This
>>>> could be usable for disaggregated KVM setups, too, so I believe there
>>>> is a chance to get this accepted.
>>>>
>>>>>>>>> **********
>>>>>>>>> What do you think about it? Are there any pitfalls, etc? This
>>>>>>>>> also requires system changes, but at least without virtio spec
>>>>>>>>> changes.
>>>>>>>>
>>>>>>>> Why are we so reluctant to add spec changes? I understand this
>>>>>>>> might take time and effort, but it's the only way IMO to build a
>>>>>>>> sustainable VirtIO Xen implementation. Did we already attempt to
>>>>>>>> negotiate Xen-related spec changes with Oasis, and were those
>>>>>>>> refused?
>>>>>>>
>>>>>>> That's because spec changes can be very slow. This is a bug that
>>>>>>> we need a relatively quick solution for, and waiting 12-24 months
>>>>>>> for a spec update is not realistic.
>>>>>>>
>>>>>>> I think a spec change would be best as a long term solution. We
>>>>>>> also need a short term solution. The short term solution doesn't
>>>>>>> have to be ideal, but it has to work now.
>>>>>>
>>>>>> My fear with such an approach is that once a bodge is in place
>>>>>> people move on to other stuff and this never gets properly fixed.
>>>>>>
>>>>>> I know this might not be a well received opinion, but it would be
>>>>>> better if such a bodge were kept in each interested party's patch
>>>>>> queue for the time being, until a proper solution is implemented.
>>>>>> That way the parties have an interest in properly fixing it
>>>>>> upstream.
>>>>>
>>>>> Unfortunately we are in the situation where we have an outstanding
>>>>> upstream bug, so we have to take action one way or the other.
>>>>
>>>> The required virtio IOMMU device modification would be rather small,
>>>> so adding it, maybe under a CONFIG option defaulting to off, might be
>>>> acceptable.
>>>
>>> Would you then do the grant allocation as part of the virtio IOMMU?
>>
>> Long term, maybe. Do you remember my Grant-V3 design session last year?
>> Being able to reuse the same layout for the virtio IOMMU was one of the
>> basic ideas for that layout (this would need some heavy work on the
>> virtio IOMMU frontend and backend, of course).
>
> While this might well be the best option, do we have anyone with the
> time and expertise to work on this? I might be wrong, but it seems like
> a huge task.

As a background project I'd like to pursue it. OTOH I'm not sure how much
time I could spend on it.


Juergen