xen-devel.lists.xenproject.org archive mirror
* Design session "MSI-X support with Linux stubdomain" notes
@ 2022-09-22 16:05 Anthony PERARD
  2022-09-22 18:00 ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Anthony PERARD @ 2022-09-22 16:05 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, Jan Beulich, George Dunlap,
	Roger Pau Monné

WARNING: Notes missing at the beginning of the meeting.

session description:
> Currently an HVM guest with PCI passthrough and a QEMU Linux stubdomain doesn’t
> support MSI-X. For the device to (partially) work, QEMU needs a patch masking
> MSI-X from the PCI config space. Some drivers are not happy about that, which
> is understandable (the device natively supports MSI-X, so the fallback paths are
> rarely tested).
>
> This is mostly (?) about qemu accessing /dev/mem directly (here:
> https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_msi.c#L579) - let's
> discuss alternative interfaces that the stubdomain could use.



When QEMU forwards an interrupt:
    to get the mask bit right, it reads the physical mask bit.
    A hypercall would make sense here.
    -> benefit: the mask bit in hardware would be what both the hypervisor and the device model want.
    From the guest's point of view, the interrupt should appear unmasked.

Accesses are first forwarded to QEMU, so Xen has to do some post-processing once the request comes back from QEMU.
    This is an odd arrangement.

Someone should have a look and rationalize this odd path.

Xen tries not to forward everything to QEMU.

Why don't we do that in Xen?
    There's already code in Xen for that.

Issue: having QEMU open /dev/mem from within a stubdomain doesn't work.

We could look at removing the need for /dev/mem by improving support for QEMU deprivileging (qemu-depriv).

The hypervisor configuration interface was intended for a single domain; having a stubdomain in
the middle makes things difficult.

See QEMU's code
    https://github.com/qemu/qemu/blob/master/hw/xen/xen_pt_msi.c#L579
        fd = open("/dev/mem", O_RDWR);

TODO:
Step 1: find out why QEMU wants that mask.
Step 2: identify what is missing in the PV interface.

QEMU uses this to read the Pending Bit Array (PBA) and to read entries in the MSI-X table.
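For reference, the PBA layout QEMU has to read here is fixed by the PCI specification: one pending bit per vector, packed into an array of 64-bit qwords. A minimal sketch of the lookup (the function name is illustrative, not QEMU's actual code):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pending bit for MSI-X entry n lives in bit (n % 64) of qword (n / 64)
 * of the PBA, per the PCI spec. `pba` points at the start of the array,
 * as QEMU would see it after mapping the BAR region.
 */
static bool msix_pba_pending(const uint64_t *pba, unsigned int entry)
{
    return (pba[entry / 64] >> (entry % 64)) & 1;
}
```

Whatever interface replaces the /dev/mem mapping (hypercall or in-Xen fixup) only needs to produce these qwords for the guest's visible range.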

The comment at L465 (of xen_pt_msi.c) doesn't make sense.

Xen could do more fixup.

Passing the value straight from hardware?
    We can't pass the vector to the guest;
    Xen overwrites the mask bit (or something along those lines).

Did MSI-X work with qemu-trad in a stubdomain?
    No one in the room could remember.

MSI-X is required for PCI Express, but that doesn't mean it's always implemented correctly.

TODO:
- get rid of opening /dev/mem in QEMU


Cheers,

-- 
Anthony PERARD


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-22 16:05 Design session "MSI-X support with Linux stubdomain" notes Anthony PERARD
@ 2022-09-22 18:00 ` Jan Beulich
  2022-09-26 12:43   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-09-22 18:00 UTC (permalink / raw)
  To: xen-devel
  Cc: Marek Marczykowski-Górecki, George Dunlap,
	Roger Pau Monné,
	Anthony PERARD

On 22.09.2022 18:05, Anthony PERARD wrote:
> [...]
> Why don't we do that in Xen?
>     There's already code in Xen for that.

So what I didn't pay enough attention to when talking was that the
completion logic in Xen is for writes only. Maybe something similar
can be had for reads as well, but that's to be checked ...

Jan





* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-22 18:00 ` Jan Beulich
@ 2022-09-26 12:43   ` Marek Marczykowski-Górecki
  2022-09-26 12:47     ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Marczykowski-Górecki @ 2022-09-26 12:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, George Dunlap, Roger Pau Monné, Anthony PERARD

On Thu, Sep 22, 2022 at 08:00:00PM +0200, Jan Beulich wrote:
> [...]
> 
> So what I didn't pay enough attention to when talking was that the
> completion logic in Xen is for writes only. Maybe something similar
> can be had for reads as well, but that's to be checked ...

I spent some time trying to follow that part of QEMU, and I think it
reads vector control only on the write path, to keep some bits
unchanged and also to detect whether Xen masked the entry behind QEMU's back.
My understanding is that since 484d7c852e "x86/MSI-X: track host and guest
mask-all requests separately" this is unnecessary, because Xen will
remember the guest's intention, so QEMU can simply use its own internal
state and act on that (guest writes go through QEMU, so it should
have an up-to-date view of the guest's state).
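A sketch of what that could look like: instead of re-reading hardware vector control to preserve bits, the device model keeps a cached copy and lets the guest control only the mask bit (the function name is illustrative; bit 0 of Vector Control is the spec-defined per-entry mask bit):

```c
#include <stdint.h>

#define MSIX_VECTOR_CTRL_MASKBIT 0x1u  /* bit 0 of Vector Control (PCI spec) */

/*
 * Merge a guest write to Vector Control with the device model's cached
 * value: only the mask bit is taken from the guest; all other bits are
 * kept from internal state instead of being re-read from hardware.
 */
static uint32_t merge_vector_ctrl(uint32_t cached, uint32_t guest_write)
{
    return (cached & ~MSIX_VECTOR_CTRL_MASKBIT) |
           (guest_write & MSIX_VECTOR_CTRL_MASKBIT);
}
```

This is only a sketch of the idea above, under the assumption that the cached value already reflects every earlier guest write.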

As for PBA access, it is read by QEMU only to pass the value to the guest. I'm
not sure whether QEMU should use a hypercall to retrieve it, or whether
Xen should fix up the value itself on the read path.

I did a preliminary patch here:
https://github.com/marmarek/qubes-vmm-xen-stubdom-linux/commit/80cf769f3659aa0d7f2b5598bf862d83da28807e

but it does not work yet. It seems I haven't undone the MSI-X hiding enough
(lspci inside the guest doesn't report MSI-X at all). I will figure this
out, but I'd appreciate comments on how best to handle the PBA.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-26 12:43   ` Marek Marczykowski-Górecki
@ 2022-09-26 12:47     ` Jan Beulich
  2022-09-29 10:57       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-09-26 12:47 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, George Dunlap, Roger Pau Monné, Anthony PERARD

On 26.09.2022 14:43, Marek Marczykowski-Górecki wrote:
> [...]
> As for PBA access, it is read by qemu only to pass it to the guest. I'm
> not sure whether qemu should use hypercall to retrieve it, or maybe
> Xen should fixup value itself on the read path.

Forwarding the access to qemu just for qemu to use a hypercall to obtain
the value needed seems backwards to me. If we need new code in Xen, we
might as well handle the read directly, I think, without involving qemu.

Jan


* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-26 12:47     ` Jan Beulich
@ 2022-09-29 10:57       ` Marek Marczykowski-Górecki
  2022-09-29 11:44         ` Jan Beulich
  0 siblings, 1 reply; 8+ messages in thread
From: Marek Marczykowski-Górecki @ 2022-09-29 10:57 UTC (permalink / raw)
  To: Jan Beulich
  Cc: xen-devel, George Dunlap, Roger Pau Monné, Anthony PERARD

On Mon, Sep 26, 2022 at 02:47:55PM +0200, Jan Beulich wrote:
> [...]
> 
> Forwarding the access to qemu just for qemu to use a hypercall to obtain
> the value needed seems backwards to me. If we need new code in Xen, we
> can as well handle the read directly I think, without involving qemu.

I'm not sure I fully follow what QEMU does here, but I think the
reason for this handling is that the PBA can (and often does) live on the same
page as the actual MSI-X table. I'm trying to adjust QEMU to not
intercept this read, but at this point I'm not yet sure whether that's even
possible at sub-page granularity.
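The overlap described above can be checked from the MSI-X capability fields. A small sketch (assuming 4K pages; the table and PBA offsets are the BAR-relative values from the capability, and each table entry is 16 bytes per the PCI spec):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K   4096u
#define MSIX_ENTRY_SZ  16u   /* bytes per MSI-X table entry (PCI spec) */

/*
 * True if the PBA starts on a page also occupied by the MSI-X table,
 * i.e. the two regions cannot be split at page granularity and any
 * page-based intercept necessarily covers both.
 */
static bool pba_shares_page_with_table(uint32_t table_off,
                                       unsigned int nr_entries,
                                       uint32_t pba_off)
{
    uint32_t table_end = table_off + nr_entries * MSIX_ENTRY_SZ - 1;

    return (table_end / PAGE_SIZE_4K) == (pba_off / PAGE_SIZE_4K);
}
```

When this returns true, a page-granular "don't intercept the PBA" approach cannot work, which is presumably why QEMU handles the PBA read itself.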

But, to move forward with the PoC/debugging, I hardwired the PBA read to
0xFFFFFFFF, and it seems that doesn't work. My observation is that the
interrupt handler in the Linux driver isn't called. There are several moving
parts (it could very well be a bug in the driver, or in some other part of the
VM). Is there some place in Xen where I can see whether an interrupt gets
delivered to the guest (some function I can add a debug print to), or is it
delivered directly to the guest?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-29 10:57       ` Marek Marczykowski-Górecki
@ 2022-09-29 11:44         ` Jan Beulich
  2022-09-29 11:52           ` Roger Pau Monné
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2022-09-29 11:44 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki
  Cc: xen-devel, George Dunlap, Roger Pau Monné, Anthony PERARD

On 29.09.2022 12:57, Marek Marczykowski-Górecki wrote:
> [...]
> 
> I'm not sure if I fully follow what qemu does here, but I think the
> reason for such handling is that PBA can (and often do) live on the same
> page as the actual MSI-X table. I'm trying to adjust qemu to not
> intercept this read, but at this point I'm not yet sure of that's even
> possible on sub-page granularity.
> 
> But, to go forward with PoC/debugging, I hardwired PBA read to
> 0xFFFFFFFF, and it seems it doesn't work. My observation is that the
> handler in the Linux driver isn't called. There are several moving
> part (it could very well be bug in the driver, or some other part in the
> VM). Is there some place in Xen I can see if an interrupt gets delivered
> to the guest (some function I can add debug print to), or is it
> delivered directly to the guest?

I guess "iommu=no-intpost" would suppress "direct" delivery (if the hardware
is capable of that in the first place). And wait - this option actually
defaults to off.

As to software delivery - I guess you would want to start from
do_IRQ_guest() and then see where things get lost. (Adding logging to
such a path of course has a fair risk of ending up overly chatty.)

Jan



* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-29 11:44         ` Jan Beulich
@ 2022-09-29 11:52           ` Roger Pau Monné
  2022-09-29 12:48             ` Juergen Gross
  0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2022-09-29 11:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Marek Marczykowski-Górecki, xen-devel, George Dunlap,
	Anthony PERARD

On Thu, Sep 29, 2022 at 01:44:28PM +0200, Jan Beulich wrote:
> [...]
> 
> As to software delivery - I guess you would want to start from
> do_IRQ_guest() and then see where things get lost. (Adding logging to
> such a path of course has a fair risk of ending up overly chatty.)

Having dealt with interrupt issues before: try to limit logging to only the
IRQ you are interested in. Using xentrace might be a better
option depending on what you need to debug, albeit it's kind of a pain
to add new trace points, as you also need to modify xenalyze to print
them.
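A minimal sketch of the kind of gating suggested here (the helper and variable are hypothetical, not an existing Xen interface): keep a single IRQ filter so a debug print on a hot path stays quiet for everything else:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * IRQ to trace; -1 means trace nothing. In a real patch this would be
 * set from a command-line option or a debug key.
 */
static int trace_irq = -1;

static bool irq_traced(int irq)
{
    return trace_irq >= 0 && irq == trace_irq;
}

/* Print only for the IRQ under investigation. */
#define irq_dbg(irq, fmt, ...)                                      \
    do {                                                            \
        if ( irq_traced(irq) )                                      \
            printf("irq %d: " fmt "\n", (irq), ##__VA_ARGS__);      \
    } while ( 0 )
```

The same predicate could gate a debugtrace_printk() call instead of a console print.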

Roger.



* Re: Design session "MSI-X support with Linux stubdomain" notes
  2022-09-29 11:52           ` Roger Pau Monné
@ 2022-09-29 12:48             ` Juergen Gross
  0 siblings, 0 replies; 8+ messages in thread
From: Juergen Gross @ 2022-09-29 12:48 UTC (permalink / raw)
  To: Roger Pau Monné, Jan Beulich
  Cc: Marek Marczykowski-Górecki, xen-devel, George Dunlap,
	Anthony PERARD


On 29.09.22 13:52, Roger Pau Monné wrote:
> [...]
> 
> Having dealt with interrupt issues before, try to limit logging to the
> IRQ you are interested on only - using xentrace might be a better
> option depending on what you need to debug, albeit it's kind of a pain
> to add new trace points as you also need to modify xenalyze to print
> them.

Did you consider using debugtrace_printk()?


Juergen

