[Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible

* [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
@ 2020-02-03  8:33 Sander Eikelenboom
  2020-02-03 12:23 ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-03  8:33 UTC (permalink / raw)
  To: Roger Pau Monné, xen-devel

Hi Roger,

Last week I encountered an issue with the PCI-passthrough of a USB controller. 
In the guest I get:
    [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
    [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
    [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
    [ 1143.356407] usb 1-2: USB disconnect, device number 2

Bisection identified the following commit as the culprit:
   commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
   x86/smp: use APIC ALLBUT destination shorthand when possible

I verified by reverting that commit and now it works fine again.

The box is AMD, the guest is an HVM guest.

--
Sander

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03  8:33 [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible Sander Eikelenboom
@ 2020-02-03 12:23 ` Roger Pau Monné
  2020-02-03 12:30   ` Sander Eikelenboom
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-03 12:23 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
> Hi Roger,
> 
> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
> In the guest I get:
>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
> 
> Bisection turned up as the culprit: 
>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>    x86/smp: use APIC ALLBUT destination shorthand when possible

Sorry to hear that, let's see if we can figure out what's wrong.

> I verified by reverting that commit and now it works fine again.

Does the same controller work fine when used in dom0?

Thanks, Roger.


* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 12:23 ` Roger Pau Monné
@ 2020-02-03 12:30   ` Sander Eikelenboom
  2020-02-03 12:41     ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-03 12:30 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 03/02/2020 13:23, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>> Hi Roger,
>>
>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
>> In the guest I get:
>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>
>> Bisection turned up as the culprit: 
>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>    x86/smp: use APIC ALLBUT destination shorthand when possible
> 
> Sorry to hear that, let see if we can figure out what's wrong.

No problem, that is why I test stuff :)

>> I verified by reverting that commit and now it works fine again.
> 
> Does the same controller work fine when used in dom0?

Will test that, but as all other pci devices in dom0 work fine,
I assume this controller would also work fine in dom0 (as it has also
worked fine for ages with PCI-passthrough to that guest and still works
fine when reverting the referenced commit).

I don't know whether your change could somehow have a side effect
on latency around the processing of pci-passthrough?
(A driver concluding that a device is non-responsive is probably
at least somewhat latency-sensitive.)

--
Sander

> Thanks, Roger.
> 



* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 12:30   ` Sander Eikelenboom
@ 2020-02-03 12:41     ` Roger Pau Monné
  2020-02-03 12:44       ` Sander Eikelenboom
  2020-02-03 12:49       ` Jan Beulich
  0 siblings, 2 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-03 12:41 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
> On 03/02/2020 13:23, Roger Pau Monné wrote:
> > On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
> >> Hi Roger,
> >>
> >> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
> >> In the guest I get:
> >>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
> >>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
> >>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
> >>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
> >>
> >> Bisection turned up as the culprit: 
> >>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> >>    x86/smp: use APIC ALLBUT destination shorthand when possible
> > 
> > Sorry to hear that, let see if we can figure out what's wrong.
> 
> No problem, that is why I test stuff :)
> 
> >> I verified by reverting that commit and now it works fine again.
> > 
> > Does the same controller work fine when used in dom0?
> 
> Will test that, but as all other pci devices in dom0 work fine,
> I assume this controller would also work fine in dom0 (as it has also
> worked fine for ages with PCI-passthrough to that guest and still works
> fine when reverting the referenced commit).

Is this the only device that fails to work when doing pci-passthrough,
or do other devices also fail with the mentioned change applied?

Have you tested on other boxes?

> I don't know if your change can somehow have a side effect
> on latency around the processing of pci-passthrough ?

Hm, the mentioned commit should speed up broadcast IPIs, but I don't
see how it could slow down other interrupts. Also I would think the
domain is not receiving interrupts from the device, rather than
interrupts being slow.

Can you also paste the output of lspci -v for that xHCI device from
dom0?

Thanks, Roger.
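
As background for the reply above, the idea behind the ALLBUT destination shorthand can be sketched roughly as follows. This is a simplified illustration, not the Xen implementation: `mask_t`, `choose_ipi_mode` and the `IPI_*` names are hypothetical, CPUs are modelled as bits in a 64-bit word, and the additional conditions the real code has to check (for example that no CPU hotplug is in progress, as the code comment in the patch quoted later in this thread mentions) are left out. The shorthand sends a single IPI that reaches every CPU except the sender, so it only fits when the requested remote targets are exactly the online CPUs minus the sender.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t mask_t;

/* Hypothetical delivery modes, used only in this sketch. */
enum ipi_mode { IPI_NONE, IPI_ALLBUT, IPI_PER_TARGET };

/*
 * Pick a delivery mode for the remote part of `targets`.
 * Any request for the sender itself is ignored here; a real
 * implementation would handle it separately.
 */
static enum ipi_mode choose_ipi_mode(mask_t targets, mask_t online,
                                     unsigned int self)
{
    assert(self < 64);

    mask_t self_bit = (mask_t)1 << self;
    mask_t others = online & ~self_bit;  /* every online CPU but the sender */
    mask_t remote = targets & others;    /* requested CPUs other than the sender */

    if (remote == 0)
        return IPI_NONE;        /* nothing to send remotely */
    if (remote == others)
        return IPI_ALLBUT;      /* one broadcast covers all targets */
    return IPI_PER_TARGET;      /* fall back to addressing targets individually */
}
```

With four online CPUs (mask `0xF`) and CPU 0 sending, a target mask of `0xE` would select the shorthand, while `0x6` would fall back to per-target delivery.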


* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 12:41     ` Roger Pau Monné
@ 2020-02-03 12:44       ` Sander Eikelenboom
  2020-02-03 13:21         ` Roger Pau Monné
  2020-02-03 12:49       ` Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-03 12:44 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 03/02/2020 13:41, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>> Hi Roger,
>>>>
>>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
>>>> In the guest I get:
>>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>
>>>> Bisection turned up as the culprit: 
>>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Sorry to hear that, let see if we can figure out what's wrong.
>>
>> No problem, that is why I test stuff :)
>>
>>>> I verified by reverting that commit and now it works fine again.
>>>
>>> Does the same controller work fine when used in dom0?
>>
>> Will test that, but as all other pci devices in dom0 work fine,
>> I assume this controller would also work fine in dom0 (as it has also
>> worked fine for ages with PCI-passthrough to that guest and still works
>> fine when reverting the referenced commit).
> 
> Is this the only device that fails to work when doing pci-passthrough,
> or other devices also don't work with the mentioned change applied?
> 
> Have you tested on other boxes?
> 
>> I don't know if your change can somehow have a side effect
>> on latency around the processing of pci-passthrough ?
> 
> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
> see how it could slow down other interrupts. Also I would think the
> domain is not receiving interrupts from the device, rather than
> interrupts being slow.
> 
> Can you also paste the output of lspci -v for that xHCI device from
> dom0?
> 
> Thanks, Roger.

Will do this evening, including the testing in dom0.
I will also check whether any pattern shows up when observing
/proc/interrupts in the guest.

Thanks,

Sander
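
One way to watch /proc/interrupts in the guest for the passed-through controller is to filter it for the driver name. The sketch below is only an illustration: the helper names `line_matches` and `print_matching` are hypothetical, and it assumes the guest's /proc/interrupts lists the controller's interrupt lines under a name containing "xhci", which the thread does not confirm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `line` contains `needle`, 0 otherwise. */
static int line_matches(const char *line, const char *needle)
{
    assert(line != NULL && needle != NULL);
    return strstr(line, needle) != NULL;
}

/*
 * Copy every line of `in` that contains `needle` to `out` and return
 * how many lines matched.  Lines longer than the buffer are read in
 * several pieces, each piece being checked on its own.
 */
static int print_matching(FILE *in, const char *needle, FILE *out)
{
    char line[4096];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line_matches(line, needle)) {
            fputs(line, out);
            count++;
        }
    }
    return count;
}
```

Calling `print_matching` on a stream opened with `fopen("/proc/interrupts", "r")`, with `"xhci"` as the needle and `stdout` as the output, a few seconds apart would show whether the per-CPU counters for the device are still increasing.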



* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 12:41     ` Roger Pau Monné
  2020-02-03 12:44       ` Sander Eikelenboom
@ 2020-02-03 12:49       ` Jan Beulich
  1 sibling, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2020-02-03 12:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Sander Eikelenboom, xen-devel

On 03.02.2020 13:41, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>> Hi Roger,
>>>>
>>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
>>>> In the guest I get:
>>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>
>>>> Bisection turned up as the culprit: 
>>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
>>>
>>> Sorry to hear that, let see if we can figure out what's wrong.
>>
>> No problem, that is why I test stuff :)
>>
>>>> I verified by reverting that commit and now it works fine again.
>>>
>>> Does the same controller work fine when used in dom0?
>>
>> Will test that, but as all other pci devices in dom0 work fine,
>> I assume this controller would also work fine in dom0 (as it has also
>> worked fine for ages with PCI-passthrough to that guest and still works
>> fine when reverting the referenced commit).
> 
> Is this the only device that fails to work when doing pci-passthrough,
> or other devices also don't work with the mentioned change applied?
> 
> Have you tested on other boxes?
> 
>> I don't know if your change can somehow have a side effect
>> on latency around the processing of pci-passthrough ?
> 
> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
> see how it could slow down other interrupts. Also I would think the
> domain is not receiving interrupts from the device, rather than
> interrupts being slow.
> 
> Can you also paste the output of lspci -v for that xHCI device from
> dom0?

If this is AMD hardware, then another thing to try, just to get an
additional data point, would be limiting the number of CPUs used
("maxcpus="), as that ought to suppress the actual sending of ALLBUT IPIs.

Jan


* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 12:44       ` Sander Eikelenboom
@ 2020-02-03 13:21         ` Roger Pau Monné
  2020-02-05 10:23           ` Roger Pau Monné
  2020-02-10 20:49           ` Sander Eikelenboom
  0 siblings, 2 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-03 13:21 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
> On 03/02/2020 13:41, Roger Pau Monné wrote:
> > On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
> >> On 03/02/2020 13:23, Roger Pau Monné wrote:
> >>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
> >>>> Hi Roger,
> >>>>
> >>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
> >>>> In the guest I get:
> >>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
> >>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
> >>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
> >>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
> >>>>
> >>>> Bisection turned up as the culprit: 
> >>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> >>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
> >>>
> >>> Sorry to hear that, let see if we can figure out what's wrong.
> >>
> >> No problem, that is why I test stuff :)
> >>
> >>>> I verified by reverting that commit and now it works fine again.
> >>>
> >>> Does the same controller work fine when used in dom0?
> >>
> >> Will test that, but as all other pci devices in dom0 work fine,
> >> I assume this controller would also work fine in dom0 (as it has also
> >> worked fine for ages with PCI-passthrough to that guest and still works
> >> fine when reverting the referenced commit).
> > 
> > Is this the only device that fails to work when doing pci-passthrough,
> > or other devices also don't work with the mentioned change applied?
> > 
> > Have you tested on other boxes?
> > 
> >> I don't know if your change can somehow have a side effect
> >> on latency around the processing of pci-passthrough ?
> > 
> > Hm, the mentioned commit should speed up broadcast IPIs, but I don't
> > see how it could slow down other interrupts. Also I would think the
> > domain is not receiving interrupts from the device, rather than
> > interrupts being slow.
> > 
> > Can you also paste the output of lspci -v for that xHCI device from
> > dom0?
> > 
> > Thanks, Roger.
> 
> Will do this evening including the testing in dom0 etc.
> Will also see if there is any pattern when observing /proc/interrupts in
> the guest.

Thanks! I also have a trivial patch that I would like you to try,
just to rule out send_IPI_mask clobbering the scratch_cpumask under
another function's feet.

Roger.
---
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 65eb7cbda8..aeeb506155 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
+    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
 
     /*
      * This can only be safely used when no CPU hotplug or unplug operations



* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 13:21         ` Roger Pau Monné
@ 2020-02-05 10:23           ` Roger Pau Monné
  2020-02-05 11:03             ` Sander Eikelenboom
  2020-02-10 20:49           ` Sander Eikelenboom
  1 sibling, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-05 10:23 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

Ping?

On Mon, Feb 03, 2020 at 02:21:08PM +0100, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
> > On 03/02/2020 13:41, Roger Pau Monné wrote:
> > > On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
> > >> On 03/02/2020 13:23, Roger Pau Monné wrote:
> > >>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
> > >>>> Hi Roger,
> > >>>>
> > >>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
> > >>>> In the guest I get:
> > >>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
> > >>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
> > >>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
> > >>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
> > >>>>
> > >>>> Bisection turned up as the culprit: 
> > >>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
> > >>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
> > >>>
> > >>> Sorry to hear that, let see if we can figure out what's wrong.
> > >>
> > >> No problem, that is why I test stuff :)
> > >>
> > >>>> I verified by reverting that commit and now it works fine again.
> > >>>
> > >>> Does the same controller work fine when used in dom0?
> > >>
> > >> Will test that, but as all other pci devices in dom0 work fine,
> > >> I assume this controller would also work fine in dom0 (as it has also
> > >> worked fine for ages with PCI-passthrough to that guest and still works
> > >> fine when reverting the referenced commit).
> > > 
> > > Is this the only device that fails to work when doing pci-passthrough,
> > > or other devices also don't work with the mentioned change applied?
> > > 
> > > Have you tested on other boxes?
> > > 
> > >> I don't know if your change can somehow have a side effect
> > >> on latency around the processing of pci-passthrough ?
> > > 
> > > Hm, the mentioned commit should speed up broadcast IPIs, but I don't
> > > see how it could slow down other interrupts. Also I would think the
> > > domain is not receiving interrupts from the device, rather than
> > > interrupts being slow.
> > > 
> > > Can you also paste the output of lspci -v for that xHCI device from
> > > dom0?
> > > 
> > > Thanks, Roger.
> > 
> > Will do this evening including the testing in dom0 etc.
> > Will also see if there is any pattern when observing /proc/interrupts in
> > the guest.
> 
> Thanks! I also have some trivial patch that I would like you to try,
> just to discard send_IPI_mask clearing the scratch_cpumask under
> another function feet.
> 
> Roger.
> ---
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..aeeb506155 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
> +    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
>  
>      /*
>       * This can only be safely used when no CPU hotplug or unplug operations
> 
> 

* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-05 10:23           ` Roger Pau Monné
@ 2020-02-05 11:03             ` Sander Eikelenboom
  2020-02-05 11:18               ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-05 11:03 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

Hi Roger,

Sorry, I haven't been able to follow up on testing yet.
(I have a longer-running task for which I need some services on the box,
and testing requires rebooting.)
It could be tomorrow, but it could also be this weekend before I get around to
testing and reporting back.

--
Sander


On 05/02/2020 11:23, Roger Pau Monné wrote:
> Ping?
> 
> On Mon, Feb 03, 2020 at 02:21:08PM +0100, Roger Pau Monné wrote:
>> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>>> Hi Roger,
>>>>>>>
>>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
>>>>>>> In the guest I get:
>>>>>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>>>>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>>>>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>>
>>>>>>> Bisection turned up as the culprit: 
>>>>>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>>
>>>>>> Sorry to hear that, let see if we can figure out what's wrong.
>>>>>
>>>>> No problem, that is why I test stuff :)
>>>>>
>>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>>
>>>>>> Does the same controller work fine when used in dom0?
>>>>>
>>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>>> I assume this controller would also work fine in dom0 (as it has also
>>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>>> fine when reverting the referenced commit).
>>>>
>>>> Is this the only device that fails to work when doing pci-passthrough,
>>>> or other devices also don't work with the mentioned change applied?
>>>>
>>>> Have you tested on other boxes?
>>>>
>>>>> I don't know if your change can somehow have a side effect
>>>>> on latency around the processing of pci-passthrough ?
>>>>
>>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>>> see how it could slow down other interrupts. Also I would think the
>>>> domain is not receiving interrupts from the device, rather than
>>>> interrupts being slow.
>>>>
>>>> Can you also paste the output of lspci -v for that xHCI device from
>>>> dom0?
>>>>
>>>> Thanks, Roger.
>>>
>>> Will do this evening including the testing in dom0 etc.
>>> Will also see if there is any pattern when observing /proc/interrupts in
>>> the guest.
>>
>> Thanks! I also have some trivial patch that I would like you to try,
>> just to discard send_IPI_mask clearing the scratch_cpumask under
>> another function feet.
>>
>> Roger.
>> ---
>> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
>> index 65eb7cbda8..aeeb506155 100644
>> --- a/xen/arch/x86/smp.c
>> +++ b/xen/arch/x86/smp.c
>> @@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>>  void send_IPI_mask(const cpumask_t *mask, int vector)
>>  {
>>      bool cpus_locked = false;
>> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
>> +    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
>> +    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
>>  
>>      /*
>>       * This can only be safely used when no CPU hotplug or unplug operations
>>
>>

* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-05 11:03             ` Sander Eikelenboom
@ 2020-02-05 11:18               ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-05 11:18 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Wed, Feb 05, 2020 at 12:03:00PM +0100, Sander Eikelenboom wrote:
> Hi Roger,
> 
> Sorry, I  haven't been able to follow up on testing yet.
> (I have some longer running task for which I need some services on the box, 
> so testing and rebooting is needed.)
> Could be tomorrow, but could also be this weekend before I will come around to
> the testing and reporting back.

Ack, no problem. I just wanted to make sure this is not forgotten :).

Thanks, Roger.


* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-03 13:21         ` Roger Pau Monné
  2020-02-05 10:23           ` Roger Pau Monné
@ 2020-02-10 20:49           ` Sander Eikelenboom
  2020-02-11 14:00             ` Roger Pau Monné
  1 sibling, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-10 20:49 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 03/02/2020 14:21, Roger Pau Monné wrote:
> On Mon, Feb 03, 2020 at 01:44:06PM +0100, Sander Eikelenboom wrote:
>> On 03/02/2020 13:41, Roger Pau Monné wrote:
>>> On Mon, Feb 03, 2020 at 01:30:55PM +0100, Sander Eikelenboom wrote:
>>>> On 03/02/2020 13:23, Roger Pau Monné wrote:
>>>>> On Mon, Feb 03, 2020 at 09:33:51AM +0100, Sander Eikelenboom wrote:
>>>>>> Hi Roger,
>>>>>>
>>>>>> Last week I encountered an issue with the PCI-passthrough of a USB controller. 
>>>>>> In the guest I get:
>>>>>>     [ 1143.313756] xhci_hcd 0000:00:05.0: xHCI host not responding to stop endpoint command.
>>>>>>     [ 1143.334825] xhci_hcd 0000:00:05.0: xHCI host controller not responding, assume dead
>>>>>>     [ 1143.347364] xhci_hcd 0000:00:05.0: HC died; cleaning up
>>>>>>     [ 1143.356407] usb 1-2: USB disconnect, device number 2
>>>>>>
>>>>>> Bisection turned up as the culprit: 
>>>>>>    commit 5500d265a2a8fa63d60c08beb549de8ec82ff7a5
>>>>>>    x86/smp: use APIC ALLBUT destination shorthand when possible
>>>>>
>>>>> Sorry to hear that, let see if we can figure out what's wrong.
>>>>
>>>> No problem, that is why I test stuff :)
>>>>
>>>>>> I verified by reverting that commit and now it works fine again.
>>>>>
>>>>> Does the same controller work fine when used in dom0?
>>>>
>>>> Will test that, but as all other pci devices in dom0 work fine,
>>>> I assume this controller would also work fine in dom0 (as it has also
>>>> worked fine for ages with PCI-passthrough to that guest and still works
>>>> fine when reverting the referenced commit).
>>>
>>> Is this the only device that fails to work when doing pci-passthrough,
>>> or other devices also don't work with the mentioned change applied?
>>>
>>> Have you tested on other boxes?
>>>
>>>> I don't know if your change can somehow have a side effect
>>>> on latency around the processing of pci-passthrough ?
>>>
>>> Hm, the mentioned commit should speed up broadcast IPIs, but I don't
>>> see how it could slow down other interrupts. Also I would think the
>>> domain is not receiving interrupts from the device, rather than
>>> interrupts being slow.
>>>
>>> Can you also paste the output of lspci -v for that xHCI device from
>>> dom0?
>>>
>>> Thanks, Roger.
>>
>> Will do this evening including the testing in dom0 etc.
>> Will also see if there is any pattern when observing /proc/interrupts in
>> the guest.
> 
> Thanks! I also have some trivial patch that I would like you to try,
> just to discard send_IPI_mask clearing the scratch_cpumask under
> another function feet.
> 
> Roger.

Hi Roger,

Took a while, but I was able to run some tests now.

I also forgot a detail in the first report (probably still a bit tired from FOSDEM),
namely: the passed-through device works OK for a while before I get the kernel message.

I tested the patch and it looks like it makes the issue go away.
I ran with it for a day without problems, whereas without the patch (and without
reverting the commit) the device gives problems within a few hours.

lspci output from dom0 for this device is below.

--
Sander




lspci -vvvknn -s 08:00.0
08:00.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03) (prog-if 30 [XHCI])
	Subsystem: ASUSTeK Computer Inc. P8P67 Deluxe Motherboard [1043:8413]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 37
	NUMA node: 0
	Region 0: Memory at f9afe000 (64-bit, non-prefetchable) [size=8K]
	Capabilities: [50] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=0 offset=00001000
		PBA: BAR=0 offset=00001080
	Capabilities: [a0] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Capabilities: [140 v1] Device Serial Number ff-ff-ff-ff-ff-ff-ff-ff
	Capabilities: [150 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Kernel driver in use: pciback




> ---
> diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
> index 65eb7cbda8..aeeb506155 100644
> --- a/xen/arch/x86/smp.c
> +++ b/xen/arch/x86/smp.c
> @@ -66,7 +66,8 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
>  void send_IPI_mask(const cpumask_t *mask, int vector)
>  {
>      bool cpus_locked = false;
> -    cpumask_t *scratch = this_cpu(scratch_cpumask);
> +    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);
> +    cpumask_t *scratch = &this_cpu(send_ipi_cpumask);
>  
>      /*
>       * This can only be safely used when no CPU hotplug or unplug operations
> 
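
For readers following the thread, the hazard this test patch probes can be
sketched outside Xen. This is a minimal userspace model, not Xen code: mask_t,
scratch, ipi_private, model_send_ipi and model_caller are hypothetical names,
and a plain global stands in for the per-CPU scratch_cpumask. It only assumes
what the patch above suggests, namely that send_IPI_mask might write to the
scratch mask while a function further up the call chain still relies on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, up to 64 CPUs. */
typedef uint64_t mask_t;

/* Hypothetical stand-in for this_cpu(scratch_cpumask): one shared buffer. */
static mask_t scratch;

/* Hypothetical stand-in for a mask private to the IPI path. */
static mask_t ipi_private;

/*
 * Models an IPI sender that needs temporary storage. With use_private set
 * it writes to its own buffer (the idea of the test patch); otherwise it
 * silently reuses the shared scratch buffer.
 */
static void model_send_ipi(mask_t targets, bool use_private)
{
    mask_t *tmp = use_private ? &ipi_private : &scratch;

    *tmp = targets & ~(mask_t)1;   /* e.g. "all targets except CPU 0" */
    /* ... the IPI would be sent to the CPUs in *tmp here ... */
}

/*
 * Models a caller that builds a mask in the scratch buffer, triggers an IPI
 * as a side effect, and then keeps using the mask it built.
 */
static mask_t model_caller(mask_t wanted, bool use_private)
{
    mask_t *mask = &scratch;

    *mask = wanted;                      /* the caller's own data */
    model_send_ipi(0xf0, use_private);   /* nested user of temp storage */

    return *mask;                        /* clobbered unless use_private */
}
```

In this model, model_caller(0x3, true) returns 0x3, while
model_caller(0x3, false) returns 0xf0: the caller ends up acting on a mask it
never built. Whether this is what happens on the affected box is exactly what
the test patch is meant to find out.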



* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-10 20:49           ` Sander Eikelenboom
@ 2020-02-11 14:00             ` Roger Pau Monné
  2020-02-12  8:46               ` Sander Eikelenboom
  0 siblings, 1 reply; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-11 14:00 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Mon, Feb 10, 2020 at 09:49:30PM +0100, Sander Eikelenboom wrote:
> Hi Roger,
> 
> Took a while, but I was able to run some tests now.
> 
> I also forgot a detail in the first report (probably still a bit tired from FOSDEM):
> the passed-through device works OK for a while before I get the kernel message.
> 
> I tested the patch and it looks like it makes the issue go away. I tested for a
> day with it applied, whereas without the patch (or a revert of the commit) the
> device gives problems within a few hours.

Thanks, I have another patch for you to try, which will likely make
your system crash. Could you give it a try and paste the log output?

Thanks, Roger.
---8<---
commit 909880219efc4fe3c25536454d04f07bfe61e3b1
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Tue Feb 11 11:14:48 2020 +0100

    x86: add accessors for scratch cpu mask
    
    Current usage of the per-CPU scratch cpumask is dangerous since
    there's no way to figure out if the mask is already being used except
    for manual code inspection of all the callers and possible call paths.
    
    This is unsafe and not reliable, so introduce a minimal get/put
    infrastructure to prevent nested usage of the scratch mask.
    
    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..4ee261b632 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
     entry.vector = vector;
 
     if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..7ecf5376e3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
     trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
 
     if ( likely(!desc->arch.move_in_progress) )
+    {
+        put_scratch_cpumask();
         return;
+    }
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
@@ -236,6 +239,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
+    put_scratch_cpumask();
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1156,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask();
         return;
     }
 
@@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
     unsigned int irq;
     static int warned;
     struct irq_desc *desc;
+    cpumask_t *affinity = get_scratch_cpumask();
 
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = false, set_affinity = true;
         unsigned int vector;
-        cpumask_t *affinity = this_cpu(scratch_cpumask);
 
         if ( irq == 2 )
             continue;
@@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
                    irq, CPUMASK_PR(affinity));
     }
 
+    put_scratch_cpumask();
+
     /* That doesn't seem sufficient.  Give it 1ms. */
     local_irq_enable();
     mdelay(1);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9b33829084..bded19717b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1271,7 +1271,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             struct vcpu *v;
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             cpumask_clear(mask);
 
@@ -1288,6 +1288,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 
             if ( !cpumask_empty(mask) )
                 flush_tlb_mask(mask);
+            put_scratch_cpumask();
         }
 #endif /* CONFIG_PV_LDT_PAGING */
         put_page(page);
@@ -2912,7 +2913,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                  * vital that no other CPUs are left with mappings of a frame
                  * which is about to become writeable to the guest.
                  */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 BUG_ON(in_irq());
                 cpumask_copy(mask, d->dirty_cpumask);
@@ -2928,6 +2929,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                     perfc_incr(need_flush_tlb_flush);
                     flush_tlb_mask(mask);
                 }
+                put_scratch_cpumask();
 
                 /* We lose existing type and validity. */
                 nx &= ~(PGT_type_mask | PGT_validated);
@@ -3644,7 +3646,7 @@ long do_mmuext_op(
         case MMUEXT_TLB_FLUSH_MULTI:
         case MMUEXT_INVLPG_MULTI:
         {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             if ( unlikely(currd != pg_owner) )
                 rc = -EPERM;
@@ -3654,12 +3656,17 @@ long do_mmuext_op(
                                    mask)) )
                 rc = -EINVAL;
             if ( unlikely(rc) )
+            {
+                put_scratch_cpumask();
                 break;
+            }
 
             if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
                 flush_tlb_mask(mask);
             else if ( __addr_ok(op.arg1.linear_addr) )
                 flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            put_scratch_cpumask();
+
             break;
         }
 
@@ -3692,7 +3699,7 @@ long do_mmuext_op(
             else if ( likely(cache_flush_permitted(currd)) )
             {
                 unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 cpumask_clear(mask);
                 for_each_online_cpu(cpu)
@@ -3700,6 +3707,7 @@ long do_mmuext_op(
                                              per_cpu(cpu_sibling_mask, cpu)) )
                         __cpumask_set_cpu(cpu, mask);
                 flush_mask(mask, FLUSH_CACHE);
+                put_scratch_cpumask();
             }
             else
                 rc = -EINVAL;
@@ -4165,12 +4173,13 @@ long do_mmu_update(
          * Force other vCPU-s of the affected guest to pick up L4 entry
          * changes (if any).
          */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
+        cpumask_t *mask = get_scratch_cpumask();
 
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
+        cpumask_andnot(mask, pt_owner->dirty_cpumask,
+                       cpumask_of(smp_processor_id()));
         if ( !cpumask_empty(mask) )
             flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
+        put_scratch_cpumask();
     }
 
     perfc_add(num_page_updates, i);
@@ -4361,7 +4370,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4381,7 +4390,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4392,6 +4401,9 @@ static int __do_update_va_mapping(
         break;
     }
 
+    if ( mask && mask != d->dirty_cpumask )
+        put_scratch_cpumask();
+
     return rc;
 }
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index c85cf9f85a..1ec1cc51d3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
 
     if ( cpu_mask )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask;
 
         if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
             return;
 
+        mask = get_scratch_cpumask();
         cpumask_and(mask, cpu_mask, &cpu_online_map);
         msg->dest32 = cpu_mask_to_apicid(mask);
+        put_scratch_cpumask();
     }
 
     msg->address_hi = MSI_ADDR_BASE_HI;
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 1aa55d41e1..b994488d9f 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,21 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+static inline cpumask_t *scratch_cpumask(const char *fn)
+{
+    static DEFINE_PER_CPU(const char *, scratch_cpumask_use);
+
+    if ( fn && unlikely(this_cpu(scratch_cpumask_use)) )
+        panic("scratch CPU mask already in use by %s\n",
+              this_cpu(scratch_cpumask_use));
+    this_cpu(scratch_cpumask_use) = fn;
+
+    return fn ? this_cpu(scratch_cpumask) : NULL;
+}
+
+#define get_scratch_cpumask() scratch_cpumask(__func__)
+#define put_scratch_cpumask() ((void)scratch_cpumask(NULL))
+
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
  * would otherwise prefer them to be off?



* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-11 14:00             ` Roger Pau Monné
@ 2020-02-12  8:46               ` Sander Eikelenboom
  2020-02-12  9:10                 ` Roger Pau Monné
  0 siblings, 1 reply; 14+ messages in thread
From: Sander Eikelenboom @ 2020-02-12  8:46 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel

On 11/02/2020 15:00, Roger Pau Monné wrote:
> Thanks, I have another patch for you to try, which will likely make
> your system crash. Could you give it a try and paste the log output?
> 
> Thanks, Roger.

Applied the patch, rebuilt, rebooted and braced for impact ...
However, the device bugged out again, but no Xen panic occurred, so there is
nothing special in the logs.
I only had time to try it once, so I could retry this evening.

--
Sander





* Re: [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible
  2020-02-12  8:46               ` Sander Eikelenboom
@ 2020-02-12  9:10                 ` Roger Pau Monné
  0 siblings, 0 replies; 14+ messages in thread
From: Roger Pau Monné @ 2020-02-12  9:10 UTC (permalink / raw)
  To: Sander Eikelenboom; +Cc: xen-devel

On Wed, Feb 12, 2020 at 09:46:22AM +0100, Sander Eikelenboom wrote:
> On 11/02/2020 15:00, Roger Pau Monné wrote:
> > Thanks, I have another patch for you to try, which will likely make
> > your system crash. Could you give it a try and paste the log output?
> > 
> > Thanks, Roger.
> 
> Applied the patch, rebuilt, rebooted and braced for impact ...
> However, the device bugged out again, but no Xen panic occurred, so there is
> nothing special in the logs.
> I only had time to try it once, so I could retry this evening.

Sorry, that's my fault: I gave you a patch that was missing a chunk. The
following should hopefully trigger the panic. Would you mind trying again?

Thanks, Roger.
---8<---
commit 9bd7ee8fa836690087f3eef89d24aded0c8cd8ae
Author: Roger Pau Monne <roger.pau@citrix.com>
Date:   Tue Feb 11 11:14:48 2020 +0100

    x86: add accessors for scratch cpu mask
    
    Current usage of the per-CPU scratch cpumask is dangerous since
    there's no way to figure out if the mask is already being used except
    for manual code inspection of all the callers and possible call paths.
    
    This is unsafe and not reliable, so introduce a minimal get/put
    infrastructure to prevent nested usage of the scratch mask.
    
    Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..4ee261b632 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2236,10 +2236,11 @@ int io_apic_set_pci_routing (int ioapic, int pin, int irq, int edge_level, int a
     entry.vector = vector;
 
     if (cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS)) {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(entry, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     } else {
         printk(XENLOG_ERR "IRQ%d: no target CPU (%*pb vs %*pb)\n",
                irq, CPUMASK_PR(desc->arch.cpu_mask), CPUMASK_PR(TARGET_CPUS));
@@ -2433,10 +2434,11 @@ int ioapic_guest_write(unsigned long physbase, unsigned int reg, u32 val)
 
     if ( cpumask_intersects(desc->arch.cpu_mask, TARGET_CPUS) )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask = get_scratch_cpumask();
 
         cpumask_and(mask, desc->arch.cpu_mask, TARGET_CPUS);
         SET_DEST(rte, logical, cpu_mask_to_apicid(mask));
+        put_scratch_cpumask();
     }
     else
     {
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index cc2eb8e925..7ecf5376e3 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -196,7 +196,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
 {
     unsigned int cpu, old_vector, irq = desc->irq;
     unsigned int vector = desc->arch.vector;
-    cpumask_t *tmp_mask = this_cpu(scratch_cpumask);
+    cpumask_t *tmp_mask = get_scratch_cpumask();
 
     BUG_ON(!valid_irq_vector(vector));
 
@@ -223,7 +223,10 @@ static void _clear_irq_vector(struct irq_desc *desc)
     trace_irq_mask(TRC_HW_IRQ_CLEAR_VECTOR, irq, vector, tmp_mask);
 
     if ( likely(!desc->arch.move_in_progress) )
+    {
+        put_scratch_cpumask();
         return;
+    }
 
     /* If we were in motion, also clear desc->arch.old_vector */
     old_vector = desc->arch.old_vector;
@@ -236,6 +239,7 @@ static void _clear_irq_vector(struct irq_desc *desc)
         per_cpu(vector_irq, cpu)[old_vector] = ~irq;
     }
 
+    put_scratch_cpumask();
     release_old_vec(desc);
 
     desc->arch.move_in_progress = 0;
@@ -1152,10 +1156,11 @@ static void irq_guest_eoi_timer_fn(void *data)
         break;
 
     case ACKTYPE_EOI:
-        cpu_eoi_map = this_cpu(scratch_cpumask);
+        cpu_eoi_map = get_scratch_cpumask();
         cpumask_copy(cpu_eoi_map, action->cpu_eoi_map);
         spin_unlock_irq(&desc->lock);
         on_selected_cpus(cpu_eoi_map, set_eoi_ready, desc, 0);
+        put_scratch_cpumask();
         return;
     }
 
@@ -2531,12 +2536,12 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
     unsigned int irq;
     static int warned;
     struct irq_desc *desc;
+    cpumask_t *affinity = get_scratch_cpumask();
 
     for ( irq = 0; irq < nr_irqs; irq++ )
     {
         bool break_affinity = false, set_affinity = true;
         unsigned int vector;
-        cpumask_t *affinity = this_cpu(scratch_cpumask);
 
         if ( irq == 2 )
             continue;
@@ -2640,6 +2645,8 @@ void fixup_irqs(const cpumask_t *mask, bool verbose)
                    irq, CPUMASK_PR(affinity));
     }
 
+    put_scratch_cpumask();
+
     /* That doesn't seem sufficient.  Give it 1ms. */
     local_irq_enable();
     mdelay(1);
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9b33829084..bded19717b 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -1271,7 +1271,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
              (l1e_owner == pg_owner) )
         {
             struct vcpu *v;
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             cpumask_clear(mask);
 
@@ -1288,6 +1288,7 @@ void put_page_from_l1e(l1_pgentry_t l1e, struct domain *l1e_owner)
 
             if ( !cpumask_empty(mask) )
                 flush_tlb_mask(mask);
+            put_scratch_cpumask();
         }
 #endif /* CONFIG_PV_LDT_PAGING */
         put_page(page);
@@ -2912,7 +2913,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                  * vital that no other CPUs are left with mappings of a frame
                  * which is about to become writeable to the guest.
                  */
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 BUG_ON(in_irq());
                 cpumask_copy(mask, d->dirty_cpumask);
@@ -2928,6 +2929,7 @@ static int _get_page_type(struct page_info *page, unsigned long type,
                     perfc_incr(need_flush_tlb_flush);
                     flush_tlb_mask(mask);
                 }
+                put_scratch_cpumask();
 
                 /* We lose existing type and validity. */
                 nx &= ~(PGT_type_mask | PGT_validated);
@@ -3644,7 +3646,7 @@ long do_mmuext_op(
         case MMUEXT_TLB_FLUSH_MULTI:
         case MMUEXT_INVLPG_MULTI:
         {
-            cpumask_t *mask = this_cpu(scratch_cpumask);
+            cpumask_t *mask = get_scratch_cpumask();
 
             if ( unlikely(currd != pg_owner) )
                 rc = -EPERM;
@@ -3654,12 +3656,17 @@ long do_mmuext_op(
                                    mask)) )
                 rc = -EINVAL;
             if ( unlikely(rc) )
+            {
+                put_scratch_cpumask();
                 break;
+            }
 
             if ( op.cmd == MMUEXT_TLB_FLUSH_MULTI )
                 flush_tlb_mask(mask);
             else if ( __addr_ok(op.arg1.linear_addr) )
                 flush_tlb_one_mask(mask, op.arg1.linear_addr);
+            put_scratch_cpumask();
+
             break;
         }
 
@@ -3692,7 +3699,7 @@ long do_mmuext_op(
             else if ( likely(cache_flush_permitted(currd)) )
             {
                 unsigned int cpu;
-                cpumask_t *mask = this_cpu(scratch_cpumask);
+                cpumask_t *mask = get_scratch_cpumask();
 
                 cpumask_clear(mask);
                 for_each_online_cpu(cpu)
@@ -3700,6 +3707,7 @@ long do_mmuext_op(
                                              per_cpu(cpu_sibling_mask, cpu)) )
                         __cpumask_set_cpu(cpu, mask);
                 flush_mask(mask, FLUSH_CACHE);
+                put_scratch_cpumask();
             }
             else
                 rc = -EINVAL;
@@ -4165,12 +4173,13 @@ long do_mmu_update(
          * Force other vCPU-s of the affected guest to pick up L4 entry
          * changes (if any).
          */
-        unsigned int cpu = smp_processor_id();
-        cpumask_t *mask = per_cpu(scratch_cpumask, cpu);
+        cpumask_t *mask = get_scratch_cpumask();
 
-        cpumask_andnot(mask, pt_owner->dirty_cpumask, cpumask_of(cpu));
+        cpumask_andnot(mask, pt_owner->dirty_cpumask,
+                       cpumask_of(smp_processor_id()));
         if ( !cpumask_empty(mask) )
             flush_mask(mask, FLUSH_TLB_GLOBAL | FLUSH_ROOT_PGTBL);
+        put_scratch_cpumask();
     }
 
     perfc_add(num_page_updates, i);
@@ -4361,7 +4370,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4381,7 +4390,7 @@ static int __do_update_va_mapping(
             mask = d->dirty_cpumask;
             break;
         default:
-            mask = this_cpu(scratch_cpumask);
+            mask = get_scratch_cpumask();
             rc = vcpumask_to_pcpumask(d, const_guest_handle_from_ptr(bmap_ptr,
                                                                      void),
                                       mask);
@@ -4392,6 +4401,9 @@ static int __do_update_va_mapping(
         break;
     }
 
+    if ( mask && mask != d->dirty_cpumask )
+        put_scratch_cpumask();
+
     return rc;
 }
 
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index c85cf9f85a..1ec1cc51d3 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -159,13 +159,15 @@ void msi_compose_msg(unsigned vector, const cpumask_t *cpu_mask, struct msi_msg
 
     if ( cpu_mask )
     {
-        cpumask_t *mask = this_cpu(scratch_cpumask);
+        cpumask_t *mask;
 
         if ( !cpumask_intersects(cpu_mask, &cpu_online_map) )
             return;
 
+        mask = get_scratch_cpumask();
         cpumask_and(mask, cpu_mask, &cpu_online_map);
         msg->dest32 = cpu_mask_to_apicid(mask);
+        put_scratch_cpumask();
     }
 
     msg->address_hi = MSI_ADDR_BASE_HI;
diff --git a/xen/arch/x86/smp.c b/xen/arch/x86/smp.c
index 9bc925616a..1e5a0c6331 100644
--- a/xen/arch/x86/smp.c
+++ b/xen/arch/x86/smp.c
@@ -67,7 +67,7 @@ static void send_IPI_shortcut(unsigned int shortcut, int vector,
 void send_IPI_mask(const cpumask_t *mask, int vector)
 {
     bool cpus_locked = false;
-    cpumask_t *scratch = this_cpu(scratch_cpumask);
+    cpumask_t *scratch = get_scratch_cpumask();
 
     /*
      * This can only be safely used when no CPU hotplug or unplug operations
@@ -99,6 +99,7 @@ void send_IPI_mask(const cpumask_t *mask, int vector)
 
     if ( cpus_locked )
         put_cpu_maps();
+    put_scratch_cpumask();
 }
 
 void send_IPI_self(int vector)
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 1aa55d41e1..b994488d9f 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -26,6 +26,21 @@ DECLARE_PER_CPU(cpumask_var_t, cpu_sibling_mask);
 DECLARE_PER_CPU(cpumask_var_t, cpu_core_mask);
 DECLARE_PER_CPU(cpumask_var_t, scratch_cpumask);
 
+static inline cpumask_t *scratch_cpumask(const char *fn)
+{
+    static DEFINE_PER_CPU(const char *, scratch_cpumask_use);
+
+    if ( fn && unlikely(this_cpu(scratch_cpumask_use)) )
+        panic("scratch CPU mask already in use by %s\n",
+              this_cpu(scratch_cpumask_use));
+    this_cpu(scratch_cpumask_use) = fn;
+
+    return fn ? this_cpu(scratch_cpumask) : NULL;
+}
+
+#define get_scratch_cpumask() scratch_cpumask(__func__)
+#define put_scratch_cpumask() ((void)scratch_cpumask(NULL))
+
 /*
  * Do we, for platform reasons, need to actually keep CPUs online when we
  * would otherwise prefer them to be off?
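
For anyone who wants to try the ownership check outside the hypervisor, here is a minimal user-space sketch of the same get/put idea. It is not the Xen code. A single global stands in for the per-CPU scratch mask, and all names (scratch_get, scratch_put, scratch_conflicts, inner, outer_buggy, outer_fixed) are hypothetical. A conflict is counted and reported rather than triggering panic(), and the assert() in scratch_put() is an extra check that the patch above does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: one global instead of a per-CPU cpumask. */
static unsigned long scratch_mask;
static const char *scratch_owner;  /* NULL while the mask is free */
static int scratch_conflicts;      /* nested uses seen so far */

/* Hand out the scratch mask and record who took it. */
static unsigned long *scratch_get(const char *fn)
{
    if ( scratch_owner )
    {
        /* The debug patch panics here; this sketch only reports. */
        scratch_conflicts++;
        fprintf(stderr, "scratch mask wanted by %s, already in use by %s\n",
                fn, scratch_owner);
        return NULL;
    }
    scratch_owner = fn;
    return &scratch_mask;
}

/* Release the scratch mask (the assert is an addition of this sketch). */
static void scratch_put(void)
{
    assert(scratch_owner != NULL);
    scratch_owner = NULL;
}

#define get_scratch() scratch_get(__func__)
#define put_scratch() scratch_put()

/* A callee that uses the scratch mask on its own. */
static int inner(void)
{
    unsigned long *m = get_scratch();

    if ( !m )
        return -1;
    *m = 0x1;
    put_scratch();
    return 0;
}

/* Calls inner() while still holding the mask: the nested use is caught. */
static int outer_buggy(void)
{
    unsigned long *m = get_scratch();
    int rc;

    if ( !m )
        return -1;
    *m = 0xf0;
    rc = inner();
    put_scratch();
    return rc;
}

/* Releases the mask before calling inner(): no conflict. */
static int outer_fixed(void)
{
    unsigned long *m = get_scratch();

    if ( !m )
        return -1;
    *m = 0xf0;
    put_scratch();
    return inner();
}
```

Calling outer_fixed() completes without a report, while outer_buggy() trips the check as soon as inner() asks for the mask. That is the kind of nested use the debug patch is meant to expose.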


end of thread, other threads:[~2020-02-12  9:11 UTC | newest]

Thread overview: 14+ messages
2020-02-03  8:33 [Xen-devel] Xen-unstable: pci-passthrough regression bisected to: x86/smp: use APIC ALLBUT destination shorthand when possible Sander Eikelenboom
2020-02-03 12:23 ` Roger Pau Monné
2020-02-03 12:30   ` Sander Eikelenboom
2020-02-03 12:41     ` Roger Pau Monné
2020-02-03 12:44       ` Sander Eikelenboom
2020-02-03 13:21         ` Roger Pau Monné
2020-02-05 10:23           ` Roger Pau Monné
2020-02-05 11:03             ` Sander Eikelenboom
2020-02-05 11:18               ` Roger Pau Monné
2020-02-10 20:49           ` Sander Eikelenboom
2020-02-11 14:00             ` Roger Pau Monné
2020-02-12  8:46               ` Sander Eikelenboom
2020-02-12  9:10                 ` Roger Pau Monné
2020-02-03 12:49       ` Jan Beulich