* netbsd PVH dom0: xen clock event stops
@ 2020-11-15 17:49 Manuel Bouyer
  2020-11-15 17:53 ` Manuel Bouyer
  2020-11-15 18:24 ` Roger Pau Monné
  0 siblings, 2 replies; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-15 17:49 UTC
  To: xen-devel

Hello,
I spent some more time debugging NetBSD as a PVH dom0 on Xen.
With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
while configuring devices. At first I thought it was an issue with hardware
interrupts, but it is more likely an issue with Xen timer events.
Specifically: virtual CPU 0 stops receiving timer events, while other
CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
Other events (the Xen console, the debug event) are properly received
by CPU0. I don't know how to debug this further at this point.

In case it helps, I put my Xen and NetBSD kernels at
http://www-soc.lip6.fr/~bouyer/netbsd-dom0-pvh/
I boot it from the NetBSD boot loader with:
menu=Boot Xen PVH:load /netbsd-test console=com0 root=dk0 -vx; multiboot /xen-test.gz dom0_mem=1024M console=com2 com2=57600,8n1 loglvl=all guest_loglvl=all gnttab_max_nr_frames=64 dom0=pvh iommu=debug

I guess with grub this would be
kernel /xen-test.gz dom0_mem=1024M console=com2 com2=57600,8n1 loglvl=all guest_loglvl=all gnttab_max_nr_frames=64 dom0=pvh iommu=debug
module /netbsd-test console=com0 root=dk0 -vx

(yes, com2 for xen and com0 for netbsd, that's not a bug :)
You can enter the NetBSD debugger with
+++++
You can then enter commands, like
sh ev /i
to see the interrupt counters

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-15 17:49 netbsd PVH dom0: xen clock event stops Manuel Bouyer
@ 2020-11-15 17:53 ` Manuel Bouyer
  2020-11-15 18:24 ` Roger Pau Monné
  1 sibling, 0 replies; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-15 17:53 UTC
  To: xen-devel

On Sun, Nov 15, 2020 at 06:49:38PM +0100, Manuel Bouyer wrote:
> Hello,
> I spent some more time debugging NetBSD as a PVH dom0 on Xen.
> With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
> while configuring devices. At first I thought it was an issue with hardware
> interrupts, but it is more likely an issue with Xen timer events.
> Specifically: virtual CPU 0 stops receiving timer events, while other
> CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
> The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
> Other events (the Xen console, the debug event) are properly received
> by CPU0. I don't know how to debug this further at this point.

I forgot to mention: the same NetBSD kernel boots fine as a PVH domU.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-15 17:49 netbsd PVH dom0: xen clock event stops Manuel Bouyer
  2020-11-15 17:53 ` Manuel Bouyer
@ 2020-11-15 18:24 ` Roger Pau Monné
  2020-11-15 18:37   ` Manuel Bouyer
  2020-11-16 18:22   ` Manuel Bouyer
  1 sibling, 2 replies; 9+ messages in thread
From: Roger Pau Monné @ 2020-11-15 18:24 UTC
  To: Manuel Bouyer; +Cc: xen-devel

On Sun, Nov 15, 2020 at 06:49:38PM +0100, Manuel Bouyer wrote:
> Hello,
> I spent some more time debugging NetBSD as a PVH dom0 on Xen.
> With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
> while configuring devices. At first I thought it was an issue with hardware
> interrupts, but it is more likely an issue with Xen timer events.
> Specifically: virtual CPU 0 stops receiving timer events, while other
> CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
> The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
> Other events (the Xen console, the debug event) are properly received
> by CPU0. I don't know how to debug this further at this point.

You could try using the dom0_vcpus_pin command line option and then dump
the timers using the 'a' debug key; this way you can see if CPU0 has a
timer pending (which would be the vCPU0 timer).

What timer is NetBSD using: the PV vCPU single shot timer, the
periodic one, or the emulated local APIC timer?

Depending on the timer you are trying to use, I would recommend adding
some printks if needed.

Roger.



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-15 18:24 ` Roger Pau Monné
@ 2020-11-15 18:37   ` Manuel Bouyer
  2020-11-16 18:22   ` Manuel Bouyer
  1 sibling, 0 replies; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-15 18:37 UTC
  To: Roger Pau Monné; +Cc: xen-devel

On Sun, Nov 15, 2020 at 07:24:16PM +0100, Roger Pau Monné wrote:
> On Sun, Nov 15, 2020 at 06:49:38PM +0100, Manuel Bouyer wrote:
> > Hello,
> > I spent some more time debugging NetBSD as a PVH dom0 on Xen.
> > With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
> > while configuring devices. At first I thought it was an issue with hardware
> > interrupts, but it is more likely an issue with Xen timer events.
> > Specifically: virtual CPU 0 stops receiving timer events, while other
> > CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
> > The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
> > Other events (the Xen console, the debug event) are properly received
> > by CPU0. I don't know how to debug this further at this point.
> 
> You could try using the dom0_vcpus_pin command line option and then dump
> the timers using the 'a' debug key; this way you can see if CPU0 has a
> timer pending (which would be the vCPU0 timer).
> 
> What timer is NetBSD using: the PV vCPU single shot timer, the
> periodic one, or the emulated local APIC timer?

It is the PV single shot timer, I guess. But used as a periodic
timer by rearming it in the handler:
	next = ci->ci_xen_hardclock_systime_ns + NS_PER_TICK;
	error = HYPERVISOR_set_timer_op(next);
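
To make the rearm pattern concrete, here is a minimal, self-contained
sketch. It is not the actual NetBSD code: stub_set_timer_op() is a
hypothetical stand-in for HYPERVISOR_set_timer_op() that only records the
requested deadline, struct cpu_info is reduced to the one field named
above, the NS_PER_TICK value (10 ms) is assumed, and advancing the
hardclock time by exactly one tick per event is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NS_PER_TICK 10000000ULL	/* assumed value: 10 ms, i.e. a 100 Hz tick */

/* Hypothetical, reduced stand-in for the per-CPU state used in the snippet. */
struct cpu_info {
	uint64_t ci_xen_hardclock_systime_ns;	/* time of the last tick handled */
};

/* Deadline most recently handed to the stub "hypervisor". */
static uint64_t armed_deadline_ns;

/* Stub for HYPERVISOR_set_timer_op(): records the deadline and returns 0. */
static int
stub_set_timer_op(uint64_t deadline_ns)
{
	armed_deadline_ns = deadline_ns;
	return 0;
}

/*
 * Timer event handler: account for one tick (assumption), then rearm the
 * single shot timer one tick later so that it behaves like a periodic one.
 */
static int
timer_handler(struct cpu_info *ci)
{
	uint64_t next;

	assert(ci != NULL);
	ci->ci_xen_hardclock_systime_ns += NS_PER_TICK;
	next = ci->ci_xen_hardclock_systime_ns + NS_PER_TICK;
	return stub_set_timer_op(next);
}
```

Calling timer_handler() once per delivered event moves the armed deadline
forward by one tick each time; if an event is never delivered, nothing
rearms the timer, which is consistent with the stall described above.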

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-15 18:24 ` Roger Pau Monné
  2020-11-15 18:37   ` Manuel Bouyer
@ 2020-11-16 18:22   ` Manuel Bouyer
  2020-11-17  9:02     ` Roger Pau Monné
  1 sibling, 1 reply; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-16 18:22 UTC
  To: Roger Pau Monné; +Cc: xen-devel

On Sun, Nov 15, 2020 at 07:24:16PM +0100, Roger Pau Monné wrote:
> On Sun, Nov 15, 2020 at 06:49:38PM +0100, Manuel Bouyer wrote:
> > Hello,
> > I spent some more time debugging NetBSD as a PVH dom0 on Xen.
> > With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
> > while configuring devices. At first I thought it was an issue with hardware
> > interrupts, but it is more likely an issue with Xen timer events.
> > Specifically: virtual CPU 0 stops receiving timer events, while other
> > CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
> > The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
> > Other events (the Xen console, the debug event) are properly received
> > by CPU0. I don't know how to debug this further at this point.
> 
> You could try using the dom0_vcpus_pin command line option and then dump
> the timers using the 'a' debug key; this way you can see if CPU0 has a
> timer pending (which would be the vCPU0 timer).

Thanks, this helped. This was a bug in the NetBSD kernel, which would show
up only when there are enough physical device interrupts (which explains why
I didn't notice it on PVH domUs).

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-16 18:22   ` Manuel Bouyer
@ 2020-11-17  9:02     ` Roger Pau Monné
  2020-11-17  9:07       ` Manuel Bouyer
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2020-11-17  9:02 UTC
  To: Manuel Bouyer; +Cc: xen-devel

On Mon, Nov 16, 2020 at 07:22:11PM +0100, Manuel Bouyer wrote:
> On Sun, Nov 15, 2020 at 07:24:16PM +0100, Roger Pau Monné wrote:
> > On Sun, Nov 15, 2020 at 06:49:38PM +0100, Manuel Bouyer wrote:
> > > Hello,
> > > I spent some more time debugging NetBSD as a PVH dom0 on Xen.
> > > With Roger's patch to avoid a Xen panic, the NetBSD kernel stalls
> > > while configuring devices. At first I thought it was an issue with hardware
> > > interrupts, but it is more likely an issue with Xen timer events.
> > > Specifically: virtual CPU 0 stops receiving timer events, while other
> > > CPUs keep receiving them. I tried to force a timer rearm but this didn't help.
> > > The event is neither masked nor pending on Xen or NetBSD, as confirmed by 'q'.
> > > Other events (the Xen console, the debug event) are properly received
> > > by CPU0. I don't know how to debug this further at this point.
> > 
> > You could try using the dom0_vcpus_pin command line option and then dump
> > the timers using the 'a' debug key; this way you can see if CPU0 has a
> > timer pending (which would be the vCPU0 timer).
> 
> Thanks, this helped. This was a bug in the NetBSD kernel, which would show
> up only when there are enough physical device interrupts (which explains why
> I didn't notice it on PVH domUs).

Great! So all interrupts are working as expected now?

Roger.



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-17  9:02     ` Roger Pau Monné
@ 2020-11-17  9:07       ` Manuel Bouyer
  2020-11-17  9:45         ` Roger Pau Monné
  0 siblings, 1 reply; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-17  9:07 UTC
  To: Roger Pau Monné; +Cc: xen-devel

On Tue, Nov 17, 2020 at 10:02:04AM +0100, Roger Pau Monné wrote:
> Great! So all interrupts are working as expected now?

No, I'm back at the problem where the PERC RAID controller times out on
commands. I'm cleaning up my sources and will try to get more data
about this problem.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-17  9:07       ` Manuel Bouyer
@ 2020-11-17  9:45         ` Roger Pau Monné
  2020-11-17 10:18           ` Manuel Bouyer
  0 siblings, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2020-11-17  9:45 UTC
  To: Manuel Bouyer; +Cc: xen-devel

On Tue, Nov 17, 2020 at 10:07:33AM +0100, Manuel Bouyer wrote:
> On Tue, Nov 17, 2020 at 10:02:04AM +0100, Roger Pau Monné wrote:
> > Great! So all interrupts are working as expected now?
> 
> No, I'm back at the problem where the PERC RAID controller times out on
> commands. I'm cleaning up my sources and will try to get more data
> about this problem.

OK, the output of the 'M' debug key might be helpful in that case to
see if the MSI-X entries are masked (IIRC you said this controller was
using MSI-X).

Roger.



* Re: netbsd PVH dom0: xen clock event stops
  2020-11-17  9:45         ` Roger Pau Monné
@ 2020-11-17 10:18           ` Manuel Bouyer
  0 siblings, 0 replies; 9+ messages in thread
From: Manuel Bouyer @ 2020-11-17 10:18 UTC
  To: Roger Pau Monné; +Cc: xen-devel

On Tue, Nov 17, 2020 at 10:45:34AM +0100, Roger Pau Monné wrote:
> On Tue, Nov 17, 2020 at 10:07:33AM +0100, Manuel Bouyer wrote:
> > On Tue, Nov 17, 2020 at 10:02:04AM +0100, Roger Pau Monné wrote:
> > > Great! So all interrupts are working as expected now?
> > 
> > No, I'm back at the problem where the PERC RAID controller times out on
> > commands. I'm cleaning up my sources and will try to get more data
> > about this problem.
> 
> OK, the output of the 'M' debug key might be helpful in that case to
> see if the MSI-X entries are masked (IIRC you said this controller was
> using MSI-X).

No, this one uses the IO-APIC.

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--



Thread overview: 9+ messages
2020-11-15 17:49 netbsd PVH dom0: xen clock event stops Manuel Bouyer
2020-11-15 17:53 ` Manuel Bouyer
2020-11-15 18:24 ` Roger Pau Monné
2020-11-15 18:37   ` Manuel Bouyer
2020-11-16 18:22   ` Manuel Bouyer
2020-11-17  9:02     ` Roger Pau Monné
2020-11-17  9:07       ` Manuel Bouyer
2020-11-17  9:45         ` Roger Pau Monné
2020-11-17 10:18           ` Manuel Bouyer
