On 24.11.20 15:59, Roger Pau Monné wrote:
> On Tue, Nov 24, 2020 at 03:42:28PM +0100, Jan Beulich wrote:
>> On 24.11.2020 11:05, Jan Beulich wrote:
>>> On 23.11.2020 18:39, Manuel Bouyer wrote:
>>>> On Mon, Nov 23, 2020 at 06:06:10PM +0100, Roger Pau Monné wrote:
>>>>> OK, I'm afraid this is likely too verbose and messes with the timings.
>>>>>
>>>>> I've been looking (again) into the code, and I found something weird
>>>>> that I think could be related to the issue you are seeing, but haven't
>>>>> managed to try to boot the NetBSD kernel provided in order to assert
>>>>> whether it solves the issue or not (or even whether I'm able to
>>>>> repro it). Would you mind giving the patch below a try?
>>>>
>>>> With this, I get the same hang but XEN outputs don't wake up the interrupt
>>>> any more. The NetBSD counter shows only one interrupt for ioapic2 pin 2,
>>>> while I would have about 8 at the time of the hang.
>>>>
>>>> So, now it looks like interrupts are blocked forever.
>>>
>>> Which may be a good thing for debugging purposes, because now we have
>>> a way to investigate what is actually blocking the interrupt's
>>> delivery without having to worry about more output screwing the
>>> overall picture.
>>>
>>>> At
>>>> http://www-soc.lip6.fr/~bouyer/xen-log5.txt
>>>> you'll find the output of the 'i' key.
>>>
>>> (XEN) IRQ: 34 vec:59 IO-APIC-level status=010 aff:{0}/{0-7} in-flight=1 d0: 34(-MM)
>>>
>>> (XEN) IRQ 34 Vec 89:
>>> (XEN) Apic 0x02, Pin 2: vec=59 delivery=LoPri dest=L status=1 polarity=1 irr=1 trig=L mask=0 dest_id:00000001
>>
>> Since it repeats in Manuel's latest dump, perhaps the odd combination
>> of status=1 and irr=1 is to tell us something? It is my understanding
>> that irr ought to become set only when delivery-status clears. Yet I
>> don't know what to take from this...
>
> My reading of this is that one interrupt was accepted by the lapic
> (irr=1) and that there's a further interrupt pending that hasn't yet
> been accepted by the lapic (status=1) because it's still serving the
> previous one. But that's all weird because there's no matching
> vector in ISR, and hence the IRR bit on the IO-APIC has somehow become
> stale or out of sync with the lapic state?
>
> I'm also unsure about how Xen has managed to reach this state, it
> shouldn't be possible in the first place.
>
> I don't think I can instrument the paths further with printfs because
> it's likely to result in the behavior itself changing and console
> spamming. I could however create a static buffer to trace relevant
> actions and then dump all them together with the 'i' debug key output.

debugtrace is your friend here. It already has a debug key for printing
the buffer contents to console ('T'). As the buffer is wrap-around you
can even add debug prints in the related interrupt paths for finding out
which paths have been called in which order and on which cpu. Depending
on the findings you might want to use percpu buffers.


Juergen
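
To illustrate the wrap-around, per-CPU buffer idea mentioned above, here
is a minimal standalone C sketch. It is not Xen's debugtrace code: every
name in it (trace_record, trace_dump, TRACE_NR_CPUS and so on) is
hypothetical, and it assumes single-threaded use. In a hypervisor the
sequence counter would need to be atomic and the buffers would be real
per-CPU data.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TRACE_NR_CPUS  4   /* hypothetical CPU count */
#define TRACE_ENTRIES  8   /* slots per CPU; power of two keeps the index
                              consistent if the write counter overflows */
#define TRACE_MSG_LEN  48

struct trace_entry {
    unsigned long seq;          /* global order across all CPUs */
    char msg[TRACE_MSG_LEN];
};

struct trace_buf {
    unsigned int next;          /* total records written on this CPU */
    struct trace_entry e[TRACE_ENTRIES];
};

static struct trace_buf trace_bufs[TRACE_NR_CPUS];
static unsigned long trace_seq; /* assumption: would be atomic in real use */

/* Record one formatted message; the oldest slot is overwritten when full. */
static void trace_record(unsigned int cpu, const char *fmt, ...)
{
    struct trace_buf *b;
    struct trace_entry *e;
    va_list ap;

    assert(cpu < TRACE_NR_CPUS);
    b = &trace_bufs[cpu];
    e = &b->e[b->next++ % TRACE_ENTRIES];
    e->seq = trace_seq++;

    va_start(ap, fmt);
    vsnprintf(e->msg, sizeof(e->msg), fmt, ap);
    va_end(ap);
}

/* Print one CPU's surviving records, oldest first; returns how many. */
static unsigned int trace_dump(unsigned int cpu, FILE *out)
{
    const struct trace_buf *b;
    unsigned int count, first, i;

    assert(cpu < TRACE_NR_CPUS);
    b = &trace_bufs[cpu];
    count = b->next < TRACE_ENTRIES ? b->next : TRACE_ENTRIES;
    first = b->next - count;

    for (i = 0; i < count; i++) {
        const struct trace_entry *e = &b->e[(first + i) % TRACE_ENTRIES];

        fprintf(out, "cpu%u #%lu %s\n", cpu, e->seq, e->msg);
    }
    return count;
}
```

Calls such as trace_record(cpu, "eoi vec=%u", vec) could then be placed
in the interrupt paths of interest, with trace_dump() invoked once from
a debug-key handler. The global sequence number is what lets the
per-CPU dumps be merged back into a single ordering afterwards.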