linux-kernel.vger.kernel.org archive mirror
* BUG: "do_IRQ: 0.39 No irq handler for vector" from a 16550 port
@ 2018-11-02 10:58 Holger Schurig
From: Holger Schurig @ 2018-11-02 10:58 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Ingo Molnar, x86

Hi all,

I have a weird bug on systems that use the Haswell architecture and "real"
serial ports (/dev/ttyS*).


Hardware: an embedded device with an "Intel(R) Celeron(R) 2980U @
1.60GHz"; I tried microcode 0x23 and 0x24. It also happens on an "HP
Elite 840 G1". Both have the Haswell architecture.

I can plug a different CPU module into the embedded device, then I have
an "Intel(R) Atom(TM) CPU N455 @ 1.66GHz", obviously not Haswell. With
an identical kernel, I don't get the error there.



Kernel: happens with distro kernels (Debian, Ubuntu, Fedora). The common
factor seems to be that the kernels are >= 4.9.x. It also happens with
upstream stable kernels: I used 4.13.x, 4.14.x, 4.18.x, up to 4.18.16.




The embedded device also behaves strangely (e.g. I once had MCEs with a
32bit kernel, which went away when using a 64bit kernel). We also
sometimes get an error in AUFS with the same timestamp as the
do_IRQ message. I don't understand what AUFS has to do with hardware
interrupts. However, I don't want to concentrate on this yet; I think
that strange message in a mainline kernel is in itself worth
tracking down. If some interrupt goes haywire, there is certainly the
chance that some memory gets corrupted. Also, this might be something
totally different, because the HP Elite doesn't show this.

So, let's return to the more reproducible "do_IRQ: 0.39 No irq handler
for vector".


I'm happy that I found a way to reproduce it: the message triggers when
I close the serial port. printk()s indicate that the error happens after
the IER is cleared, and even after synchronize_irq() in
serial8250_do_shutdown().

Sometimes even a "stty </dev/ttyS1" is enough, because it already
opens/closes the port. But it happens only sometimes.

A better way is to use a tool called "stress-ng", which comes with
various stressors. Some newer versions (e.g. the one in Debian,
0.07.16-1) just open all files in /dev, run an fstat() on them, and
close them again, all of this in a loop and very fast. This has the
side effect that /dev/ttyS* are opened/closed very fast. And that shows
the error message easily:

[    6.558244] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[   17.048154] fuse init (API version 7.27)
[   17.248215] do_IRQ: 0.39 No irq handler for vector
[   17.249622] do_IRQ: 0.39 No irq handler for vector
[   17.252415] do_IRQ: 0.39 No irq handler for vector
[   17.253698] do_IRQ: 0.39 No irq handler for vector
[   18.528774] do_IRQ: 0.39 No irq handler for vector
[   18.532305] do_IRQ: 0.39 No irq handler for vector
[   18.532540] do_IRQ: 0.39 No irq handler for vector
[   18.606916] do_IRQ: 0.39 No irq handler for vector
[   20.227241] random: crng init done

Here I ran stress-ng for just a few seconds. Unfortunately, from
time to time the exact same setup makes the error scarce, e.g. it can
happen that we don't see the error for 15 minutes.
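
For reference, the open/fstat/close pattern described above can be
approximated without stress-ng. The following is a minimal Python
sketch, not the stress-ng implementation; the function name
open_close_loop and the flag choice (O_NONBLOCK and O_NOCTTY, so that
opening a serial port neither blocks waiting for carrier nor acquires a
controlling terminal) are assumptions made for illustration.

```python
import os


def open_close_loop(path, iterations):
    """Open, fstat() and close `path` repeatedly.

    Returns the number of cycles in which the open succeeded.
    """
    ok = 0
    for _ in range(iterations):
        try:
            # Assumed flags: do not block on the port, do not make it
            # the controlling terminal of this process.
            fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK | os.O_NOCTTY)
        except OSError:
            # Missing device, no permission, etc.: skip this cycle.
            continue
        try:
            os.fstat(fd)
            ok += 1
        finally:
            # The close is the step that matters for this report.
            os.close(fd)
    return ok
```

Calling open_close_loop("/dev/ttyS1", 100000) while watching dmesg would
exercise the same open/close path as the stty example; whether it
triggers the message as readily as stress-ng is not verified here.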

So when running this for a night I had between 1500 and 30000 of these
messages in my dmesg/journal.
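
To put numbers on such a run, the messages can be counted from saved
dmesg output. A small Python sketch follows; it assumes the line format
shown in the excerpt above, and it assumes that the two numbers in
"0.39" are the CPU number and the vector number (an interpretation that
is not stated in this thread). The function name summarize is
hypothetical.

```python
import re

# Matches e.g. "[   17.248215] do_IRQ: 0.39 No irq handler for vector"
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+do_IRQ: (\d+)\.(\d+) No irq handler for vector"
)


def summarize(log_text):
    """Count do_IRQ messages per (cpu, vector) pair.

    Returns (counts, first_timestamp, last_timestamp); the timestamps
    are None when no message was found.
    """
    counts = {}
    first = last = None
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if not m:
            continue
        ts = float(m.group(1))
        # Assumption: "<cpu>.<vector>" as printed by the kernel.
        key = (int(m.group(2)), int(m.group(3)))
        counts[key] = counts.get(key, 0) + 1
        if first is None:
            first = ts
        last = ts
    return counts, first, last
```

Feeding it the text of a saved dmesg file gives one count per
(cpu, vector) pair, which makes it easy to compare a quiet 15 minute
run with a noisy overnight one.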


One thing that I noticed is that "noapic=1" makes the error go away.

Also, using the Atom CPU with the older architecture makes the error go
away, but that one is now EOL. :-(



Any advice on how to proceed further?


Greetings,
Holger


* Re: BUG: "do_IRQ: 0.39 No irq handler for vector" from a 16550 port
@ 2018-11-02 15:15 ` Holger Schurig
From: Holger Schurig @ 2018-11-02 15:15 UTC (permalink / raw)
  To: linux-kernel, Thomas Gleixner, Ingo Molnar, x86

I said that kernel 4.9 doesn't show the issue. The same was for later
kernels up to 4.13.

I had a compilation issue with 4.14 (which I later solved, something
unrelated with tools/objcopy when compiling for a different
architecture), so I did a git bisect between v4.13 and v4.15. This is
the outcome:

$ git bisect log
# bad: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
# good: [569dbb88e80deb68974ef6fdd6a13edb9d686261] Linux 4.13
git bisect start 'v4.15' 'v4.13'
# good: [425a08c67317acee103b3ad58f57c762e8834faf] mlxsw: spectrum_router: Prepare for large adjacency groups
git bisect good 425a08c67317acee103b3ad58f57c762e8834faf
# bad: [e60e1ee60630cafef5e430c2ae364877e061d980] Merge tag 'drm-for-v4.15' of git://people.freedesktop.org/~airlied/linux
git bisect bad e60e1ee60630cafef5e430c2ae364877e061d980
# bad: [4008e6a9bcee2f3b61bb11951de0fb0ed764cb91] Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux
git bisect bad 4008e6a9bcee2f3b61bb11951de0fb0ed764cb91
# bad: [3c073991eb417b6f785ddc6afbbdc369eb84aa6a] Merge tag 'devprop-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 3c073991eb417b6f785ddc6afbbdc369eb84aa6a
# good: [2101dd64b304b034862f5ca40877c41b7ccb9c5e] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
git bisect good 2101dd64b304b034862f5ca40877c41b7ccb9c5e
# good: [d6ec9d9a4def52a5094237564eaf6f6979fd7a27] Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d6ec9d9a4def52a5094237564eaf6f6979fd7a27
# good: [7d58e1c9059eefe0066c5acf2ffa582f6f0180e3] Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 7d58e1c9059eefe0066c5acf2ffa582f6f0180e3
# good: [990a848d537e4da966907c8ccec95bc568f2911c] Merge branches 'pm-devfreq' and 'pm-tools'
git bisect good 990a848d537e4da966907c8ccec95bc568f2911c
# bad: [25e960efc63852b84d1c3739aef586285b177395] PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code
git bisect bad 25e960efc63852b84d1c3739aef586285b177395
# skip: [029c6e1c9df776fe1b2ba756a28fb65e9f9e9f69] x86/vector: Store the single CPU targets in apic data
git bisect skip 029c6e1c9df776fe1b2ba756a28fb65e9f9e9f69

Note on the skip above: with that commit I had
- no network
- no USB keyboard
- therefore "git bisect skip"

# bad: [d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c] x86/vector: Respect affinity mask in irq descriptor
git bisect bad d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c
# good: [ef9e56d894eab99a33a06b96ba8057afa67d3702] x86/ioapic: Remove obsolete post hotplug update
git bisect good ef9e56d894eab99a33a06b96ba8057afa67d3702
# good: [8d1e3dca7de6e8513872799a748a1d47d8dce60d] x86/vector: Add tracepoints for vector management
git bisect good 8d1e3dca7de6e8513872799a748a1d47d8dce60d
# good: [5ba204a1817ba95a7b24dbe8ef2c7ddd4cea886e] iommu/amd: Reevaluate vector configuration on activate()
git bisect good 5ba204a1817ba95a7b24dbe8ef2c7ddd4cea886e
# good: [4900be83602b6be07366d3e69f756c1959f4169a] x86/vector/msi: Switch to global reservation mode
git bisect good 4900be83602b6be07366d3e69f756c1959f4169a
# bad: [2cffad7bad83157f89332872015f4305d2ac09ac] x86/irq: Simplify hotplug vector accounting
git bisect bad 2cffad7bad83157f89332872015f4305d2ac09ac
# bad: [464d12309e1b5829597793db551ae8ecaecf4036] x86/vector: Switch IOAPIC to global reservation mode
git bisect bad 464d12309e1b5829597793db551ae8ecaecf4036
# first bad commit: [464d12309e1b5829597793db551ae8ecaecf4036] x86/vector: Switch IOAPIC to global reservation mode
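
The log above is regular enough to be checked mechanically, for example
to confirm which commits were marked good, bad, or skipped. A minimal
Python sketch, assuming the "git bisect log" format shown above;
parse_bisect_log is a hypothetical helper name.

```python
import re

MARK = re.compile(r"^git bisect (good|bad|skip) ([0-9a-f]{40})$")
FIRST_BAD = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(text):
    """Collect the good/bad/skip marks and the first bad commit.

    Returns (marks, first_bad), where marks maps "good", "bad" and
    "skip" to lists of commit hashes, and first_bad is a
    (hash, subject) tuple or None.
    """
    marks = {"good": [], "bad": [], "skip": []}
    first_bad = None
    for raw in text.splitlines():
        line = raw.strip()
        m = MARK.match(line)
        if m:
            marks[m.group(1)].append(m.group(2))
            continue
        m = FIRST_BAD.match(line)
        if m:
            first_bad = (m.group(1), m.group(2))
    return marks, first_bad
```

The same log can also be fed back to git with "git bisect replay" to
restore the bisection state in another checkout.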




Greetings,
Holger


* Re: BUG: "do_IRQ: 0.39 No irq handler for vector" from a 16550 port
@ 2018-11-04 10:19   ` Thomas Gleixner
From: Thomas Gleixner @ 2018-11-04 10:19 UTC (permalink / raw)
  To: Holger Schurig; +Cc: linux-kernel, Ingo Molnar, x86

On Fri, 2 Nov 2018, Holger Schurig wrote:

> I said that kernel 4.9 doesn't show the issue. The same was for later
> kernels up to 4.13.
> 
> I had a compilation issue with 4.14 (which I later solved, something
> unrelated with tools/objcopy when compiling for a different
> architecture), so I did a git bisect between v4.13 and v4.15. This is
> the outcome:
>
> git bisect bad 464d12309e1b5829597793db551ae8ecaecf4036
> # first bad commit: [464d12309e1b5829597793db551ae8ecaecf4036] x86/vector: Switch IOAPIC to global reservation mode

Which is not surprising because the old model just hid the issue of
interrupts which came in after the interrupt had been torn down.

I have no idea what causes this, but it seems to be related to the
particular hardware/BIOS combination.

Thanks,

	tglx



