* Interrupt timeout
@ 2020-02-25 13:56 Lange Norbert
2020-02-25 15:24 ` Greg Gallagher
0 siblings, 1 reply; 9+ messages in thread
From: Lange Norbert @ 2020-02-25 13:56 UTC (permalink / raw)
To: Xenomai (xenomai@xenomai.org), Philippe Gerum (rpm@xenomai.org)
Hello,
I hope you can give me some pointers to understand why this Bug happened.
It seems an interrupt got lost somehow, maybe some issue with leveltriggers?
Note that I run on an Apollo Lake, which would normally use PINCTRL_BROXTON,
but that’s not fixed up for Xenomai yet. The system works fine from eMMC otherwise,
this bug occurred only once so far.
00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series SDXC/MMC Host Controller
Kernel driver in use: sdhci-pci
Kernel modules: sdhci_pci
00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series eMMC Controller
Kernel driver in use: sdhci-pci
Kernel modules: sdhci_pci
# cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 50 0 0 0 IO-APIC 2-edge timer
4: 0 0 498 0 IO-APIC 4-edge ttyS0
8: 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 IO-APIC 9-fasteoi acpi
20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-2
143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 47839 147583 13807 32602 Local timer interrupts
SPU: 0 0 0 0 Spurious interrupts
PMI: 0 0 0 0 Performance monitoring interrupts
IWI: 0 0 0 0 IRQ work interrupts
RTR: 0 0 0 0 APIC ICR read retries
RES: 11955 6931 6567 4946 Rescheduling interrupts
CAL: 268 223 210 217 Function call interrupts
TLB: 63 57 61 50 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 Threshold APIC interrupts
MCE: 0 0 0 0 Machine check exceptions
MCP: 5 6 6 6 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 Posted-interrupt notification event
NPI: 0 0 0 0 Nested posted-interrupt event
PIW: 0 0 0 0 Posted-interrupt wakeup event
[ 238.245509] mmc0: Timeout waiting for hardware interrupt.
[ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
[ 238.263900] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
[ 238.270373] mmc0: sdhci: Argument: 0x00118678 | Trn mode: 0x0000002b
[ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
[ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
[ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
[ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
[ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
[ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
[ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
[ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
[ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d
[ 238.346016] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
[ 238.353180] mmc0: sdhci: ============================================
[ 248.485507] mmc0: Timeout waiting for hardware interrupt.
[ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
[ 248.503887] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
[ 248.510362] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000023
[ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
[ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
[ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
[ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
[ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
[ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
[ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
[ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
[ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d
[ 248.586065] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
[ 248.593228] mmc0: sdhci: ============================================
[ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 249.143462] rcu: 1-...!: (1 GPs behind) idle=84a/1/0x4000000000000000 softirq=5875/5876 fqs=2479
[ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
[ 249.158588] Sending NMI from CPU 3 to CPUs 1:
[ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
[ 249.174284] rcu: RCU grace-period kthread stack dump:
[ 249.179364] rcu_sched I 0 10 2 0x80000000
[ 249.184882] Call Trace:
[ 249.187360] ? __schedule+0x340/0x8a0
[ 249.191046] schedule+0x32/0x80
[ 249.194211] schedule_timeout+0x180/0x360
[ 249.198248] ? __next_timer_interrupt+0xd0/0xd0
[ 249.202806] rcu_gp_kthread+0x522/0xec0
[ 249.206667] ? call_rcu_sched+0x20/0x20
[ 249.210528] kthread+0x115/0x130
[ 249.213779] ? kthread_create_worker_on_cpu+0x70/0x70
[ 249.218861] ret_from_fork+0x20/0x30
[ 256.449655] igb 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down
[ 258.725508] mmc0: Timeout waiting for hardware cmd interrupt.
[ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
[ 258.744238] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
[ 258.750716] mmc0: sdhci: Argument: 0x00010000 | Trn mode: 0x00000023
[ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
[ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
[ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
[ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
[ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
[ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
[ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
[ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
[ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d
[ 258.826411] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
[ 258.833578] mmc0: sdhci: ============================================
[ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
[ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
[ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
[ 268.984233] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
[ 268.990704] mmc0: sdhci: Argument: 0x00010000 | Trn mode: 0x00000023
[ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
[ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
[ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
[ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
[ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
[ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
[ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
[ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
[ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
[ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
[ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d
[ 269.066373] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
[ 269.073542] mmc0: sdhci: ============================================
Kind regards,
Norbert
________________________________
This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
ANDRITZ HYDRO GmbH
Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation
Firmensitz/ Registered seat: Wien
Firmenbuchgericht/ Court of registry: Handelsgericht Wien
Firmenbuchnummer/ Company registration: FN 61833 g
DVR: 0605077
UID-Nr.: ATU14756806
Thank You
________________________________
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config-4.19.tar.xz
Type: application/octet-stream
Size: 20192 bytes
Desc: config-4.19.tar.xz
URL: <http://xenomai.org/pipermail/xenomai/attachments/20200225/cf1788ef/attachment.obj>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Interrupt timeout
2020-02-25 13:56 Interrupt timeout Lange Norbert
@ 2020-02-25 15:24 ` Greg Gallagher
2020-02-25 15:40 ` Lange Norbert
0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2020-02-25 15:24 UTC (permalink / raw)
To: Lange Norbert
Cc: Xenomai (xenomai@xenomai.org), Philippe Gerum (rpm@xenomai.org)
Hi,
On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
<xenomai@xenomai.org> wrote:
>
> Hello,
>
> I hope you can give me some pointers to understand why this Bug happened.
> It seems an interrupt got lost somehow, maybe some issue with leveltriggers?
>
> Note that I run on an Apollo Lake, which would normally use PINCTRL_BROXTON,
> but that’s not fixed up for Xenomai yet. The system works fine from eMMC otherwise,
> this bug occurred only once so far.
>
> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series SDXC/MMC Host Controller
> Kernel driver in use: sdhci-pci
> Kernel modules: sdhci_pci
> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series eMMC Controller
> Kernel driver in use: sdhci-pci
> Kernel modules: sdhci_pci
>
> # cat /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 0: 50 0 0 0 IO-APIC 2-edge timer
> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-2
> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
> NMI: 0 0 0 0 Non-maskable interrupts
> LOC: 47839 147583 13807 32602 Local timer interrupts
> SPU: 0 0 0 0 Spurious interrupts
> PMI: 0 0 0 0 Performance monitoring interrupts
> IWI: 0 0 0 0 IRQ work interrupts
> RTR: 0 0 0 0 APIC ICR read retries
> RES: 11955 6931 6567 4946 Rescheduling interrupts
> CAL: 268 223 210 217 Function call interrupts
> TLB: 63 57 61 50 TLB shootdowns
> TRM: 0 0 0 0 Thermal event interrupts
> THR: 0 0 0 0 Threshold APIC interrupts
> MCE: 0 0 0 0 Machine check exceptions
> MCP: 5 6 6 6 Machine check polls
> ERR: 0
> MIS: 0
> PIN: 0 0 0 0 Posted-interrupt notification event
> NPI: 0 0 0 0 Nested posted-interrupt event
> PIW: 0 0 0 0 Posted-interrupt wakeup event
>
>
> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
> [ 238.263900] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
> [ 238.270373] mmc0: sdhci: Argument: 0x00118678 | Trn mode: 0x0000002b
> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
> [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d
> [ 238.346016] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
> [ 238.353180] mmc0: sdhci: ============================================
> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
> [ 248.503887] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
> [ 248.510362] mmc0: sdhci: Argument: 0x00000000 | Trn mode: 0x00000023
> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
> [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d
> [ 248.586065] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
> [ 248.593228] mmc0: sdhci: ============================================
> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 249.143462] rcu: 1-...!: (1 GPs behind) idle=84a/1/0x4000000000000000 softirq=5875/5876 fqs=2479
> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2
> [ 249.174284] rcu: RCU grace-period kthread stack dump:
> [ 249.179364] rcu_sched I 0 10 2 0x80000000
> [ 249.184882] Call Trace:
> [ 249.187360] ? __schedule+0x340/0x8a0
> [ 249.191046] schedule+0x32/0x80
> [ 249.194211] schedule_timeout+0x180/0x360
> [ 249.198248] ? __next_timer_interrupt+0xd0/0xd0
> [ 249.202806] rcu_gp_kthread+0x522/0xec0
> [ 249.206667] ? call_rcu_sched+0x20/0x20
> [ 249.210528] kthread+0x115/0x130
> [ 249.213779] ? kthread_create_worker_on_cpu+0x70/0x70
> [ 249.218861] ret_from_fork+0x20/0x30
> [ 256.449655] igb 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down
> [ 258.725508] mmc0: Timeout waiting for hardware cmd interrupt.
> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
> [ 258.744238] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
> [ 258.750716] mmc0: sdhci: Argument: 0x00010000 | Trn mode: 0x00000023
> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
> [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d
> [ 258.826411] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
> [ 258.833578] mmc0: sdhci: ============================================
> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP ===========
> [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 | Version: 0x00001002
> [ 268.984233] mmc0: sdhci: Blk size: 0x00007200 | Blk cnt: 0x00000000
> [ 268.990704] mmc0: sdhci: Argument: 0x00010000 | Trn mode: 0x00000023
> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab: 0x03ff000b
> [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int: 0x00000001
> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d
> [ 269.066373] mmc0: sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c
> [ 269.073542] mmc0: sdhci: ============================================
>
>
> Kind regards,
> Norbert
I've seen this before when I was playing around with getting the ipipe
to run on my edison board. I don't think it's ipipe related. I found
this thread on patchwork:
https://patchwork.kernel.org/patch/10705823/
There was another thread on the Ubuntu forums that solved the issue by
disabling the sdcard in the bios.
-Greg
>
> ________________________________
>
> This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
>
> ANDRITZ HYDRO GmbH
>
>
> Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation
>
> Firmensitz/ Registered seat: Wien
>
> Firmenbuchgericht/ Court of registry: Handelsgericht Wien
>
> Firmenbuchnummer/ Company registration: FN 61833 g
>
> DVR: 0605077
>
> UID-Nr.: ATU14756806
>
>
> Thank You
> ________________________________
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: config-4.19.tar.xz
> Type: application/octet-stream
> Size: 20192 bytes
> Desc: config-4.19.tar.xz
> URL: <http://xenomai.org/pipermail/xenomai/attachments/20200225/cf1788ef/attachment.obj>
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Interrupt timeout
2020-02-25 15:24 ` Greg Gallagher
@ 2020-02-25 15:40 ` Lange Norbert
2020-02-25 15:46 ` Jan Kiszka
0 siblings, 1 reply; 9+ messages in thread
From: Lange Norbert @ 2020-02-25 15:40 UTC (permalink / raw)
To: Greg Gallagher
Cc: Xenomai (xenomai@xenomai.org), Philippe Gerum (rpm@xenomai.org)
> -----Original Message-----
> From: Greg Gallagher <greg@embeddedgreg.com>
> Sent: Dienstag, 25. Februar 2020 16:24
> To: Lange Norbert <norbert.lange@andritz.com>
> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>; Philippe
> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> Hi,
>
> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> <xenomai@xenomai.org> wrote:
> >
> > Hello,
> >
> > I hope you can give me some pointers to understand why this Bug
> happened.
> > It seems an interrupt got lost somehow, maybe some issue with
> leveltriggers?
> >
> > Note that I run on an Apollo Lake, which would normally use
> > PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
> > works fine from eMMC otherwise, this bug occurred only once so far.
> >
> > 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> > Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
> > Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> > Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
> > modules: sdhci_pci
> > 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> > Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> > Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> > Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
> > sdhci_pci
> >
> > # cat /proc/interrupts
> > CPU0 CPU1 CPU2 CPU3
> > 0: 50 0 0 0 IO-APIC 2-edge timer
> > 4: 0 0 498 0 IO-APIC 4-edge ttyS0
> > 8: 0 0 0 0 IO-APIC 8-edge rtc0
> > 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> > 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
> > 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
> > 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
> > 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
> > 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
> > 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
> > 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
> > 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
> > 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
> > 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
> > 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
> > 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
> > 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
> > 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
> > 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
> > 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-2
> > 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
> > 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
> > NMI: 0 0 0 0 Non-maskable interrupts
> > LOC: 47839 147583 13807 32602 Local timer interrupts
> > SPU: 0 0 0 0 Spurious interrupts
> > PMI: 0 0 0 0 Performance monitoring interrupts
> > IWI: 0 0 0 0 IRQ work interrupts
> > RTR: 0 0 0 0 APIC ICR read retries
> > RES: 11955 6931 6567 4946 Rescheduling interrupts
> > CAL: 268 223 210 217 Function call interrupts
> > TLB: 63 57 61 50 TLB shootdowns
> > TRM: 0 0 0 0 Thermal event interrupts
> > THR: 0 0 0 0 Threshold APIC interrupts
> > MCE: 0 0 0 0 Machine check exceptions
> > MCP: 5 6 6 6 Machine check polls
> > ERR: 0
> > MIS: 0
> > PIN: 0 0 0 0 Posted-interrupt notification event
> > NPI: 0 0 0 0 Nested posted-interrupt event
> > PIW: 0 0 0 0 Posted-interrupt wakeup event
> >
> >
>
> > [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
> > [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> > =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
> > Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
> > | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument: 0x00118678 |
> Trn mode: 0x0000002b
> > [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> > [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> > [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> > [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
> > [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> > 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> 0x00000001
> > [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> > [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> > [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> > [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> > [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016] mmc0:
> > sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> > 238.353180] mmc0: sdhci:
> ============================================
> > [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
> > [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> > =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
> > Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
> > | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument: 0x00000000 |
> Trn mode: 0x00000023
> > [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> > [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> > [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> > [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
> > [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> > 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> 0x00000001
> > [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> > [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
> > [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> > [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> > [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065] mmc0:
> > sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> > 248.593228] mmc0: sdhci:
> ============================================
> > [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> > [ 249.143462] rcu: 1-...!: (1 GPs behind) idle=84a/1/0x4000000000000000
> softirq=5875/5876 fqs=2479
> > [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
> > [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
> > [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
> > f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu: RCU
> grace-period kthread stack dump:
> > [ 249.179364] rcu_sched I 0 10 2 0x80000000
> > [ 249.184882] Call Trace:
> > [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
> > schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
> > 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
> > rcu_gp_kthread+0x522/0xec0 [ 249.206667] ? call_rcu_sched+0x20/0x20
> > [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
> > kthread_create_worker_on_cpu+0x70/0x70
> > [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
> > 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508] mmc0:
> > Timeout waiting for hardware cmd interrupt.
> > [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> > =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
> > Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
> > | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument: 0x00010000 |
> Trn mode: 0x00000023
> > [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> > [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> > [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> > [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> > [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> > 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> 0x00000001
> > [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> > [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> > [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
> > [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> > [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411] mmc0:
> > sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> > 258.833578] mmc0: sdhci:
> ============================================
> > [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> > [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> > =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
> > Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
> > | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument: 0x00010000 |
> Trn mode: 0x00000023
> > [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> > [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> > [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> > [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> > [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> > 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> 0x00000001
> > [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> > [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> > [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> > [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> > [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373] mmc0:
> > sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> > 269.073542] mmc0: sdhci:
> ============================================
> >
> >
> > Kind regards,
> > Norbert
> I've seen this before when I was playing around with getting the ipipe to run
> on my edison board. I don't think it's ipipe related. I found this thread on
> patchwork:
>
> https://hes32-
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-a301-
> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
Ah thanks,
Quote: "Commands and resets are under spin lock, so no possibility for preemption,
and certainly a few microseconds isn't going to make any difference to these
timeouts, so I don't see how this patch could help."
So I guess this is not an issue under plain Linux, but Cobalt could preempt for a
couple of ms (or even longer if you managed to get an RT thread stuck).
>
> There was another thread on the Ubuntu forums that solved the issue by
> disabling the sdcard in the bios.
I can't do that, as this is the eMMC the system is running from.
BTW, if I would want to limit such issues, could I limit Linux IRQs (apart from a handful of necessary maintainance/timer) to a single core?
Thanks, Norbert
________________________________
This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
ANDRITZ HYDRO GmbH
Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation
Firmensitz/ Registered seat: Wien
Firmenbuchgericht/ Court of registry: Handelsgericht Wien
Firmenbuchnummer/ Company registration: FN 61833 g
DVR: 0605077
UID-Nr.: ATU14756806
Thank You
________________________________
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Interrupt timeout
2020-02-25 15:40 ` Lange Norbert
@ 2020-02-25 15:46 ` Jan Kiszka
2020-02-25 15:51 ` Greg Gallagher
2020-02-25 15:55 ` Lange Norbert
0 siblings, 2 replies; 9+ messages in thread
From: Jan Kiszka @ 2020-02-25 15:46 UTC (permalink / raw)
To: Lange Norbert, Greg Gallagher; +Cc: Xenomai (xenomai@xenomai.org)
On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
>
>
>> -----Original Message-----
>> From: Greg Gallagher <greg@embeddedgreg.com>
>> Sent: Dienstag, 25. Februar 2020 16:24
>> To: Lange Norbert <norbert.lange@andritz.com>
>> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>; Philippe
>> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
>> Subject: Re: Interrupt timeout
>>
>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
>> ATTACHMENTS.
>>
>>
>> Hi,
>>
>> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
>> <xenomai@xenomai.org> wrote:
>>>
>>> Hello,
>>>
>>> I hope you can give me some pointers to understand why this Bug
>> happened.
>>> It seems an interrupt got lost somehow, maybe some issue with
>> leveltriggers?
>>>
>>> Note that I run on an Apollo Lake, which would normally use
>>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
>>> works fine from eMMC otherwise, this bug occurred only once so far.
>>>
>>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
>>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
>>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
>>> modules: sdhci_pci
>>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
>>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
>>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
>>> sdhci_pci
>>>
>>> # cat /proc/interrupts
>>> CPU0 CPU1 CPU2 CPU3
>>> 0: 50 0 0 0 IO-APIC 2-edge timer
>>> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
>>> 8: 0 0 0 0 IO-APIC 8-edge rtc0
>>> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
>>> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
>>> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
>>> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
>>> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
>>> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
>>> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
>>> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
>>> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
>>> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
>>> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
>>> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
>>> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
>>> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
>>> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
>>> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
>>> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-2
>>> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
>>> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
>>> NMI: 0 0 0 0 Non-maskable interrupts
>>> LOC: 47839 147583 13807 32602 Local timer interrupts
>>> SPU: 0 0 0 0 Spurious interrupts
>>> PMI: 0 0 0 0 Performance monitoring interrupts
>>> IWI: 0 0 0 0 IRQ work interrupts
>>> RTR: 0 0 0 0 APIC ICR read retries
>>> RES: 11955 6931 6567 4946 Rescheduling interrupts
>>> CAL: 268 223 210 217 Function call interrupts
>>> TLB: 63 57 61 50 TLB shootdowns
>>> TRM: 0 0 0 0 Thermal event interrupts
>>> THR: 0 0 0 0 Threshold APIC interrupts
>>> MCE: 0 0 0 0 Machine check exceptions
>>> MCP: 5 6 6 6 Machine check polls
>>> ERR: 0
>>> MIS: 0
>>> PIN: 0 0 0 0 Posted-interrupt notification event
>>> NPI: 0 0 0 0 Nested posted-interrupt event
>>> PIW: 0 0 0 0 Posted-interrupt wakeup event
>>>
>>>
>>
>>> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
>>> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>> =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
>>> Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
>>> | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument: 0x00118678 |
>> Trn mode: 0x0000002b
>>> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
>>> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>> 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>> 0x00000001
>>> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
>>> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016] mmc0:
>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>> 238.353180] mmc0: sdhci:
>> ============================================
>>> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
>>> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>> =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
>>> Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
>>> | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument: 0x00000000 |
>> Trn mode: 0x00000023
>>> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
>>> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>> 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>> 0x00000001
>>> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
>>> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065] mmc0:
>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>> 248.593228] mmc0: sdhci:
>> ============================================
>>> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>>> [ 249.143462] rcu: 1-...!: (1 GPs behind) idle=84a/1/0x4000000000000000
>> softirq=5875/5876 fqs=2479
>>> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
>>> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
>>> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
>>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu: RCU
>> grace-period kthread stack dump:
>>> [ 249.179364] rcu_sched I 0 10 2 0x80000000
>>> [ 249.184882] Call Trace:
>>> [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
>>> schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
>>> 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
>>> rcu_gp_kthread+0x522/0xec0 [ 249.206667] ? call_rcu_sched+0x20/0x20
>>> [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
>>> kthread_create_worker_on_cpu+0x70/0x70
>>> [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
>>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508] mmc0:
>>> Timeout waiting for hardware cmd interrupt.
>>> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>> =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
>>> Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
>>> | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument: 0x00010000 |
>> Trn mode: 0x00000023
>>> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
>>> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>> 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>> 0x00000001
>>> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
>>> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
>>> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411] mmc0:
>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>> 258.833578] mmc0: sdhci:
>> ============================================
>>> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
>>> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>> =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
>>> Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
>>> | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument: 0x00010000 |
>> Trn mode: 0x00000023
>>> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
>>> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>> 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>> 0x00000001
>>> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
>>> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373] mmc0:
>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>> 269.073542] mmc0: sdhci:
>> ============================================
>>>
>>>
>>> Kind regards,
>>> Norbert
>> I've seen this before when I was playing around with getting the ipipe to run
>> on my edison board. I don't think it's ipipe related. I found this thread on
>> patchwork:
>>
>> https://hes32-
>> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
>> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-a301-
>> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
>> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
>
> Ah thanks,
>
> Quote: "Commands and resets are under spin lock, so no possibility for preemption,
> and certainly a few microseconds isn't going to make any difference to these
> timeouts, so I don't see how this patch could help."
>
> So I guess this is not an issue under plain Linux, but Cobalt could preempt for a
> couple of ms (or even longer if you managed to get an RT thread stuck).
That may also be different on PREEMPT-RT.
>>
>> There was another thread on the Ubuntu forums that solved the issue by
>> disabling the sdcard in the bios.
>
> I can't do that, as this is the eMMC the system is running from.
>
> BTW, if I would want to limit such issues, could I limit Linux IRQs (apart from a handful of necessary maintainance/timer) to a single core?
>
Yes, that's the best option on a multicore system for both sides, RT and
Linux.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Interrupt timeout
2020-02-25 15:46 ` Jan Kiszka
@ 2020-02-25 15:51 ` Greg Gallagher
2020-02-25 15:55 ` Lange Norbert
1 sibling, 0 replies; 9+ messages in thread
From: Greg Gallagher @ 2020-02-25 15:51 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Lange Norbert, Xenomai (xenomai@xenomai.org)
On Tue, Feb 25, 2020 at 10:46 AM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >
> >
> >> -----Original Message-----
> >> From: Greg Gallagher <greg@embeddedgreg.com>
> >> Sent: Dienstag, 25. Februar 2020 16:24
> >> To: Lange Norbert <norbert.lange@andritz.com>
> >> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>; Philippe
> >> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> >> ATTACHMENTS.
> >>
> >>
> >> Hi,
> >>
> >> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >> <xenomai@xenomai.org> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I hope you can give me some pointers to understand why this Bug
> >> happened.
> >>> It seems an interrupt got lost somehow, maybe some issue with
> >> leveltriggers?
> >>>
> >>> Note that I run on an Apollo Lake, which would normally use
> >>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
> >>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>
> >>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> >>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
> >>> modules: sdhci_pci
> >>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900
> >>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
> >>> sdhci_pci
> >>>
> >>> # cat /proc/interrupts
> >>> CPU0 CPU1 CPU2 CPU3
> >>> 0: 50 0 0 0 IO-APIC 2-edge timer
> >>> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
> >>> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> >>> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> >>> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
> >>> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
> >>> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
> >>> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
> >>> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
> >>> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
> >>> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
> >>> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
> >>> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
> >>> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
> >>> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
> >>> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
> >>> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
> >>> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-0
> >>> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
> >>> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-2
> >>> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
> >>> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
> >>> NMI: 0 0 0 0 Non-maskable interrupts
> >>> LOC: 47839 147583 13807 32602 Local timer interrupts
> >>> SPU: 0 0 0 0 Spurious interrupts
> >>> PMI: 0 0 0 0 Performance monitoring interrupts
> >>> IWI: 0 0 0 0 IRQ work interrupts
> >>> RTR: 0 0 0 0 APIC ICR read retries
> >>> RES: 11955 6931 6567 4946 Rescheduling interrupts
> >>> CAL: 268 223 210 217 Function call interrupts
> >>> TLB: 63 57 61 50 TLB shootdowns
> >>> TRM: 0 0 0 0 Thermal event interrupts
> >>> THR: 0 0 0 0 Threshold APIC interrupts
> >>> MCE: 0 0 0 0 Machine check exceptions
> >>> MCP: 5 6 6 6 Machine check polls
> >>> ERR: 0
> >>> MIS: 0
> >>> PIN: 0 0 0 0 Posted-interrupt notification event
> >>> NPI: 0 0 0 0 Nested posted-interrupt event
> >>> PIW: 0 0 0 0 Posted-interrupt wakeup event
> >>>
> >>>
> >>
> >>> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
> >>> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument: 0x00118678 |
> >> Trn mode: 0x0000002b
> >>> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
> >>> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> >>> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 238.353180] mmc0: sdhci:
> >> ============================================
> >>> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
> >>> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument: 0x00000000 |
> >> Trn mode: 0x00000023
> >>> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
> >>> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
> >>> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 248.593228] mmc0: sdhci:
> >> ============================================
> >>> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >>> [ 249.143462] rcu: 1-...!: (1 GPs behind) idle=84a/1/0x4000000000000000
> >> softirq=5875/5876 fqs=2479
> >>> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
> >>> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
> >>> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
> >>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu: RCU
> >> grace-period kthread stack dump:
> >>> [ 249.179364] rcu_sched I 0 10 2 0x80000000
> >>> [ 249.184882] Call Trace:
> >>> [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
> >>> schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
> >>> 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
> >>> rcu_gp_kthread+0x522/0xec0 [ 249.206667] ? call_rcu_sched+0x20/0x20
> >>> [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
> >>> kthread_create_worker_on_cpu+0x70/0x70
> >>> [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
> >>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508] mmc0:
> >>> Timeout waiting for hardware cmd interrupt.
> >>> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument: 0x00010000 |
> >> Trn mode: 0x00000023
> >>> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> >>> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> >>> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
> >>> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 258.833578] mmc0: sdhci:
> >> ============================================
> >>> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> >>> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument: 0x00010000 |
> >> Trn mode: 0x00000023
> >>> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> >>> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> >>> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 269.073542] mmc0: sdhci:
> >> ============================================
> >>>
> >>>
> >>> Kind regards,
> >>> Norbert
> >> I've seen this before when I was playing around with getting the ipipe to run
> >> on my edison board. I don't think it's ipipe related. I found this thread on
> >> patchwork:
> >>
> >> https://hes32-
> >> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
> >> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-a301-
> >> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> >> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
> >
> > Ah thanks,
> >
> > Quote: "Commands and resets are under spin lock, so no possibility for preemption,
> > and certainly a few microseconds isn't going to make any difference to these
> > timeouts, so I don't see how this patch could help."
> >
> > So I guess this is not an issue under plain Linux, but Cobalt could preempt for a
> > couple of ms (or even longer if you managed to get an RT thread stuck).
>
> That may also be different on PREEMPT-RT.
>
I agree, cobalt is probably hitting the issue more often. I see this
issue more when using PREEMPT-RT, I've never seen it (yet) with a
plain kernel.
> >>
> >> There was another thread on the Ubuntu forums that solved the issue by
> >> disabling the sdcard in the bios.
> >
> > I can't do that, as this is the eMMC the system is running from.
> >
> > BTW, if I would want to limit such issues, could I limit Linux IRQs (apart from a handful of necessary maintainance/timer) to a single core?
> >
>
> Yes, that's the best option on a multicore system for both sides, RT and
> Linux.
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Interrupt timeout
2020-02-25 15:46 ` Jan Kiszka
2020-02-25 15:51 ` Greg Gallagher
@ 2020-02-25 15:55 ` Lange Norbert
2020-02-25 17:05 ` Jan Kiszka
1 sibling, 1 reply; 9+ messages in thread
From: Lange Norbert @ 2020-02-25 15:55 UTC (permalink / raw)
To: Jan Kiszka, Greg Gallagher; +Cc: Xenomai (xenomai@xenomai.org)
> -----Original Message-----
> From: Jan Kiszka <jan.kiszka@siemens.com>
> Sent: Dienstag, 25. Februar 2020 16:47
> To: Lange Norbert <norbert.lange@andritz.com>; Greg Gallagher
> <greg@embeddedgreg.com>
> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >
> >
> >> -----Original Message-----
> >> From: Greg Gallagher <greg@embeddedgreg.com>
> >> Sent: Dienstag, 25. Februar 2020 16:24
> >> To: Lange Norbert <norbert.lange@andritz.com>
> >> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>; Philippe
> >> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> Hi,
> >>
> >> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >> <xenomai@xenomai.org> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I hope you can give me some pointers to understand why this Bug
> >> happened.
> >>> It seems an interrupt got lost somehow, maybe some issue with
> >> leveltriggers?
> >>>
> >>> Note that I run on an Apollo Lake, which would normally use
> >>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
> >>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>
> >>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
> 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> E3900
> >>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
> >>> modules: sdhci_pci
> >>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> E3900
> >>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
> >>> sdhci_pci
> >>>
> >>> # cat /proc/interrupts
> >>> CPU0 CPU1 CPU2 CPU3
> >>> 0: 50 0 0 0 IO-APIC 2-edge timer
> >>> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
> >>> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> >>> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> >>> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
> >>> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
> >>> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
> >>> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
> >>> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
> >>> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
> >>> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
> >>> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
> >>> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
> >>> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
> >>> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
> >>> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
> >>> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
> >>> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-
> 0
> >>> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
> >>> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-
> 2
> >>> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
> >>> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
> >>> NMI: 0 0 0 0 Non-maskable interrupts
> >>> LOC: 47839 147583 13807 32602 Local timer interrupts
> >>> SPU: 0 0 0 0 Spurious interrupts
> >>> PMI: 0 0 0 0 Performance monitoring interrupts
> >>> IWI: 0 0 0 0 IRQ work interrupts
> >>> RTR: 0 0 0 0 APIC ICR read retries
> >>> RES: 11955 6931 6567 4946 Rescheduling interrupts
> >>> CAL: 268 223 210 217 Function call interrupts
> >>> TLB: 63 57 61 50 TLB shootdowns
> >>> TRM: 0 0 0 0 Thermal event interrupts
> >>> THR: 0 0 0 0 Threshold APIC interrupts
> >>> MCE: 0 0 0 0 Machine check exceptions
> >>> MCP: 5 6 6 6 Machine check polls
> >>> ERR: 0
> >>> MIS: 0
> >>> PIN: 0 0 0 0 Posted-interrupt notification event
> >>> NPI: 0 0 0 0 Nested posted-interrupt event
> >>> PIW: 0 0 0 0 Posted-interrupt wakeup event
> >>>
> >>>
> >>
> >>> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
> >>> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument: 0x00118678
> |
> >> Trn mode: 0x0000002b
> >>> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
> >>> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
> >>> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 238.353180] mmc0: sdhci:
> >> ============================================
> >>> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
> >>> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument: 0x00000000
> |
> >> Trn mode: 0x00000023
> >>> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
> >>> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
> >>> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 248.593228] mmc0: sdhci:
> >> ============================================
> >>> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >>> [ 249.143462] rcu: 1-...!: (1 GPs behind)
> idle=84a/1/0x4000000000000000
> >> softirq=5875/5876 fqs=2479
> >>> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
> >>> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
> >>> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
> >>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu:
> RCU
> >> grace-period kthread stack dump:
> >>> [ 249.179364] rcu_sched I 0 10 2 0x80000000
> >>> [ 249.184882] Call Trace:
> >>> [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
> >>> schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
> >>> 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
> >>> rcu_gp_kthread+0x522/0xec0 [ 249.206667] ?
> call_rcu_sched+0x20/0x20
> >>> [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
> >>> kthread_create_worker_on_cpu+0x70/0x70
> >>> [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
> >>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508] mmc0:
> >>> Timeout waiting for hardware cmd interrupt.
> >>> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument: 0x00010000
> |
> >> Trn mode: 0x00000023
> >>> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> >>> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> >>> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
> >>> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 258.833578] mmc0: sdhci:
> >> ============================================
> >>> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> >>> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>> =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
> >>> Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
> >>> | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument: 0x00010000
> |
> >> Trn mode: 0x00000023
> >>> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
> >>> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
> >>> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
> >>> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
> >>> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>> 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
> >> 0x00000001
> >>> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
> >>> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
> >>> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
> >>> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373] mmc0:
> >>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>> 269.073542] mmc0: sdhci:
> >> ============================================
> >>>
> >>>
> >>> Kind regards,
> >>> Norbert
> >> I've seen this before when I was playing around with getting the ipipe to
> run
> >> on my edison board. I don't think it's ipipe related. I found this thread on
> >> patchwork:
> >>
> >> https://hes32-
> >>
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
> >> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-
> a301-
> >> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> >> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
> >
> > Ah thanks,
> >
> > Quote: "Commands and resets are under spin lock, so no possibility for
> preemption,
> > and certainly a few microseconds isn't going to make any difference to
> these
> > timeouts, so I don't see how this patch could help."
> >
> > So I guess this is not an issue under plain Linux, but Cobalt could preempt
> for a
> > couple of ms (or even longer if you managed to get an RT thread stuck).
>
> That may also be different on PREEMPT-RT.
>
> >>
> >> There was another thread on the Ubuntu forums that solved the issue by
> >> disabling the sdcard in the bios.
> >
> > I can't do that, as this is the eMMC the system is running from.
> >
> > BTW, if I would want to limit such issues, could I limit Linux IRQs (apart from
> a handful of necessary maintainance/timer) to a single core?
> >
>
> Yes, that's the best option on a multicore system for both sides, RT and
> Linux.
Next question: how?
I played around with isolcpus, but that had severe sideeffects:
https://xenomai.org/pipermail/xenomai/2019-October/041902.html
This was before Philippe fixed some serious scheduler bugs,
but I stopped trying to do anything "smart" after ending up with random weirdness.
Norbert
________________________________
This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
ANDRITZ HYDRO GmbH
Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation
Firmensitz/ Registered seat: Wien
Firmenbuchgericht/ Court of registry: Handelsgericht Wien
Firmenbuchnummer/ Company registration: FN 61833 g
DVR: 0605077
UID-Nr.: ATU14756806
Thank You
________________________________
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Interrupt timeout
2020-02-25 15:55 ` Lange Norbert
@ 2020-02-25 17:05 ` Jan Kiszka
2020-02-25 17:53 ` Lange Norbert
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2020-02-25 17:05 UTC (permalink / raw)
To: Lange Norbert, Greg Gallagher; +Cc: Xenomai (xenomai@xenomai.org)
On 25.02.20 16:55, Lange Norbert wrote:
>
>
>> -----Original Message-----
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>> Sent: Dienstag, 25. Februar 2020 16:47
>> To: Lange Norbert <norbert.lange@andritz.com>; Greg Gallagher
>> <greg@embeddedgreg.com>
>> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
>> Subject: Re: Interrupt timeout
>>
>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
>> ATTACHMENTS.
>>
>>
>> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Greg Gallagher <greg@embeddedgreg.com>
>>>> Sent: Dienstag, 25. Februar 2020 16:24
>>>> To: Lange Norbert <norbert.lange@andritz.com>
>>>> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>; Philippe
>>>> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
>>>> Subject: Re: Interrupt timeout
>>>>
>>>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
>> ATTACHMENTS.
>>>>
>>>>
>>>> Hi,
>>>>
>>>> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
>>>> <xenomai@xenomai.org> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I hope you can give me some pointers to understand why this Bug
>>>> happened.
>>>>> It seems an interrupt got lost somehow, maybe some issue with
>>>> leveltriggers?
>>>>>
>>>>> Note that I run on an Apollo Lake, which would normally use
>>>>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The system
>>>>> works fine from eMMC otherwise, this bug occurred only once so far.
>>>>>
>>>>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
>>>>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
>> 0b)
>>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
>> E3900
>>>>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci Kernel
>>>>> modules: sdhci_pci
>>>>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
>>>>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
>>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
>> E3900
>>>>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel modules:
>>>>> sdhci_pci
>>>>>
>>>>> # cat /proc/interrupts
>>>>> CPU0 CPU1 CPU2 CPU3
>>>>> 0: 50 0 0 0 IO-APIC 2-edge timer
>>>>> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
>>>>> 8: 0 0 0 0 IO-APIC 8-edge rtc0
>>>>> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
>>>>> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
>>>>> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
>>>>> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
>>>>> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
>>>>> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
>>>>> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
>>>>> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
>>>>> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
>>>>> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-rx-0
>>>>> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
>>>>> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
>>>>> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-1
>>>>> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
>>>>> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-TxRx-
>> 0
>>>>> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-TxRx-1
>>>>> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-TxRx-
>> 2
>>>>> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-TxRx-3
>>>>> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
>>>>> NMI: 0 0 0 0 Non-maskable interrupts
>>>>> LOC: 47839 147583 13807 32602 Local timer interrupts
>>>>> SPU: 0 0 0 0 Spurious interrupts
>>>>> PMI: 0 0 0 0 Performance monitoring interrupts
>>>>> IWI: 0 0 0 0 IRQ work interrupts
>>>>> RTR: 0 0 0 0 APIC ICR read retries
>>>>> RES: 11955 6931 6567 4946 Rescheduling interrupts
>>>>> CAL: 268 223 210 217 Function call interrupts
>>>>> TLB: 63 57 61 50 TLB shootdowns
>>>>> TRM: 0 0 0 0 Thermal event interrupts
>>>>> THR: 0 0 0 0 Threshold APIC interrupts
>>>>> MCE: 0 0 0 0 Machine check exceptions
>>>>> MCP: 5 6 6 6 Machine check polls
>>>>> ERR: 0
>>>>> MIS: 0
>>>>> PIN: 0 0 0 0 Posted-interrupt notification event
>>>>> NPI: 0 0 0 0 Nested posted-interrupt event
>>>>> PIW: 0 0 0 0 Posted-interrupt wakeup event
>>>>>
>>>>>
>>>>
>>>>> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
>>>>> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>>>> =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
>>>>> Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
>>>>> | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument: 0x00118678
>> |
>>>> Trn mode: 0x0000002b
>>>>> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>>>> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>>>> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>>>> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat: 0x00000003
>>>>> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>>>> 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>>>> 0x00000001
>>>>> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>>>> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr: 0x00000000
>>>>> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>>>> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>>>> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016] mmc0:
>>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>>>> 238.353180] mmc0: sdhci:
>>>> ============================================
>>>>> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
>>>>> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>>>> =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
>>>>> Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
>>>>> | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument: 0x00000000
>> |
>>>> Trn mode: 0x00000023
>>>>> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>>>> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>>>> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>>>> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018002
>>>>> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>>>> 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>>>> 0x00000001
>>>>> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>>>> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr: 0x00000000
>>>>> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>>>> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>>>> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065] mmc0:
>>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>>>> 248.593228] mmc0: sdhci:
>>>> ============================================
>>>>> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>>>>> [ 249.143462] rcu: 1-...!: (1 GPs behind)
>> idle=84a/1/0x4000000000000000
>>>> softirq=5875/5876 fqs=2479
>>>>> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
>>>>> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
>>>>> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
>>>>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu:
>> RCU
>>>> grace-period kthread stack dump:
>>>>> [ 249.179364] rcu_sched I 0 10 2 0x80000000
>>>>> [ 249.184882] Call Trace:
>>>>> [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
>>>>> schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
>>>>> 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
>>>>> rcu_gp_kthread+0x522/0xec0 [ 249.206667] ?
>> call_rcu_sched+0x20/0x20
>>>>> [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
>>>>> kthread_create_worker_on_cpu+0x70/0x70
>>>>> [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
>>>>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508] mmc0:
>>>>> Timeout waiting for hardware cmd interrupt.
>>>>> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>>>> =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
>>>>> Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
>>>>> | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument: 0x00010000
>> |
>>>> Trn mode: 0x00000023
>>>>> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>>>> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>>>> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>>>> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
>>>>> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>>>> 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>>>> 0x00000001
>>>>> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>>>> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
>>>>> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
>>>>> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>>>> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411] mmc0:
>>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>>>> 258.833578] mmc0: sdhci:
>>>> ============================================
>>>>> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
>>>>> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
>>>>> =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
>>>>> Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
>>>>> | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument: 0x00010000
>> |
>>>> Trn mode: 0x00000023
>>>>> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl: 0x0000003d
>>>>> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap: 0x00000080
>>>>> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock: 0x00000007
>>>>> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat: 0x00018001
>>>>> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
>>>>> 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot int:
>>>> 0x00000001
>>>>> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1: 0x80000807
>>>>> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr: 0x00000000
>>>>> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
>>>>> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]: 0x00000900
>>>>> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373] mmc0:
>>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
>>>>> 269.073542] mmc0: sdhci:
>>>> ============================================
>>>>>
>>>>>
>>>>> Kind regards,
>>>>> Norbert
>>>> I've seen this before when I was playing around with getting the ipipe to
>> run
>>>> on my edison board. I don't think it's ipipe related. I found this thread on
>>>> patchwork:
>>>>
>>>> https://hes32-
>>>>
>> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
>>>> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-
>> a301-
>>>> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
>>>> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
>>>
>>> Ah thanks,
>>>
>>> Quote: "Commands and resets are under spin lock, so no possibility for
>> preemption,
>>> and certainly a few microseconds isn't going to make any difference to
>> these
>>> timeouts, so I don't see how this patch could help."
>>>
>>> So I guess this is not an issue under plain Linux, but Cobalt could preempt
>> for a
>>> couple of ms (or even longer if you managed to get an RT thread stuck).
>>
>> That may also be different on PREEMPT-RT.
>>
>>>>
>>>> There was another thread on the Ubuntu forums that solved the issue by
>>>> disabling the sdcard in the bios.
>>>
>>> I can't do that, as this is the eMMC the system is running from.
>>>
>>> BTW, if I would want to limit such issues, could I limit Linux IRQs (apart from
>> a handful of necessary maintainance/timer) to a single core?
>>>
>>
>> Yes, that's the best option on a multicore system for both sides, RT and
>> Linux.
>
> Next question: how?
>
> I played around with isolcpus, but that had severe sideeffects:
> https://xenomai.org/pipermail/xenomai/2019-October/041902.html
>
> This was before Philippe fixed some serious scheduler bugs,
> but I stopped trying to do anything "smart" after ending up with random weirdness.
>
isolcpus should work. If not, we need to debug and fix.
Less convenient alternative: manually tuning /proc/irq/*/smp_affinity or
instructing irqbalanced to keep some cores free. cgroups with CPU
affinities can help to keep tasks away.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: Interrupt timeout
2020-02-25 17:05 ` Jan Kiszka
@ 2020-02-25 17:53 ` Lange Norbert
2020-02-25 18:17 ` Jan Kiszka
0 siblings, 1 reply; 9+ messages in thread
From: Lange Norbert @ 2020-02-25 17:53 UTC (permalink / raw)
To: Jan Kiszka, Greg Gallagher; +Cc: Xenomai (xenomai@xenomai.org)
> -----Original Message-----
> From: Jan Kiszka <jan.kiszka@siemens.com>
> Sent: Dienstag, 25. Februar 2020 18:05
> To: Lange Norbert <norbert.lange@andritz.com>; Greg Gallagher
> <greg@embeddedgreg.com>
> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
> Subject: Re: Interrupt timeout
>
> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
>
>
> On 25.02.20 16:55, Lange Norbert wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jan Kiszka <jan.kiszka@siemens.com>
> >> Sent: Dienstag, 25. Februar 2020 16:47
> >> To: Lange Norbert <norbert.lange@andritz.com>; Greg Gallagher
> >> <greg@embeddedgreg.com>
> >> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>
> >> Subject: Re: Interrupt timeout
> >>
> >> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> ATTACHMENTS.
> >>
> >>
> >> On 25.02.20 16:40, Lange Norbert via Xenomai wrote:
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Greg Gallagher <greg@embeddedgreg.com>
> >>>> Sent: Dienstag, 25. Februar 2020 16:24
> >>>> To: Lange Norbert <norbert.lange@andritz.com>
> >>>> Cc: Xenomai (xenomai@xenomai.org) <xenomai@xenomai.org>;
> Philippe
> >>>> Gerum (rpm@xenomai.org) <rpm@xenomai.org>
> >>>> Subject: Re: Interrupt timeout
> >>>>
> >>>> NON-ANDRITZ SOURCE: BE CAUTIOUS WITH CONTENT, LINKS OR
> >> ATTACHMENTS.
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> On Tue, Feb 25, 2020 at 8:57 AM Lange Norbert via Xenomai
> >>>> <xenomai@xenomai.org> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I hope you can give me some pointers to understand why this Bug
> >>>> happened.
> >>>>> It seems an interrupt got lost somehow, maybe some issue with
> >>>> leveltriggers?
> >>>>>
> >>>>> Note that I run on an Apollo Lake, which would normally use
> >>>>> PINCTRL_BROXTON, but that’s not fixed up for Xenomai yet. The
> system
> >>>>> works fine from eMMC otherwise, this bug occurred only once so far.
> >>>>>
> >>>>> 00:1b.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series SDXC/MMC Host Controller (rev
> >> 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series SDXC/MMC Host Controller Kernel driver in use: sdhci-pci
> Kernel
> >>>>> modules: sdhci_pci
> >>>>> 00:1c.0 SD Host controller: Intel Corporation Atom/Celeron/Pentium
> >>>>> Processor N4200/N3350/E3900 Series eMMC Controller (rev 0b)
> >>>>> Subsystem: Intel Corporation Celeron N3350/Pentium N4200/Atom
> >> E3900
> >>>>> Series eMMC Controller Kernel driver in use: sdhci-pci Kernel
> modules:
> >>>>> sdhci_pci
> >>>>>
> >>>>> # cat /proc/interrupts
> >>>>> CPU0 CPU1 CPU2 CPU3
> >>>>> 0: 50 0 0 0 IO-APIC 2-edge timer
> >>>>> 4: 0 0 498 0 IO-APIC 4-edge ttyS0
> >>>>> 8: 0 0 0 0 IO-APIC 8-edge rtc0
> >>>>> 9: 0 0 0 0 IO-APIC 9-fasteoi acpi
> >>>>> 20: 20 0 0 0 IO-APIC 20-fasteoi i801_smbus
> >>>>> 39: 0 1315 0 0 IO-APIC 39-fasteoi mmc0
> >>>>> 129: 0 1 0 0 PCI-MSI 1048576-edge enp2s0
> >>>>> 130: 0 0 839 0 PCI-MSI 1048577-edge enp2s0-rx-0
> >>>>> 131: 0 0 0 773 PCI-MSI 1048578-edge enp2s0-rx-1
> >>>>> 132: 985 0 0 0 PCI-MSI 1048579-edge enp2s0-tx-0
> >>>>> 133: 0 773 0 0 PCI-MSI 1048580-edge enp2s0-tx-1
> >>>>> 134: 0 0 0 1 PCI-MSI 1572864-edge enp3s0
> >>>>> 135: 11464 0 0 0 PCI-MSI 1572865-edge enp3s0-
> rx-0
> >>>>> 136: 0 781 0 0 PCI-MSI 1572866-edge enp3s0-rx-1
> >>>>> 137: 0 0 899 0 PCI-MSI 1572867-edge enp3s0-tx-0
> >>>>> 138: 0 0 0 9701 PCI-MSI 1572868-edge enp3s0-tx-
> 1
> >>>>> 139: 1 0 0 0 PCI-MSI 2097152-edge enp4s0
> >>>>> 140: 0 1985 0 0 PCI-MSI 2097153-edge enp4s0-
> TxRx-
> >> 0
> >>>>> 141: 0 0 774 0 PCI-MSI 2097154-edge enp4s0-
> TxRx-1
> >>>>> 142: 0 0 0 1905 PCI-MSI 2097155-edge enp4s0-
> TxRx-
> >> 2
> >>>>> 143: 775 0 0 0 PCI-MSI 2097156-edge enp4s0-
> TxRx-3
> >>>>> 144: 0 0 97790 0 PCI-MSI 344064-edge xhci_hcd
> >>>>> NMI: 0 0 0 0 Non-maskable interrupts
> >>>>> LOC: 47839 147583 13807 32602 Local timer interrupts
> >>>>> SPU: 0 0 0 0 Spurious interrupts
> >>>>> PMI: 0 0 0 0 Performance monitoring interrupts
> >>>>> IWI: 0 0 0 0 IRQ work interrupts
> >>>>> RTR: 0 0 0 0 APIC ICR read retries
> >>>>> RES: 11955 6931 6567 4946 Rescheduling interrupts
> >>>>> CAL: 268 223 210 217 Function call interrupts
> >>>>> TLB: 63 57 61 50 TLB shootdowns
> >>>>> TRM: 0 0 0 0 Thermal event interrupts
> >>>>> THR: 0 0 0 0 Threshold APIC interrupts
> >>>>> MCE: 0 0 0 0 Machine check exceptions
> >>>>> MCP: 5 6 6 6 Machine check polls
> >>>>> ERR: 0
> >>>>> MIS: 0
> >>>>> PIN: 0 0 0 0 Posted-interrupt notification event
> >>>>> NPI: 0 0 0 0 Nested posted-interrupt event
> >>>>> PIW: 0 0 0 0 Posted-interrupt wakeup event
> >>>>>
> >>>>>
> >>>>
> >>>>> [ 238.245509] mmc0: Timeout waiting for hardware interrupt.
> >>>>> [ 238.250944] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [ 238.257422] mmc0: sdhci: Sys addr: 0x20000008 |
> >>>>> Version: 0x00001002 [ 238.263900] mmc0: sdhci: Blk size: 0x00007200
> >>>>> | Blk cnt: 0x00000000 [ 238.270373] mmc0: sdhci: Argument:
> 0x00118678
> >> |
> >>>> Trn mode: 0x0000002b
> >>>>> [ 238.276848] mmc0: sdhci: Present: 0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [ 238.283323] mmc0: sdhci: Power: 0x0000000b | Blk gap:
> 0x00000080
> >>>>> [ 238.289797] mmc0: sdhci: Wake-up: 0x00000000 | Clock:
> 0x00000007
> >>>>> [ 238.296265] mmc0: sdhci: Timeout: 0x00000007 | Int stat:
> 0x00000003
> >>>>> [ 238.302734] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [ 238.309206] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [ 238.315682] mmc0: sdhci: Caps: 0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [ 238.322154] mmc0: sdhci: Cmd: 0x0000193a | Max curr:
> 0x00000000
> >>>>> [ 238.328618] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>>>> [ 238.335080] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [ 238.341547] mmc0: sdhci: Host ctl2: 0x0000008d [ 238.346016]
> mmc0:
> >>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 238.353180] mmc0: sdhci:
> >>>> ============================================
> >>>>> [ 248.485507] mmc0: Timeout waiting for hardware interrupt.
> >>>>> [ 248.490937] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [ 248.497412] mmc0: sdhci: Sys addr: 0x20000008 |
> >>>>> Version: 0x00001002 [ 248.503887] mmc0: sdhci: Blk size: 0x00007200
> >>>>> | Blk cnt: 0x00000000 [ 248.510362] mmc0: sdhci: Argument:
> 0x00000000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [ 248.516838] mmc0: sdhci: Present: 0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [ 248.523313] mmc0: sdhci: Power: 0x0000000b | Blk gap:
> 0x00000080
> >>>>> [ 248.529790] mmc0: sdhci: Wake-up: 0x00000000 | Clock:
> 0x00000007
> >>>>> [ 248.536266] mmc0: sdhci: Timeout: 0x0000000e | Int stat:
> 0x00018002
> >>>>> [ 248.542742] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [ 248.549219] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [ 248.555693] mmc0: sdhci: Caps: 0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [ 248.562167] mmc0: sdhci: Cmd: 0x00000c1b | Max curr:
> 0x00000000
> >>>>> [ 248.568643] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>>>> [ 248.575117] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [ 248.581594] mmc0: sdhci: Host ctl2: 0x0000008d [ 248.586065]
> mmc0:
> >>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 248.593228] mmc0: sdhci:
> >>>> ============================================
> >>>>> [ 249.137502] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >>>>> [ 249.143462] rcu: 1-...!: (1 GPs behind)
> >> idle=84a/1/0x4000000000000000
> >>>> softirq=5875/5876 fqs=2479
> >>>>> [ 249.152458] rcu: (detected by 3, t=21016 jiffies, g=5793, q=391)
> >>>>> [ 249.158588] Sending NMI from CPU 3 to CPUs 1:
> >>>>> [ 249.163984] rcu: rcu_sched kthread starved for 10920 jiffies! g5793
> >>>>> f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=2 [ 249.174284] rcu:
> >> RCU
> >>>> grace-period kthread stack dump:
> >>>>> [ 249.179364] rcu_sched I 0 10 2 0x80000000
> >>>>> [ 249.184882] Call Trace:
> >>>>> [ 249.187360] ? __schedule+0x340/0x8a0 [ 249.191046]
> >>>>> schedule+0x32/0x80 [ 249.194211] schedule_timeout+0x180/0x360 [
> >>>>> 249.198248] ? __next_timer_interrupt+0xd0/0xd0 [ 249.202806]
> >>>>> rcu_gp_kthread+0x522/0xec0 [ 249.206667] ?
> >> call_rcu_sched+0x20/0x20
> >>>>> [ 249.210528] kthread+0x115/0x130 [ 249.213779] ?
> >>>>> kthread_create_worker_on_cpu+0x70/0x70
> >>>>> [ 249.218861] ret_from_fork+0x20/0x30 [ 256.449655] igb
> >>>>> 0000:04:00.0 enp4s0: igb: enp4s0 NIC Link is Down [ 258.725508]
> mmc0:
> >>>>> Timeout waiting for hardware cmd interrupt.
> >>>>> [ 258.731286] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [ 258.737761] mmc0: sdhci: Sys addr: 0x20000008 |
> >>>>> Version: 0x00001002 [ 258.744238] mmc0: sdhci: Blk size: 0x00007200
> >>>>> | Blk cnt: 0x00000000 [ 258.750716] mmc0: sdhci: Argument:
> 0x00010000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [ 258.757191] mmc0: sdhci: Present: 0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [ 258.763666] mmc0: sdhci: Power: 0x0000000b | Blk gap:
> 0x00000080
> >>>>> [ 258.770143] mmc0: sdhci: Wake-up: 0x00000000 | Clock:
> 0x00000007
> >>>>> [ 258.776620] mmc0: sdhci: Timeout: 0x0000000e | Int stat:
> 0x00018001
> >>>>> [ 258.783097] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [ 258.789570] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [ 258.796044] mmc0: sdhci: Caps: 0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [ 258.802518] mmc0: sdhci: Cmd: 0x00000d1a | Max curr:
> 0x00000000
> >>>>> [ 258.808994] mmc0: sdhci: Resp[0]: 0x00400900 | Resp[1]: 0xffffffff
> >>>>> [ 258.815468] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [ 258.821940] mmc0: sdhci: Host ctl2: 0x0000008d [ 258.826411]
> mmc0:
> >>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 258.833578] mmc0: sdhci:
> >>>> ============================================
> >>>>> [ 268.965509] mmc0: Timeout waiting for hardware cmd interrupt.
> >>>>> [ 268.971285] mmc0: sdhci: ============ SDHCI REGISTER DUMP
> >>>>> =========== [ 268.977759] mmc0: sdhci: Sys addr: 0x20000008 |
> >>>>> Version: 0x00001002 [ 268.984233] mmc0: sdhci: Blk size: 0x00007200
> >>>>> | Blk cnt: 0x00000000 [ 268.990704] mmc0: sdhci: Argument:
> 0x00010000
> >> |
> >>>> Trn mode: 0x00000023
> >>>>> [ 268.997174] mmc0: sdhci: Present: 0x1fff0000 | Host ctl:
> 0x0000003d
> >>>>> [ 269.003647] mmc0: sdhci: Power: 0x0000000b | Blk gap:
> 0x00000080
> >>>>> [ 269.010118] mmc0: sdhci: Wake-up: 0x00000000 | Clock:
> 0x00000007
> >>>>> [ 269.016590] mmc0: sdhci: Timeout: 0x0000000e | Int stat:
> 0x00018001
> >>>>> [ 269.023067] mmc0: sdhci: Int enab: 0x03ff000b | Sig enab:
> >>>>> 0x03ff000b [ 269.029538] mmc0: sdhci: ACmd stat: 0x00000000 | Slot
> int:
> >>>> 0x00000001
> >>>>> [ 269.036011] mmc0: sdhci: Caps: 0x546ec881 | Caps_1:
> 0x80000807
> >>>>> [ 269.042484] mmc0: sdhci: Cmd: 0x00000d1a | Max curr:
> 0x00000000
> >>>>> [ 269.048958] mmc0: sdhci: Resp[0]: 0x00000900 | Resp[1]: 0xffffffff
> >>>>> [ 269.055432] mmc0: sdhci: Resp[2]: 0x328f5903 | Resp[3]:
> 0x00000900
> >>>>> [ 269.061904] mmc0: sdhci: Host ctl2: 0x0000008d [ 269.066373]
> mmc0:
> >>>>> sdhci: ADMA Err: 0x00000000 | ADMA Ptr: 0x0000000179fcb20c [
> >>>>> 269.073542] mmc0: sdhci:
> >>>> ============================================
> >>>>>
> >>>>>
> >>>>> Kind regards,
> >>>>> Norbert
> >>>> I've seen this before when I was playing around with getting the ipipe
> to
> >> run
> >>>> on my edison board. I don't think it's ipipe related. I found this thread
> on
> >>>> patchwork:
> >>>>
> >>>> https://hes32-
> >>>>
> >>
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fpatch
> >>>> work.kernel.org%2fpatch%2f10705823%2f&umid=244f247b-9424-4d97-
> >> a301-
> >>>> 43c83cda477c&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> >>>> f573aabf7731d50ca6bcfe66d4d4d92dfe84b1b2
> >>>
> >>> Ah thanks,
> >>>
> >>> Quote: "Commands and resets are under spin lock, so no possibility for
> >> preemption,
> >>> and certainly a few microseconds isn't going to make any difference to
> >> these
> >>> timeouts, so I don't see how this patch could help."
> >>>
> >>> So I guess this is not an issue under plain Linux, but Cobalt could preempt
> >> for a
> >>> couple of ms (or even longer if you managed to get an RT thread stuck).
> >>
> >> That may also be different on PREEMPT-RT.
> >>
> >>>>
> >>>> There was another thread on the Ubuntu forums that solved the issue
> by
> >>>> disabling the sdcard in the bios.
> >>>
> >>> I can't do that, as this is the eMMC the system is running from.
> >>>
> >>> BTW, if I would want to limit such issues, could I limit Linux IRQs (apart
> from
> >> a handful of necessary maintainance/timer) to a single core?
> >>>
> >>
> >> Yes, that's the best option on a multicore system for both sides, RT and
> >> Linux.
> >
> > Next question: how?
> >
> > I played around with isolcpus, but that had severe sideeffects:
> > https://hes32-
> ctp.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fxenom
> ai.org%2fpipermail%2fxenomai%2f2019%2dOctober%2f041902.html&umid=c
> 44405e5-b378-4e77-a655-
> 510a18e1f8a0&auth=144056baf7302d777acad187aac74d4b9ba425e1-
> 57fcb874bfb6cbc083c56dba173ad30125d41e08
> >
> > This was before Philippe fixed some serious scheduler bugs,
> > but I stopped trying to do anything "smart" after ending up with random
> weirdness.
> >
>
> isolcpus should work. If not, we need to debug and fix.
See it does, atleast I did not run into problems yet.
I added "irqaffinity=0 isolcpus=1-3" to the cmdline, which binds the driver irqs to core 0,
and prevents automatic scheduling to the remaining cpus.
I am a bit unclear how the Xenomai-side is supposed to work.
Setting "xenomai.supported_cpus=0xe" will strictly deny running anything there,
My process did bail out with
[ 27.726681] [Xenomai] thread rtgate[793] switched to non-rt CPU0, aborted.
That means I either have to
echo 0xfffffffe > /proc/xenomai/affinity
or set affinity manually (early enough) each time.
Whatever I do, I guess xenomais "printf" thread will always have to run
on a cobalt-enabled thread, even if its just using linux sycalls?
Why is there no xenomai.affinity kernel commandline parameter (or alternatively an xenomai.isolcpus)?
>
> Less convenient alternative: manually tuning /proc/irq/*/smp_affinity or
> instructing irqbalanced to keep some cores free. cgroups with CPU
> affinities can help to keep tasks away.
Yes, systemd-system.conf has an CPUAffinity option. The Linux-side seems solved
with the two kernel parameters anyways.
Thanks,
Norbert
________________________________
This message and any attachments are solely for the use of the intended recipients. They may contain privileged and/or confidential information or other information protected from disclosure. If you are not an intended recipient, you are hereby notified that you received this email in error and that any review, dissemination, distribution or copying of this email and any attachment is strictly prohibited. If you have received this email in error, please contact the sender and delete the message and any attachment from your system.
ANDRITZ HYDRO GmbH
Rechtsform/ Legal form: Gesellschaft mit beschränkter Haftung / Corporation
Firmensitz/ Registered seat: Wien
Firmenbuchgericht/ Court of registry: Handelsgericht Wien
Firmenbuchnummer/ Company registration: FN 61833 g
DVR: 0605077
UID-Nr.: ATU14756806
Thank You
________________________________
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Interrupt timeout
2020-02-25 17:53 ` Lange Norbert
@ 2020-02-25 18:17 ` Jan Kiszka
0 siblings, 0 replies; 9+ messages in thread
From: Jan Kiszka @ 2020-02-25 18:17 UTC (permalink / raw)
To: Lange Norbert, Greg Gallagher; +Cc: Xenomai (xenomai@xenomai.org)
On 25.02.20 18:53, Lange Norbert wrote:
>> isolcpus should work. If not, we need to debug and fix.
>
> See it does, atleast I did not run into problems yet.
>
> I added "irqaffinity=0 isolcpus=1-3" to the cmdline, which binds the driver irqs to core 0,
> and prevents automatic scheduling to the remaining cpus.
>
> I am a bit unclear how the Xenomai-side is supposed to work.
>
> Setting "xenomai.supported_cpus=0xe" will strictly deny running anything there,
> My process did bail out with
> [ 27.726681] [Xenomai] thread rtgate[793] switched to non-rt CPU0, aborted.
>
> That means I either have to
> echo 0xfffffffe > /proc/xenomai/affinity
>
> or set affinity manually (early enough) each time.
> Whatever I do, I guess xenomais "printf" thread will always have to run
> on a cobalt-enabled thread, even if its just using linux sycalls?
>
> Why is there no xenomai.affinity kernel commandline parameter (or alternatively an xenomai.isolcpus)?
>
xenomai.supported_cpus=0xe should work for your case. Processes starting
RT threads without forcing them to CPU 0 should find them auto-migrated
to a supported RT core. Well, that's without isolcpus. You probably need
to start them right away on one of those isolated cores.
>
>>
>> Less convenient alternative: manually tuning /proc/irq/*/smp_affinity or
>> instructing irqbalanced to keep some cores free. cgroups with CPU
>> affinities can help to keep tasks away.
>
> Yes, systemd-system.conf has an CPUAffinity option. The Linux-side seems solved
> with the two kernel parameters anyways.
Perfect.
Jan
--
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-02-25 18:17 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-25 13:56 Interrupt timeout Lange Norbert
2020-02-25 15:24 ` Greg Gallagher
2020-02-25 15:40 ` Lange Norbert
2020-02-25 15:46 ` Jan Kiszka
2020-02-25 15:51 ` Greg Gallagher
2020-02-25 15:55 ` Lange Norbert
2020-02-25 17:05 ` Jan Kiszka
2020-02-25 17:53 ` Lange Norbert
2020-02-25 18:17 ` Jan Kiszka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.