* Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
@ 2021-08-25 15:24 Marek Marczykowski-Górecki
  2021-08-25 15:33 ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-25 15:24 UTC
  To: xen-devel, Thomas Gleixner, Bjorn Helgaas; +Cc: linux-pci, stable, regressions

Hi,

On recent kernels I get a kernel panic when starting a Xen PV domain with
PCI devices assigned. This happens on 5.10.60 (5.10.54 worked) and
5.4.142 (5.4.136 worked):

[   13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
[   13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
[   13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
[   13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
[   13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
[   14.036142] e1000e: Intel(R) PRO/1000 Network Driver
[   14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[   14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
[   14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[   14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
[   14.045197] #PF: supervisor write access in kernel mode
[   14.045202] #PF: error_code(0x0003) - permissions violation
[   14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
[   14.045227] Oops: 0003 [#1] SMP NOPTI
[   14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G        W         5.14.0-rc7-1.fc32.qubes.x86_64 #15
[   14.045245] Workqueue: events work_for_cpu_fn
[   14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[   14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
[   14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
[   14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
[   14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
[   14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
[   14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
[   14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
[   14.045393] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[   14.045401] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
[   14.045420] Call Trace:
[   14.045431]  e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
[   14.045479]  e1000_probe+0x41f/0xdb0 [e1000e]
[   14.045506]  local_pci_probe+0x42/0x80
[   14.045515]  work_for_cpu_fn+0x16/0x20
[   14.045522]  process_one_work+0x1ec/0x390
[   14.045529]  worker_thread+0x53/0x3e0
[   14.045534]  ? process_one_work+0x390/0x390
[   14.045540]  kthread+0x127/0x150
[   14.045548]  ? set_kthread_struct+0x40/0x40
[   14.045554]  ret_from_fork+0x22/0x30
[   14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
[   14.045620] CR2: ffffc9004069100c
[   14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
[   14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
[   14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
[   14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
[   14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
[   14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
[   14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
[   14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
[   14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
[   14.045698] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
[   14.045706] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[   14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
[   14.045718] Kernel panic - not syncing: Fatal exception
[   14.045726] Kernel Offset: disabled

I've bisected it down to this commit:

    commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Thu Jul 29 23:51:41 2021 +0200

        PCI/MSI: Mask all unused MSI-X entries

I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think
Xen version matters here.

Any idea how to fix it?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
  2021-08-25 15:24 Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough Marek Marczykowski-Górecki
@ 2021-08-25 15:33 ` Jan Beulich
  2021-08-25 15:47   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-08-25 15:33 UTC
  To: Marek Marczykowski-Górecki
  Cc: linux-pci, stable, regressions, xen-devel, Thomas Gleixner,
	Bjorn Helgaas

On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
> On recent kernels I get a kernel panic when starting a Xen PV domain with
> PCI devices assigned. This happens on 5.10.60 (5.10.54 worked) and
> 5.4.142 (5.4.136 worked):
> 
> [   13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
> [   13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
> [   13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
> [   13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
> [   13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
> [   14.036142] e1000e: Intel(R) PRO/1000 Network Driver
> [   14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> [   14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
> [   14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [   14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
> [   14.045197] #PF: supervisor write access in kernel mode
> [   14.045202] #PF: error_code(0x0003) - permissions violation
> [   14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075

I'm curious what lives at physical address FEBD4000. The maximum verbosity
hypervisor log may also have a hint as to why this is a read-only PTE.
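
For readers following along, the PTE value and the page-fault error code quoted above can be decoded by hand. The snippet below is a minimal sketch, assuming the standard x86-64 page-table bit layout; the helper names are made up for illustration and are not taken from the kernel or from Xen. It shows that the mapping is present but not writable, and that it points at FEBD4000.

```python
# Decode the PTE and page-fault error code quoted in the oops above.
# Bit positions follow the architectural x86-64 page-table entry layout.
# All function names here are hypothetical helpers for illustration.

PTE = 0x80100000FEBD4075
ERROR_CODE = 0x0003

PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write-through",
    4: "cache-disable",
    5: "accessed",
    6: "dirty",
    63: "no-execute",
}


def pte_flags(pte):
    """Return the names of the architectural flag bits set in a PTE."""
    return [name for bit, name in PTE_FLAGS.items() if (pte >> bit) & 1]


def pte_frame_addr(pte):
    """Extract the page frame address (bits 12..51) from a PTE."""
    return pte & 0x000FFFFFFFFFF000


def fault_kind(error_code):
    """Decode the three low bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(error_code & 1),
        "write": bool(error_code & 2),
        "user_mode": bool(error_code & 4),
    }


print(hex(pte_frame_addr(PTE)))
print(pte_flags(PTE))
print(fault_kind(ERROR_CODE))
```

The missing "writable" flag together with a write fault in supervisor mode matches the "permissions violation" line in the log.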

> [   14.045227] Oops: 0003 [#1] SMP NOPTI
> [   14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G        W         5.14.0-rc7-1.fc32.qubes.x86_64 #15
> [   14.045245] Workqueue: events work_for_cpu_fn
> [   14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> [   14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> [   14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> [   14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> [   14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> [   14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> [   14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> [   14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> [   14.045393] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> [   14.045401] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> [   14.045420] Call Trace:
> [   14.045431]  e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
> [   14.045479]  e1000_probe+0x41f/0xdb0 [e1000e]

OTOH, from this it's pretty clear it's not a device Xen may have found
a need to access for its own purposes. If the aforementioned address covers
(or is adjacent to) the MSI-X table of a device driven by this driver,
then it would also be helpful to know how many MSI-X entries the device
reports its table can have.

Jan

> [   14.045506]  local_pci_probe+0x42/0x80
> [   14.045515]  work_for_cpu_fn+0x16/0x20
> [   14.045522]  process_one_work+0x1ec/0x390
> [   14.045529]  worker_thread+0x53/0x3e0
> [   14.045534]  ? process_one_work+0x390/0x390
> [   14.045540]  kthread+0x127/0x150
> [   14.045548]  ? set_kthread_struct+0x40/0x40
> [   14.045554]  ret_from_fork+0x22/0x30
> [   14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
> [   14.045620] CR2: ffffc9004069100c
> [   14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
> [   14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> [   14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> [   14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> [   14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> [   14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> [   14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> [   14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> [   14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> [   14.045698] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> [   14.045706] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> [   14.045718] Kernel panic - not syncing: Fatal exception
> [   14.045726] Kernel Offset: disabled
> 
> I've bisected it down to this commit:
> 
>     commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
>     Author: Thomas Gleixner <tglx@linutronix.de>
>     Date:   Thu Jul 29 23:51:41 2021 +0200
> 
>         PCI/MSI: Mask all unused MSI-X entries
> 
> I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think
> Xen version matters here.
> 
> Any idea how to fix it?
> 




* Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
  2021-08-25 15:33 ` Jan Beulich
@ 2021-08-25 15:47   ` Marek Marczykowski-Górecki
  2021-08-25 15:55     ` Jan Beulich
  0 siblings, 1 reply; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-25 15:47 UTC
  To: Jan Beulich
  Cc: linux-pci, stable, regressions, xen-devel, Thomas Gleixner,
	Bjorn Helgaas

On Wed, Aug 25, 2021 at 05:33:54PM +0200, Jan Beulich wrote:
> On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
> > On recent kernels I get a kernel panic when starting a Xen PV domain with
> > PCI devices assigned. This happens on 5.10.60 (5.10.54 worked) and
> > 5.4.142 (5.4.136 worked):
> > 
> > [   13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
> > [   13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
> > [   13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
> > [   13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
> > [   13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
> > [   14.036142] e1000e: Intel(R) PRO/1000 Network Driver
> > [   14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
> > [   14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
> > [   14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> > [   14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
> > [   14.045197] #PF: supervisor write access in kernel mode
> > [   14.045202] #PF: error_code(0x0003) - permissions violation
> > [   14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
> 
> I'm curious what lives at physical address FEBD4000. 

This is BAR 3 of this device, which holds the MSI-X table and PBA:

00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Device 0000
        Physical Slot: 4
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at feb80000 (32-bit, non-prefetchable) [size=128K]
        Region 1: Memory at feba0000 (32-bit, non-prefetchable) [size=128K]
        Region 2: I/O ports at c080 [size=32]
        Region 3: Memory at febd4000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at feb40000 [disabled] [size=256K]
        Capabilities: [c8] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [e0] Express (v1) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
                        TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000
        Kernel driver in use: pciback
        Kernel modules: e1000e
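
The lspci output and the register dump in the oops line up. The sketch below assumes the MSI-X table layout from the PCI specification (16-byte entries, Vector Control dword at offset 12) and reads RDI as the mapped table base and RCX as the loop end; that reading of the registers is an interpretation of the Code: bytes, not something stated in the thread. The constant and function names are illustrative only.

```python
# MSI-X table geometry per the PCI specification: 16-byte entries, with the
# Vector Control dword at offset 12 (bit 0 is the per-vector mask bit).
MSIX_ENTRY_SIZE = 16
MSIX_ENTRY_VECTOR_CTRL = 12

# Values copied from the oops; their roles are an assumption (see lead-in).
TABLE_BASE = 0xFFFFC90040691000  # RDI: assumed virtual base of the mapped table
FAULT_ADDR = 0xFFFFC9004069100C  # address in the BUG line (also RAX)
LOOP_END = 0xFFFFC9004069105C    # RCX: assumed end pointer of the masking loop
NUM_ENTRIES = 5                  # "MSI-X: Enable- Count=5" from lspci


def vector_ctrl_addrs(base, count):
    """Addresses of the Vector Control dword of each MSI-X table entry."""
    return [
        base + i * MSIX_ENTRY_SIZE + MSIX_ENTRY_VECTOR_CTRL
        for i in range(count)
    ]


addrs = vector_ctrl_addrs(TABLE_BASE, NUM_ENTRIES)
for a in addrs:
    print(hex(a))

# The first write already faults, so the fault address is entry 0's control word.
print(addrs[0] == FAULT_ADDR)
# One entry past the last control word is where the loop would stop.
print(addrs[-1] + MSIX_ENTRY_SIZE == LOOP_END)
```

Under these assumptions the faulting write is the very first mask write, to the Vector Control word of entry 0, in a loop sized for all five table entries.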

> The maximum verbosity
> hypervisor log may also have a hint as to why this is a read-only PTE.

I'll try, if that still makes sense.

> > [   14.045227] Oops: 0003 [#1] SMP NOPTI
> > [   14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G        W         5.14.0-rc7-1.fc32.qubes.x86_64 #15
> > [   14.045245] Workqueue: events work_for_cpu_fn
> > [   14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> > [   14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> > [   14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> > [   14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> > [   14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> > [   14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> > [   14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> > [   14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> > [   14.045393] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> > [   14.045401] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> > [   14.045420] Call Trace:
> > [   14.045431]  e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
> > [   14.045479]  e1000_probe+0x41f/0xdb0 [e1000e]
> 
> OTOH, from this it's pretty clear it's not a device Xen may have found
> a need to access for its own purposes. If the aforementioned address covers
> (or is adjacent to) the MSI-X table of a device driven by this driver,
> then it would also be helpful to know how many MSI-X entries the device
> reports its table can have.

See above.

Does PCI passthrough on PV support MSI-X at all?
If so, I guess the issue is the kernel trying to write directly, instead
of via some hypercall, right?

> > [   14.045506]  local_pci_probe+0x42/0x80
> > [   14.045515]  work_for_cpu_fn+0x16/0x20
> > [   14.045522]  process_one_work+0x1ec/0x390
> > [   14.045529]  worker_thread+0x53/0x3e0
> > [   14.045534]  ? process_one_work+0x390/0x390
> > [   14.045540]  kthread+0x127/0x150
> > [   14.045548]  ? set_kthread_struct+0x40/0x40
> > [   14.045554]  ret_from_fork+0x22/0x30
> > [   14.045565] Modules linked in: e1000e(+) edac_mce_amd rfkill xen_pcifront pcspkr xt_REDIRECT ip6table_filter ip6table_mangle ip6table_raw ip6_tables ipt_REJECT nf_reject_ipv4 xt_state xt_conntrack iptable_filter iptable_mangle iptable_raw xt_MASQUERADE iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xen_scsiback target_core_mod xen_netback xen_privcmd xen_gntdev xen_gntalloc xen_blkback xen_evtchn fuse drm bpf_preload ip_tables overlay xen_blkfront
> > [   14.045620] CR2: ffffc9004069100c
> > [   14.045627] ---[ end trace 307f5bb3bd9f30b4 ]---
> > [   14.045632] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
> > [   14.045640] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
> > [   14.045652] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
> > [   14.045657] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
> > [   14.045663] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
> > [   14.045668] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
> > [   14.045674] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
> > [   14.045679] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
> > [   14.045698] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
> > [   14.045706] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   14.045711] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
> > [   14.045718] Kernel panic - not syncing: Fatal exception
> > [   14.045726] Kernel Offset: disabled
> > 
> > I've bisected it down to this commit:
> > 
> >     commit 7d5ec3d3612396dc6d4b76366d20ab9fc06f399f
> >     Author: Thomas Gleixner <tglx@linutronix.de>
> >     Date:   Thu Jul 29 23:51:41 2021 +0200
> > 
> >         PCI/MSI: Mask all unused MSI-X entries
> > 
> > I can reliably reproduce it on Xen 4.14 and Xen 4.8, so I don't think
> > Xen version matters here.
> > 
> > Any idea how to fix it?
> > 
> 

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


* Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
  2021-08-25 15:47   ` Marek Marczykowski-Górecki
@ 2021-08-25 15:55     ` Jan Beulich
  2021-08-26  1:28       ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Beulich @ 2021-08-25 15:55 UTC
  To: Marek Marczykowski-Górecki
  Cc: linux-pci, stable, regressions, xen-devel, Thomas Gleixner,
	Bjorn Helgaas

On 25.08.2021 17:47, Marek Marczykowski-Górecki wrote:
> On Wed, Aug 25, 2021 at 05:33:54PM +0200, Jan Beulich wrote:
>> On 25.08.2021 17:24, Marek Marczykowski-Górecki wrote:
>>> On recent kernels I get a kernel panic when starting a Xen PV domain with
>>> PCI devices assigned. This happens on 5.10.60 (5.10.54 worked) and
>>> 5.4.142 (5.4.136 worked):
>>>
>>> [   13.683009] pcifront pci-0: claiming resource 0000:00:00.0/0
>>> [   13.683042] pcifront pci-0: claiming resource 0000:00:00.0/1
>>> [   13.683049] pcifront pci-0: claiming resource 0000:00:00.0/2
>>> [   13.683055] pcifront pci-0: claiming resource 0000:00:00.0/3
>>> [   13.683061] pcifront pci-0: claiming resource 0000:00:00.0/6
>>> [   14.036142] e1000e: Intel(R) PRO/1000 Network Driver
>>> [   14.036179] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>> [   14.036982] e1000e 0000:00:00.0: Xen PCI mapped GSI11 to IRQ13
>>> [   14.044561] e1000e 0000:00:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
>>> [   14.045188] BUG: unable to handle page fault for address: ffffc9004069100c
>>> [   14.045197] #PF: supervisor write access in kernel mode
>>> [   14.045202] #PF: error_code(0x0003) - permissions violation
>>> [   14.045211] PGD 18f1c067 P4D 18f1c067 PUD 4dbd067 PMD 4fba067 PTE 80100000febd4075
>>
>> I'm curious what lives at physical address FEBD4000. 
> 
> This is BAR 3 of this device, which holds the MSI-X table and PBA:
> 
> 00:04.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
>         Subsystem: Intel Corporation Device 0000
>         Physical Slot: 4
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0
>         Interrupt: pin A routed to IRQ 11
>         Region 0: Memory at feb80000 (32-bit, non-prefetchable) [size=128K]
>         Region 1: Memory at feba0000 (32-bit, non-prefetchable) [size=128K]
>         Region 2: I/O ports at c080 [size=32]
>         Region 3: Memory at febd4000 (32-bit, non-prefetchable) [size=16K]
>         Expansion ROM at feb40000 [disabled] [size=256K]
>         Capabilities: [c8] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 0000000000000000  Data: 0000
>         Capabilities: [e0] Express (v1) Endpoint, MSI 00
>                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
>                 DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
>                 DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 2.5GT/s (ok), Width x1 (ok)
>                         TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>         Capabilities: [a0] MSI-X: Enable- Count=5 Masked-
>                 Vector table: BAR=3 offset=00000000
>                 PBA: BAR=3 offset=00002000
>         Kernel driver in use: pciback
>         Kernel modules: e1000e
> 
>> The maximum verbosity
>> hypervisor log may also have a hint as to why this is a read-only PTE.
> 
> I'll try, if that still makes sense.

I think the above data clarifies it already.

>>> [   14.045227] Oops: 0003 [#1] SMP NOPTI
>>> [   14.045234] CPU: 0 PID: 234 Comm: kworker/0:2 Tainted: G        W         5.14.0-rc7-1.fc32.qubes.x86_64 #15
>>> [   14.045245] Workqueue: events work_for_cpu_fn
>>> [   14.045259] RIP: e030:__pci_enable_msix_range.part.0+0x26b/0x5f0
>>> [   14.045271] Code: 2f 96 ff 48 89 44 24 28 48 89 c7 48 85 c0 0f 84 f6 01 00 00 45 0f b7 f6 48 8d 40 0c ba 01 00 00 00 49 c1 e6 04 4a 8d 4c 37 1c <89> 10 48 83 c0 10 48 39 c1 75 f5 41 0f b6 44 24 6a 84 c0 0f 84 48
>>> [   14.045284] RSP: e02b:ffffc9004018bd50 EFLAGS: 00010212
>>> [   14.045290] RAX: ffffc9004069100c RBX: ffff88800ed412f8 RCX: ffffc9004069105c
>>> [   14.045296] RDX: 0000000000000001 RSI: 00000000000febd4 RDI: ffffc90040691000
>>> [   14.045302] RBP: 0000000000000003 R08: 0000000000000000 R09: 00000000febd404f
>>> [   14.045308] R10: 0000000000007ff0 R11: ffff88800ee8ae40 R12: ffff88800ed41000
>>> [   14.045313] R13: 0000000000000000 R14: 0000000000000040 R15: 00000000feba0000
>>> [   14.045393] FS:  0000000000000000(0000) GS:ffff888018400000(0000) knlGS:0000000000000000
>>> [   14.045401] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   14.045407] CR2: ffff8000007f5ea0 CR3: 0000000012f6a000 CR4: 0000000000000660
>>> [   14.045420] Call Trace:
>>> [   14.045431]  e1000e_set_interrupt_capability+0xbf/0xd0 [e1000e]
>>> [   14.045479]  e1000_probe+0x41f/0xdb0 [e1000e]
>>
>> OTOH, from this it's pretty clear it's not a device Xen may have found
>> a need to access for its own purposes. If the aforementioned address covers
>> (or is adjacent to) the MSI-X table of a device driven by this driver,
>> then it would also be helpful to know how many MSI-X entries the device
>> reports its table can have.
> 
> See above.
> 
> Does PCI passthrough on PV support MSI-X at all?

It is supposed to work. The treatment by generic code shouldn't be overly
different from how MSI-X works for Dom0 (Xen specific code of course
differs).

> If so, I guess the issue is the kernel trying to write directly, instead
> of via some hypercall, right?

Indeed. Or to be precise - the kernel isn't supposed to be "writing" this
at all. It is supposed to make hypercalls which may result in such writes.
Such "mask everything" functionality imo is the job of the hypervisor
anyway when talking about PV environments; HVM is a different thing here.

Jan




* Re: Kernel panic in __pci_enable_msix_range on Xen PV with PCI passthrough
  2021-08-25 15:55     ` Jan Beulich
@ 2021-08-26  1:28       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 5+ messages in thread
From: Marek Marczykowski-Górecki @ 2021-08-26  1:28 UTC
  To: Jan Beulich; +Cc: linux-pci, stable, xen-devel, Thomas Gleixner, Bjorn Helgaas

On Wed, Aug 25, 2021 at 05:55:09PM +0200, Jan Beulich wrote:
> On 25.08.2021 17:47, Marek Marczykowski-Górecki wrote:
> > If so, I guess the issue is the kernel trying to write directly, instead
> > of via some hypercall, right?
> 
> Indeed. Or to be precise - the kernel isn't supposed to be "writing" this
> at all. It is supposed to make hypercalls which may result in such writes.
> Such "mask everything" functionality imo is the job of the hypervisor
> anyway when talking about PV environments; HVM is a different thing here.

Ok, I dug a bit and found why it was working before: there is a
pci_msi_ignore_mask variable that is set to 1 for Xen PV (and only
then). It bypasses __pci_msi{x,}_desc_mask_irq(), but it does not bypass
the new msix_mask_all().
Adding that check back fixes the issue: no crash, and the device works,
although the driver doesn't seem to enable MSI/MSI-X (but it didn't
before either).
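
To make the control flow concrete, here is a small userspace model of the behaviour described above. It is only a sketch: FakeMsixTable, mask_all_unguarded() and mask_all_guarded() are hypothetical names invented for this illustration, and the read-only table stands in for the mapping a Xen PV guest gets. It is not the kernel code and not the actual patch.

```python
class FakeMsixTable:
    """Stand-in for the MSI-X table mapping (hypothetical, illustration only)."""

    def __init__(self, entries, writable):
        self.ctrl = [0] * entries   # one Vector Control word per entry
        self.writable = writable    # False models the read-only PV mapping

    def write_ctrl(self, index, value):
        if not self.writable:
            # Models the page fault seen in the oops.
            raise PermissionError("write to read-only MSI-X table")
        self.ctrl[index] = value


def mask_all_unguarded(table):
    """Mask every entry by writing the table directly, with no PV check."""
    for i in range(len(table.ctrl)):
        table.write_ctrl(i, 1)


def mask_all_guarded(table, ignore_mask):
    """Same as above, but skip the writes when masking is to be ignored."""
    if ignore_mask:
        return
    mask_all_unguarded(table)


pv_table = FakeMsixTable(entries=5, writable=False)
try:
    mask_all_unguarded(pv_table)
    crashed = False
except PermissionError:
    crashed = True
print(crashed)

# With the ignore flag honoured, the read-only table is never touched.
mask_all_guarded(pv_table, ignore_mask=True)
print(pv_table.ctrl)
```

On a writable table with ignore_mask=False the guarded version still masks every entry, so the check only changes behaviour for the PV case.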

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

