* DOMU: Virtual Function FLR in PCI passthrough is crashing
From: Naresh Bhat @ 2022-04-20 12:48 UTC
  To: xen-devel; +Cc: julien, sstabellini

Hi,

I have the following setup and am trying to test the Function Level Reset (FLR) feature. Any suggestions or pointers would be very helpful.

DOM0
Distribution: Ubuntu-20.04.3 (kernel 5.8.0-43)
Xen version : 4.11.4-pre

DOMU
Distribution: Ubuntu-18.04.6 LTS (kernel 5.8.0)
PCIe device with SR-IOV support; a VF (Virtual Function) interface is connected to the DomU via PCI passthrough

Issue on DomU:
1. Enable MSI-X in the DomU (we used the kernel APIs pci_enable_msix_range() and pci_alloc_irq_vectors()).
2. Trigger an FLR (Function Level Reset) through the sysfs interface of the passed-through PCIe device in the DomU:
   # echo "1" > /sys/bus/pci/devices/<ID>/reset

The following crash is observed:

[ 4126.391455] BUG: unable to handle page fault for address: ffffc90040029000
[ 4126.391489] #PF: supervisor write access in kernel mode
[ 4126.391503] #PF: error_code(0x0003) - permissions violation
[ 4126.391516] PGD 94980067 P4D 94980067 PUD 16a155067 PMD 16a156067 PTE 80100000a000c075
[ 4126.391537] Oops: 0003 [#1] SMP NOPTI
[ 4126.391550] CPU: 0 PID: 971 Comm: bash Tainted: G           OE     5.8.0 #1
[ 4126.391570] RIP: e030:__pci_write_msi_msg+0x59/0x150
[ 4126.391580] Code: 8b 50 d8 85 d2 75 31 83 78 fc 03 74 2b f6 47 54 01 74 6e f6 47 55 02 75 1f 0f b7 47 56 c1 e0 04 48 98 48 03 47 60 74 10 8b 16 <89> 10 8b 56 04 89 50 04 8b 56 08 89 50 08 48 8b 03 49 89 44 24 20
[ 4126.391606] RSP: e02b:ffffc90040407cc0 EFLAGS: 00010286
[ 4126.391623] RAX: ffffc90040029000 RBX: ffff888164cfb120 RCX: 0000000000000000
[ 4126.391639] RDX: 0000000000000000 RSI: ffff888164cfb120 RDI: ffff888164cfb100
[ 4126.391653] RBP: ffffc90040407cf8 R08: 000053f2d6975617 R09: ffff888169c4e238
[ 4126.391672] R10: 0000000000000000 R11: ffffffff8266b248 R12: ffff888164cfb100
[ 4126.391688] R13: ffff88815e81c2e0 R14: ffff88815e81c130 R15: ffff8881648394a0
[ 4126.391723] FS:  00007f72b4b9b740(0000) GS:ffff88816ac00000(0000) knlGS:0000000000000000
[ 4126.391742] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4126.391756] CR2: ffffc90040029000 CR3: 0000000167b10000 CR4: 0000000000040660
[ 4126.391781] Call Trace:
[ 4126.391799]  default_restore_msi_irq+0x38/0x70
[ 4126.391818]  default_restore_msi_irqs+0x2f/0x80
[ 4126.391836]  arch_restore_msi_irqs+0x15/0x20
[ 4126.391851]  pci_restore_msi_state+0xa1/0x230
[ 4126.391870]  pci_restore_state.part.0+0x319/0x440
[ 4126.391888]  pci_dev_restore+0x4a/0x60
[ 4126.391901]  pci_reset_function+0x4b/0x70
[ 4126.391915]  reset_store+0x5d/0xa0
[ 4126.391931]  dev_attr_store+0x17/0x30
[ 4126.391944]  sysfs_kf_write+0x3e/0x50
[ 4126.391958]  kernfs_fop_write+0xda/0x1b0
[ 4126.391973]  vfs_write+0xc9/0x200
[ 4126.391986]  ksys_write+0x67/0xe0
[ 4126.392002]  __x64_sys_write+0x1a/0x20
[ 4126.392018]  do_syscall_64+0x52/0xc0
[ 4126.392033]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 4126.392049] RIP: 0033:0x7f72b4277224
[ 4126.392064] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 c1 07 2e 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 41 54 55 49 89 d4 53 48 89 f5
[ 4126.392093] RSP: 002b:00007ffc5236f578 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 4126.392114] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f72b4277224
[ 4126.392128] RDX: 0000000000000002 RSI: 0000562bde52f440 RDI: 0000000000000001
[ 4126.392140] RBP: 0000562bde52f440 R08: 000000000000000a R09: 0000000000000001
[ 4126.392155] R10: 000000000000000a R11: 0000000000000246 R12: 00007f72b4553760
[ 4126.392171] R13: 0000000000000002 R14: 00007f72b454f2a0 R15: 00007f72b454e760
[ 4126.392185] Modules linked in: <driver function> intel_rapl_msr intel_rapl_common crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl xen_pcifront sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4
[ 4126.392228] CR2: ffffc90040029000
[ 4126.392238] ---[ end trace 23e8ad345e1ef956 ]---


Thanks and Regards
-Naresh Bhat


* Re: DOMU: Virtual Function FLR in PCI passthrough is crashing
From: Jan Beulich @ 2022-04-20 13:07 UTC
  To: Naresh Bhat; +Cc: julien, sstabellini, xen-devel

On 20.04.2022 14:48, Naresh Bhat wrote:
> I have the following setup and try to test the Function Level Reset feature.  Any suggestions or pointers will be very much helpful.
> 
> DOM0
> Distribution: Ubuntu-20.04.3 (kernel 5.8.0-43)
> Xen version : 4.11.4-pre
> 
> DOMU
> Distribution: Ubuntu-18.04.6 LTS (kernel 5.8.0)
> PCIe device with SRIOV support, VF (Virtual Function) interface connected to DOMU via PCI pass-through
> 
> Issue on DOMU: 
> 1. Enable MSIX on DOMU (We have used the following kernel APIs pci_enable_msix_range, pci_alloc_irq_vectors)
> 2. Execute FLR (Function Level Reset) via sysfs interface on the PCIe passthrough device in DOMU
>    # echo "1" > /sys/bus/pci/devices/<ID>/reset
> 
> The following crash observed 
> 
> [ 4126.391455] BUG: unable to handle page fault for address: ffffc90040029000
> [ 4126.391489] #PF: supervisor write access in kernel mode
> [ 4126.391503] #PF: error_code(0x0003) - permissions violation
> [ 4126.391516] PGD 94980067 P4D 94980067 PUD 16a155067 PMD 16a156067 PTE 80100000a000c075
> [ 4126.391537] Oops: 0003 [#1] SMP NOPTI
> [ 4126.391550] CPU: 0 PID: 971 Comm: bash Tainted: G           OE     5.8.0 #1
> [ 4126.391570] RIP: e030:__pci_write_msi_msg+0x59/0x150
> [ 4126.391580] Code: 8b 50 d8 85 d2 75 31 83 78 fc 03 74 2b f6 47 54 01 74 6e f6 47 55 02 75 1f 0f b7 47 56 c1 e0 04 48 98 48 03 47 60 74 10 8b 16 <89> 10 8b 56 04 89 50 04 8b 56 08 89 50 08 48 8b 03 49 89 44 24 20
> [ 4126.391606] RSP: e02b:ffffc90040407cc0 EFLAGS: 00010286

The RSP related selector value suggests you're talking about a PV DomU.
Such a DomU cannot write the MSI-X table directly, yet at a guess (from
the PTE displayed) that's what the insn does where the crash occurred. I
would guess you've hit yet another place in the kernel where proper PV
abstraction is missing. You may want to check with newer kernels.

As to FLR - I guess this operation as a whole needs passing through
pcifront to pciback, such that the operation can be carried out safely
(e.g. to save and restore active MSIs, which is what I infer is being
attempted here, as per the stack trace).

Jan
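
For readers following the reasoning in this reply, the stack selector, page-fault error code, and PTE quoted from the oops can be decoded with a short script. This is a minimal sketch and not part of the original report: the helper names are hypothetical, and the bit layouts assumed are the architectural x86-64 ones (4-level paging leaf PTE, page-fault error code, segment selector).

```python
"""Decode the x86-64 values quoted from the oops (illustrative helpers only)."""

# Architectural flag bits of a 64-bit leaf page-table entry.
PTE_FLAGS = {
    0: "present",
    1: "writable",
    2: "user",
    3: "write_through",
    4: "cache_disable",
    5: "accessed",
    6: "dirty",
    63: "no_execute",
}
PTE_ADDR_MASK = 0x000F_FFFF_FFFF_F000  # bits 12..51: physical frame address


def decode_pte(pte):
    """Split a leaf PTE into its frame address and named flag bits."""
    flags = {name: bool((pte >> bit) & 1) for bit, name in PTE_FLAGS.items()}
    return {"frame": pte & PTE_ADDR_MASK, **flags}


def decode_pf_error(code):
    """Decode the low three bits of a page-fault error code."""
    return {
        "protection_violation": bool(code & 1),  # clear would mean "not present"
        "write": bool(code & 2),
        "user_mode": bool(code & 4),
    }


def decode_selector(sel):
    """Split a segment selector into descriptor index, table, and RPL."""
    return {"index": sel >> 3, "table": "LDT" if sel & 4 else "GDT", "rpl": sel & 3}


if __name__ == "__main__":
    pte = decode_pte(0x80100000A000C075)      # PTE from the oops
    print(hex(pte["frame"]), pte["present"], pte["writable"])
    print(decode_pf_error(0x0003))            # error_code(0x0003)
    print(decode_selector(0xE02B))            # RSP selector e02b
```

With the values from the report, the PTE decodes as present but not writable, with frame address 0xa000c000, and the error code decodes as a supervisor-mode write that hit a protection violation. That is consistent with the diagnosis above: a mapping exists, but the guest may not write through it. The stack selector decodes to RPL 3, which fits a 64-bit PV guest kernel running outside ring 0.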




* Re: [EXT] Re: DOMU: Virtual Function FLR in PCI passthrough is crashing
From: Naresh Bhat @ 2022-04-25 11:01 UTC
  To: Jan Beulich; +Cc: xen-devel

Hi Jan Beulich, 

Thank you very much. Please see my inline comments below.

From: Jan Beulich <jbeulich@suse.com>
Sent: 20 April 2022 18:37
To: Naresh Bhat <nareshb@marvell.com>
Cc: julien@xen.org <julien@xen.org>; sstabellini@kernel.org <sstabellini@kernel.org>; xen-devel@lists.xenproject.org <xen-devel@lists.xenproject.org>
Subject: [EXT] Re: DOMU: Virtual Function FLR in PCI passthrough is crashing 
On 20.04.2022 14:48, Naresh Bhat wrote:
> I have the following setup and try to test the Function Level Reset feature.  Any suggestions or pointers will be very much helpful.
> 
> DOM0
> Distribution: Ubuntu-20.04.3 (kernel 5.8.0-43)
> Xen version : 4.11.4-pre
> 
> DOMU
> Distribution: Ubuntu-18.04.6 LTS (kernel 5.8.0)
> PCIe device with SRIOV support, VF (Virtual Function) interface connected to DOMU via PCI pass-through
> 
> Issue on DOMU: 
> 1. Enable MSIX on DOMU (We have used the following kernel APIs pci_enable_msix_range, pci_alloc_irq_vectors)
> 2. Execute FLR (Function Level Reset) via sysfs interface on the PCIe passthrough device in DOMU
>    # echo "1" > /sys/bus/pci/devices/<ID>/reset
> 
> The following crash observed 
> 
> [ 4126.391455] BUG: unable to handle page fault for address: ffffc90040029000
> [ 4126.391489] #PF: supervisor write access in kernel mode
> [ 4126.391503] #PF: error_code(0x0003) - permissions violation
> [ 4126.391516] PGD 94980067 P4D 94980067 PUD 16a155067 PMD 16a156067 PTE 80100000a000c075
> [ 4126.391537] Oops: 0003 [#1] SMP NOPTI
> [ 4126.391550] CPU: 0 PID: 971 Comm: bash Tainted: G           OE     5.8.0 #1
> [ 4126.391570] RIP: e030:__pci_write_msi_msg+0x59/0x150
> [ 4126.391580] Code: 8b 50 d8 85 d2 75 31 83 78 fc 03 74 2b f6 47 54 01 74 6e f6 47 55 02 75 1f 0f b7 47 56 c1 e0 04 48 98 48 03 47 60 74 10 8b 16 <89> 10 8b 56 04 89 50 04 8b 56 08 89 50 08 48 8b 03 49 89 44 24 20
> [ 4126.391606] RSP: e02b:ffffc90040407cc0 EFLAGS: 00010286

The RSP related selector value suggests you're talking about a PV DomU.
Such a DomU cannot write the MSI-X table directly, yet at a guess (from
the PTE displayed) that's what the insn does where the crash occurred. I
would guess you've hit yet another place in the kernel where proper PV
abstraction is missing. You may want to check with newer kernels.

[Naresh]: We have tested with the latest kernel, i.e. 5.17.0, and the issue persists.

As to FLR - I guess this operation as a whole needs passing through
pcifront to pciback, such that the operation can be carried out safely
(e.g. to save and restore active MSIs, which is what I infer is being
attempted here, as per the stack trace).

[Naresh]: Any idea when this support will be added to Xen?

Jan



* Re: [EXT] Re: DOMU: Virtual Function FLR in PCI passthrough is crashing
From: Jan Beulich @ 2022-04-25 11:15 UTC
  To: Naresh Bhat; +Cc: xen-devel

On 25.04.2022 13:01, Naresh Bhat wrote:
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 20 April 2022 18:37
> 
> On 20.04.2022 14:48, Naresh Bhat wrote:
>> I have the following setup and try to test the Function Level Reset feature.  Any suggestions or pointers will be very much helpful.
>>
>> DOM0
>> Distribution: Ubuntu-20.04.3 (kernel 5.8.0-43)
>> Xen version : 4.11.4-pre
>>
>> DOMU
>> Distribution: Ubuntu-18.04.6 LTS (kernel 5.8.0)
>> PCIe device with SRIOV support, VF (Virtual Function) interface connected to DOMU via PCI pass-through
>>
>> Issue on DOMU: 
>> 1. Enable MSIX on DOMU (We have used the following kernel APIs pci_enable_msix_range, pci_alloc_irq_vectors)
>> 2. Execute FLR (Function Level Reset) via sysfs interface on the PCIe passthrough device in DOMU
>>     # echo "1" > /sys/bus/pci/devices/<ID>/reset
>>
>> The following crash observed 
>>
>> [ 4126.391455] BUG: unable to handle page fault for address: ffffc90040029000
>> [ 4126.391489] #PF: supervisor write access in kernel mode
>> [ 4126.391503] #PF: error_code(0x0003) - permissions violation
>> [ 4126.391516] PGD 94980067 P4D 94980067 PUD 16a155067 PMD 16a156067 PTE 80100000a000c075
>> [ 4126.391537] Oops: 0003 [#1] SMP NOPTI
>> [ 4126.391550] CPU: 0 PID: 971 Comm: bash Tainted: G           OE     5.8.0 #1
>> [ 4126.391570] RIP: e030:__pci_write_msi_msg+0x59/0x150
>> [ 4126.391580] Code: 8b 50 d8 85 d2 75 31 83 78 fc 03 74 2b f6 47 54 01 74 6e f6 47 55 02 75 1f 0f b7 47 56 c1 e0 04 48 98 48 03 47 60 74 10 8b 16 <89> 10 8b 56 04 89 50 04 8b 56 08 89 50 08 48 8b 03 49 89 44 24 20
>> [ 4126.391606] RSP: e02b:ffffc90040407cc0 EFLAGS: 00010286
> 
> The RSP related selector value suggests you're talking about a PV DomU.
> Such a DomU cannot write the MSI-X table directly, yet at a guess (from
> the PTE displayed) that's what the insn does where the crash occurred. I
> would guess you've hit yet another place in the kernel where proper PV
> abstraction is missing. You may want to check with newer kernels.
> 
> [Naresh]: We have tested with latest kernel i.e. 5.17.0 kernel, issue persists.

Thanks for checking.

> As to FLR - I guess this operation as a whole needs passing through
> pcifront to pciback, such that the operation can be carried out safely
> (e.g. to save and restore active MSIs, which is what I infer is being
> attempted here, as per the stack trace).
> 
> [Naresh]: Any idea when the support will be added to Xen ?

I'm not sure anything needs to be added to Xen, except for
an addition to the related public header (io/pciif.h). It's pcifront
and pciback (part of the kernel) which would need extending. I'm
unaware of anyone having plans in that direction. Maybe you want to
make an attempt?

Jan



