* [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
@ 2021-08-27 4:15 Selvin Xavier
2021-08-27 12:31 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Selvin Xavier @ 2021-08-27 4:15 UTC (permalink / raw)
To: jgg, dledford; +Cc: linux-rdma, Selvin Xavier
The following host crash is observed when pci_enable_atomic_ops_to_root()
is called with a VF PCI device.
PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
#0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
#1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
#2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
#3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
#4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
#5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
#6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
[exception RIP: pcie_capability_read_dword+28]
RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
#8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
#9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
To avoid the system crash when VFs are created, enable atomics only for the PF for now.
Fixes: 35f5ace5dea4 ("RDMA/bnxt_re: Enable global atomic ops if platform supports")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
drivers/infiniband/hw/bnxt_re/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
index 4678bd6..04d5c7d 100644
--- a/drivers/infiniband/hw/bnxt_re/main.c
+++ b/drivers/infiniband/hw/bnxt_re/main.c
@@ -129,7 +129,7 @@ static int bnxt_re_setup_chip_ctx(struct bnxt_re_dev *rdev, u8 wqe_mode)
rdev->rcfw.res = &rdev->qplib_res;
bnxt_re_set_drv_mode(rdev, wqe_mode);
- if (bnxt_qplib_determine_atomics(en_dev->pdev))
+ if (!BNXT_VF(bp) && bnxt_qplib_determine_atomics(en_dev->pdev))
ibdev_info(&rdev->ibdev,
"platform doesn't support global atomics.");
return 0;
--
2.5.5
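
For illustration only, the effect of the one-line change above can be modelled in userspace C. This is a sketch under stated assumptions, not driver code: model_determine_atomics(), model_setup_chip_ctx() and probe_calls are hypothetical stand-ins, and the only behaviour taken from the diff is the `!BNXT_VF(bp) && ...` short-circuit.

```c
#include <stdbool.h>

/* Hypothetical counter so a caller can see whether the probe ran at all. */
static int probe_calls;

/*
 * Hypothetical stand-in for bnxt_qplib_determine_atomics().
 * As in the diff, a nonzero return means "atomics not supported".
 */
static int model_determine_atomics(bool platform_ok)
{
    probe_calls++;
    return platform_ok ? 0 : 1;
}

/*
 * Models the patched condition. Returns true when the
 * "platform doesn't support global atomics." message would be logged.
 */
bool model_setup_chip_ctx(bool is_vf, bool platform_ok)
{
    /* && short-circuits: for a VF the probe is never called. */
    if (!is_vf && model_determine_atomics(platform_ok))
        return true;
    return false;
}
```

With this shape a VF never reaches the probe, so it neither hits the crashing path nor logs the message; only a PF on a platform without support logs it.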
* Re: [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
2021-08-27 4:15 [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs Selvin Xavier
@ 2021-08-27 12:31 ` Jason Gunthorpe
2021-08-31 15:57 ` Selvin Xavier
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-08-27 12:31 UTC (permalink / raw)
To: Selvin Xavier; +Cc: dledford, linux-rdma
On Thu, Aug 26, 2021 at 09:15:38PM -0700, Selvin Xavier wrote:
> The following host crash is observed when pci_enable_atomic_ops_to_root()
> is called with a VF PCI device.
>
> PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
> #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
> #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
> #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
> #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
> #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
> #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
> #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
> [exception RIP: pcie_capability_read_dword+28]
> RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
> RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
> RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
> RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
> R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
> #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
> #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
> RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
> RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
> RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
> RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
> R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
> R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
> ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
This feels like a bug in pci_enable_atomic_ops_to_root()? I assume it
hit a case where bus->self == NULL?
Why not fix it there?
Jason
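
The suspected failure mode can be sketched in userspace C as follows. Everything here is an assumption made for illustration: struct model_dev, struct model_bus and model_enable_atomic_ops_to_root() are hypothetical stand-ins rather than the kernel's definitions, and the NULL check shown is one possible guard, not necessarily the fix that was eventually applied to the PCI core.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_bus;

/* Hypothetical stand-in for a PCI device. */
struct model_dev {
    struct model_bus *bus;    /* bus the device sits on */
    bool atomic_capable;      /* pretend result of the capability read */
};

/* Hypothetical stand-in for a PCI bus. */
struct model_bus {
    struct model_dev *self;   /* bridge leading to this bus, may be NULL */
    struct model_bus *parent; /* NULL for a root bus */
};

/*
 * Walks from the device's bus towards the root. Returns 0 if every
 * bridge on the way is atomic capable, -1 otherwise. A bus without a
 * bridge device (self == NULL) is reported as unsupported instead of
 * being dereferenced.
 */
int model_enable_atomic_ops_to_root(const struct model_dev *dev)
{
    const struct model_bus *bus;

    if (dev == NULL || dev->bus == NULL)
        return -1;

    for (bus = dev->bus; bus->parent != NULL; bus = bus->parent) {
        const struct model_dev *bridge = bus->self;

        if (bridge == NULL)   /* the case suspected in this thread */
            return -1;
        if (!bridge->atomic_capable)
            return -1;
    }
    return 0;
}
```

Without the `bridge == NULL` check, a device on a bus that has a parent but no bridge device would lead to a NULL dereference in the capability read, which is consistent with RDI being 0 in the trace above.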
* Re: [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
2021-08-27 12:31 ` Jason Gunthorpe
@ 2021-08-31 15:57 ` Selvin Xavier
2021-09-01 11:50 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Selvin Xavier @ 2021-08-31 15:57 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma
On Fri, Aug 27, 2021 at 6:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Thu, Aug 26, 2021 at 09:15:38PM -0700, Selvin Xavier wrote:
> > The following host crash is observed when pci_enable_atomic_ops_to_root()
> > is called with a VF PCI device.
> >
> > PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
> > #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
> > #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
> > #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
> > #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
> > #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
> > #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
> > #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
> > [exception RIP: pcie_capability_read_dword+28]
> > RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
> > RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
> > RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
> > RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
> > R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
> > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
> > #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
> > #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
> > RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
> > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
> > RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
> > RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
> > R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
> > R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
> > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
>
Apologies for the delay in my response. I was checking internally
whether this is specific to one adapter/host. I see the problem on
multiple systems.
> This feels like a bug in pci_enable_atomic_ops_to_root()? I assume it
> hit a case where bus->self == NULL?
Yes. This crashes because bus->self is NULL. Is that expected for a VF?
>
> Why not fix it there?
Since it's a functional breakage in 5.14, I posted a quick fix for
5.14. Also, we haven't done any testing on VFs for this
feature, so I wanted to avoid claiming support for VFs anyway.
I see that other drivers also call pci_enable_atomic_ops_to_root()
without a VF/PF check. Is anyone else seeing this issue?
>
> Jason
* Re: [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
2021-08-31 15:57 ` Selvin Xavier
@ 2021-09-01 11:50 ` Jason Gunthorpe
2021-09-16 15:05 ` Selvin Xavier
0 siblings, 1 reply; 6+ messages in thread
From: Jason Gunthorpe @ 2021-09-01 11:50 UTC (permalink / raw)
To: Selvin Xavier; +Cc: Doug Ledford, linux-rdma
On Tue, Aug 31, 2021 at 09:27:14PM +0530, Selvin Xavier wrote:
> On Fri, Aug 27, 2021 at 6:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Thu, Aug 26, 2021 at 09:15:38PM -0700, Selvin Xavier wrote:
> > > The following host crash is observed when pci_enable_atomic_ops_to_root()
> > > is called with a VF PCI device.
> > >
> > > PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
> > > #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
> > > #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
> > > #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
> > > #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
> > > #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
> > > #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
> > > #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
> > > [exception RIP: pcie_capability_read_dword+28]
> > > RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
> > > RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
> > > RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
> > > RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
> > > R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
> > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
> > > #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
> > > #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
> > > RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
> > > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
> > > RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
> > > RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
> > > R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
> > > R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
> > > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> >
> Apologies for the delay in my response. I was checking internally
> whether this is specific to one adapter/host. I see the problem on
> multiple systems.
>
> > This feels like a bug in pci_enable_atomic_ops_to_root()? I assume it
> > hit a case where bus->self == NULL?
> Yes. This crashes because bus->self is NULL. Is that expected for a VF?
I'm not sure; you should ask the PCI lists.
> > Why not fix it there?
> Since it's a functional breakage in 5.14, I posted a quick fix for
> 5.14. Also, we haven't done any testing on VFs for this
> feature, so I wanted to avoid claiming support for VFs anyway.
>
> I see that other drivers also call pci_enable_atomic_ops_to_root()
> without a VF/PF check. Is anyone else seeing this issue?
Which is why I suspect the core code should be fixed, not the driver.
Jason
* Re: [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
2021-09-01 11:50 ` Jason Gunthorpe
@ 2021-09-16 15:05 ` Selvin Xavier
2021-09-16 15:09 ` Jason Gunthorpe
0 siblings, 1 reply; 6+ messages in thread
From: Selvin Xavier @ 2021-09-16 15:05 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma
On Wed, Sep 1, 2021 at 5:20 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Aug 31, 2021 at 09:27:14PM +0530, Selvin Xavier wrote:
> > On Fri, Aug 27, 2021 at 6:01 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 09:15:38PM -0700, Selvin Xavier wrote:
> > > > The following host crash is observed when pci_enable_atomic_ops_to_root()
> > > > is called with a VF PCI device.
> > > >
> > > > PID: 4481 TASK: ffff89c6941b0000 CPU: 53 COMMAND: "bash"
> > > > #0 [ffff9a94817136d8] machine_kexec at ffffffffb90601a4
> > > > #1 [ffff9a9481713728] __crash_kexec at ffffffffb9190d5d
> > > > #2 [ffff9a94817137f0] crash_kexec at ffffffffb9191c4d
> > > > #3 [ffff9a9481713808] oops_end at ffffffffb9025cd6
> > > > #4 [ffff9a9481713828] page_fault_oops at ffffffffb906e417
> > > > #5 [ffff9a9481713888] exc_page_fault at ffffffffb9a0ad14
> > > > #6 [ffff9a94817138b0] asm_exc_page_fault at ffffffffb9c00ace
> > > > [exception RIP: pcie_capability_read_dword+28]
> > > > RIP: ffffffffb952fd5c RSP: ffff9a9481713960 RFLAGS: 00010246
> > > > RAX: 0000000000000001 RBX: ffff89c6b1096000 RCX: 0000000000000000
> > > > RDX: ffff9a9481713990 RSI: 0000000000000024 RDI: 0000000000000000
> > > > RBP: 0000000000000080 R8: 0000000000000008 R9: ffff89c64341a2f8
> > > > R10: 0000000000000002 R11: 0000000000000000 R12: ffff89c648bab000
> > > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff89c648bab0c8
> > > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > > #7 [ffff9a9481713988] pci_enable_atomic_ops_to_root at ffffffffb95359a6
> > > > #8 [ffff9a94817139c0] bnxt_qplib_determine_atomics at ffffffffc08c1a33 [bnxt_re]
> > > > #9 [ffff9a94817139d0] bnxt_re_dev_init at ffffffffc08ba2d1 [bnxt_re]
> > > > RIP: 00007f450602f648 RSP: 00007ffe880869e8 RFLAGS: 00000246
> > > > RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f450602f648
> > > > RDX: 0000000000000002 RSI: 0000555c566c4a60 RDI: 0000000000000001
> > > > RBP: 0000555c566c4a60 R8: 000000000000000a R9: 00007f45060c2580
> > > > R10: 000000000000000a R11: 0000000000000246 R12: 00007f45063026e0
> > > > R13: 0000000000000002 R14: 00007f45062fd880 R15: 0000000000000002
> > > > ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
> > >
> > Apologies for the delay in my response. I was checking internally
> > whether this is specific to one adapter/host. I see the problem on
> > multiple systems.
> >
> > > This feels like a bug in pci_enable_atomic_ops_to_root()? I assume it
> > > hit a case where bus->self == NULL?
> > Yes. This crashes because bus->self is NULL. Is that expected for a VF?
>
> I'm not sure; you should ask the PCI lists.
>
> > > Why not fix it there?
> > Since it's a functional breakage in 5.14, I posted a quick fix for
> > 5.14. Also, we haven't done any testing on VFs for this
> > feature, so I wanted to avoid claiming support for VFs anyway.
> >
> > I see that other drivers also call pci_enable_atomic_ops_to_root()
> > without a VF/PF check. Is anyone else seeing this issue?
>
> Which is why I suspect the core code should be fixed, not the driver.
Hi Jason,
A patch that avoids the crash is merged to the linux-pci tree.
https://lore.kernel.org/linux-pci/20210914201606.GA1452219@bjorn-Precision-5520/T/
With the pci patch, the host will not crash. But driver will get
following error message when called for VF
""platform doesn't support global atomics."
we want to prevent calling pci_enable_atomic_ops_to_root for VF
anyway. Can you please pull this patch in bnxt_re?
Thanks
Selvin
>
> Jason
* Re: [PATCH rdma-rc] RDMA/bnxt_re: Disable atomic support on VFs
2021-09-16 15:05 ` Selvin Xavier
@ 2021-09-16 15:09 ` Jason Gunthorpe
0 siblings, 0 replies; 6+ messages in thread
From: Jason Gunthorpe @ 2021-09-16 15:09 UTC (permalink / raw)
To: Selvin Xavier; +Cc: Doug Ledford, linux-rdma
On Thu, Sep 16, 2021 at 08:35:37PM +0530, Selvin Xavier wrote:
> Hi Jason,
> A patch that avoids the crash has been merged into the linux-pci tree.
> https://lore.kernel.org/linux-pci/20210914201606.GA1452219@bjorn-Precision-5520/T/
> With the PCI patch, the host will not crash, but the driver will log
> the following error message when it is called for a VF:
> "platform doesn't support global atomics."
>
> We want to avoid calling pci_enable_atomic_ops_to_root() for VFs
> anyway. Can you please pull this patch into bnxt_re?
It doesn't work like this; you have to wait until v5.16 for all the
trees to be harmonized. You should take care of it in your internal
testing tree in the interim.
Jason