* Asking for help: Is it possible to force use rxe provider functions atop a Mellanox ConnectX-5 ethernet network?
@ 2020-08-09 7:41 Fan Yang
From: Fan Yang @ 2020-08-09 7:41 UTC
To: linux-rdma
Hi all,
I have access to a cloud-provided virtual machine with a "Mellanox
Technologies MT27800 Family [ConnectX-5 Virtual Function]" Ethernet
controller. It seems that I cannot use hardware RoCE on it (BTW, do
you know how to check this?), so I decided to use rxe instead.
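On the question of how to check: one rough approach is to look at what the kernel exposes under /sys/class/infiniband. The sketch below is only an assumption-laden illustration, not an authoritative test. It reads the link_layer attribute of a given RDMA device and port. The helper names (read_sysfs_line, port_is_ethernet) are made up for this example. Note that an rxe device also reports an Ethernet link layer, so the useful signal is whether an mlx5_* device shows up there at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of a sysfs attribute into buf and strip the newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
static int read_sysfs_line(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Returns 1 if the port reports an Ethernet link layer, 0 if it reports
 * something else, and -1 if the device or port is not present in sysfs.
 * The device name passed in (for example "mlx5_0") is an assumption. */
static int port_is_ethernet(const char *ibdev, int port)
{
    char path[256];
    char val[64];

    snprintf(path, sizeof(path),
             "/sys/class/infiniband/%s/ports/%d/link_layer", ibdev, port);
    if (read_sysfs_line(path, val, sizeof(val)) != 0)
        return -1;
    return strcmp(val, "Ethernet") == 0;
}
```

Calling port_is_ethernet("mlx5_0", 1) from a small main() and getting -1 would suggest that the virtual function exposes no RDMA device under that name, in which case hardware RoCE is probably not available in the VM.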
However, a divide error occurs when I run my RDMA program:
trap divide error ip:7f8dd1802b8f sp:7ffc63e72a80 error:0 in libmlx5.so.1.12.28.0[7f8dd17eb000+46000]
The backtrace is:
(gdb) bt
#0 0x00007ffff63d3365 in __add_page (context=0x7ffff7f75010)
    at ~/src/rdma-core/providers/mlx5/dbrec.c:58
#1 0x00007ffff63d3587 in mlx5_alloc_dbrec (context=0x7ffff7f75010, pd=0x0, custom_alloc=0x5555557671e8)
    at ~/src/rdma-core/providers/mlx5/dbrec.c:119
#2 0x00007ffff6403320 in create_cq (context=0x7ffff7f75150, cq_attr=0x7fffffffe6d0, cq_alloc_flags=0,
    mlx5cq_attr=0x0) at ~/src/rdma-core/providers/mlx5/verbs.c:1013
#3 0x00007ffff640388a in mlx5_create_cq (context=0x7ffff7f75150, cqe=3, channel=0x555555765ec0, comp_vector=0)
    at ~/src/rdma-core/providers/mlx5/verbs.c:1134
#4 0x00007ffff79acfb9 in __ibv_create_cq_1_1 (context=0x7ffff7f75150, cqe=3,
    cq_context=0x55555575b440 <admin_qp>, channel=0x555555765ec0, comp_vector=0)
    at ~/src/rdma-core/libibverbs/verbs.c:520
#5 0x0000555555556e79 in fsr_process_admin_qp_cm_event_req (aqp=0x55555575b440 <admin_qp>, ev=0x555555765a50)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:449
#6 0x0000555555557299 in fsr_process_admin_qp_cm_event (aqp=0x55555575b440 <admin_qp>, ec=0x55555575c410)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:552
#7 0x00005555555573b7 in fsr_estab_admin_qp (aqp=0x55555575b440 <admin_qp>)
    at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:596
#8 0x0000555555557445 in fmsgserver_rdma_init () at ../../../server/fmsgserver_rdma/fmsgserver_rdma.c:619
#9 0x00005555555562cf in main (argc=1, argv=0x7fffffffe9b8) at driver.c:248
I found that 'match_device' in rdma-core/libibverbs/init.c
always matches the mlx5 provider instead of the rxe provider, because
the PCI ID matches (see hca_table in rdma-core/providers/mlx5/mlx5.c).
As a result, the mlx5_xxx functions are invoked instead of the rxe_xxx functions.
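To make the matching problem concrete, here is a toy model of "first match wins" provider selection. It is not the rdma-core code: the struct, the table, and both pick functions are hypothetical, and the PCI ID string "15b3:1018" is assumed to be the ConnectX-5 VF ID. It only illustrates how a PCI ID match can shadow the rxe provider when the rxe device sits on top of the Mellanox netdev, and how checking the kernel device name first would change the outcome.

```c
#include <stddef.h>
#include <string.h>

/* Toy provider descriptor (hypothetical; not the rdma-core structures). */
struct toy_provider {
    const char *name;        /* provider name */
    const char *pci_id;      /* "vendor:device" it claims, or NULL */
    const char *name_prefix; /* kernel device-name prefix it claims, or NULL */
};

/* Providers in the order they are tried. The PCI ID is an assumed value. */
static const struct toy_provider toy_table[] = {
    { "mlx5", "15b3:1018", "mlx5_" },
    { "rxe",  NULL,        "rxe"   },
};
#define TOY_COUNT (sizeof(toy_table) / sizeof(toy_table[0]))

static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* First match wins, and a PCI ID match alone is enough to claim the device. */
static const char *toy_pick_pci_first(const char *pci_id, const char *dev_name)
{
    for (size_t i = 0; i < TOY_COUNT; i++) {
        const struct toy_provider *p = &toy_table[i];
        if (p->pci_id && strcmp(p->pci_id, pci_id) == 0)
            return p->name;
        if (p->name_prefix && has_prefix(dev_name, p->name_prefix))
            return p->name;
    }
    return NULL;
}

/* Alternative: consult the kernel device name before any PCI ID. */
static const char *toy_pick_name_first(const char *pci_id, const char *dev_name)
{
    for (size_t i = 0; i < TOY_COUNT; i++) {
        const struct toy_provider *p = &toy_table[i];
        if (p->name_prefix && has_prefix(dev_name, p->name_prefix))
            return p->name;
    }
    for (size_t i = 0; i < TOY_COUNT; i++) {
        const struct toy_provider *p = &toy_table[i];
        if (p->pci_id && strcmp(p->pci_id, pci_id) == 0)
            return p->name;
    }
    return NULL;
}
```

In this model, a device named "rxe0" whose parent PCI ID is "15b3:1018" is claimed by "mlx5" under toy_pick_pci_first, but by "rxe" under toy_pick_name_first.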
Do you know how to force the use of the rxe provider functions on top of
the Mellanox ConnectX-5 Ethernet network? As a workaround, I currently
comment out the mlx5 subdirectories in the CMakeLists.txt so that mlx5
won't be tried in 'try_all_drivers'.
Best Regards,
Fan