* mlx5_ib_post_send panic on s390x
@ 2017-02-24  9:51 Ursula Braun
  [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  [not found] ` <dcc90daa-b932-8957-d8bc-e1f02d04e03a@linux.vnet.ibm.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Ursula Braun @ 2017-02-24  9:51 UTC (permalink / raw)
To: matamb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Saeed and Matan,

up to now I run SMC-R traffic on Connect X3, which works.
But when switching to Connect X4, the first mlx5_ib_post_send() fails:

[  247.787660] Unable to handle kernel pointer dereference in virtual kernel address space
[  247.787662] Failing address: 000000010484a000 TEID: 000000010484a803
[  247.787664] Fault in home space mode while using kernel ASCE.
[  247.787667] AS:00000000011ec007 R3:0000000000000024
[  247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP
[  247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
[  247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G        W  4.10.0uschi+ #4
[  247.787739] Hardware name: IBM 2964 N96 704 (LPAR)
[  247.787743] Workqueue: events smc_listen_work [smc]
[  247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000
[  247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
[  247.787751]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
[  247.787755]            000000000000002b 000000000076242e 000000000000002c 0000000099c96440
[  247.787757]            000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
[  247.787758]            00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
[  247.787764] Krnl Code: 0000000000762404: b9040012		lgr	%r1,%r2
                          0000000000762408: a7740008		brc	7,762418
                         #000000000076240c: c05000000011	larl	%r5,76242e
                         >0000000000762412: 44405000		ex	%r4,0(%r5)
                          0000000000762416: 07fe		bcr	15,%r14
                          0000000000762418: d2ff10003000	mvc	0(256,%r1),0(%r3)
                          000000000076241e: 41101100		la	%r1,256(%r1)
                          0000000000762422: 41303100		la	%r3,256(%r3)
[  247.787780] Call Trace:
[  247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
[  247.787789]  [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc]
[  247.787792]  [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc]
[  247.787794]  [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc]
[  247.787797]  [<00000000001659e8>] process_one_work+0x3d8/0x780
[  247.787799]  [<0000000000166044>] worker_thread+0x2b4/0x478
[  247.787801]  [<000000000016e62c>] kthread+0x15c/0x170
[  247.787803]  [<0000000000a115f2>] kernel_thread_starter+0x6/0xc
[  247.787804]  [<0000000000a115ec>] kernel_thread_starter+0x0/0xc
[  247.787806] INFO: lockdep is turned off.
[  247.787807] Last Breaking-Event-Address:
[  247.787811]  [<000003ff8106edc0>] 0x3ff8106edc0
[  247.787813]
[  247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops

The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to
an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses
require extra access functions (see arch/s390/include/asm/io.h).
Kind regards, Ursula Braun (IBM Germany) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
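The failure mode described above — an inline-send path dereferencing sge->addr with a plain memcpy — can be illustrated with a small userspace mock. The struct and function below are simplified, hypothetical stand-ins (not the kernel's struct ib_sge or the actual mlx5 set_data_inl_seg() code): an inline post copies each scatter/gather element into the work-queue entry by treating sge->addr as a CPU pointer, which is exactly what faults when that field carries a DMA bus address that is not CPU-dereferenceable, as on s390x here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct ib_sge; field names
 * mirror the verbs structure but this is a userspace mock. */
struct sge {
	uint64_t addr;   /* in the oops: a DMA bus address, not a CPU pointer */
	uint32_t length;
};

/* Sketch of what an inline-data copy in a post_send path does: it
 * dereferences sge->addr as a CPU pointer and memcpy()s the payload
 * straight into the work-queue entry.  This only works when addr is a
 * genuine CPU-addressable pointer. */
static size_t copy_inline(void *wqe, const struct sge *sg, int num_sge)
{
	size_t off = 0;

	for (int i = 0; i < num_sge; i++) {
		memcpy((char *)wqe + off,
		       (const void *)(uintptr_t)sg[i].addr, sg[i].length);
		off += sg[i].length;
	}
	return off;
}
```

In the mock the copy succeeds because addr really is a CPU pointer; in the oops above it was the dma_addr_t returned by ib_dma_map_single(), so the memcpy in set_data_inl_seg() faulted.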
* RE: mlx5_ib_post_send panic on s390x [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-02-24 17:28 ` Eli Cohen [not found] ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Eli Cohen @ 2017-02-24 17:28 UTC (permalink / raw) To: Ursula Braun, matamb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi, Can you please send details of the work request you are posting? I assume you are using inline, right? -----Original Message----- From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Ursula Braun Sent: Friday, February 24, 2017 3:52 AM To: matamb@mellanox.com; Leon Romanovsky <leonro@mellanox.com> Cc: linux-rdma@vger.kernel.org Subject: mlx5_ib_post_send panic on s390x Hi Saeed and Matan, up to now I run SMC-R traffic on Connect X3, which works. But when switching to Connect X4, the first mlx5_ib_post_send() fails: [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 [ 247.787664] Fault in home space mode while using kernel ASCE. 
[ 247.787667] AS:00000000011ec007 R3:0000000000000024 [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4 [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4 [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR) [ 247.787743] Workqueue: events smc_listen_work [smc] [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48) [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8 [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440 [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001 [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38 [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2 0000000000762408: a7740008 brc 7,762418 #000000000076240c: c05000000011 larl %r5,76242e >0000000000762412: 44405000 ex %r4,0(%r5) 0000000000762416: 07fe bcr 15,%r14 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3) 000000000076241e: 41101100 la %r1,256(%r1) 0000000000762422: 41303100 la %r3,256(%r3) [ 247.787780] Call Trace: [ 247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib]) [ 247.787789] 
[<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780 [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478 [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170 [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc [ 247.787806] INFO: lockdep is turned off. [ 247.787807] Last Breaking-Event-Address: [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0 [ 247.787813] [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). Kind regards, Ursula Braun (IBM Germany) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: mlx5_ib_post_send panic on s390x [not found] ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-03-06 11:17 ` Ursula Braun [not found] ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Ursula Braun @ 2017-03-06 11:17 UTC (permalink / raw) To: Eli Cohen, matanb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA On 02/24/2017 06:28 PM, Eli Cohen wrote: > Hi, > > Can you please send details of the work request you are posting? I assume you are using inline, right? yes, inline is used: lnk->wr_tx_sges[i].addr = lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE; lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey; lnk->wr_tx_ibs[i].next = NULL; lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i]; lnk->wr_tx_ibs[i].num_sge = 1; lnk->wr_tx_ibs[i].opcode = IB_WR_SEND; lnk->wr_tx_ibs[i].send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE; > > -----Original Message----- > From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun > Sent: Friday, February 24, 2017 3:52 AM > To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Subject: mlx5_ib_post_send panic on s390x > > Hi Saeed and Matan, > > up to now I run SMC-R traffic on Connect X3, which works. > But when switching to Connect X4, the first mlx5_ib_post_send() fails: > > [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 [ 247.787664] Fault in home space mode while using kernel ASCE. 
> [ 247.787667] AS:00000000011ec007 R3:0000000000000024 [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4 > [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4 > [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR) > [ 247.787743] Workqueue: events smc_listen_work [smc] [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48) > [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8 > [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440 > [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001 > [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38 > [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2 > 0000000000762408: a7740008 brc 7,762418 > #000000000076240c: c05000000011 larl %r5,76242e > >0000000000762412: 44405000 ex %r4,0(%r5) > 0000000000762416: 07fe bcr 15,%r14 > 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3) > 000000000076241e: 41101100 la %r1,256(%r1) > 0000000000762422: 41303100 la %r3,256(%r3) > [ 247.787780] Call Trace: > [ 247.787785] ([<000003ff8107cdd4>] 
mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib]) [ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780 [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478 [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170 [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc [ 247.787806] INFO: lockdep is turned off. > [ 247.787807] Last Breaking-Event-Address: > [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0 [ 247.787813] [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops > > The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). > The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). > > Kind regards, Ursula Braun (IBM Germany) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: mlx5_ib_post_send panic on s390x [not found] ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-03-06 12:56 ` Eli Cohen [not found] ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Eli Cohen @ 2017-03-06 12:56 UTC (permalink / raw) To: Ursula Braun, Matan Barak, Leon Romanovsky Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA Please send information on: - The size of the required inline data in the offending work request - The transport service used - How many outstanding work requests the send queue is configured to - What was the serial number of the work request that triggered this oops (first, second, 65th etc). -----Original Message----- From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] Sent: Monday, March 6, 2017 5:17 AM To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>; Leon Romanovsky <leonro@mellanox.com> Cc: linux-rdma@vger.kernel.org Subject: Re: mlx5_ib_post_send panic on s390x On 02/24/2017 06:28 PM, Eli Cohen wrote: > Hi, > > Can you please send details of the work request you are posting? I assume you are using inline, right? 
yes, inline is used: lnk->wr_tx_sges[i].addr = lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE; lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey; lnk->wr_tx_ibs[i].next = NULL; lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i]; lnk->wr_tx_ibs[i].num_sge = 1; lnk->wr_tx_ibs[i].opcode = IB_WR_SEND; lnk->wr_tx_ibs[i].send_flags = IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE; > > -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Ursula Braun > Sent: Friday, February 24, 2017 3:52 AM > To: matamb@mellanox.com; Leon Romanovsky <leonro@mellanox.com> > Cc: linux-rdma@vger.kernel.org > Subject: mlx5_ib_post_send panic on s390x > > Hi Saeed and Matan, > > up to now I run SMC-R traffic on Connect X3, which works. > But when switching to Connect X4, the first mlx5_ib_post_send() fails: > > [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 [ 247.787664] Fault in home space mode while using kernel ASCE. 
> [ 247.787667] AS:00000000011ec007 R3:0000000000000024 [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4 > [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4 > [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR) > [ 247.787743] Workqueue: events smc_listen_work [smc] [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48) > [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 > [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8 > [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440 > [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001 > [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38 > [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2 > 0000000000762408: a7740008 brc 7,762418 > #000000000076240c: c05000000011 larl %r5,76242e > >0000000000762412: 44405000 ex %r4,0(%r5) > 0000000000762416: 07fe bcr 15,%r14 > 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3) > 000000000076241e: 41101100 la %r1,256(%r1) > 0000000000762422: 41303100 la %r3,256(%r3) > [ 247.787780] Call Trace: > [ 247.787785] ([<000003ff8107cdd4>] 
mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib]) [ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780 [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478 [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170 [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc [ 247.787806] INFO: lockdep is turned off. > [ 247.787807] Last Breaking-Event-Address: > [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0 [ 247.787813] [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops > > The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). > The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). > > Kind regards, Ursula Braun (IBM Germany) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: mlx5_ib_post_send panic on s390x
  [not found] ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-06 13:47 ` Ursula Braun
  0 siblings, 0 replies; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 13:47 UTC (permalink / raw)
To: Eli Cohen, Matan Barak, Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On 03/06/2017 01:56 PM, Eli Cohen wrote:
> Please send information on:
>
> - The size of the required inline data in the offending work request
44 bytes (ib_create_qp with ib_qp_init_attr.cap.max_inline_data=44)
> - The transport service used
IB_QPT_RC
> - How many outstanding work requests the send queue is configured to
ib_create_cq with ib_cq_init_attr.cqe=32768
ib_create_qp with ib_qp_init_attr.cap.max_send_wr=16
> - What was the serial number of the work request that triggered this oops (first, second, 65th etc).
serial number wr_id=1
>
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org]
> Sent: Monday, March 6, 2017 5:17 AM
> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mlx5_ib_post_send panic on s390x
>
> On 02/24/2017 06:28 PM, Eli Cohen wrote:
>> Hi,
>>
>> Can you please send details of the work request you are posting? I assume you are using inline, right?
> yes, inline is used: > > lnk->wr_tx_sges[i].addr = > lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; > lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE; > lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey; > lnk->wr_tx_ibs[i].next = NULL; > lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i]; > lnk->wr_tx_ibs[i].num_sge = 1; > lnk->wr_tx_ibs[i].opcode = IB_WR_SEND; > lnk->wr_tx_ibs[i].send_flags = > IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE; > >> >> -----Original Message----- >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun >> Sent: Friday, February 24, 2017 3:52 AM >> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> Subject: mlx5_ib_post_send panic on s390x >> >> Hi Saeed and Matan, >> >> up to now I run SMC-R traffic on Connect X3, which works. >> But when switching to Connect X4, the first mlx5_ib_post_send() fails: >> >> [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 [ 247.787664] Fault in home space mode while using kernel ASCE. 
>> [ 247.787667] AS:00000000011ec007 R3:0000000000000024 [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4 >> [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4 >> [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR) >> [ 247.787743] Workqueue: events smc_listen_work [smc] [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48) >> [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >> [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8 >> [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440 >> [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001 >> [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38 >> [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2 >> 0000000000762408: a7740008 brc 7,762418 >> #000000000076240c: c05000000011 larl %r5,76242e >> >0000000000762412: 44405000 ex %r4,0(%r5) >> 0000000000762416: 07fe bcr 15,%r14 >> 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3) >> 000000000076241e: 41101100 la %r1,256(%r1) >> 0000000000762422: 41303100 la %r3,256(%r3) >> [ 247.787780] Call Trace: >> [ 247.787785] ([<000003ff8107cdd4>] 
mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib]) [ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780 [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478 [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170 [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc [ 247.787806] INFO: lockdep is turned off. >> [ 247.787807] Last Breaking-Event-Address: >> [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0 [ 247.787813] [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops >> >> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). >> The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). >> >> Kind regards, Ursula Braun (IBM Germany) >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html >> > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
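The parameters reported above (44 bytes of inline data on an RC QP with max_inline_data=44) can be related to send-queue layout with a small, runnable sketch. The constants below are assumptions taken from the upstream mlx5 headers of that era (a 4-byte mlx5_wqe_inline_seg byte_count with the inline flag in the top bit, and 16-byte WQE basic blocks); treat the whole calculation as illustrative, not as the driver's actual code.

```c
#include <stdint.h>

/* Assumed constants, per include/linux/mlx5/qp.h around v4.10. */
#define MLX5_INLINE_SEG 0x80000000u
#define WQE_BASIC_BLOCK 16u

/* Space an inline segment occupies in the send queue: the 4-byte
 * byte_count header plus the payload, padded up to whole 16-byte
 * WQE basic blocks. */
static uint32_t inline_seg_size(uint32_t payload_len)
{
	uint32_t raw = 4 /* byte_count header */ + payload_len;

	return (raw + WQE_BASIC_BLOCK - 1) & ~(WQE_BASIC_BLOCK - 1);
}

/* The byte_count field carries the payload length with the inline
 * flag set in the most significant bit. */
static uint32_t inline_byte_count(uint32_t payload_len)
{
	return payload_len | MLX5_INLINE_SEG;
}
```

For the 44-byte LLC message in this thread, the inline segment fits in exactly three 16-byte basic blocks (4 + 44 = 48 bytes).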
* Re: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2017-03-06 13:03 ` Ursula Braun [not found] ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Ursula Braun @ 2017-03-06 13:03 UTC (permalink / raw) To: Matan Barak (External) Cc: Saeed Mahameed (saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org), Eli Cohen, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA On 02/26/2017 10:45 AM, Matan Barak (External) wrote: > On 24/02/2017 12:27, Ursula Braun wrote: >> sorry, typo in the mail address. >> >> -------- Forwarded Message -------- >> Subject: mlx5_ib_post_send panic on s390x >> Date: Fri, 24 Feb 2017 10:51:32 +0100 >> From: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> >> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org >> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> >> Hi Saeed and Matan, >> >> up to now I run SMC-R traffic on Connect X3, which works. >> But when switching to Connect X4, the first mlx5_ib_post_send() fails: >> >> [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space >> [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 >> [ 247.787664] Fault in home space mode while using kernel ASCE. 
>> [ 247.787667] AS:00000000011ec007 R3:0000000000000024 >> [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP >> [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4 >> [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4 >> [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR) >> [ 247.787743] Workqueue: events smc_listen_work [smc] >> [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 >> [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48) >> [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 >> [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8 >> [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440 >> [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001 >> [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38 >> [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2 >> 0000000000762408: a7740008 brc 7,762418 >> #000000000076240c: c05000000011 larl %r5,76242e >> >0000000000762412: 44405000 ex %r4,0(%r5) >> 0000000000762416: 07fe bcr 15,%r14 >> 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3) >> 000000000076241e: 41101100 la %r1,256(%r1) >> 0000000000762422: 41303100 la %r3,256(%r3) >> [ 247.787780] Call Trace: >> [ 247.787785] 
([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
>> [ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc]
>> [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc]
>> [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc]
>> [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780
>> [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478
>> [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170
>> [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc
>> [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc
>> [ 247.787806] INFO: lockdep is turned off.
>> [ 247.787807] Last Breaking-Event-Address:
>> [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0
>> [ 247.787813]
>> [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to
>> an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses
>> require extra access functions (see arch/s390/include/asm/io.h).
>>
>
> So I guess memcpy_toio is required here, right?
> Since we don't have a s390 based system, could you please test this?

memcpy_toio() did not help. Then I replaced the memcpy-calls in set_data_inl_seg()
by this preliminary test code (just to give an idea, not a real patch proposal):

static void *memcpy_usc(void *dest, const void *src, size_t count)
{
	char *tmp_dest = (char *)dest;
	char *tmp_src = (char *)src;
	int copied = 0;
	u32 tmp_u32;

	while (copied < count) {
		tmp_u32 = __raw_readl(tmp_src);
		__raw_writel(tmp_u32, tmp_dest);
		copied += sizeof(tmp_u32);
		tmp_dest += sizeof(tmp_u32);
		tmp_src += sizeof(tmp_u32);
	}
	return dest;
}

This helped; the first mlx5_ib_post_send call initiated from SMC-code (type IB_WR_SEND,
flagged with IB_SEND_INLINE, length 44 bytes) ran successfully.
A subsequent mlx5_ib_post_send call of type RDMA_WRITE seems to stall later on, but this is something I have to analyze in more detail. > >> Kind regards, Ursula Braun (IBM Germany) >> > > Thanks for notifying. > > Matan > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
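Ursula's preliminary memcpy_usc() above replaces the single plain memcpy() with a loop of aligned 32-bit loads and stores. A self-contained userspace sketch of the same loop (the kernel's __raw_readl()/__raw_writel() primitives are modeled here as plain u32 accesses; like the original, it assumes both pointers are 4-byte aligned and count is a multiple of 4, as holds for the 44-byte LLC message):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Word-wise copy: exactly one aligned 32-bit access per word, never
 * the byte or multi-word access patterns a generic memcpy() may use.
 * Assumes 4-byte-aligned pointers and count a multiple of 4. */
static void *memcpy_usc(void *dest, const void *src, size_t count)
{
	uint32_t *d = dest;
	const uint32_t *s = src;
	size_t copied;

	for (copied = 0; copied < count; copied += sizeof(uint32_t))
		*d++ = *s++;
	return dest;
}
```

Like the original, a count that is not a multiple of 4 would copy past the requested length, so callers must round buffer sizes up to 4-byte multiples.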
[parent not found: <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-03-06 13:08 ` Eli Cohen [not found] ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Eli Cohen @ 2017-03-06 13:08 UTC (permalink / raw) To: Ursula Braun, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA >> >> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). >> The address provided by SMC-code in struct ib_send_wr *wr is an >> address belonging to an area mapped with the ib_dma_map_single() >> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). >> By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work. ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* Re: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-03-09 9:54 ` Ursula Braun [not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Ursula Braun @ 2017-03-09 9:54 UTC (permalink / raw) To: Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA On 03/06/2017 02:08 PM, Eli Cohen wrote: >>> >>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). >>> The address provided by SMC-code in struct ib_send_wr *wr is an >>> address belonging to an area mapped with the ib_dma_map_single() >>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). >>> > > By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work. > In the past I ran SMC-R with Connect X3 cards. The mlx4 driver does not seem to contain extra handling for the IB_SEND_INLINE flag in ib_post_send. Does this mean that for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and thus I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag, if I want to run the same SMC-R code with both Connect X3 and Connect X4 cards? ^ permalink raw reply [flat|nested] 15+ messages in thread
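The thread's distinction can be made concrete: a provider that honors IB_SEND_INLINE (mlx5) copies the payload with CPU loads from sg_list[].addr, so that field must hold a CPU virtual address; a provider that ignores the flag (mlx4) lets the HCA DMA from it, so it must hold the ib_dma_map_single() result. A hypothetical helper sketching the choice (the sge struct mirrors ib_sge, but the helper and its policy are illustrative, not part of the verbs API):

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the layout of struct ib_sge; userspace model only. */
struct sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Hypothetical helper: pick the address to match how the provider
 * will read it -- CPU virtual address for a genuinely inline send,
 * DMA-mapped address otherwise. */
static void fill_send_sge(struct sge *sge, int use_inline,
			  void *buf, uint64_t dma_addr,
			  uint32_t len, uint32_t lkey)
{
	sge->addr = use_inline ? (uint64_t)(uintptr_t)buf : dma_addr;
	sge->length = len;
	sge->lkey = lkey;
}
```

On s390x the two addresses are not interchangeable, which is exactly why the plain memcpy() in mlx5's set_data_inl_seg() faulted when handed the DMA address.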
[parent not found: <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-03-09 12:58 ` Eli Cohen 2017-03-12 20:20 ` Parav Pandit 1 sibling, 0 replies; 15+ messages in thread From: Eli Cohen @ 2017-03-09 12:58 UTC (permalink / raw) To: Ursula Braun, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Yes, for mlx4 it is ignored. -----Original Message----- From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] Sent: Thursday, March 9, 2017 3:54 AM To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky <leonro@mellanox.com>; linux-rdma@vger.kernel.org Subject: Re: Fwd: mlx5_ib_post_send panic on s390x On 03/06/2017 02:08 PM, Eli Cohen wrote: >>> >>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg(). >>> The address provided by SMC-code in struct ib_send_wr *wr is an >>> address belonging to an area mapped with the ib_dma_map_single() >>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h). >>> > > By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work. > In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and thus I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag, if I want to run the same SMC-R code with both, Connect X3 cards and Connect X4 cards? ^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-09 12:58 ` Eli Cohen @ 2017-03-12 20:20 ` Parav Pandit [not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Parav Pandit @ 2017-03-12 20:20 UTC (permalink / raw) To: Ursula Braun, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Ursula, > -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > owner@vger.kernel.org] On Behalf Of Ursula Braun > Sent: Thursday, March 9, 2017 3:54 AM > To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > <leonro@mellanox.com>; linux-rdma@vger.kernel.org > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > > > > On 03/06/2017 02:08 PM, Eli Cohen wrote: > >>> > >>> The problem seems to be caused by the usage of plain memcpy in > set_data_inl_seg(). > >>> The address provided by SMC-code in struct ib_send_wr *wr is an > >>> address belonging to an area mapped with the ib_dma_map_single() > >>> call. On s390x those kind of addresses require extra access functions (see > arch/s390/include/asm/io.h). > >>> > > > > By definition, when you are posting a send request with inline, the address > must be mapped to the cpu so plain memcpy should work. > > > In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to > contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean > for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and > thus I needed the ib_dma_map_single() call for the area used with > ib_post_send()? 
Does this mean I should stay away from the IB_SEND_INLINE > flag, if I want to run the same SMC-R code with both, Connect X3 cards and > Connect X4 cards? > I had encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with smc-r on x86_64. Shall I submit below fix to netdev mailing list? I have tested above change. I also have optimization that avoids dma mapping for wr_tx_dma_addr. - lnk->wr_tx_sges[i].addr = - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i); I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older kernel base. I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver. Let me know. Regards, Parav Pandit diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, unsigned long flags; int nreq; int err = 0; + int inl = 0; unsigned ind; int uninitialized_var(stamp); int uninitialized_var(size); @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, default: break; } + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { + struct mlx4_wqe_inline_seg *seg; + void *addr; + int len, seg_len; + int num_seg; + int off, to_copy; + + inl = 0; + + seg = wqe; + wqe += sizeof *seg; + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); + num_seg = 0; + seg_len = 0; + + for (i = 0; i < wr->num_sge; ++i) { + addr = (void *) (uintptr_t) wr->sg_list[i].addr; + len = wr->sg_list[i].length; + inl += len; + + if (inl > 16) { + inl = 0; + err = ENOMEM; + *bad_wr = wr; + goto out; + } - /* - * Write data segments in reverse order, so as to - * overwrite cacheline stamp last within each - * cacheline. This avoids issues with WQE - * prefetching. 
- */ + while (len >= MLX4_INLINE_ALIGN - off) { + to_copy = MLX4_INLINE_ALIGN - off; + memcpy(wqe, addr, to_copy); + len -= to_copy; + wqe += to_copy; + addr += to_copy; + seg_len += to_copy; + wmb(); /* see comment below */ + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len); + seg_len = 0; + seg = wqe; + wqe += sizeof *seg; + off = sizeof *seg; + ++num_seg; + } - dseg = wqe; - dseg += wr->num_sge - 1; - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16); + memcpy(wqe, addr, len); + wqe += len; + seg_len += len; + off += len; + } - /* Add one more inline data segment for ICRC for MLX sends */ - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || - qp->mlx4_ib_qp_type & - (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) { - set_mlx_icrc_seg(dseg + 1); - size += sizeof (struct mlx4_wqe_data_seg) / 16; - } + if (seg_len) { + ++num_seg; + /* + * Need a barrier here to make sure + * all the data is visible before the + * byte_count field is set. Otherwise + * the HCA prefetcher could grab the + * 64-byte chunk with this inline + * segment and get a valid (!= + * 0xffffffff) byte count but stale + * data, and end up sending the wrong + * data. + */ + wmb(); + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len); + } - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) - set_data_seg(dseg, wr->sg_list + i); + size += (inl + num_seg * sizeof (*seg) + 15) / 16; + } else { + /* + * Write data segments in reverse order, so as to + * overwrite cacheline stamp last within each + * cacheline. This avoids issues with WQE + * prefetching. 
+ */ + + dseg = wqe; + dseg += wr->num_sge - 1; + size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16); + + /* Add one more inline data segment for ICRC for MLX sends */ + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || + qp->mlx4_ib_qp_type & + (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) { + set_mlx_icrc_seg(dseg + 1); + size += sizeof (struct mlx4_wqe_data_seg) / 16; + } + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) + set_data_seg(dseg, wr->sg_list + i); + } /* * Possibly overwrite stamping in cacheline with LSO * segment only after making sure all data segments > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body > of a message to majordomo@vger.kernel.org More majordomo info at > http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 15+ messages in thread
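The size bookkeeping in Parav's patch can be checked in isolation. The model below reproduces only the arithmetic: inline data is copied after a 4-byte mlx4_wqe_inline_seg header, a fresh header is started whenever the copy fills the current 64-byte (MLX4_INLINE_ALIGN) chunk, and the WQE size grows in 16-byte units. This is a sketch of the patch's logic under the simplifying assumption that the first header does not start within 4 bytes of a chunk boundary; it is not driver code:

```c
#include <assert.h>

#define MLX4_INLINE_ALIGN 64
#define INL_SEG_HDR 4	/* sizeof(struct mlx4_wqe_inline_seg) */

/* Returns the WQE size contribution in 16-byte units for 'len' inline
 * bytes whose first segment header starts 'off' bytes into a 64-byte
 * chunk; *num_seg is set to the number of inline headers emitted. */
static int inline_size16(int len, int off, int *num_seg)
{
	int inl = len, nseg = 0, seg_len = 0;

	off += INL_SEG_HDR;		/* data begins after the header */
	while (len >= MLX4_INLINE_ALIGN - off) {
		len -= MLX4_INLINE_ALIGN - off;	/* fill current chunk */
		off = INL_SEG_HDR;	/* next chunk starts a new header */
		nseg++;
		seg_len = 0;
	}
	seg_len += len;
	if (seg_len)
		nseg++;			/* close the trailing segment */
	*num_seg = nseg;
	return (inl + nseg * INL_SEG_HDR + 15) / 16;
}
```

For SMC-R's 44-byte send starting at a chunk boundary this gives one inline segment and three 16-byte units, so the payload fits in a single WQE chunk without splitting.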
[parent not found: <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-03-12 20:38 ` Parav Pandit 2017-03-14 15:02 ` Ursula Braun 1 sibling, 0 replies; 15+ messages in thread From: Parav Pandit @ 2017-03-12 20:38 UTC (permalink / raw) To: Parav Pandit, Ursula Braun, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA I meant mlx4_ib* driver below. Sorry for typo. > -----Original Message----- > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > owner@vger.kernel.org] On Behalf Of Parav Pandit > Sent: Sunday, March 12, 2017 3:21 PM > To: Ursula Braun <ubraun@linux.vnet.ibm.com>; Eli Cohen > <eli@mellanox.com>; Matan Barak <matanb@mellanox.com> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > <leonro@mellanox.com>; linux-rdma@vger.kernel.org > Subject: RE: Fwd: mlx5_ib_post_send panic on s390x > > Hi Ursula, > > > -----Original Message----- > > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > > owner@vger.kernel.org] On Behalf Of Ursula Braun > > Sent: Thursday, March 9, 2017 3:54 AM > > To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com> > > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > > <leonro@mellanox.com>; linux-rdma@vger.kernel.org > > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > > > > > > > > On 03/06/2017 02:08 PM, Eli Cohen wrote: > > >>> > > >>> The problem seems to be caused by the usage of plain memcpy in > > set_data_inl_seg(). > > >>> The address provided by SMC-code in struct ib_send_wr *wr is an > > >>> address belonging to an area mapped with the ib_dma_map_single() > > >>> call. On s390x those kind of addresses require extra access > > >>> functions (see > > arch/s390/include/asm/io.h). 
> > >>> > > > > > > By definition, when you are posting a send request with inline, the > > > address > > must be mapped to the cpu so plain memcpy should work. > > > > > In the past I run SMC-R with Connect X3 cards. The mlx4 driver does > > not seem to contain extra coding for IB_SEND_INLINE flag for > > ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the > > IB_SEND_INLINE flag is ignored, and thus I needed the > > ib_dma_map_single() call for the area used with ib_post_send()? Does > > this mean I should stay away from the IB_SEND_INLINE flag, if I want > > to run the same SMC-R code with both, Connect X3 cards and Connect X4 > cards? > > > I had encountered the same kernel panic that you mentioned last week on > ConnectX-4 adapters with smc-r on x86_64. > Shall I submit below fix to netdev mailing list? > I have tested above change. I also have optimization that avoids dma mapping > for wr_tx_dma_addr. > > - lnk->wr_tx_sges[i].addr = > - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; > + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + > + i); > > I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older > kernel base. > I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver. > Let me know. 
> > Regards, > Parav Pandit > > diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c > index a2e4ca5..0d984f5 100644 > --- a/drivers/infiniband/hw/mlx4/qp.c > +++ b/drivers/infiniband/hw/mlx4/qp.c > @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct > ib_send_wr *wr, > unsigned long flags; > int nreq; > int err = 0; > + int inl = 0; > unsigned ind; > int uninitialized_var(stamp); > int uninitialized_var(size); > @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct > ib_send_wr *wr, > default: > break; > } > + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { > + struct mlx4_wqe_inline_seg *seg; > + void *addr; > + int len, seg_len; > + int num_seg; > + int off, to_copy; > + > + inl = 0; > + > + seg = wqe; > + wqe += sizeof *seg; > + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); > + num_seg = 0; > + seg_len = 0; > + > + for (i = 0; i < wr->num_sge; ++i) { > + addr = (void *) (uintptr_t) wr->sg_list[i].addr; > + len = wr->sg_list[i].length; > + inl += len; > + > + if (inl > 16) { > + inl = 0; > + err = ENOMEM; > + *bad_wr = wr; > + goto out; > + } > > - /* > - * Write data segments in reverse order, so as to > - * overwrite cacheline stamp last within each > - * cacheline. This avoids issues with WQE > - * prefetching. 
> - */ > + while (len >= MLX4_INLINE_ALIGN - off) { > + to_copy = MLX4_INLINE_ALIGN - off; > + memcpy(wqe, addr, to_copy); > + len -= to_copy; > + wqe += to_copy; > + addr += to_copy; > + seg_len += to_copy; > + wmb(); /* see comment below */ > + seg->byte_count = > htonl(MLX4_INLINE_SEG | seg_len); > + seg_len = 0; > + seg = wqe; > + wqe += sizeof *seg; > + off = sizeof *seg; > + ++num_seg; > + } > > - dseg = wqe; > - dseg += wr->num_sge - 1; > - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / > 16); > + memcpy(wqe, addr, len); > + wqe += len; > + seg_len += len; > + off += len; > + } > > - /* Add one more inline data segment for ICRC for MLX sends */ > - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || > - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || > - qp->mlx4_ib_qp_type & > - (MLX4_IB_QPT_PROXY_SMI_OWNER | > MLX4_IB_QPT_TUN_SMI_OWNER))) { > - set_mlx_icrc_seg(dseg + 1); > - size += sizeof (struct mlx4_wqe_data_seg) / 16; > - } > + if (seg_len) { > + ++num_seg; > + /* > + * Need a barrier here to make sure > + * all the data is visible before the > + * byte_count field is set. Otherwise > + * the HCA prefetcher could grab the > + * 64-byte chunk with this inline > + * segment and get a valid (!= > + * 0xffffffff) byte count but stale > + * data, and end up sending the wrong > + * data. > + */ > + wmb(); > + seg->byte_count = htonl(MLX4_INLINE_SEG | > seg_len); > + } > > - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > - set_data_seg(dseg, wr->sg_list + i); > + size += (inl + num_seg * sizeof (*seg) + 15) / 16; > + } else { > + /* > + * Write data segments in reverse order, so as to > + * overwrite cacheline stamp last within each > + * cacheline. This avoids issues with WQE > + * prefetching. 
> > + */ > > + > > + dseg = wqe; > > + dseg += wr->num_sge - 1; > > + size += wr->num_sge * (sizeof (struct > mlx4_wqe_data_seg) / 16); > > + > > + /* Add one more inline data segment for ICRC for MLX > sends */ > > + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI > || > > + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI > || > > + qp->mlx4_ib_qp_type & > > + (MLX4_IB_QPT_PROXY_SMI_OWNER | > MLX4_IB_QPT_TUN_SMI_OWNER))) { > > + set_mlx_icrc_seg(dseg + 1); > > + size += sizeof (struct mlx4_wqe_data_seg) / 16; > > + } > > > > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > > + set_data_seg(dseg, wr->sg_list + i); > > + } > > /* > > * Possibly overwrite stamping in cacheline with LSO > > * segment only after making sure all data segments > > > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" > > in the body of a message to majordomo@vger.kernel.org More majordomo > > info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-12 20:38 ` Parav Pandit @ 2017-03-14 15:02 ` Ursula Braun [not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 1 sibling, 1 reply; 15+ messages in thread From: Ursula Braun @ 2017-03-14 15:02 UTC (permalink / raw) To: Parav Pandit, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Parav, I tried your mlx4-patch together with SMC on s390x, but it failed. The SMC-R code tries to send 44 bytes as inline in 1 sge. I wonder about a length check with 16 bytes, which probably explains the failure. See my question below in the patch: On 03/12/2017 09:20 PM, Parav Pandit wrote: > Hi Ursula, > >> -----Original Message----- >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun >> Sent: Thursday, March 9, 2017 3:54 AM >> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky >> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x >> >> >> >> On 03/06/2017 02:08 PM, Eli Cohen wrote: >>>>> >>>>> The problem seems to be caused by the usage of plain memcpy in >> set_data_inl_seg(). >>>>> The address provided by SMC-code in struct ib_send_wr *wr is an >>>>> address belonging to an area mapped with the ib_dma_map_single() >>>>> call. On s390x those kind of addresses require extra access functions (see >> arch/s390/include/asm/io.h). 
>>>>> >>> >>> By definition, when you are posting a send request with inline, the address >> must be mapped to the cpu so plain memcpy should work. >>> >> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to >> contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean >> for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and >> thus I needed the ib_dma_map_single() call for the area used with >> ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE >> flag, if I want to run the same SMC-R code with both, Connect X3 cards and >> Connect X4 cards? >> > I had encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with smc-r on x86_64. > Shall I submit below fix to netdev mailing list? > I have tested above change. I also have optimization that avoids dma mapping for wr_tx_dma_addr. > > - lnk->wr_tx_sges[i].addr = > - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; > + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i); > > I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older kernel base. > I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver. > Let me know. 
> > Regards, > Parav Pandit > > diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c > index a2e4ca5..0d984f5 100644 > --- a/drivers/infiniband/hw/mlx4/qp.c > +++ b/drivers/infiniband/hw/mlx4/qp.c > @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > unsigned long flags; > int nreq; > int err = 0; > + int inl = 0; > unsigned ind; > int uninitialized_var(stamp); > int uninitialized_var(size); > @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, > default: > break; > } > + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { > + struct mlx4_wqe_inline_seg *seg; > + void *addr; > + int len, seg_len; > + int num_seg; > + int off, to_copy; > + > + inl = 0; > + > + seg = wqe; > + wqe += sizeof *seg; > + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); > + num_seg = 0; > + seg_len = 0; > + > + for (i = 0; i < wr->num_sge; ++i) { > + addr = (void *) (uintptr_t) wr->sg_list[i].addr; > + len = wr->sg_list[i].length; > + inl += len; > + > + if (inl > 16) { > + inl = 0; > + err = ENOMEM; > + *bad_wr = wr; > + goto out; > + } SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for IB_SEND_INLINE data? The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function does not seem to return an error. > > - /* > - * Write data segments in reverse order, so as to > - * overwrite cacheline stamp last within each > - * cacheline. This avoids issues with WQE > - * prefetching. 
> - */ > + while (len >= MLX4_INLINE_ALIGN - off) { > + to_copy = MLX4_INLINE_ALIGN - off; > + memcpy(wqe, addr, to_copy); > + len -= to_copy; > + wqe += to_copy; > + addr += to_copy; > + seg_len += to_copy; > + wmb(); /* see comment below */ > + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len); > + seg_len = 0; > + seg = wqe; > + wqe += sizeof *seg; > + off = sizeof *seg; > + ++num_seg; > + } > > - dseg = wqe; > - dseg += wr->num_sge - 1; > - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16); > + memcpy(wqe, addr, len); > + wqe += len; > + seg_len += len; > + off += len; > + } > > - /* Add one more inline data segment for ICRC for MLX sends */ > - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || > - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || > - qp->mlx4_ib_qp_type & > - (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) { > - set_mlx_icrc_seg(dseg + 1); > - size += sizeof (struct mlx4_wqe_data_seg) / 16; > - } > + if (seg_len) { > + ++num_seg; > + /* > + * Need a barrier here to make sure > + * all the data is visible before the > + * byte_count field is set. Otherwise > + * the HCA prefetcher could grab the > + * 64-byte chunk with this inline > + * segment and get a valid (!= > + * 0xffffffff) byte count but stale > + * data, and end up sending the wrong > + * data. > + */ > + wmb(); > + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len); > + } > > - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > - set_data_seg(dseg, wr->sg_list + i); > + size += (inl + num_seg * sizeof (*seg) + 15) / 16; > + } else { > + /* > + * Write data segments in reverse order, so as to > + * overwrite cacheline stamp last within each > + * cacheline. This avoids issues with WQE > + * prefetching. 
> + */ > + > + dseg = wqe; > + dseg += wr->num_sge - 1; > + size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16); > + > + /* Add one more inline data segment for ICRC for MLX sends */ > + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || > + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || > + qp->mlx4_ib_qp_type & > + (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) { > + set_mlx_icrc_seg(dseg + 1); > + size += sizeof (struct mlx4_wqe_data_seg) / 16; > + } > > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > + set_data_seg(dseg, wr->sg_list + i); > + } > /* > * Possibly overwrite stamping in cacheline with LSO > * segment only after making sure all data segments > >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body >> of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at >> http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
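Ursula's failure above comes from the hard-coded `inl > 16` test, while her QP was created with max_inline_data = 44, so the bound should come from the QP's negotiated limit. A minimal sketch of the corrected check (the parameter name is illustrative; note also that kernel convention would return a negative errno, whereas the patch assigns a positive ENOMEM):

```c
#include <assert.h>
#include <errno.h>

/* Accept the request only if the accumulated inline length fits the
 * limit negotiated at create_qp time, instead of a magic constant. */
static int check_inline_len(unsigned int inl, unsigned int max_inline_data)
{
	return inl <= max_inline_data ? 0 : -ENOMEM;
}
```

With this check, the 44-byte SMC-R send passes on a QP created with max_inline_data = 44, and anything beyond the negotiated limit is rejected at post time rather than silently overrunning the WQE.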
[parent not found: <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-03-14 15:24 ` Parav Pandit [not found] ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Parav Pandit @ 2017-03-14 15:24 UTC (permalink / raw) To: Ursula Braun, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Ursula, > -----Original Message----- > From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] > Sent: Tuesday, March 14, 2017 10:02 AM > To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>; > Matan Barak <matanb@mellanox.com> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > <leonro@mellanox.com>; linux-rdma@vger.kernel.org > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > > Hi Parav, > > I tried your mlx4-patch together with SMC on s390x, but it failed. > The SMC-R code tries to send 44 bytes as inline in 1 sge. > I wonder about a length check with 16 bytes, which probably explains the > failure. > See my question below in the patch: > > On 03/12/2017 09:20 PM, Parav Pandit wrote: > > Hi Ursula, > > > >> -----Original Message----- > >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > >> owner@vger.kernel.org] On Behalf Of Ursula Braun > >> Sent: Thursday, March 9, 2017 3:54 AM > >> To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com> > >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org > >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > >> > >> > >> > >> On 03/06/2017 02:08 PM, Eli Cohen wrote: > >>>>> > >>>>> The problem seems to be caused by the usage of plain memcpy in > >> set_data_inl_seg(). 
> >>>>> The address provided by SMC-code in struct ib_send_wr *wr is an > >>>>> address belonging to an area mapped with the ib_dma_map_single() > >>>>> call. On s390x those kind of addresses require extra access > >>>>> functions (see > >> arch/s390/include/asm/io.h). > >>>>> > >>> > >>> By definition, when you are posting a send request with inline, the > >>> address > >> must be mapped to the cpu so plain memcpy should work. > >>> > >> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does > >> not seem to contain extra coding for IB_SEND_INLINE flag for > >> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the > >> IB_SEND_INLINE flag is ignored, and thus I needed the > >> ib_dma_map_single() call for the area used with ib_post_send()? Does > >> this mean I should stay away from the IB_SEND_INLINE flag, if I want > >> to run the same SMC-R code with both, Connect X3 cards and Connect X4 > cards? > >> > > I had encountered the same kernel panic that you mentioned last week on > ConnectX-4 adapters with smc-r on x86_64. > > Shall I submit below fix to netdev mailing list? > > I have tested above change. I also have optimization that avoids dma mapping > for wr_tx_dma_addr. > > > > - lnk->wr_tx_sges[i].addr = > > - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; > > + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs > > + + i); > > > > I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older > kernel base. > > I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver. > > Let me know. 
> > > > Regards, > > Parav Pandit > > > > diff --git a/drivers/infiniband/hw/mlx4/qp.c > > b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644 > > --- a/drivers/infiniband/hw/mlx4/qp.c > > +++ b/drivers/infiniband/hw/mlx4/qp.c > > @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct > ib_send_wr *wr, > > unsigned long flags; > > int nreq; > > int err = 0; > > + int inl = 0; > > unsigned ind; > > int uninitialized_var(stamp); > > int uninitialized_var(size); > > @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct > ib_send_wr *wr, > > default: > > break; > > } > > + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { > > + struct mlx4_wqe_inline_seg *seg; > > + void *addr; > > + int len, seg_len; > > + int num_seg; > > + int off, to_copy; > > + > > + inl = 0; > > + > > + seg = wqe; > > + wqe += sizeof *seg; > > + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); > > + num_seg = 0; > > + seg_len = 0; > > + > > + for (i = 0; i < wr->num_sge; ++i) { > > + addr = (void *) (uintptr_t) wr->sg_list[i].addr; > > + len = wr->sg_list[i].length; > > + inl += len; > > + > > + if (inl > 16) { > > + inl = 0; > > + err = ENOMEM; > > + *bad_wr = wr; > > + goto out; > > + } > SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for > IB_SEND_INLINE data? > The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function > does not seem to return an error. > > This check should be for max_inline_data variable of the QP. This was just for error check, I should have fixed it. I was testing with nvme where inline data was only worth 16 bytes. I will fix this. Is it possible to change to 44 and do quick test? Final patch will have right check in addition to check in create_qp? > > - /* > > - * Write data segments in reverse order, so as to > > - * overwrite cacheline stamp last within each > > - * cacheline. This avoids issues with WQE > > - * prefetching. 
> > - */ > > + while (len >= MLX4_INLINE_ALIGN - off) { > > + to_copy = MLX4_INLINE_ALIGN - off; > > + memcpy(wqe, addr, to_copy); > > + len -= to_copy; > > + wqe += to_copy; > > + addr += to_copy; > > + seg_len += to_copy; > > + wmb(); /* see comment below */ > > + seg->byte_count = > htonl(MLX4_INLINE_SEG | seg_len); > > + seg_len = 0; > > + seg = wqe; > > + wqe += sizeof *seg; > > + off = sizeof *seg; > > + ++num_seg; > > + } > > > > - dseg = wqe; > > - dseg += wr->num_sge - 1; > > - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / > 16); > > + memcpy(wqe, addr, len); > > + wqe += len; > > + seg_len += len; > > + off += len; > > + } > > > > - /* Add one more inline data segment for ICRC for MLX sends */ > > - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || > > - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || > > - qp->mlx4_ib_qp_type & > > - (MLX4_IB_QPT_PROXY_SMI_OWNER | > MLX4_IB_QPT_TUN_SMI_OWNER))) { > > - set_mlx_icrc_seg(dseg + 1); > > - size += sizeof (struct mlx4_wqe_data_seg) / 16; > > - } > > + if (seg_len) { > > + ++num_seg; > > + /* > > + * Need a barrier here to make sure > > + * all the data is visible before the > > + * byte_count field is set. Otherwise > > + * the HCA prefetcher could grab the > > + * 64-byte chunk with this inline > > + * segment and get a valid (!= > > + * 0xffffffff) byte count but stale > > + * data, and end up sending the wrong > > + * data. > > + */ > > + wmb(); > > + seg->byte_count = htonl(MLX4_INLINE_SEG | > seg_len); > > + } > > > > - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > > - set_data_seg(dseg, wr->sg_list + i); > > + size += (inl + num_seg * sizeof (*seg) + 15) / 16; > > + } else { > > + /* > > + * Write data segments in reverse order, so as to > > + * overwrite cacheline stamp last within each > > + * cacheline. This avoids issues with WQE > > + * prefetching. 
> > + */ > > + > > + dseg = wqe; > > + dseg += wr->num_sge - 1; > > + size += wr->num_sge * (sizeof (struct > mlx4_wqe_data_seg) / 16); > > + > > + /* Add one more inline data segment for ICRC for MLX > sends */ > > + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI > || > > + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI > || > > + qp->mlx4_ib_qp_type & > > + (MLX4_IB_QPT_PROXY_SMI_OWNER | > MLX4_IB_QPT_TUN_SMI_OWNER))) { > > + set_mlx_icrc_seg(dseg + 1); > > + size += sizeof (struct mlx4_wqe_data_seg) / 16; > > + } > > > > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > > + set_data_seg(dseg, wr->sg_list + i); > > + } > > /* > > * Possibly overwrite stamping in cacheline with LSO > > * segment only after making sure all data segments > > > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" > >> in the body of a message to majordomo@vger.kernel.org More majordomo > >> info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* Re: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-03-16 11:51 ` Ursula Braun [not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 0 siblings, 1 reply; 15+ messages in thread From: Ursula Braun @ 2017-03-16 11:51 UTC (permalink / raw) To: Parav Pandit, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Parav, I run your new mlx4-Code together with changed SMC-R code no longer mapping the IB_SEND_INLINE area. It worked - great! Below I have added a small improvement idea in your patch. Nevertheless I am still not sure, if I should keep the IB_SEND_INLINE flag in the SMC-R code, since there is no guarantee that this will work with all kinds of RoCE-devices. The maximum length for IB_SEND_INLINE depends on the RoCE-driver - right? Is there an interface to determine such a maximum length? Would ib_create_qp() return with an error, if the SMC-R specified .cap.max_inline_data = 44 is not supported by a RoCE-driver? On 03/14/2017 04:24 PM, Parav Pandit wrote: > Hi Ursula, > > >> -----Original Message----- >> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] >> Sent: Tuesday, March 14, 2017 10:02 AM >> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; >> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky >> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x >> >> Hi Parav, >> >> I tried your mlx4-patch together with SMC on s390x, but it failed. >> The SMC-R code tries to send 44 bytes as inline in 1 sge. 
>> I wonder about a length check with 16 bytes, which probably explains the >> failure. >> See my question below in the patch: >> >> On 03/12/2017 09:20 PM, Parav Pandit wrote: >>> Hi Ursula, >>> >>>> -----Original Message----- >>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma- >>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun >>>> Sent: Thursday, March 9, 2017 3:54 AM >>>> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >>>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky >>>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org >>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x >>>> >>>> >>>> >>>> On 03/06/2017 02:08 PM, Eli Cohen wrote: >>>>>>> >>>>>>> The problem seems to be caused by the usage of plain memcpy in >>>> set_data_inl_seg(). >>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an >>>>>>> address belonging to an area mapped with the ib_dma_map_single() >>>>>>> call. On s390x those kind of addresses require extra access >>>>>>> functions (see >>>> arch/s390/include/asm/io.h). >>>>>>> >>>>> >>>>> By definition, when you are posting a send request with inline, the >>>>> address >>>> must be mapped to the cpu so plain memcpy should work. >>>>> >>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does >>>> not seem to contain extra coding for IB_SEND_INLINE flag for >>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the >>>> IB_SEND_INLINE flag is ignored, and thus I needed the >>>> ib_dma_map_single() call for the area used with ib_post_send()? Does >>>> this mean I should stay away from the IB_SEND_INLINE flag, if I want >>>> to run the same SMC-R code with both, Connect X3 cards and Connect X4 >> cards? 
>>>> >>> I had encountered the same kernel panic that you mentioned last week on >> ConnectX-4 adapters with smc-r on x86_64. >>> Shall I submit below fix to netdev mailing list? >>> I have tested above change. I also have optimization that avoids dma mapping >> for wr_tx_dma_addr. >>> >>> - lnk->wr_tx_sges[i].addr = >>> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; >>> + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs >>> + + i); >>> >>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older >> kernel base. >>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver. >>> Let me know. >>> >>> Regards, >>> Parav Pandit >>> >>> diff --git a/drivers/infiniband/hw/mlx4/qp.c >>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644 >>> --- a/drivers/infiniband/hw/mlx4/qp.c >>> +++ b/drivers/infiniband/hw/mlx4/qp.c >>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct >> ib_send_wr *wr, >>> unsigned long flags; >>> int nreq; >>> int err = 0; >>> + int inl = 0; >>> unsigned ind; >>> int uninitialized_var(stamp); >>> int uninitialized_var(size); >>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct >> ib_send_wr *wr, >>> default: >>> break; >>> } >>> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { >>> + struct mlx4_wqe_inline_seg *seg; >>> + void *addr; >>> + int len, seg_len; >>> + int num_seg; >>> + int off, to_copy; >>> + >>> + inl = 0; >>> + >>> + seg = wqe; >>> + wqe += sizeof *seg; >>> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); >>> + num_seg = 0; >>> + seg_len = 0; >>> + >>> + for (i = 0; i < wr->num_sge; ++i) { >>> + addr = (void *) (uintptr_t) wr->sg_list[i].addr; >>> + len = wr->sg_list[i].length; >>> + inl += len; >>> + >>> + if (inl > 16) { >>> + inl = 0; >>> + err = ENOMEM; >>> + *bad_wr = wr; >>> + goto out; >>> + } >> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for >> IB_SEND_INLINE data? 
>> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function >> does not seem to return an error. >>> > This check should be for max_inline_data variable of the QP. > This was just for error check, I should have fixed it. I was testing with nvme where inline data was only worth 16 bytes. > I will fix this. Is it possible to change to 44 and do quick test? > Final patch will have right check in addition to check in create_qp? > >>> - /* >>> - * Write data segments in reverse order, so as to >>> - * overwrite cacheline stamp last within each >>> - * cacheline. This avoids issues with WQE >>> - * prefetching. >>> - */ >>> + while (len >= MLX4_INLINE_ALIGN - off) { With this code there are 2 memcpy-Calls, one with to_copy=44, and the next one with len 0. I suggest to change the check to "len > MLX4_INLINE_ALIGN - off". >>> + to_copy = MLX4_INLINE_ALIGN - off; >>> + memcpy(wqe, addr, to_copy); >>> + len -= to_copy; >>> + wqe += to_copy; >>> + addr += to_copy; >>> + seg_len += to_copy; >>> + wmb(); /* see comment below */ >>> + seg->byte_count = >> htonl(MLX4_INLINE_SEG | seg_len); >>> + seg_len = 0; >>> + seg = wqe; >>> + wqe += sizeof *seg; >>> + off = sizeof *seg; >>> + ++num_seg; >>> + } >>> >>> - dseg = wqe; >>> - dseg += wr->num_sge - 1; >>> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / >> 16); >>> + memcpy(wqe, addr, len); >>> + wqe += len; >>> + seg_len += len; >>> + off += len; >>> + } >>> >>> - /* Add one more inline data segment for ICRC for MLX sends */ >>> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || >>> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || >>> - qp->mlx4_ib_qp_type & >>> - (MLX4_IB_QPT_PROXY_SMI_OWNER | >> MLX4_IB_QPT_TUN_SMI_OWNER))) { >>> - set_mlx_icrc_seg(dseg + 1); >>> - size += sizeof (struct mlx4_wqe_data_seg) / 16; >>> - } >>> + if (seg_len) { >>> + ++num_seg; >>> + /* >>> + * Need a barrier here to make sure >>> + * all the data is visible before the >>> + * byte_count field is set. 
Otherwise >>> + * the HCA prefetcher could grab the >>> + * 64-byte chunk with this inline >>> + * segment and get a valid (!= >>> + * 0xffffffff) byte count but stale >>> + * data, and end up sending the wrong >>> + * data. >>> + */ >>> + wmb(); >>> + seg->byte_count = htonl(MLX4_INLINE_SEG | >> seg_len); >>> + } >>> >>> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) >>> - set_data_seg(dseg, wr->sg_list + i); >>> + size += (inl + num_seg * sizeof (*seg) + 15) / 16; >>> + } else { >>> + /* >>> + * Write data segments in reverse order, so as to >>> + * overwrite cacheline stamp last within each >>> + * cacheline. This avoids issues with WQE >>> + * prefetching. >>> + */ >>> + >>> + dseg = wqe; >>> + dseg += wr->num_sge - 1; >>> + size += wr->num_sge * (sizeof (struct >> mlx4_wqe_data_seg) / 16); >>> + >>> + /* Add one more inline data segment for ICRC for MLX >> sends */ >>> + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI >> || >>> + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI >> || >>> + qp->mlx4_ib_qp_type & >>> + (MLX4_IB_QPT_PROXY_SMI_OWNER | >> MLX4_IB_QPT_TUN_SMI_OWNER))) { >>> + set_mlx_icrc_seg(dseg + 1); >>> + size += sizeof (struct mlx4_wqe_data_seg) / 16; >>> + } >>> >>> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) >>> + set_data_seg(dseg, wr->sg_list + i); >>> + } >>> /* >>> * Possibly overwrite stamping in cacheline with LSO >>> * segment only after making sure all data segments >>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" >>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo >>>> info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>]
* RE: Fwd: mlx5_ib_post_send panic on s390x [not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> @ 2017-03-20 21:04 ` Parav Pandit 0 siblings, 0 replies; 15+ messages in thread From: Parav Pandit @ 2017-03-20 21:04 UTC (permalink / raw) To: Ursula Braun, Eli Cohen, Matan Barak Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA Hi Ursula, Regarding the suggestion: the loop still needs to check for len >= MLX4_INLINE_ALIGN - off, because 44 = 64 - 20 is still a valid case (len == MLX4_INLINE_ALIGN - off). But I agree that it shouldn't do a 2nd memcpy with zero length; therefore there should be an additional check for len != 0. Coming to the IB_SEND_INLINE part: when ib_create_qp() is called and the HCA doesn't support the requested cap.max_inline_data, the provider HCA driver is supposed to fail the call, and the ULP is expected to fall back to the non-inline scheme. As it appears, the mlx4 driver is not failing this call, which is a bug that needs a fix. Instead of failing the call, I prefer to provide the data path sooner, based on my inline patch in this email thread. Parav > -----Original Message----- > From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] > Sent: Thursday, March 16, 2017 6:51 AM > To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>; > Matan Barak <matanb@mellanox.com> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > <leonro@mellanox.com>; linux-rdma@vger.kernel.org > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > > Hi Parav, > > I run your new mlx4-Code together with changed SMC-R code no longer > mapping the IB_SEND_INLINE area. It worked - great! > > Below I have added a small improvement idea in your patch.
> > Nevertheless I am still not sure, if I should keep the IB_SEND_INLINE flag in > the SMC-R code, since there is no guarantee that this will work with all kinds > of RoCE-devices. The maximum length for IB_SEND_INLINE depends on the > RoCE-driver - right? Is there an interface to determine such a maximum > length? Would ib_create_qp() return with an error, if the SMC-R specified > .cap.max_inline_data = 44 is not supported by a RoCE-driver? > > On 03/14/2017 04:24 PM, Parav Pandit wrote: > > Hi Ursula, > > > > > >> -----Original Message----- > >> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] > >> Sent: Tuesday, March 14, 2017 10:02 AM > >> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>; > >> Matan Barak <matanb@mellanox.com> > >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org > >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > >> > >> Hi Parav, > >> > >> I tried your mlx4-patch together with SMC on s390x, but it failed. > >> The SMC-R code tries to send 44 bytes as inline in 1 sge. > >> I wonder about a length check with 16 bytes, which probably explains > >> the failure. > >> See my question below in the patch: > >> > >> On 03/12/2017 09:20 PM, Parav Pandit wrote: > >>> Hi Ursula, > >>> > >>>> -----Original Message----- > >>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma- > >>>> owner@vger.kernel.org] On Behalf Of Ursula Braun > >>>> Sent: Thursday, March 9, 2017 3:54 AM > >>>> To: Eli Cohen <eli@mellanox.com>; Matan Barak > <matanb@mellanox.com> > >>>> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky > >>>> <leonro@mellanox.com>; linux-rdma@vger.kernel.org > >>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x > >>>> > >>>> > >>>> > >>>> On 03/06/2017 02:08 PM, Eli Cohen wrote: > >>>>>>> > >>>>>>> The problem seems to be caused by the usage of plain memcpy in > >>>> set_data_inl_seg(). 
> >>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an > >>>>>>> address belonging to an area mapped with the > ib_dma_map_single() > >>>>>>> call. On s390x those kind of addresses require extra access > >>>>>>> functions (see > >>>> arch/s390/include/asm/io.h). > >>>>>>> > >>>>> > >>>>> By definition, when you are posting a send request with inline, > >>>>> the address > >>>> must be mapped to the cpu so plain memcpy should work. > >>>>> > >>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does > >>>> not seem to contain extra coding for IB_SEND_INLINE flag for > >>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards > >>>> the IB_SEND_INLINE flag is ignored, and thus I needed the > >>>> ib_dma_map_single() call for the area used with ib_post_send()? > >>>> Does this mean I should stay away from the IB_SEND_INLINE flag, if > >>>> I want to run the same SMC-R code with both, Connect X3 cards and > >>>> Connect X4 > >> cards? > >>>> > >>> I had encountered the same kernel panic that you mentioned last week > >>> on > >> ConnectX-4 adapters with smc-r on x86_64. > >>> Shall I submit below fix to netdev mailing list? > >>> I have tested above change. I also have optimization that avoids dma > >>> mapping > >> for wr_tx_dma_addr. > >>> > >>> - lnk->wr_tx_sges[i].addr = > >>> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE; > >>> + lnk->wr_tx_sges[i].addr = > >>> + (uintptr_t)(lnk->wr_tx_bufs > >>> + + i); > >>> > >>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on > >>> little older > >> kernel base. > >>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib > driver. > >>> Let me know. 
> >>> > >>> Regards, > >>> Parav Pandit > >>> > >>> diff --git a/drivers/infiniband/hw/mlx4/qp.c > >>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644 > >>> --- a/drivers/infiniband/hw/mlx4/qp.c > >>> +++ b/drivers/infiniband/hw/mlx4/qp.c > >>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, > >>> struct > >> ib_send_wr *wr, > >>> unsigned long flags; > >>> int nreq; > >>> int err = 0; > >>> + int inl = 0; > >>> unsigned ind; > >>> int uninitialized_var(stamp); > >>> int uninitialized_var(size); > >>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, > >>> struct > >> ib_send_wr *wr, > >>> default: > >>> break; > >>> } > >>> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) { > >>> + struct mlx4_wqe_inline_seg *seg; > >>> + void *addr; > >>> + int len, seg_len; > >>> + int num_seg; > >>> + int off, to_copy; > >>> + > >>> + inl = 0; > >>> + > >>> + seg = wqe; > >>> + wqe += sizeof *seg; > >>> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1); > >>> + num_seg = 0; > >>> + seg_len = 0; > >>> + > >>> + for (i = 0; i < wr->num_sge; ++i) { > >>> + addr = (void *) (uintptr_t) wr->sg_list[i].addr; > >>> + len = wr->sg_list[i].length; > >>> + inl += len; > >>> + > >>> + if (inl > 16) { > >>> + inl = 0; > >>> + err = ENOMEM; > >>> + *bad_wr = wr; > >>> + goto out; > >>> + } > >> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for > >> IB_SEND_INLINE data? > >> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the > >> function does not seem to return an error. > >>> > > This check should be for max_inline_data variable of the QP. > > This was just for error check, I should have fixed it. I was testing with nvme > where inline data was only worth 16 bytes. > > I will fix this. Is it possible to change to 44 and do quick test? > > Final patch will have right check in addition to check in create_qp? 
> > > >>> - /* > >>> - * Write data segments in reverse order, so as to > >>> - * overwrite cacheline stamp last within each > >>> - * cacheline. This avoids issues with WQE > >>> - * prefetching. > >>> - */ > >>> + while (len >= MLX4_INLINE_ALIGN - off) { > With this code there are 2 memcpy-Calls, one with to_copy=44, and the next > one with len 0. > I suggest to change the check to "len > MLX4_INLINE_ALIGN - off". > >>> + to_copy = MLX4_INLINE_ALIGN - off; > >>> + memcpy(wqe, addr, to_copy); > >>> + len -= to_copy; > >>> + wqe += to_copy; > >>> + addr += to_copy; > >>> + seg_len += to_copy; > >>> + wmb(); /* see comment below */ > >>> + seg->byte_count = > >> htonl(MLX4_INLINE_SEG | seg_len); > >>> + seg_len = 0; > >>> + seg = wqe; > >>> + wqe += sizeof *seg; > >>> + off = sizeof *seg; > >>> + ++num_seg; > >>> + } > >>> > >>> - dseg = wqe; > >>> - dseg += wr->num_sge - 1; > >>> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / > >> 16); > >>> + memcpy(wqe, addr, len); > >>> + wqe += len; > >>> + seg_len += len; > >>> + off += len; > >>> + } > >>> > >>> - /* Add one more inline data segment for ICRC for MLX sends > */ > >>> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI || > >>> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI || > >>> - qp->mlx4_ib_qp_type & > >>> - (MLX4_IB_QPT_PROXY_SMI_OWNER | > >> MLX4_IB_QPT_TUN_SMI_OWNER))) { > >>> - set_mlx_icrc_seg(dseg + 1); > >>> - size += sizeof (struct mlx4_wqe_data_seg) / 16; > >>> - } > >>> + if (seg_len) { > >>> + ++num_seg; > >>> + /* > >>> + * Need a barrier here to make sure > >>> + * all the data is visible before the > >>> + * byte_count field is set. Otherwise > >>> + * the HCA prefetcher could grab the > >>> + * 64-byte chunk with this inline > >>> + * segment and get a valid (!= > >>> + * 0xffffffff) byte count but stale > >>> + * data, and end up sending the wrong > >>> + * data. 
> >>> + */ > >>> + wmb(); > >>> + seg->byte_count = htonl(MLX4_INLINE_SEG > | >> seg_len); > >>> + } > >>> > >>> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > >>> - set_data_seg(dseg, wr->sg_list + i); > >>> + size += (inl + num_seg * sizeof (*seg) + 15) / 16; > >>> + } else { > >>> + /* > >>> + * Write data segments in reverse order, so as to > >>> + * overwrite cacheline stamp last within each > >>> + * cacheline. This avoids issues with WQE > >>> + * prefetching. > >>> + */ > >>> + > >>> + dseg = wqe; > >>> + dseg += wr->num_sge - 1; > >>> + size += wr->num_sge * (sizeof (struct > >> mlx4_wqe_data_seg) / 16); > >>> + > >>> + /* Add one more inline data segment for ICRC for > MLX > >> sends */ > >>> + if (unlikely(qp->mlx4_ib_qp_type == > MLX4_IB_QPT_SMI > >> || > >>> + qp->mlx4_ib_qp_type == > MLX4_IB_QPT_GSI > >> || > >>> + qp->mlx4_ib_qp_type & > >>> + (MLX4_IB_QPT_PROXY_SMI_OWNER | > >> MLX4_IB_QPT_TUN_SMI_OWNER))) { > >>> + set_mlx_icrc_seg(dseg + 1); > >>> + size += sizeof (struct mlx4_wqe_data_seg) / > 16; > >>> + } > >>> > >>> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg) > >>> + set_data_seg(dseg, wr->sg_list + i); > >>> + } > >>> /* > >>> * Possibly overwrite stamping in cacheline with LSO > >>> * segment only after making sure all data segments > >>> > >>>> -- > >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" > >>>> in the body of a message to majordomo@vger.kernel.org More > >>>> majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-03-20 21:04 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-02-24 9:51 mlx5_ib_post_send panic on s390x Ursula Braun [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-02-24 17:28 ` Eli Cohen [not found] ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-06 11:17 ` Ursula Braun [not found] ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-06 12:56 ` Eli Cohen [not found] ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-06 13:47 ` Ursula Braun [not found] ` <dcc90daa-b932-8957-d8bc-e1f02d04e03a@linux.vnet.ibm.com> [not found] ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6@mellanox.com> [not found] ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2017-03-06 13:03 ` Fwd: " Ursula Braun [not found] ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-06 13:08 ` Eli Cohen [not found] ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-09 9:54 ` Ursula Braun [not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-09 12:58 ` Eli Cohen 2017-03-12 20:20 ` Parav Pandit [not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-12 20:38 ` Parav Pandit 2017-03-14 15:02 ` Ursula Braun [not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-14 15:24 ` Parav Pandit [not found] ` 
<VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 2017-03-16 11:51 ` Ursula Braun [not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> 2017-03-20 21:04 ` Parav Pandit