* mlx5_ib_post_send panic on s390x
@ 2017-02-24 9:51 Ursula Braun
0 siblings, 2 replies; 15+ messages in thread
From: Ursula Braun @ 2017-02-24 9:51 UTC (permalink / raw)
To: matamb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Saeed and Matan,
Up to now I have been running SMC-R traffic on ConnectX-3, which works.
But when switching to ConnectX-4, the first mlx5_ib_post_send() fails:
[ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space
[ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803
[ 247.787664] Fault in home space mode while using kernel ASCE.
[ 247.787667] AS:00000000011ec007 R3:0000000000000024
[ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP
[ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
[ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4
[ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR)
[ 247.787743] Workqueue: events smc_listen_work [smc]
[ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000
[ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
[ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
[ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440
[ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
[ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
[ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2
0000000000762408: a7740008 brc 7,762418
#000000000076240c: c05000000011 larl %r5,76242e
>0000000000762412: 44405000 ex %r4,0(%r5)
0000000000762416: 07fe bcr 15,%r14
0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3)
000000000076241e: 41101100 la %r1,256(%r1)
0000000000762422: 41303100 la %r3,256(%r3)
[ 247.787780] Call Trace:
[ 247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
[ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc]
[ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc]
[ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc]
[ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780
[ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478
[ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170
[ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc
[ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc
[ 247.787806] INFO: lockdep is turned off.
[ 247.787807] Last Breaking-Event-Address:
[ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0
[ 247.787813]
[ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops
The problem seems to be caused by the use of a plain memcpy() in set_data_inl_seg().
The address provided by the SMC code in struct ib_send_wr *wr belongs to an area
mapped with ib_dma_map_single(). On s390x such addresses require special access
functions (see arch/s390/include/asm/io.h).
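As a hypothetical illustration (simplified, userspace-runnable; the struct and function names below are stand-ins, and the real set_data_inl_seg() in the mlx5 driver additionally handles WQE wrap-around and size checks), the inline-data path boils down to a plain memcpy() of each scatter/gather entry's payload into the work queue entry:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant fields of a scatter/gather entry. */
struct sge {
	const void *addr;   /* payload address taken from the work request */
	uint32_t    length; /* number of payload bytes in this entry       */
};

/*
 * Sketch of the inline-copy loop: each SGE's payload is copied into the
 * work queue entry with a plain memcpy(). If sge->addr is an address
 * that needs architecture-specific accessors (as the report above
 * suggests for s390x), this memcpy() faults.
 */
static size_t copy_inline_data(void *wqe, const struct sge *sgl, int num_sge)
{
	uint8_t *dst = wqe;
	size_t inl = 0;
	int i;

	for (i = 0; i < num_sge; i++) {
		memcpy(dst, sgl[i].addr, sgl[i].length);
		dst += sgl[i].length;
		inl += sgl[i].length;
	}
	return inl; /* total number of inlined bytes */
}
```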
Kind regards, Ursula Braun (IBM Germany)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* RE: mlx5_ib_post_send panic on s390x
@ 2017-02-24 17:28 ` Eli Cohen
0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-02-24 17:28 UTC (permalink / raw)
To: Ursula Braun, matamb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi,
Can you please send details of the work request you are posting? I assume you are using inline, right?
-----Original Message-----
From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Ursula Braun
Sent: Friday, February 24, 2017 3:52 AM
To: matamb@mellanox.com; Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Subject: mlx5_ib_post_send panic on s390x
[original report snipped; quoted in full in the first message]
* Re: mlx5_ib_post_send panic on s390x
@ 2017-03-06 11:17 ` Ursula Braun
0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 11:17 UTC (permalink / raw)
To: Eli Cohen, matanb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 02/24/2017 06:28 PM, Eli Cohen wrote:
> Hi,
>
> Can you please send details of the work request you are posting? I assume you are using inline, right?
yes, inline is used:
lnk->wr_tx_sges[i].addr =
lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
lnk->wr_tx_ibs[i].next = NULL;
lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
lnk->wr_tx_ibs[i].num_sge = 1;
lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
lnk->wr_tx_ibs[i].send_flags =
IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE;
> [remainder of quoted message snipped]
* RE: mlx5_ib_post_send panic on s390x
@ 2017-03-06 12:56 ` Eli Cohen
0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-03-06 12:56 UTC (permalink / raw)
To: Ursula Braun, Matan Barak, Leon Romanovsky
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Please send information on:
- The size of the required inline data in the offending work request
- The transport service used
- How many outstanding work requests the send queue is configured to
- What was the serial number of the work request that triggered this oops (first, second, 65th etc).
-----Original Message-----
From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
Sent: Monday, March 6, 2017 5:17 AM
To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>; Leon Romanovsky <leonro@mellanox.com>
Cc: linux-rdma@vger.kernel.org
Subject: Re: mlx5_ib_post_send panic on s390x
[quoted thread snipped]
* Re: Fwd: mlx5_ib_post_send panic on s390x
@ 2017-03-06 13:03 ` Ursula Braun
0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 13:03 UTC (permalink / raw)
To: Matan Barak (External)
Cc: Saeed Mahameed (saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org),
Eli Cohen, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 02/26/2017 10:45 AM, Matan Barak (External) wrote:
> On 24/02/2017 12:27, Ursula Braun wrote:
>> sorry, typo in the mail address.
>>
>> -------- Forwarded Message --------
>> Subject: mlx5_ib_post_send panic on s390x
>> Date: Fri, 24 Feb 2017 10:51:32 +0100
>> From: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
>> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> [forwarded report snipped; quoted in full in the first message]
>
> So I guess memcpy_toio is required here, right?
> Since we don't have a s390 based system, could you please test this?
memcpy_toio() did not help. I then replaced the memcpy() calls in set_data_inl_seg()
with this preliminary test code (just to give an idea, not a real patch proposal):
static void *memcpy_usc(void *dest, const void *src, size_t count)
{
	char *tmp_dest = (char *)dest;
	const char *tmp_src = (const char *)src;
	size_t copied = 0;
	u32 tmp_u32;

	/* copies in u32 steps; assumes count is a multiple of sizeof(u32) */
	while (copied < count) {
		tmp_u32 = __raw_readl(tmp_src);
		__raw_writel(tmp_u32, tmp_dest);
		copied += sizeof(tmp_u32);
		tmp_dest += sizeof(tmp_u32);
		tmp_src += sizeof(tmp_u32);
	}
	return dest;
}
This helped; the first mlx5_ib_post_send() call initiated from the SMC code (type IB_WR_SEND,
flagged with IB_SEND_INLINE, length 44 bytes) ran successfully.
A subsequent mlx5_ib_post_send() call of type RDMA_WRITE seems to stall later on, but
that is something I still have to analyze in more detail.
>
>> Kind regards, Ursula Braun (IBM Germany)
>>
>
> Thanks for notifying.
>
> Matan
>
* RE: Fwd: mlx5_ib_post_send panic on s390x
@ 2017-03-06 13:08 ` Eli Cohen
0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-03-06 13:08 UTC (permalink / raw)
To: Ursula Braun, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by SMC-code in struct ib_send_wr *wr is an
>> address belonging to an area mapped with the ib_dma_map_single()
>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>
By definition, when you post a send request with inline data, the address must be CPU-addressable, so a plain memcpy should work.
* Re: mlx5_ib_post_send panic on s390x
@ 2017-03-06 13:47 ` Ursula Braun
0 siblings, 0 replies; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 13:47 UTC (permalink / raw)
To: Eli Cohen, Matan Barak, Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 03/06/2017 01:56 PM, Eli Cohen wrote:
> Please send information on:
>
> - The size of the required inline data in the offending work request
44 bytes (ib_create_qp() with ib_qp_init_attr.cap.max_inline_data=44)
> - The transport service used
IB_QPT_RC
> - How many outstanding work requests the send queue is configured to
ib_create_cq with ib_cq_init_attr.cqe=32768
ib_create_qp with ib_qp_init_attr.cap.max_send_wr=16
> - What was the serial number of the work request that triggered this oops (first, second, 65th etc).
the first one (wr_id=1)
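Pulled together as one hedged sketch (the struct below is a simplified stand-in for the relevant ib_qp_init_attr fields, not the kernel's definition), the configuration reported above is:

```c
#include <stdint.h>

/* Stand-in for the relevant fields of the kernel's ib_qp_init_attr. */
struct qp_caps {
	uint32_t max_send_wr;     /* outstanding send work requests     */
	uint32_t max_inline_data; /* payload bytes inlined into the WQE */
};

/* Values reported in this thread for the failing SMC-R link: an RC QP
 * (IB_QPT_RC), a CQ with 32768 entries, 16 send WRs, 44 bytes of
 * inline data; the very first WR (wr_id=1) triggered the oops. */
static const uint32_t smc_cq_entries = 32768; /* ib_create_cq cqe */
static const struct qp_caps smc_qp_caps = {
	.max_send_wr     = 16,
	.max_inline_data = 44,
};
```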
>
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org]
> Sent: Monday, March 6, 2017 5:17 AM
> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mlx5_ib_post_send panic on s390x
>
>
> On 02/24/2017 06:28 PM, Eli Cohen wrote:
>> Hi,
>>
>> Can you please send details of the work request you are posting? I assume you are using inline, right?
> yes, inline is used:
>
> lnk->wr_tx_sges[i].addr =
> lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
> lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
> lnk->wr_tx_ibs[i].next = NULL;
> lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
> lnk->wr_tx_ibs[i].num_sge = 1;
> lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
> lnk->wr_tx_ibs[i].send_flags =
> IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE;
>
>>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>> Sent: Friday, February 24, 2017 3:52 AM
>> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: mlx5_ib_post_send panic on s390x
>>
>> Hi Saeed and Matan,
>>
>> up to now I run SMC-R traffic on Connect X3, which works.
>> But when switching to Connect X4, the first mlx5_ib_post_send() fails:
>>
>> [ 247.787660] Unable to handle kernel pointer dereference in virtual kernel address space [ 247.787662] Failing address: 000000010484a000 TEID: 000000010484a803 [ 247.787664] Fault in home space mode while using kernel ASCE.
>> [ 247.787667] AS:00000000011ec007 R3:0000000000000024 [ 247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP [ 247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
>> [ 247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G W 4.10.0uschi+ #4
>> [ 247.787739] Hardware name: IBM 2964 N96 704 (LPAR)
>> [ 247.787743] Workqueue: events smc_listen_work [smc] [ 247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000 [ 247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
>> [ 247.787751] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [ 247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
>> [ 247.787755] 000000000000002b 000000000076242e 000000000000002c 0000000099c96440
>> [ 247.787757] 000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
>> [ 247.787758] 00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
>> [ 247.787764] Krnl Code: 0000000000762404: b9040012 lgr %r1,%r2
>> 0000000000762408: a7740008 brc 7,762418
>> #000000000076240c: c05000000011 larl %r5,76242e
>> >0000000000762412: 44405000 ex %r4,0(%r5)
>> 0000000000762416: 07fe bcr 15,%r14
>> 0000000000762418: d2ff10003000 mvc 0(256,%r1),0(%r3)
>> 000000000076241e: 41101100 la %r1,256(%r1)
>> 0000000000762422: 41303100 la %r3,256(%r3)
>> [ 247.787780] Call Trace:
>> [ 247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib]) [ 247.787789] [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] [ 247.787792] [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] [ 247.787794] [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] [ 247.787797] [<00000000001659e8>] process_one_work+0x3d8/0x780 [ 247.787799] [<0000000000166044>] worker_thread+0x2b4/0x478 [ 247.787801] [<000000000016e62c>] kthread+0x15c/0x170 [ 247.787803] [<0000000000a115f2>] kernel_thread_starter+0x6/0xc [ 247.787804] [<0000000000a115ec>] kernel_thread_starter+0x0/0xc [ 247.787806] INFO: lockdep is turned off.
>> [ 247.787807] Last Breaking-Event-Address:
>> [ 247.787811] [<000003ff8106edc0>] 0x3ff8106edc0
>> [ 247.787813]
>> [ 247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> The problem seems to be caused by the use of a plain memcpy() in set_data_inl_seg().
>> The address provided by the SMC code in struct ib_send_wr *wr belongs to an area mapped via ib_dma_map_single(). On s390x, such addresses require special access functions (see arch/s390/include/asm/io.h).
>>
>> Kind regards, Ursula Braun (IBM Germany)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-09 9:54 ` Ursula Braun
[not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-09 9:54 UTC (permalink / raw)
To: Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>
>>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>> address belonging to an area mapped with the ib_dma_map_single()
>>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>>
>
> By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work.
>
In the past I ran SMC-R with ConnectX-3 cards. The mlx4 driver does not seem to contain extra code for the IB_SEND_INLINE flag in ib_post_send(). Does this mean that for SMC-R on ConnectX-3 cards the IB_SEND_INLINE flag is ignored, and that is why I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag if I want to run the same SMC-R code with both ConnectX-3 and ConnectX-4 cards?
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-09 12:58 ` Eli Cohen
2017-03-12 20:20 ` Parav Pandit
1 sibling, 0 replies; 15+ messages in thread
From: Eli Cohen @ 2017-03-09 12:58 UTC (permalink / raw)
To: Ursula Braun, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Yes, for mlx4 it is ignored.
-----Original Message-----
From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
Sent: Thursday, March 9, 2017 3:54 AM
To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky <leonro@mellanox.com>; linux-rdma@vger.kernel.org
Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>
>>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>> address belonging to an area mapped with the ib_dma_map_single()
>>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>>
>
> By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work.
>
In the past I ran SMC-R with ConnectX-3 cards. The mlx4 driver does not seem to contain extra code for the IB_SEND_INLINE flag in ib_post_send(). Does this mean that for SMC-R on ConnectX-3 cards the IB_SEND_INLINE flag is ignored, and that is why I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag if I want to run the same SMC-R code with both ConnectX-3 and ConnectX-4 cards?
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-09 12:58 ` Eli Cohen
@ 2017-03-12 20:20 ` Parav Pandit
[not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2017-03-12 20:20 UTC (permalink / raw)
To: Ursula Braun, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Ursula,
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Ursula Braun
> Sent: Thursday, March 9, 2017 3:54 AM
> To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>
>
>
> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>
> >>> The problem seems to be caused by the usage of plain memcpy in
> set_data_inl_seg().
> >>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>> address belonging to an area mapped with the ib_dma_map_single()
> >>> call. On s390x those kind of addresses require extra access functions (see
> arch/s390/include/asm/io.h).
> >>>
> >
> > By definition, when you are posting a send request with inline, the address
> must be mapped to the cpu so plain memcpy should work.
> >
> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to
> contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean
> for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and
> thus I needed the ib_dma_map_single() call for the area used with
> ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE
> flag, if I want to run the same SMC-R code with both, Connect X3 cards and
> Connect X4 cards?
>
I encountered the same kernel panic that you mention last week, with SMC-R on ConnectX-4 adapters on x86_64.
Shall I submit the fix below to the netdev mailing list?
I have tested that change. I also have an optimization that avoids the DMA mapping for wr_tx_dma_addr:
- lnk->wr_tx_sges[i].addr =
- lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
+ lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
I also have a fix for processing IB_SEND_INLINE in the mlx4 driver, based on a slightly older kernel; it is attached below.
I can rebase my kernel and provide a fix in the mlx5_ib driver as well.
Let me know.
Regards,
Parav Pandit
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index a2e4ca5..0d984f5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
unsigned long flags;
int nreq;
int err = 0;
+ int inl = 0;
unsigned ind;
int uninitialized_var(stamp);
int uninitialized_var(size);
@@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
default:
break;
}
+ if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
+ struct mlx4_wqe_inline_seg *seg;
+ void *addr;
+ int len, seg_len;
+ int num_seg;
+ int off, to_copy;
+
+ inl = 0;
+
+ seg = wqe;
+ wqe += sizeof *seg;
+ off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
+ num_seg = 0;
+ seg_len = 0;
+
+ for (i = 0; i < wr->num_sge; ++i) {
+ addr = (void *) (uintptr_t) wr->sg_list[i].addr;
+ len = wr->sg_list[i].length;
+ inl += len;
+
+ if (inl > 16) {
+ inl = 0;
+ err = -ENOMEM;
+ *bad_wr = wr;
+ goto out;
+ }
- /*
- * Write data segments in reverse order, so as to
- * overwrite cacheline stamp last within each
- * cacheline. This avoids issues with WQE
- * prefetching.
- */
+ while (len >= MLX4_INLINE_ALIGN - off) {
+ to_copy = MLX4_INLINE_ALIGN - off;
+ memcpy(wqe, addr, to_copy);
+ len -= to_copy;
+ wqe += to_copy;
+ addr += to_copy;
+ seg_len += to_copy;
+ wmb(); /* see comment below */
+ seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
+ seg_len = 0;
+ seg = wqe;
+ wqe += sizeof *seg;
+ off = sizeof *seg;
+ ++num_seg;
+ }
- dseg = wqe;
- dseg += wr->num_sge - 1;
- size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
+ memcpy(wqe, addr, len);
+ wqe += len;
+ seg_len += len;
+ off += len;
+ }
- /* Add one more inline data segment for ICRC for MLX sends */
- if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
- qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
- qp->mlx4_ib_qp_type &
- (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
- set_mlx_icrc_seg(dseg + 1);
- size += sizeof (struct mlx4_wqe_data_seg) / 16;
- }
+ if (seg_len) {
+ ++num_seg;
+ /*
+ * Need a barrier here to make sure
+ * all the data is visible before the
+ * byte_count field is set. Otherwise
+ * the HCA prefetcher could grab the
+ * 64-byte chunk with this inline
+ * segment and get a valid (!=
+ * 0xffffffff) byte count but stale
+ * data, and end up sending the wrong
+ * data.
+ */
+ wmb();
+ seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
+ }
- for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
- set_data_seg(dseg, wr->sg_list + i);
+ size += (inl + num_seg * sizeof (*seg) + 15) / 16;
+ } else {
+ /*
+ * Write data segments in reverse order, so as to
+ * overwrite cacheline stamp last within each
+ * cacheline. This avoids issues with WQE
+ * prefetching.
+ */
+
+ dseg = wqe;
+ dseg += wr->num_sge - 1;
+ size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
+
+ /* Add one more inline data segment for ICRC for MLX sends */
+ if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
+ qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
+ qp->mlx4_ib_qp_type &
+ (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
+ set_mlx_icrc_seg(dseg + 1);
+ size += sizeof (struct mlx4_wqe_data_seg) / 16;
+ }
+ for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
+ set_data_seg(dseg, wr->sg_list + i);
+ }
/*
* Possibly overwrite stamping in cacheline with LSO
* segment only after making sure all data segments
^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-12 20:38 ` Parav Pandit
2017-03-14 15:02 ` Ursula Braun
1 sibling, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2017-03-12 20:38 UTC (permalink / raw)
To: Parav Pandit, Ursula Braun, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
I meant the mlx4_ib driver below. Sorry for the typo.
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Parav Pandit
> Sent: Sunday, March 12, 2017 3:21 PM
> To: Ursula Braun <ubraun@linux.vnet.ibm.com>; Eli Cohen
> <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: RE: Fwd: mlx5_ib_post_send panic on s390x
>
> Hi Ursula,
>
> > -----Original Message-----
> > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > owner@vger.kernel.org] On Behalf Of Ursula Braun
> > Sent: Thursday, March 9, 2017 3:54 AM
> > To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> > <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >
> >
> >
> > On 03/06/2017 02:08 PM, Eli Cohen wrote:
> > >>>
> > >>> The problem seems to be caused by the usage of plain memcpy in
> > set_data_inl_seg().
> > >>> The address provided by SMC-code in struct ib_send_wr *wr is an
> > >>> address belonging to an area mapped with the ib_dma_map_single()
> > >>> call. On s390x those kind of addresses require extra access
> > >>> functions (see
> > arch/s390/include/asm/io.h).
> > >>>
> > >
> > > By definition, when you are posting a send request with inline, the
> > > address
> > must be mapped to the cpu so plain memcpy should work.
> > >
> > In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> > not seem to contain extra coding for IB_SEND_INLINE flag for
> > ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
> > IB_SEND_INLINE flag is ignored, and thus I needed the
> > ib_dma_map_single() call for the area used with ib_post_send()? Does
> > this mean I should stay away from the IB_SEND_INLINE flag, if I want
> > to run the same SMC-R code with both, Connect X3 cards and Connect X4
> cards?
> >
> I had encountered the same kernel panic that you mentioned last week on
> ConnectX-4 adapters with smc-r on x86_64.
> Shall I submit below fix to netdev mailing list?
> I have tested above change. I also have optimization that avoids dma mapping
> for wr_tx_dma_addr.
>
> - lnk->wr_tx_sges[i].addr =
> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs +
> + i);
>
> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
> kernel base.
> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> Let me know.
>
> Regards,
> Parav Pandit
>
> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index a2e4ca5..0d984f5 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> unsigned long flags;
> int nreq;
> int err = 0;
> + int inl = 0;
> unsigned ind;
> int uninitialized_var(stamp);
> int uninitialized_var(size);
> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> default:
> break;
> }
> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> + struct mlx4_wqe_inline_seg *seg;
> + void *addr;
> + int len, seg_len;
> + int num_seg;
> + int off, to_copy;
> +
> + inl = 0;
> +
> + seg = wqe;
> + wqe += sizeof *seg;
> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> + num_seg = 0;
> + seg_len = 0;
> +
> + for (i = 0; i < wr->num_sge; ++i) {
> + addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> + len = wr->sg_list[i].length;
> + inl += len;
> +
> + if (inl > 16) {
> + inl = 0;
> + err = ENOMEM;
> + *bad_wr = wr;
> + goto out;
> + }
>
> - /*
> - * Write data segments in reverse order, so as to
> - * overwrite cacheline stamp last within each
> - * cacheline. This avoids issues with WQE
> - * prefetching.
> - */
> + while (len >= MLX4_INLINE_ALIGN - off) {
> + to_copy = MLX4_INLINE_ALIGN - off;
> + memcpy(wqe, addr, to_copy);
> + len -= to_copy;
> + wqe += to_copy;
> + addr += to_copy;
> + seg_len += to_copy;
> + wmb(); /* see comment below */
> + seg->byte_count =
> htonl(MLX4_INLINE_SEG | seg_len);
> + seg_len = 0;
> + seg = wqe;
> + wqe += sizeof *seg;
> + off = sizeof *seg;
> + ++num_seg;
> + }
>
> - dseg = wqe;
> - dseg += wr->num_sge - 1;
> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
> 16);
> + memcpy(wqe, addr, len);
> + wqe += len;
> + seg_len += len;
> + off += len;
> + }
>
> - /* Add one more inline data segment for ICRC for MLX sends */
> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> - qp->mlx4_ib_qp_type &
> - (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> - set_mlx_icrc_seg(dseg + 1);
> - size += sizeof (struct mlx4_wqe_data_seg) / 16;
> - }
> + if (seg_len) {
> + ++num_seg;
> + /*
> + * Need a barrier here to make sure
> + * all the data is visible before the
> + * byte_count field is set. Otherwise
> + * the HCA prefetcher could grab the
> + * 64-byte chunk with this inline
> + * segment and get a valid (!=
> + * 0xffffffff) byte count but stale
> + * data, and end up sending the wrong
> + * data.
> + */
> + wmb();
> + seg->byte_count = htonl(MLX4_INLINE_SEG |
> seg_len);
> + }
>
> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> - set_data_seg(dseg, wr->sg_list + i);
> + size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> + } else {
> + /*
> + * Write data segments in reverse order, so as to
> + * overwrite cacheline stamp last within each
> + * cacheline. This avoids issues with WQE
> + * prefetching.
> + */
> +
> + dseg = wqe;
> + dseg += wr->num_sge - 1;
> + size += wr->num_sge * (sizeof (struct
> mlx4_wqe_data_seg) / 16);
> +
> + /* Add one more inline data segment for ICRC for MLX
> sends */
> + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI
> ||
> + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI
> ||
> + qp->mlx4_ib_qp_type &
> + (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> + set_mlx_icrc_seg(dseg + 1);
> + size += sizeof (struct mlx4_wqe_data_seg) / 16;
> + }
>
> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> + set_data_seg(dseg, wr->sg_list + i);
> + }
> /*
> * Possibly overwrite stamping in cacheline with LSO
> * segment only after making sure all data segments
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-12 20:38 ` Parav Pandit
@ 2017-03-14 15:02 ` Ursula Braun
[not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
1 sibling, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-14 15:02 UTC (permalink / raw)
To: Parav Pandit, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Parav,
I tried your mlx4 patch together with SMC on s390x, but it failed.
The SMC-R code tries to send 44 bytes as inline data in a single SGE.
I suspect the 16-byte length check, which probably explains the failure.
See my question below in the patch:
On 03/12/2017 09:20 PM, Parav Pandit wrote:
> Hi Ursula,
>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>> Sent: Thursday, March 9, 2017 3:54 AM
>> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>
>>
>>
>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>>>
>>>>> The problem seems to be caused by the usage of plain memcpy in
>> set_data_inl_seg().
>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>>>> address belonging to an area mapped with the ib_dma_map_single()
>>>>> call. On s390x those kind of addresses require extra access functions (see
>> arch/s390/include/asm/io.h).
>>>>>
>>>
>>> By definition, when you are posting a send request with inline, the address
>> must be mapped to the cpu so plain memcpy should work.
>>>
>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to
>> contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean
>> for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and
>> thus I needed the ib_dma_map_single() call for the area used with
>> ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE
>> flag, if I want to run the same SMC-R code with both, Connect X3 cards and
>> Connect X4 cards?
>>
> I had encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with smc-r on x86_64.
> Shall I submit below fix to netdev mailing list?
> I have tested above change. I also have optimization that avoids dma mapping for wr_tx_dma_addr.
>
> - lnk->wr_tx_sges[i].addr =
> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
>
> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older kernel base.
> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> Let me know.
>
> Regards,
> Parav Pandit
>
> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index a2e4ca5..0d984f5 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> unsigned long flags;
> int nreq;
> int err = 0;
> + int inl = 0;
> unsigned ind;
> int uninitialized_var(stamp);
> int uninitialized_var(size);
> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
> default:
> break;
> }
> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> + struct mlx4_wqe_inline_seg *seg;
> + void *addr;
> + int len, seg_len;
> + int num_seg;
> + int off, to_copy;
> +
> + inl = 0;
> +
> + seg = wqe;
> + wqe += sizeof *seg;
> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> + num_seg = 0;
> + seg_len = 0;
> +
> + for (i = 0; i < wr->num_sge; ++i) {
> + addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> + len = wr->sg_list[i].length;
> + inl += len;
> +
> + if (inl > 16) {
> + inl = 0;
> + err = ENOMEM;
> + *bad_wr = wr;
> + goto out;
> + }
SMC-R fails due to this check: inl is 44 here. Why is 16 the limit for IB_SEND_INLINE data?
The SMC-R code calls ib_create_qp() with max_inline_data=44, and that call does not
seem to return an error.
>
> - /*
> - * Write data segments in reverse order, so as to
> - * overwrite cacheline stamp last within each
> - * cacheline. This avoids issues with WQE
> - * prefetching.
> - */
> + while (len >= MLX4_INLINE_ALIGN - off) {
> + to_copy = MLX4_INLINE_ALIGN - off;
> + memcpy(wqe, addr, to_copy);
> + len -= to_copy;
> + wqe += to_copy;
> + addr += to_copy;
> + seg_len += to_copy;
> + wmb(); /* see comment below */
> + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> + seg_len = 0;
> + seg = wqe;
> + wqe += sizeof *seg;
> + off = sizeof *seg;
> + ++num_seg;
> + }
>
> - dseg = wqe;
> - dseg += wr->num_sge - 1;
> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> + memcpy(wqe, addr, len);
> + wqe += len;
> + seg_len += len;
> + off += len;
> + }
>
> - /* Add one more inline data segment for ICRC for MLX sends */
> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> - qp->mlx4_ib_qp_type &
> - (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> - set_mlx_icrc_seg(dseg + 1);
> - size += sizeof (struct mlx4_wqe_data_seg) / 16;
> - }
> + if (seg_len) {
> + ++num_seg;
> + /*
> + * Need a barrier here to make sure
> + * all the data is visible before the
> + * byte_count field is set. Otherwise
> + * the HCA prefetcher could grab the
> + * 64-byte chunk with this inline
> + * segment and get a valid (!=
> + * 0xffffffff) byte count but stale
> + * data, and end up sending the wrong
> + * data.
> + */
> + wmb();
> + seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> + }
>
> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> - set_data_seg(dseg, wr->sg_list + i);
> + size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> + } else {
> + /*
> + * Write data segments in reverse order, so as to
> + * overwrite cacheline stamp last within each
> + * cacheline. This avoids issues with WQE
> + * prefetching.
> + */
> +
> + dseg = wqe;
> + dseg += wr->num_sge - 1;
> + size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> +
> + /* Add one more inline data segment for ICRC for MLX sends */
> + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> + qp->mlx4_ib_qp_type &
> + (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> + set_mlx_icrc_seg(dseg + 1);
> + size += sizeof (struct mlx4_wqe_data_seg) / 16;
> + }
>
> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> + set_data_seg(dseg, wr->sg_list + i);
> + }
> /*
> * Possibly overwrite stamping in cacheline with LSO
> * segment only after making sure all data segments
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-14 15:24 ` Parav Pandit
[not found] ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2017-03-14 15:24 UTC (permalink / raw)
To: Ursula Braun, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Ursula,
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> Sent: Tuesday, March 14, 2017 10:02 AM
> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>
> Hi Parav,
>
> I tried your mlx4-patch together with SMC on s390x, but it failed.
> The SMC-R code tries to send 44 bytes as inline in 1 sge.
> I wonder about a length check with 16 bytes, which probably explains the
> failure.
> See my question below in the patch:
>
> On 03/12/2017 09:20 PM, Parav Pandit wrote:
> > Hi Ursula,
> >
> >> -----Original Message-----
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Ursula Braun
> >> Sent: Thursday, March 9, 2017 3:54 AM
> >> To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>
> >>
> >>
> >> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>>>
> >>>>> The problem seems to be caused by the usage of plain memcpy in
> >> set_data_inl_seg().
> >>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>>>> address belonging to an area mapped with the ib_dma_map_single()
> >>>>> call. On s390x those kind of addresses require extra access
> >>>>> functions (see
> >> arch/s390/include/asm/io.h).
> >>>>>
> >>>
> >>> By definition, when you are posting a send request with inline, the
> >>> address
> >> must be mapped to the cpu so plain memcpy should work.
> >>>
> >> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> >> not seem to contain extra coding for IB_SEND_INLINE flag for
> >> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
> >> IB_SEND_INLINE flag is ignored, and thus I needed the
> >> ib_dma_map_single() call for the area used with ib_post_send()? Does
> >> this mean I should stay away from the IB_SEND_INLINE flag, if I want
> >> to run the same SMC-R code with both, Connect X3 cards and Connect X4
> cards?
> >>
> > I had encountered the same kernel panic that you mentioned last week on
> ConnectX-4 adapters with smc-r on x86_64.
> > Shall I submit below fix to netdev mailing list?
> > I have tested above change. I also have optimization that avoids dma mapping
> for wr_tx_dma_addr.
> >
> > - lnk->wr_tx_sges[i].addr =
> > - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> > + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs
> > + + i);
> >
> > I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
> kernel base.
> > I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> > Let me know.
> >
> > Regards,
> > Parav Pandit
> >
> > diff --git a/drivers/infiniband/hw/mlx4/qp.c
> > b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
> > --- a/drivers/infiniband/hw/mlx4/qp.c
> > +++ b/drivers/infiniband/hw/mlx4/qp.c
> > @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> > unsigned long flags;
> > int nreq;
> > int err = 0;
> > + int inl = 0;
> > unsigned ind;
> > int uninitialized_var(stamp);
> > int uninitialized_var(size);
> > @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> > default:
> > break;
> > }
> > + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> > + struct mlx4_wqe_inline_seg *seg;
> > + void *addr;
> > + int len, seg_len;
> > + int num_seg;
> > + int off, to_copy;
> > +
> > + inl = 0;
> > +
> > + seg = wqe;
> > + wqe += sizeof *seg;
> > + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> > + num_seg = 0;
> > + seg_len = 0;
> > +
> > + for (i = 0; i < wr->num_sge; ++i) {
> > + addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> > + len = wr->sg_list[i].length;
> > + inl += len;
> > +
> > + if (inl > 16) {
> > + inl = 0;
> > + err = ENOMEM;
> > + *bad_wr = wr;
> > + goto out;
> > + }
> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
> IB_SEND_INLINE data?
> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function
> does not seem to return an error.
> >
This check should be against the QP's max_inline_data variable.
It was just an error check that I should have fixed; I was testing with NVMe, where the inline data was only 16 bytes.
I will fix it. Is it possible for you to change the limit to 44 and do a quick test?
The final patch will have the right check here, in addition to the check in create_qp.
> > - /*
> > - * Write data segments in reverse order, so as to
> > - * overwrite cacheline stamp last within each
> > - * cacheline. This avoids issues with WQE
> > - * prefetching.
> > - */
> > + while (len >= MLX4_INLINE_ALIGN - off) {
> > + to_copy = MLX4_INLINE_ALIGN - off;
> > + memcpy(wqe, addr, to_copy);
> > + len -= to_copy;
> > + wqe += to_copy;
> > + addr += to_copy;
> > + seg_len += to_copy;
> > + wmb(); /* see comment below */
> > + seg->byte_count =
> htonl(MLX4_INLINE_SEG | seg_len);
> > + seg_len = 0;
> > + seg = wqe;
> > + wqe += sizeof *seg;
> > + off = sizeof *seg;
> > + ++num_seg;
> > + }
> >
> > - dseg = wqe;
> > - dseg += wr->num_sge - 1;
> > - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
> 16);
> > + memcpy(wqe, addr, len);
> > + wqe += len;
> > + seg_len += len;
> > + off += len;
> > + }
> >
> > - /* Add one more inline data segment for ICRC for MLX sends */
> > - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> > - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> > - qp->mlx4_ib_qp_type &
> > - (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> > - set_mlx_icrc_seg(dseg + 1);
> > - size += sizeof (struct mlx4_wqe_data_seg) / 16;
> > - }
> > + if (seg_len) {
> > + ++num_seg;
> > + /*
> > + * Need a barrier here to make sure
> > + * all the data is visible before the
> > + * byte_count field is set. Otherwise
> > + * the HCA prefetcher could grab the
> > + * 64-byte chunk with this inline
> > + * segment and get a valid (!=
> > + * 0xffffffff) byte count but stale
> > + * data, and end up sending the wrong
> > + * data.
> > + */
> > + wmb();
> > + seg->byte_count = htonl(MLX4_INLINE_SEG |
> seg_len);
> > + }
> >
> > - for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> > - set_data_seg(dseg, wr->sg_list + i);
> > + size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> > + } else {
> > + /*
> > + * Write data segments in reverse order, so as to
> > + * overwrite cacheline stamp last within each
> > + * cacheline. This avoids issues with WQE
> > + * prefetching.
> > + */
> > +
> > + dseg = wqe;
> > + dseg += wr->num_sge - 1;
> > + size += wr->num_sge * (sizeof (struct
> mlx4_wqe_data_seg) / 16);
> > +
> > + /* Add one more inline data segment for ICRC for MLX
> sends */
> > + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI
> ||
> > + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI
> ||
> > + qp->mlx4_ib_qp_type &
> > + (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> > + set_mlx_icrc_seg(dseg + 1);
> > + size += sizeof (struct mlx4_wqe_data_seg) / 16;
> > + }
> >
> > + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> > + set_data_seg(dseg, wr->sg_list + i);
> > + }
> > /*
> > * Possibly overwrite stamping in cacheline with LSO
> > * segment only after making sure all data segments
> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-16 11:51 ` Ursula Braun
[not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-16 11:51 UTC (permalink / raw)
To: Parav Pandit, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Parav,
I ran your new mlx4 code together with changed SMC-R code that no longer maps
the IB_SEND_INLINE area. It worked - great!
Below I have added a small improvement idea in your patch.
Nevertheless I am still not sure whether I should keep the IB_SEND_INLINE flag
in the SMC-R code, since there is no guarantee that this will work with
all kinds of RoCE devices. The maximum length for IB_SEND_INLINE depends
on the RoCE driver - right? Is there an interface to determine such a
maximum length? Would ib_create_qp() return an error if the
SMC-R-specified .cap.max_inline_data = 44 is not supported by a RoCE driver?
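If ib_create_qp() behaved as Parav later describes (failing when the requested cap.max_inline_data is unsupported), the ULP-side decision reduces to a simple fallback. The following is only a hypothetical sketch of that logic; the constant 44 is taken from the thread, the rest is illustrative and not the actual SMC-R code.

```c
#include <stdbool.h>

#define SMC_WR_TX_SIZE 44	/* inline bytes SMC-R wants to post */

/* If QP creation with the requested max_inline_data fails, or grants
 * less than needed, the ULP falls back to the DMA-mapped,
 * non-inline send path. */
static bool use_inline_send(int granted_max_inline)
{
	return granted_max_inline >= SMC_WR_TX_SIZE;
}
```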
On 03/14/2017 04:24 PM, Parav Pandit wrote:
> Hi Ursula,
>
>
>> -----Original Message-----
>> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org]
>> Sent: Tuesday, March 14, 2017 10:02 AM
>> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>;
>> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>
>> Hi Parav,
>>
>> I tried your mlx4-patch together with SMC on s390x, but it failed.
>> The SMC-R code tries to send 44 bytes as inline in 1 sge.
>> I wonder about a length check with 16 bytes, which probably explains the
>> failure.
>> See my question below in the patch:
>>
>> On 03/12/2017 09:20 PM, Parav Pandit wrote:
>>> Hi Ursula,
>>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>>>> Sent: Thursday, March 9, 2017 3:54 AM
>>>> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>>>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>>>
>>>>
>>>>
>>>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>>>>>
>>>>>>> The problem seems to be caused by the usage of plain memcpy in
>>>> set_data_inl_seg().
>>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>>>>>> address belonging to an area mapped with the ib_dma_map_single()
>>>>>>> call. On s390x those kind of addresses require extra access
>>>>>>> functions (see
>>>> arch/s390/include/asm/io.h).
>>>>>>>
>>>>>
>>>>> By definition, when you are posting a send request with inline, the
>>>>> address
>>>> must be mapped to the cpu so plain memcpy should work.
>>>>>
>>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
>>>> not seem to contain extra coding for IB_SEND_INLINE flag for
>>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
>>>> IB_SEND_INLINE flag is ignored, and thus I needed the
>>>> ib_dma_map_single() call for the area used with ib_post_send()? Does
>>>> this mean I should stay away from the IB_SEND_INLINE flag, if I want
>>>> to run the same SMC-R code with both, Connect X3 cards and Connect X4
>> cards?
>>>>
>>> I had encountered the same kernel panic that you mentioned last week on
>> ConnectX-4 adapters with smc-r on x86_64.
>>> Shall I submit below fix to netdev mailing list?
>>> I have tested above change. I also have optimization that avoids dma mapping
>> for wr_tx_dma_addr.
>>>
>>> - lnk->wr_tx_sges[i].addr =
>>> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
>>> + lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs
>>> + + i);
>>>
>>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
>> kernel base.
>>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
>>> Let me know.
>>>
>>> Regards,
>>> Parav Pandit
>>>
>>> diff --git a/drivers/infiniband/hw/mlx4/qp.c
>>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
>>> --- a/drivers/infiniband/hw/mlx4/qp.c
>>> +++ b/drivers/infiniband/hw/mlx4/qp.c
>>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
>> ib_send_wr *wr,
>>> unsigned long flags;
>>> int nreq;
>>> int err = 0;
>>> + int inl = 0;
>>> unsigned ind;
>>> int uninitialized_var(stamp);
>>> int uninitialized_var(size);
>>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
>> ib_send_wr *wr,
>>> default:
>>> break;
>>> }
>>> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
>>> + struct mlx4_wqe_inline_seg *seg;
>>> + void *addr;
>>> + int len, seg_len;
>>> + int num_seg;
>>> + int off, to_copy;
>>> +
>>> + inl = 0;
>>> +
>>> + seg = wqe;
>>> + wqe += sizeof *seg;
>>> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
>>> + num_seg = 0;
>>> + seg_len = 0;
>>> +
>>> + for (i = 0; i < wr->num_sge; ++i) {
>>> + addr = (void *) (uintptr_t) wr->sg_list[i].addr;
>>> + len = wr->sg_list[i].length;
>>> + inl += len;
>>> +
>>> + if (inl > 16) {
>>> + inl = 0;
>>> + err = ENOMEM;
>>> + *bad_wr = wr;
>>> + goto out;
>>> + }
>> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
>> IB_SEND_INLINE data?
>> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function
>> does not seem to return an error.
>>>
> This check should be against the QP's max_inline_data variable.
> This was just an error check that I should have fixed; I was testing with NVMe, where the inline data was only worth 16 bytes.
> I will fix this. Is it possible for you to change the limit to 44 and run a quick test?
> The final patch will have the right check here, in addition to the check in create_qp.
>
>>> - /*
>>> - * Write data segments in reverse order, so as to
>>> - * overwrite cacheline stamp last within each
>>> - * cacheline. This avoids issues with WQE
>>> - * prefetching.
>>> - */
>>> + while (len >= MLX4_INLINE_ALIGN - off) {
With this code there are two memcpy calls, one with to_copy=44 and the next one with len=0.
I suggest changing the check to "len > MLX4_INLINE_ALIGN - off".
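The effect described here can be reproduced with a standalone model of the split loop (MLX4_INLINE_ALIGN = 64; the 4-byte segment header size is an assumption based on mlx4_wqe_inline_seg carrying a single __be32 byte_count). With len = 44 and off = 20, the `>=` condition takes the wrap-around branch once and then issues an unconditional tail memcpy of length 0:

```c
#define MLX4_INLINE_ALIGN 64
#define SEG_HDR 4	/* assumed sizeof(struct mlx4_wqe_inline_seg) */

/* Counts how many memcpy calls the patch's loop would issue for one
 * sge of `len` bytes starting at offset `off` into the cacheline.
 * use_gt selects the suggested `>` condition instead of `>=`. */
static int count_memcpys(int len, int off, int use_gt)
{
	int copies = 0;

	while (use_gt ? (len >  MLX4_INLINE_ALIGN - off)
		      : (len >= MLX4_INLINE_ALIGN - off)) {
		len -= MLX4_INLINE_ALIGN - off;
		off = SEG_HDR;	/* next chunk starts after a new header */
		++copies;
	}
	++copies;		/* unconditional tail memcpy, even if len == 0 */
	return copies;
}
```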
>>> + to_copy = MLX4_INLINE_ALIGN - off;
>>> + memcpy(wqe, addr, to_copy);
>>> + len -= to_copy;
>>> + wqe += to_copy;
>>> + addr += to_copy;
>>> + seg_len += to_copy;
>>> + wmb(); /* see comment below */
>>> + seg->byte_count =
>> htonl(MLX4_INLINE_SEG | seg_len);
>>> + seg_len = 0;
>>> + seg = wqe;
>>> + wqe += sizeof *seg;
>>> + off = sizeof *seg;
>>> + ++num_seg;
>>> + }
>>>
>>> - dseg = wqe;
>>> - dseg += wr->num_sge - 1;
>>> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
>> 16);
>>> + memcpy(wqe, addr, len);
>>> + wqe += len;
>>> + seg_len += len;
>>> + off += len;
>>> + }
>>>
>>> - /* Add one more inline data segment for ICRC for MLX sends */
>>> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
>>> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
>>> - qp->mlx4_ib_qp_type &
>>> - (MLX4_IB_QPT_PROXY_SMI_OWNER |
>> MLX4_IB_QPT_TUN_SMI_OWNER))) {
>>> - set_mlx_icrc_seg(dseg + 1);
>>> - size += sizeof (struct mlx4_wqe_data_seg) / 16;
>>> - }
>>> + if (seg_len) {
>>> + ++num_seg;
>>> + /*
>>> + * Need a barrier here to make sure
>>> + * all the data is visible before the
>>> + * byte_count field is set. Otherwise
>>> + * the HCA prefetcher could grab the
>>> + * 64-byte chunk with this inline
>>> + * segment and get a valid (!=
>>> + * 0xffffffff) byte count but stale
>>> + * data, and end up sending the wrong
>>> + * data.
>>> + */
>>> + wmb();
>>> + seg->byte_count = htonl(MLX4_INLINE_SEG |
>> seg_len);
>>> + }
>>>
>>> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
>>> - set_data_seg(dseg, wr->sg_list + i);
>>> + size += (inl + num_seg * sizeof (*seg) + 15) / 16;
>>> + } else {
>>> + /*
>>> + * Write data segments in reverse order, so as to
>>> + * overwrite cacheline stamp last within each
>>> + * cacheline. This avoids issues with WQE
>>> + * prefetching.
>>> + */
>>> +
>>> + dseg = wqe;
>>> + dseg += wr->num_sge - 1;
>>> + size += wr->num_sge * (sizeof (struct
>> mlx4_wqe_data_seg) / 16);
>>> +
>>> + /* Add one more inline data segment for ICRC for MLX
>> sends */
>>> + if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI
>> ||
>>> + qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI
>> ||
>>> + qp->mlx4_ib_qp_type &
>>> + (MLX4_IB_QPT_PROXY_SMI_OWNER |
>> MLX4_IB_QPT_TUN_SMI_OWNER))) {
>>> + set_mlx_icrc_seg(dseg + 1);
>>> + size += sizeof (struct mlx4_wqe_data_seg) / 16;
>>> + }
>>>
>>> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
>>> + set_data_seg(dseg, wr->sg_list + i);
>>> + }
>>> /*
>>> * Possibly overwrite stamping in cacheline with LSO
>>> * segment only after making sure all data segments
>>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: Fwd: mlx5_ib_post_send panic on s390x
[not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-20 21:04 ` Parav Pandit
0 siblings, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2017-03-20 21:04 UTC (permalink / raw)
To: Ursula Braun, Eli Cohen, Matan Barak
Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Ursula,
Regarding the suggestion: the code still needs to check for len >= MLX4_INLINE_ALIGN - off, because 44 = 64 - 20,
which is still a valid case (len == MLX4_INLINE_ALIGN - off).
But I agree that it shouldn't do a second memcpy with zero length.
Therefore there should be an additional check for len != 0.
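The resolution described above, keeping `>=` but guarding the tail copy, can be sketched standalone (again with an assumed 4-byte segment header; this is illustrative, not the final patch):

```c
#include <string.h>

#define MLX4_INLINE_ALIGN 64
#define SEG_HDR 4	/* assumed sizeof(struct mlx4_wqe_inline_seg) */

/* Copies `len` bytes of one sge into `wqe`, starting `off` bytes into
 * the current cacheline; returns the number of memcpy calls issued.
 * `>=` is kept so a chunk ending exactly on the cacheline boundary
 * still gets its byte_count finalized; the added `if (len)` guard
 * skips the zero-length tail copy. */
static int copy_inline_sge(char *wqe, int off, const char *src, int len)
{
	int copies = 0;

	while (len >= MLX4_INLINE_ALIGN - off) {
		int to_copy = MLX4_INLINE_ALIGN - off;

		memcpy(wqe, src, to_copy);
		wqe += to_copy + SEG_HDR;	/* skip next segment header */
		src += to_copy;
		len -= to_copy;
		off = SEG_HDR;
		++copies;
	}
	if (len) {		/* the fix: no zero-length memcpy */
		memcpy(wqe, src, len);
		++copies;
	}
	return copies;
}
```

For the SMC-R case (len = 44, off = 20) this issues exactly one copy.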
Coming to the IB_SEND_INLINE part: when ib_create_qp() is called and the HCA doesn't support the requested cap.max_inline_data, the provider HCA driver is supposed to fail the call,
and the ULP is expected to fall back to a non-inline scheme.
As it appears, the mlx4 driver is not failing this call, which is a bug that needs a fix.
Instead of failing the call, I prefer to provide the inline data path sooner, based on my patch in this email thread.
Parav
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> Sent: Thursday, March 16, 2017 6:51 AM
> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>
> Hi Parav,
>
> I ran your new mlx4 code together with changed SMC-R code that no longer
> maps the IB_SEND_INLINE area. It worked - great!
>
> Below I have added a small improvement idea in your patch.
>
> Nevertheless I am still not sure whether I should keep the IB_SEND_INLINE flag in
> the SMC-R code, since there is no guarantee that this will work with all kinds
> of RoCE devices. The maximum length for IB_SEND_INLINE depends on the
> RoCE driver - right? Is there an interface to determine such a maximum
> length? Would ib_create_qp() return an error if the SMC-R-specified
> .cap.max_inline_data = 44 is not supported by a RoCE driver?
>
> On 03/14/2017 04:24 PM, Parav Pandit wrote:
> > Hi Ursula,
> >
> >
> >> -----Original Message-----
> >> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> >> Sent: Tuesday, March 14, 2017 10:02 AM
> >> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> >> Matan Barak <matanb@mellanox.com>
> >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>
> >> Hi Parav,
> >>
> >> I tried your mlx4-patch together with SMC on s390x, but it failed.
> >> The SMC-R code tries to send 44 bytes as inline in 1 sge.
> >> I wonder about a length check with 16 bytes, which probably explains
> >> the failure.
> >> See my question below in the patch:
> >>
> >> On 03/12/2017 09:20 PM, Parav Pandit wrote:
> >>> Hi Ursula,
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >>>> owner@vger.kernel.org] On Behalf Of Ursula Braun
> >>>> Sent: Thursday, March 9, 2017 3:54 AM
> >>>> To: Eli Cohen <eli@mellanox.com>; Matan Barak
> <matanb@mellanox.com>
> >>>> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >>>> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>>>
> >>>>
> >>>>
> >>>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>>>>>
> >>>>>>> The problem seems to be caused by the usage of plain memcpy in
> >>>> set_data_inl_seg().
> >>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>>>>>> address belonging to an area mapped with the
> ib_dma_map_single()
> >>>>>>> call. On s390x those kind of addresses require extra access
> >>>>>>> functions (see
> >>>> arch/s390/include/asm/io.h).
> >>>>>>>
> >>>>>
> >>>>> By definition, when you are posting a send request with inline,
> >>>>> the address
> >>>> must be mapped to the cpu so plain memcpy should work.
> >>>>>
> >>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> >>>> not seem to contain extra coding for IB_SEND_INLINE flag for
> >>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards
> >>>> the IB_SEND_INLINE flag is ignored, and thus I needed the
> >>>> ib_dma_map_single() call for the area used with ib_post_send()?
> >>>> Does this mean I should stay away from the IB_SEND_INLINE flag, if
> >>>> I want to run the same SMC-R code with both, Connect X3 cards and
> >>>> Connect X4
> >> cards?
> >>>>
> >>> I had encountered the same kernel panic that you mentioned last week
> >>> on
> >> ConnectX-4 adapters with smc-r on x86_64.
> >>> Shall I submit below fix to netdev mailing list?
> >>> I have tested above change. I also have optimization that avoids dma
> >>> mapping
> >> for wr_tx_dma_addr.
> >>>
> >>> - lnk->wr_tx_sges[i].addr =
> >>> - lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> >>> + lnk->wr_tx_sges[i].addr =
> >>> + (uintptr_t)(lnk->wr_tx_bufs
> >>> + + i);
> >>>
> >>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on
> >>> little older
> >> kernel base.
> >>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib
> driver.
> >>> Let me know.
> >>>
> >>> Regards,
> >>> Parav Pandit
> >>>
> >>> diff --git a/drivers/infiniband/hw/mlx4/qp.c
> >>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
> >>> --- a/drivers/infiniband/hw/mlx4/qp.c
> >>> +++ b/drivers/infiniband/hw/mlx4/qp.c
> >>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp,
> >>> struct
> >> ib_send_wr *wr,
> >>> unsigned long flags;
> >>> int nreq;
> >>> int err = 0;
> >>> + int inl = 0;
> >>> unsigned ind;
> >>> int uninitialized_var(stamp);
> >>> int uninitialized_var(size);
> >>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp,
> >>> struct
> >> ib_send_wr *wr,
> >>> default:
> >>> break;
> >>> }
> >>> + if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> >>> + struct mlx4_wqe_inline_seg *seg;
> >>> + void *addr;
> >>> + int len, seg_len;
> >>> + int num_seg;
> >>> + int off, to_copy;
> >>> +
> >>> + inl = 0;
> >>> +
> >>> + seg = wqe;
> >>> + wqe += sizeof *seg;
> >>> + off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> >>> + num_seg = 0;
> >>> + seg_len = 0;
> >>> +
> >>> + for (i = 0; i < wr->num_sge; ++i) {
> >>> + addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> >>> + len = wr->sg_list[i].length;
> >>> + inl += len;
> >>> +
> >>> + if (inl > 16) {
> >>> + inl = 0;
> >>> + err = ENOMEM;
> >>> + *bad_wr = wr;
> >>> + goto out;
> >>> + }
> >> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
> >> IB_SEND_INLINE data?
> >> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the
> >> function does not seem to return an error.
> >>>
> > This check should be against the QP's max_inline_data variable.
> > This was just an error check that I should have fixed; I was testing with NVMe,
> > where the inline data was only worth 16 bytes.
> > I will fix this. Is it possible for you to change the limit to 44 and run a quick test?
> > The final patch will have the right check here, in addition to the check in create_qp.
> >
> >>> - /*
> >>> - * Write data segments in reverse order, so as to
> >>> - * overwrite cacheline stamp last within each
> >>> - * cacheline. This avoids issues with WQE
> >>> - * prefetching.
> >>> - */
> >>> + while (len >= MLX4_INLINE_ALIGN - off) {
> With this code there are two memcpy calls, one with to_copy=44 and the next
> one with len=0.
> I suggest changing the check to "len > MLX4_INLINE_ALIGN - off".
> >>> + to_copy = MLX4_INLINE_ALIGN - off;
> >>> + memcpy(wqe, addr, to_copy);
> >>> + len -= to_copy;
> >>> + wqe += to_copy;
> >>> + addr += to_copy;
> >>> + seg_len += to_copy;
> >>> + wmb(); /* see comment below */
> >>> + seg->byte_count =
> >> htonl(MLX4_INLINE_SEG | seg_len);
> >>> + seg_len = 0;
> >>> + seg = wqe;
> >>> + wqe += sizeof *seg;
> >>> + off = sizeof *seg;
> >>> + ++num_seg;
> >>> + }
> >>>
> >>> - dseg = wqe;
> >>> - dseg += wr->num_sge - 1;
> >>> - size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
> >> 16);
> >>> + memcpy(wqe, addr, len);
> >>> + wqe += len;
> >>> + seg_len += len;
> >>> + off += len;
> >>> + }
> >>>
> >>> - /* Add one more inline data segment for ICRC for MLX sends
> */
> >>> - if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> >>> - qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> >>> - qp->mlx4_ib_qp_type &
> >>> - (MLX4_IB_QPT_PROXY_SMI_OWNER |
> >> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> - set_mlx_icrc_seg(dseg + 1);
> >>> - size += sizeof (struct mlx4_wqe_data_seg) / 16;
> >>> - }
> >>> + if (seg_len) {
> >>> + ++num_seg;
> >>> + /*
> >>> + * Need a barrier here to make sure
> >>> + * all the data is visible before the
> >>> + * byte_count field is set. Otherwise
> >>> + * the HCA prefetcher could grab the
> >>> + * 64-byte chunk with this inline
> >>> + * segment and get a valid (!=
> >>> + * 0xffffffff) byte count but stale
> >>> + * data, and end up sending the wrong
> >>> + * data.
> >>> + */
> >>> + wmb();
> >>> + seg->byte_count = htonl(MLX4_INLINE_SEG
> |
> >> seg_len);
> >>> + }
> >>>
> >>> - for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> - set_data_seg(dseg, wr->sg_list + i);
> >>> + size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> >>> + } else {
> >>> + /*
> >>> + * Write data segments in reverse order, so as to
> >>> + * overwrite cacheline stamp last within each
> >>> + * cacheline. This avoids issues with WQE
> >>> + * prefetching.
> >>> + */
> >>> +
> >>> + dseg = wqe;
> >>> + dseg += wr->num_sge - 1;
> >>> + size += wr->num_sge * (sizeof (struct
> >> mlx4_wqe_data_seg) / 16);
> >>> +
> >>> + /* Add one more inline data segment for ICRC for
> MLX
> >> sends */
> >>> + if (unlikely(qp->mlx4_ib_qp_type ==
> MLX4_IB_QPT_SMI
> >> ||
> >>> + qp->mlx4_ib_qp_type ==
> MLX4_IB_QPT_GSI
> >> ||
> >>> + qp->mlx4_ib_qp_type &
> >>> + (MLX4_IB_QPT_PROXY_SMI_OWNER |
> >> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> + set_mlx_icrc_seg(dseg + 1);
> >>> + size += sizeof (struct mlx4_wqe_data_seg) /
> 16;
> >>> + }
> >>>
> >>> + for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> + set_data_seg(dseg, wr->sg_list + i);
> >>> + }
> >>> /*
> >>> * Possibly overwrite stamping in cacheline with LSO
> >>> * segment only after making sure all data segments
> >>>
> >
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2017-03-20 21:04 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24 9:51 mlx5_ib_post_send panic on s390x Ursula Braun
[not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-02-24 17:28 ` Eli Cohen
[not found] ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-06 11:17 ` Ursula Braun
[not found] ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-06 12:56 ` Eli Cohen
[not found] ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-06 13:47 ` Ursula Braun
[not found] ` <dcc90daa-b932-8957-d8bc-e1f02d04e03a@linux.vnet.ibm.com>
[not found] ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6@mellanox.com>
[not found] ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-06 13:03 ` Fwd: " Ursula Braun
[not found] ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-06 13:08 ` Eli Cohen
[not found] ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-09 9:54 ` Ursula Braun
[not found] ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-09 12:58 ` Eli Cohen
2017-03-12 20:20 ` Parav Pandit
[not found] ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-12 20:38 ` Parav Pandit
2017-03-14 15:02 ` Ursula Braun
[not found] ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-14 15:24 ` Parav Pandit
[not found] ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-16 11:51 ` Ursula Braun
[not found] ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-20 21:04 ` Parav Pandit