* mlx5_ib_post_send panic on s390x
@ 2017-02-24  9:51 Ursula Braun
       [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
       [not found] ` <dcc90daa-b932-8957-d8bc-e1f02d04e03a@linux.vnet.ibm.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Ursula Braun @ 2017-02-24  9:51 UTC (permalink / raw)
  To: matamb-VPRAkNaXOzVWk0Htik3J/w, leonro-VPRAkNaXOzVWk0Htik3J/w
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Saeed and Matan,

Up to now I have run SMC-R traffic on ConnectX-3, which works.
But when switching to ConnectX-4, the first mlx5_ib_post_send() fails:

[  247.787660] Unable to handle kernel pointer dereference in virtual kernel address space
[  247.787662] Failing address: 000000010484a000 TEID: 000000010484a803
[  247.787664] Fault in home space mode while using kernel ASCE.
[  247.787667] AS:00000000011ec007 R3:0000000000000024 
[  247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP 
[  247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
[  247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G        W       4.10.0uschi+ #4
[  247.787739] Hardware name: IBM              2964 N96              704              (LPAR)
[  247.787743] Workqueue: events smc_listen_work [smc]
[  247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000
[  247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
[  247.787751]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[  247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
[  247.787755]            000000000000002b 000000000076242e 000000000000002c 0000000099c96440
[  247.787757]            000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
[  247.787758]            00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
[  247.787764] Krnl Code: 0000000000762404: b9040012		lgr	%r1,%r2
                          0000000000762408: a7740008		brc	7,762418
                         #000000000076240c: c05000000011	larl	%r5,76242e
                         >0000000000762412: 44405000		ex	%r4,0(%r5)
                          0000000000762416: 07fe		bcr	15,%r14
                          0000000000762418: d2ff10003000	mvc	0(256,%r1),0(%r3)
                          000000000076241e: 41101100		la	%r1,256(%r1)
                          0000000000762422: 41303100		la	%r3,256(%r3)
[  247.787780] Call Trace:
[  247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
[  247.787789]  [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc] 
[  247.787792]  [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc] 
[  247.787794]  [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc] 
[  247.787797]  [<00000000001659e8>] process_one_work+0x3d8/0x780 
[  247.787799]  [<0000000000166044>] worker_thread+0x2b4/0x478 
[  247.787801]  [<000000000016e62c>] kthread+0x15c/0x170 
[  247.787803]  [<0000000000a115f2>] kernel_thread_starter+0x6/0xc 
[  247.787804]  [<0000000000a115ec>] kernel_thread_starter+0x0/0xc 
[  247.787806] INFO: lockdep is turned off.
[  247.787807] Last Breaking-Event-Address:
[  247.787811]  [<000003ff8106edc0>] 0x3ff8106edc0
[  247.787813]  
[  247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops

The problem seems to be caused by the use of plain memcpy() in set_data_inl_seg().
The address provided by the SMC code in struct ib_send_wr *wr belongs to
an area mapped with the ib_dma_map_single() call. On s390x, those kinds of addresses
require extra access functions (see arch/s390/include/asm/io.h).
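
As a cross-check of the register dump above, the following sketch reproduces the address arithmetic. It assumes 4 KiB pages, that %r3 holds the memcpy() source and %r4 the length minus one (following the s390x calling convention and the EX/MVC idiom visible in the code dump), and that the EX target is an MVC whose own length field is zero. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define PAGE_SHIFT_ASSUMED 12u

/* Round an address down to the start of its page. */
static uint64_t page_base(uint64_t addr)
{
	return addr & ~((UINT64_C(1) << PAGE_SHIFT_ASSUMED) - 1);
}

/*
 * EX ORs the low byte of the register into the length field of the
 * target MVC; MVC then moves (length field + 1) bytes.
 */
static uint64_t ex_mvc_length(uint64_t r4)
{
	return (r4 & 0xff) + 1;
}

/* Return 1 if the copy source range [src, src + len) touches fault_page. */
static int copy_touches_page(uint64_t src, uint64_t len, uint64_t fault_page)
{
	assert(len > 0);
	return page_base(src) <= fault_page &&
	       fault_page <= page_base(src + len - 1);
}
```

With the values from the dump, page_base(0x10484afc8) yields 0x10484a000, the reported failing address, and ex_mvc_length(0x2b) yields 44 bytes, which suggests the fault is on the source side of the copy.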

Kind regards, Ursula Braun (IBM Germany)

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: mlx5_ib_post_send panic on s390x
       [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-02-24 17:28   ` Eli Cohen
       [not found]     ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-02-24 17:28 UTC (permalink / raw)
  To: Ursula Braun, matamb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi,

Can you please send details of the work request you are posting? I assume you are using inline, right?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mlx5_ib_post_send panic on s390x
       [not found]     ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-06 11:17       ` Ursula Braun
       [not found]         ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 11:17 UTC (permalink / raw)
  To: Eli Cohen, matanb-VPRAkNaXOzVWk0Htik3J/w, Leon Romanovsky
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


On 02/24/2017 06:28 PM, Eli Cohen wrote:
> Hi,
> 
> Can you please send details of the work request you are posting? I assume you are using inline, right?
yes, inline is used:

                lnk->wr_tx_sges[i].addr =
                        lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
                lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
                lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
                lnk->wr_tx_ibs[i].next = NULL;
                lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
                lnk->wr_tx_ibs[i].num_sge = 1;
                lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
                lnk->wr_tx_ibs[i].send_flags =
                        IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE;
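
The per-slot address computation in the snippet above can be sketched in isolation as follows. This is only an illustration: the stride of 48 bytes is an assumed value for SMC_WR_BUF_SIZE, the payload size of 44 bytes is the one reported in this thread, and the names with a demo prefix are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WR_BUF_SIZE 48u /* assumed slot stride */
#define DEMO_WR_TX_SIZE  44u /* payload length reported in this thread */

/* Address of send slot i, given the base address of the buffer array. */
static uint64_t demo_slot_addr(uint64_t base, unsigned int i)
{
	/* The payload must fit into one slot, or slots would overlap. */
	assert(DEMO_WR_TX_SIZE <= DEMO_WR_BUF_SIZE);
	return base + (uint64_t)i * DEMO_WR_BUF_SIZE;
}
```

Whatever base is passed in determines the kind of address each slot carries: a DMA base yields DMA addresses, a kernel virtual base yields CPU pointers.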


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: mlx5_ib_post_send panic on s390x
       [not found]         ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-06 12:56           ` Eli Cohen
       [not found]             ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-03-06 12:56 UTC (permalink / raw)
  To: Ursula Braun, Matan Barak, Leon Romanovsky
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Please send information on:

- The size of the required inline data in the offending work request
- The transport service used
- How many outstanding work requests the send queue is configured to hold
- The serial number of the work request that triggered this oops (first, second, 65th, etc.)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fwd: mlx5_ib_post_send panic on s390x
       [not found]     ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-03-06 13:03       ` Ursula Braun
       [not found]         ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 13:03 UTC (permalink / raw)
  To: Matan Barak (External)
  Cc: Saeed Mahameed (saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org),
	Eli Cohen, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 02/26/2017 10:45 AM, Matan Barak (External) wrote:
> On 24/02/2017 12:27, Ursula Braun wrote:
>> sorry, typo in the mail address.
>>
>> -------- Forwarded Message --------
>> Subject: mlx5_ib_post_send panic on s390x
>> Date: Fri, 24 Feb 2017 10:51:32 +0100
>> From: Ursula Braun <ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
>> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org
>> CC: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>
>> Hi Saeed and Matan,
>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to
>> an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses
>> require extra access functions (see arch/s390/include/asm/io.h).
>>
> 
> So I guess memcpy_toio is required here, right?
> Since we don't have a s390 based system, could you please test this?
memcpy_toio() did not help. I then replaced the memcpy() calls in set_data_inl_seg()
with this preliminary test code (just to give an idea, not a real patch proposal):

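/*
 * Preliminary test helper: copy count bytes in 32-bit units using the
 * raw I/O accessors. count is rounded up to a multiple of 4 bytes, so
 * both buffers must allow for that.
 */
static void *memcpy_usc(void *dest, const void *src, size_t count)
{
        char *tmp_dest = (char *)dest;
        const char *tmp_src = (const char *)src;
        size_t copied = 0;
        u32 tmp_u32;

        while (copied < count) {
                tmp_u32 = __raw_readl(tmp_src);
                __raw_writel(tmp_u32, tmp_dest);
                copied += sizeof(tmp_u32);
                tmp_dest += sizeof(tmp_u32);
                tmp_src += sizeof(tmp_u32);
        }
        return dest;
}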

This helped; the first mlx5_ib_post_send() call initiated from the SMC code (type IB_WR_SEND,
flagged with IB_SEND_INLINE, length 44 bytes) ran successfully.

A subsequent mlx5_ib_post_send() call of type RDMA_WRITE seems to stall later on, but
this is something I have to analyze in more detail.
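
For illustration only, the loop structure of memcpy_usc() above can be tried out in userspace with the sketch below. It is not a patch proposal: plain loads and stores stand in for __raw_readl()/__raw_writel(), so it says nothing about s390x access semantics, and the function name is hypothetical. Unlike the test code, it copies a trailing remainder byte by byte instead of rounding the length up to a multiple of four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy count bytes in 32-bit words, then finish any 1..3 byte tail. */
static void *copy_words_with_tail(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	size_t copied = 0;
	uint32_t word;

	assert(dest != NULL && src != NULL);

	/* Copy whole words while at least four bytes remain. */
	while (count - copied >= sizeof(word)) {
		memcpy(&word, s + copied, sizeof(word));
		memcpy(d + copied, &word, sizeof(word));
		copied += sizeof(word);
	}

	/* Copy the remaining bytes one at a time. */
	while (copied < count) {
		d[copied] = s[copied];
		copied++;
	}
	return dest;
}
```

For the 44-byte request in this thread the tail loop never runs, since 44 is a multiple of four.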

> 
>> Kind regards, Ursula Braun (IBM Germany)
>>
> 
> Thanks for notifying.
> 
> Matan
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]         ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-06 13:08           ` Eli Cohen
       [not found]             ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Eli Cohen @ 2017-03-06 13:08 UTC (permalink / raw)
  To: Ursula Braun, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by SMC-code in struct ib_send_wr *wr is an 
>> address belonging to an area mapped with the ib_dma_map_single() 
>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>

By definition, when you post a send request with the inline flag, the address must be mapped to the CPU, so a plain memcpy should work.
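
A minimal userspace sketch of that contract, under stated assumptions: struct demo_sge and demo_copy_inline() are hypothetical simplifications, not the mlx5 code or struct ib_sge. The point it shows is that an inline send is consumed by the CPU, so the addr field is dereferenced like an ordinary pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified scatter/gather entry. */
struct demo_sge {
	uint64_t addr;   /* for inline sends: a CPU-addressable pointer */
	uint32_t length; /* number of payload bytes */
};

/*
 * Copy the payload of each entry into the work queue entry buffer.
 * Returns the number of bytes copied, or -1 if the payload does not fit.
 */
static int demo_copy_inline(void *wqe, size_t wqe_size,
			    const struct demo_sge *sg, int num_sge)
{
	size_t used = 0;
	int i;

	assert(wqe != NULL && sg != NULL);
	for (i = 0; i < num_sge; i++) {
		if (sg[i].length > wqe_size - used)
			return -1;
		/* Dereferenced by the CPU, so addr must be a valid pointer. */
		memcpy((unsigned char *)wqe + used,
		       (const void *)(uintptr_t)sg[i].addr, sg[i].length);
		used += sg[i].length;
	}
	return (int)used;
}
```

If addr instead held a bus or DMA address that the CPU cannot dereference, the memcpy() in this sketch would fault in the same way as the one in the oops.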

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: mlx5_ib_post_send panic on s390x
       [not found]             ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-06 13:47               ` Ursula Braun
  0 siblings, 0 replies; 15+ messages in thread
From: Ursula Braun @ 2017-03-06 13:47 UTC (permalink / raw)
  To: Eli Cohen, Matan Barak, Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 03/06/2017 01:56 PM, Eli Cohen wrote:
> Please send information on:
> 
> - The size of the required inline data in the offending work request
44 bytes (ib_create_qp with ib_qp_init_attr.cap.max_inline_data=44)
> - The transport service used
IB_QPT_RC
> - How many outstanding work requests the send queue is configured to
ib_create_cq with ib_cq_init_attr.cqe=32768
ib_create_qp with ib_qp_init_attr.cap.max_send_wr=16
> - What was the serial number of the work request that triggered this oops (first, second, 65th etc).
serial number wr_id=1
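
To make the relation between these limits concrete, here is a small sketch of the size check an inline send has to pass. The struct and function names are hypothetical; only the values 16 and 44 come from the answers above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the queue pair limits quoted above. */
struct demo_qp_caps {
	uint32_t max_send_wr;     /* 16 in this report */
	uint32_t max_inline_data; /* 44 in this report */
};

/* Sum the lengths of all scatter/gather entries of one request. */
static uint64_t demo_total_length(const uint32_t *sge_lengths, int num_sge)
{
	uint64_t total = 0;
	int i;

	assert(num_sge >= 0);
	for (i = 0; i < num_sge; i++)
		total += sge_lengths[i];
	return total;
}

/* Return 1 if the request may be sent inline under these limits. */
static int demo_fits_inline(const struct demo_qp_caps *caps,
			    const uint32_t *sge_lengths, int num_sge)
{
	return demo_total_length(sge_lengths, num_sge) <= caps->max_inline_data;
}
```

A single 44-byte entry passes this check, so the request size itself is within the configured limit.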
> 
> -----Original Message-----
> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org] 
> Sent: Monday, March 6, 2017 5:17 AM
> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: Re: mlx5_ib_post_send panic on s390x
> 
> 
> On 02/24/2017 06:28 PM, Eli Cohen wrote:
>> Hi,
>>
>> Can you please send details of the work request you are posting? I assume you are using inline, right?
> yes, inline is used:
> 
>                 lnk->wr_tx_sges[i].addr =
>                         lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
>                 lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
>                 lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
>                 lnk->wr_tx_ibs[i].next = NULL;
>                 lnk->wr_tx_ibs[i].sg_list = &lnk->wr_tx_sges[i];
>                 lnk->wr_tx_ibs[i].num_sge = 1;
>                 lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
>                 lnk->wr_tx_ibs[i].send_flags =
>                         IB_SEND_SIGNALED | IB_SEND_SOLICITED | IB_SEND_INLINE;
> 
>>
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>> Sent: Friday, February 24, 2017 3:52 AM
>> To: matamb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org; Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: mlx5_ib_post_send panic on s390x
>>
>> Hi Saeed and Matan,
>>
>> up to now I run SMC-R traffic on Connect X3, which works.
>> But when switching to Connect X4, the first mlx5_ib_post_send() fails:
>>
>> [  247.787660] Unable to handle kernel pointer dereference in virtual kernel address space
>> [  247.787662] Failing address: 000000010484a000 TEID: 000000010484a803
>> [  247.787664] Fault in home space mode while using kernel ASCE.
>> [  247.787667] AS:00000000011ec007 R3:0000000000000024
>> [  247.787701] Oops: 003b ilc:2 [#1] PREEMPT SMP
>> [  247.787704] Modules linked in: smc_diag smc xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter rpcrdma rdma_ucm ib_ucm ib_uverbs rdma_cm configfs ib_cm iw_cm mlx5_ib ib_core mlx5_core xts gf128mul cbc ecb aes_s390 des_s390 des_generic ptp sha512_s390 pps_core sha256_s390 sha1_s390 sha_common eadm_sch nfsd auth_rpcgss oid_registry nfs_acl lockd vhost_net tun grace vhost sunrpc macvtap sch_fq_codel macvlan dm_multipath kvm dm_mod ip_tables x_tables autofs4
>> [  247.787738] CPU: 0 PID: 10498 Comm: kworker/0:3 Tainted: G        W       4.10.0uschi+ #4
>> [  247.787739] Hardware name: IBM              2964 N96              704              (LPAR)
>> [  247.787743] Workqueue: events smc_listen_work [smc]
>> [  247.787745] task: 00000000b4148008 task.stack: 0000000099c2c000
>> [  247.787746] Krnl PSW : 0404c00180000000 0000000000762412 (memcpy+0x22/0x48)
>> [  247.787751]            R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
>> [  247.787753] Krnl GPRS: 0000000000a7a100 0000000099c96414 0000000099c96414 000000010484afc8
>> [  247.787755]            000000000000002b 000000000076242e 000000000000002c 0000000099c96440
>> [  247.787757]            000000010484afc8 000000000000002c 0000000099c96414 0000000000000001
>> [  247.787758]            00000000ae8a75d0 000003ff8108aa50 000003ff8107cde6 0000000099c2fa38
>> [  247.787764] Krnl Code: 0000000000762404: b9040012		lgr	%r1,%r2
>>                           0000000000762408: a7740008		brc	7,762418
>>                          #000000000076240c: c05000000011	larl	%r5,76242e
>>                          >0000000000762412: 44405000		ex	%r4,0(%r5)
>>                           0000000000762416: 07fe		bcr	15,%r14
>>                           0000000000762418: d2ff10003000	mvc	0(256,%r1),0(%r3)
>>                           000000000076241e: 41101100		la	%r1,256(%r1)
>>                           0000000000762422: 41303100		la	%r3,256(%r3)
>> [  247.787780] Call Trace:
>> [  247.787785] ([<000003ff8107cdd4>] mlx5_ib_post_send+0x139c/0x1810 [mlx5_ib])
>> [  247.787789]  [<000003ff8047999a>] smc_wr_tx_send+0xd2/0x100 [smc]
>> [  247.787792]  [<000003ff8047a97a>] smc_llc_send_confirm_link+0x9a/0xd0 [smc]
>> [  247.787794]  [<000003ff804751ee>] smc_listen_work+0x24e/0x4e0 [smc]
>> [  247.787797]  [<00000000001659e8>] process_one_work+0x3d8/0x780
>> [  247.787799]  [<0000000000166044>] worker_thread+0x2b4/0x478
>> [  247.787801]  [<000000000016e62c>] kthread+0x15c/0x170
>> [  247.787803]  [<0000000000a115f2>] kernel_thread_starter+0x6/0xc
>> [  247.787804]  [<0000000000a115ec>] kernel_thread_starter+0x0/0xc
>> [  247.787806] INFO: lockdep is turned off.
>> [  247.787807] Last Breaking-Event-Address:
>> [  247.787811]  [<000003ff8106edc0>] 0x3ff8106edc0
>> [  247.787813]
>> [  247.787814] Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>> The address provided by SMC-code in struct ib_send_wr *wr is an address belonging to an area mapped with the ib_dma_map_single() call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>
>> Kind regards, Ursula Braun (IBM Germany)
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fwd: mlx5_ib_post_send panic on s390x
       [not found]             ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-09  9:54               ` Ursula Braun
       [not found]                 ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-09  9:54 UTC (permalink / raw)
  To: Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA



On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>
>>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>>> The address provided by SMC-code in struct ib_send_wr *wr is an 
>>> address belonging to an area mapped with the ib_dma_map_single() 
>>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>>
> 
> By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work.
>
In the past I ran SMC-R with Connect X3 cards. The mlx4 driver does not seem to contain any extra code handling the IB_SEND_INLINE flag for ib_post_send(). Does this mean that, for SMC-R on Connect X3 cards, the IB_SEND_INLINE flag is ignored, and that this is why I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does it also mean I should stay away from the IB_SEND_INLINE flag if I want to run the same SMC-R code with both Connect X3 and Connect X4 cards?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                 ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-09 12:58                   ` Eli Cohen
  2017-03-12 20:20                   ` Parav Pandit
  1 sibling, 0 replies; 15+ messages in thread
From: Eli Cohen @ 2017-03-09 12:58 UTC (permalink / raw)
  To: Ursula Braun, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Yes, for mlx4 it is ignored.

-----Original Message-----
From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com] 
Sent: Thursday, March 9, 2017 3:54 AM
To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky <leonro@mellanox.com>; linux-rdma@vger.kernel.org
Subject: Re: Fwd: mlx5_ib_post_send panic on s390x



On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>
>>> The problem seems to be caused by the usage of plain memcpy in set_data_inl_seg().
>>> The address provided by SMC-code in struct ib_send_wr *wr is an 
>>> address belonging to an area mapped with the ib_dma_map_single() 
>>> call. On s390x those kind of addresses require extra access functions (see arch/s390/include/asm/io.h).
>>>
> 
> By definition, when you are posting a send request with inline, the address must be mapped to the cpu so plain memcpy should work.
>
In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and thus I needed the ib_dma_map_single() call for the area used with ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE flag, if I want to run the same SMC-R code with both, Connect X3 cards and Connect X4 cards?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                 ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2017-03-09 12:58                   ` Eli Cohen
@ 2017-03-12 20:20                   ` Parav Pandit
       [not found]                     ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2017-03-12 20:20 UTC (permalink / raw)
  To: Ursula Braun, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Ursula,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Ursula Braun
> Sent: Thursday, March 9, 2017 3:54 AM
> To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> 
> 
> 
> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>
> >>> The problem seems to be caused by the usage of plain memcpy in
> set_data_inl_seg().
> >>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>> address belonging to an area mapped with the ib_dma_map_single()
> >>> call. On s390x those kind of addresses require extra access functions (see
> arch/s390/include/asm/io.h).
> >>>
> >
> > By definition, when you are posting a send request with inline, the address
> must be mapped to the cpu so plain memcpy should work.
> >
> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to
> contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean
> for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and
> thus I needed the ib_dma_map_single() call for the area used with
> ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE
> flag, if I want to run the same SMC-R code with both, Connect X3 cards and
> Connect X4 cards?
> 
I encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with SMC-R on x86_64.
Shall I submit the fix below to the netdev mailing list?
I have tested the change. I also have an optimization that avoids the DMA mapping for wr_tx_dma_addr.

-               lnk->wr_tx_sges[i].addr =
-                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
+               lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
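
Why the address placed in the SGE matters for inline sends can be shown with a minimal user-space sketch. This is a toy model under stated assumptions, not mlx5 or SMC code: toy_sge, toy_wqe, toy_build_wqe(), TOY_SEND_INLINE and TOY_MAX_INLINE are made-up names. With the inline flag the payload is copied by the CPU, so addr has to be a CPU virtual address; without it, addr is only passed on to the device as a DMA address and is never dereferenced by the CPU.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_SEND_INLINE 0x1u  /* hypothetical stand-in for IB_SEND_INLINE */
#define TOY_MAX_INLINE  64u   /* hypothetical per-QP inline limit */

struct toy_sge {
	uint64_t addr;    /* CPU virtual address (inline) or DMA address */
	uint32_t length;
};

struct toy_wqe {
	int      is_inline;
	uint32_t length;
	uint64_t dma_addr;                    /* valid when !is_inline */
	uint8_t  inline_data[TOY_MAX_INLINE]; /* valid when is_inline */
};

static int toy_build_wqe(struct toy_wqe *wqe, const struct toy_sge *sge,
			 unsigned int send_flags)
{
	memset(wqe, 0, sizeof(*wqe));
	wqe->length = sge->length;

	if (send_flags & TOY_SEND_INLINE) {
		if (sge->length > TOY_MAX_INLINE)
			return -ENOMEM;
		/* CPU copy: addr is dereferenced here, so it must be a
		 * pointer the CPU can read, not a bus/DMA address. */
		memcpy(wqe->inline_data,
		       (const void *)(uintptr_t)sge->addr, sge->length);
		wqe->is_inline = 1;
		return 0;
	}

	/* No CPU access: the address is only stored for the device. */
	wqe->dma_addr = sge->addr;
	return 0;
}
```

In this model, passing a DMA address together with TOY_SEND_INLINE would make the memcpy() read from an address that need not be valid for the CPU, which is the kind of fault the oops shows.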

I also have a fix for processing IB_SEND_INLINE in the mlx4 driver, based on a slightly older kernel.
It is attached below. I can rebase my kernel and provide a fix in the mlx5_ib driver.
Let me know.

Regards,
Parav Pandit

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index a2e4ca5..0d984f5 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	unsigned long flags;
 	int nreq;
 	int err = 0;
+	int inl = 0;
 	unsigned ind;
 	int uninitialized_var(stamp);
 	int uninitialized_var(size);
@@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		default:
 			break;
 		}
+		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
+			struct mlx4_wqe_inline_seg *seg;
+			void *addr;
+			int len, seg_len;
+			int num_seg;
+			int off, to_copy;
+
+			inl = 0;
+
+			seg = wqe;
+			wqe += sizeof *seg;
+			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
+			num_seg = 0;
+			seg_len = 0;
+
+			for (i = 0; i < wr->num_sge; ++i) {
+				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
+				len  = wr->sg_list[i].length;
+				inl += len;
+
+				if (inl > 16) {
+					inl = 0;
+					err = -ENOMEM;
+					*bad_wr = wr;
+					goto out;
+				}
 
-		/*
-		 * Write data segments in reverse order, so as to
-		 * overwrite cacheline stamp last within each
-		 * cacheline.  This avoids issues with WQE
-		 * prefetching.
-		 */
+				while (len >= MLX4_INLINE_ALIGN - off) {
+					to_copy = MLX4_INLINE_ALIGN - off;
+					memcpy(wqe, addr, to_copy);
+					len -= to_copy;
+					wqe += to_copy;
+					addr += to_copy;
+					seg_len += to_copy;
+					wmb(); /* see comment below */
+					seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
+					seg_len = 0;
+					seg = wqe;
+					wqe += sizeof *seg;
+					off = sizeof *seg;
+					++num_seg;
+				}
 
-		dseg = wqe;
-		dseg += wr->num_sge - 1;
-		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
+				memcpy(wqe, addr, len);
+				wqe += len;
+				seg_len += len;
+				off += len;
+			}
 
-		/* Add one more inline data segment for ICRC for MLX sends */
-		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
-			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
-			     qp->mlx4_ib_qp_type &
-			     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
-			set_mlx_icrc_seg(dseg + 1);
-			size += sizeof (struct mlx4_wqe_data_seg) / 16;
-		}
+			if (seg_len) {
+				++num_seg;
+				/*
+				 * Need a barrier here to make sure
+				 * all the data is visible before the
+				 * byte_count field is set.  Otherwise
+				 * the HCA prefetcher could grab the
+				 * 64-byte chunk with this inline
+				 * segment and get a valid (!=
+				 * 0xffffffff) byte count but stale
+				 * data, and end up sending the wrong
+				 * data.
+				 */
+				wmb();
+				seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
+			}
 
-		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
-			set_data_seg(dseg, wr->sg_list + i);
+			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
+		} else {
+			/*
+			 * Write data segments in reverse order, so as to
+			 * overwrite cacheline stamp last within each
+			 * cacheline.  This avoids issues with WQE
+			 * prefetching.
+			 */
+
+			dseg = wqe;
+			dseg += wr->num_sge - 1;
+			size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
+
+			/* Add one more inline data segment for ICRC for MLX sends */
+			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
+				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
+				     qp->mlx4_ib_qp_type &
+				     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
+				set_mlx_icrc_seg(dseg + 1);
+				size += sizeof (struct mlx4_wqe_data_seg) / 16;
+			}
 
+			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
+				set_data_seg(dseg, wr->sg_list + i);
+		}
 		/*
 		 * Possibly overwrite stamping in cacheline with LSO
 		 * segment only after making sure all data segments
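
The segment accounting in the inline branch above can be followed with a small user-space sketch. It only mirrors the loop's bookkeeping for a single buffer and copies no data. The names toy_inline_layout and toy_layout_inline() are hypothetical, and the values 64 for MLX4_INLINE_ALIGN and 4 for sizeof(struct mlx4_wqe_inline_seg) are assumptions of this sketch. Here off is the offset of the first payload byte within its 64-byte chunk (0 <= off < 64).

```c
#define TOY_INLINE_ALIGN 64  /* assumed value of MLX4_INLINE_ALIGN */
#define TOY_SEG_HDR       4  /* assumed sizeof(struct mlx4_wqe_inline_seg) */

struct toy_inline_layout {
	int num_seg;  /* inline segment headers that carry data */
	int size16;   /* 16-byte units added to the WQE size */
};

static struct toy_inline_layout toy_layout_inline(int off, int len)
{
	struct toy_inline_layout r = { 0, 0 };
	int inl = len;    /* total inline payload */
	int seg_len = 0;  /* payload bytes in the currently open segment */

	/* Each time the payload reaches the end of a 64-byte chunk, the
	 * open segment is closed and the next chunk starts with a header. */
	while (len >= TOY_INLINE_ALIGN - off) {
		int to_copy = TOY_INLINE_ALIGN - off;

		len -= to_copy;
		seg_len = 0;
		off = TOY_SEG_HDR;
		++r.num_seg;
	}

	/* Remainder that does not reach the next chunk boundary. */
	seg_len += len;
	if (seg_len)
		++r.num_seg;

	r.size16 = (inl + r.num_seg * TOY_SEG_HDR + 15) / 16;
	return r;
}
```

For example, 44 bytes starting at offset 20 fill the chunk exactly and need one segment, while the same 44 bytes starting at offset 52 are split into 12 + 32 bytes and need two.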

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                     ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-12 20:38                       ` Parav Pandit
  2017-03-14 15:02                       ` Ursula Braun
  1 sibling, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2017-03-12 20:38 UTC (permalink / raw)
  To: Parav Pandit, Ursula Braun, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

I meant the mlx4_ib* driver below. Sorry for the typo.

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Parav Pandit
> Sent: Sunday, March 12, 2017 3:21 PM
> To: Ursula Braun <ubraun@linux.vnet.ibm.com>; Eli Cohen
> <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: RE: Fwd: mlx5_ib_post_send panic on s390x
> 
> Hi Ursula,
> 
> > -----Original Message-----
> > From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > owner@vger.kernel.org] On Behalf Of Ursula Braun
> > Sent: Thursday, March 9, 2017 3:54 AM
> > To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> > Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> > <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> > Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >
> >
> >
> > On 03/06/2017 02:08 PM, Eli Cohen wrote:
> > >>>
> > >>> The problem seems to be caused by the usage of plain memcpy in
> > set_data_inl_seg().
> > >>> The address provided by SMC-code in struct ib_send_wr *wr is an
> > >>> address belonging to an area mapped with the ib_dma_map_single()
> > >>> call. On s390x those kind of addresses require extra access
> > >>> functions (see
> > arch/s390/include/asm/io.h).
> > >>>
> > >
> > > By definition, when you are posting a send request with inline, the
> > > address
> > must be mapped to the cpu so plain memcpy should work.
> > >
> > In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> > not seem to contain extra coding for IB_SEND_INLINE flag for
> > ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
> > IB_SEND_INLINE flag is ignored, and thus I needed the
> > ib_dma_map_single() call for the area used with ib_post_send()? Does
> > this mean I should stay away from the IB_SEND_INLINE flag, if I want
> > to run the same SMC-R code with both, Connect X3 cards and Connect X4
> cards?
> >
> I had encountered the same kernel panic that you mentioned last week on
> ConnectX-4 adapters with smc-r on x86_64.
> Shall I submit below fix to netdev mailing list?
> I have tested above change. I also have optimization that avoids dma mapping
> for wr_tx_dma_addr.
> 
> -               lnk->wr_tx_sges[i].addr =
> -                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> +               lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
> 
> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
> kernel base.
> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> Let me know.
> 
> Regards,
> Parav Pandit
> 
> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index a2e4ca5..0d984f5 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  	unsigned long flags;
>  	int nreq;
>  	int err = 0;
> +	int inl = 0;
>  	unsigned ind;
>  	int uninitialized_var(stamp);
>  	int uninitialized_var(size);
> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  		default:
>  			break;
>  		}
> +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> +			struct mlx4_wqe_inline_seg *seg;
> +			void *addr;
> +			int len, seg_len;
> +			int num_seg;
> +			int off, to_copy;
> +
> +			inl = 0;
> +
> +			seg = wqe;
> +			wqe += sizeof *seg;
> +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> +			num_seg = 0;
> +			seg_len = 0;
> +
> +			for (i = 0; i < wr->num_sge; ++i) {
> +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> +				len  = wr->sg_list[i].length;
> +				inl += len;
> +
> +				if (inl > 16) {
> +					inl = 0;
> +					err = -ENOMEM;
> +					*bad_wr = wr;
> +					goto out;
> +				}
> 
> -		/*
> -		 * Write data segments in reverse order, so as to
> -		 * overwrite cacheline stamp last within each
> -		 * cacheline.  This avoids issues with WQE
> -		 * prefetching.
> -		 */
> +				while (len >= MLX4_INLINE_ALIGN - off) {
> +					to_copy = MLX4_INLINE_ALIGN - off;
> +					memcpy(wqe, addr, to_copy);
> +					len -= to_copy;
> +					wqe += to_copy;
> +					addr += to_copy;
> +					seg_len += to_copy;
> +					wmb(); /* see comment below */
> +					seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> +					seg_len = 0;
> +					seg = wqe;
> +					wqe += sizeof *seg;
> +					off = sizeof *seg;
> +					++num_seg;
> +				}
> 
> -		dseg = wqe;
> -		dseg += wr->num_sge - 1;
> -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> +				memcpy(wqe, addr, len);
> +				wqe += len;
> +				seg_len += len;
> +				off += len;
> +			}
> 
> -		/* Add one more inline data segment for ICRC for MLX sends */
> -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> -			     qp->mlx4_ib_qp_type &
> -			     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> -			set_mlx_icrc_seg(dseg + 1);
> -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
> -		}
> +			if (seg_len) {
> +				++num_seg;
> +				/*
> +				 * Need a barrier here to make sure
> +				 * all the data is visible before the
> +				 * byte_count field is set.  Otherwise
> +				 * the HCA prefetcher could grab the
> +				 * 64-byte chunk with this inline
> +				 * segment and get a valid (!=
> +				 * 0xffffffff) byte count but stale
> +				 * data, and end up sending the wrong
> +				 * data.
> +				 */
> +				wmb();
> +				seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> +			}
> 
> -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> -			set_data_seg(dseg, wr->sg_list + i);
> +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> +		} else {
> +			/*
> +			 * Write data segments in reverse order, so as to
> +			 * overwrite cacheline stamp last within each
> +			 * cacheline.  This avoids issues with WQE
> +			 * prefetching.
> +			 */
> +
> +			dseg = wqe;
> +			dseg += wr->num_sge - 1;
> +			size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> +
> +			/* Add one more inline data segment for ICRC for MLX sends */
> +			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> +				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> +				     qp->mlx4_ib_qp_type &
> +				     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> +				set_mlx_icrc_seg(dseg + 1);
> +				size += sizeof (struct mlx4_wqe_data_seg) / 16;
> +			}
> 
> +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> +				set_data_seg(dseg, wr->sg_list + i);
> +		}
>  		/*
>  		 * Possibly overwrite stamping in cacheline with LSO
>  		 * segment only after making sure all data segments
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                     ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  2017-03-12 20:38                       ` Parav Pandit
@ 2017-03-14 15:02                       ` Ursula Braun
       [not found]                         ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  1 sibling, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-14 15:02 UTC (permalink / raw)
  To: Parav Pandit, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Parav,

I tried your mlx4-patch together with SMC on s390x, but it failed.
The SMC-R code tries to send 44 bytes as inline in 1 sge.
I wonder about a length check with 16 bytes, which probably explains the failure.
See my question below in the patch:

On 03/12/2017 09:20 PM, Parav Pandit wrote:
> Hi Ursula,
> 
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>> Sent: Thursday, March 9, 2017 3:54 AM
>> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>
>>
>>
>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>>>
>>>>> The problem seems to be caused by the usage of plain memcpy in
>> set_data_inl_seg().
>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>>>> address belonging to an area mapped with the ib_dma_map_single()
>>>>> call. On s390x those kind of addresses require extra access functions (see
>> arch/s390/include/asm/io.h).
>>>>>
>>>
>>> By definition, when you are posting a send request with inline, the address
>> must be mapped to the cpu so plain memcpy should work.
>>>
>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does not seem to
>> contain extra coding for IB_SEND_INLINE flag for ib_post_send. Does this mean
>> for SMC-R to run on Connect X3 cards the IB_SEND_INLINE flag is ignored, and
>> thus I needed the ib_dma_map_single() call for the area used with
>> ib_post_send()? Does this mean I should stay away from the IB_SEND_INLINE
>> flag, if I want to run the same SMC-R code with both, Connect X3 cards and
>> Connect X4 cards?
>>
> I had encountered the same kernel panic that you mentioned last week on ConnectX-4 adapters with smc-r on x86_64.
> Shall I submit below fix to netdev mailing list?
> I have tested above change. I also have optimization that avoids dma mapping for wr_tx_dma_addr.
> 
> -               lnk->wr_tx_sges[i].addr =
> -                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> +               lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs + i);
> 
> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older kernel base.
> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> Let me know.
> 
> Regards,
> Parav Pandit
> 
> diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
> index a2e4ca5..0d984f5 100644
> --- a/drivers/infiniband/hw/mlx4/qp.c
> +++ b/drivers/infiniband/hw/mlx4/qp.c
> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  	unsigned long flags;
>  	int nreq;
>  	int err = 0;
> +	int inl = 0;
>  	unsigned ind;
>  	int uninitialized_var(stamp);
>  	int uninitialized_var(size);
> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
>  		default:
>  			break;
>  		}
> +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> +			struct mlx4_wqe_inline_seg *seg;
> +			void *addr;
> +			int len, seg_len;
> +			int num_seg;
> +			int off, to_copy;
> +
> +			inl = 0;
> +
> +			seg = wqe;
> +			wqe += sizeof *seg;
> +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> +			num_seg = 0;
> +			seg_len = 0;
> +
> +			for (i = 0; i < wr->num_sge; ++i) {
> +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> +				len  = wr->sg_list[i].length;
> +				inl += len;
> +
> +				if (inl > 16) {
> +					inl = 0;
> +					err = -ENOMEM;
> +					*bad_wr = wr;
> +					goto out;
> +				}
SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for IB_SEND_INLINE data?
The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function does not
seem to return an error.
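
For comparison, a check against a per-QP limit instead of a fixed constant could look like the following user-space sketch. It is only an illustration: toy_qp, its max_inline_data field and toy_check_inline_len() are hypothetical names, and whether the mlx4 driver keeps the value negotiated at ib_create_qp() time in this form is an assumption.

```c
#include <errno.h>
#include <stdint.h>

struct toy_qp {
	uint32_t max_inline_data;  /* hypothetical: limit granted at QP creation */
};

/* Returns 0 if the summed SGE lengths fit into the QP's inline limit,
 * -ENOMEM otherwise. */
static int toy_check_inline_len(const struct toy_qp *qp,
				const uint32_t *sge_len, int num_sge)
{
	uint64_t total = 0;
	int i;

	for (i = 0; i < num_sge; ++i)
		total += sge_len[i];

	return total > qp->max_inline_data ? -ENOMEM : 0;
}
```

With a limit of 44, a single 44-byte SGE passes in this sketch; with a limit of 16 the same request is rejected, which matches the failure seen with the posted patch.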
>  
> -		/*
> -		 * Write data segments in reverse order, so as to
> -		 * overwrite cacheline stamp last within each
> -		 * cacheline.  This avoids issues with WQE
> -		 * prefetching.
> -		 */
> +				while (len >= MLX4_INLINE_ALIGN - off) {
> +					to_copy = MLX4_INLINE_ALIGN - off;
> +					memcpy(wqe, addr, to_copy);
> +					len -= to_copy;
> +					wqe += to_copy;
> +					addr += to_copy;
> +					seg_len += to_copy;
> +					wmb(); /* see comment below */
> +					seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> +					seg_len = 0;
> +					seg = wqe;
> +					wqe += sizeof *seg;
> +					off = sizeof *seg;
> +					++num_seg;
> +				}
>  
> -		dseg = wqe;
> -		dseg += wr->num_sge - 1;
> -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> +				memcpy(wqe, addr, len);
> +				wqe += len;
> +				seg_len += len;
> +				off += len;
> +			}
>  
> -		/* Add one more inline data segment for ICRC for MLX sends */
> -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> -			     qp->mlx4_ib_qp_type &
> -			     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> -			set_mlx_icrc_seg(dseg + 1);
> -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
> -		}
> +			if (seg_len) {
> +				++num_seg;
> +				/*
> +				 * Need a barrier here to make sure
> +				 * all the data is visible before the
> +				 * byte_count field is set.  Otherwise
> +				 * the HCA prefetcher could grab the
> +				 * 64-byte chunk with this inline
> +				 * segment and get a valid (!=
> +				 * 0xffffffff) byte count but stale
> +				 * data, and end up sending the wrong
> +				 * data.
> +				 */
> +				wmb();
> +				seg->byte_count = htonl(MLX4_INLINE_SEG | seg_len);
> +			}
>  
> -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> -			set_data_seg(dseg, wr->sg_list + i);
> +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> +		} else {
> +			/*
> +			 * Write data segments in reverse order, so as to
> +			 * overwrite cacheline stamp last within each
> +			 * cacheline.  This avoids issues with WQE
> +			 * prefetching.
> +			 */
> +
> +			dseg = wqe;
> +			dseg += wr->num_sge - 1;
> +			size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) / 16);
> +
> +			/* Add one more inline data segment for ICRC for MLX sends */
> +			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> +				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> +				     qp->mlx4_ib_qp_type &
> +				     (MLX4_IB_QPT_PROXY_SMI_OWNER | MLX4_IB_QPT_TUN_SMI_OWNER))) {
> +				set_mlx_icrc_seg(dseg + 1);
> +				size += sizeof (struct mlx4_wqe_data_seg) / 16;
> +			}
>  
> +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> +				set_data_seg(dseg, wr->sg_list + i);
> +		}
>  		/*
>  		 * Possibly overwrite stamping in cacheline with LSO
>  		 * segment only after making sure all data segments
> 
^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                         ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-14 15:24                           ` Parav Pandit
       [not found]                             ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Parav Pandit @ 2017-03-14 15:24 UTC (permalink / raw)
  To: Ursula Braun, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Ursula,


> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> Sent: Tuesday, March 14, 2017 10:02 AM
> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> 
> Hi Parav,
> 
> I tried your mlx4-patch together with SMC on s390x, but it failed.
> The SMC-R code tries to send 44 bytes as inline in 1 sge.
> I wonder about a length check with 16 bytes, which probably explains the
> failure.
> See my question below in the patch:
> 
> On 03/12/2017 09:20 PM, Parav Pandit wrote:
> > Hi Ursula,
> >
> >> -----Original Message-----
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Ursula Braun
> >> Sent: Thursday, March 9, 2017 3:54 AM
> >> To: Eli Cohen <eli@mellanox.com>; Matan Barak <matanb@mellanox.com>
> >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>
> >>
> >>
> >> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>>>
> >>>>> The problem seems to be caused by the usage of plain memcpy in
> >> set_data_inl_seg().
> >>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>>>> address belonging to an area mapped with the ib_dma_map_single()
> >>>>> call. On s390x those kind of addresses require extra access
> >>>>> functions (see
> >> arch/s390/include/asm/io.h).
> >>>>>
> >>>
> >>> By definition, when you are posting a send request with inline, the
> >>> address
> >> must be mapped to the cpu so plain memcpy should work.
> >>>
> >> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> >> not seem to contain extra coding for IB_SEND_INLINE flag for
> >> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
> >> IB_SEND_INLINE flag is ignored, and thus I needed the
> >> ib_dma_map_single() call for the area used with ib_post_send()? Does
> >> this mean I should stay away from the IB_SEND_INLINE flag, if I want
> >> to run the same SMC-R code with both, Connect X3 cards and Connect X4
> cards?
> >>
> > I had encountered the same kernel panic that you mentioned last week on
> ConnectX-4 adapters with smc-r on x86_64.
> > Shall I submit below fix to netdev mailing list?
> > I have tested above change. I also have optimization that avoids dma mapping
> for wr_tx_dma_addr.
> >
> > -               lnk->wr_tx_sges[i].addr =
> > -                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> > +               lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs
> > + + i);
> >
> > I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
> kernel base.
> > I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
> > Let me know.
> >
> > Regards,
> > Parav Pandit
> >
> > diff --git a/drivers/infiniband/hw/mlx4/qp.c
> > b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
> > --- a/drivers/infiniband/hw/mlx4/qp.c
> > +++ b/drivers/infiniband/hw/mlx4/qp.c
> > @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> >  	unsigned long flags;
> >  	int nreq;
> >  	int err = 0;
> > +	int inl = 0;
> >  	unsigned ind;
> >  	int uninitialized_var(stamp);
> >  	int uninitialized_var(size);
> > @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
> ib_send_wr *wr,
> >  		default:
> >  			break;
> >  		}
> > +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> > +			struct mlx4_wqe_inline_seg *seg;
> > +			void *addr;
> > +			int len, seg_len;
> > +			int num_seg;
> > +			int off, to_copy;
> > +
> > +			inl = 0;
> > +
> > +			seg = wqe;
> > +			wqe += sizeof *seg;
> > +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> > +			num_seg = 0;
> > +			seg_len = 0;
> > +
> > +			for (i = 0; i < wr->num_sge; ++i) {
> > +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> > +				len  = wr->sg_list[i].length;
> > +				inl += len;
> > +
> > +				if (inl > 16) {
> > +					inl = 0;
> > +					err = ENOMEM;
> > +					*bad_wr = wr;
> > +					goto out;
> > +				}
> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
> IB_SEND_INLINE data?
> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function
> does not seem to return an error.
> >
This check should be against the max_inline_data value of the QP.
The 16 was only a placeholder error check that I should have fixed; I was testing with NVMe, where the inline data was only 16 bytes.
I will fix this. Could you change the limit to 44 and do a quick test?
The final patch will have the right check here, in addition to a check in create_qp.
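
A minimal userspace sketch of such a check, with the limit passed in per QP instead of the hard-coded 16. The names struct sge_model and check_inline_len() are hypothetical and exist only for this illustration; they are not mlx4 driver code. The negative -ENOMEM return follows the usual kernel convention and is an assumption about how the final patch would report the error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the length field of a scatter/gather entry. */
struct sge_model {
	unsigned int length;
};

/*
 * Sum the sge lengths of one work request and reject the request as
 * soon as the running total exceeds the per-QP inline limit.
 * Returns 0 on success or -ENOMEM if the inline data does not fit.
 */
static int check_inline_len(const struct sge_model *sg_list, int num_sge,
			    unsigned int max_inline_data)
{
	unsigned int inl = 0;
	int i;

	assert(sg_list != NULL && num_sge >= 0);

	for (i = 0; i < num_sge; ++i) {
		inl += sg_list[i].length;
		if (inl > max_inline_data)
			return -ENOMEM;
	}
	return 0;
}
```

For the single 44-byte sge that SMC-R posts, this model rejects the request when the limit is 16 and accepts it when the limit is 44.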

> > -		/*
> > -		 * Write data segments in reverse order, so as to
> > -		 * overwrite cacheline stamp last within each
> > -		 * cacheline.  This avoids issues with WQE
> > -		 * prefetching.
> > -		 */
> > +				while (len >= MLX4_INLINE_ALIGN - off) {
> > +					to_copy = MLX4_INLINE_ALIGN - off;
> > +					memcpy(wqe, addr, to_copy);
> > +					len -= to_copy;
> > +					wqe += to_copy;
> > +					addr += to_copy;
> > +					seg_len += to_copy;
> > +					wmb(); /* see comment below */
> > +					seg->byte_count =
> htonl(MLX4_INLINE_SEG | seg_len);
> > +					seg_len = 0;
> > +					seg = wqe;
> > +					wqe += sizeof *seg;
> > +					off = sizeof *seg;
> > +					++num_seg;
> > +				}
> >
> > -		dseg = wqe;
> > -		dseg += wr->num_sge - 1;
> > -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
> 16);
> > +				memcpy(wqe, addr, len);
> > +				wqe += len;
> > +				seg_len += len;
> > +				off += len;
> > +			}
> >
> > -		/* Add one more inline data segment for ICRC for MLX sends */
> > -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> > -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> > -			     qp->mlx4_ib_qp_type &
> > -			     (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> > -			set_mlx_icrc_seg(dseg + 1);
> > -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
> > -		}
> > +			if (seg_len) {
> > +				++num_seg;
> > +				/*
> > +				 * Need a barrier here to make sure
> > +				 * all the data is visible before the
> > +				 * byte_count field is set.  Otherwise
> > +				 * the HCA prefetcher could grab the
> > +				 * 64-byte chunk with this inline
> > +				 * segment and get a valid (!=
> > +				 * 0xffffffff) byte count but stale
> > +				 * data, and end up sending the wrong
> > +				 * data.
> > +				 */
> > +				wmb();
> > +				seg->byte_count = htonl(MLX4_INLINE_SEG |
> seg_len);
> > +			}
> >
> > -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> > -			set_data_seg(dseg, wr->sg_list + i);
> > +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> > +		} else {
> > +			/*
> > +			 * Write data segments in reverse order, so as to
> > +			 * overwrite cacheline stamp last within each
> > +			 * cacheline.  This avoids issues with WQE
> > +			 * prefetching.
> > +			 */
> > +
> > +			dseg = wqe;
> > +			dseg += wr->num_sge - 1;
> > +			size += wr->num_sge * (sizeof (struct
> mlx4_wqe_data_seg) / 16);
> > +
> > +			/* Add one more inline data segment for ICRC for MLX
> sends */
> > +			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI
> ||
> > +				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI
> ||
> > +				     qp->mlx4_ib_qp_type &
> > +				     (MLX4_IB_QPT_PROXY_SMI_OWNER |
> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> > +				set_mlx_icrc_seg(dseg + 1);
> > +				size += sizeof (struct mlx4_wqe_data_seg) / 16;
> > +			}
> >
> > +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> > +				set_data_seg(dseg, wr->sg_list + i);
> > +		}
> >  		/*
> >  		 * Possibly overwrite stamping in cacheline with LSO
> >  		 * segment only after making sure all data segments
> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >> in the body of a message to majordomo@vger.kernel.org More majordomo
> >> info at http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                             ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2017-03-16 11:51                               ` Ursula Braun
       [not found]                                 ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Ursula Braun @ 2017-03-16 11:51 UTC (permalink / raw)
  To: Parav Pandit, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Parav,

I ran your new mlx4 code together with changed SMC-R code that no longer
maps the IB_SEND_INLINE area. It worked - great!

Below I have added a small suggestion for improvement in your patch.

Nevertheless, I am still not sure whether I should keep the IB_SEND_INLINE
flag in the SMC-R code, since there is no guarantee that this will work with
all kinds of RoCE devices. The maximum length for IB_SEND_INLINE depends
on the RoCE driver - right? Is there an interface to determine such a
maximum length? Would ib_create_qp() return an error if the
SMC-R specified .cap.max_inline_data = 44 is not supported by a RoCE driver?

On 03/14/2017 04:24 PM, Parav Pandit wrote:
> Hi Ursula,
> 
> 
>> -----Original Message-----
>> From: Ursula Braun [mailto:ubraun-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org]
>> Sent: Tuesday, March 14, 2017 10:02 AM
>> To: Parav Pandit <parav-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>;
>> Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>
>> Hi Parav,
>>
>> I tried your mlx4-patch together with SMC on s390x, but it failed.
>> The SMC-R code tries to send 44 bytes as inline in 1 sge.
>> I wonder about a length check with 16 bytes, which probably explains the
>> failure.
>> See my question below in the patch:
>>
>> On 03/12/2017 09:20 PM, Parav Pandit wrote:
>>> Hi Ursula,
>>>
>>>> -----Original Message-----
>>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Ursula Braun
>>>> Sent: Thursday, March 9, 2017 3:54 AM
>>>> To: Eli Cohen <eli-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; Leon Romanovsky
>>>> <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
>>>>
>>>>
>>>>
>>>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
>>>>>>>
>>>>>>> The problem seems to be caused by the usage of plain memcpy in
>>>> set_data_inl_seg().
>>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
>>>>>>> address belonging to an area mapped with the ib_dma_map_single()
>>>>>>> call. On s390x those kind of addresses require extra access
>>>>>>> functions (see
>>>> arch/s390/include/asm/io.h).
>>>>>>>
>>>>>
>>>>> By definition, when you are posting a send request with inline, the
>>>>> address
>>>> must be mapped to the cpu so plain memcpy should work.
>>>>>
>>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
>>>> not seem to contain extra coding for IB_SEND_INLINE flag for
>>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards the
>>>> IB_SEND_INLINE flag is ignored, and thus I needed the
>>>> ib_dma_map_single() call for the area used with ib_post_send()? Does
>>>> this mean I should stay away from the IB_SEND_INLINE flag, if I want
>>>> to run the same SMC-R code with both, Connect X3 cards and Connect X4
>> cards?
>>>>
>>> I had encountered the same kernel panic that you mentioned last week on
>> ConnectX-4 adapters with smc-r on x86_64.
>>> Shall I submit below fix to netdev mailing list?
>>> I have tested above change. I also have optimization that avoids dma mapping
>> for wr_tx_dma_addr.
>>>
>>> -               lnk->wr_tx_sges[i].addr =
>>> -                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
>>> +               lnk->wr_tx_sges[i].addr = (uintptr_t)(lnk->wr_tx_bufs
>>> + + i);
>>>
>>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on little older
>> kernel base.
>>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib driver.
>>> Let me know.
>>>
>>> Regards,
>>> Parav Pandit
>>>
>>> diff --git a/drivers/infiniband/hw/mlx4/qp.c
>>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
>>> --- a/drivers/infiniband/hw/mlx4/qp.c
>>> +++ b/drivers/infiniband/hw/mlx4/qp.c
>>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
>> ib_send_wr *wr,
>>>  	unsigned long flags;
>>>  	int nreq;
>>>  	int err = 0;
>>> +	int inl = 0;
>>>  	unsigned ind;
>>>  	int uninitialized_var(stamp);
>>>  	int uninitialized_var(size);
>>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct
>> ib_send_wr *wr,
>>>  		default:
>>>  			break;
>>>  		}
>>> +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
>>> +			struct mlx4_wqe_inline_seg *seg;
>>> +			void *addr;
>>> +			int len, seg_len;
>>> +			int num_seg;
>>> +			int off, to_copy;
>>> +
>>> +			inl = 0;
>>> +
>>> +			seg = wqe;
>>> +			wqe += sizeof *seg;
>>> +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
>>> +			num_seg = 0;
>>> +			seg_len = 0;
>>> +
>>> +			for (i = 0; i < wr->num_sge; ++i) {
>>> +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
>>> +				len  = wr->sg_list[i].length;
>>> +				inl += len;
>>> +
>>> +				if (inl > 16) {
>>> +					inl = 0;
>>> +					err = ENOMEM;
>>> +					*bad_wr = wr;
>>> +					goto out;
>>> +				}
>> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
>> IB_SEND_INLINE data?
>> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the function
>> does not seem to return an error.
>>>
> This check should be for max_inline_data variable of the QP.
> This was just for error check, I should have fixed it. I was testing with nvme where inline data was only worth 16 bytes.
> I will fix this. Is it possible to change to 44 and do quick test?
> Final patch will have right check in addition to check in create_qp?
> 
>>> -		/*
>>> -		 * Write data segments in reverse order, so as to
>>> -		 * overwrite cacheline stamp last within each
>>> -		 * cacheline.  This avoids issues with WQE
>>> -		 * prefetching.
>>> -		 */
>>> +				while (len >= MLX4_INLINE_ALIGN - off) {
With this code there are 2 memcpy calls: one with to_copy=44, and the next one with len 0.
I suggest changing the check to "len > MLX4_INLINE_ALIGN - off".
>>> +					to_copy = MLX4_INLINE_ALIGN - off;
>>> +					memcpy(wqe, addr, to_copy);
>>> +					len -= to_copy;
>>> +					wqe += to_copy;
>>> +					addr += to_copy;
>>> +					seg_len += to_copy;
>>> +					wmb(); /* see comment below */
>>> +					seg->byte_count =
>> htonl(MLX4_INLINE_SEG | seg_len);
>>> +					seg_len = 0;
>>> +					seg = wqe;
>>> +					wqe += sizeof *seg;
>>> +					off = sizeof *seg;
>>> +					++num_seg;
>>> +				}
>>>
>>> -		dseg = wqe;
>>> -		dseg += wr->num_sge - 1;
>>> -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
>> 16);
>>> +				memcpy(wqe, addr, len);
>>> +				wqe += len;
>>> +				seg_len += len;
>>> +				off += len;
>>> +			}
>>>
>>> -		/* Add one more inline data segment for ICRC for MLX sends */
>>> -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
>>> -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
>>> -			     qp->mlx4_ib_qp_type &
>>> -			     (MLX4_IB_QPT_PROXY_SMI_OWNER |
>> MLX4_IB_QPT_TUN_SMI_OWNER))) {
>>> -			set_mlx_icrc_seg(dseg + 1);
>>> -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
>>> -		}
>>> +			if (seg_len) {
>>> +				++num_seg;
>>> +				/*
>>> +				 * Need a barrier here to make sure
>>> +				 * all the data is visible before the
>>> +				 * byte_count field is set.  Otherwise
>>> +				 * the HCA prefetcher could grab the
>>> +				 * 64-byte chunk with this inline
>>> +				 * segment and get a valid (!=
>>> +				 * 0xffffffff) byte count but stale
>>> +				 * data, and end up sending the wrong
>>> +				 * data.
>>> +				 */
>>> +				wmb();
>>> +				seg->byte_count = htonl(MLX4_INLINE_SEG |
>> seg_len);
>>> +			}
>>>
>>> -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
>>> -			set_data_seg(dseg, wr->sg_list + i);
>>> +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
>>> +		} else {
>>> +			/*
>>> +			 * Write data segments in reverse order, so as to
>>> +			 * overwrite cacheline stamp last within each
>>> +			 * cacheline.  This avoids issues with WQE
>>> +			 * prefetching.
>>> +			 */
>>> +
>>> +			dseg = wqe;
>>> +			dseg += wr->num_sge - 1;
>>> +			size += wr->num_sge * (sizeof (struct
>> mlx4_wqe_data_seg) / 16);
>>> +
>>> +			/* Add one more inline data segment for ICRC for MLX
>> sends */
>>> +			if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI
>> ||
>>> +				     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI
>> ||
>>> +				     qp->mlx4_ib_qp_type &
>>> +				     (MLX4_IB_QPT_PROXY_SMI_OWNER |
>> MLX4_IB_QPT_TUN_SMI_OWNER))) {
>>> +				set_mlx_icrc_seg(dseg + 1);
>>> +				size += sizeof (struct mlx4_wqe_data_seg) / 16;
>>> +			}
>>>
>>> +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
>>> +				set_data_seg(dseg, wr->sg_list + i);
>>> +		}
>>>  		/*
>>>  		 * Possibly overwrite stamping in cacheline with LSO
>>>  		 * segment only after making sure all data segments
>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
>>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo
>>>> info at http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: Fwd: mlx5_ib_post_send panic on s390x
       [not found]                                 ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2017-03-20 21:04                                   ` Parav Pandit
  0 siblings, 0 replies; 15+ messages in thread
From: Parav Pandit @ 2017-03-20 21:04 UTC (permalink / raw)
  To: Ursula Braun, Eli Cohen, Matan Barak
  Cc: Saeed Mahameed, Leon Romanovsky, linux-rdma-u79uwXL29TY76Z2rM5mHXA

Hi Ursula,

Regarding your suggestion: the loop still needs to check for len >= MLX4_INLINE_ALIGN - off, because 44 = 64 - 20,
which is still a valid case (len == MLX4_INLINE_ALIGN - off).
But I agree that it shouldn't do a 2nd memcpy with zero length.
Therefore there should be an additional check for len != 0 before that memcpy.
Coming to the IB_SEND_INLINE part: when ib_create_qp() is called and the HCA doesn't support the requested cap.max_inline_data, the provider HCA driver is supposed to fail the call.
The ULP is then expected to fall back to a non-inline scheme.
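
The expected fallback can be sketched in plain userspace C. Everything here is a hypothetical model, not verbs API: model_create_qp() only imitates a provider that rejects an unsupported inline size, the -EINVAL error code is an assumption, and struct ulp_link and ulp_setup_link() are invented names for the ULP side.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model of a provider driver: QP creation fails when the
 * requested inline size exceeds what the device supports.
 */
static int model_create_qp(unsigned int dev_max_inline,
			   unsigned int requested_inline)
{
	return requested_inline > dev_max_inline ? -EINVAL : 0;
}

/* Hypothetical per-link state kept by the ULP. */
struct ulp_link {
	int use_inline;			/* set the inline flag on sends? */
	unsigned int max_inline;	/* inline size the QP was created with */
};

/*
 * Try to create the QP with inline support first. If the provider
 * rejects the request, retry without inline data and remember that
 * sends must use a DMA-mapped buffer instead.
 */
static int ulp_setup_link(struct ulp_link *lnk, unsigned int dev_max_inline,
			  unsigned int wanted_inline)
{
	assert(lnk != NULL);

	lnk->use_inline = 1;
	lnk->max_inline = wanted_inline;
	if (model_create_qp(dev_max_inline, wanted_inline) == 0)
		return 0;

	/* Fallback: no inline data, no inline flag on later sends. */
	lnk->use_inline = 0;
	lnk->max_inline = 0;
	return model_create_qp(dev_max_inline, 0);
}
```

In this model a ULP asking for 44 bytes on a device limited to 16 still gets a working link, only without inline sends.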

It appears that the mlx4 driver is not failing this call, which is a bug that needs fixing.
Instead of failing the call, I would prefer to provide the inline data path soon, based on my inline patch in this email thread.

Parav

> -----Original Message-----
> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> Sent: Thursday, March 16, 2017 6:51 AM
> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> Matan Barak <matanb@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> 
> Hi Parav,
> 
> I run your new mlx4-Code together with changed SMC-R code no longer
> mapping the IB_SEND_INLINE area. It worked - great!
> 
> Below I have added a small improvement idea in your patch.
> 
> Nevertheless I am still not sure, if I should keep the IB_SEND_INLINE flag in
> the SMC-R code, since there is no guarantee that this will work with all kinds
> of RoCE-devices. The maximum length for IB_SEND_INLINE depends on the
> RoCE-driver - right? Is there an interface to determine such a maximum
> length? Would ib_create_qp() return with an error, if the SMC-R specified
> .cap.max_inline_data = 44 is not supported by a RoCE-driver?
> 
> On 03/14/2017 04:24 PM, Parav Pandit wrote:
> > Hi Ursula,
> >
> >
> >> -----Original Message-----
> >> From: Ursula Braun [mailto:ubraun@linux.vnet.ibm.com]
> >> Sent: Tuesday, March 14, 2017 10:02 AM
> >> To: Parav Pandit <parav@mellanox.com>; Eli Cohen <eli@mellanox.com>;
> >> Matan Barak <matanb@mellanox.com>
> >> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>
> >> Hi Parav,
> >>
> >> I tried your mlx4-patch together with SMC on s390x, but it failed.
> >> The SMC-R code tries to send 44 bytes as inline in 1 sge.
> >> I wonder about a length check with 16 bytes, which probably explains
> >> the failure.
> >> See my question below in the patch:
> >>
> >> On 03/12/2017 09:20 PM, Parav Pandit wrote:
> >>> Hi Ursula,
> >>>
> >>>> -----Original Message-----
> >>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >>>> owner@vger.kernel.org] On Behalf Of Ursula Braun
> >>>> Sent: Thursday, March 9, 2017 3:54 AM
> >>>> To: Eli Cohen <eli@mellanox.com>; Matan Barak
> <matanb@mellanox.com>
> >>>> Cc: Saeed Mahameed <saeedm@mellanox.com>; Leon Romanovsky
> >>>> <leonro@mellanox.com>; linux-rdma@vger.kernel.org
> >>>> Subject: Re: Fwd: mlx5_ib_post_send panic on s390x
> >>>>
> >>>>
> >>>>
> >>>> On 03/06/2017 02:08 PM, Eli Cohen wrote:
> >>>>>>>
> >>>>>>> The problem seems to be caused by the usage of plain memcpy in
> >>>> set_data_inl_seg().
> >>>>>>> The address provided by SMC-code in struct ib_send_wr *wr is an
> >>>>>>> address belonging to an area mapped with the
> ib_dma_map_single()
> >>>>>>> call. On s390x those kind of addresses require extra access
> >>>>>>> functions (see
> >>>> arch/s390/include/asm/io.h).
> >>>>>>>
> >>>>>
> >>>>> By definition, when you are posting a send request with inline,
> >>>>> the address
> >>>> must be mapped to the cpu so plain memcpy should work.
> >>>>>
> >>>> In the past I run SMC-R with Connect X3 cards. The mlx4 driver does
> >>>> not seem to contain extra coding for IB_SEND_INLINE flag for
> >>>> ib_post_send. Does this mean for SMC-R to run on Connect X3 cards
> >>>> the IB_SEND_INLINE flag is ignored, and thus I needed the
> >>>> ib_dma_map_single() call for the area used with ib_post_send()?
> >>>> Does this mean I should stay away from the IB_SEND_INLINE flag, if
> >>>> I want to run the same SMC-R code with both, Connect X3 cards and
> >>>> Connect X4
> >> cards?
> >>>>
> >>> I had encountered the same kernel panic that you mentioned last week
> >>> on
> >> ConnectX-4 adapters with smc-r on x86_64.
> >>> Shall I submit below fix to netdev mailing list?
> >>> I have tested above change. I also have optimization that avoids dma
> >>> mapping
> >> for wr_tx_dma_addr.
> >>>
> >>> -               lnk->wr_tx_sges[i].addr =
> >>> -                       lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
> >>> +               lnk->wr_tx_sges[i].addr =
> >>> + (uintptr_t)(lnk->wr_tx_bufs
> >>> + + i);
> >>>
> >>> I also have fix for processing IB_SEND_INLINE in mlx4 driver on
> >>> little older
> >> kernel base.
> >>> I have attached below. I can rebase my kernel and provide fix in mlx5_ib
> driver.
> >>> Let me know.
> >>>
> >>> Regards,
> >>> Parav Pandit
> >>>
> >>> diff --git a/drivers/infiniband/hw/mlx4/qp.c
> >>> b/drivers/infiniband/hw/mlx4/qp.c index a2e4ca5..0d984f5 100644
> >>> --- a/drivers/infiniband/hw/mlx4/qp.c
> >>> +++ b/drivers/infiniband/hw/mlx4/qp.c
> >>> @@ -2748,6 +2748,7 @@ int mlx4_ib_post_send(struct ib_qp *ibqp,
> >>> struct
> >> ib_send_wr *wr,
> >>>  	unsigned long flags;
> >>>  	int nreq;
> >>>  	int err = 0;
> >>> +	int inl = 0;
> >>>  	unsigned ind;
> >>>  	int uninitialized_var(stamp);
> >>>  	int uninitialized_var(size);
> >>> @@ -2958,30 +2959,97 @@ int mlx4_ib_post_send(struct ib_qp *ibqp,
> >>> struct
> >> ib_send_wr *wr,
> >>>  		default:
> >>>  			break;
> >>>  		}
> >>> +		if (wr->send_flags & IB_SEND_INLINE && wr->num_sge) {
> >>> +			struct mlx4_wqe_inline_seg *seg;
> >>> +			void *addr;
> >>> +			int len, seg_len;
> >>> +			int num_seg;
> >>> +			int off, to_copy;
> >>> +
> >>> +			inl = 0;
> >>> +
> >>> +			seg = wqe;
> >>> +			wqe += sizeof *seg;
> >>> +			off = ((uintptr_t) wqe) & (MLX4_INLINE_ALIGN - 1);
> >>> +			num_seg = 0;
> >>> +			seg_len = 0;
> >>> +
> >>> +			for (i = 0; i < wr->num_sge; ++i) {
> >>> +				addr = (void *) (uintptr_t) wr->sg_list[i].addr;
> >>> +				len  = wr->sg_list[i].length;
> >>> +				inl += len;
> >>> +
> >>> +				if (inl > 16) {
> >>> +					inl = 0;
> >>> +					err = ENOMEM;
> >>> +					*bad_wr = wr;
> >>> +					goto out;
> >>> +				}
> >> SMC-R fails due to this check. inl is 44 here. Why is 16 a limit for
> >> IB_SEND_INLINE data?
> >> The SMC-R code calls ib_create_qp() with max_inline_data=44. And the
> >> function does not seem to return an error.
> >>>
> > This check should be for max_inline_data variable of the QP.
> > This was just for error check, I should have fixed it. I was testing with nvme
> where inline data was only worth 16 bytes.
> > I will fix this. Is it possible to change to 44 and do quick test?
> > Final patch will have right check in addition to check in create_qp?
> >
> >>> -		/*
> >>> -		 * Write data segments in reverse order, so as to
> >>> -		 * overwrite cacheline stamp last within each
> >>> -		 * cacheline.  This avoids issues with WQE
> >>> -		 * prefetching.
> >>> -		 */
> >>> +				while (len >= MLX4_INLINE_ALIGN - off) {
> With this code there are 2 memcpy-Calls, one with to_copy=44, and the next
> one with len 0.
> I suggest to change the check to "len > MLX4_INLINE_ALIGN - off".
> >>> +					to_copy = MLX4_INLINE_ALIGN - off;
> >>> +					memcpy(wqe, addr, to_copy);
> >>> +					len -= to_copy;
> >>> +					wqe += to_copy;
> >>> +					addr += to_copy;
> >>> +					seg_len += to_copy;
> >>> +					wmb(); /* see comment below */
> >>> +					seg->byte_count =
> >> htonl(MLX4_INLINE_SEG | seg_len);
> >>> +					seg_len = 0;
> >>> +					seg = wqe;
> >>> +					wqe += sizeof *seg;
> >>> +					off = sizeof *seg;
> >>> +					++num_seg;
> >>> +				}
> >>>
> >>> -		dseg = wqe;
> >>> -		dseg += wr->num_sge - 1;
> >>> -		size += wr->num_sge * (sizeof (struct mlx4_wqe_data_seg) /
> >> 16);
> >>> +				memcpy(wqe, addr, len);
> >>> +				wqe += len;
> >>> +				seg_len += len;
> >>> +				off += len;
> >>> +			}
> >>>
> >>> -		/* Add one more inline data segment for ICRC for MLX sends
> */
> >>> -		if (unlikely(qp->mlx4_ib_qp_type == MLX4_IB_QPT_SMI ||
> >>> -			     qp->mlx4_ib_qp_type == MLX4_IB_QPT_GSI ||
> >>> -			     qp->mlx4_ib_qp_type &
> >>> -			     (MLX4_IB_QPT_PROXY_SMI_OWNER |
> >> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> -			set_mlx_icrc_seg(dseg + 1);
> >>> -			size += sizeof (struct mlx4_wqe_data_seg) / 16;
> >>> -		}
> >>> +			if (seg_len) {
> >>> +				++num_seg;
> >>> +				/*
> >>> +				 * Need a barrier here to make sure
> >>> +				 * all the data is visible before the
> >>> +				 * byte_count field is set.  Otherwise
> >>> +				 * the HCA prefetcher could grab the
> >>> +				 * 64-byte chunk with this inline
> >>> +				 * segment and get a valid (!=
> >>> +				 * 0xffffffff) byte count but stale
> >>> +				 * data, and end up sending the wrong
> >>> +				 * data.
> >>> +				 */
> >>> +				wmb();
> >>> +				seg->byte_count = htonl(MLX4_INLINE_SEG
> |
> >> seg_len);
> >>> +			}
> >>>
> >>> -		for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> -			set_data_seg(dseg, wr->sg_list + i);
> >>> +			size += (inl + num_seg * sizeof (*seg) + 15) / 16;
> >>> +		} else {
> >>> +			/*
> >>> +			 * Write data segments in reverse order, so as to
> >>> +			 * overwrite cacheline stamp last within each
> >>> +			 * cacheline.  This avoids issues with WQE
> >>> +			 * prefetching.
> >>> +			 */
> >>> +
> >>> +			dseg = wqe;
> >>> +			dseg += wr->num_sge - 1;
> >>> +			size += wr->num_sge * (sizeof (struct
> >> mlx4_wqe_data_seg) / 16);
> >>> +
> >>> +			/* Add one more inline data segment for ICRC for
> MLX
> >> sends */
> >>> +			if (unlikely(qp->mlx4_ib_qp_type ==
> MLX4_IB_QPT_SMI
> >> ||
> >>> +				     qp->mlx4_ib_qp_type ==
> MLX4_IB_QPT_GSI
> >> ||
> >>> +				     qp->mlx4_ib_qp_type &
> >>> +				     (MLX4_IB_QPT_PROXY_SMI_OWNER |
> >> MLX4_IB_QPT_TUN_SMI_OWNER))) {
> >>> +				set_mlx_icrc_seg(dseg + 1);
> >>> +				size += sizeof (struct mlx4_wqe_data_seg) /
> 16;
> >>> +			}
> >>>
> >>> +			for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
> >>> +				set_data_seg(dseg, wr->sg_list + i);
> >>> +		}
> >>>  		/*
> >>>  		 * Possibly overwrite stamping in cacheline with LSO
> >>>  		 * segment only after making sure all data segments
> >>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> >>>> in the body of a message to majordomo@vger.kernel.org More
> >>>> majordomo info at http://vger.kernel.org/majordomo-info.html
> >

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-03-20 21:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-24  9:51 mlx5_ib_post_send panic on s390x Ursula Braun
     [not found] ` <56246ac0-a706-291c-7baa-a6dd2c6331cd-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-02-24 17:28   ` Eli Cohen
     [not found]     ` <AM4PR0501MB2787E2BB6C8CBBCA5DCE9E82C5520-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-06 11:17       ` Ursula Braun
     [not found]         ` <ea211a05-f26a-e7a7-27b4-fc5edc2e3b57-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-06 12:56           ` Eli Cohen
     [not found]             ` <AM4PR0501MB27879C1EBF26FBF02F088AD7C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-06 13:47               ` Ursula Braun
     [not found] ` <dcc90daa-b932-8957-d8bc-e1f02d04e03a@linux.vnet.ibm.com>
     [not found]   ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6@mellanox.com>
     [not found]     ` <20e4f31e-b2a7-89fb-d4c0-583c0dc1efb6-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-03-06 13:03       ` Fwd: " Ursula Braun
     [not found]         ` <491cf3e1-b2f8-3695-ecd4-3d34b0ae9e25-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-06 13:08           ` Eli Cohen
     [not found]             ` <AM4PR0501MB278723F1BF4DA9846C664C62C52C0-dp/nxUn679jFcPxmzbbP+MDSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-09  9:54               ` Ursula Braun
     [not found]                 ` <e57691e1-55bc-308a-fc91-0a8072218dd5-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-09 12:58                   ` Eli Cohen
2017-03-12 20:20                   ` Parav Pandit
     [not found]                     ` <VI1PR0502MB300817FC6256218DE800497BD1220-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-12 20:38                       ` Parav Pandit
2017-03-14 15:02                       ` Ursula Braun
     [not found]                         ` <04049739-a008-f7c7-4f7a-30616fbf787a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-14 15:24                           ` Parav Pandit
     [not found]                             ` <VI1PR0502MB30081C4618B1905B82247F05D1240-o1MPJYiShExKsLr+rGaxW8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2017-03-16 11:51                               ` Ursula Braun
     [not found]                                 ` <8e791524-dd66-629d-7f44-9050d9c7715a-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2017-03-20 21:04                                   ` Parav Pandit
