* [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU
@ 2019-01-23 13:12 Nicolas Morey-Chaisemartin
  2019-01-23 16:51 ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2019-01-23 13:12 UTC
  To: linux-rdma; +Cc: linux-nfs, chuck.lever

Make sure the host has at least 2 CPUs before allocating the Send CQ on CPU#1

Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
---
 net/sunrpc/xprtrdma/verbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b725911c0f3f..36aa7b2648e4 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -546,7 +546,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
 
 	sendcq = ib_alloc_cq(ia->ri_device, NULL,
 			     ep->rep_attr.cap.max_send_wr + 1,
-			     1, IB_POLL_WORKQUEUE);
+			     num_online_cpus() > 1 ? 1 : 0, IB_POLL_WORKQUEUE);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
 		dprintk("RPC:       %s: failed to create send CQ: %i\n",
-- 
2.18.0



* Re: [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU
  2019-01-23 13:12 [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU Nicolas Morey-Chaisemartin
@ 2019-01-23 16:51 ` Chuck Lever
  2019-01-23 17:06   ` Nicolas Morey-Chaisemartin
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Lever @ 2019-01-23 16:51 UTC
  To: Nicolas Morey-Chaisemartin; +Cc: linux-rdma, Linux NFS Mailing List

Hi Nicolas-

> On Jan 23, 2019, at 8:12 AM, Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> wrote:
> 
> Make sure host has at least 2 CPU before allocating to CPU#1

The fourth parameter of ib_alloc_cq() is not a CPU number,
it's a completion vector number. What failure did you see
that prompted this patch?


> Fixes: a4699f5647f3 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
> Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
> ---
> net/sunrpc/xprtrdma/verbs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b725911c0f3f..36aa7b2648e4 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -546,7 +546,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
> 
> 	sendcq = ib_alloc_cq(ia->ri_device, NULL,
> 			     ep->rep_attr.cap.max_send_wr + 1,
> -			     1, IB_POLL_WORKQUEUE);
> +			     num_online_cpus() > 1 ? 1 : 0, IB_POLL_WORKQUEUE);
> 	if (IS_ERR(sendcq)) {
> 		rc = PTR_ERR(sendcq);
> 		dprintk("RPC:       %s: failed to create send CQ: %i\n",
> -- 
> 2.18.0
> 

--
Chuck Lever





* Re: [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU
  2019-01-23 16:51 ` Chuck Lever
@ 2019-01-23 17:06   ` Nicolas Morey-Chaisemartin
  2019-01-23 17:07     ` Nicolas Morey-Chaisemartin
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2019-01-23 17:06 UTC
  To: Chuck Lever; +Cc: Linux NFS Mailing List, linux-rdma





On 1/23/19 5:51 PM, Chuck Lever wrote:
> Hi Nicolas-
>
>> On Jan 23, 2019, at 8:12 AM, Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> wrote:
>>
>> Make sure host has at least 2 CPU before allocating to CPU#1
> The fourth parameter of ib_alloc_cq() is not a CPU number,
> it's a completion vector number. What failure did you see
> that prompted this patch?
When trying to mount, I get this:
+ mount -o rdma,port=20049 192.168.20.15:/tmp/RAM /tmp/RAM
mount.nfs: mounting 192.168.20.15:/tmp/RAM failed, reason given by server: No such file or directory

Digging a bit into the code, it appears that the CQ allocation here returns -ENOENT, which comes from mlx5_vector2eqn.
On my system (VM with a mlx5 card with SRIOV), the comp_eqs_list only contains one entry with index == 0
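
To illustrate the failure described above, here is a minimal sketch of a vector-to-EQ lookup. It is a simplified model, not the mlx5 driver code: `struct comp_eq`, its fields, and `vector_to_eqn()` are hypothetical names, and the only behavior taken from this thread is that a request for a vector with no matching entry fails with ENOENT.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one completion event queue entry. */
struct comp_eq {
	int index;	/* completion vector this EQ serves */
	int eqn;	/* EQ number handed back to the caller */
};

/*
 * Simplified model of a vector-to-EQ lookup: scan the table for an
 * entry whose index equals the requested vector. Returns 0 and stores
 * the EQ number on success, -ENOENT when no entry matches.
 */
static int vector_to_eqn(const struct comp_eq *eqs, size_t n_eqs,
			 int vector, int *eqn)
{
	for (size_t i = 0; i < n_eqs; i++) {
		if (eqs[i].index == vector) {
			*eqn = eqs[i].eqn;
			return 0;
		}
	}
	return -ENOENT;
}
```

With a table holding a single entry of index 0, a lookup for vector 0 succeeds and a lookup for vector 1 returns -ENOENT, which would explain the "No such file or directory" seen at mount time.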

Nicolas




* Re: [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU
  2019-01-23 17:06   ` Nicolas Morey-Chaisemartin
@ 2019-01-23 17:07     ` Nicolas Morey-Chaisemartin
  2019-01-23 17:30       ` Chuck Lever
  0 siblings, 1 reply; 5+ messages in thread
From: Nicolas Morey-Chaisemartin @ 2019-01-23 17:07 UTC
  To: Chuck Lever; +Cc: Linux NFS Mailing List, linux-rdma





On 1/23/19 6:06 PM, Nicolas Morey-Chaisemartin wrote:
>
> On 1/23/19 5:51 PM, Chuck Lever wrote:
>> Hi Nicolas-
>>
>>> On Jan 23, 2019, at 8:12 AM, Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> wrote:
>>>
>>> Make sure host has at least 2 CPU before allocating to CPU#1
>> The fourth parameter of ib_alloc_cq() is not a CPU number,
>> it's a completion vector number. What failure did you see
>> that prompted this patch?
> When trying to mount, I get this:
> + mount -o rdma,port=20049 192.168.20.15:/tmp/RAM /tmp/RAM
> mount.nfs: mounting 192.168.20.15:/tmp/RAM failed, reason given by server: No such file or directory
>
> Digging a bit into the code, it appears that the cq allocation here returns a ENOENT which come from mlx5_vector2eqn.
> On my system (VM with a mlx5 card with SRIOV), the comp_eqs_list only contains one entry with index == 0
>
> Nicolas
>

Also, adding a 2nd core to my VM fixes the issue, hence my assumption that it was a CPU number.




* Re: [PATCH] xprtrdma: Make sure Send CQ is allocated on an existing CPU
  2019-01-23 17:07     ` Nicolas Morey-Chaisemartin
@ 2019-01-23 17:30       ` Chuck Lever
  0 siblings, 0 replies; 5+ messages in thread
From: Chuck Lever @ 2019-01-23 17:30 UTC
  To: Nicolas Morey-Chaisemartin; +Cc: Linux NFS Mailing List, linux-rdma



> On Jan 23, 2019, at 12:07 PM, Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.de> wrote:
> 
> 
> 
> On 1/23/19 6:06 PM, Nicolas Morey-Chaisemartin wrote:
>> 
>> On 1/23/19 5:51 PM, Chuck Lever wrote:
>>> Hi Nicolas-
>>> 
>>>> On Jan 23, 2019, at 8:12 AM, Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com> wrote:
>>>> 
>>>> Make sure host has at least 2 CPU before allocating to CPU#1
>>> The fourth parameter of ib_alloc_cq() is not a CPU number,
>>> it's a completion vector number. What failure did you see
>>> that prompted this patch?
>> When trying to mount, I get this:
>> + mount -o rdma,port=20049 192.168.20.15:/tmp/RAM /tmp/RAM
>> mount.nfs: mounting 192.168.20.15:/tmp/RAM failed, reason given by server: No such file or directory
>> 
>> Digging a bit into the code, it appears that the cq allocation here returns a ENOENT which come from mlx5_vector2eqn.
>> On my system (VM with a mlx5 card with SRIOV), the comp_eqs_list only contains one entry with index == 0
>> 
>> Nicolas
>> 
> 
> Also, adding a 2nd core to my VM fixes the issue (thus my understanding that it was a CPU number)

Fair enough. The 2nd CPU adds a 2nd compvec. Instead of
num_online_cpus() you want ib_device::num_comp_vectors.

I suspect there's a spiffier way to go about this these
days thanks to ib_get_vector_affinity, but you've found
a longstanding bug. So let's get something that can be
comfortably backported to stable.
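
One way the suggestion above might look, as a sketch only: choose the vector from the device's completion vector count rather than from the CPU count. `struct fake_ib_device` and `send_cq_comp_vector()` are hypothetical names introduced for illustration; only the `num_comp_vectors` field name comes from this thread.

```c
/* Hypothetical stand-in for the one struct ib_device field used here. */
struct fake_ib_device {
	int num_comp_vectors;	/* completion vectors the device exposes */
};

/*
 * Pick the completion vector for the Send CQ: keep using vector 1 when
 * the device has more than one vector, otherwise fall back to vector 0.
 */
static int send_cq_comp_vector(const struct fake_ib_device *dev)
{
	return dev->num_comp_vectors > 1 ? 1 : 0;
}
```

In rpcrdma_ep_create() this would presumably amount to passing `ia->ri_device->num_comp_vectors > 1 ? 1 : 0` as the fourth argument of ib_alloc_cq(), a one-line change that should stay easy to backport.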


--
Chuck Lever




