linux-nfs.vger.kernel.org archive mirror
* [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
@ 2019-07-23 19:13 Chuck Lever
  2019-07-24  5:47 ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2019-07-23 19:13 UTC (permalink / raw)
  To: linux-rdma, linux-nfs

Send and Receive completion is handled on a single CPU selected at
the time each Completion Queue is allocated. Typically this is when
an initiator instantiates an RDMA transport, or when a target
accepts an RDMA connection.

Some ULPs cannot open a connection per CPU to spread completion
workload across available CPUs. For these ULPs, allow the RDMA core
to select a completion vector based on the device's complement of
available comp_vecs.

When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
available, a different CPU will be selected for each Completion
Queue. For the moment, a simple round-robin mechanism is used.

Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
 include/rdma/ib_verbs.h                  |    3 +++
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
 net/sunrpc/xprtrdma/verbs.c              |    5 ++---
 4 files changed, 28 insertions(+), 6 deletions(-)

Jason-

If this patch is acceptable to all, then I would expect you to take
it through the RDMA tree.


diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index 7c599878ccf7..a89d549490c4 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
 	queue_work(cq->comp_wq, &cq->work);
 }
 
+/*
+ * Attempt to spread ULP completion queues over a device's completion
+ * vectors so that all available CPU cores can help service the device's
+ * interrupt workload. This mechanism may be improved at a later point
+ * to dynamically take into account the system's actual workload.
+ */
+static int ib_get_comp_vector(struct ib_device *dev)
+{
+	static atomic_t cv;
+
+	if (dev->num_comp_vectors > 1)
+		return atomic_inc_return(&cv) % dev->num_comp_vectors;
+	return 0;
+}
+
 /**
  * __ib_alloc_cq_user - allocate a completion queue
  * @dev:		device to allocate the CQ for
  * @private:		driver private data, accessible from cq->cq_context
  * @nr_cqe:		number of CQEs to allocate
- * @comp_vector:	HCA completion vectors for this CQ
+ * @comp_vector:	HCA completion vector for this CQ
  * @poll_ctx:		context to poll the CQ from.
  * @caller:		module owner name.
  * @udata:		Valid user data or NULL for kernel object
@@ -208,6 +223,9 @@ struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
 	cq->res.type = RDMA_RESTRACK_CQ;
 	rdma_restrack_set_task(&cq->res, caller);
 
+	if (comp_vector == RDMA_CORE_ANY_COMPVEC)
+		cq_attr.comp_vector = ib_get_comp_vector(dev);
+
 	ret = dev->ops.create_cq(cq, &cq_attr, NULL);
 	if (ret)
 		goto out_free_wc;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c5f8a9f17063..547d36bcef7e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3669,6 +3669,9 @@ static inline int ib_post_recv(struct ib_qp *qp,
 	return qp->device->ops.post_recv(qp, recv_wr, bad_recv_wr ? : &dummy);
 }
 
+/* Tell the RDMA core to select an appropriate comp_vector */
+#define RDMA_CORE_ANY_COMPVEC	((int)(-1))
+
 struct ib_cq *__ib_alloc_cq_user(struct ib_device *dev, void *private,
 				 int nr_cqe, int comp_vector,
 				 enum ib_poll_context poll_ctx,
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 3fe665152d95..7df6de6e9162 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -455,13 +455,15 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		goto errout;
 	}
 	newxprt->sc_sq_cq = ib_alloc_cq(dev, newxprt, newxprt->sc_sq_depth,
-					0, IB_POLL_WORKQUEUE);
+					RDMA_CORE_ANY_COMPVEC,
+					IB_POLL_WORKQUEUE);
 	if (IS_ERR(newxprt->sc_sq_cq)) {
 		dprintk("svcrdma: error creating SQ CQ for connect request\n");
 		goto errout;
 	}
 	newxprt->sc_rq_cq = ib_alloc_cq(dev, newxprt, rq_depth,
-					0, IB_POLL_WORKQUEUE);
+					RDMA_CORE_ANY_COMPVEC,
+					IB_POLL_WORKQUEUE);
 	if (IS_ERR(newxprt->sc_rq_cq)) {
 		dprintk("svcrdma: error creating RQ CQ for connect request\n");
 		goto errout;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 805b1f35e1ca..6e5989e2b8ed 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -523,8 +523,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
 
 	sendcq = ib_alloc_cq(ia->ri_id->device, NULL,
 			     ep->rep_attr.cap.max_send_wr + 1,
-			     ia->ri_id->device->num_comp_vectors > 1 ? 1 : 0,
-			     IB_POLL_WORKQUEUE);
+			     RDMA_CORE_ANY_COMPVEC, IB_POLL_WORKQUEUE);
 	if (IS_ERR(sendcq)) {
 		rc = PTR_ERR(sendcq);
 		goto out1;
@@ -532,7 +531,7 @@ int rpcrdma_ep_create(struct rpcrdma_xprt *r_xprt)
 
 	recvcq = ib_alloc_cq(ia->ri_id->device, NULL,
 			     ep->rep_attr.cap.max_recv_wr + 1,
-			     0, IB_POLL_WORKQUEUE);
+			     RDMA_CORE_ANY_COMPVEC, IB_POLL_WORKQUEUE);
 	if (IS_ERR(recvcq)) {
 		rc = PTR_ERR(recvcq);
 		goto out2;


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
  2019-07-23 19:13 [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors Chuck Lever
@ 2019-07-24  5:47 ` Leon Romanovsky
  2019-07-24 14:01   ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2019-07-24  5:47 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
> Send and Receive completion is handled on a single CPU selected at
> the time each Completion Queue is allocated. Typically this is when
> an initiator instantiates an RDMA transport, or when a target
> accepts an RDMA connection.
>
> Some ULPs cannot open a connection per CPU to spread completion
> workload across available CPUs. For these ULPs, allow the RDMA core
> to select a completion vector based on the device's complement of
> available comp_vecs.
>
> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
> available, a different CPU will be selected for each Completion
> Queue. For the moment, a simple round-robin mechanism is used.
>
> Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

It makes me wonder why we need comp_vector as an argument to
ib_alloc_cq at all. From what I see, callers either internally
implement logic similar to the one proposed here, or they don't
care (and set 0).

Can we enable this comp_vector for everyone and simplify our API?

> ---
>  drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
>  include/rdma/ib_verbs.h                  |    3 +++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
>  net/sunrpc/xprtrdma/verbs.c              |    5 ++---
>  4 files changed, 28 insertions(+), 6 deletions(-)
>
> Jason-
>
> If this patch is acceptable to all, then I would expect you to take
> it through the RDMA tree.
>
>
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index 7c599878ccf7..a89d549490c4 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>  	queue_work(cq->comp_wq, &cq->work);
>  }
>
> +/*
> + * Attempt to spread ULP completion queues over a device's completion
> + * vectors so that all available CPU cores can help service the device's
> + * interrupt workload. This mechanism may be improved at a later point
> + * to dynamically take into account the system's actual workload.
> + */
> +static int ib_get_comp_vector(struct ib_device *dev)
> +{
> +	static atomic_t cv;
> +
> +	if (dev->num_comp_vectors > 1)
> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;

It is worth taking num_online_cpus() into account.

Thanks


* Re: [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
  2019-07-24  5:47 ` Leon Romanovsky
@ 2019-07-24 14:01   ` Chuck Lever
  2019-07-25 13:17     ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2019-07-24 14:01 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Linux NFS Mailing List

Hi Leon, thanks for taking a look. Responses below.


> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
>> Send and Receive completion is handled on a single CPU selected at
>> the time each Completion Queue is allocated. Typically this is when
>> an initiator instantiates an RDMA transport, or when a target
>> accepts an RDMA connection.
>> 
>> Some ULPs cannot open a connection per CPU to spread completion
>> workload across available CPUs. For these ULPs, allow the RDMA core
>> to select a completion vector based on the device's complement of
>> available comp_vecs.
>> 
>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
>> available, a different CPU will be selected for each Completion
>> Queue. For the moment, a simple round-robin mechanism is used.
>> 
>> Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> 
> It makes me wonder why we need comp_vector as an argument to
> ib_alloc_cq at all. From what I see, callers either internally
> implement logic similar to the one proposed here, or they don't
> care (and set 0).

The goal of this patch is to deduplicate that "similar logic".
Callers that implement this logic already can use
RDMA_CORE_ANY_COMPVEC and get rid of their own copy.


> Can we enable this comp_vector for everyone and simplify our API?

We could create a new CQ allocation API that does not take a
comp vector. That might be cleaner than passing in a -1.

But I think some ULPs still want to use the existing API to
allocate one CQ for each of a device's comp vectors.


>> ---
>> drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
>> include/rdma/ib_verbs.h                  |    3 +++
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
>> net/sunrpc/xprtrdma/verbs.c              |    5 ++---
>> 4 files changed, 28 insertions(+), 6 deletions(-)
>> 
>> Jason-
>> 
>> If this patch is acceptable to all, then I would expect you to take
>> it through the RDMA tree.
>> 
>> 
>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>> index 7c599878ccf7..a89d549490c4 100644
>> --- a/drivers/infiniband/core/cq.c
>> +++ b/drivers/infiniband/core/cq.c
>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>> 	queue_work(cq->comp_wq, &cq->work);
>> }
>> 
>> +/*
>> + * Attempt to spread ULP completion queues over a device's completion
>> + * vectors so that all available CPU cores can help service the device's
>> + * interrupt workload. This mechanism may be improved at a later point
>> + * to dynamically take into account the system's actual workload.
>> + */
>> +static int ib_get_comp_vector(struct ib_device *dev)
>> +{
>> +	static atomic_t cv;
>> +
>> +	if (dev->num_comp_vectors > 1)
>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
> 
> It is worth taking num_online_cpus() into account.

I don't believe it is.

Håkon has convinced me that assigning interrupt vectors to
CPUs is in the domain of user space (ie, driven by policy).
In addition, one assumes that taking a CPU offline properly
will also involve re-assigning interrupt vectors that point
to that core.

In any event, this code can be modified after it is merged
if it is necessary to accommodate such requirements.

--
Chuck Lever





* Re: [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
  2019-07-24 14:01   ` Chuck Lever
@ 2019-07-25 13:17     ` Leon Romanovsky
  2019-07-25 14:03       ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Leon Romanovsky @ 2019-07-25 13:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Wed, Jul 24, 2019 at 10:01:36AM -0400, Chuck Lever wrote:
> Hi Leon, thanks for taking a look. Responses below.
>
>
> > On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
> >> Send and Receive completion is handled on a single CPU selected at
> >> the time each Completion Queue is allocated. Typically this is when
> >> an initiator instantiates an RDMA transport, or when a target
> >> accepts an RDMA connection.
> >>
> >> Some ULPs cannot open a connection per CPU to spread completion
> >> workload across available CPUs. For these ULPs, allow the RDMA core
> >> to select a completion vector based on the device's complement of
> >> available comp_vecs.
> >>
> >> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
> >> available, a different CPU will be selected for each Completion
> >> Queue. For the moment, a simple round-robin mechanism is used.
> >>
> >> Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >
> > It makes me wonder why we need comp_vector as an argument to
> > ib_alloc_cq at all. From what I see, callers either internally
> > implement logic similar to the one proposed here, or they don't
> > care (and set 0).
>
> The goal of this patch is to deduplicate that "similar logic".
> Callers that implement this logic already can use
> RDMA_CORE_ANY_COMPVEC and get rid of their own copy.

Can you please send the removal patches together with this API
proposal? That will ensure that ULP authors notice such changes,
and we won't end up with a special function used by only one ULP.

>
>
> > Can we enable this comp_vector for everyone and simplify our API?
>
> We could create a new CQ allocation API that does not take a
> comp vector. That might be cleaner than passing in a -1.

+1

>
> But I think some ULPs still want to use the existing API to
> allocate one CQ for each of a device's comp vectors.

It can be kept as a "legacy implementation" that is not really
needed, but I don't know enough about those users to say.

>
>
> >> ---
> >> drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
> >> include/rdma/ib_verbs.h                  |    3 +++
> >> net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
> >> net/sunrpc/xprtrdma/verbs.c              |    5 ++---
> >> 4 files changed, 28 insertions(+), 6 deletions(-)
> >>
> >> Jason-
> >>
> >> If this patch is acceptable to all, then I would expect you to take
> >> it through the RDMA tree.
> >>
> >>
> >> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> >> index 7c599878ccf7..a89d549490c4 100644
> >> --- a/drivers/infiniband/core/cq.c
> >> +++ b/drivers/infiniband/core/cq.c
> >> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
> >> 	queue_work(cq->comp_wq, &cq->work);
> >> }
> >>
> >> +/*
> >> + * Attempt to spread ULP completion queues over a device's completion
> >> + * vectors so that all available CPU cores can help service the device's
> >> + * interrupt workload. This mechanism may be improved at a later point
> >> + * to dynamically take into account the system's actual workload.
> >> + */
> >> +static int ib_get_comp_vector(struct ib_device *dev)
> >> +{
> >> +	static atomic_t cv;
> >> +
> >> +	if (dev->num_comp_vectors > 1)
> >> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
> >
> > It is worth taking num_online_cpus() into account.
>
> I don't believe it is.
>
> Håkon has convinced me that assigning interrupt vectors to
> CPUs is in the domain of user space (ie, driven by policy).
> In addition, one assumes that taking a CPU offline properly
> will also involve re-assigning interrupt vectors that point
> to that core.
>
> In any event, this code can be modified after it is merged
> if it is necessary to accommodate such requirements.

It is a simple change, and it is worth doing now, while we have
interested parties involved.

>
> --
> Chuck Lever
>
>
>


* Re: [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
  2019-07-25 13:17     ` Leon Romanovsky
@ 2019-07-25 14:03       ` Chuck Lever
  2019-07-28 15:05         ` Leon Romanovsky
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2019-07-25 14:03 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma, Linux NFS Mailing List



> On Jul 25, 2019, at 9:17 AM, Leon Romanovsky <leon@kernel.org> wrote:
> 
> On Wed, Jul 24, 2019 at 10:01:36AM -0400, Chuck Lever wrote:
>> Hi Leon, thanks for taking a look. Responses below.
>> 
>> 
>>> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@kernel.org> wrote:
>>> 
>>> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
>>>> Send and Receive completion is handled on a single CPU selected at
>>>> the time each Completion Queue is allocated. Typically this is when
>>>> an initiator instantiates an RDMA transport, or when a target
>>>> accepts an RDMA connection.
>>>> 
>>>> Some ULPs cannot open a connection per CPU to spread completion
>>>> workload across available CPUs. For these ULPs, allow the RDMA core
>>>> to select a completion vector based on the device's complement of
>>>> available comp_vecs.
>>>> 
>>>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
>>>> available, a different CPU will be selected for each Completion
>>>> Queue. For the moment, a simple round-robin mechanism is used.
>>>> 
>>>> Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> It makes me wonder why we need comp_vector as an argument to
>>> ib_alloc_cq at all. From what I see, callers either internally
>>> implement logic similar to the one proposed here, or they don't
>>> care (and set 0).
>> 
>> The goal of this patch is to deduplicate that "similar logic".
>> Callers that implement this logic already can use
>> RDMA_CORE_ANY_COMPVEC and get rid of their own copy.
> 
> Can you please send the removal patches together with this API
> proposal? That will ensure that ULP authors notice such changes,
> and we won't end up with a special function used by only one ULP.

I prefer that the maintainers of those ULPs make those changes.
It would require testing that I am not in a position to do myself.

I can convert a couple of other ULPs, like cifs and 9p, where the
changes look straightforward; but my understanding was that only
one user of a new API was required for adoption.


>>> Can we enable this comp_vector for everyone and simplify our API?
>> 
>> We could create a new CQ allocation API that does not take a
>> comp vector. That might be cleaner than passing in a -1.
> 
> +1

I'll send a v2 with this suggestion.


>> But I think some ULPs still want to use the existing API to
>> allocate one CQ for each of a device's comp vectors.
> 
> It can be kept as a "legacy implementation" that is not really
> needed, but I don't know enough about those users to say.

Have a look at the iSER initiator. There are legitimate use cases
in the kernel for the current ib_alloc_cq() API.

And don't forget the many users of ib_create_cq that remain in
the kernel.


>>>> ---
>>>> drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
>>>> include/rdma/ib_verbs.h                  |    3 +++
>>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
>>>> net/sunrpc/xprtrdma/verbs.c              |    5 ++---
>>>> 4 files changed, 28 insertions(+), 6 deletions(-)
>>>> 
>>>> Jason-
>>>> 
>>>> If this patch is acceptable to all, then I would expect you to take
>>>> it through the RDMA tree.
>>>> 
>>>> 
>>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
>>>> index 7c599878ccf7..a89d549490c4 100644
>>>> --- a/drivers/infiniband/core/cq.c
>>>> +++ b/drivers/infiniband/core/cq.c
>>>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
>>>> 	queue_work(cq->comp_wq, &cq->work);
>>>> }
>>>> 
>>>> +/*
>>>> + * Attempt to spread ULP completion queues over a device's completion
>>>> + * vectors so that all available CPU cores can help service the device's
>>>> + * interrupt workload. This mechanism may be improved at a later point
>>>> + * to dynamically take into account the system's actual workload.
>>>> + */
>>>> +static int ib_get_comp_vector(struct ib_device *dev)
>>>> +{
>>>> +	static atomic_t cv;
>>>> +
>>>> +	if (dev->num_comp_vectors > 1)
>>>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
>>> 
>>> It is worth taking num_online_cpus() into account.
>> 
>> I don't believe it is.
>> 
>> Håkon has convinced me that assigning interrupt vectors to
>> CPUs is in the domain of user space (ie, driven by policy).
>> In addition, one assumes that taking a CPU offline properly
>> will also involve re-assigning interrupt vectors that point
>> to that core.
>> 
>> In any event, this code can be modified after it is merged
>> if it is necessary to accommodate such requirements.
> 
> It is a simple change, and it is worth doing now, while we have
> interested parties involved.

Can you propose some code, or point out an example of how you
would prefer it to work?


--
Chuck Lever





* Re: [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors
  2019-07-25 14:03       ` Chuck Lever
@ 2019-07-28 15:05         ` Leon Romanovsky
  0 siblings, 0 replies; 6+ messages in thread
From: Leon Romanovsky @ 2019-07-28 15:05 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Thu, Jul 25, 2019 at 10:03:13AM -0400, Chuck Lever wrote:
>
>
> > On Jul 25, 2019, at 9:17 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Wed, Jul 24, 2019 at 10:01:36AM -0400, Chuck Lever wrote:
> >> Hi Leon, thanks for taking a look. Responses below.
> >>
> >>
> >>> On Jul 24, 2019, at 1:47 AM, Leon Romanovsky <leon@kernel.org> wrote:
> >>>
> >>> On Tue, Jul 23, 2019 at 03:13:37PM -0400, Chuck Lever wrote:
> >>>> Send and Receive completion is handled on a single CPU selected at
> >>>> the time each Completion Queue is allocated. Typically this is when
> >>>> an initiator instantiates an RDMA transport, or when a target
> >>>> accepts an RDMA connection.
> >>>>
> >>>> Some ULPs cannot open a connection per CPU to spread completion
> >>>> workload across available CPUs. For these ULPs, allow the RDMA core
> >>>> to select a completion vector based on the device's complement of
> >>>> available comp_vecs.
> >>>>
> >>>> When a ULP elects to use RDMA_CORE_ANY_COMPVEC, if multiple CPUs are
> >>>> available, a different CPU will be selected for each Completion
> >>>> Queue. For the moment, a simple round-robin mechanism is used.
> >>>>
> >>>> Suggested-by: Håkon Bugge <haakon.bugge@oracle.com>
> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>
> >>> It makes me wonder why we need comp_vector as an argument to
> >>> ib_alloc_cq at all. From what I see, callers either internally
> >>> implement logic similar to the one proposed here, or they don't
> >>> care (and set 0).
> >>
> >> The goal of this patch is to deduplicate that "similar logic".
> >> Callers that implement this logic already can use
> >> RDMA_CORE_ANY_COMPVEC and get rid of their own copy.
> >
> > Can you please send the removal patches together with this API
> > proposal? That will ensure that ULP authors notice such changes,
> > and we won't end up with a special function used by only one ULP.
>
> I prefer that the maintainers of those ULPs make those changes.
> It would require testing that I am not in a position to do myself.
>
> I can convert a couple of other ULPs, like cifs and 9p, where the
> changes look straightforward; but my understanding was that only
> one user of a new API was required for adoption.
>
>
> >>> Can we enable this comp_vector for everyone and simplify our API?
> >>
> >> We could create a new CQ allocation API that does not take a
> >> comp vector. That might be cleaner than passing in a -1.
> >
> > +1
>
> I'll send a v2 with this suggestion.
>
>
> >> But I think some ULPs still want to use the existing API to
> >> allocate one CQ for each of a device's comp vectors.
> >
> > It can be kept as a "legacy implementation" that is not really
> > needed, but I don't know enough about those users to say.
>
> Have a look at the iSER initiator. There are legitimate use cases
> in the kernel for the current ib_alloc_cq() API.
>
> And don't forget the many users of ib_create_cq that remain in
> the kernel.
>
>
> >>>> ---
> >>>> drivers/infiniband/core/cq.c             |   20 +++++++++++++++++++-
> >>>> include/rdma/ib_verbs.h                  |    3 +++
> >>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |    6 ++++--
> >>>> net/sunrpc/xprtrdma/verbs.c              |    5 ++---
> >>>> 4 files changed, 28 insertions(+), 6 deletions(-)
> >>>>
> >>>> Jason-
> >>>>
> >>>> If this patch is acceptable to all, then I would expect you to take
> >>>> it through the RDMA tree.
> >>>>
> >>>>
> >>>> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> >>>> index 7c599878ccf7..a89d549490c4 100644
> >>>> --- a/drivers/infiniband/core/cq.c
> >>>> +++ b/drivers/infiniband/core/cq.c
> >>>> @@ -165,12 +165,27 @@ static void ib_cq_completion_workqueue(struct ib_cq *cq, void *private)
> >>>> 	queue_work(cq->comp_wq, &cq->work);
> >>>> }
> >>>>
> >>>> +/*
> >>>> + * Attempt to spread ULP completion queues over a device's completion
> >>>> + * vectors so that all available CPU cores can help service the device's
> >>>> + * interrupt workload. This mechanism may be improved at a later point
> >>>> + * to dynamically take into account the system's actual workload.
> >>>> + */
> >>>> +static int ib_get_comp_vector(struct ib_device *dev)
> >>>> +{
> >>>> +	static atomic_t cv;
> >>>> +
> >>>> +	if (dev->num_comp_vectors > 1)
> >>>> +		return atomic_inc_return(&cv) % dev->num_comp_vectors;
> >>>
> >>> It is worth taking num_online_cpus() into account.
> >>
> >> I don't believe it is.
> >>
> >> Håkon has convinced me that assigning interrupt vectors to
> >> CPUs is in the domain of user space (ie, driven by policy).
> >> In addition, one assumes that taking a CPU offline properly
> >> will also involve re-assigning interrupt vectors that point
> >> to that core.
> >>
> >> In any event, this code can be modified after it is merged
> >> if it is necessary to accommodate such requirements.
> >
> > It is a simple change, and it is worth doing now, while we have
> > interested parties involved.
>
> Can you propose some code, or point out an example of how you
> would prefer it to work?
>

I had in mind drivers/infiniband/ulp/iser/iser_verbs.c

 77         device->comps_used = min_t(int, num_online_cpus(),
 78                                  ib_dev->num_comp_vectors);

>
> --
> Chuck Lever
>
>
>


Thread overview: 6+ messages
2019-07-23 19:13 [PATCH v1] rdma: Enable ib_alloc_cq to spread work over a device's comp_vectors Chuck Lever
2019-07-24  5:47 ` Leon Romanovsky
2019-07-24 14:01   ` Chuck Lever
2019-07-25 13:17     ` Leon Romanovsky
2019-07-25 14:03       ` Chuck Lever
2019-07-28 15:05         ` Leon Romanovsky
