linux-kernel.vger.kernel.org archive mirror
* [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
@ 2018-02-20 20:59 Arnd Bergmann
  2018-02-20 21:14 ` Parav Pandit
  2018-02-20 21:14 ` Bart Van Assche
  0 siblings, 2 replies; 16+ messages in thread
From: Arnd Bergmann @ 2018-02-20 20:59 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Arnd Bergmann, Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
	linux-rdma, linux-kernel

The ib_wc structure has grown so much that putting 16 of them on the stack
hits the warning limit for dangerous kernel stack consumption:

drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Using half that number brings us comfortably below that limit again.

Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/infiniband/core/cq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..2626adbb978e 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -16,7 +16,7 @@
 #include <rdma/ib_verbs.h>
 
 /* # of WCs to poll for with a single call to ib_poll_cq */
-#define IB_POLL_BATCH			16
+#define IB_POLL_BATCH			8
 
 /* # of WCs to iterate over before yielding */
 #define IB_POLL_BUDGET_IRQ		256
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* RE: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
@ 2018-02-20 21:14 ` Parav Pandit
  2018-02-20 21:54   ` Arnd Bergmann
  2018-02-20 21:14 ` Bart Van Assche
  1 sibling, 1 reply; 16+ messages in thread
From: Parav Pandit @ 2018-02-20 21:14 UTC (permalink / raw)
  To: Arnd Bergmann, Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Sagi Grimberg, Bart Van Assche, linux-rdma,
	linux-kernel

Hi Arnd Bergmann,

> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky
> <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche
> <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
> 
> The ib_wc structure has grown so much that putting 16 of them on the stack hits
> the warning limit for dangerous kernel stack consumption:
> 
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
> than 1024 bytes [-Werror=frame-larger-than=]
> 
> Using half that number brings us comfortably below that limit again.
> 
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")

It is not clear to me how the above commit 02d8883f520e introduced this stack issue.

Bodong and I came across the ib_wc size increase in [1], and it was fixed in [2].
Did you hit this error before or after applying patch [2]?

[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
  2018-02-20 21:14 ` Parav Pandit
@ 2018-02-20 21:14 ` Bart Van Assche
  2018-02-20 21:47   ` Chuck Lever
  1 sibling, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2018-02-20 21:14 UTC (permalink / raw)
  To: jgg, arnd, dledford
  Cc: Bart Van Assche, linux-kernel, leonro, linux-rdma, sagi

On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>  /* # of WCs to poll for with a single call to ib_poll_cq */
> -#define IB_POLL_BATCH			16
> +#define IB_POLL_BATCH			8

The purpose of batch polling is to minimize contention on the cq spinlock.
Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
the performance impact of this change been verified for all affected drivers
(ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
over RDMA, ...)?

Bart.

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 21:14 ` Bart Van Assche
@ 2018-02-20 21:47   ` Chuck Lever
  2018-02-21  9:47     ` Max Gurtovoy
  2018-02-21 13:44     ` Sagi Grimberg
  0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2018-02-20 21:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma, sagi



> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
> 
> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>> /* # of WCs to poll for with a single call to ib_poll_cq */
>> -#define IB_POLL_BATCH			16
>> +#define IB_POLL_BATCH			8
> 
> The purpose of batch polling is to minimize contention on the cq spinlock.
> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
> the performance impact of this change been verified for all affected drivers
> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
> over RDMA, ...)?

Only the users of the DIRECT polling method use an on-stack
array of ib_wc's, and that means only the SRP drivers.

The other two modes use a dynamically allocated array of
ib_wc's that hangs off the ib_cq. They shouldn't need any
reduction in the size of this array, and they are the common
case.

IMO a better solution would be to change ib_process_cq_direct
to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

--
Chuck Lever

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 21:14 ` Parav Pandit
@ 2018-02-20 21:54   ` Arnd Bergmann
  0 siblings, 0 replies; 16+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:54 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Sagi Grimberg,
	Bart Van Assche, linux-rdma, linux-kernel

On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <parav@mellanox.com> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky
>> <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche
>> <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown so much that putting 16 of them on the stack hits
>> the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
>> than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how the above commit 02d8883f520e introduced this stack issue.

My mistake, I misread the git history.

I have now done a proper bisection and ended up with the commit that added the
IB_POLL_BATCH-sized array on the stack, i.e. commit 246d8b184c10 ("IB/cq:
Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct").

> Bodong and I came across the ib_wc size increase in [1], and it was fixed in [2].
> Did you hit this error before or after applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/

I did the analysis a few weeks ago when I first hit the problem, but didn't
send it out at the time. Today I saw that the problem still persists on
mainline (4.16-rc2), which does contain the patch from [2].

What I see is that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes
of padding, so 16 of them (16 * 64 = 1024 bytes) get us exactly to the warning
limit, and then there are a few more bytes for the function itself.

       Arnd

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 21:47   ` Chuck Lever
@ 2018-02-21  9:47     ` Max Gurtovoy
  2018-02-21 13:44     ` Sagi Grimberg
  1 sibling, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-21  9:47 UTC (permalink / raw)
  To: Chuck Lever, Bart Van Assche
  Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma, sagi



On 2/20/2018 11:47 PM, Chuck Lever wrote:
> 
> 
>> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
> 
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's, and that means only the SRP drivers.
> 
> The other two modes use a dynamically allocated array of
> ib_wc's that hangs off the ib_cq. They shouldn't need any
> reduction in the size of this array, and they are the common
> case.
> 
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

Yup, good idea.
You can define IB_DIRECT_POLL_BATCH to be 8 and use it in
ib_process_cq_direct, *but* please make sure to pass the right value to
ib_poll_cq, since the wcs array must be able to hold the requested
number of WCs.

-Max.

> 
> --
> Chuck Lever
> 

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-20 21:47   ` Chuck Lever
  2018-02-21  9:47     ` Max Gurtovoy
@ 2018-02-21 13:44     ` Sagi Grimberg
  2018-02-21 14:45       ` Max Gurtovoy
  2018-02-21 15:10       ` Chuck Lever
  1 sibling, 2 replies; 16+ messages in thread
From: Sagi Grimberg @ 2018-02-21 13:44 UTC (permalink / raw)
  To: Chuck Lever, Bart Van Assche
  Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma


>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH			16
>>> +#define IB_POLL_BATCH			8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
> 
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's, and that means only the SRP drivers.
> 
> The other two modes use a dynamically allocated array of
> ib_wc's that hangs off the ib_cq. They shouldn't need any
> reduction in the size of this array, and they are the common
> case.
> 
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.

The only reason I put this array on the stack was to allow consumers
that did not use the ib_alloc_cq API to call it, but on second thought
that seems like the wrong decision (as these users probably did not
set wr_cqe correctly).

How about we make ib_process_cq_direct use the cq->wc array and add
a WARN_ON statement (failing gracefully) if the caller used this
API without calling ib_alloc_cq?

--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..cd3e9e124834 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -25,10 +25,10 @@
  #define IB_POLL_FLAGS \
         (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)

-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget)
  {
         int i, n, completed = 0;
-       struct ib_wc *wcs = poll_wc ? : cq->wc;
+       struct ib_wc *wcs = cq->wc;

         /*
          * budget might be (-1) if the caller does not
@@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
   */
  int ib_process_cq_direct(struct ib_cq *cq, int budget)
  {
-       struct ib_wc wcs[IB_POLL_BATCH];
-
-       return __ib_process_cq(cq, budget, wcs);
+       if (unlikely(WARN_ON_ONCE(!cq->wc)))
+               return 0;
+       return __ib_process_cq(cq, budget);
  }
  EXPORT_SYMBOL(ib_process_cq_direct);

@@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
         struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
         int completed;

-       completed = __ib_process_cq(cq, budget, NULL);
+       completed = __ib_process_cq(cq, budget);
         if (completed < budget) {
                 irq_poll_complete(&cq->iop);
                 if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
         struct ib_cq *cq = container_of(work, struct ib_cq, work);
         int completed;

-       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
         if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
             ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                 queue_work(ib_comp_wq, &cq->work);
--

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-21 13:44     ` Sagi Grimberg
@ 2018-02-21 14:45       ` Max Gurtovoy
  2018-02-22 15:39         ` Sagi Grimberg
  2018-02-21 15:10       ` Chuck Lever
  1 sibling, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-21 14:45 UTC (permalink / raw)
  To: Sagi Grimberg, Chuck Lever, Bart Van Assche
  Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma



On 2/21/2018 3:44 PM, Sagi Grimberg wrote:
> 
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH            16
>>>> +#define IB_POLL_BATCH            8
>>>
>>> The purpose of batch polling is to minimize contention on the cq 
>>> spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance 
>>> negatively. Has
>>> the performance impact of this change been verified for all affected 
>>> drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB 
>>> Direct, NFS
>>> over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's, and that means only the SRP drivers.
>>
>> The other two modes use a dynamically allocated array of
>> ib_wc's that hangs off the ib_cq. They shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>>
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
> 
> The only reason I put this array on the stack was to allow consumers
> that did not use the ib_alloc_cq API to call it, but on second thought
> that seems like the wrong decision (as these users probably did not
> set wr_cqe correctly).
> 
> How about we make ib_process_cq_direct use the cq->wc array and add
> a WARN_ON statement (failing gracefully) if the caller used this
> API without calling ib_alloc_cq?

But we tried to avoid concurrent access to cq->wc.
Why can't we use the solution I wrote above?

> 
> -- 
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
>   #define IB_POLL_FLAGS \
>          (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
> 
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
>   {
>          int i, n, completed = 0;
> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
> +       struct ib_wc *wcs = cq->wc;
> 
>          /*
>           * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>    */
>   int ib_process_cq_direct(struct ib_cq *cq, int budget)
>   {
> -       struct ib_wc wcs[IB_POLL_BATCH];
> -
> -       return __ib_process_cq(cq, budget, wcs);
> +       if (unlikely(WARN_ON_ONCE(!cq->wc)))
> +               return 0;
> +       return __ib_process_cq(cq, budget);
>   }
>   EXPORT_SYMBOL(ib_process_cq_direct);
> 
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
>          struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
>          int completed;
> 
> -       completed = __ib_process_cq(cq, budget, NULL);
> +       completed = __ib_process_cq(cq, budget);
>          if (completed < budget) {
>                  irq_poll_complete(&cq->iop);
>                  if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
>          struct ib_cq *cq = container_of(work, struct ib_cq, work);
>          int completed;
> 
> -       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> +       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
>          if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>              ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>                  queue_work(ib_comp_wq, &cq->work);
> -- 

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-21 13:44     ` Sagi Grimberg
  2018-02-21 14:45       ` Max Gurtovoy
@ 2018-02-21 15:10       ` Chuck Lever
  1 sibling, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2018-02-21 15:10 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Bart Van Assche, jgg, arnd, dledford, linux-kernel, leonro, linux-rdma



> On Feb 21, 2018, at 8:44 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
> 
> 
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH			16
>>>> +#define IB_POLL_BATCH			8
>>> 
>>> The purpose of batch polling is to minimize contention on the cq spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>>> the performance impact of this change been verified for all affected drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>>> over RDMA, ...)?
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's, and that means only the SRP drivers.
>> The other two modes use a dynamically allocated array of
>> ib_wc's that hangs off the ib_cq. They shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
> 
> The only reason I put this array on the stack was to allow consumers
> that did not use the ib_alloc_cq API to call it, but on second thought
> that seems like the wrong decision (as these users probably did not
> set wr_cqe correctly).
> 
> How about we make ib_process_cq_direct use the cq->wc array and add
> a WARN_ON statement (failing gracefully) if the caller used this
> API without calling ib_alloc_cq?

Agreed, I prefer that all three modes use dynamically allocated
memory for that array.


> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
> #define IB_POLL_FLAGS \
>        (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
> 
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
> {
>        int i, n, completed = 0;
> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
> +       struct ib_wc *wcs = cq->wc;
> 
>        /*
>         * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>  */
> int ib_process_cq_direct(struct ib_cq *cq, int budget)
> {
> -       struct ib_wc wcs[IB_POLL_BATCH];
> -
> -       return __ib_process_cq(cq, budget, wcs);
> +       if (unlikely(WARN_ON_ONCE(!cq->wc)))
> +               return 0;
> +       return __ib_process_cq(cq, budget);
> }
> EXPORT_SYMBOL(ib_process_cq_direct);
> 
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
>        struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
>        int completed;
> 
> -       completed = __ib_process_cq(cq, budget, NULL);
> +       completed = __ib_process_cq(cq, budget);
>        if (completed < budget) {
>                irq_poll_complete(&cq->iop);
>                if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
>        struct ib_cq *cq = container_of(work, struct ib_cq, work);
>        int completed;
> 
> -       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> +       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
>        if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
>            ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
>                queue_work(ib_comp_wq, &cq->work);
> --

--
Chuck Lever

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-21 14:45       ` Max Gurtovoy
@ 2018-02-22 15:39         ` Sagi Grimberg
  2018-02-27 22:09           ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-02-22 15:39 UTC (permalink / raw)
  To: Max Gurtovoy, Chuck Lever, Bart Van Assche
  Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma


>> The only reason I put this array on the stack was to allow consumers
>> that did not use the ib_alloc_cq API to call it, but on second thought
>> that seems like the wrong decision (as these users probably did not
>> set wr_cqe correctly).
>>
>> How about we make ib_process_cq_direct use the cq->wc array and add
>> a WARN_ON statement (failing gracefully) if the caller used this
>> API without calling ib_alloc_cq?
> 
> But we tried to avoid concurrent access to cq->wc.

I'm not sure it's a valid use case. But if there is a compelling
reason to keep it as is, then we can use a smaller on-stack
array.

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-22 15:39         ` Sagi Grimberg
@ 2018-02-27 22:09           ` Jason Gunthorpe
  2018-02-27 22:15             ` Max Gurtovoy
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-02-27 22:09 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, Chuck Lever, Bart Van Assche, arnd, dledford,
	linux-kernel, leonro, linux-rdma

On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
> 
> >>The only reason I put this array on the stack was to allow consumers
> >>that did not use the ib_alloc_cq API to call it, but on second thought
> >>that seems like the wrong decision (as these users probably did not
> >>set wr_cqe correctly).
> >>
> >>How about we make ib_process_cq_direct use the cq->wc array and add
> >>a WARN_ON statement (failing gracefully) if the caller used this
> >>API without calling ib_alloc_cq?
> >
> >But we tried to avoid concurrent access to cq->wc.
> 
> I'm not sure it's a valid use case. But if there is a compelling
> reason to keep it as is, then we can use a smaller on-stack
> array.

Did we come to a conclusion on what to do here?

Jason

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-27 22:09           ` Jason Gunthorpe
@ 2018-02-27 22:15             ` Max Gurtovoy
  2018-02-28  0:21               ` Bart Van Assche
  0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-27 22:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Sagi Grimberg
  Cc: Chuck Lever, Bart Van Assche, arnd, dledford, linux-kernel,
	leonro, linux-rdma



On 2/28/2018 12:09 AM, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>>
>>>> The only reason I put this array on the stack was to allow consumers
>>>> that did not use the ib_alloc_cq API to call it, but on second thought
>>>> that seems like the wrong decision (as these users probably did not
>>>> set wr_cqe correctly).
>>>>
>>>> How about we make ib_process_cq_direct use the cq->wc array and add
>>>> a WARN_ON statement (failing gracefully) if the caller used this
>>>> API without calling ib_alloc_cq?
>>>
>>> But we tried to avoid concurrent access to cq->wc.
>>
>> I'm not sure it's a valid use case. But if there is a compelling
>> reason to keep it as is, then we can use a smaller on-stack
>> array.
> 
> Did we come to a conclusion on what to do here?

Guys,
what do you think about the following (untested) solution?


diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8..59d2835 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -17,6 +17,7 @@

  /* # of WCs to poll for with a single call to ib_poll_cq */
  #define IB_POLL_BATCH                  16
+#define IB_POLL_BATCH_DIRECT           8

  /* # of WCs to iterate over before yielding */
  #define IB_POLL_BUDGET_IRQ             256
@@ -25,17 +26,25 @@
  #define IB_POLL_FLAGS \
         (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)

-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
+                          int batch)
  {
-       int i, n, completed = 0;
-       struct ib_wc *wcs = poll_wc ? : cq->wc;
-
+       int i, n, ib_poll_batch, completed = 0;
+       struct ib_wc *wcs;
+
+       if (poll_wc) {
+               wcs = poll_wc;
+               ib_poll_batch = batch;
+       } else {
+               wcs = cq->wc;
+               ib_poll_batch = IB_POLL_BATCH;
+       }
         /*
          * budget might be (-1) if the caller does not
          * want to bound this call, thus we need unsigned
          * minimum here.
          */
-       while ((n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH,
+       while ((n = ib_poll_cq(cq, min_t(u32, ib_poll_batch,
                         budget - completed), wcs)) > 0) {
                 for (i = 0; i < n; i++) {
                         struct ib_wc *wc = &wcs[i];
@@ -48,7 +57,7 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)

                 completed += n;

-               if (n != IB_POLL_BATCH ||
+               if (n != ib_poll_batch ||
                     (budget != -1 && completed >= budget))
                         break;
         }
@@ -72,9 +81,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
   */
  int ib_process_cq_direct(struct ib_cq *cq, int budget)
  {
-       struct ib_wc wcs[IB_POLL_BATCH];
+       struct ib_wc wcs[IB_POLL_BATCH_DIRECT];

-       return __ib_process_cq(cq, budget, wcs);
+       return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
  }
  EXPORT_SYMBOL(ib_process_cq_direct);

@@ -88,7 +97,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
         struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
         int completed;

-       completed = __ib_process_cq(cq, budget, NULL);
+       completed = __ib_process_cq(cq, budget, NULL, 0);
         if (completed < budget) {
                 irq_poll_complete(&cq->iop);
                 if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +117,7 @@ static void ib_cq_poll_work(struct work_struct *work)
         struct ib_cq *cq = container_of(work, struct ib_cq, work);
         int completed;

-       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+       completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL, 0);
         if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
             ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
                 queue_work(ib_comp_wq, &cq->work);



> 
> Jason
> 

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-27 22:15             ` Max Gurtovoy
@ 2018-02-28  0:21               ` Bart Van Assche
  2018-02-28  9:50                 ` Max Gurtovoy
  0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2018-02-28  0:21 UTC (permalink / raw)
  To: Max Gurtovoy, Jason Gunthorpe, Sagi Grimberg
  Cc: Chuck Lever, arnd, dledford, linux-kernel, leonro, linux-rdma

On 02/27/18 14:15, Max Gurtovoy wrote:
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
> +                          int batch)
>   {
> -       int i, n, completed = 0;
> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
> +       int i, n, ib_poll_batch, completed = 0;
> +       struct ib_wc *wcs;
> +
> +       if (poll_wc) {
> +               wcs = poll_wc;
> +               ib_poll_batch = batch;
> +       } else {
> +               wcs = cq->wc;
> +               ib_poll_batch = IB_POLL_BATCH;
> +       }

Since this code has to be touched anyway, I think we can use this
opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
instead use whatever the caller passes. That will require updating all
__ib_process_cq(..., ..., NULL) calls. I also propose letting the caller
pass ib_poll_batch instead of figuring it out in this function.
Otherwise the approach of this patch looks fine to me.

Thanks,

Bart.

* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-28  0:21               ` Bart Van Assche
@ 2018-02-28  9:50                 ` Max Gurtovoy
  2018-02-28 18:55                   ` Doug Ledford
  0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-28  9:50 UTC (permalink / raw)
  To: Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
  Cc: Chuck Lever, arnd, dledford, linux-kernel, leonro, linux-rdma



On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> On 02/27/18 14:15, Max Gurtovoy wrote:
>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
>> +                          int batch)
>>   {
>> -       int i, n, completed = 0;
>> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
>> +       int i, n, ib_poll_batch, completed = 0;
>> +       struct ib_wc *wcs;
>> +
>> +       if (poll_wc) {
>> +               wcs = poll_wc;
>> +               ib_poll_batch = batch;
>> +       } else {
>> +               wcs = cq->wc;
>> +               ib_poll_batch = IB_POLL_BATCH;
>> +       }
> 
> Since this code has to be touched I think that we can use this 
> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and 
> instead use what the caller passes. That will require updating all 
> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller 
> pass ib_poll_batch instead of figuring it out in this function. 
> Otherwise the approach of this patch looks fine to me.

Thanks Bart.
I'll make these changes and submit.

> 
> Thanks,
> 
> Bart.

-Max.


* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-28  9:50                 ` Max Gurtovoy
@ 2018-02-28 18:55                   ` Doug Ledford
  2018-03-01  9:36                     ` Max Gurtovoy
  0 siblings, 1 reply; 16+ messages in thread
From: Doug Ledford @ 2018-02-28 18:55 UTC (permalink / raw)
  To: Max Gurtovoy, Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
  Cc: Chuck Lever, arnd, linux-kernel, leonro, linux-rdma


On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
> 
> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> > On 02/27/18 14:15, Max Gurtovoy wrote:
> > > -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> > > +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
> > > +                          int batch)
> > >   {
> > > -       int i, n, completed = 0;
> > > -       struct ib_wc *wcs = poll_wc ? : cq->wc;
> > > +       int i, n, ib_poll_batch, completed = 0;
> > > +       struct ib_wc *wcs;
> > > +
> > > +       if (poll_wc) {
> > > +               wcs = poll_wc;
> > > +               ib_poll_batch = batch;
> > > +       } else {
> > > +               wcs = cq->wc;
> > > +               ib_poll_batch = IB_POLL_BATCH;
> > > +       }
> > 
> > Since this code has to be touched I think that we can use this 
> > opportunity to get rid of the "poll_wc ? : cq->wc" conditional and 
> > instead use what the caller passes. That will require updating all 
> > __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller 
> > pass ib_poll_batch instead of figuring it out in this function. 
> > Otherwise the approach of this patch looks fine to me.
> 
> Thanks Bart.
> I'll make these changes and submit.

That sounds reasonable to me too, thanks for reworking and resubmitting.

-- 
Doug Ledford <dledford@redhat.com>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
  2018-02-28 18:55                   ` Doug Ledford
@ 2018-03-01  9:36                     ` Max Gurtovoy
  0 siblings, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2018-03-01  9:36 UTC (permalink / raw)
  To: Doug Ledford, Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
  Cc: Chuck Lever, arnd, linux-kernel, leonro, linux-rdma



On 2/28/2018 8:55 PM, Doug Ledford wrote:
> On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>>
>> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
>>> On 02/27/18 14:15, Max Gurtovoy wrote:
>>>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
>>>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc,
>>>> +                          int batch)
>>>>    {
>>>> -       int i, n, completed = 0;
>>>> -       struct ib_wc *wcs = poll_wc ? : cq->wc;
>>>> +       int i, n, ib_poll_batch, completed = 0;
>>>> +       struct ib_wc *wcs;
>>>> +
>>>> +       if (poll_wc) {
>>>> +               wcs = poll_wc;
>>>> +               ib_poll_batch = batch;
>>>> +       } else {
>>>> +               wcs = cq->wc;
>>>> +               ib_poll_batch = IB_POLL_BATCH;
>>>> +       }
>>>
>>> Since this code has to be touched I think that we can use this
>>> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
>>> instead use what the caller passes. That will require updating all
>>> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
>>> pass ib_poll_batch instead of figuring it out in this function.
>>> Otherwise the approach of this patch looks fine to me.
>>
>> Thanks Bart.
>> I'll make these changes and submit.
> 
> That sounds reasonable to me too, thanks for reworking and resubmitting.
> 

Sure, NP.
We've run NVMe-oF and SRP with the new patch.
I'll send it through the Mellanox maintainers' pull request.

Thanks for reporting and reviewing.

-Max.


end of thread, other threads:[~2018-03-01  9:37 UTC | newest]

Thread overview: 16+ messages
2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
2018-02-20 21:14 ` Parav Pandit
2018-02-20 21:54   ` Arnd Bergmann
2018-02-20 21:14 ` Bart Van Assche
2018-02-20 21:47   ` Chuck Lever
2018-02-21  9:47     ` Max Gurtovoy
2018-02-21 13:44     ` Sagi Grimberg
2018-02-21 14:45       ` Max Gurtovoy
2018-02-22 15:39         ` Sagi Grimberg
2018-02-27 22:09           ` Jason Gunthorpe
2018-02-27 22:15             ` Max Gurtovoy
2018-02-28  0:21               ` Bart Van Assche
2018-02-28  9:50                 ` Max Gurtovoy
2018-02-28 18:55                   ` Doug Ledford
2018-03-01  9:36                     ` Max Gurtovoy
2018-02-21 15:10       ` Chuck Lever
