* [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
@ 2018-02-20 20:59 Arnd Bergmann
2018-02-20 21:14 ` Parav Pandit
2018-02-20 21:14 ` Bart Van Assche
0 siblings, 2 replies; 16+ messages in thread
From: Arnd Bergmann @ 2018-02-20 20:59 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Arnd Bergmann, Leon Romanovsky, Sagi Grimberg, Bart Van Assche,
linux-rdma, linux-kernel
The ib_wc structure has grown to much that putting 16 of them on the stack
hits the warning limit for dangerous kernel stack consumption:
drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
Using half that number brings us comfortably below that limit again.
Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track RDMA resources")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
drivers/infiniband/core/cq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..2626adbb978e 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -16,7 +16,7 @@
#include <rdma/ib_verbs.h>
/* # of WCs to poll for with a single call to ib_poll_cq */
-#define IB_POLL_BATCH 16
+#define IB_POLL_BATCH 8
/* # of WCs to iterate over before yielding */
#define IB_POLL_BUDGET_IRQ 256
--
2.9.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* RE: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
@ 2018-02-20 21:14 ` Parav Pandit
2018-02-20 21:54 ` Arnd Bergmann
2018-02-20 21:14 ` Bart Van Assche
1 sibling, 1 reply; 16+ messages in thread
From: Parav Pandit @ 2018-02-20 21:14 UTC (permalink / raw)
To: Arnd Bergmann, Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Sagi Grimberg, Bart Van Assche, linux-rdma,
linux-kernel
Hi Arnd Bergmann,
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Arnd Bergmann
> Sent: Tuesday, February 20, 2018 2:59 PM
> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky
> <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche
> <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>
> The ib_wc structure has grown to much that putting 16 of them on the stack hits
> the warning limit for dangerous kernel stack consumption:
>
> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
> than 1024 bytes [-Werror=frame-larger-than=]
>
> Using half that number brings us comfortably below that limit again.
>
> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
> RDMA resources")
It is not clear to me how above commit 02d8883f520e introduced this stack issue.
Bodong and I came across ib_wc size increase in [1] and it was fixed in [2].
Did you hit this error after/before applying patch [2]?
[1] https://www.spinics.net/lists/linux-rdma/msg50754.html
[2] https://patchwork.kernel.org/patch/10159623/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
2018-02-20 21:14 ` Parav Pandit
@ 2018-02-20 21:14 ` Bart Van Assche
2018-02-20 21:47 ` Chuck Lever
1 sibling, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2018-02-20 21:14 UTC (permalink / raw)
To: jgg, arnd, dledford
Cc: Bart Van Assche, linux-kernel, leonro, linux-rdma, sagi
On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
> /* # of WCs to poll for with a single call to ib_poll_cq */
> -#define IB_POLL_BATCH 16
> +#define IB_POLL_BATCH 8
The purpose of batch polling is to minimize contention on the cq spinlock.
Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
the performance impact of this change been verified for all affected drivers
(ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
over RDMA, ...)?
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 21:14 ` Bart Van Assche
@ 2018-02-20 21:47 ` Chuck Lever
2018-02-21 9:47 ` Max Gurtovoy
2018-02-21 13:44 ` Sagi Grimberg
0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2018-02-20 21:47 UTC (permalink / raw)
To: Bart Van Assche
Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma, sagi
> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>
> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>> /* # of WCs to poll for with a single call to ib_poll_cq */
>> -#define IB_POLL_BATCH 16
>> +#define IB_POLL_BATCH 8
>
> The purpose of batch polling is to minimize contention on the cq spinlock.
> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
> the performance impact of this change been verified for all affected drivers
> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
> over RDMA, ...)?
Only the users of the DIRECT polling method use an on-stack
array of ib_wc's. This is only the SRP drivers.
The other two modes have use of a dynamically allocated array
of ib_wc's that hangs off the ib_cq. These shouldn't need any
reduction in the size of this array, and they are the common
case.
IMO a better solution would be to change ib_process_cq_direct
to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
--
Chuck Lever
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 21:14 ` Parav Pandit
@ 2018-02-20 21:54 ` Arnd Bergmann
0 siblings, 0 replies; 16+ messages in thread
From: Arnd Bergmann @ 2018-02-20 21:54 UTC (permalink / raw)
To: Parav Pandit
Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Sagi Grimberg,
Bart Van Assche, linux-rdma, linux-kernel
On Tue, Feb 20, 2018 at 10:14 PM, Parav Pandit <parav@mellanox.com> wrote:
> Hi Arnd Bergmann,
>
>> -----Original Message-----
>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>> owner@vger.kernel.org] On Behalf Of Arnd Bergmann
>> Sent: Tuesday, February 20, 2018 2:59 PM
>> To: Doug Ledford <dledford@redhat.com>; Jason Gunthorpe <jgg@ziepe.ca>
>> Cc: Arnd Bergmann <arnd@arndb.de>; Leon Romanovsky
>> <leonro@mellanox.com>; Sagi Grimberg <sagi@grimberg.me>; Bart Van Assche
>> <bart.vanassche@sandisk.com>; linux-rdma@vger.kernel.org; linux-
>> kernel@vger.kernel.org
>> Subject: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
>>
>> The ib_wc structure has grown to much that putting 16 of them on the stack hits
>> the warning limit for dangerous kernel stack consumption:
>>
>> drivers/infiniband/core/cq.c: In function 'ib_process_cq_direct':
>> drivers/infiniband/core/cq.c:78:1: error: the frame size of 1032 bytes is larger
>> than 1024 bytes [-Werror=frame-larger-than=]
>>
>> Using half that number brings us comfortably below that limit again.
>>
>> Fixes: 02d8883f520e ("RDMA/restrack: Add general infrastructure to track
>> RDMA resources")
>
> It is not clear to me how above commit 02d8883f520e introduced this stack issue.
My mistake, I misread the git history.
I did a proper bisection now and ended up with the commit that added the
IB_POLL_BACK sized array on the stack, i.e. commit 246d8b184c10 ("IB/cq:
Don't force IB_POLL_DIRECT poll context for ib_process_cq_direct")
> Bodong and I came across ib_wc size increase in [1] and it was fixed in [2].
> Did you hit this error after/before applying patch [2]?
>
> [1] https://www.spinics.net/lists/linux-rdma/msg50754.html
> [2] https://patchwork.kernel.org/patch/10159623/
I did the analysis a few weeks ago when I first hit the problem but
didn't send it
out at the time. Today I saw the problem still persists on mainline (4.16-rc2),
which does contain the patch from [2].
What I see is that 'ib_wc' is now exactly 59 bytes on 32-bit ARM, plus 5 bytes
of padding, so 16 of them gets us exactly the warning limit, and then there
are a few bytes for the function itself.
Arnd
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 21:47 ` Chuck Lever
@ 2018-02-21 9:47 ` Max Gurtovoy
2018-02-21 13:44 ` Sagi Grimberg
1 sibling, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-21 9:47 UTC (permalink / raw)
To: Chuck Lever, Bart Van Assche
Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma, sagi
On 2/20/2018 11:47 PM, Chuck Lever wrote:
>
>
>> On Feb 20, 2018, at 4:14 PM, Bart Van Assche <Bart.VanAssche@wdc.com> wrote:
>>
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH 16
>>> +#define IB_POLL_BATCH 8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's. This is only the SRP drivers.
>
> The other two modes have use of a dynamically allocated array
> of ib_wc's that hangs off the ib_cq. These shouldn't need any
> reduction in the size of this array, and they are the common
> case.
>
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
Yup, good idea.
you can define IB_DIRECT_POLL_BATCH to be 8 and use it in
ib_process_cq_direct. *but* please make sure to use the right value in
ib_poll_cq since the wcs array should be able to hold the requested
amount of wcs.
-Max.
>
> --
> Chuck Lever
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-20 21:47 ` Chuck Lever
2018-02-21 9:47 ` Max Gurtovoy
@ 2018-02-21 13:44 ` Sagi Grimberg
2018-02-21 14:45 ` Max Gurtovoy
2018-02-21 15:10 ` Chuck Lever
1 sibling, 2 replies; 16+ messages in thread
From: Sagi Grimberg @ 2018-02-21 13:44 UTC (permalink / raw)
To: Chuck Lever, Bart Van Assche
Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma
>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>> -#define IB_POLL_BATCH 16
>>> +#define IB_POLL_BATCH 8
>>
>> The purpose of batch polling is to minimize contention on the cq spinlock.
>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>> the performance impact of this change been verified for all affected drivers
>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>> over RDMA, ...)?
>
> Only the users of the DIRECT polling method use an on-stack
> array of ib_wc's. This is only the SRP drivers.
>
> The other two modes have use of a dynamically allocated array
> of ib_wc's that hangs off the ib_cq. These shouldn't need any
> reduction in the size of this array, and they are the common
> case.
>
> IMO a better solution would be to change ib_process_cq_direct
> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
The only reason why I added this array on-stack was to allow consumers
that did not use ib_alloc_cq api to call it, but that seems like a
wrong decision when thinking it over again (as probably these users
did not set the wr_cqe correctly).
How about we make ib_process_cq_direct use the cq wc array and add
a WARN_ON statement (and fail it gracefully) if the caller used this
API without calling ib_alloc_cq?
--
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8215d7..cd3e9e124834 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -25,10 +25,10 @@
#define IB_POLL_FLAGS \
(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
*poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget)
{
int i, n, completed = 0;
- struct ib_wc *wcs = poll_wc ? : cq->wc;
+ struct ib_wc *wcs = cq->wc;
/*
* budget might be (-1) if the caller does not
@@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int
budget, struct ib_wc *poll_wc)
*/
int ib_process_cq_direct(struct ib_cq *cq, int budget)
{
- struct ib_wc wcs[IB_POLL_BATCH];
-
- return __ib_process_cq(cq, budget, wcs);
+ if (unlikely(WARN_ON_ONCE(!cq->wc)))
+ return 0;
+ return __ib_process_cq(cq, budget);
}
EXPORT_SYMBOL(ib_process_cq_direct);
@@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int
budget)
struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
int completed;
- completed = __ib_process_cq(cq, budget, NULL);
+ completed = __ib_process_cq(cq, budget);
if (completed < budget) {
irq_poll_complete(&cq->iop);
if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
struct ib_cq *cq = container_of(work, struct ib_cq, work);
int completed;
- completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+ completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
queue_work(ib_comp_wq, &cq->work);
--
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-21 13:44 ` Sagi Grimberg
@ 2018-02-21 14:45 ` Max Gurtovoy
2018-02-22 15:39 ` Sagi Grimberg
2018-02-21 15:10 ` Chuck Lever
1 sibling, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-21 14:45 UTC (permalink / raw)
To: Sagi Grimberg, Chuck Lever, Bart Van Assche
Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma
On 2/21/2018 3:44 PM, Sagi Grimberg wrote:
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH 16
>>>> +#define IB_POLL_BATCH 8
>>>
>>> The purpose of batch polling is to minimize contention on the cq
>>> spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance
>>> negatively. Has
>>> the performance impact of this change been verified for all affected
>>> drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB
>>> Direct, NFS
>>> over RDMA, ...)?
>>
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's. This is only the SRP drivers.
>>
>> The other two modes have use of a dynamically allocated array
>> of ib_wc's that hangs off the ib_cq. These shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>>
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers
> that did not use ib_alloc_cq api to call it, but that seems like a
> wrong decision when thinking it over again (as probably these users
> did not set the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add
> a WARN_ON statement (and fail it gracefully) if the caller used this
> API without calling ib_alloc_cq?
but we tried to avoid cuncurrent access to cq->wc.
Why can't we use the solution I wrote above ?
>
> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
> #define IB_POLL_FLAGS \
> (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
>
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
> {
> int i, n, completed = 0;
> - struct ib_wc *wcs = poll_wc ? : cq->wc;
> + struct ib_wc *wcs = cq->wc;
>
> /*
> * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int
> budget, struct ib_wc *poll_wc)
> */
> int ib_process_cq_direct(struct ib_cq *cq, int budget)
> {
> - struct ib_wc wcs[IB_POLL_BATCH];
> -
> - return __ib_process_cq(cq, budget, wcs);
> + if (unlikely(WARN_ON_ONCE(!cq->wc)))
> + return 0;
> + return __ib_process_cq(cq, budget);
> }
> EXPORT_SYMBOL(ib_process_cq_direct);
>
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int
> budget)
> struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
> int completed;
>
> - completed = __ib_process_cq(cq, budget, NULL);
> + completed = __ib_process_cq(cq, budget);
> if (completed < budget) {
> irq_poll_complete(&cq->iop);
> if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
> struct ib_cq *cq = container_of(work, struct ib_cq, work);
> int completed;
>
> - completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> + completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
> if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
> ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> queue_work(ib_comp_wq, &cq->work);
> --
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-21 13:44 ` Sagi Grimberg
2018-02-21 14:45 ` Max Gurtovoy
@ 2018-02-21 15:10 ` Chuck Lever
1 sibling, 0 replies; 16+ messages in thread
From: Chuck Lever @ 2018-02-21 15:10 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Bart Van Assche, jgg, arnd, dledford, linux-kernel, leonro, linux-rdma
> On Feb 21, 2018, at 8:44 AM, Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
>>> On Tue, 2018-02-20 at 21:59 +0100, Arnd Bergmann wrote:
>>>> /* # of WCs to poll for with a single call to ib_poll_cq */
>>>> -#define IB_POLL_BATCH 16
>>>> +#define IB_POLL_BATCH 8
>>>
>>> The purpose of batch polling is to minimize contention on the cq spinlock.
>>> Reducing the IB_POLL_BATCH constant may affect performance negatively. Has
>>> the performance impact of this change been verified for all affected drivers
>>> (ib_srp, ib_srpt, ib_iser, ib_isert, NVMeOF, NVMeOF target, SMB Direct, NFS
>>> over RDMA, ...)?
>> Only the users of the DIRECT polling method use an on-stack
>> array of ib_wc's. This is only the SRP drivers.
>> The other two modes have use of a dynamically allocated array
>> of ib_wc's that hangs off the ib_cq. These shouldn't need any
>> reduction in the size of this array, and they are the common
>> case.
>> IMO a better solution would be to change ib_process_cq_direct
>> to use a smaller on-stack array, and leave IB_POLL_BATCH alone.
>
> The only reason why I added this array on-stack was to allow consumers
> that did not use ib_alloc_cq api to call it, but that seems like a
> wrong decision when thinking it over again (as probably these users
> did not set the wr_cqe correctly).
>
> How about we make ib_process_cq_direct use the cq wc array and add
> a WARN_ON statement (and fail it gracefully) if the caller used this
> API without calling ib_alloc_cq?
Agreed, I prefer that all three modes use dynamically allocated
memory for that array.
> --
> diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
> index bc79ca8215d7..cd3e9e124834 100644
> --- a/drivers/infiniband/core/cq.c
> +++ b/drivers/infiniband/core/cq.c
> @@ -25,10 +25,10 @@
> #define IB_POLL_FLAGS \
> (IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
>
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget)
> {
> int i, n, completed = 0;
> - struct ib_wc *wcs = poll_wc ? : cq->wc;
> + struct ib_wc *wcs = cq->wc;
>
> /*
> * budget might be (-1) if the caller does not
> @@ -72,9 +72,9 @@ static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc *poll_wc)
> */
> int ib_process_cq_direct(struct ib_cq *cq, int budget)
> {
> - struct ib_wc wcs[IB_POLL_BATCH];
> -
> - return __ib_process_cq(cq, budget, wcs);
> + if (unlikely(WARN_ON_ONCE(!cq->wc)))
> + return 0;
> + return __ib_process_cq(cq, budget);
> }
> EXPORT_SYMBOL(ib_process_cq_direct);
>
> @@ -88,7 +88,7 @@ static int ib_poll_handler(struct irq_poll *iop, int budget)
> struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
> int completed;
>
> - completed = __ib_process_cq(cq, budget, NULL);
> + completed = __ib_process_cq(cq, budget);
> if (completed < budget) {
> irq_poll_complete(&cq->iop);
> if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> @@ -108,7 +108,7 @@ static void ib_cq_poll_work(struct work_struct *work)
> struct ib_cq *cq = container_of(work, struct ib_cq, work);
> int completed;
>
> - completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
> + completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE);
> if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
> ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
> queue_work(ib_comp_wq, &cq->work);
> --
--
Chuck Lever
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-21 14:45 ` Max Gurtovoy
@ 2018-02-22 15:39 ` Sagi Grimberg
2018-02-27 22:09 ` Jason Gunthorpe
0 siblings, 1 reply; 16+ messages in thread
From: Sagi Grimberg @ 2018-02-22 15:39 UTC (permalink / raw)
To: Max Gurtovoy, Chuck Lever, Bart Van Assche
Cc: jgg, arnd, dledford, linux-kernel, leonro, linux-rdma
>> The only reason why I added this array on-stack was to allow consumers
>> that did not use ib_alloc_cq api to call it, but that seems like a
>> wrong decision when thinking it over again (as probably these users
>> did not set the wr_cqe correctly).
>>
>> How about we make ib_process_cq_direct use the cq wc array and add
>> a WARN_ON statement (and fail it gracefully) if the caller used this
>> API without calling ib_alloc_cq?
>
> but we tried to avoid cuncurrent access to cq->wc.
Not sure its a valid use-case. But if there is a compelling
reason to keep it as is, then we can do smaller on-stack
array.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-22 15:39 ` Sagi Grimberg
@ 2018-02-27 22:09 ` Jason Gunthorpe
2018-02-27 22:15 ` Max Gurtovoy
0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2018-02-27 22:09 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Max Gurtovoy, Chuck Lever, Bart Van Assche, arnd, dledford,
linux-kernel, leonro, linux-rdma
On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>
> >>The only reason why I added this array on-stack was to allow consumers
> >>that did not use ib_alloc_cq api to call it, but that seems like a
> >>wrong decision when thinking it over again (as probably these users
> >>did not set the wr_cqe correctly).
> >>
> >>How about we make ib_process_cq_direct use the cq wc array and add
> >>a WARN_ON statement (and fail it gracefully) if the caller used this
> >>API without calling ib_alloc_cq?
> >
> >but we tried to avoid cuncurrent access to cq->wc.
>
> Not sure its a valid use-case. But if there is a compelling
> reason to keep it as is, then we can do smaller on-stack
> array.
Did we come to a conclusion what to do here?
Jason
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-27 22:09 ` Jason Gunthorpe
@ 2018-02-27 22:15 ` Max Gurtovoy
2018-02-28 0:21 ` Bart Van Assche
0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-27 22:15 UTC (permalink / raw)
To: Jason Gunthorpe, Sagi Grimberg
Cc: Chuck Lever, Bart Van Assche, arnd, dledford, linux-kernel,
leonro, linux-rdma
On 2/28/2018 12:09 AM, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2018 at 05:39:09PM +0200, Sagi Grimberg wrote:
>>
>>>> The only reason why I added this array on-stack was to allow consumers
>>>> that did not use ib_alloc_cq api to call it, but that seems like a
>>>> wrong decision when thinking it over again (as probably these users
>>>> did not set the wr_cqe correctly).
>>>>
>>>> How about we make ib_process_cq_direct use the cq wc array and add
>>>> a WARN_ON statement (and fail it gracefully) if the caller used this
>>>> API without calling ib_alloc_cq?
>>>
>>> but we tried to avoid cuncurrent access to cq->wc.
>>
>> Not sure its a valid use-case. But if there is a compelling
>> reason to keep it as is, then we can do smaller on-stack
>> array.
>
> Did we come to a conclusion what to do here?
guys,
what do you think about the following solution (untested):
diff --git a/drivers/infiniband/core/cq.c b/drivers/infiniband/core/cq.c
index bc79ca8..59d2835 100644
--- a/drivers/infiniband/core/cq.c
+++ b/drivers/infiniband/core/cq.c
@@ -17,6 +17,7 @@
/* # of WCs to poll for with a single call to ib_poll_cq */
#define IB_POLL_BATCH 16
+#define IB_POLL_BATCH_DIRECT 8
/* # of WCs to iterate over before yielding */
#define IB_POLL_BUDGET_IRQ 256
@@ -25,17 +26,25 @@
#define IB_POLL_FLAGS \
(IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS)
-static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
*poll_wc)
+static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
*poll_wc,
+ int batch)
{
- int i, n, completed = 0;
- struct ib_wc *wcs = poll_wc ? : cq->wc;
-
+ int i, n, ib_poll_batch, completed = 0;
+ struct ib_wc *wcs;
+
+ if (poll_wc) {
+ wcs = poll_wc;
+ ib_poll_batch = batch;
+ } else {
+ wcs = cq->wc;
+ ib_poll_batch = IB_POLL_BATCH;
+ }
/*
* budget might be (-1) if the caller does not
* want to bound this call, thus we need unsigned
* minimum here.
*/
- while ((n = ib_poll_cq(cq, min_t(u32, IB_POLL_BATCH,
+ while ((n = ib_poll_cq(cq, min_t(u32, ib_poll_batch,
budget - completed), wcs)) > 0) {
for (i = 0; i < n; i++) {
struct ib_wc *wc = &wcs[i];
@@ -48,7 +57,7 @@ static int __ib_process_cq(struct ib_cq *cq, int
budget, struct ib_wc *poll_wc)
completed += n;
- if (n != IB_POLL_BATCH ||
+ if (n != ib_poll_batch ||
(budget != -1 && completed >= budget))
break;
}
@@ -72,9 +81,9 @@ static int __ib_process_cq(struct ib_cq *cq, int
budget, struct ib_wc *poll_wc)
*/
int ib_process_cq_direct(struct ib_cq *cq, int budget)
{
- struct ib_wc wcs[IB_POLL_BATCH];
+ struct ib_wc wcs[IB_POLL_BATCH_DIRECT];
- return __ib_process_cq(cq, budget, wcs);
+ return __ib_process_cq(cq, budget, wcs, IB_POLL_BATCH_DIRECT);
}
EXPORT_SYMBOL(ib_process_cq_direct);
@@ -88,7 +97,7 @@ static int ib_poll_handler(struct irq_poll *iop, int
budget)
struct ib_cq *cq = container_of(iop, struct ib_cq, iop);
int completed;
- completed = __ib_process_cq(cq, budget, NULL);
+ completed = __ib_process_cq(cq, budget, NULL, 0);
if (completed < budget) {
irq_poll_complete(&cq->iop);
if (ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
@@ -108,7 +117,7 @@ static void ib_cq_poll_work(struct work_struct *work)
struct ib_cq *cq = container_of(work, struct ib_cq, work);
int completed;
- completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL);
+ completed = __ib_process_cq(cq, IB_POLL_BUDGET_WORKQUEUE, NULL, 0);
if (completed >= IB_POLL_BUDGET_WORKQUEUE ||
ib_req_notify_cq(cq, IB_POLL_FLAGS) > 0)
queue_work(ib_comp_wq, &cq->work);
>
> Jason
>
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-27 22:15 ` Max Gurtovoy
@ 2018-02-28 0:21 ` Bart Van Assche
2018-02-28 9:50 ` Max Gurtovoy
0 siblings, 1 reply; 16+ messages in thread
From: Bart Van Assche @ 2018-02-28 0:21 UTC (permalink / raw)
To: Max Gurtovoy, Jason Gunthorpe, Sagi Grimberg
Cc: Chuck Lever, arnd, dledford, linux-kernel, leonro, linux-rdma
On 02/27/18 14:15, Max Gurtovoy wrote:
> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> *poll_wc)
> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> *poll_wc,
> + int batch)
> {
> - int i, n, completed = 0;
> - struct ib_wc *wcs = poll_wc ? : cq->wc;
> + int i, n, ib_poll_batch, completed = 0;
> + struct ib_wc *wcs;
> +
> + if (poll_wc) {
> + wcs = poll_wc;
> + ib_poll_batch = batch;
> + } else {
> + wcs = cq->wc;
> + ib_poll_batch = IB_POLL_BATCH;
> + }
Since this code has to be touched I think that we can use this
opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
instead use what the caller passes. That will require to update all
__ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
pass ib_poll_batch instead of figuring it out in this function.
Otherwise the approach of this patch looks fine to me.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-28 0:21 ` Bart Van Assche
@ 2018-02-28 9:50 ` Max Gurtovoy
2018-02-28 18:55 ` Doug Ledford
0 siblings, 1 reply; 16+ messages in thread
From: Max Gurtovoy @ 2018-02-28 9:50 UTC (permalink / raw)
To: Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
Cc: Chuck Lever, arnd, dledford, linux-kernel, leonro, linux-rdma
On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> On 02/27/18 14:15, Max Gurtovoy wrote:
>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>> *poll_wc)
>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>> *poll_wc,
>> + int batch)
>> {
>> - int i, n, completed = 0;
>> - struct ib_wc *wcs = poll_wc ? : cq->wc;
>> + int i, n, ib_poll_batch, completed = 0;
>> + struct ib_wc *wcs;
>> +
>> + if (poll_wc) {
>> + wcs = poll_wc;
>> + ib_poll_batch = batch;
>> + } else {
>> + wcs = cq->wc;
>> + ib_poll_batch = IB_POLL_BATCH;
>> + }
>
> Since this code has to be touched I think that we can use this
> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
> instead use what the caller passes. That will require to update all
> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
> pass ib_poll_batch instead of figuring it out in this function.
> Otherwise the approach of this patch looks fine to me.
Thanks Bart.
I'll make these changes and submit.
>
> Thanks,
>
> Bart.
-Max.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-28 9:50 ` Max Gurtovoy
@ 2018-02-28 18:55 ` Doug Ledford
2018-03-01 9:36 ` Max Gurtovoy
0 siblings, 1 reply; 16+ messages in thread
From: Doug Ledford @ 2018-02-28 18:55 UTC (permalink / raw)
To: Max Gurtovoy, Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
Cc: Chuck Lever, arnd, linux-kernel, leonro, linux-rdma
[-- Attachment #1: Type: text/plain, Size: 1552 bytes --]
On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>
> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
> > On 02/27/18 14:15, Max Gurtovoy wrote:
> > > -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> > > *poll_wc)
> > > +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
> > > *poll_wc,
> > > + int batch)
> > > {
> > > - int i, n, completed = 0;
> > > - struct ib_wc *wcs = poll_wc ? : cq->wc;
> > > + int i, n, ib_poll_batch, completed = 0;
> > > + struct ib_wc *wcs;
> > > +
> > > + if (poll_wc) {
> > > + wcs = poll_wc;
> > > + ib_poll_batch = batch;
> > > + } else {
> > > + wcs = cq->wc;
> > > + ib_poll_batch = IB_POLL_BATCH;
> > > + }
> >
> > Since this code has to be touched I think that we can use this
> > opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
> > instead use what the caller passes. That will require to update all
> > __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
> > pass ib_poll_batch instead of figuring it out in this function.
> > Otherwise the approach of this patch looks fine to me.
>
> Thanks Bart.
> I'll make these changes and submit.
That sounds reasonable to me too, thanks for reworking and resubmitting.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH] RDMA/core: reduce IB_POLL_BATCH constant
2018-02-28 18:55 ` Doug Ledford
@ 2018-03-01 9:36 ` Max Gurtovoy
0 siblings, 0 replies; 16+ messages in thread
From: Max Gurtovoy @ 2018-03-01 9:36 UTC (permalink / raw)
To: Doug Ledford, Bart Van Assche, Jason Gunthorpe, Sagi Grimberg
Cc: Chuck Lever, arnd, linux-kernel, leonro, linux-rdma
On 2/28/2018 8:55 PM, Doug Ledford wrote:
> On Wed, 2018-02-28 at 11:50 +0200, Max Gurtovoy wrote:
>>
>> On 2/28/2018 2:21 AM, Bart Van Assche wrote:
>>> On 02/27/18 14:15, Max Gurtovoy wrote:
>>>> -static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>>>> *poll_wc)
>>>> +static int __ib_process_cq(struct ib_cq *cq, int budget, struct ib_wc
>>>> *poll_wc,
>>>> + int batch)
>>>> {
>>>> - int i, n, completed = 0;
>>>> - struct ib_wc *wcs = poll_wc ? : cq->wc;
>>>> + int i, n, ib_poll_batch, completed = 0;
>>>> + struct ib_wc *wcs;
>>>> +
>>>> + if (poll_wc) {
>>>> + wcs = poll_wc;
>>>> + ib_poll_batch = batch;
>>>> + } else {
>>>> + wcs = cq->wc;
>>>> + ib_poll_batch = IB_POLL_BATCH;
>>>> + }
>>>
>>> Since this code has to be touched I think that we can use this
>>> opportunity to get rid of the "poll_wc ? : cq->wc" conditional and
>>> instead use what the caller passes. That will require to update all
>>> __ib_process_cq(..., ..., NULL) calls. I also propose to let the caller
>>> pass ib_poll_batch instead of figuring it out in this function.
>>> Otherwise the approach of this patch looks fine to me.
>>
>> Thanks Bart.
>> I'll make these changes and submit.
>
> That sounds reasonable to me too, thanks for reworking and resubmitting.
>
Sure, NP.
We've run NVMe-oF and SRP with the new patch.
I'll send it through Mellanox maintainers pull request.
Thanks for reporting and reviewing.
-Max.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-03-01 9:37 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-20 20:59 [PATCH] RDMA/core: reduce IB_POLL_BATCH constant Arnd Bergmann
2018-02-20 21:14 ` Parav Pandit
2018-02-20 21:54 ` Arnd Bergmann
2018-02-20 21:14 ` Bart Van Assche
2018-02-20 21:47 ` Chuck Lever
2018-02-21 9:47 ` Max Gurtovoy
2018-02-21 13:44 ` Sagi Grimberg
2018-02-21 14:45 ` Max Gurtovoy
2018-02-22 15:39 ` Sagi Grimberg
2018-02-27 22:09 ` Jason Gunthorpe
2018-02-27 22:15 ` Max Gurtovoy
2018-02-28 0:21 ` Bart Van Assche
2018-02-28 9:50 ` Max Gurtovoy
2018-02-28 18:55 ` Doug Ledford
2018-03-01 9:36 ` Max Gurtovoy
2018-02-21 15:10 ` Chuck Lever
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).