All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] Modify dql.min_limit value inside the driver
@ 2021-03-09 15:23 Vincent Mailhol
  2021-03-09 15:23 ` [RFC PATCH 1/1] dql: add dql_set_min_limit() Vincent Mailhol
  0 siblings, 1 reply; 5+ messages in thread
From: Vincent Mailhol @ 2021-03-09 15:23 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, linux-kernel, netdev, Tom Herbert,
	Eric Dumazet
  Cc: Peter Zijlstra, Randy Dunlap, Vincent Mailhol

Abstract: would like to directly set dql.min_limit value inside a
driver to improve BQL performances of a CAN USB driver.

CAN packets have a small PDU: for classical CAN maximum size is
roughly 16 bytes (8 for payload and 8 for arbitration, CRC and
others).

I am writing an CAN driver for an USB interface. To compensate the
extra latency introduced by the USB, I want to group several CAN
frames and do one USB bulk send. To this purpose, I implemented BQL in
my driver.

However, the BQL algorithms can take time to adjust, especially if
there are small bursts.

The best way I found is to directly modify the dql.min_limit and set
it to some empirical values. This way, even during small burst events
I can have a good throughput. Slightly increasing the dql.min_limit
has no measurable impact on the latency as long as frames fit in the
same USB packet (i.e. BQL overheard is negligible compared to USB
overhead).

The BQL was not designed for USB nor was it designed for CAN's small
PDUs which probably explains why I am the first one to ever have
thought of using dql.min_limit within the driver.

The code I wrote looks like:

> #ifdef CONFIG_BQL
>	netdev_get_tx_queue(netdev, 0)->dql.min_limit = <some empirical value>;
> #endif

Using #ifdef to set up some variables is not a best practice. I am
sending this RFC to see if we can add a function to set this
dql.min_limit in a more pretty way.

For your reference, this RFQ is a follow-up of a discussion on the
linux-can mailing list:
https://lore.kernel.org/linux-can/20210309125708.ei75tr5vp2sanfh6@pengutronix.de/

Thank you for your comments.

Yours sincerely,
Vincent


Vincent Mailhol (1):
  dql: add dql_set_min_limit()

 include/linux/dynamic_queue_limits.h | 3 +++
 lib/dynamic_queue_limits.c           | 8 ++++++++
 2 files changed, 11 insertions(+)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [RFC PATCH 1/1] dql: add dql_set_min_limit()
  2021-03-09 15:23 [RFC PATCH 0/1] Modify dql.min_limit value inside the driver Vincent Mailhol
@ 2021-03-09 15:23 ` Vincent Mailhol
  2021-03-09 18:00   ` Vincent MAILHOL
  2021-03-09 19:44   ` Dave Taht
  0 siblings, 2 replies; 5+ messages in thread
From: Vincent Mailhol @ 2021-03-09 15:23 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, linux-kernel, netdev, Tom Herbert,
	Eric Dumazet
  Cc: Peter Zijlstra, Randy Dunlap, Vincent Mailhol

Add a function to set the dynamic queue limit minimum value.

This function is to be used by network drivers which are able to
prove, at least through empirical tests, that they reach better
performances with a specific predefined dql.min_limit value.

Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
---
 include/linux/dynamic_queue_limits.h | 3 +++
 lib/dynamic_queue_limits.c           | 8 ++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 407c2f281b64..32437f168a35 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -103,6 +103,9 @@ void dql_reset(struct dql *dql);
 /* Initialize dql state */
 void dql_init(struct dql *dql, unsigned int hold_time);
 
+/* Set the dql minimum limit */
+void dql_set_min_limit(struct dql *dql, unsigned int min_limit);
+
 #endif /* _KERNEL_ */
 
 #endif /* _LINUX_DQL_H */
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index fde0aa244148..8b6ad1e0a2e3 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -136,3 +136,11 @@ void dql_init(struct dql *dql, unsigned int hold_time)
 	dql_reset(dql);
 }
 EXPORT_SYMBOL(dql_init);
+
+void dql_set_min_limit(struct dql *dql, unsigned int min_limit)
+{
+#ifdef CONFIG_BQL
+	dql->min_limit = min_limit;
+#endif
+}
+EXPORT_SYMBOL(dql_set_min_limit);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/1] dql: add dql_set_min_limit()
  2021-03-09 15:23 ` [RFC PATCH 1/1] dql: add dql_set_min_limit() Vincent Mailhol
@ 2021-03-09 18:00   ` Vincent MAILHOL
  2021-03-09 19:44   ` Dave Taht
  1 sibling, 0 replies; 5+ messages in thread
From: Vincent MAILHOL @ 2021-03-09 18:00 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, open list, netdev, Tom Herbert,
	Eric Dumazet
  Cc: Peter Zijlstra, Randy Dunlap

On Wed. 10 Mar 2021 at 00:23, Vincent Mailhol
<mailhol.vincent@wanadoo.fr> wrote:
>
> Add a function to set the dynamic queue limit minimum value.
>
> This function is to be used by network drivers which are able to
> prove, at least through empirical tests, that they reach better
> performances with a specific predefined dql.min_limit value.
>
> Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
> ---
>  include/linux/dynamic_queue_limits.h | 3 +++
>  lib/dynamic_queue_limits.c           | 8 ++++++++
>  2 files changed, 11 insertions(+)
>
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 407c2f281b64..32437f168a35 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -103,6 +103,9 @@ void dql_reset(struct dql *dql);
>  /* Initialize dql state */
>  void dql_init(struct dql *dql, unsigned int hold_time);
>
> +/* Set the dql minimum limit */
> +void dql_set_min_limit(struct dql *dql, unsigned int min_limit);
> +
>  #endif /* _KERNEL_ */
>
>  #endif /* _LINUX_DQL_H */
> diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
> index fde0aa244148..8b6ad1e0a2e3 100644
> --- a/lib/dynamic_queue_limits.c
> +++ b/lib/dynamic_queue_limits.c
> @@ -136,3 +136,11 @@ void dql_init(struct dql *dql, unsigned int hold_time)
>         dql_reset(dql);
>  }
>  EXPORT_SYMBOL(dql_init);
> +
> +void dql_set_min_limit(struct dql *dql, unsigned int min_limit)
> +{
> +#ifdef CONFIG_BQL
> +       dql->min_limit = min_limit;
> +#endif

Marc pointed some issue on the #ifdef in a separate thread:
https://lore.kernel.org/linux-can/20210309153547.q7zspf46k6terxqv@pengutronix.de/

I will come back with a v2 tomorrow.

> +}
> +EXPORT_SYMBOL(dql_set_min_limit);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/1] dql: add dql_set_min_limit()
  2021-03-09 15:23 ` [RFC PATCH 1/1] dql: add dql_set_min_limit() Vincent Mailhol
  2021-03-09 18:00   ` Vincent MAILHOL
@ 2021-03-09 19:44   ` Dave Taht
  2021-03-10 15:56     ` Vincent MAILHOL
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Taht @ 2021-03-09 19:44 UTC (permalink / raw)
  To: Vincent Mailhol
  Cc: Marc Kleine-Budde, linux-can, LKML,
	Linux Kernel Network Developers, Tom Herbert, Eric Dumazet,
	Peter Zijlstra, Randy Dunlap

I note that "proof" is very much in the developer's opinion and
limited testing base.

Actual operational experience, as in a real deployment, with other applications,
heavy context switching, or virtualization, might yield better results.

There's lots of defaults in the linux kernel that are just swags, the
default NAPI and rx/tx ring buffer sizes being two where devs just
copy/paste stuff, which either doesn't scale up, or doesn't scale
down.

This does not mean I oppose your patch! However I have two points I'd
like to make
regarding bql and dql in general that I have long longed be explored.

0) Me being an advocate of low latency in general, does mean that I
have no problem
and even prefer, starving the device rather than always keeping it busy.

/me hides

1) BQL is MIAD - multiplicative increase, additive decrease. While in
practice so far this does not seem to matter much (and also measuring
things down to "us" really hard), a stabler algorithm is AIMD. BQL
often absorbs a large TSO burst - usually a minimum of 128k is
observed on gbit, where a stabler state (without GSO) seemed to be
around 40k on many of the chipsets I worked with, back when I was
working in this area.

(cake's gso-splitting also gets lower bql values in general, if you
have enough cpu to run cake)

2) BQL + hardware mq is increasingly an issue in my mind in that, say,
you are hitting
64 hw queues, each with 128k stored in there, is additive, where in
order to service interrupts properly and keep the media busy might
only require 128k total, spread across the active queues and flows. I
have often thought that making BQL scale better to multiple hw queues
by globally sharing the buffering state(s), would lead to lower
latency, but
also that probably sharing that state would be too high overhead.

Having not worked out a solution to 2), and preferring to start with
1), and not having a whole lot of support for item 0) in the world, I
just thought I'd mention it, in the hope
someone might give it a go.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH 1/1] dql: add dql_set_min_limit()
  2021-03-09 19:44   ` Dave Taht
@ 2021-03-10 15:56     ` Vincent MAILHOL
  0 siblings, 0 replies; 5+ messages in thread
From: Vincent MAILHOL @ 2021-03-10 15:56 UTC (permalink / raw)
  To: Dave Taht
  Cc: Marc Kleine-Budde, linux-can, LKML,
	Linux Kernel Network Developers, Tom Herbert, Eric Dumazet,
	Peter Zijlstra, Randy Dunlap

Hi Dave,

Thanks for the comprehensive comments!

On Wed. 10 Mar 2021 at 04:44, Dave Taht <dave.taht@gmail.com> wrote:
>
> I note that "proof" is very much in the developer's opinion and
> limited testing base.
>
> Actual operational experience, as in a real deployment, with other applications,
> heavy context switching, or virtualization, might yield better results.

Agree. I was not thorough in my description, but what you pointed
here is actually what I had in mind (and what I did for my
driver).  Let me borrow your exemple and include those in the v2
of the patch.

> There's lots of defaults in the linux kernel that are just swags, the
> default NAPI and rx/tx ring buffer sizes being two where devs just
> copy/paste stuff, which either doesn't scale up, or doesn't scale
> down.
>
> This does not mean I oppose your patch! However I have two points I'd
> like to make
> regarding bql and dql in general that I have long longed be explored.
>
> 0) Me being an advocate of low latency in general, does mean that I
> have no problem
> and even prefer, starving the device rather than always keeping it busy.
>
> /me hides

Fully agree. The intent of this patch is for specific use cases
where setting a default dql.min_limit has minimum latency impact
for a noticeable throughput increase.

My use case is a CAN driver for a USB interface module. The
maximum PDU of CAN protocol is roughly 16 bytes, the USB maximum
packet size is 512 bytes. If I force dql.min_limit to be around
240 bytes (i.e. roughly 15 CAN frames), all 15 frames easily fit
in a single USB packet. Preparing a packet of 240 bytes is
relatively fast (small latency issue) but the gain of not having
to send 15 separate USB packets is huge (big throughput
increase).

My patch was really written for this specific context. However, I
am not knowledgeable enough on other network protocols to give
other examples where this new function could be applied (my blind
guess is that most of the protocol should *not* use it).

> 1) BQL is MIAD - multiplicative increase, additive decrease. While in
> practice so far this does not seem to matter much (and also measuring
> things down to "us" really hard), a stabler algorithm is AIMD. BQL
> often absorbs a large TSO burst - usually a minimum of 128k is
> observed on gbit, where a stabler state (without GSO) seemed to be
> around 40k on many of the chipsets I worked with, back when I was
> working in this area.
>
> (cake's gso-splitting also gets lower bql values in general, if you
> have enough cpu to run cake)
>
> 2) BQL + hardware mq is increasingly an issue in my mind in that, say,
> you are hitting
> 64 hw queues, each with 128k stored in there, is additive, where in
> order to service interrupts properly and keep the media busy might
> only require 128k total, spread across the active queues and flows. I
> have often thought that making BQL scale better to multiple hw queues
> by globally sharing the buffering state(s), would lead to lower
> latency, but
> also that probably sharing that state would be too high overhead.
>
> Having not worked out a solution to 2), and preferring to start with
> 1), and not having a whole lot of support for item 0) in the world, I
> just thought I'd mention it, in the hope
> someone might give it a go.

Thank you for the comments, however, I will be of small help
here. As mentioned above, my use cases are in bytes, not in
kilobytes. I lack experience here.

My experience is that BQL is not adapted for protocol with small
PDU and also not adapted for interfaces with a high
latency (e.g. USB) by default. But, modifying the dql.min_limit
solves it.

So, I let over people continue the discussion on points 1) and 2)


Yours sincerely,
Vincent

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-03-10 15:57 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09 15:23 [RFC PATCH 0/1] Modify dql.min_limit value inside the driver Vincent Mailhol
2021-03-09 15:23 ` [RFC PATCH 1/1] dql: add dql_set_min_limit() Vincent Mailhol
2021-03-09 18:00   ` Vincent MAILHOL
2021-03-09 19:44   ` Dave Taht
2021-03-10 15:56     ` Vincent MAILHOL

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.