All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] can: c_can: Speed up rx_poll function
@ 2013-10-29  8:27 Markus Pargmann
  2013-10-29  8:31 ` Joe Perches
       [not found] ` <1383035688.2713.2.camel@joe-AO722>
  0 siblings, 2 replies; 5+ messages in thread
From: Markus Pargmann @ 2013-10-29  8:27 UTC (permalink / raw)
  To: Marc Kleine-Budde, Wolfgang Grandegger
  Cc: Joe Perches, linux-can, netdev, linux-kernel, kernel, Markus Pargmann

This patch speeds up the rx_poll function by reducing the number of
register reads.

Replace the 32bit register read by a 16bit register read. Currently
the 32bit register read is implemented by using 2 16bit reads. This is
inefficient as we only use the lower 16bit in rx_poll.

The for loop reads the pending interrupts in every iteration. This
leads up to 16 reads of pending interrupts. The patch introduces a new
outer loop to read the pending interrupts as long as 'quota' is above 0.
This reduces the total number of reads.

The third change is to replace the for-loop by a find_next_bit loop.

Tested on AM335x. I removed all 'static' and 'inline' from c_can.c to
see the timings for all functions. I used the function tracer with
trace_stats.

125kbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
With patch:
  c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us

1Mbit:
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
With patch:
  c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us

Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
---

Notes:
    Changes in v2:
     - Small changes, find_next_bit -> ffs and other

 drivers/net/can/c_can/c_can.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index a668cd4..428681e 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -798,17 +798,19 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
 	u32 num_rx_pkts = 0;
 	unsigned int msg_obj, msg_ctrl_save;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 val = c_can_read_reg32(priv, C_CAN_INTPND1_REG);
+	u16 val;
+
+	/*
+	 * It is faster to read only one 16bit register. This is only possible
+	 * for a maximum number of 16 objects.
+	 */
+	BUILD_BUG_ON_MSG(C_CAN_MSG_OBJ_RX_LAST > 16,
+			"Implementation does not support more message objects than 16");
+
+	while (quota > 0 && (val = priv->read_reg(priv, C_CAN_INTPND1_REG))) {
+		while ((msg_obj = ffs(val)) && quota > 0) {
+			val &= ~BIT(msg_obj - 1);
 
-	for (msg_obj = C_CAN_MSG_OBJ_RX_FIRST;
-			msg_obj <= C_CAN_MSG_OBJ_RX_LAST && quota > 0;
-			val = c_can_read_reg32(priv, C_CAN_INTPND1_REG),
-			msg_obj++) {
-		/*
-		 * as interrupt pending register's bit n-1 corresponds to
-		 * message object n, we need to handle the same properly.
-		 */
-		if (val & (1 << (msg_obj - 1))) {
 			c_can_object_get(dev, 0, msg_obj, IF_COMM_ALL &
 					~IF_COMM_TXRQST);
 			msg_ctrl_save = priv->read_reg(priv,
-- 
1.8.4.rc3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] can: c_can: Speed up rx_poll function
  2013-10-29  8:27 [PATCH v2] can: c_can: Speed up rx_poll function Markus Pargmann
@ 2013-10-29  8:31 ` Joe Perches
       [not found] ` <1383035688.2713.2.camel@joe-AO722>
  1 sibling, 0 replies; 5+ messages in thread
From: Joe Perches @ 2013-10-29  8:31 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel

On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> This patch speeds up the rx_poll function by reducing the number of
> register reads.
[]
> The third change is to replace the for-loop by a find_next_bit loop.

You need to update the commit message.

> diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
[]
> @@ -798,17 +798,19 @@ static int c_can_do_rx_poll(struct net_device *dev, int quota)
[]
> +	while (quota > 0 && (val = priv->read_reg(priv, C_CAN_INTPND1_REG))) {
> +		while ((msg_obj = ffs(val)) && quota > 0) {
> +			val &= ~BIT(msg_obj - 1);




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] can: c_can: Speed up rx_poll function
       [not found] ` <1383035688.2713.2.camel@joe-AO722>
@ 2013-10-29  8:58   ` Markus Pargmann
  2013-10-29 16:24     ` Joe Perches
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Pargmann @ 2013-10-29  8:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel

On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > This patch speeds up the rx_poll function by reducing the number of
> > register reads.
> []
> > 125kbit:
> >   Function                               Hit    Time            Avg             s^2
> >   --------                               ---    ----            ---             ---
> >   c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
> > With patch:
> >   c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us
> > 
> > 1Mbit:
> >   Function                               Hit    Time            Avg             s^2
> >   --------                               ---    ----            ---             ---
> >   c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
> > With patch:
> >   c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us
> 
> Also nicer if you updated the timings table
> 
> 

Yes I just measured the timings again:

./perf_can_test.sh 125000 30
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_poll                          127882    6178806 us     48.316 us       4393411 us 
  c_can_do_rx_poll                     63941    3764057 us     58.867 us       776162.2 us 
  c_can_enable_all_interrupts         255764    2213697 us     8.655 us        1093934 us 
  c_can_plat_write_reg_aligned_t      807252    1607115 us     1.990 us        10053684 us 
  c_can_isr                           127882    1220001 us     9.540 us        789.495 us  
  c_can_object_put                    119887    1039222 us     8.668 us        1608.668 us 
  c_can_plat_read_reg_aligned_to     1015072    1033283 us     1.017 us        7021.465 us 
  c_can_read_msg_object                63943    1026159 us     16.048 us       718.464 us  
  c_can_activate_all_lower_rx_ms        7992    755782.4 us     94.567 us       55.270 us   
  c_can_mark_rx_msg_obj                55951    709072.1 us     12.673 us       39.974 us   
  c_can_object_get                     63943    555669.2 us     8.690 us        96.211 us   
  c_can_msg_obj_is_busy               183830    527826.1 us     2.871 us        7289.221 us 
  alloc_can_skb                        63943    170966.6 us     2.673 us        165.765 us  
  c_can_has_and_handle_berr            63941    47063.18 us     0.736 us        2.757 us    

./perf_can_test.sh 1000000 30
  Function                               Hit    Time            Avg             s^2
  --------                               ---    ----            ---             ---
  c_can_poll                          270678    30290751 us     111.906 us      5809627 us 
  c_can_do_rx_poll                    207109    24322185 us     117.436 us      171469047 us 
  c_can_object_put                    841431    7278794 us     8.650 us        305841.0 us 
  c_can_plat_write_reg_aligned_t     4037180    6244636 us     1.546 us        4581066 us 
  c_can_read_msg_object               453988    6033464 us     13.289 us       128809.6 us 
  c_can_enable_all_interrupts         541342    5849826 us     10.806 us       22458900 us 
  c_can_activate_all_lower_rx_ms       55349    5237761 us     94.631 us       19004.79 us 
  c_can_mark_rx_msg_obj               387429    4865632 us     12.558 us       380606.4 us 
  c_can_plat_read_reg_aligned_to     4597629    4633247 us     1.007 us        315286.2 us 
  c_can_object_get                    453988    3919692 us     8.633 us        59847.76 us 
  c_can_msg_obj_is_busy              1295419    3706862 us     2.861 us        316655.7 us 
  c_can_isr                           270671    2496734 us     9.224 us        530.967 us  
  alloc_can_skb                       453988    856917.4 us     1.887 us        18157.68 us 
  c_can_activate_rx_msg_obj            11210    141177.4 us     12.593 us       123.068 us  
  c_can_has_and_handle_berr            63569    44995.85 us     0.707 us        12.780 us

It is interesting that the number of hits for c_can_do_rx_poll is twice as much
as it was with find_next_bit. Unfortunately this reduces the overall benefit of
this patch. Any ideas how to increase the number of waiting message objects we
handle in one poll call?


Regards,

Markus

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] can: c_can: Speed up rx_poll function
  2013-10-29  8:58   ` Markus Pargmann
@ 2013-10-29 16:24     ` Joe Perches
  2013-10-30  7:54       ` Markus Pargmann
  0 siblings, 1 reply; 5+ messages in thread
From: Joe Perches @ 2013-10-29 16:24 UTC (permalink / raw)
  To: Markus Pargmann
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel

On Tue, 2013-10-29 at 09:58 +0100, Markus Pargmann wrote:
> On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> > On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > > This patch speeds up the rx_poll function by reducing the number of
> > > register reads.
> > []
> > > 125kbit:
> > >   Function                               Hit    Time            Avg             s^2
> > >   --------                               ---    ----            ---             ---
> > >   c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
> > > With patch:
> > >   c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us
> > > 
> > > 1Mbit:
> > >   Function                               Hit    Time            Avg             s^2
> > >   --------                               ---    ----            ---             ---
> > >   c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
> > > With patch:
> > >   c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us
[]
> Yes I just measured the timings again:
[]
> ./perf_can_test.sh 125000 30
[]
>   c_can_do_rx_poll                     63941    3764057 us     58.867 us       776162.2 us 

Good, it's slightly faster still.

> ./perf_can_test.sh 1000000 30
[]
>   c_can_do_rx_poll                    207109    24322185 us     117.436 us      171469047 us 
[]
> It is interesting that the number of hits for c_can_do_rx_poll is twice as much
> as it was with find_next_bit.

How is this possible?  Any idea?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] can: c_can: Speed up rx_poll function
  2013-10-29 16:24     ` Joe Perches
@ 2013-10-30  7:54       ` Markus Pargmann
  0 siblings, 0 replies; 5+ messages in thread
From: Markus Pargmann @ 2013-10-30  7:54 UTC (permalink / raw)
  To: Joe Perches
  Cc: Marc Kleine-Budde, Wolfgang Grandegger, linux-can, netdev,
	linux-kernel, kernel

On Tue, Oct 29, 2013 at 09:24:05AM -0700, Joe Perches wrote:
> On Tue, 2013-10-29 at 09:58 +0100, Markus Pargmann wrote:
> > On Tue, Oct 29, 2013 at 01:34:48AM -0700, Joe Perches wrote:
> > > On Tue, 2013-10-29 at 09:27 +0100, Markus Pargmann wrote:
> > > > This patch speeds up the rx_poll function by reducing the number of
> > > > register reads.
> > > []
> > > > 125kbit:
> > > >   Function                               Hit    Time            Avg             s^2
> > > >   --------                               ---    ----            ---             ---
> > > >   c_can_do_rx_poll                     63960    10168178 us     158.977 us      1493056 us
> > > > With patch:
> > > >   c_can_do_rx_poll                     63939    4268457 us     66.758 us       818790.9 us
> > > > 
> > > > 1Mbit:
> > > >   Function                               Hit    Time            Avg             s^2
> > > >   --------                               ---    ----            ---             ---
> > > >   c_can_do_rx_poll                     69489    30049498 us     432.435 us      9271851 us
> > > > With patch:
> > > >   c_can_do_rx_poll                    103034    24220362 us     235.071 us      6016656 us
> []
> > Yes I just measured the timings again:
> []
> > ./perf_can_test.sh 125000 30
> []
> >   c_can_do_rx_poll                     63941    3764057 us     58.867 us       776162.2 us 
> 
> Good, it's slightly faster still.
> 
> > ./perf_can_test.sh 1000000 30
> []
> >   c_can_do_rx_poll                    207109    24322185 us     117.436 us      171469047 us 
> []
> > It is interesting that the number of hits for c_can_do_rx_poll is twice as much
> > as it was with find_next_bit.
> 
> How is this possible?  Any idea?

Perhaps the can core received more messages in the previous patch
version while being in the poll function. This way it can handle more
messages without a new interrupt/poll call.

The new patch is faster so it does not receive as many new messages as
the old one, leading to more interrupts and less handled messages per
poll call.

Regards,

Markus Pargmann

-- 
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-10-30  7:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-29  8:27 [PATCH v2] can: c_can: Speed up rx_poll function Markus Pargmann
2013-10-29  8:31 ` Joe Perches
     [not found] ` <1383035688.2713.2.camel@joe-AO722>
2013-10-29  8:58   ` Markus Pargmann
2013-10-29 16:24     ` Joe Perches
2013-10-30  7:54       ` Markus Pargmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.