linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO
@ 2021-06-06 20:17 Dario Binacchi
  2021-06-06 20:17 ` [PATCH 1/3] can: c_can: exit c_can_do_tx() early if no frames have been sent Dario Binacchi
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Dario Binacchi @ 2021-06-06 20:17 UTC
  To: linux-kernel
  Cc: Gianluca Falavigna, Dario Binacchi, David S. Miller,
	Jakub Kicinski, Marc Kleine-Budde, Oliver Hartkopp, Tong Zhang,
	Vincent Mailhol, Wolfgang Grandegger, YueHaibing, Zhang Qilong,
	linux-can, netdev


Performance tests of the c_can driver led to the patch that gives the
series its name.


Dario Binacchi (3):
  can: c_can: exit c_can_do_tx() early if no frames have been sent
  can: c_can: support tx ring algorithm
  can: c_can: cache frames to operate as a true FIFO

 drivers/net/can/c_can/c_can.c | 100 ++++++++++++++++++++++++++--------
 drivers/net/can/c_can/c_can.h |  25 ++++++++-
 2 files changed, 101 insertions(+), 24 deletions(-)

-- 
2.17.1

* [PATCH 1/3] can: c_can: exit c_can_do_tx() early if no frames have been sent
  2021-06-06 20:17 [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
@ 2021-06-06 20:17 ` Dario Binacchi
  2021-06-06 20:17 ` [PATCH 2/3] can: c_can: support tx ring algorithm Dario Binacchi
  2021-06-06 20:17 ` [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
  2 siblings, 0 replies; 8+ messages in thread
From: Dario Binacchi @ 2021-06-06 20:17 UTC
  To: linux-kernel
  Cc: Gianluca Falavigna, Dario Binacchi, David S. Miller,
	Jakub Kicinski, Marc Kleine-Budde, Oliver Hartkopp, Tong Zhang,
	Vincent Mailhol, Wolfgang Grandegger, YueHaibing, Zhang Qilong,
	linux-can, netdev

c_can_poll() handles RX and TX events unconditionally, so c_can_do_tx()
may be called even though the interrupt was triggered only by the
reception of a frame. In that case no frame has been sent, so exit
immediately instead of executing unnecessary statements.
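The early exit can be illustrated with a small stand-alone sketch of a TX-completion loop over a pending-interrupt bitmask. struct tx_stats and do_tx_sketch() are hypothetical stand-ins for the net_device statistics and c_can_do_tx(). The LED event is modeled as a plain counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the net_device statistics. */
struct tx_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned int led_events; /* counts can_led_event()-like calls */
};

/*
 * Walk a TX interrupt-pending bitmask (bit i set = TX object i completed)
 * and account for the completed frames. dlc[i] is the length of the frame
 * loaded into TX object i. Returns the number of completed frames.
 */
static unsigned int do_tx_sketch(uint32_t pend, const uint8_t dlc[32],
				 struct tx_stats *stats)
{
	unsigned int pkts = 0, bytes = 0;

	assert(stats);

	for (unsigned int idx = 0; idx < 32; idx++) {
		if (pend & (UINT32_C(1) << idx)) {
			bytes += dlc[idx];
			pkts++;
		}
	}

	/* Interrupt caused by RX only: nothing was sent, skip the bookkeeping. */
	if (!pkts)
		return 0;

	stats->tx_bytes += bytes;
	stats->tx_packets += pkts;
	stats->led_events++;
	return pkts;
}
```

When the pending mask is empty, none of the statistics are touched and no LED event is raised.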

Signed-off-by: Dario Binacchi <dariobin@libero.it>
---

 drivers/net/can/c_can/c_can.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 313793f6922d..2b203bf004f9 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -721,17 +721,18 @@ static void c_can_do_tx(struct net_device *dev)
 		pkts++;
 	}
 
+	if (!pkts)
+		return;
+
 	/* Clear the bits in the tx_active mask */
 	atomic_sub(clr, &priv->tx_active);
 
 	if (clr & BIT(priv->msg_obj_tx_num - 1))
 		netif_wake_queue(dev);
 
-	if (pkts) {
-		stats->tx_bytes += bytes;
-		stats->tx_packets += pkts;
-		can_led_event(dev, CAN_LED_EVENT_TX);
-	}
+	stats->tx_bytes += bytes;
+	stats->tx_packets += pkts;
+	can_led_event(dev, CAN_LED_EVENT_TX);
 }
 
 /* If we have a gap in the pending bits, that means we either
-- 
2.17.1

* [PATCH 2/3] can: c_can: support tx ring algorithm
  2021-06-06 20:17 [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
  2021-06-06 20:17 ` [PATCH 1/3] can: c_can: exit c_can_do_tx() early if no frames have been sent Dario Binacchi
@ 2021-06-06 20:17 ` Dario Binacchi
  2021-06-06 20:17 ` [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
  2 siblings, 0 replies; 8+ messages in thread
From: Dario Binacchi @ 2021-06-06 20:17 UTC
  To: linux-kernel
  Cc: Gianluca Falavigna, Dario Binacchi, David S. Miller,
	Jakub Kicinski, Marc Kleine-Budde, Oliver Hartkopp, Tong Zhang,
	Vincent Mailhol, Wolfgang Grandegger, YueHaibing, Zhang Qilong,
	linux-can, netdev

This algorithm is already used successfully by other CAN drivers
(e.g. mcp251xfd). Marc Kleine-Budde kindly suggested its implementation
following a patch I had previously submitted. The full discussion is at
https://lore.kernel.org/patchwork/patch/1422929/.

After this patch, it should be easier to change the driver to use the
message object memory as a true FIFO.
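Below is a minimal sketch of the ring bookkeeping, modeled on c_can_get_tx_head() and c_can_get_tx_free() from this patch. The struct and function names are hypothetical. The explicit full-ring check is an addition of this sketch and is not taken from the patch.

```c
#include <assert.h>

/* Modeled on struct c_can_tx_ring: head/tail are free-running counters. */
struct tx_ring_sketch {
	unsigned int head;    /* advanced by the xmit path */
	unsigned int tail;    /* advanced by the TX-complete path */
	unsigned int obj_num; /* number of TX objects, a power of two */
};

/* Map a free-running counter onto a TX object index. */
static unsigned int ring_idx(const struct tx_ring_sketch *r, unsigned int v)
{
	assert(r->obj_num && (r->obj_num & (r->obj_num - 1)) == 0);
	return v & (r->obj_num - 1);
}

/*
 * Free objects under the "lowest buffer number wins" rule: once the
 * masked head has wrapped below the masked tail, nothing may be queued
 * until the ring drains. The head - tail == obj_num check is specific
 * to this sketch, so that a ring with every object in flight reports 0.
 */
static unsigned int ring_free(const struct tx_ring_sketch *r)
{
	unsigned int head = ring_idx(r, r->head);
	unsigned int tail = ring_idx(r, r->tail);

	if (r->head - r->tail == r->obj_num || head < tail)
		return 0;
	return r->obj_num - head;
}
```

Because head and tail are unsigned and only ever incremented, head - tail stays correct even after the counters wrap around UINT_MAX. Masking with obj_num - 1 works only because obj_num is a power of two.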

Suggested-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Dario Binacchi <dariobin@libero.it>
---

 drivers/net/can/c_can/c_can.c | 81 +++++++++++++++++++++++++++--------
 drivers/net/can/c_can/c_can.h | 19 +++++++-
 2 files changed, 82 insertions(+), 18 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 2b203bf004f9..0548485f522d 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -427,24 +427,64 @@ static void c_can_setup_receive_object(struct net_device *dev, int iface,
 	c_can_object_put(dev, iface, obj, IF_COMM_RCV_SETUP);
 }
 
+static u8 c_can_get_tx_free(const struct c_can_tx_ring *ring)
+{
+	u8 head = c_can_get_tx_head(ring);
+	u8 tail = c_can_get_tx_tail(ring);
+
+	/* This is not a FIFO. C/D_CAN sends out the buffers
+	 * prioritized. The lowest buffer number wins.
+	 */
+	if (head < tail)
+		return 0;
+
+	return ring->obj_num - head;
+}
+
+static bool c_can_tx_busy(const struct c_can_priv *priv,
+			  const struct c_can_tx_ring *tx_ring)
+{
+	if (c_can_get_tx_free(tx_ring) > 0)
+		return false;
+
+	netif_stop_queue(priv->dev);
+
+	/* Memory barrier before checking tx_free (head and tail) */
+	smp_mb();
+
+	if (c_can_get_tx_free(tx_ring) == 0) {
+		netdev_dbg(priv->dev,
+			   "Stopping tx-queue (tx_head=0x%08x, tx_tail=0x%08x, len=%d).\n",
+			   tx_ring->head, tx_ring->tail,
+			   tx_ring->head - tx_ring->tail);
+		return true;
+	}
+
+	netif_start_queue(priv->dev);
+	return false;
+}
+
 static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 				    struct net_device *dev)
 {
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
+	struct c_can_tx_ring *tx_ring = &priv->tx;
 	u32 idx, obj;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
-	/* This is not a FIFO. C/D_CAN sends out the buffers
-	 * prioritized. The lowest buffer number wins.
-	 */
-	idx = fls(atomic_read(&priv->tx_active));
-	obj = idx + priv->msg_obj_tx_first;
 
-	/* If this is the last buffer, stop the xmit queue */
-	if (idx == priv->msg_obj_tx_num - 1)
+	if (c_can_tx_busy(priv, tx_ring))
+		return NETDEV_TX_BUSY;
+
+	idx = c_can_get_tx_head(tx_ring);
+	tx_ring->head++;
+	if (c_can_get_tx_free(tx_ring) == 0)
 		netif_stop_queue(dev);
+
+	obj = idx + priv->msg_obj_tx_first;
+
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
 	 * transmit as we might race against do_tx().
@@ -453,8 +493,6 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	priv->dlc[idx] = frame->len;
 	can_put_echo_skb(skb, dev, idx, 0);
 
-	/* Update the active bits */
-	atomic_add(BIT(idx), &priv->tx_active);
 	/* Start transmission */
 	c_can_object_put(dev, IF_TX, obj, IF_COMM_TX);
 
@@ -567,6 +605,7 @@ static int c_can_software_reset(struct net_device *dev)
 static int c_can_chip_config(struct net_device *dev)
 {
 	struct c_can_priv *priv = netdev_priv(dev);
+	struct c_can_tx_ring *tx_ring = &priv->tx;
 	int err;
 
 	err = c_can_software_reset(dev);
@@ -598,7 +637,8 @@ static int c_can_chip_config(struct net_device *dev)
 	priv->write_reg(priv, C_CAN_STS_REG, LEC_UNUSED);
 
 	/* Clear all internal status */
-	atomic_set(&priv->tx_active, 0);
+	tx_ring->head = 0;
+	tx_ring->tail = 0;
 	priv->rxmasked = 0;
 	priv->tx_dir = 0;
 
@@ -697,14 +737,14 @@ static int c_can_get_berr_counter(const struct net_device *dev,
 static void c_can_do_tx(struct net_device *dev)
 {
 	struct c_can_priv *priv = netdev_priv(dev);
+	struct c_can_tx_ring *tx_ring = &priv->tx;
 	struct net_device_stats *stats = &dev->stats;
-	u32 idx, obj, pkts = 0, bytes = 0, pend, clr;
+	u32 idx, obj, pkts = 0, bytes = 0, pend;
 
 	if (priv->msg_obj_tx_last > 32)
 		pend = priv->read_reg32(priv, C_CAN_INTPND3_REG);
 	else
 		pend = priv->read_reg(priv, C_CAN_INTPND2_REG);
-	clr = pend;
 
 	while ((idx = ffs(pend))) {
 		idx--;
@@ -724,11 +764,14 @@ static void c_can_do_tx(struct net_device *dev)
 	if (!pkts)
 		return;
 
-	/* Clear the bits in the tx_active mask */
-	atomic_sub(clr, &priv->tx_active);
-
-	if (clr & BIT(priv->msg_obj_tx_num - 1))
-		netif_wake_queue(dev);
+	tx_ring->tail += pkts;
+	if (c_can_get_tx_free(tx_ring)) {
+		/* Make sure that anybody stopping the queue after
+		 * this sees the new tx_ring->tail.
+		 */
+		smp_mb();
+		netif_wake_queue(priv->dev);
+	}
 
 	stats->tx_bytes += bytes;
 	stats->tx_packets += pkts;
@@ -1207,6 +1250,10 @@ struct net_device *alloc_c_can_dev(int msg_obj_num)
 	priv->msg_obj_tx_last =
 		priv->msg_obj_tx_first + priv->msg_obj_tx_num - 1;
 
+	priv->tx.head = 0;
+	priv->tx.tail = 0;
+	priv->tx.obj_num = msg_obj_tx_num;
+
 	netif_napi_add(dev, &priv->napi, c_can_poll, priv->msg_obj_rx_num);
 
 	priv->dev = dev;
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 06045f610f0e..c72cb6a7fd37 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -176,6 +176,13 @@ struct c_can_raminit {
 	bool needs_pulse;
 };
 
+/* c_can tx ring structure */
+struct c_can_tx_ring {
+	unsigned int head;
+	unsigned int tail;
+	unsigned int obj_num;
+};
+
 /* c_can private data structure */
 struct c_can_priv {
 	struct can_priv can;	/* must be the first member */
@@ -190,10 +197,10 @@ struct c_can_priv {
 	unsigned int msg_obj_tx_first;
 	unsigned int msg_obj_tx_last;
 	u32 msg_obj_rx_mask;
-	atomic_t tx_active;
 	atomic_t sie_pending;
 	unsigned long tx_dir;
 	int last_status;
+	struct c_can_tx_ring tx;
 	u16 (*read_reg)(const struct c_can_priv *priv, enum reg index);
 	void (*write_reg)(const struct c_can_priv *priv, enum reg index, u16 val);
 	u32 (*read_reg32)(const struct c_can_priv *priv, enum reg index);
@@ -219,4 +226,14 @@ int c_can_power_up(struct net_device *dev);
 int c_can_power_down(struct net_device *dev);
 #endif
 
+static inline u8 c_can_get_tx_head(const struct c_can_tx_ring *ring)
+{
+	return ring->head & (ring->obj_num - 1);
+}
+
+static inline u8 c_can_get_tx_tail(const struct c_can_tx_ring *ring)
+{
+	return ring->tail & (ring->obj_num - 1);
+}
+
 #endif /* C_CAN_H */
-- 
2.17.1

* [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO
  2021-06-06 20:17 [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
  2021-06-06 20:17 ` [PATCH 1/3] can: c_can: exit c_can_do_tx() early if no frames have been sent Dario Binacchi
  2021-06-06 20:17 ` [PATCH 2/3] can: c_can: support tx ring algorithm Dario Binacchi
@ 2021-06-06 20:17 ` Dario Binacchi
  2 siblings, 0 replies; 8+ messages in thread
From: Dario Binacchi @ 2021-06-06 20:17 UTC
  To: linux-kernel
  Cc: Gianluca Falavigna, Dario Binacchi, David S. Miller,
	Jakub Kicinski, Marc Kleine-Budde, Oliver Hartkopp, Tong Zhang,
	Vincent Mailhol, Wolfgang Grandegger, YueHaibing, Zhang Qilong,
	linux-can, netdev

As noted by a comment in c_can_start_xmit(), this was not a FIFO: the
C/D_CAN controller sends out the buffers prioritized, so the lowest
buffer number wins.

What did c_can_start_xmit() do if head was less than tail in the tx
ring? It waited until all the frames queued in the FIFO were actually
transmitted by the controller before accepting a new CAN frame to
transmit, even if the FIFO was not full. This ensured that the messages
were transmitted in the order in which they were loaded.

By storing the frames in the FIFO without requesting their
transmission, we can use the full size of the FIFO even in cases such
as the one described above. The TX interrupt handler requests the
transmission of these cached frames only once all the messages loaded
earlier, but stored in lower-priority buffer positions, have been
transmitted.
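To see why caching restores FIFO order, here is a single-threaded toy model. The names are hypothetical, and there is no locking and no real register access. The "controller" always sends the lowest-numbered object whose transmission has been requested. A frame loaded below the tail is only cached. When the tail wraps back to object 0, transmission is requested for every cached frame, which mirrors the c_can_do_tx() hunk of this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define OBJ_NUM 4u /* hypothetical number of TX objects (power of two) */

enum obj_state { OBJ_EMPTY, OBJ_CACHED, OBJ_REQUESTED };

/* Single-threaded toy model of the TX objects; no locking, no registers. */
struct fifo_model {
	unsigned int head, tail;        /* free-running ring counters */
	enum obj_state state[OBJ_NUM];
	int frame[OBJ_NUM];             /* frame id held by each object */
};

static unsigned int obj_idx(unsigned int v)
{
	return v & (OBJ_NUM - 1);
}

/* True-FIFO free count, same formula as this patch's c_can_get_tx_free(). */
static unsigned int tx_free(const struct fifo_model *m)
{
	assert(m->head - m->tail <= OBJ_NUM);
	return OBJ_NUM - (m->head - m->tail);
}

/* xmit: load the frame; cache it (no TX request) if it lands below the tail. */
static bool xmit(struct fifo_model *m, int id)
{
	unsigned int idx = obj_idx(m->head);

	if (tx_free(m) == 0)
		return false; /* the driver would have stopped the queue */
	m->frame[idx] = id;
	m->state[idx] = idx < obj_idx(m->tail) ? OBJ_CACHED : OBJ_REQUESTED;
	m->head++;
	return true;
}

/* Controller + TX-complete path: send the lowest requested object. */
static int send_one(struct fifo_model *m)
{
	for (unsigned int i = 0; i < OBJ_NUM; i++) {
		if (m->state[i] != OBJ_REQUESTED)
			continue;
		m->state[i] = OBJ_EMPTY;
		m->tail++;
		/* Tail wrapped to object 0: request all cached frames. */
		if (obj_idx(m->tail) == 0) {
			for (unsigned int j = 0; j < obj_idx(m->head); j++) {
				if (m->state[j] == OBJ_CACHED)
					m->state[j] = OBJ_REQUESTED;
			}
		}
		return m->frame[i];
	}
	return -1; /* nothing requested */
}
```

Consider four objects. Frames 5 and 6 land in objects 0 and 1 while frames 3 and 4 are still pending in objects 2 and 3. Because frames 5 and 6 are only cached, the lower object numbers cannot overtake, and the model sends 3, 4, 5, 6 in order.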

Suggested-by: Gianluca Falavigna <gianluca.falavigna@inwind.it>
Signed-off-by: Dario Binacchi <dariobin@libero.it>

---

 drivers/net/can/c_can/c_can.c | 42 ++++++++++++++++++++---------------
 drivers/net/can/c_can/c_can.h |  6 +++++
 2 files changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.c b/drivers/net/can/c_can/c_can.c
index 0548485f522d..9b809ea61094 100644
--- a/drivers/net/can/c_can/c_can.c
+++ b/drivers/net/can/c_can/c_can.c
@@ -427,20 +427,6 @@ static void c_can_setup_receive_object(struct net_device *dev, int iface,
 	c_can_object_put(dev, iface, obj, IF_COMM_RCV_SETUP);
 }
 
-static u8 c_can_get_tx_free(const struct c_can_tx_ring *ring)
-{
-	u8 head = c_can_get_tx_head(ring);
-	u8 tail = c_can_get_tx_tail(ring);
-
-	/* This is not a FIFO. C/D_CAN sends out the buffers
-	 * prioritized. The lowest buffer number wins.
-	 */
-	if (head < tail)
-		return 0;
-
-	return ring->obj_num - head;
-}
-
 static bool c_can_tx_busy(const struct c_can_priv *priv,
 			  const struct c_can_tx_ring *tx_ring)
 {
@@ -470,7 +456,7 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
 	struct c_can_tx_ring *tx_ring = &priv->tx;
-	u32 idx, obj;
+	u32 idx, obj, cmd = IF_COMM_TX;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
@@ -483,7 +469,11 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	if (c_can_get_tx_free(tx_ring) == 0)
 		netif_stop_queue(dev);
 
-	obj = idx + priv->msg_obj_tx_first;
+	spin_lock_bh(&priv->tx_lock);
+	if (idx < c_can_get_tx_tail(tx_ring))
+		cmd &= ~IF_COMM_TXRQST; /* Cache the message */
+	else
+		spin_unlock_bh(&priv->tx_lock);
 
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
@@ -492,9 +482,11 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	c_can_setup_tx_object(dev, IF_TX, frame, idx);
 	priv->dlc[idx] = frame->len;
 	can_put_echo_skb(skb, dev, idx, 0);
+	obj = idx + priv->msg_obj_tx_first;
+	c_can_object_put(dev, IF_TX, obj, cmd);
 
-	/* Start transmission */
-	c_can_object_put(dev, IF_TX, obj, IF_COMM_TX);
+	if (spin_is_locked(&priv->tx_lock))
+		spin_unlock_bh(&priv->tx_lock);
 
 	return NETDEV_TX_OK;
 }
@@ -740,6 +732,7 @@ static void c_can_do_tx(struct net_device *dev)
 	struct c_can_tx_ring *tx_ring = &priv->tx;
 	struct net_device_stats *stats = &dev->stats;
 	u32 idx, obj, pkts = 0, bytes = 0, pend;
+	u8 tail;
 
 	if (priv->msg_obj_tx_last > 32)
 		pend = priv->read_reg32(priv, C_CAN_INTPND3_REG);
@@ -776,6 +769,18 @@ static void c_can_do_tx(struct net_device *dev)
 	stats->tx_bytes += bytes;
 	stats->tx_packets += pkts;
 	can_led_event(dev, CAN_LED_EVENT_TX);
+
+	tail = c_can_get_tx_tail(tx_ring);
+
+	if (tail == 0) {
+		u8 head = c_can_get_tx_head(tx_ring);
+
+		/* Start transmission for all cached messages */
+		for (idx = tail; idx < head; idx++) {
+			obj = idx + priv->msg_obj_tx_first;
+			c_can_object_put(dev, IF_TX, obj, IF_COMM_TXRQST);
+		}
+	}
 }
 
 /* If we have a gap in the pending bits, that means we either
@@ -1238,6 +1243,7 @@ struct net_device *alloc_c_can_dev(int msg_obj_num)
 		return NULL;
 
 	priv = netdev_priv(dev);
+	spin_lock_init(&priv->tx_lock);
 	priv->msg_obj_num = msg_obj_num;
 	priv->msg_obj_rx_num = msg_obj_num - msg_obj_tx_num;
 	priv->msg_obj_rx_first = 1;
diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index c72cb6a7fd37..520daa77f876 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -200,6 +200,7 @@ struct c_can_priv {
 	atomic_t sie_pending;
 	unsigned long tx_dir;
 	int last_status;
+	spinlock_t tx_lock;
 	struct c_can_tx_ring tx;
 	u16 (*read_reg)(const struct c_can_priv *priv, enum reg index);
 	void (*write_reg)(const struct c_can_priv *priv, enum reg index, u16 val);
@@ -236,4 +237,9 @@ static inline u8 c_can_get_tx_tail(const struct c_can_tx_ring *ring)
 	return ring->tail & (ring->obj_num - 1);
 }
 
+static inline u8 c_can_get_tx_free(const struct c_can_tx_ring *ring)
+{
+	return ring->obj_num - (ring->head - ring->tail);
+}
+
 #endif /* C_CAN_H */
-- 
2.17.1

* Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO
  2021-05-10 12:36     ` Marc Kleine-Budde
@ 2021-05-13 11:23       ` Dario Binacchi
  0 siblings, 0 replies; 8+ messages in thread
From: Dario Binacchi @ 2021-05-13 11:23 UTC
  To: Marc Kleine-Budde
  Cc: linux-kernel, David S. Miller, Gianluca Falavigna,
	Jakub Kicinski, Oliver Hartkopp, Vincent Mailhol,
	Wolfgang Grandegger, linux-can, netdev

Hi Marc,

> On 10/05/2021 14:36 Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> 
>  
> On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> > On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > > [...]
> > 
> > The algorithm you implemented looks a bit too complicated to me. Let me
> > sketch the algorithm that's implemented by several other drivers.
> > 
> > - have a power of two number of TX objects
> > - add a number of objects to struct priv (tx_num)
> >   (or make it a define, if the number of tx objects is compile time fixed)
> > - add two "unsigned int" variables to your struct priv,
> >   one "tx_head", one "tx_tail"
> > - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> > - increment tx_head
> > - stop the tx_queue if there is no space or if the object with the
> >   lowest prio has been written
> > - in TX complete IRQ, handle priv->tx_tail object
> > - increment tx_tail
> > - wake queue if there is space but don't wake if we wait for the lowest
> >   prio object to be TX completed.
> > 
> > Special care needs to be taken to implement that lock-less and race
> > free. I suggest looking at the mcp251xfd driver.
> 
> After converting the driver to the implementation outlined above, it
> should be more straightforward to add the caching you implemented.
> 

I took some time to think about your suggestions.
I developed the submitted patch to improve CAN transmission
within the current driver design, in order to minimize the
risk of introducing bugs.
Unless I'm missing something, you are suggesting that I change
the driver design as a precondition for applying an updated
version of my patch. IMHO this would increase the risk of
introducing bugs, even in parts of the code that are considered
stable.
If the algorithm I implemented is too complicated, let's try
to simplify it starting from the submitted patch.

Looking forward to your reply. Thanks and regards,
Dario

> regards,
> Marc
> 
> -- 
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

* Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO
  2021-05-10 12:25   ` Marc Kleine-Budde
@ 2021-05-10 12:36     ` Marc Kleine-Budde
  2021-05-13 11:23       ` Dario Binacchi
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2021-05-10 12:36 UTC
  To: Dario Binacchi
  Cc: linux-kernel, David S. Miller, Gianluca Falavigna,
	Jakub Kicinski, Oliver Hartkopp, Vincent Mailhol,
	Wolfgang Grandegger, linux-can, netdev

On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > As reported by a comment in the c_can_start_xmit() this was not a FIFO.
> > C/D_CAN controller sends out the buffers prioritized so that the lowest
> > buffer number wins.
> > 
> > What did c_can_start_xmit() do if it found tx_active = 0x80000000 ? It
> > waited until the only frame of the FIFO was actually transmitted by the
> > controller. Only one message in the FIFO but we had to wait for it to
> > empty completely to ensure that the messages were transmitted in the
> > order in which they were loaded.
> > 
> > By storing the frames in the FIFO without requiring its transmission, we
> > will be able to use the full size of the FIFO even in cases such as the
> > one described above. The transmission interrupt will trigger their
> > transmission only when all the messages previously loaded but stored in
> > less priority positions of the buffers have been transmitted.
> 
> The algorithm you implemented looks a bit too complicated to me. Let me
> sketch the algorithm that's implemented by several other drivers.
> 
> - have a power of two number of TX objects
> - add a number of objects to struct priv (tx_num)
>   (or make it a define, if the number of tx objects is compile time fixed)
> - add two "unsigned int" variables to your struct priv,
>   one "tx_head", one "tx_tail"
> - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> - increment tx_head
> - stop the tx_queue if there is no space or if the object with the
>   lowest prio has been written
> - in TX complete IRQ, handle priv->tx_tail object
> - increment tx_tail
> - wake queue if there is space but don't wake if we wait for the lowest
>   prio object to be TX completed.
> 
> Special care needs to be taken to implement that lock-less and race
> free. I suggest looking at the mcp251xfd driver.

After converting the driver to the implementation outlined above, it
should be more straightforward to add the caching you implemented.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

* Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO
  2021-05-09 12:43 ` [PATCH 3/3] " Dario Binacchi
@ 2021-05-10 12:25   ` Marc Kleine-Budde
  2021-05-10 12:36     ` Marc Kleine-Budde
  0 siblings, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2021-05-10 12:25 UTC
  To: Dario Binacchi
  Cc: linux-kernel, David S. Miller, Gianluca Falavigna,
	Jakub Kicinski, Oliver Hartkopp, Vincent Mailhol,
	Wolfgang Grandegger, linux-can, netdev

On 09.05.2021 14:43:09, Dario Binacchi wrote:
> As reported by a comment in the c_can_start_xmit() this was not a FIFO.
> C/D_CAN controller sends out the buffers prioritized so that the lowest
> buffer number wins.
> 
> What did c_can_start_xmit() do if it found tx_active = 0x80000000 ? It
> waited until the only frame of the FIFO was actually transmitted by the
> controller. Only one message in the FIFO but we had to wait for it to
> empty completely to ensure that the messages were transmitted in the
> order in which they were loaded.
> 
> By storing the frames in the FIFO without requiring its transmission, we
> will be able to use the full size of the FIFO even in cases such as the
> one described above. The transmission interrupt will trigger their
> transmission only when all the messages previously loaded but stored in
> less priority positions of the buffers have been transmitted.

The algorithm you implemented looks a bit too complicated to me. Let me
sketch the algorithm that's implemented by several other drivers.

- have a power of two number of TX objects
- add a number of objects to struct priv (tx_num)
  (or make it a define, if the number of tx objects is compile time fixed)
- add two "unsigned int" variables to your struct priv,
  one "tx_head", one "tx_tail"
- the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
- increment tx_head
- stop the tx_queue if there is no space or if the object with the
  lowest prio has been written
- in TX complete IRQ, handle priv->tx_tail object
- increment tx_tail
- wake queue if there is space but don't wake if we wait for the lowest
  prio object to be TX completed.

Special care needs to be taken to implement that lock-less and race
free. I suggest looking at the mcp251xfd driver.
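The stop/wake rules in the list above can be written as pure predicates. This single-threaded sketch assumes what the outline assumes: tx_num is a power of two, and head and tail are free-running unsigned counters. The names are hypothetical. It deliberately leaves out the memory barriers and the netif_stop_queue()/netif_wake_queue() calls that a lock-less driver implementation needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring state following the outline above. */
struct ring {
	unsigned int head; /* tx_head */
	unsigned int tail; /* tx_tail */
	unsigned int num;  /* tx_num, a power of two */
};

/* hard_start_xmit(): pick the object to write and advance the head. */
static unsigned int xmit_slot(struct ring *r)
{
	assert(r->head - r->tail < r->num); /* caller saw free space */
	return r->head++ & (r->num - 1);
}

/* After writing: stop if full or if the lowest-prio object was written. */
static bool xmit_should_stop(const struct ring *r)
{
	return r->head - r->tail == r->num || (r->head & (r->num - 1)) == 0;
}

/*
 * After tail++: wake if there is space, unless the lowest-prio object is
 * still pending (head has wrapped to object 0 but the ring is not empty).
 */
static bool tx_done_should_wake(const struct ring *r)
{
	bool waiting_lowest_prio =
		(r->head & (r->num - 1)) == 0 && r->head != r->tail;

	return r->head - r->tail < r->num && !waiting_lowest_prio;
}
```

With four objects, the queue stops right after object 3 (the lowest priority) is written. It is woken only once the tail has caught up with the head.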

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

* [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO
  2021-05-09 12:43 [PATCH 0/3] " Dario Binacchi
@ 2021-05-09 12:43 ` Dario Binacchi
  2021-05-10 12:25   ` Marc Kleine-Budde
  0 siblings, 1 reply; 8+ messages in thread
From: Dario Binacchi @ 2021-05-09 12:43 UTC
  To: linux-kernel
  Cc: Dario Binacchi, David S. Miller, Gianluca Falavigna,
	Jakub Kicinski, Marc Kleine-Budde, Oliver Hartkopp,
	Vincent Mailhol, Wolfgang Grandegger, linux-can, netdev

As noted by a comment in c_can_start_xmit(), this was not a FIFO: the
C/D_CAN controller sends out the buffers prioritized, so the lowest
buffer number wins.

What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
waited until the only frame in the FIFO was actually transmitted by the
controller. Even with a single message in the FIFO, we had to wait for
it to empty completely to ensure that the messages were transmitted in
the order in which they were loaded.

By storing the frames in the FIFO without requesting their
transmission, we can use the full size of the FIFO even in cases such
as the one described above. The TX interrupt handler requests the
transmission of these cached frames only once all the messages loaded
earlier, but stored in lower-priority buffer positions, have been
transmitted.
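The tx_active example can be checked with a small sketch of the pre-patch index selection (idx = fls(tx_active)). Here fls32() reimplements the semantics of the kernel's fls(): it returns the 1-based position of the most significant set bit, or 0 if no bit is set. next_tx_idx() is a hypothetical helper, not driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Same semantics as the kernel's fls(): 1-based index of the most
 * significant set bit, or 0 if no bit is set. */
static unsigned int fls32(uint32_t x)
{
	unsigned int r = 0;

	while (x) {
		x >>= 1;
		r++;
	}
	return r;
}

/*
 * Hypothetical helper mirroring the pre-patch selection: the next TX
 * object is the one just above the highest busy object. Returns -1 when
 * no object is left above the busy ones, i.e. the caller must wait even
 * if lower objects are free.
 */
static int next_tx_idx(uint32_t tx_active, unsigned int tx_num)
{
	unsigned int idx;

	assert(tx_num >= 1 && tx_num <= 32);
	idx = fls32(tx_active);
	return idx < tx_num ? (int)idx : -1;
}
```

With 32 TX objects and tx_active = 0x80000000, fls() yields 32. No object is left above the busy one, even though objects 0 to 30 are free.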

Suggested-by: Gianluca Falavigna <gianluca.falavigna@inwind.it>
Signed-off-by: Dario Binacchi <dariobin@libero.it>


---

 drivers/net/can/c_can/c_can.h      |  3 ++
 drivers/net/can/c_can/c_can_main.c | 63 ++++++++++++++++++++++++------
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 4247ff80a29c..6abde6cbc0b1 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -191,6 +191,9 @@ struct c_can_priv {
 	unsigned int msg_obj_tx_last;
 	u32 msg_obj_rx_mask;
 	atomic_t tx_active;
+	atomic_t tx_cached;
+	spinlock_t tx_cached_lock;
+	atomic_t tx_avail;
 	atomic_t sie_pending;
 	unsigned long tx_dir;
 	int last_status;
diff --git a/drivers/net/can/c_can/c_can_main.c b/drivers/net/can/c_can/c_can_main.c
index 7588f70ca0fe..d2f44c07d47f 100644
--- a/drivers/net/can/c_can/c_can_main.c
+++ b/drivers/net/can/c_can/c_can_main.c
@@ -124,6 +124,9 @@
 				 IF_COMM_TXRQST |		 \
 				 IF_COMM_DATAA | IF_COMM_DATAB)
 
+#define IF_COMM_TX_FRAME	(IF_COMM_ARB | IF_COMM_CONTROL | \
+				 IF_COMM_DATAA | IF_COMM_DATAB)
+
 /* For the low buffers we clear the interrupt bit, but keep newdat */
 #define IF_COMM_RCV_LOW		(IF_COMM_MASK | IF_COMM_ARB | \
 				 IF_COMM_CONTROL | IF_COMM_CLR_INT_PND | \
@@ -432,19 +435,36 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 {
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 idx, obj;
+	u32 idx, obj, tx_active, tx_cached;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
-	/* This is not a FIFO. C/D_CAN sends out the buffers
-	 * prioritized. The lowest buffer number wins.
-	 */
-	idx = fls(atomic_read(&priv->tx_active));
-	obj = idx + priv->msg_obj_tx_first;
 
-	/* If this is the last buffer, stop the xmit queue */
-	if (idx == priv->msg_obj_tx_num - 1)
+	if (atomic_read(&priv->tx_avail) == 0)
 		netif_stop_queue(dev);
+
+	tx_active = atomic_read(&priv->tx_active);
+	tx_cached = atomic_read(&priv->tx_cached);
+	idx = fls(tx_active);
+	if (idx > priv->msg_obj_tx_num - 1) {
+		idx = fls(tx_cached);
+
+		obj = idx + priv->msg_obj_tx_first;
+		spin_lock_bh(&priv->tx_cached_lock);
+		/* prepare message object for transmission */
+		c_can_setup_tx_object(dev, IF_TX, frame, idx);
+		/* Store the message but don't ask for its transmission */
+		c_can_object_put(dev, IF_TX, obj, IF_COMM_TX_FRAME);
+		spin_unlock_bh(&priv->tx_cached_lock);
+		priv->dlc[idx] = frame->len;
+		can_put_echo_skb(skb, dev, idx, 0);
+		atomic_dec(&priv->tx_avail);
+		atomic_add(BIT(idx), &priv->tx_cached);
+		return NETDEV_TX_OK;
+	}
+
+	obj = idx + priv->msg_obj_tx_first;
+
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
 	 * transmit as we might race against do_tx().
@@ -453,6 +473,7 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	priv->dlc[idx] = frame->len;
 	can_put_echo_skb(skb, dev, idx, 0);
 
+	atomic_dec(&priv->tx_avail);
 	/* Update the active bits */
 	atomic_add(BIT(idx), &priv->tx_active);
 	/* Start transmission */
@@ -599,6 +620,8 @@ static int c_can_chip_config(struct net_device *dev)
 
 	/* Clear all internal status */
 	atomic_set(&priv->tx_active, 0);
+	atomic_set(&priv->tx_cached, 0);
+	atomic_set(&priv->tx_avail, priv->msg_obj_tx_num);
 	priv->tx_dir = 0;
 
 	/* set bittiming params */
@@ -723,14 +746,31 @@ static void c_can_do_tx(struct net_device *dev)
 	/* Clear the bits in the tx_active mask */
 	atomic_sub(clr, &priv->tx_active);
 
-	if (clr & BIT(priv->msg_obj_tx_num - 1))
-		netif_wake_queue(dev);
-
 	if (pkts) {
+		atomic_add(pkts, &priv->tx_avail);
+
+		if (netif_queue_stopped(dev))
+			netif_wake_queue(dev);
+
 		stats->tx_bytes += bytes;
 		stats->tx_packets += pkts;
 		can_led_event(dev, CAN_LED_EVENT_TX);
 	}
+
+	if (atomic_read(&priv->tx_active) == 0) {
+		pend = atomic_read(&priv->tx_cached);
+
+		clr = pend;
+		while ((idx = ffs(pend))) {
+			idx--;
+			pend &= ~(1 << idx);
+
+			obj = idx + priv->msg_obj_tx_first;
+			c_can_object_put(dev, IF_TX, obj, IF_COMM_TXRQST);
+		}
+		atomic_sub(clr, &priv->tx_cached);
+		atomic_add(clr, &priv->tx_active);
+	}
 }
 
 /* If we have a gap in the pending bits, that means we either
@@ -1193,6 +1233,7 @@ struct net_device *alloc_c_can_dev(int msg_obj_num)
 		return NULL;
 
 	priv = netdev_priv(dev);
+	spin_lock_init(&priv->tx_cached_lock);
 	priv->msg_obj_num = msg_obj_num;
 	priv->msg_obj_rx_num = msg_obj_num - msg_obj_tx_num;
 	priv->msg_obj_rx_first = 1;
-- 
2.17.1

Thread overview: 8+ messages
2021-06-06 20:17 [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
2021-06-06 20:17 ` [PATCH 1/3] can: c_can: exit c_can_do_tx() early if no frames have been sent Dario Binacchi
2021-06-06 20:17 ` [PATCH 2/3] can: c_can: support tx ring algorithm Dario Binacchi
2021-06-06 20:17 ` [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO Dario Binacchi
  -- strict thread matches above, loose matches on Subject: below --
2021-05-09 12:43 [PATCH 0/3] " Dario Binacchi
2021-05-09 12:43 ` [PATCH 3/3] " Dario Binacchi
2021-05-10 12:25   ` Marc Kleine-Budde
2021-05-10 12:36     ` Marc Kleine-Budde
2021-05-13 11:23       ` Dario Binacchi