From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anssi Hannula Subject: Re: [PATCH 5/5] can: xilinx_can: fix TX FIFO accounting getting out of sync Date: Mon, 13 Feb 2017 16:01:10 +0200 Message-ID: <0f108f45-396f-f12f-128a-aa44f61fe547@bitwise.fi> References: <20170213131208.20227-1-anssi.hannula@bitwise.fi> <20170213131208.20227-6-anssi.hannula@bitwise.fi> Mime-Version: 1.0 Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20170213131208.20227-6-anssi.hannula@bitwise.fi> Sender: stable-owner@vger.kernel.org To: Marc Kleine-Budde Cc: linux-can@vger.kernel.org, Kedareswara rao Appana , stable@vger.kernel.org List-Id: linux-can.vger.kernel.org On 13.2.2017 15:12, Anssi Hannula wrote: > The xilinx_can driver assumes that the TXOK interrupt only clears after > it has been acknowledged as many times as there have been successfully > sent frames. > > However, the documentation does not mention such behavior, instead > saying just that the interrupt is cleared when the clear bit is set. > > Similarly, testing seems to also suggest that it is immediately cleared > regardless of the amount of frames having been sent. Performing some > heavy TX load and then going back to idle has the tx_head drifting > further away from tx_tail over time, steadily reducing the amount of > frames the driver keeps in the TX FIFO (but not to zero, as the TXOK > interrupt always frees up space for 1 frame from the driver's > perspective, so frames continue to be sent) and delaying the local echo > frames. > > The TX FIFO tracking is also otherwise buggy as it does not account for > TX FIFO being cleared after software resets, causing > BUG!, TX FIFO full when queue awake! > messages to be output. > > Since there does not seem to be any way to accurately track the state of > the TX FIFO, drop the code trying to do that and perform > netif_stop_queue() based on whether the HW reports the FIFO being full. > Driver-side local echo support is also removed as it is not possible to > do accurately (relying instead on the CAN core fallback support). > > Tested with the integrated CAN on Zynq-7000 SoC. > > Fixes: b1201e44f50b ("can: xilinx CAN controller support") > Signed-off-by: Anssi Hannula > Cc: > --- > drivers/net/can/xilinx_can.c | 33 +++++++++------------------------ > 1 file changed, 9 insertions(+), 24 deletions(-) > > diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c > index 392be77..7cfb8b6d 100644 > --- a/drivers/net/can/xilinx_can.c > +++ b/drivers/net/can/xilinx_can.c [...] > @@ -832,17 +826,9 @@ static int xcan_rx_poll(struct napi_struct *napi, int quota) > static void xcan_tx_interrupt(struct net_device *ndev, u32 isr) > { > struct xcan_priv *priv = netdev_priv(ndev); > - struct net_device_stats *stats = &ndev->stats; > > - while ((priv->tx_head - priv->tx_tail > 0) && > - (isr & XCAN_IXR_TXOK_MASK)) { > - priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); > - can_get_echo_skb(ndev, priv->tx_tail % > - priv->tx_max); > - priv->tx_tail++; > - stats->tx_packets++; > - isr = priv->read_reg(priv, XCAN_ISR_OFFSET); > - } > + priv->write_reg(priv, XCAN_ICR_OFFSET, XCAN_IXR_TXOK_MASK); > + > can_led_event(ndev, CAN_LED_EVENT_TX); > xcan_update_error_state_after_rxtx(ndev); > netif_wake_queue(ndev); Hmm, actually I think this netif_wake_queue() is racy with start_xmit() as the latter could've filled the TX FIFO at this point... So maybe some locking is needed here, unless there is another way this is usually handled. -- Anssi Hannula / Bitwise Oy