* [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack @ 2018-01-30 16:56 Andri Yngvason 2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw) To: linux-can Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell, Andri Yngvason This has been reported to cause stalls on rt-linux. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> --- drivers/net/can/cc770/cc770.c | 15 --------------- 1 file changed, 15 deletions(-) diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c index 1e37313..9fed163 100644 --- a/drivers/net/can/cc770/cc770.c +++ b/drivers/net/can/cc770/cc770.c @@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev) stats->tx_bytes += dlc; - - /* - * HM: We had some cases of repeated IRQs so make sure the - * INT is acknowledged I know it's already further up, but - * doing again fixed the issue - */ - cc770_write_reg(priv, msgobj[mo].ctrl0, - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); - return NETDEV_TX_OK; } @@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device *dev, unsigned int o) /* Nothing more to send, switch off interrupts */ cc770_write_reg(priv, msgobj[mo].ctrl0, MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); - /* - * We had some cases of repeated IRQ so make sure the - * INT is acknowledged - */ - cc770_write_reg(priv, msgobj[mo].ctrl0, - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); stats->tx_packets++; can_get_echo_skb(dev, 0); -- 1.9.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason @ 2018-01-30 16:56 ` Andri Yngvason 2018-02-12 19:40 ` Richard Weinberger 2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason 2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger 2 siblings, 1 reply; 16+ messages in thread From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw) To: linux-can Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell, Andri Yngvason If the queue is not stopped, the start_xmit function will continue to be called until the driver or the system is reset. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> --- drivers/net/can/cc770/cc770.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c index 9fed163..12d3b89 100644 --- a/drivers/net/can/cc770/cc770.c +++ b/drivers/net/can/cc770/cc770.c @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev) if ((cc770_read_reg(priv, msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { + netif_stop_queue(dev); netdev_err(dev, "TX register is still occupied!\n"); return NETDEV_TX_BUSY; } -- 1.9.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason @ 2018-02-12 19:40 ` Richard Weinberger 2018-02-12 20:37 ` Wolfgang Grandegger 0 siblings, 1 reply; 16+ messages in thread From: Richard Weinberger @ 2018-02-12 19:40 UTC (permalink / raw) To: Andri Yngvason Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Andri, Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: > If the queue is not stopped, the start_xmit function will continue to be > called until the driver or the system is reset. > > Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > --- > drivers/net/can/cc770/cc770.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c > index 9fed163..12d3b89 100644 > --- a/drivers/net/can/cc770/cc770.c > +++ b/drivers/net/can/cc770/cc770.c > @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, > struct net_device *dev) > > if ((cc770_read_reg(priv, > msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { > + netif_stop_queue(dev); > netdev_err(dev, "TX register is still occupied!\n"); > return NETDEV_TX_BUSY; > } I see that some can drivers stop the queue, others not. What exactly is the problem in this case? Thanks, //richard -- sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria ATU66964118 - FN 374287y ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 19:40 ` Richard Weinberger @ 2018-02-12 20:37 ` Wolfgang Grandegger 2018-02-12 20:44 ` Wolfgang Grandegger 0 siblings, 1 reply; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-12 20:37 UTC (permalink / raw) To: Richard Weinberger, Andri Yngvason Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Hello, Am 12.02.2018 um 20:40 schrieb Richard Weinberger: > Andri, > > Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >> If the queue is not stopped, the start_xmit function will continue to be >> called until the driver or the system is reset. >> >> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >> --- >> drivers/net/can/cc770/cc770.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c >> index 9fed163..12d3b89 100644 >> --- a/drivers/net/can/cc770/cc770.c >> +++ b/drivers/net/can/cc770/cc770.c >> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, >> struct net_device *dev) >> >> if ((cc770_read_reg(priv, >> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >> + netif_stop_queue(dev); >> netdev_err(dev, "TX register is still occupied!\n"); >> return NETDEV_TX_BUSY; >> } > > I see that some can drivers stop the queue, others not. > What exactly is the problem in this case? Stopping the queue and returning NETDEV_TX_BUSY does make little sense. NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that case should be avoided by stopping the queue earlier. Actually, netif_stop_queue(dev) is called directly after the if block above, which means that "TX register is still occupied" indicates an error/bug. I think there is a race somewhere. Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 20:37 ` Wolfgang Grandegger @ 2018-02-12 20:44 ` Wolfgang Grandegger 2018-02-12 20:55 ` Richard Weinberger 0 siblings, 1 reply; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-12 20:44 UTC (permalink / raw) To: Richard Weinberger, Andri Yngvason Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: > Hello, > > Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >> Andri, >> >> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>> If the queue is not stopped, the start_xmit function will continue to be >>> called until the driver or the system is reset. >>> >>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>> --- >>> drivers/net/can/cc770/cc770.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/net/can/cc770/cc770.c >>> b/drivers/net/can/cc770/cc770.c >>> index 9fed163..12d3b89 100644 >>> --- a/drivers/net/can/cc770/cc770.c >>> +++ b/drivers/net/can/cc770/cc770.c >>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>> sk_buff *skb, >>> struct net_device *dev) >>> >>> if ((cc770_read_reg(priv, >>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>> + netif_stop_queue(dev); >>> netdev_err(dev, "TX register is still occupied!\n"); >>> return NETDEV_TX_BUSY; >>> } >> >> I see that some can drivers stop the queue, others not. >> What exactly is the problem in this case? > > Stopping the queue and returning NETDEV_TX_BUSY does make little sense. > NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that > case should be avoided by stopping the queue earlier. Actually, > netif_stop_queue(dev) is called directly after the if block above, > which means that "TX register is still occupied" indicates an error/bug. > I think there is a race somewhere. If you run on "-rt", the IRQ is threaded. This would explain the issue. Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 20:44 ` Wolfgang Grandegger @ 2018-02-12 20:55 ` Richard Weinberger 2018-02-12 21:28 ` Wolfgang Grandegger 0 siblings, 1 reply; 16+ messages in thread From: Richard Weinberger @ 2018-02-12 20:55 UTC (permalink / raw) To: Wolfgang Grandegger Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: > Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: > > Hello, > > > > Am 12.02.2018 um 20:40 schrieb Richard Weinberger: > >> Andri, > >> > >> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: > >>> If the queue is not stopped, the start_xmit function will continue to be > >>> called until the driver or the system is reset. > >>> > >>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > >>> --- > >>> drivers/net/can/cc770/cc770.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/drivers/net/can/cc770/cc770.c > >>> b/drivers/net/can/cc770/cc770.c > >>> index 9fed163..12d3b89 100644 > >>> --- a/drivers/net/can/cc770/cc770.c > >>> +++ b/drivers/net/can/cc770/cc770.c > >>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct > >>> sk_buff *skb, > >>> struct net_device *dev) > >>> > >>> if ((cc770_read_reg(priv, > >>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { > >>> + netif_stop_queue(dev); > >>> netdev_err(dev, "TX register is still occupied!\n"); > >>> return NETDEV_TX_BUSY; > >>> } > >> > >> I see that some can drivers stop the queue, others not. > >> What exactly is the problem in this case? > > > > Stopping the queue and returning NETDEV_TX_BUSY does make little sense. > > NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that > > case should be avoided by stopping the queue earlier. Actually, > > netif_stop_queue(dev) is called directly after the if block above, > > which means that "TX register is still occupied" indicates an error/bug. > > I think there is a race somewhere. > > If you run on "-rt", the IRQ is threaded. This would explain the issue. > > Wolfgang. Andri, do you see the "TX register is still occupied!" error message? In my case, I never saw it while working on the rt-related stalls during my tests. Thanks, //richard -- sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria ATU66964118 - FN 374287y ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 20:55 ` Richard Weinberger @ 2018-02-12 21:28 ` Wolfgang Grandegger 2018-02-12 21:36 ` Wolfgang Grandegger 0 siblings, 1 reply; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-12 21:28 UTC (permalink / raw) To: Richard Weinberger Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Am 12.02.2018 um 21:55 schrieb Richard Weinberger: > Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: >> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: >>> Hello, >>> >>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >>>> Andri, >>>> >>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>>>> If the queue is not stopped, the start_xmit function will continue to be >>>>> called until the driver or the system is reset. >>>>> >>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>>>> --- >>>>> drivers/net/can/cc770/cc770.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/drivers/net/can/cc770/cc770.c >>>>> b/drivers/net/can/cc770/cc770.c >>>>> index 9fed163..12d3b89 100644 >>>>> --- a/drivers/net/can/cc770/cc770.c >>>>> +++ b/drivers/net/can/cc770/cc770.c >>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>>>> sk_buff *skb, >>>>> struct net_device *dev) >>>>> >>>>> if ((cc770_read_reg(priv, >>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>>>> + netif_stop_queue(dev); >>>>> netdev_err(dev, "TX register is still occupied!\n"); >>>>> return NETDEV_TX_BUSY; >>>>> } >>>> >>>> I see that some can drivers stop the queue, others not. >>>> What exactly is the problem in this case? >>> >>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. >>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that >>> case should be avoided by stopping the queue earlier. Actually, >>> netif_stop_queue(dev) is called directly after the if block above, >>> which means that "TX register is still occupied" indicates an error/bug. >>> I think there is a race somewhere. >> >> If you run on "-rt", the IRQ is threaded. This would explain the issue. >> >> Wolfgang. > > Andri, do you see the "TX register is still occupied!" error message? > In my case, I never saw it while working on the rt-related stalls during my > tests. At a closer look I cannot identify a race. Anyway, the message should never show up. Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 21:28 ` Wolfgang Grandegger @ 2018-02-12 21:36 ` Wolfgang Grandegger 2018-02-13 10:32 ` Andri Yngvason 2018-02-16 14:22 ` Marc Kleine-Budde 0 siblings, 2 replies; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-12 21:36 UTC (permalink / raw) To: Richard Weinberger Cc: Andri Yngvason, linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger: > Am 12.02.2018 um 21:55 schrieb Richard Weinberger: >> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: >>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: >>>> Hello, >>>> >>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >>>>> Andri, >>>>> >>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>>>>> If the queue is not stopped, the start_xmit function will continue >>>>>> to be >>>>>> called until the driver or the system is reset. >>>>>> >>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>>>>> --- >>>>>> drivers/net/can/cc770/cc770.c | 1 + >>>>>> 1 file changed, 1 insertion(+) >>>>>> >>>>>> diff --git a/drivers/net/can/cc770/cc770.c >>>>>> b/drivers/net/can/cc770/cc770.c >>>>>> index 9fed163..12d3b89 100644 >>>>>> --- a/drivers/net/can/cc770/cc770.c >>>>>> +++ b/drivers/net/can/cc770/cc770.c >>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>>>>> sk_buff *skb, >>>>>> struct net_device *dev) >>>>>> >>>>>> if ((cc770_read_reg(priv, >>>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>>>>> + netif_stop_queue(dev); >>>>>> netdev_err(dev, "TX register is still occupied!\n"); >>>>>> return NETDEV_TX_BUSY; >>>>>> } >>>>> >>>>> I see that some can drivers stop the queue, others not. >>>>> What exactly is the problem in this case? >>>> >>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. >>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that >>>> case should be avoided by stopping the queue earlier. Actually, >>>> netif_stop_queue(dev) is called directly after the if block above, >>>> which means that "TX register is still occupied" indicates an >>>> error/bug. >>>> I think there is a race somewhere. >>> >>> If you run on "-rt", the IRQ is threaded. This would explain the issue. >>> >>> Wolfgang. >> >> Andri, do you see the "TX register is still occupied!" error message? >> In my case, I never saw it while working on the rt-related stalls >> during my >> tests. > > At a closer look I cannot identify a race. Anyway, the message should > never show up. You could try to get a function trace by enabling functions tracing and stopping it with "tracing_off" when the if block above is entered. A filter for "cc770_*" and "netif_*" might be useful as well. Wolfgang. > > Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 21:36 ` Wolfgang Grandegger @ 2018-02-13 10:32 ` Andri Yngvason 2018-02-13 16:30 ` Wolfgang Grandegger 2018-02-16 14:22 ` Marc Kleine-Budde 1 sibling, 1 reply; 16+ messages in thread From: Andri Yngvason @ 2018-02-13 10:32 UTC (permalink / raw) To: Wolfgang Grandegger, Richard Weinberger Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Hello Wolfgang, Richard, Quoting Wolfgang Grandegger (2018-02-12 21:36:37) > Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger: > > Am 12.02.2018 um 21:55 schrieb Richard Weinberger: > >> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: > >>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: > >>>> Hello, > >>>> > >>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: > >>>>> Andri, > >>>>> > >>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: > >>>>>> If the queue is not stopped, the start_xmit function will continue > >>>>>> to be > >>>>>> called until the driver or the system is reset. > >>>>>> > >>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > >>>>>> --- > >>>>>> drivers/net/can/cc770/cc770.c | 1 + > >>>>>> 1 file changed, 1 insertion(+) > >>>>>> > >>>>>> diff --git a/drivers/net/can/cc770/cc770.c > >>>>>> b/drivers/net/can/cc770/cc770.c > >>>>>> index 9fed163..12d3b89 100644 > >>>>>> --- a/drivers/net/can/cc770/cc770.c > >>>>>> +++ b/drivers/net/can/cc770/cc770.c > >>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct > >>>>>> sk_buff *skb, > >>>>>> struct net_device *dev) > >>>>>> > >>>>>> if ((cc770_read_reg(priv, > >>>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { > >>>>>> + netif_stop_queue(dev); > >>>>>> netdev_err(dev, "TX register is still occupied!\n"); > >>>>>> return NETDEV_TX_BUSY; > >>>>>> } > >>>>> > >>>>> I see that some can drivers stop the queue, others not. > >>>>> What exactly is the problem in this case? > >>>> > >>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. > >>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that > >>>> case should be avoided by stopping the queue earlier. Actually, > >>>> netif_stop_queue(dev) is called directly after the if block above, > >>>> which means that "TX register is still occupied" indicates an > >>>> error/bug. > >>>> I think there is a race somewhere. > >>> > >>> If you run on "-rt", the IRQ is threaded. This would explain the issue. > >>> > >>> Wolfgang. > >> > >> Andri, do you see the "TX register is still occupied!" error message? > >> In my case, I never saw it while working on the rt-related stalls > >> during my > >> tests. > > > > At a closer look I cannot identify a race. Anyway, the message should > > never show up. > > You could try to get a function trace by enabling functions tracing and > stopping it with "tracing_off" when the if block above is entered. A > filter for "cc770_*" and "netif_*" might be useful as well. > > Wolfgang. There is no issue currently that will cause NETDEV_TX_BUSY to be returned. I included this patch because while I was working on PATCH 3, this return would stall the system when I ran into the race condition which I subsequently fixed. Also, according to include/linux/netdevice.h: * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, * struct net_device *dev); * Called when a packet needs to be transmitted. * Returns NETDEV_TX_OK. Can return NETDEV_TX_BUSY, but you should stop * the queue before that can happen; it's for obsolete devices and weird * corner cases, but the stack really does a non-trivial amount * of useless work if you return NETDEV_TX_BUSY. * Required; cannot be NULL. All netdevs should stop the queue before returning NETDEV_TX_BUSY. Best regards, Andri ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-13 10:32 ` Andri Yngvason @ 2018-02-13 16:30 ` Wolfgang Grandegger 0 siblings, 0 replies; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-13 16:30 UTC (permalink / raw) To: Andri Yngvason, Richard Weinberger Cc: linux-can, mkl, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Am 13.02.2018 um 11:32 schrieb Andri Yngvason: > Hello Wolfgang, Richard, > > Quoting Wolfgang Grandegger (2018-02-12 21:36:37) >> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger: >>> Am 12.02.2018 um 21:55 schrieb Richard Weinberger: >>>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: >>>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: >>>>>> Hello, >>>>>> >>>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >>>>>>> Andri, >>>>>>> >>>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>>>>>>> If the queue is not stopped, the start_xmit function will continue >>>>>>>> to be >>>>>>>> called until the driver or the system is reset. >>>>>>>> >>>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>>>>>>> --- >>>>>>>> drivers/net/can/cc770/cc770.c | 1 + >>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>> >>>>>>>> diff --git a/drivers/net/can/cc770/cc770.c >>>>>>>> b/drivers/net/can/cc770/cc770.c >>>>>>>> index 9fed163..12d3b89 100644 >>>>>>>> --- a/drivers/net/can/cc770/cc770.c >>>>>>>> +++ b/drivers/net/can/cc770/cc770.c >>>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>>>>>>> sk_buff *skb, >>>>>>>> struct net_device *dev) >>>>>>>> >>>>>>>> if ((cc770_read_reg(priv, >>>>>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>>>>>>> + netif_stop_queue(dev); >>>>>>>> netdev_err(dev, "TX register is still occupied!\n"); >>>>>>>> return NETDEV_TX_BUSY; >>>>>>>> } >>>>>>> >>>>>>> I see that some can drivers stop the queue, others not. >>>>>>> What exactly is the problem in this case? >>>>>> >>>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. >>>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that >>>>>> case should be avoided by stopping the queue earlier. Actually, >>>>>> netif_stop_queue(dev) is called directly after the if block above, >>>>>> which means that "TX register is still occupied" indicates an >>>>>> error/bug. >>>>>> I think there is a race somewhere. >>>>> >>>>> If you run on "-rt", the IRQ is threaded. This would explain the issue. >>>>> >>>>> Wolfgang. >>>> >>>> Andri, do you see the "TX register is still occupied!" error message? >>>> In my case, I never saw it while working on the rt-related stalls >>>> during my >>>> tests. >>> >>> At a closer look I cannot identify a race. Anyway, the message should >>> never show up. >> >> You could try to get a function trace by enabling functions tracing and >> stopping it with "tracing_off" when the if block above is entered. A >> filter for "cc770_*" and "netif_*" might be useful as well. >> >> Wolfgang. > > There is no issue currently that will cause NETDEV_TX_BUSY to be returned. I > included this patch because while I was working on PATCH 3, this return would > stall the system when I ran into the race condition which I subsequently fixed. OK. > Also, according to include/linux/netdevice.h: > * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, > * struct net_device *dev); > * Called when a packet needs to be transmitted. > * Returns NETDEV_TX_OK. Can return NETDEV_TX_BUSY, but you should stop > * the queue before that can happen; it's for obsolete devices and weird > * corner cases, but the stack really does a non-trivial amount > * of useless work if you return NETDEV_TX_BUSY. > * Required; cannot be NULL. > > All netdevs should stop the queue before returning NETDEV_TX_BUSY. Thanks for the info, I was not aware of that. Without stopping the queue, the upper layer will retry asap... which may cause heavy CPU load... which is bad! Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-12 21:36 ` Wolfgang Grandegger 2018-02-13 10:32 ` Andri Yngvason @ 2018-02-16 14:22 ` Marc Kleine-Budde 2018-02-16 15:20 ` Wolfgang Grandegger 1 sibling, 1 reply; 16+ messages in thread From: Marc Kleine-Budde @ 2018-02-16 14:22 UTC (permalink / raw) To: Wolfgang Grandegger, Richard Weinberger Cc: Andri Yngvason, linux-can, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell [-- Attachment #1.1: Type: text/plain, Size: 3068 bytes --] On 02/12/2018 10:36 PM, Wolfgang Grandegger wrote: > Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger: >> Am 12.02.2018 um 21:55 schrieb Richard Weinberger: >>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: >>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: >>>>> Hello, >>>>> >>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >>>>>> Andri, >>>>>> >>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>>>>>> If the queue is not stopped, the start_xmit function will continue >>>>>>> to be >>>>>>> called until the driver or the system is reset. >>>>>>> >>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>>>>>> --- >>>>>>> drivers/net/can/cc770/cc770.c | 1 + >>>>>>> 1 file changed, 1 insertion(+) >>>>>>> >>>>>>> diff --git a/drivers/net/can/cc770/cc770.c >>>>>>> b/drivers/net/can/cc770/cc770.c >>>>>>> index 9fed163..12d3b89 100644 >>>>>>> --- a/drivers/net/can/cc770/cc770.c >>>>>>> +++ b/drivers/net/can/cc770/cc770.c >>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>>>>>> sk_buff *skb, >>>>>>> struct net_device *dev) >>>>>>> >>>>>>> if ((cc770_read_reg(priv, >>>>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>>>>>> + netif_stop_queue(dev); >>>>>>> netdev_err(dev, "TX register is still occupied!\n"); >>>>>>> return NETDEV_TX_BUSY; >>>>>>> } >>>>>> >>>>>> I see that some can drivers stop the queue, others not. >>>>>> What exactly is the problem in this case? >>>>> >>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. >>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that >>>>> case should be avoided by stopping the queue earlier. Actually, >>>>> netif_stop_queue(dev) is called directly after the if block above, >>>>> which means that "TX register is still occupied" indicates an >>>>> error/bug. >>>>> I think there is a race somewhere. >>>> >>>> If you run on "-rt", the IRQ is threaded. This would explain the issue. >>>> >>>> Wolfgang. >>> >>> Andri, do you see the "TX register is still occupied!" error message? >>> In my case, I never saw it while working on the rt-related stalls >>> during my >>> tests. >> >> At a closer look I cannot identify a race. Anyway, the message should >> never show up. > > You could try to get a function trace by enabling functions tracing and > stopping it with "tracing_off" when the if block above is entered. A > filter for "cc770_*" and "netif_*" might be useful as well. It seems to me these patches are still WIP, right? Or should I take some to into my git-tree? Marc -- Pengutronix e.K. | Marc Kleine-Budde | Industrial Linux Solutions | Phone: +49-231-2826-924 | Vertretung West/Dortmund | Fax: +49-5121-206917-5555 | Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de | [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY 2018-02-16 14:22 ` Marc Kleine-Budde @ 2018-02-16 15:20 ` Wolfgang Grandegger 0 siblings, 0 replies; 16+ messages in thread From: Wolfgang Grandegger @ 2018-02-16 15:20 UTC (permalink / raw) To: Marc Kleine-Budde, Richard Weinberger Cc: Andri Yngvason, linux-can, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Hello Marc, Am 16.02.2018 um 15:22 schrieb Marc Kleine-Budde: > On 02/12/2018 10:36 PM, Wolfgang Grandegger wrote: >> Am 12.02.2018 um 22:28 schrieb Wolfgang Grandegger: >>> Am 12.02.2018 um 21:55 schrieb Richard Weinberger: >>>> Am Montag, 12. Februar 2018, 21:44:11 CET schrieb Wolfgang Grandegger: >>>>> Am 12.02.2018 um 21:37 schrieb Wolfgang Grandegger: >>>>>> Hello, >>>>>> >>>>>> Am 12.02.2018 um 20:40 schrieb Richard Weinberger: >>>>>>> Andri, >>>>>>> >>>>>>> Am Dienstag, 30. Januar 2018, 17:56:48 CET schrieb Andri Yngvason: >>>>>>>> If the queue is not stopped, the start_xmit function will continue >>>>>>>> to be >>>>>>>> called until the driver or the system is reset. >>>>>>>> >>>>>>>> Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> >>>>>>>> --- >>>>>>>> drivers/net/can/cc770/cc770.c | 1 + >>>>>>>> 1 file changed, 1 insertion(+) >>>>>>>> >>>>>>>> diff --git a/drivers/net/can/cc770/cc770.c >>>>>>>> b/drivers/net/can/cc770/cc770.c >>>>>>>> index 9fed163..12d3b89 100644 >>>>>>>> --- a/drivers/net/can/cc770/cc770.c >>>>>>>> +++ b/drivers/net/can/cc770/cc770.c >>>>>>>> @@ -405,6 +405,7 @@ static netdev_tx_t cc770_start_xmit(struct >>>>>>>> sk_buff *skb, >>>>>>>> struct net_device *dev) >>>>>>>> >>>>>>>> if ((cc770_read_reg(priv, >>>>>>>> msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { >>>>>>>> + netif_stop_queue(dev); >>>>>>>> netdev_err(dev, "TX register is still occupied!\n"); >>>>>>>> return NETDEV_TX_BUSY; >>>>>>>> } >>>>>>> >>>>>>> I see that some can drivers stop the queue, others not. >>>>>>> What exactly is the problem in this case? >>>>>> >>>>>> Stopping the queue and returning NETDEV_TX_BUSY does make little sense. >>>>>> NETDEV_TX_BUSY will tell the upper layer to retry asap. In general that >>>>>> case should be avoided by stopping the queue earlier. Actually, >>>>>> netif_stop_queue(dev) is called directly after the if block above, >>>>>> which means that "TX register is still occupied" indicates an >>>>>> error/bug. >>>>>> I think there is a race somewhere. >>>>> >>>>> If you run on "-rt", the IRQ is threaded. This would explain the issue. >>>>> >>>>> Wolfgang. >>>> >>>> Andri, do you see the "TX register is still occupied!" error message? >>>> In my case, I never saw it while working on the rt-related stalls >>>> during my >>>> tests. >>> >>> At a closer look I cannot identify a race. Anyway, the message should >>> never show up. >> >> You could try to get a function trace by enabling functions tracing and >> stopping it with "tracing_off" when the if block above is entered. A >> filter for "cc770_*" and "netif_*" might be useful as well. > > It seems to me these patches are still WIP, right? Or should I take some > to into my git-tree? Richard? My comment was about "TX register is still occupied!, which in fact seems not to happen. Wolfgang. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply 2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason 2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason @ 2018-01-30 16:56 ` Andri Yngvason 2018-02-12 19:45 ` Richard Weinberger 2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger 2 siblings, 1 reply; 16+ messages in thread From: Andri Yngvason @ 2018-01-30 16:56 UTC (permalink / raw) To: linux-can Cc: mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell, Andri Yngvason While waiting for the TX object to send an RTR, an external message with a matching id can overwrite the TX data. In this case we must call the rx routine and then try transmitting the message that was overwritten again. The queue was being stalled because the RX event did not generate an interrupt to wake up the queue again and the TX event did not happen because the TXRQST flag is reset by the chip when new data is received. According to the CC770 datasheet the id of a message object should not be changed while the MSGVAL bit is set. This has been fixed by resetting the MSGVAL bit before modifying the object in the transmit function and setting it after. It is not enough to set & reset CPUUPD. It is important to keep the MSGVAL bit reset while the message object is being modified. Otherwise, during RTR transmission, a frame with matching id could trigger an rx-interrupt, which would cause a race condition between the interrupt routine and the transmit function. Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> --- Changes from v1: - Squashed 4 into 3 - Modified comment - Now using NEWDAT flag instead of MSGLST to check if a message was received into TX object. This allows us to check if the message object was overwritten more than once, and report it. drivers/net/can/cc770/cc770.c | 96 ++++++++++++++++++++++++++++++------------- drivers/net/can/cc770/cc770.h | 2 + 2 files changed, 69 insertions(+), 29 deletions(-) diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c index 12d3b89..e6458cb 100644 --- a/drivers/net/can/cc770/cc770.c +++ b/drivers/net/can/cc770/cc770.c @@ -390,38 +390,23 @@ static int cc770_get_berr_counter(const struct net_device *dev, return 0; } -static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev) +static void cc770_tx(struct net_device *dev, int mo) { struct cc770_priv *priv = netdev_priv(dev); - struct net_device_stats *stats = &dev->stats; - struct can_frame *cf = (struct can_frame *)skb->data; - unsigned int mo = obj2msgobj(CC770_OBJ_TX); + struct can_frame *cf = (struct can_frame *)priv->tx_skb->data; u8 dlc, rtr; u32 id; int i; - if (can_dropped_invalid_skb(dev, skb)) - return NETDEV_TX_OK; - - if ((cc770_read_reg(priv, - msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { - netif_stop_queue(dev); - netdev_err(dev, "TX register is still occupied!\n"); - return NETDEV_TX_BUSY; - } - - netif_stop_queue(dev); - dlc = cf->can_dlc; id = cf->can_id; - if (cf->can_id & CAN_RTR_FLAG) - rtr = 0; - else - rtr = MSGCFG_DIR; + rtr = cf->can_id & CAN_RTR_FLAG ? 0 : MSGCFG_DIR; + + cc770_write_reg(priv, msgobj[mo].ctrl0, + MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); cc770_write_reg(priv, msgobj[mo].ctrl1, RMTPND_RES | TXRQST_RES | CPUUPD_SET | NEWDAT_RES); - cc770_write_reg(priv, msgobj[mo].ctrl0, - MSGVAL_SET | TXIE_SET | RXIE_RES | INTPND_RES); + if (id & CAN_EFF_FLAG) { id &= CAN_EFF_MASK; cc770_write_reg(priv, msgobj[mo].config, @@ -440,13 +425,31 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev) for (i = 0; i < dlc; i++) cc770_write_reg(priv, msgobj[mo].data[i], cf->data[i]); - /* Store echo skb before starting the transfer */ - can_put_echo_skb(skb, dev, 0); - cc770_write_reg(priv, msgobj[mo].ctrl1, - RMTPND_RES | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC); + RMTPND_UNC | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC); + cc770_write_reg(priv, msgobj[mo].ctrl0, + MSGVAL_SET | TXIE_SET | RXIE_SET | INTPND_UNC); +} + +static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct cc770_priv *priv = netdev_priv(dev); + unsigned int mo = obj2msgobj(CC770_OBJ_TX); + + if (can_dropped_invalid_skb(dev, skb)) + return NETDEV_TX_OK; - stats->tx_bytes += dlc; + if ((cc770_read_reg(priv, + msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { + netif_stop_queue(dev); + netdev_err(dev, "TX register is still occupied!\n"); + return NETDEV_TX_BUSY; + } + + netif_stop_queue(dev); + + priv->tx_skb = skb; + cc770_tx(dev, mo); return NETDEV_TX_OK; } @@ -672,13 +675,47 @@ static void cc770_tx_interrupt(struct net_device *dev, unsigned int o) struct cc770_priv *priv = netdev_priv(dev); struct net_device_stats *stats = &dev->stats; unsigned int mo = obj2msgobj(o); + struct can_frame *cf; + u8 ctrl1; + + ctrl1 = cc770_read_reg(priv, msgobj[mo].ctrl1); - /* Nothing more to send, switch off interrupts */ cc770_write_reg(priv, msgobj[mo].ctrl0, MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); + cc770_write_reg(priv, msgobj[mo].ctrl1, + RMTPND_RES | TXRQST_RES | MSGLST_RES | NEWDAT_RES); - stats->tx_packets++; + if (unlikely(!priv->tx_skb)) { + netdev_err(dev, "missing tx skb in tx interrupt\n"); + return; + } + + if (unlikely(ctrl1 & MSGLST_SET)) { + stats->rx_over_errors++; + stats->rx_errors++; + } + + /* When the CC770 is sending an RTR message and it receives a regular + * message that matches the id of the RTR message, it will overwrite the + * outgoing message in the TX register. When this happens we must + * process the received message and try to transmit the outgoing skb + * again. + */ + if (unlikely(ctrl1 & NEWDAT_SET)) { + cc770_rx(dev, mo, ctrl1); + cc770_tx(dev, mo); + return; + } + + can_put_echo_skb(priv->tx_skb, dev, 0); can_get_echo_skb(dev, 0); + + cf = (struct can_frame *)priv->tx_skb->data; + stats->tx_bytes += cf->can_dlc; + stats->tx_packets++; + + priv->tx_skb = NULL; + netif_wake_queue(dev); } @@ -790,6 +827,7 @@ struct net_device *alloc_cc770dev(int sizeof_priv) priv->can.do_set_bittiming = cc770_set_bittiming; priv->can.do_set_mode = cc770_set_mode; priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES; + priv->tx_skb = NULL; memcpy(priv->obj_flags, cc770_obj_flags, sizeof(cc770_obj_flags)); diff --git a/drivers/net/can/cc770/cc770.h b/drivers/net/can/cc770/cc770.h index a1739db..95752e1 100644 --- a/drivers/net/can/cc770/cc770.h +++ b/drivers/net/can/cc770/cc770.h @@ -193,6 +193,8 @@ struct cc770_priv { u8 cpu_interface; /* CPU interface register */ u8 clkout; /* Clock out register */ u8 bus_config; /* Bus conffiguration register */ + + struct sk_buff *tx_skb; }; struct net_device *alloc_cc770dev(int sizeof_priv); -- 1.9.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply 2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason @ 2018-02-12 19:45 ` Richard Weinberger 0 siblings, 0 replies; 16+ messages in thread From: Richard Weinberger @ 2018-02-12 19:45 UTC (permalink / raw) To: Andri Yngvason Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Andri, Am Dienstag, 30. Januar 2018, 17:56:49 CET schrieb Andri Yngvason: > While waiting for the TX object to send an RTR, an external message with a > matching id can overwrite the TX data. In this case we must call the rx > routine and then try transmitting the message that was overwritten again. > > The queue was being stalled because the RX event did not generate an > interrupt to wake up the queue again and the TX event did not happen > because the TXRQST flag is reset by the chip when new data is received. > > According to the CC770 datasheet the id of a message object should not be > changed while the MSGVAL bit is set. This has been fixed by resetting the > MSGVAL bit before modifying the object in the transmit function and setting > it after. It is not enough to set & reset CPUUPD. > > It is important to keep the MSGVAL bit reset while the message object is > being modified. Otherwise, during RTR transmission, a frame with matching > id could trigger an rx-interrupt, which would cause a race condition > between the interrupt routine and the transmit function. > > Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > --- > Changes from v1: > - Squashed 4 into 3 > - Modified comment > - Now using NEWDAT flag instead of MSGLST to check if a message was > received into TX object. This allows us to check if the message object > was overwritten more than once, and report it. > > drivers/net/can/cc770/cc770.c | 96 > ++++++++++++++++++++++++++++++------------- drivers/net/can/cc770/cc770.h | > 2 + > 2 files changed, 69 insertions(+), 29 deletions(-) > > diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c > index 12d3b89..e6458cb 100644 > --- a/drivers/net/can/cc770/cc770.c > +++ b/drivers/net/can/cc770/cc770.c > @@ -390,38 +390,23 @@ static int cc770_get_berr_counter(const struct > net_device *dev, return 0; > } > > -static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device > *dev) +static void cc770_tx(struct net_device *dev, int mo) > { > struct cc770_priv *priv = netdev_priv(dev); > - struct net_device_stats *stats = &dev->stats; > - struct can_frame *cf = (struct can_frame *)skb->data; > - unsigned int mo = obj2msgobj(CC770_OBJ_TX); > + struct can_frame *cf = (struct can_frame *)priv->tx_skb->data; > u8 dlc, rtr; > u32 id; > int i; > > - if (can_dropped_invalid_skb(dev, skb)) > - return NETDEV_TX_OK; > - > - if ((cc770_read_reg(priv, > - msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { > - netif_stop_queue(dev); > - netdev_err(dev, "TX register is still occupied!\n"); > - return NETDEV_TX_BUSY; > - } > - > - netif_stop_queue(dev); > - > dlc = cf->can_dlc; > id = cf->can_id; > - if (cf->can_id & CAN_RTR_FLAG) > - rtr = 0; > - else > - rtr = MSGCFG_DIR; > + rtr = cf->can_id & CAN_RTR_FLAG ? 0 : MSGCFG_DIR; > + > + cc770_write_reg(priv, msgobj[mo].ctrl0, > + MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); > cc770_write_reg(priv, msgobj[mo].ctrl1, > RMTPND_RES | TXRQST_RES | CPUUPD_SET | NEWDAT_RES); > - cc770_write_reg(priv, msgobj[mo].ctrl0, > - MSGVAL_SET | TXIE_SET | RXIE_RES | INTPND_RES); > + > if (id & CAN_EFF_FLAG) { > id &= CAN_EFF_MASK; > cc770_write_reg(priv, msgobj[mo].config, > @@ -440,13 +425,31 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff > *skb, struct net_device *dev) for (i = 0; i < dlc; i++) > cc770_write_reg(priv, msgobj[mo].data[i], cf->data[i]); > > - /* Store echo skb before starting the transfer */ > - can_put_echo_skb(skb, dev, 0); > - > cc770_write_reg(priv, msgobj[mo].ctrl1, > - RMTPND_RES | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC); > + RMTPND_UNC | TXRQST_SET | CPUUPD_RES | NEWDAT_UNC); > + cc770_write_reg(priv, msgobj[mo].ctrl0, > + MSGVAL_SET | TXIE_SET | RXIE_SET | INTPND_UNC); > +} > + > +static netdev_tx_t cc770_start_xmit(struct sk_buff *skb, struct net_device > *dev) +{ > + struct cc770_priv *priv = netdev_priv(dev); > + unsigned int mo = obj2msgobj(CC770_OBJ_TX); > + > + if (can_dropped_invalid_skb(dev, skb)) > + return NETDEV_TX_OK; > > - stats->tx_bytes += dlc; > + if ((cc770_read_reg(priv, > + msgobj[mo].ctrl1) & TXRQST_UNC) == TXRQST_SET) { > + netif_stop_queue(dev); > + netdev_err(dev, "TX register is still occupied!\n"); > + return NETDEV_TX_BUSY; > + } > + > + netif_stop_queue(dev); > + > + priv->tx_skb = skb; > + cc770_tx(dev, mo); > > return NETDEV_TX_OK; > } > @@ -672,13 +675,47 @@ static void cc770_tx_interrupt(struct net_device *dev, > unsigned int o) struct cc770_priv *priv = netdev_priv(dev); > struct net_device_stats *stats = &dev->stats; > unsigned int mo = obj2msgobj(o); > + struct can_frame *cf; > + u8 ctrl1; > + > + ctrl1 = cc770_read_reg(priv, msgobj[mo].ctrl1); > > - /* Nothing more to send, switch off interrupts */ > cc770_write_reg(priv, msgobj[mo].ctrl0, > MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); > + cc770_write_reg(priv, msgobj[mo].ctrl1, > + RMTPND_RES | TXRQST_RES | MSGLST_RES | NEWDAT_RES); > > - stats->tx_packets++; > + if (unlikely(!priv->tx_skb)) { > + netdev_err(dev, "missing tx skb in tx interrupt\n"); > + return; > + } > + > + if (unlikely(ctrl1 & MSGLST_SET)) { > + stats->rx_over_errors++; > + stats->rx_errors++; > + } > + > + /* When the CC770 is sending an RTR message and it receives a regular > + * message that matches the id of the RTR message, it will overwrite the > + * outgoing message in the TX register. When this happens we must > + * process the received message and try to transmit the outgoing skb > + * again. > + */ > + if (unlikely(ctrl1 & NEWDAT_SET)) { > + cc770_rx(dev, mo, ctrl1); > + cc770_tx(dev, mo); > + return; > + } > + > + can_put_echo_skb(priv->tx_skb, dev, 0); > can_get_echo_skb(dev, 0); > + > + cf = (struct can_frame *)priv->tx_skb->data; > + stats->tx_bytes += cf->can_dlc; > + stats->tx_packets++; > + > + priv->tx_skb = NULL; > + > netif_wake_queue(dev); > } > > @@ -790,6 +827,7 @@ struct net_device *alloc_cc770dev(int sizeof_priv) > priv->can.do_set_bittiming = cc770_set_bittiming; > priv->can.do_set_mode = cc770_set_mode; > priv->can.ctrlmode_supported = CAN_CTRLMODE_3_SAMPLES; > + priv->tx_skb = NULL; > > memcpy(priv->obj_flags, cc770_obj_flags, sizeof(cc770_obj_flags)); > > diff --git a/drivers/net/can/cc770/cc770.h b/drivers/net/can/cc770/cc770.h > index a1739db..95752e1 100644 > --- a/drivers/net/can/cc770/cc770.h > +++ b/drivers/net/can/cc770/cc770.h > @@ -193,6 +193,8 @@ struct cc770_priv { > u8 cpu_interface; /* CPU interface register */ > u8 clkout; /* Clock out register */ > u8 bus_config; /* Bus conffiguration register */ > + > + struct sk_buff *tx_skb; > }; > > struct net_device *alloc_cc770dev(int sizeof_priv); Looks good to me. If this patch survives 1-2 days on my test bed, I'll replay with a Reviewed/Tested-by. Thanks, //richard -- sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria ATU66964118 - FN 374287y ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack 2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason 2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason 2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason @ 2018-02-12 19:41 ` Richard Weinberger [not found] ` <151851733257.10946.11726494714017260046@maxwell> 2 siblings, 1 reply; 16+ messages in thread From: Richard Weinberger @ 2018-02-12 19:41 UTC (permalink / raw) To: Andri Yngvason Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Andri, Am Dienstag, 30. Januar 2018, 17:56:47 CET schrieb Andri Yngvason: > This has been reported to cause stalls on rt-linux. > > Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > --- > drivers/net/can/cc770/cc770.c | 15 --------------- > 1 file changed, 15 deletions(-) > > diff --git a/drivers/net/can/cc770/cc770.c b/drivers/net/can/cc770/cc770.c > index 1e37313..9fed163 100644 > --- a/drivers/net/can/cc770/cc770.c > +++ b/drivers/net/can/cc770/cc770.c > @@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff > *skb, struct net_device *dev) > > stats->tx_bytes += dlc; > > - > - /* > - * HM: We had some cases of repeated IRQs so make sure the > - * INT is acknowledged I know it's already further up, but > - * doing again fixed the issue > - */ > - cc770_write_reg(priv, msgobj[mo].ctrl0, > - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); > - > return NETDEV_TX_OK; > } > > @@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device *dev, > unsigned int o) /* Nothing more to send, switch off interrupts */ > cc770_write_reg(priv, msgobj[mo].ctrl0, > MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); > - /* > - * We had some cases of repeated IRQ so make sure the > - * INT is acknowledged > - */ > - cc770_write_reg(priv, msgobj[mo].ctrl0, > - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); > > stats->tx_packets++; > can_get_echo_skb(dev, 0); I saw that stalls too. But instead of just removing these writes I suggest adding a quirk for the can card where they cause trouble. Maybe Wolfgang can give more details why they are here. Thanks, //richard -- sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria ATU66964118 - FN 374287y ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <151851733257.10946.11726494714017260046@maxwell>]
* Re: [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack [not found] ` <151851733257.10946.11726494714017260046@maxwell> @ 2018-02-13 10:35 ` Richard Weinberger 0 siblings, 0 replies; 16+ messages in thread From: Richard Weinberger @ 2018-02-13 10:35 UTC (permalink / raw) To: Andri Yngvason Cc: linux-can, mkl, wg, sigurbjorn.narfason, hrafnkell.eiriksson, patric.thysell Andri, Am Dienstag, 13. Februar 2018, 11:22:12 CET schrieb Andri Yngvason: > Hi Richard, > > Quoting Richard Weinberger (2018-02-12 19:41:48) > > > Andri, > > > > Am Dienstag, 30. Januar 2018, 17:56:47 CET schrieb Andri Yngvason: > > > This has been reported to cause stalls on rt-linux. > > > > > > Signed-off-by: Andri Yngvason <andri.yngvason@marel.com> > > > --- > > > > > > drivers/net/can/cc770/cc770.c | 15 --------------- > > > 1 file changed, 15 deletions(-) > > > > > > diff --git a/drivers/net/can/cc770/cc770.c > > > b/drivers/net/can/cc770/cc770.c > > > index 1e37313..9fed163 100644 > > > --- a/drivers/net/can/cc770/cc770.c > > > +++ b/drivers/net/can/cc770/cc770.c > > > @@ -447,15 +447,6 @@ static netdev_tx_t cc770_start_xmit(struct sk_buff > > > *skb, struct net_device *dev) > > > > > > stats->tx_bytes += dlc; > > > > > > - > > > - /* > > > - * HM: We had some cases of repeated IRQs so make sure the > > > - * INT is acknowledged I know it's already further up, but > > > - * doing again fixed the issue > > > - */ > > > - cc770_write_reg(priv, msgobj[mo].ctrl0, > > > - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); > > > - > > > > > > return NETDEV_TX_OK; > > > > > > } > > > > > > @@ -684,12 +675,6 @@ static void cc770_tx_interrupt(struct net_device > > > *dev, > > > unsigned int o) /* Nothing more to send, switch off interrupts */ > > > > > > cc770_write_reg(priv, msgobj[mo].ctrl0, > > > > > > MSGVAL_RES | TXIE_RES | RXIE_RES | INTPND_RES); > > > > > > - /* > > > - * We had some cases of repeated IRQ so make sure the > > > - * INT is acknowledged > > > - */ > > > - cc770_write_reg(priv, msgobj[mo].ctrl0, > > > - MSGVAL_UNC | TXIE_UNC | RXIE_UNC | INTPND_RES); > > > > > > stats->tx_packets++; > > > can_get_echo_skb(dev, 0); > > > > I saw that stalls too. > > But instead of just removing these writes I suggest adding a quirk for the > > can card where they cause trouble. > > Maybe Wolfgang can give more details why they are here. > > I did not run into this issue myself, I just included this because Patric > sent me this patch for an issue that you were having. I did not know the > original author, otherwise I would have added a "Suggested-by". I'm the author of that debug patch. :-) While reviewing the driver these acks alerted me and after removing them the can card no longer stopped working after some time. Thanks, //richard -- sigma star gmbh - Eduard-Bodem-Gasse 6 - 6020 Innsbruck - Austria ATU66964118 - FN 374287y ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2018-02-16 15:20 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2018-01-30 16:56 [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Andri Yngvason 2018-01-30 16:56 ` [PATCH v2 2/3] can: cc770: Stop queue on NETDEV_TX_BUSY Andri Yngvason 2018-02-12 19:40 ` Richard Weinberger 2018-02-12 20:37 ` Wolfgang Grandegger 2018-02-12 20:44 ` Wolfgang Grandegger 2018-02-12 20:55 ` Richard Weinberger 2018-02-12 21:28 ` Wolfgang Grandegger 2018-02-12 21:36 ` Wolfgang Grandegger 2018-02-13 10:32 ` Andri Yngvason 2018-02-13 16:30 ` Wolfgang Grandegger 2018-02-16 14:22 ` Marc Kleine-Budde 2018-02-16 15:20 ` Wolfgang Grandegger 2018-01-30 16:56 ` [PATCH v2 3/3] can: cc770: Fix queue stall & dropped RTR reply Andri Yngvason 2018-02-12 19:45 ` Richard Weinberger 2018-02-12 19:41 ` [PATCH v2 1/3] can: cc770: Remove redundant IRQ ack Richard Weinberger [not found] ` <151851733257.10946.11726494714017260046@maxwell> 2018-02-13 10:35 ` Richard Weinberger
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.