linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ezchip: nps_enet: check if napi has been completed
@ 2017-03-29 10:41 Vlad Zakharov
  2017-03-29 14:52 ` Eric Dumazet
  2017-03-29 21:30 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Vlad Zakharov @ 2017-03-29 10:41 UTC (permalink / raw)
  To: netdev
  Cc: David S . Miller, Eric Dumazet, Elad Kanfi, Noam Camus,
	linux-kernel, linux-snps-arc, Vlad Zakharov

After a new NAPI_STATE_MISSED state was added to NAPI we can get into
this state and in such case we have to reschedule NAPI as some work is
still pending and we have to process it. napi_complete_done() function
returns false if we have to reschedule something (e.g. in case we were
in MISSED state) as current polling have not been completed yet.

nps_enet driver hasn't been verifying the return value of
napi_complete_done() and has been forcibly enabling interrupts. That is
not correct as we should not enable interrupts before we have processed
all scheduled work. As a result we were getting trapped in interrupt
hanlder chain as we had never been able to disabale ethernet
interrupts again.

So this patch makes nps_enet_poll() func verify return value of
napi_complete_done() and enable interrupts only in case all scheduled
work has been completed.

Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
---
 drivers/net/ethernet/ezchip/nps_enet.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 992ebe9..f819843 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -189,11 +189,9 @@ static int nps_enet_poll(struct napi_struct *napi, int budget)
 
 	nps_enet_tx_handler(ndev);
 	work_done = nps_enet_rx_handler(ndev);
-	if (work_done < budget) {
+	if ((work_done < budget) && napi_complete_done(napi, work_done)) {
 		u32 buf_int_enable_value = 0;
 
-		napi_complete_done(napi, work_done);
-
 		/* set tx_done and rx_rdy bits */
 		buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT;
 		buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-29 10:41 [PATCH] ezchip: nps_enet: check if napi has been completed Vlad Zakharov
@ 2017-03-29 14:52 ` Eric Dumazet
  2017-03-29 21:30 ` David Miller
  1 sibling, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2017-03-29 14:52 UTC (permalink / raw)
  To: Vlad Zakharov
  Cc: netdev, David S . Miller, Eric Dumazet, Elad Kanfi, Noam Camus,
	linux-kernel, linux-snps-arc

On Wed, 2017-03-29 at 13:41 +0300, Vlad Zakharov wrote:
> After a new NAPI_STATE_MISSED state was added to NAPI we can get into
> this state and in such case we have to reschedule NAPI as some work is
> still pending and we have to process it. napi_complete_done() function
> returns false if we have to reschedule something (e.g. in case we were
> in MISSED state) as current polling have not been completed yet.
> 
> nps_enet driver hasn't been verifying the return value of
> napi_complete_done() and has been forcibly enabling interrupts. That is
> not correct as we should not enable interrupts before we have processed
> all scheduled work. As a result we were getting trapped in interrupt
> hanlder chain as we had never been able to disabale ethernet
> interrupts again.
> 
> So this patch makes nps_enet_poll() func verify return value of
> napi_complete_done() and enable interrupts only in case all scheduled
> work has been completed.
> 
> Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
> ---
>  drivers/net/ethernet/ezchip/nps_enet.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
> index 992ebe9..f819843 100644
> --- a/drivers/net/ethernet/ezchip/nps_enet.c
> +++ b/drivers/net/ethernet/ezchip/nps_enet.c
> @@ -189,11 +189,9 @@ static int nps_enet_poll(struct napi_struct *napi, int budget)
>  
>  	nps_enet_tx_handler(ndev);
>  	work_done = nps_enet_rx_handler(ndev);
> -	if (work_done < budget) {
> +	if ((work_done < budget) && napi_complete_done(napi, work_done)) {
>  		u32 buf_int_enable_value = 0;
>  
> -		napi_complete_done(napi, work_done);
> -
>  		/* set tx_done and rx_rdy bits */
>  		buf_int_enable_value |= NPS_ENET_ENABLE << RX_RDY_SHIFT;
>  		buf_int_enable_value |= NPS_ENET_ENABLE << TX_DONE_SHIFT;

Seems fine, but looking at this driver, it looks it has some races,
trying to be a bit too smart.

nps_enet_irq_handler() really should be simpler, or the risk of missing
an interrupt might be high.

diff --git a/drivers/net/ethernet/ezchip/nps_enet.c b/drivers/net/ethernet/ezchip/nps_enet.c
index 992ebe973d25bfbccff7b5c42dc1801ea41fc9ea..03885ac0c0f845805eadb4659302b5c11bb250f6 100644
--- a/drivers/net/ethernet/ezchip/nps_enet.c
+++ b/drivers/net/ethernet/ezchip/nps_enet.c
@@ -233,14 +233,11 @@ static irqreturn_t nps_enet_irq_handler(s32 irq, void *dev_instance)
 {
 	struct net_device *ndev = dev_instance;
 	struct nps_enet_priv *priv = netdev_priv(ndev);
-	u32 rx_ctrl_value = nps_enet_reg_get(priv, NPS_ENET_REG_RX_CTL);
-	u32 rx_ctrl_cr = (rx_ctrl_value & RX_CTL_CR_MASK) >> RX_CTL_CR_SHIFT;
 
-	if (nps_enet_is_tx_pending(priv) || rx_ctrl_cr)
-		if (likely(napi_schedule_prep(&priv->napi))) {
-			nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
-			__napi_schedule(&priv->napi);
-		}
+	if (likely(napi_schedule_prep(&priv->napi))) {
+		nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
+		__napi_schedule(&priv->napi);
+	}
 
 	return IRQ_HANDLED;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-29 10:41 [PATCH] ezchip: nps_enet: check if napi has been completed Vlad Zakharov
  2017-03-29 14:52 ` Eric Dumazet
@ 2017-03-29 21:30 ` David Miller
  2017-03-29 21:41   ` Eric Dumazet
  2017-04-26 12:56   ` Vlad Zakharov
  1 sibling, 2 replies; 7+ messages in thread
From: David Miller @ 2017-03-29 21:30 UTC (permalink / raw)
  To: Vladislav.Zakharov
  Cc: netdev, edumazet, eladkan, noamca, linux-kernel, linux-snps-arc

From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com>
Date: Wed, 29 Mar 2017 13:41:46 +0300

> After a new NAPI_STATE_MISSED state was added to NAPI we can get into
> this state and in such case we have to reschedule NAPI as some work is
> still pending and we have to process it. napi_complete_done() function
> returns false if we have to reschedule something (e.g. in case we were
> in MISSED state) as current polling have not been completed yet.
> 
> nps_enet driver hasn't been verifying the return value of
> napi_complete_done() and has been forcibly enabling interrupts. That is
> not correct as we should not enable interrupts before we have processed
> all scheduled work. As a result we were getting trapped in interrupt
> hanlder chain as we had never been able to disabale ethernet
> interrupts again.
> 
> So this patch makes nps_enet_poll() func verify return value of
> napi_complete_done() and enable interrupts only in case all scheduled
> work has been completed.
> 
> Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>

Applied.

Eric, if this is really required now, we have 148 broken drivers still.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-29 21:30 ` David Miller
@ 2017-03-29 21:41   ` Eric Dumazet
  2017-03-30  9:16     ` Vlad Zakharov
  2017-04-26 12:56   ` Vlad Zakharov
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2017-03-29 21:41 UTC (permalink / raw)
  To: David Miller
  Cc: Vladislav.Zakharov, netdev, eladkan, noamca, LKML, linux-snps-arc

On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote:
Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
>
> Applied.
>
> Eric, if this is really required now, we have 148 broken drivers still.

Piece of cake :/

If we get more reports like that, we might implement a logic to
prevent infinite loops.

It is not clear to me what exactly happened to this driver, since
testing napi_complete_done() was not mandatory.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-29 21:41   ` Eric Dumazet
@ 2017-03-30  9:16     ` Vlad Zakharov
  2017-03-30 13:25       ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Vlad Zakharov @ 2017-03-30  9:16 UTC (permalink / raw)
  To: edumazet
  Cc: netdev, eladkan, davem, noamca, Vladislav.Zakharov,
	linux-snps-arc, linux-kernel

Hi Eric,

On Wed, 2017-03-29 at 14:41 -0700, Eric Dumazet wrote:
> On Wed, Mar 29, 2017 at 2:30 PM, David Miller <davem@davemloft.net> wrote:
> Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
> > 
> > 
> > Applied.
> > 
> > Eric, if this is really required now, we have 148 broken drivers still.
> 
> Piece of cake :/
> 
> If we get more reports like that, we might implement a logic to
> prevent infinite loops.
> 
> It is not clear to me what exactly happened to this driver, since
> testing napi_complete_done() was not mandatory.

I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit:
39e6c8208d7b6fb9d2047850fb3327db567b564b

if we got into NAPI_STATE_MISSED state the following happened:
in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after
that without any checks were enabling interrupts.

Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look
inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn
napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been
already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq
hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and
disable corresponding irq. 

-- 
Best regards,
Vlad Zakharov <vzakhar@synopsys.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-30  9:16     ` Vlad Zakharov
@ 2017-03-30 13:25       ` Eric Dumazet
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2017-03-30 13:25 UTC (permalink / raw)
  To: Vlad Zakharov
  Cc: netdev, eladkan, davem, noamca, linux-snps-arc, linux-kernel

On Thu, Mar 30, 2017 at 2:16 AM, Vlad Zakharov
<Vladislav.Zakharov@synopsys.com> wrote:

> I am not sure what is happening with other drivers, but in case of ezchip nps_enet driver after the following commit:
> 39e6c8208d7b6fb9d2047850fb3327db567b564b
>
> if we got into NAPI_STATE_MISSED state the following happened:
> in nps_enet_poll func we were calling napi_complete_done() (which reset MISSED state but left SCHED state) and after
> that without any checks were enabling interrupts.
>
> Then we obviously were getting into nps_enet_irq_hanlder() if irq was pending (it is very possbile state). If we look
> inside this function we will see that it disables interrupts only in case napi_sched_prep() returns true. In its turn
> napi_sched_prep() returns true only in case it changes state from non-SCHED to SCHED. But in our case as SCHED had been
> already set it set MISSED state and then returned false. So at that point we had already been trapped: after exiting irq
> hanlder we were getting into nps_enet_irq_hanlder() again and again as we couldn't rescind pending irq signal and
> disable corresponding irq.

Hi Vlad

Considering the use of  napi_schedule_prep() in
nps_enet_irq_handler(), it is strange that nps_enet_poll() uses :

if (nps_enet_is_tx_pending(priv)) {
     nps_enet_reg_set(priv, NPS_ENET_REG_BUF_INT_ENABLE, 0);
     napi_reschedule(napi); // note the return value is ignored...
}

So nps_enet_poll() was enabling interrupts, then disabling them, which
seems very unusual.

I need to check all drivers using napi_schedule_prep() and/or
napi_reschedule() and make sure they also test napi_comple_done()
return value...

I count 12 drivers using napi_reschedule() without checking its return value.
They either should check its return value, or use conventional napi_schedule()

Thanks !

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ezchip: nps_enet: check if napi has been completed
  2017-03-29 21:30 ` David Miller
  2017-03-29 21:41   ` Eric Dumazet
@ 2017-04-26 12:56   ` Vlad Zakharov
  1 sibling, 0 replies; 7+ messages in thread
From: Vlad Zakharov @ 2017-04-26 12:56 UTC (permalink / raw)
  To: davem, netdev; +Cc: linux-kernel, eladkan, noamca, edumazet, linux-snps-arc

Hi David, all,

On Wed, 2017-03-29 at 14:30 -0700, David Miller wrote:
> From: Vlad Zakharov <Vladislav.Zakharov@synopsys.com>
> Date: Wed, 29 Mar 2017 13:41:46 +0300
> 
> > 
> > After a new NAPI_STATE_MISSED state was added to NAPI we can get into
> > this state and in such case we have to reschedule NAPI as some work is
> > still pending and we have to process it. napi_complete_done() function
> > returns false if we have to reschedule something (e.g. in case we were
> > in MISSED state) as current polling have not been completed yet.
> > 
> > nps_enet driver hasn't been verifying the return value of
> > napi_complete_done() and has been forcibly enabling interrupts. That is
> > not correct as we should not enable interrupts before we have processed
> > all scheduled work. As a result we were getting trapped in interrupt
> > hanlder chain as we had never been able to disabale ethernet
> > interrupts again.
> > 
> > So this patch makes nps_enet_poll() func verify return value of
> > napi_complete_done() and enable interrupts only in case all scheduled
> > work has been completed.
> > 
> > Signed-off-by: Vlad Zakharov <vzakhar@synopsys.com>
> 
> Applied.
> 
> Eric, if this is really required now, we have 148 broken drivers still.

Could you please backport this patch to stable tree (starting from 4.10) as it is pretty important and fixes nps_enet
driver?

It became actual after Eric's commit 39e6c8208d7b (net: solve a NAPI race) which has already been backported to 4.10.

Thanks.

-- 
Best regards,
Vlad Zakharov <vzakhar@synopsys.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-04-26 12:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-29 10:41 [PATCH] ezchip: nps_enet: check if napi has been completed Vlad Zakharov
2017-03-29 14:52 ` Eric Dumazet
2017-03-29 21:30 ` David Miller
2017-03-29 21:41   ` Eric Dumazet
2017-03-30  9:16     ` Vlad Zakharov
2017-03-30 13:25       ` Eric Dumazet
2017-04-26 12:56   ` Vlad Zakharov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).