linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
@ 2018-09-17 23:35 Al Stone
  2018-09-18  2:35 ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Al Stone @ 2018-09-17 23:35 UTC (permalink / raw)
  To: netdev, linux-kernel
  Cc: Al Stone, Iyappan Subramanian, Keyur Chudgar, Quan Nguyen,
	David S . Miller

When using the user-space command 'tuned-adm profile network-latency',
the XGene enet driver will cause a hang trying to enable IRQs while
the system is trying to tune itself to be more responsive to network
traffic; dmesg will even tell us that the enables/disables are not
in balance.  With this fix, we force the driver to only enable_irq()
when there has been a previous disable_irq() -- i.e., force the number
of calls to each to be balanced.  This allows the kernel to continue
operating.

In an ideal world, the XGene enet driver would be restructured to avoid
this problem (it seems to be an artifact of when additional packets
arrive and differences of opinion in how the NIC responds under those
circumstances, some of which is controlled by firmware).  In the XGene2
driver, this is not an issue.

However, the XGene (aka Mustang) where this NIC is used is most likely
at the end of its useful life (APM which originally created the XGene
has completely morphed into a new company).  It is unlikely the driver
restructuring that is needed will ever be done.   There are, however,
a bunch of these machines out in the real world, and there are many
of us still using them daily (me, for example).  So, while this patch
is not the ideal way to repair the NIC driver, it does work and allows
us to continue using these boxes for a while longer.

Cc: Iyappan Subramanian <isubramanian@apm.com>
Cc: Keyur Chudgar <kchudgar@apm.com>
Cc: Quan Nguyen <qnguyen@apm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Stone <ahs3@redhat.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 3b889efddf78..90fb87f7e24e 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
 	processed = xgene_enet_process_ring(ring, budget);
 
 	if (processed != budget) {
+		struct irq_desc *desc = irq_to_desc(ring->irq);
+
 		napi_complete_done(napi, processed);
-		enable_irq(ring->irq);
+		if (desc && desc->depth > 0)
+			enable_irq(ring->irq);
 	}
 
 	return processed;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-17 23:35 [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage Al Stone
@ 2018-09-18  2:35 ` David Miller
  2018-09-18 20:21   ` Al Stone
  2018-09-18 23:09   ` Lendacky, Thomas
  0 siblings, 2 replies; 11+ messages in thread
From: David Miller @ 2018-09-18  2:35 UTC (permalink / raw)
  To: ahs3; +Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen

From: Al Stone <ahs3@redhat.com>
Date: Mon, 17 Sep 2018 17:35:33 -0600

> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>  	processed = xgene_enet_process_ring(ring, budget);
>  
>  	if (processed != budget) {
> +		struct irq_desc *desc = irq_to_desc(ring->irq);
> +
>  		napi_complete_done(napi, processed);
> -		enable_irq(ring->irq);
> +		if (desc && desc->depth > 0)
> +			enable_irq(ring->irq);

I really can't accept a patch that grovels into IRQ layer internals
to work around a driver's IRQ enable/disable usage problem.

Sorry.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18  2:35 ` David Miller
@ 2018-09-18 20:21   ` Al Stone
  2018-09-18 23:09   ` Lendacky, Thomas
  1 sibling, 0 replies; 11+ messages in thread
From: Al Stone @ 2018-09-18 20:21 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen

On 09/17/2018 08:35 PM, David Miller wrote:
> From: Al Stone <ahs3@redhat.com>
> Date: Mon, 17 Sep 2018 17:35:33 -0600
> 
>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>  	processed = xgene_enet_process_ring(ring, budget);
>>  
>>  	if (processed != budget) {
>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>> +
>>  		napi_complete_done(napi, processed);
>> -		enable_irq(ring->irq);
>> +		if (desc && desc->depth > 0)
>> +			enable_irq(ring->irq);
> 
> I really can't accept a patch that grovels into IRQ layer internals
> to work around a driver's IRQ enable/disable usage problem.
> 
> Sorry.

No worries.  I hesitated even sending it, actually.  The rewrite of the driver
that is needed is just something no one seems to have any interest in, or any
time for, but it really needs it to fix this properly (a colleague has found at
least one other issue in the structure of this driver, for example).  The cost
of doing that rewrite/restructure, though, far outweighs the benefit for this
device -- especially when the workaround is "don't run tuned".

I'll put a proper fix back on the list of Interesting Things to Work On When
Time Allows, for now.

Thanks for the quick -- and kind :) -- response.  I really appreciate it.

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Red Hat, Inc.
ahs3@redhat.com
-----------------------------------

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18  2:35 ` David Miller
  2018-09-18 20:21   ` Al Stone
@ 2018-09-18 23:09   ` Lendacky, Thomas
  2018-09-18 23:15     ` Al Stone
  1 sibling, 1 reply; 11+ messages in thread
From: Lendacky, Thomas @ 2018-09-18 23:09 UTC (permalink / raw)
  To: David Miller, ahs3; +Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen



On 09/17/2018 09:35 PM, David Miller wrote:
> From: Al Stone <ahs3@redhat.com>
> Date: Mon, 17 Sep 2018 17:35:33 -0600
> 
>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>  	processed = xgene_enet_process_ring(ring, budget);
>>  
>>  	if (processed != budget) {
>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>> +
>>  		napi_complete_done(napi, processed);

The problem could be that the driver isn't checking the
napi_complete_done() return code.  It was changed to return a bool and
the check should be more like:

  if ((processed != budget) && napi_complete_done(napi, processed)) {

If it returns false, then the driver will get called for polling again
after having issued enable_irq() and it well then issue the enable_irq()
a second (or more) time without having the matching diable_irq().

Thanks,
Tom

>> -		enable_irq(ring->irq);
>> +		if (desc && desc->depth > 0)
>> +			enable_irq(ring->irq);
> 
> I really can't accept a patch that grovels into IRQ layer internals
> to work around a driver's IRQ enable/disable usage problem.
> 
> Sorry.
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:09   ` Lendacky, Thomas
@ 2018-09-18 23:15     ` Al Stone
  2018-09-18 23:21       ` Florian Fainelli
  2018-09-18 23:25       ` Eric Dumazet
  0 siblings, 2 replies; 11+ messages in thread
From: Al Stone @ 2018-09-18 23:15 UTC (permalink / raw)
  To: Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen

On 09/18/2018 05:09 PM, Lendacky, Thomas wrote:
> 
> 
> On 09/17/2018 09:35 PM, David Miller wrote:
>> From: Al Stone <ahs3@redhat.com>
>> Date: Mon, 17 Sep 2018 17:35:33 -0600
>>
>>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>>  	processed = xgene_enet_process_ring(ring, budget);
>>>  
>>>  	if (processed != budget) {
>>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>>> +
>>>  		napi_complete_done(napi, processed);
> 
> The problem could be that the driver isn't checking the
> napi_complete_done() return code.  It was changed to return a bool and
> the check should be more like:
> 
>   if ((processed != budget) && napi_complete_done(napi, processed)) {
> 
> If it returns false, then the driver will get called for polling again
> after having issued enable_irq() and it well then issue the enable_irq()
> a second (or more) time without having the matching diable_irq().
> 
> Thanks,
> Tom

Aha, that might be.  My apologies -- I play in ACPI but seldom in the network
drivers, so was not fully aware of that change.  I can give that a try.

Thanks for the pointer.

-- 
ciao,
al
-----------------------------------
Al Stone
Software Engineer
Red Hat, Inc.
ahs3@redhat.com
-----------------------------------

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:15     ` Al Stone
@ 2018-09-18 23:21       ` Florian Fainelli
  2018-09-18 23:27         ` Eric Dumazet
  2018-09-18 23:25       ` Eric Dumazet
  1 sibling, 1 reply; 11+ messages in thread
From: Florian Fainelli @ 2018-09-18 23:21 UTC (permalink / raw)
  To: ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen

On 09/18/2018 04:15 PM, Al Stone wrote:
> On 09/18/2018 05:09 PM, Lendacky, Thomas wrote:
>>
>>
>> On 09/17/2018 09:35 PM, David Miller wrote:
>>> From: Al Stone <ahs3@redhat.com>
>>> Date: Mon, 17 Sep 2018 17:35:33 -0600
>>>
>>>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>>>  	processed = xgene_enet_process_ring(ring, budget);
>>>>  
>>>>  	if (processed != budget) {
>>>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>>>> +
>>>>  		napi_complete_done(napi, processed);
>>
>> The problem could be that the driver isn't checking the
>> napi_complete_done() return code.  It was changed to return a bool and
>> the check should be more like:
>>
>>   if ((processed != budget) && napi_complete_done(napi, processed)) {
>>
>> If it returns false, then the driver will get called for polling again
>> after having issued enable_irq() and it well then issue the enable_irq()
>> a second (or more) time without having the matching diable_irq().
>>
>> Thanks,
>> Tom
> 
> Aha, that might be.  My apologies -- I play in ACPI but seldom in the network
> drivers, so was not fully aware of that change.  I can give that a try.
> 
> Thanks for the pointer.

FWIW this is being discussed in this thread as well:

https://www.spinics.net/lists/netdev/msg523760.html

there should be an update to drivers that have a ndo_poll_controller()
and don't check napi_complete_done(), though I am not clear who is doing
that yet.
-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:15     ` Al Stone
  2018-09-18 23:21       ` Florian Fainelli
@ 2018-09-18 23:25       ` Eric Dumazet
  1 sibling, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-09-18 23:25 UTC (permalink / raw)
  To: ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen



On 09/18/2018 04:15 PM, Al Stone wrote:
> On 09/18/2018 05:09 PM, Lendacky, Thomas wrote:
>>
>>
>> On 09/17/2018 09:35 PM, David Miller wrote:
>>> From: Al Stone <ahs3@redhat.com>
>>> Date: Mon, 17 Sep 2018 17:35:33 -0600
>>>
>>>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>>>  	processed = xgene_enet_process_ring(ring, budget);
>>>>  
>>>>  	if (processed != budget) {
>>>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>>>> +
>>>>  		napi_complete_done(napi, processed);
>>
>> The problem could be that the driver isn't checking the
>> napi_complete_done() return code.  It was changed to return a bool and
>> the check should be more like:
>>
>>   if ((processed != budget) && napi_complete_done(napi, processed)) {
>>
>> If it returns false, then the driver will get called for polling again
>> after having issued enable_irq() and it well then issue the enable_irq()
>> a second (or more) time without having the matching diable_irq().
>>
>> Thanks,
>> Tom
> 
> Aha, that might be.  My apologies -- I play in ACPI but seldom in the network
> drivers, so was not fully aware of that change.  I can give that a try.
> 
> Thanks for the pointer.
> 

Yes, please look at commit d7aba644ffdebf756e51e26a2229055211838e89
("amd-xgbe: Enable IRQs only if napi_complete_done() is true")

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:21       ` Florian Fainelli
@ 2018-09-18 23:27         ` Eric Dumazet
  2018-09-18 23:56           ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2018-09-18 23:27 UTC (permalink / raw)
  To: Florian Fainelli, ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen



On 09/18/2018 04:21 PM, Florian Fainelli wrote:
> On 09/18/2018 04:15 PM, Al Stone wrote:
>> On 09/18/2018 05:09 PM, Lendacky, Thomas wrote:
>>>
>>>
>>> On 09/17/2018 09:35 PM, David Miller wrote:
>>>> From: Al Stone <ahs3@redhat.com>
>>>> Date: Mon, 17 Sep 2018 17:35:33 -0600
>>>>
>>>>> @@ -866,8 +866,11 @@ static int xgene_enet_napi(struct napi_struct *napi, const int budget)
>>>>>  	processed = xgene_enet_process_ring(ring, budget);
>>>>>  
>>>>>  	if (processed != budget) {
>>>>> +		struct irq_desc *desc = irq_to_desc(ring->irq);
>>>>> +
>>>>>  		napi_complete_done(napi, processed);
>>>
>>> The problem could be that the driver isn't checking the
>>> napi_complete_done() return code.  It was changed to return a bool and
>>> the check should be more like:
>>>
>>>   if ((processed != budget) && napi_complete_done(napi, processed)) {
>>>
>>> If it returns false, then the driver will get called for polling again
>>> after having issued enable_irq() and it well then issue the enable_irq()
>>> a second (or more) time without having the matching diable_irq().
>>>
>>> Thanks,
>>> Tom
>>
>> Aha, that might be.  My apologies -- I play in ACPI but seldom in the network
>> drivers, so was not fully aware of that change.  I can give that a try.
>>
>> Thanks for the pointer.
> 
> FWIW this is being discussed in this thread as well:
> 
> https://www.spinics.net/lists/netdev/msg523760.html
> 
> there should be an update to drivers that have a ndo_poll_controller()
> and don't check napi_complete_done(), though I am not clear who is doing
> that yet.

That might be tricky.

I remember one of the napi_complete_done() change had to be reverted,
for some obscure reason.

So clearly, doing a mass update (without being able to test the driver on real hardware) might be risky.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:27         ` Eric Dumazet
@ 2018-09-18 23:56           ` Eric Dumazet
  2018-09-19  0:03             ` Florian Fainelli
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2018-09-18 23:56 UTC (permalink / raw)
  To: Florian Fainelli, ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen



On 09/18/2018 04:27 PM, Eric Dumazet wrote:
> 

> I remember one of the napi_complete_done() change had to be reverted,
> for some obscure reason.



That was not exactly a revert,  :

commit 129c6cda2de2a8ac44fab096152469999b727faf
Author: Eric Dumazet <edumazet@google.com>
Date:   Mon Sep 18 13:03:43 2017 -0700

    8139too: revisit napi_complete_done() usage
    
    It seems we have to be more careful in napi_complete_done()
    use. This patch is not a revert, as it seems we can
    avoid bug that Ville reported by moving the napi_complete_done()
    test in the spinlock section.
    
    Many thanks to Ville for detective work and all tests.
    
    Fixes: 617f01211baf ("8139too: use napi_complete_done()")
    Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
    
    Signed-off-by: David S. Miller <davem@davemloft.net>




     

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-18 23:56           ` Eric Dumazet
@ 2018-09-19  0:03             ` Florian Fainelli
  2018-09-19  2:32               ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Florian Fainelli @ 2018-09-19  0:03 UTC (permalink / raw)
  To: Eric Dumazet, ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen

On 09/18/2018 04:56 PM, Eric Dumazet wrote:
> 
> 
> On 09/18/2018 04:27 PM, Eric Dumazet wrote:
>>
> 
>> I remember one of the napi_complete_done() change had to be reverted,
>> for some obscure reason.
> 
> 
> 
> That was not exactly a revert,  :

This is what I have so far for the drivers that both use
napi_complete_done() without checking the return value and implement a
ndo_poll_controller() callback:

https://github.com/ffainelli/linux/commits/napi-check

> 
> commit 129c6cda2de2a8ac44fab096152469999b727faf
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Mon Sep 18 13:03:43 2017 -0700
> 
>     8139too: revisit napi_complete_done() usage
>     
>     It seems we have to be more careful in napi_complete_done()
>     use. This patch is not a revert, as it seems we can
>     avoid bug that Ville reported by moving the napi_complete_done()
>     test in the spinlock section.
>     
>     Many thanks to Ville for detective work and all tests.
>     
>     Fixes: 617f01211baf ("8139too: use napi_complete_done()")
>     Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>     Tested-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>     
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> 
> 
> 
>      
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage
  2018-09-19  0:03             ` Florian Fainelli
@ 2018-09-19  2:32               ` Eric Dumazet
  0 siblings, 0 replies; 11+ messages in thread
From: Eric Dumazet @ 2018-09-19  2:32 UTC (permalink / raw)
  To: Florian Fainelli, ahs3, Lendacky, Thomas, David Miller
  Cc: netdev, linux-kernel, isubramanian, kchudgar, qnguyen



On 09/18/2018 05:03 PM, Florian Fainelli wrote:
> On 09/18/2018 04:56 PM, Eric Dumazet wrote:
>>
>>
>> On 09/18/2018 04:27 PM, Eric Dumazet wrote:
>>>
>>
>>> I remember one of the napi_complete_done() change had to be reverted,
>>> for some obscure reason.
>>
>>
>>
>> That was not exactly a revert,  :
> 
> This is what I have so far for the drivers that both use
> napi_complete_done() without checking the return value and implement a
> ndo_poll_controller() callback:
> 
> https://github.com/ffainelli/linux/commits/napi-check

In fact, there is still to explain what the bug is.

napi_complete_done() return value can be ignored, unless drivers
have to disable IRQ in their interrupt handler, using the following
construct :

	if (napi_schedule_prep(napi)) {
               .... disable interrupt ....
               __napi_schedule_irqoff(napi);
        }

                 

It _can_ be used by other drivers to not rearm interrupts needlessly.

The bug discussed in this thread (re-balance IRQ usage) is of the same kind than
the one fixed in commit d7aba644ffdebf756e51e26a2229055211838e89 ("amd-xgbe: Enable IRQs only if napi_complete_done() is true")


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-09-19  2:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-17 23:35 [PATCH] net: apm: xgene: force XGene enet driver to re-balance IRQ usage Al Stone
2018-09-18  2:35 ` David Miller
2018-09-18 20:21   ` Al Stone
2018-09-18 23:09   ` Lendacky, Thomas
2018-09-18 23:15     ` Al Stone
2018-09-18 23:21       ` Florian Fainelli
2018-09-18 23:27         ` Eric Dumazet
2018-09-18 23:56           ` Eric Dumazet
2018-09-19  0:03             ` Florian Fainelli
2018-09-19  2:32               ` Eric Dumazet
2018-09-18 23:25       ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).