Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
@ 2016-11-16 13:55 Brian Starkey
  2016-11-16 15:52 ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-16 13:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

Hi,

I'm running an ARM FVP (virtual platform - simluated hardware), which
is failing to reach a login prompt due to extremely slow progress
during boot. systemd gives up waiting for the ttyAMA0 device to
appear, and never starts the getty.

I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
its job".

Without this commit, the system boots to a login prompt in 2 minutes.
With this commit, the system eventually manages to bring up sshd after
22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
out and so I don't get a prompt on my console.

I only hit the issue when my rootfs is mounted over NFS, and with only
a single core enabled. The (simulated) network device is an SMC91C111.
With multiple cores enabled or a non-NFS filesystem, everything seems
to work OK.

I don't have an identical real hardware platform to try, but I
could not reproduce it on a real ARM Juno board, which is similar.

It looks from the logs that udev's workers are unable to make
progress, so the device nodes don't get created. Don't pay too much
attention to the timestamps in the logs below, they are "inside" the
virtual platform, and don't reflect wall-clock time.
Log before 4cd13c21b207:
	https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
Log after 4cd13c21b207:
	https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
Kernel config: arch/arm64/configs/defconfig

I'm not sure how to debug this further, so if you have any suggestions
I'd be glad to hear them.

Many thanks,
Brian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-16 13:55 Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job" Brian Starkey
@ 2016-11-16 15:52 ` Eric Dumazet
  2016-11-16 18:01   ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-16 15:52 UTC (permalink / raw)
  To: Brian Starkey
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

On Wed, Nov 16, 2016 at 5:55 AM, Brian Starkey <brian.starkey@arm.com> wrote:
> Hi,
>
> I'm running an ARM FVP (virtual platform - simluated hardware), which
> is failing to reach a login prompt due to extremely slow progress
> during boot. systemd gives up waiting for the ttyAMA0 device to
> appear, and never starts the getty.
>
> I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
> its job".
>
> Without this commit, the system boots to a login prompt in 2 minutes.
> With this commit, the system eventually manages to bring up sshd after
> 22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
> out and so I don't get a prompt on my console.
>
> I only hit the issue when my rootfs is mounted over NFS, and with only
> a single core enabled. The (simulated) network device is an SMC91C111.
> With multiple cores enabled or a non-NFS filesystem, everything seems
> to work OK.
>
> I don't have an identical real hardware platform to try, but I
> could not reproduce it on a real ARM Juno board, which is similar.
>
> It looks from the logs that udev's workers are unable to make
> progress, so the device nodes don't get created. Don't pay too much
> attention to the timestamps in the logs below, they are "inside" the
> virtual platform, and don't reflect wall-clock time.
> Log before 4cd13c21b207:
>         https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
> Log after 4cd13c21b207:
>         https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
> Kernel config: arch/arm64/configs/defconfig
>
> I'm not sure how to debug this further, so if you have any suggestions
> I'd be glad to hear them.
>
> Many thanks,
> Brian
>

Hi Brian.

Thanks a lot for this report.

If issue triggers when/if using one core, it is possible one driver
has a dependency on
softirqs being serviced during an initialization loop.

If the thread is not yielding cpu (holding something like a spinlock
thus disabling preemption),
then ksoftirqd might not be able to run on the (same) cpu.

I sent a patch for busy polling yesterday, but I am almost certain
this would not fix your issue
(assuming you have CONFIG_PREEMPT)

https://patchwork.ozlabs.org/patch/695185/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-16 15:52 ` Eric Dumazet
@ 2016-11-16 18:01   ` Brian Starkey
  2016-11-16 18:49     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-16 18:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

Hi Eric,

On Wed, Nov 16, 2016 at 07:52:42AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 5:55 AM, Brian Starkey <brian.starkey@arm.com> wrote:
>> Hi,
>>
>> I'm running an ARM FVP (virtual platform - simluated hardware), which
>> is failing to reach a login prompt due to extremely slow progress
>> during boot. systemd gives up waiting for the ttyAMA0 device to
>> appear, and never starts the getty.
>>
>> I've bisected this to commit 4cd13c21b207 "softirq: Let ksoftirqd do
>> its job".
>>
>> Without this commit, the system boots to a login prompt in 2 minutes.
>> With this commit, the system eventually manages to bring up sshd after
>> 22 minutes, but as mentioned, the dev-ttyAMA0.device unit has timed
>> out and so I don't get a prompt on my console.
>>
>> I only hit the issue when my rootfs is mounted over NFS, and with only
>> a single core enabled. The (simulated) network device is an SMC91C111.
>> With multiple cores enabled or a non-NFS filesystem, everything seems
>> to work OK.
>>
>> I don't have an identical real hardware platform to try, but I
>> could not reproduce it on a real ARM Juno board, which is similar.
>>
>> It looks from the logs that udev's workers are unable to make
>> progress, so the device nodes don't get created. Don't pay too much
>> attention to the timestamps in the logs below, they are "inside" the
>> virtual platform, and don't reflect wall-clock time.
>> Log before 4cd13c21b207:
>>         https://drive.google.com/open?id=0B8siaK6ZjvEwMktoa0NUS2hJd1U
>> Log after 4cd13c21b207:
>>         https://drive.google.com/open?id=0B8siaK6ZjvEwZXlfeFFSQl9xZTQ
>> Kernel config: arch/arm64/configs/defconfig
>>
>> I'm not sure how to debug this further, so if you have any suggestions
>> I'd be glad to hear them.
>>
>> Many thanks,
>> Brian
>>
>
>Hi Brian.
>
>Thanks a lot for this report.
>
>If issue triggers when/if using one core, it is possible one driver
>has a dependency on
>softirqs being serviced during an initialization loop.
>
>If the thread is not yielding cpu (holding something like a spinlock
>thus disabling preemption),
>then ksoftirqd might not be able to run on the (same) cpu.
>

The smc91x driver does seem to have some trickiness around softirqs.
I'm not familiar with net drivers, but I'll see if I can figure
anything out there.

>I sent a patch for busy polling yesterday, but I am almost certain
>this would not fix your issue
>(assuming you have CONFIG_PREEMPT)
>
>https://patchwork.ozlabs.org/patch/695185/

You're right in saying that this didn't help.

Thanks,
Brian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-16 18:01   ` Brian Starkey
@ 2016-11-16 18:49     ` Eric Dumazet
  2016-11-16 21:01       ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-16 18:49 UTC (permalink / raw)
  To: Brian Starkey
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <brian.starkey@arm.com> wrote:

>
> The smc91x driver does seem to have some trickiness around softirqs.
> I'm not familiar with net drivers, but I'll see if I can figure
> anything out there.

Oh this code looks ugly :(

Do you have CONFIG_SMP=y or not ?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-16 18:49     ` Eric Dumazet
@ 2016-11-16 21:01       ` Brian Starkey
  2016-11-17 15:29         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-16 21:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <brian.starkey@arm.com> wrote:
>
>>
>> The smc91x driver does seem to have some trickiness around softirqs.
>> I'm not familiar with net drivers, but I'll see if I can figure
>> anything out there.
>
>Oh this code looks ugly :(
>
>Do you have CONFIG_SMP=y or not ?

Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).

I did try forcing it into the no-op locking (as though config SMP
wasn't set), it didn't help (and it doesn't look like that would be
safe with CONFIG_PREEMPT=y either).

The bit in smc_hardware_send_pkt looks like skipping softirq
invocation when there's already one running wouldn't give the same
behaviour as before:

	if (!smc_special_trylock(&lp->lock, flags)) {
		netif_stop_queue(dev);
		tasklet_schedule(&lp->tx_task);
		return;
	}

... that said, I've no idea if that matters.

Of course I also don't know if the network driver is even to blame :-(

Cheers,
-Brian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-16 21:01       ` Brian Starkey
@ 2016-11-17 15:29         ` Eric Dumazet
  2016-11-17 16:42           ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-17 15:29 UTC (permalink / raw)
  To: Brian Starkey
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 3176 bytes --]

On Wed, Nov 16, 2016 at 1:01 PM, Brian Starkey <brian.starkey@arm.com> wrote:
> On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>>
>> On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <brian.starkey@arm.com>
>> wrote:
>>
>>>
>>> The smc91x driver does seem to have some trickiness around softirqs.
>>> I'm not familiar with net drivers, but I'll see if I can figure
>>> anything out there.
>>
>>
>> Oh this code looks ugly :(
>>
>> Do you have CONFIG_SMP=y or not ?
>
>
> Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).
>
> I did try forcing it into the no-op locking (as though config SMP
> wasn't set), it didn't help (and it doesn't look like that would be
> safe with CONFIG_PREEMPT=y either).
>
> The bit in smc_hardware_send_pkt looks like skipping softirq
> invocation when there's already one running wouldn't give the same
> behaviour as before:
>
>         if (!smc_special_trylock(&lp->lock, flags)) {
>                 netif_stop_queue(dev);
>                 tasklet_schedule(&lp->tx_task);
>                 return;
>         }
>
> ... that said, I've no idea if that matters.
>
> Of course I also don't know if the network driver is even to blame :-(
>

I believe the problem is in SMC_WAIT_MMU_BUSY()

Could you try this patch ? (inlined and attached)

diff --git a/drivers/net/ethernet/smsc/smc91x.c
b/drivers/net/ethernet/smsc/smc91x.c
index 65077c77082a2f042117a0889c2b15099c58eae5..4ef653ae8564c6d2a5120cdaef12db1e1b218b39
100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -229,19 +229,22 @@ static inline void PRINT_PKT(u_char *buf, int length) { }
  * if at all, but let's avoid deadlocking the system if the hardware
  * decides to go south.
  */
-#define SMC_WAIT_MMU_BUSY(lp) do {                                     \
-       if (unlikely(SMC_GET_MMU_CMD(lp) & MC_BUSY)) {          \
-               unsigned long timeout = jiffies + 2;                    \
-               while (SMC_GET_MMU_CMD(lp) & MC_BUSY) {         \
-                       if (time_after(jiffies, timeout)) {             \
-                               netdev_dbg(dev, "timeout %s line %d\n", \
-                                          __FILE__, __LINE__);         \
-                               break;                                  \
-                       }                                               \
-                       cpu_relax();                                    \
-               }                                                       \
-       }                                                               \
-} while (0)
+static void SMC_WAIT_MMU_BUSY(struct smc_local *lp)
+{
+       unsigned long timeout = jiffies + 2;
+       unsigned int count = 10000;
+
+       while (SMC_GET_MMU_CMD(lp) & MC_BUSY) {
+               count--;
+               if (!count || time_after(jiffies, timeout)) {
+                       netdev_dbg(lp->dev, "timeout %s line %d\n",
+                                  __FILE__, __LINE__);
+                       break;
+               }
+       /* TODO : investigate using cond_resched() from allowed contexts */
+       cpu_relax();
+       }
+}


 /*

[-- Attachment #2: patch2250.txt --]
[-- Type: text/plain, Size: 1266 bytes --]

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 65077c77082a2f042117a0889c2b15099c58eae5..4ef653ae8564c6d2a5120cdaef12db1e1b218b39 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -229,19 +229,22 @@ static inline void PRINT_PKT(u_char *buf, int length) { }
  * if at all, but let's avoid deadlocking the system if the hardware
  * decides to go south.
  */
-#define SMC_WAIT_MMU_BUSY(lp) do {					\
-	if (unlikely(SMC_GET_MMU_CMD(lp) & MC_BUSY)) {		\
-		unsigned long timeout = jiffies + 2;			\
-		while (SMC_GET_MMU_CMD(lp) & MC_BUSY) {		\
-			if (time_after(jiffies, timeout)) {		\
-				netdev_dbg(dev, "timeout %s line %d\n",	\
-					   __FILE__, __LINE__);		\
-				break;					\
-			}						\
-			cpu_relax();					\
-		}							\
-	}								\
-} while (0)
+static void SMC_WAIT_MMU_BUSY(struct smc_local *lp)
+{
+	unsigned long timeout = jiffies + 2;
+	unsigned int count = 10000;
+
+	while (SMC_GET_MMU_CMD(lp) & MC_BUSY) {
+		count--;
+		if (!count || time_after(jiffies, timeout)) {
+			netdev_dbg(lp->dev, "timeout %s line %d\n",
+				   __FILE__, __LINE__);
+			break;
+		}
+	/* TODO : investigate using cond_resched() from allowed contexts */
+	cpu_relax();
+	}
+}
 
 
 /*

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-17 15:29         ` Eric Dumazet
@ 2016-11-17 16:42           ` Brian Starkey
  2016-11-18  0:40             ` Thomas Gleixner
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-17 16:42 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior,
	Thomas Gleixner

Hi Eric,

On Thu, Nov 17, 2016 at 07:29:14AM -0800, Eric Dumazet wrote:
>On Wed, Nov 16, 2016 at 1:01 PM, Brian Starkey <brian.starkey@arm.com> wrote:
>> On Wed, Nov 16, 2016 at 10:49:06AM -0800, Eric Dumazet wrote:
>>>
>>> On Wed, Nov 16, 2016 at 10:01 AM, Brian Starkey <brian.starkey@arm.com>
>>> wrote:
>>>
>>>>
>>>> The smc91x driver does seem to have some trickiness around softirqs.
>>>> I'm not familiar with net drivers, but I'll see if I can figure
>>>> anything out there.
>>>
>>>
>>> Oh this code looks ugly :(
>>>
>>> Do you have CONFIG_SMP=y or not ?
>>
>>
>> Yeah CONFIG_SMP=y (and CONFIG_PREEMPT=y too, fwiw).
>>
>> I did try forcing it into the no-op locking (as though config SMP
>> wasn't set), it didn't help (and it doesn't look like that would be
>> safe with CONFIG_PREEMPT=y either).
>>
>> The bit in smc_hardware_send_pkt looks like skipping softirq
>> invocation when there's already one running wouldn't give the same
>> behaviour as before:
>>
>>         if (!smc_special_trylock(&lp->lock, flags)) {
>>                 netif_stop_queue(dev);
>>                 tasklet_schedule(&lp->tx_task);
>>                 return;
>>         }
>>
>> ... that said, I've no idea if that matters.
>>
>> Of course I also don't know if the network driver is even to blame :-(
>>
>
>I believe the problem is in SMC_WAIT_MMU_BUSY()
>
>Could you try this patch ? (inlined and attached)
>

No joy with this patch :-(

I had to add an ioaddr argument because apparently that macro depends
on local context (yuck...), but it doesn't help my issue.

FWIW I don't see any timeouts, either with or without the patch.
(I don't know for sure, but I would guess that the model of the
network card doesn't model whatever stall that loop is checking for.
It probably just completes all MMU operations immediately)

-Brian

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-17 16:42           ` Brian Starkey
@ 2016-11-18  0:40             ` Thomas Gleixner
  2016-11-18 20:23               ` Brian Starkey
  2016-11-22 10:33               ` Brian Starkey
  0 siblings, 2 replies; 19+ messages in thread
From: Thomas Gleixner @ 2016-11-18  0:40 UTC (permalink / raw)
  To: Brian Starkey
  Cc: Eric Dumazet, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior

Brian,

On Thu, 17 Nov 2016, Brian Starkey wrote:
> No joy with this patch :-(
> 
> I had to add an ioaddr argument because apparently that macro depends
> on local context (yuck...), but it doesn't help my issue.
> 
> FWIW I don't see any timeouts, either with or without the patch.
> (I don't know for sure, but I would guess that the model of the
> network card doesn't model whatever stall that loop is checking for.
> It probably just completes all MMU operations immediately)

Is there a chance that you enable trace points at the kernel command line?

  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit

should be enough for a start. All we need aside of that is a trigger to
stop the trace so we can actually see the events around the time where
things go stale.

I assume that the whole issue is visible throughout the slow progress of
init towards a working system, so for a start it would be sufficient to add
something like this into the startup sequence at some point:

 mount -t debugfs debugfs /sys/kernel/debug
 echo 0 >/sys/kernel/debug/tracing/tracing_on

The only interesting challange is to get the trace data out of the
system. The trace is accessible via:

 cat /sys/kernel/tracing/trace 

So if your ssh works at some point, that might be an option or you just try
to store it over NFS (which will be slow, but better than nothing). Maybe
you have a better idea :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-18  0:40             ` Thomas Gleixner
@ 2016-11-18 20:23               ` Brian Starkey
  2016-11-22 10:33               ` Brian Starkey
  1 sibling, 0 replies; 19+ messages in thread
From: Brian Starkey @ 2016-11-18 20:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Eric Dumazet, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior

Hi Thomas,

On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>Brian,
>
>On Thu, 17 Nov 2016, Brian Starkey wrote:
>> No joy with this patch :-(
>>
>> I had to add an ioaddr argument because apparently that macro depends
>> on local context (yuck...), but it doesn't help my issue.
>>
>> FWIW I don't see any timeouts, either with or without the patch.
>> (I don't know for sure, but I would guess that the model of the
>> network card doesn't model whatever stall that loop is checking for.
>> It probably just completes all MMU operations immediately)
>
>Is there a chance that you enable trace points at the kernel command line?
>
>  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>
>should be enough for a start. All we need aside of that is a trigger to
>stop the trace so we can actually see the events around the time where
>things go stale.
>
>I assume that the whole issue is visible throughout the slow progress of
>init towards a working system, so for a start it would be sufficient to add
>something like this into the startup sequence at some point:
>
> mount -t debugfs debugfs /sys/kernel/debug
> echo 0 >/sys/kernel/debug/tracing/tracing_on
>
>The only interesting challange is to get the trace data out of the
>system. The trace is accessible via:
>
> cat /sys/kernel/tracing/trace
>

Thanks for the pointers on tracing. I haven't used it before so that
was very helpful.

>So if your ssh works at some point, that might be an option or you just try
>to store it over NFS (which will be slow, but better than nothing). Maybe
>you have a better idea :)

I've tried a whole bunch of different ways to reproduce the problem
and get the logs out, so far they've all been unsuccessful
(reproducing is easy, getting data out is not).

I have a few more ideas to try, but it's pretty slow work - it's
taking at least 30 minutes per attempt. I'll let you know if I manage
something.

Thanks!
-Brian

>
>Thanks,
>
>	tglx
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-18  0:40             ` Thomas Gleixner
  2016-11-18 20:23               ` Brian Starkey
@ 2016-11-22 10:33               ` Brian Starkey
  2016-11-22 14:29                 ` Eric Dumazet
  1 sibling, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-22 10:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Eric Dumazet, LKML, Peter Zijlstra, Ingo Molnar, Andrew Morton,
	Alexander Potapenko, Steven Rostedt, Sebastian Andrzej Siewior

Hi,

On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>Brian,
>
>On Thu, 17 Nov 2016, Brian Starkey wrote:
>> No joy with this patch :-(
>>
>> I had to add an ioaddr argument because apparently that macro depends
>> on local context (yuck...), but it doesn't help my issue.
>>
>> FWIW I don't see any timeouts, either with or without the patch.
>> (I don't know for sure, but I would guess that the model of the
>> network card doesn't model whatever stall that loop is checking for.
>> It probably just completes all MMU operations immediately)
>
>Is there a chance that you enable trace points at the kernel command line?
>
>  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>
>should be enough for a start. All we need aside of that is a trigger to
>stop the trace so we can actually see the events around the time where
>things go stale.
>
>I assume that the whole issue is visible throughout the slow progress of
>init towards a working system, so for a start it would be sufficient to add
>something like this into the startup sequence at some point:
>
> mount -t debugfs debugfs /sys/kernel/debug
> echo 0 >/sys/kernel/debug/tracing/tracing_on
>
>The only interesting challange is to get the trace data out of the
>system. The trace is accessible via:
>
> cat /sys/kernel/tracing/trace
>
>So if your ssh works at some point, that might be an option or you just try
>to store it over NFS (which will be slow, but better than nothing). Maybe
>you have a better idea :)

I finally managed to pry some traces out this morning. It seems like
the system struggles to even invoke echo when it's doing badly.

Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
(btw, if there's a preferred way to send the logs let me know. I
wasn't sure large or non-text attachments would be well received)

I'm not sure how much help the trace is, but it does look like the
system is spending far too much time in the ethernet device's IRQ
handler to be healthy.

Thanks,
Brian
>
>Thanks,
>
>	tglx
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-22 10:33               ` Brian Starkey
@ 2016-11-22 14:29                 ` Eric Dumazet
  2016-11-22 15:27                   ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-22 14:29 UTC (permalink / raw)
  To: Brian Starkey
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

On Tue, Nov 22, 2016 at 2:33 AM, Brian Starkey <brian.starkey@arm.com> wrote:
>
> Hi,
>
> On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>>
>> Brian,
>>
>> On Thu, 17 Nov 2016, Brian Starkey wrote:
>>>
>>> No joy with this patch :-(
>>>
>>> I had to add an ioaddr argument because apparently that macro depends
>>> on local context (yuck...), but it doesn't help my issue.
>>>
>>> FWIW I don't see any timeouts, either with or without the patch.
>>> (I don't know for sure, but I would guess that the model of the
>>> network card doesn't model whatever stall that loop is checking for.
>>> It probably just completes all MMU operations immediately)
>>
>>
>> Is there a chance that you enable trace points at the kernel command line?
>>
>>  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>>
>> should be enough for a start. All we need aside of that is a trigger to
>> stop the trace so we can actually see the events around the time where
>> things go stale.
>>
>> I assume that the whole issue is visible throughout the slow progress of
>> init towards a working system, so for a start it would be sufficient to add
>> something like this into the startup sequence at some point:
>>
>> mount -t debugfs debugfs /sys/kernel/debug
>> echo 0 >/sys/kernel/debug/tracing/tracing_on
>>
>> The only interesting challange is to get the trace data out of the
>> system. The trace is accessible via:
>>
>> cat /sys/kernel/tracing/trace
>>
>> So if your ssh works at some point, that might be an option or you just try
>> to store it over NFS (which will be slow, but better than nothing). Maybe
>> you have a better idea :)
>
>
> I finally managed to pry some traces out this morning. It seems like
> the system struggles to even invoke echo when it's doing badly.
>
> Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
> Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
> (btw, if there's a preferred way to send the logs let me know. I
> wasn't sure large or non-text attachments would be well received)
>
> I'm not sure how much help the trace is, but it does look like the
> system is spending far too much time in the ethernet device's IRQ
> handler to be healthy.
>

Thanks a lot Brian

Can you confirm interrupt handler is smc911x_interrupt() ?

(ie : is SMC_USE_PXA_DMA / SMC_USE_DMA defined or not ?)


>
> Thanks,
> Brian
>>
>>
>> Thanks,
>>
>>         tglx
>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-22 14:29                 ` Eric Dumazet
@ 2016-11-22 15:27                   ` Brian Starkey
  2016-11-22 16:09                     ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-22 15:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

Hi Eric,

On Tue, Nov 22, 2016 at 06:29:33AM -0800, Eric Dumazet wrote:
>On Tue, Nov 22, 2016 at 2:33 AM, Brian Starkey <brian.starkey@arm.com> wrote:
>>
>> Hi,
>>
>> On Fri, Nov 18, 2016 at 01:40:43AM +0100, Thomas Gleixner wrote:
>>>
>>> Brian,
>>>
>>> On Thu, 17 Nov 2016, Brian Starkey wrote:
>>>>
>>>> No joy with this patch :-(
>>>>
>>>> I had to add an ioaddr argument because apparently that macro depends
>>>> on local context (yuck...), but it doesn't help my issue.
>>>>
>>>> FWIW I don't see any timeouts, either with or without the patch.
>>>> (I don't know for sure, but I would guess that the model of the
>>>> network card doesn't model whatever stall that loop is checking for.
>>>> It probably just completes all MMU operations immediately)
>>>
>>>
>>> Is there a chance that you enable trace points at the kernel command line?
>>>
>>>  trace_event=sched_wakeup,sched_switch,irq_handler_entry,irq_handler_exit,softirq_raise,softirq_entry,softirq_exit
>>>
>>> should be enough for a start. All we need aside of that is a trigger to
>>> stop the trace so we can actually see the events around the time where
>>> things go stale.
>>>
>>> I assume that the whole issue is visible throughout the slow progress of
>>> init towards a working system, so for a start it would be sufficient to add
>>> something like this into the startup sequence at some point:
>>>
>>> mount -t debugfs debugfs /sys/kernel/debug
>>> echo 0 >/sys/kernel/debug/tracing/tracing_on
>>>
>>> The only interesting challange is to get the trace data out of the
>>> system. The trace is accessible via:
>>>
>>> cat /sys/kernel/tracing/trace
>>>
>>> So if your ssh works at some point, that might be an option or you just try
>>> to store it over NFS (which will be slow, but better than nothing). Maybe
>>> you have a better idea :)
>>
>>
>> I finally managed to pry some traces out this morning. It seems like
>> the system struggles to even invoke echo when it's doing badly.
>>
>> Trace before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwU21wNTdZS29kVXc
>> Trace after 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwbXVzcnpieVkzWFU
>> (btw, if there's a preferred way to send the logs let me know. I
>> wasn't sure large or non-text attachments would be well received)
>>
>> I'm not sure how much help the trace is, but it does look like the
>> system is spending far too much time in the ethernet device's IRQ
>> handler to be healthy.
>>
>
>Thanks a lot Brian
>
>Can you confirm interrupt handler is smc911x_interrupt() ?
>
>(ie : is SMC_USE_PXA_DMA / SMC_USE_DMA defined or not ?)
>

Looks like there's a few similarly named devices and drivers. Mine is
an SMSC LAN91C111 using the smc91x driver in
drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
interrupt handler is smc_interrupt()

CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.

-Brian

>
>>
>> Thanks,
>> Brian
>>>
>>>
>>> Thanks,
>>>
>>>         tglx
>>>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-22 15:27                   ` Brian Starkey
@ 2016-11-22 16:09                     ` Eric Dumazet
  2016-11-23 18:21                       ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-22 16:09 UTC (permalink / raw)
  To: Brian Starkey
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

>
> Looks like there's a few similarly named devices and drivers. Mine is
> an SMSC LAN91C111 using the smc91x driver in
> drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
> interrupt handler is smc_interrupt()
>
> CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.

Oh right.

static irqreturn_t smc_interrupt(int irq, void *dev_id)
{
...
mask = SMC_GET_INT_MASK(lp);
SMC_SET_INT_MASK(lp, 0);

loop()

...
} else if (status & IM_ALLOC_INT) {
    DBG(3, dev, "Allocation irq\n");
    tasklet_hi_schedule(&lp->tx_task);
    mask &= ~IM_ALLOC_INT;
}
...

SMC_SET_INT_MASK(lp, mask);
spin_unlock(&lp->lock);
        /*
         * We return IRQ_HANDLED unconditionally here even if there was
         * nothing to do.  There is a possibility that a packet might
         * get enqueued into the chip right after TX_EMPTY_INT is raised
         * but just before the CPU acknowledges the IRQ.
         * Better take an unneeded IRQ in some occasions than complexifying
         * the code for all cases.
         */
        return IRQ_HANDLED;
}

Could you trace mask value, it looks we loop and never acknowledge
some interrupt status.

Maybe driver depends on tasklet_hi_schedule() triggers
smc_hardware_send_pkt() really really soon.

Oh, but smc_hardware_send_pkt() uses a spin_trylock() at its
beginning, then later a spin_lock() hidden in SMC_ENABLE_INT(),
this looks racy.


diff --git a/drivers/net/ethernet/smsc/smc91x.c
b/drivers/net/ethernet/smsc/smc91x.c
index 73212590d04a..32abb0a084ec 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -617,14 +617,13 @@ static void smc_hardware_send_pkt(unsigned long data)

        /* queue the packet for TX */
        SMC_SET_MMU_CMD(lp, MC_ENQUEUE);
+       SMC_SET_INT_MASK(lp, SMC_GET_INT_MASK(lp) | IM_TX_INT |
IM_TX_EMPTY_INT);
        smc_special_unlock(&lp->lock, flags);

        netif_trans_update(dev);
        dev->stats.tx_packets++;
        dev->stats.tx_bytes += len;

-       SMC_ENABLE_INT(lp, IM_TX_INT | IM_TX_EMPTY_INT);
-
 done:  if (!THROTTLE_TX_PKTS)
                netif_wake_queue(dev);

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-22 16:09                     ` Eric Dumazet
@ 2016-11-23 18:21                       ` Brian Starkey
  2016-11-23 20:03                         ` Eric Dumazet
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-23 18:21 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

Hi Eric,

On Tue, Nov 22, 2016 at 08:09:58AM -0800, Eric Dumazet wrote:
>>
>> Looks like there's a few similarly named devices and drivers. Mine is
>> an SMSC LAN91C111 using the smc91x driver in
>> drivers/net/ethernet/smsc/smc91x.c, rather than smc911x.c. So the
>> interrupt handler is smc_interrupt()
>>
>> CONFIG_ARCH_PXA is not set, nor is SMC_USE_PXA_DMA or SMC_USE_DMA.
>
>Oh right.
>
>static irqreturn_t smc_interrupt(int irq, void *dev_id)
>{
>...
>mask = SMC_GET_INT_MASK(lp);
>SMC_SET_INT_MASK(lp, 0);
>
>loop()
>
>...
>} else if (status & IM_ALLOC_INT) {
>    DBG(3, dev, "Allocation irq\n");
>    tasklet_hi_schedule(&lp->tx_task);
>    mask &= ~IM_ALLOC_INT;
>}
>...
>
>SMC_SET_INT_MASK(lp, mask);
>spin_unlock(&lp->lock);
>        /*
>         * We return IRQ_HANDLED unconditionally here even if there was
>         * nothing to do.  There is a possibility that a packet might
>         * get enqueued into the chip right after TX_EMPTY_INT is raised
>         * but just before the CPU acknowledges the IRQ.
>         * Better take an unneeded IRQ in some occasions than complexifying
>         * the code for all cases.
>         */
>        return IRQ_HANDLED;
>}
>
>Could you trace mask value, it looks we loop and never acknowledge
>some interrupt status.
>
>Maybe driver depends on tasklet_hi_schedule() triggers
>smc_hardware_send_pkt() really really soon.
>
>Oh, but smc_hardware_send_pkt() uses a spin_trylock() at its
>beginning, then later a spin_lock() hidden in SMC_ENABLE_INT(),
>this looks racy.

This patch didn't help.

I did get some new traces though - I've attached the diff for the
trace_printks I added.

Before 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
After 4cd13c21b207: https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8

It looks like the difference is that after 4cd13c21b207 the RX softirq
isn't running, and RX interrupts don't call softirq_raise anymore -
presumably because there's one pending, but I didn't have time to
track that down to a code-path.

Cheers,
-Brian

--->8---

diff --git a/drivers/net/ethernet/smsc/smc91x.c b/drivers/net/ethernet/smsc/smc91x.c
index 503a3b6..6598319 100644
--- a/drivers/net/ethernet/smsc/smc91x.c
+++ b/drivers/net/ethernet/smsc/smc91x.c
@@ -418,7 +418,7 @@ static inline void  smc_rcv(struct net_device *dev)
  
  	packet_number = SMC_GET_RXFIFO(lp);
  	if (unlikely(packet_number & RXFIFO_REMPTY)) {
-		PRINTK(dev, "smc_rcv with nothing on FIFO.\n");
+		trace_printk("smc_rcv with nothing on FIFO.\n");
  		return;
  	}
  
@@ -475,6 +475,7 @@ static inline void  smc_rcv(struct net_device *dev)
  			SMC_WAIT_MMU_BUSY(lp);
  			SMC_SET_MMU_CMD(lp, MC_RELEASE);
  			dev->stats.rx_dropped++;
+			trace_printk("rx_dropped.\n");
  			return;
  		}
  
@@ -1216,7 +1217,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
  	int status, mask, timeout, card_stats;
  	int saved_pointer;
  
-	DBG(3, dev, "%s\n", __func__);
+	trace_printk("%s\n", __func__);
  
  	spin_lock(&lp->lock);
  
@@ -1235,7 +1236,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
  	do {
  		status = SMC_GET_INT(lp);
  
-		DBG(2, dev, "INT 0x%02x MASK 0x%02x MEM 0x%04x FIFO 0x%04x\n",
+		trace_printk("INT 0x%02x MASK 0x%02x MEM 0x%04x FIFO 0x%04x\n",
  		    status, mask,
  		    ({ int meminfo; SMC_SELECT_BANK(lp, 0);
  		       meminfo = SMC_GET_MIR(lp);
@@ -1254,7 +1255,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
  			if (THROTTLE_TX_PKTS)
  				netif_wake_queue(dev);
  		} else if (status & IM_RCV_INT) {
-			DBG(3, dev, "RX irq\n");
+			trace_printk("RX irq\n");
  			smc_rcv(dev);
  		} else if (status & IM_ALLOC_INT) {
  			DBG(3, dev, "Allocation irq\n");
@@ -1304,7 +1305,7 @@ static irqreturn_t smc_interrupt(int irq, void *dev_id)
  		PRINTK(dev, "spurious interrupt (mask = 0x%02x)\n",
  		       mask);
  #endif
-	DBG(3, dev, "Interrupt done (%d loops)\n",
+	trace_printk("Interrupt done (%d loops)\n",
  	    MAX_IRQ_LOOPS - timeout);
  
  	/*

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-23 18:21                       ` Brian Starkey
@ 2016-11-23 20:03                         ` Eric Dumazet
  2016-11-25 13:14                           ` Brian Starkey
  0 siblings, 1 reply; 19+ messages in thread
From: Eric Dumazet @ 2016-11-23 20:03 UTC (permalink / raw)
  To: Brian Starkey
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <brian.starkey@arm.com> wrote:

> This patch didn't help.
>
> I did get some new traces though - I've attached the diff for the
> trace_printks I added.
>
> Before 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
> After 4cd13c21b207:
> https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
>
> It looks like the difference is that after 4cd13c21b207 the RX softirq
> isn't running, and RX interrupts don't call softirq_raise anymore -
> presumably because there's one pending, but I didn't have time to
> track that down to a code-path.
>
> Cheers,
> -Brian
>

Hi Brian

Looks like netif_rx() drops the incoming packets then ?

Maybe netif_running() is not happy :(

Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)

Thanks !

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-23 20:03                         ` Eric Dumazet
@ 2016-11-25 13:14                           ` Brian Starkey
  2017-02-06 18:46                             ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Brian Starkey @ 2016-11-25 13:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior

Hi,

On Wed, Nov 23, 2016 at 12:03:28PM -0800, Eric Dumazet wrote:
>On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <brian.starkey@arm.com> wrote:
>
>> This patch didn't help.
>>
>> I did get some new traces though - I've attached the diff for the
>> trace_printks I added.
>>
>> Before 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
>> After 4cd13c21b207:
>> https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
>>
>> It looks like the difference is that after 4cd13c21b207 the RX softirq
>> isn't running, and RX interrupts don't call softirq_raise anymore -
>> presumably because there's one pending, but I didn't have time to
>> track that down to a code-path.
>>
>> Cheers,
>> -Brian
>>
>
>Hi Brian
>
>Looks like netif_rx() drops the incoming packets then ?
>
>Maybe netif_running() is not happy :(
>
>Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)

Some packets are dropped, but not very many:

   $ grep NET_RX_SUCCESS trace_netif_rx.txt | wc -l
   14399
   $ grep NET_RX_DROP trace_netif_rx.txt | wc -l
   22

Without the ksoftirqd change there were zero NET_RX_DROPs.

-Brian

>
>Thanks !

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2016-11-25 13:14                           ` Brian Starkey
@ 2017-02-06 18:46                             ` Will Deacon
  2017-02-06 18:49                               ` Russell King - ARM Linux
  0 siblings, 1 reply; 19+ messages in thread
From: Will Deacon @ 2017-02-06 18:46 UTC (permalink / raw)
  To: Brian Starkey
  Cc: Eric Dumazet, Thomas Gleixner, LKML, Peter Zijlstra, Ingo Molnar,
	Andrew Morton, Alexander Potapenko, Steven Rostedt,
	Sebastian Andrzej Siewior, linux

Hi all,

I've also stumbled over this issue with the ARM fastmodel and, somewhat
embarrassingly, blamed the model developers for the regression. I'm using
NFS and copying ~14MB file from NFS to a virtio-blk device which takes
over 20 minutes with 4cd13c21b207, but <1 min with it reverted.

I also think I've figured out what's going on. See below.

On Fri, Nov 25, 2016 at 01:14:03PM +0000, Brian Starkey wrote:
> On Wed, Nov 23, 2016 at 12:03:28PM -0800, Eric Dumazet wrote:
> >On Wed, Nov 23, 2016 at 10:21 AM, Brian Starkey <brian.starkey@arm.com> wrote:
> >
> >>This patch didn't help.
> >>
> >>I did get some new traces though - I've attached the diff for the
> >>trace_printks I added.
> >>
> >>Before 4cd13c21b207:
> >>https://drive.google.com/open?id=0B8siaK6ZjvEwcEtOeFQzTmY0Nnc
> >>After 4cd13c21b207:
> >>https://drive.google.com/open?id=0B8siaK6ZjvEwZnQ4MVg1d3d1Tm8
> >>
> >>It looks like the difference is that after 4cd13c21b207 the RX softirq
> >>isn't running, and RX interrupts don't call softirq_raise anymore -
> >>presumably because there's one pending, but I didn't have time to
> >>track that down to a code-path.
> >>
> >>Cheers,
> >>-Brian
> >>
> >
> >Hi Brian
> >
> >Looks like netif_rx() drops the incoming packets then ?
> >
> >Maybe netif_running() is not happy :(
> >
> >Could you trace netif_rx() return value (NET_RX_SUCCESS or NET_RX_DROP)
> 
> Some packets are dropped, but not very many:
> 
>   $ grep NET_RX_SUCCESS trace_netif_rx.txt | wc -l
>   14399
>   $ grep NET_RX_DROP trace_netif_rx.txt | wc -l
>   22
> 
> Without the ksoftirqd change there were zero NET_RX_DROPs.

The SMC91x has an on-chip 8KB FIFO (i.e. there's no DMA going on here).
When the FIFO is full (every 4 TCP packets in my case), we get an
interrupt and run down the smc_rcv path. There, we allocate an skb for
the data (netdev_alloc_skb) and copy the data out of the FIFO
(SMC_PULL_DATA) into the buffer, which we hand over the network core via
netif_rx.

The problem is that netif_rx defers to ksoftirqd to process the packet
and more crucially *free* the skb after it's been consumed. Since the
thing was allocated in IRQ context, we end up exhausting our GFP_ATOMIC
memory because ksoftirqd gets interrupted so frequently due to the tiny
FIFO depth that buffers are allocated at a much higher frequency than
they are freed. This may be exagerated by the relative speed of the model
emulated CPU with respect to the network interface, but I'd expect this
to be reproducible on real hardware too (rmk, cc'd, was going to give that
a go).

Prior to 4cd13c21b207, we'd always run softirqs synchronously on the
hardirq exit path and therefore have a chance to free some skbs before
actually EOI'ing the hardirq and allowing the FIFO-full interrupt to
interrupt us again.

Converting the smc91x driver over to NAPI would probably solve this problem,
but given the "vintage" of this code, I'd be more tempted by a simpler
point fix if only I could think of one.

Any ideas?

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2017-02-06 18:46                             ` Will Deacon
@ 2017-02-06 18:49                               ` Russell King - ARM Linux
  2017-02-08  9:46                                 ` Will Deacon
  0 siblings, 1 reply; 19+ messages in thread
From: Russell King - ARM Linux @ 2017-02-06 18:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: Brian Starkey, Eric Dumazet, Thomas Gleixner, LKML,
	Peter Zijlstra, Ingo Molnar, Andrew Morton, Alexander Potapenko,
	Steven Rostedt, Sebastian Andrzej Siewior

On Mon, Feb 06, 2017 at 06:46:19PM +0000, Will Deacon wrote:
> Converting the smc91x driver over to NAPI would probably solve this problem,
> but given the "vintage" of this code, I'd be more tempted by a simpler
> point fix if only I could think of one.

I'm not sure if converting it to NAPI would solve it, or just move
the problem elsewhere - IOW, move it from "we need to drop the packet
because we couldn't allocate a skb" to "the hardware dropped the packed
because the FIFO was full."

Yes, I'm intending giving it a go, once I've a spare moment to build
a kernel for the platform etc.  It runs root NFS, so should be a good
test for it.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job"
  2017-02-06 18:49                               ` Russell King - ARM Linux
@ 2017-02-08  9:46                                 ` Will Deacon
  0 siblings, 0 replies; 19+ messages in thread
From: Will Deacon @ 2017-02-08  9:46 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Brian Starkey, Eric Dumazet, Thomas Gleixner, LKML,
	Peter Zijlstra, Ingo Molnar, Andrew Morton, Alexander Potapenko,
	Steven Rostedt, Sebastian Andrzej Siewior

On Mon, Feb 06, 2017 at 06:49:42PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 06, 2017 at 06:46:19PM +0000, Will Deacon wrote:
> > Converting the smc91x driver over to NAPI would probably solve this problem,
> > but given the "vintage" of this code, I'd be more tempted by a simpler
> > point fix if only I could think of one.
> 
> I'm not sure if converting it to NAPI would solve it, or just move
> the problem elsewhere - IOW, move it from "we need to drop the packet
> because we couldn't allocate a skb" to "the hardware dropped the packed
> because the FIFO was full."

That's quite possible. I did a quick hack using a threaded irq handler,
with the thread basically running a modified version of smc_rcv using
GFP_KERNEL allocations. Whilst this improves things significantly, I do
still see rx drops, probably for the reason you mention above.

Still, NAPI should be better than what mainline is currently doing because
it won't continuously interrupt ksoftirqd when in polling mode. It's all
a rather delicate balancing act and getting back to the old behaviour might
not be possible after 4cd13c21b207.

> Yes, I'm intending giving it a go, once I've a spare moment to build
> a kernel for the platform etc.  It runs root NFS, so should be a good
> test for it.

Thanks, that would be interesting. We resurrected one of our realview-eb
machines with this NIC, but I think it's all on an FPGA so the relative
speed of the NIC vs the CPU isn't different enough that we see the problem.

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-02-08  9:47 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-16 13:55 Regression: Failed boots bisected to 4cd13c21b207 "softirq: Let ksoftirqd do its job" Brian Starkey
2016-11-16 15:52 ` Eric Dumazet
2016-11-16 18:01   ` Brian Starkey
2016-11-16 18:49     ` Eric Dumazet
2016-11-16 21:01       ` Brian Starkey
2016-11-17 15:29         ` Eric Dumazet
2016-11-17 16:42           ` Brian Starkey
2016-11-18  0:40             ` Thomas Gleixner
2016-11-18 20:23               ` Brian Starkey
2016-11-22 10:33               ` Brian Starkey
2016-11-22 14:29                 ` Eric Dumazet
2016-11-22 15:27                   ` Brian Starkey
2016-11-22 16:09                     ` Eric Dumazet
2016-11-23 18:21                       ` Brian Starkey
2016-11-23 20:03                         ` Eric Dumazet
2016-11-25 13:14                           ` Brian Starkey
2017-02-06 18:46                             ` Will Deacon
2017-02-06 18:49                               ` Russell King - ARM Linux
2017-02-08  9:46                                 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).