linux-kernel.vger.kernel.org archive mirror
* WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
@ 2016-07-05 13:33 Mason
  2016-07-05 14:50 ` Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-05 13:33 UTC (permalink / raw)
  To: linux-pm, netdev, LKML; +Cc: Sebastian Frias

Hello,

I was testing suspend/resume sequences where the suspend operation
fails and returns without having suspended the platform.

# echo mem > /sys/power/state
[   90.322264] PM: Syncing filesystems ... done.
[   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
[   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   90.355357] Suspending console(s) (use no_console_suspend to debug)
[   90.364590] PM: suspend of devices complete after 2.068 msecs
[   90.365554] PM: late suspend of devices complete after 0.954 msecs
[   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
[   90.366227] Disabling non-boot CPUs ...
[   90.379004] CPU1: shutdown
[   90.412661] Enabling non-boot CPUs ...
[   90.450385] CPU1 is up
[   90.450979] PM: noirq resume of devices complete after 0.584 msecs
[   90.451672] PM: early resume of devices complete after 0.667 msecs
[   90.453149] nb8800 26000.ethernet eth0: Link is Down
[   90.453264] PM: resume of devices complete after 1.583 msecs
[   90.508180] Restarting tasks ... done.
-sh: echo: write error: Input/output error
[   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

(The error message is expected, as my suspend routine returns -EIO
on failure.)

I left the system to idle at the prompt; then 5 minutes later,
the system printed the following trace.

[  400.718491] ------------[ cut here ]------------
[  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  400.731582] Modules linked in:
[  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
[  400.742646] Hardware name: Sigma Tango DT
[  400.746671] Backtrace: 
[  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
[  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
[  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
[  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
[  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
[  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
[  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
[  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
[  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
[  400.829237]  r5:00000000 r4:00000000
[  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
[  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
[  400.849285]  r4:00000000
[  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
[  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  400.867053]  r4:c0735428
[  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
[  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
[  400.893747]  r4:c080eca0 r3:c0801f10
[  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
[  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
[  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
[  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
[  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
[  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
[  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
[  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
[  400.965400]  r7:ffffffff r4:00000002
[  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
[  400.976522]  r5:c081e4c0 r4:00000001
[  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
[  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---


Did I implement something incorrectly?

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 13:33 WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc Mason
@ 2016-07-05 14:50 ` Mason
  2016-07-05 15:28   ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-05 14:50 UTC (permalink / raw)
  To: linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 15:33, Mason wrote:

> I was testing suspend/resume sequences where the suspend operation
> fails and returns without having suspended the platform.
> 
> # echo mem > /sys/power/state
> [   90.322264] PM: Syncing filesystems ... done.
> [   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
> [   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   90.355357] Suspending console(s) (use no_console_suspend to debug)
> [   90.364590] PM: suspend of devices complete after 2.068 msecs
> [   90.365554] PM: late suspend of devices complete after 0.954 msecs
> [   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
> [   90.366227] Disabling non-boot CPUs ...
> [   90.379004] CPU1: shutdown
> [   90.412661] Enabling non-boot CPUs ...
> [   90.450385] CPU1 is up
> [   90.450979] PM: noirq resume of devices complete after 0.584 msecs
> [   90.451672] PM: early resume of devices complete after 0.667 msecs
> [   90.453149] nb8800 26000.ethernet eth0: Link is Down
> [   90.453264] PM: resume of devices complete after 1.583 msecs
> [   90.508180] Restarting tasks ... done.
> -sh: echo: write error: Input/output error
> [   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> (The error message is expected, as my suspend routine returns -EIO
> on failure.)
> 
> I left the system to idle at the prompt; then 5 minutes later,
> the system printed the following trace.
> 
> [  400.718491] ------------[ cut here ]------------
> [  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
> [  400.731582] Modules linked in:
> [  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
> [  400.742646] Hardware name: Sigma Tango DT
> [  400.746671] Backtrace: 
> [  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
> [  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
> [  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
> [  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
> [  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
> [  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
> [  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
> [  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
> [  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
> [  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
> [  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
> [  400.829237]  r5:00000000 r4:00000000
> [  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
> [  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
> [  400.849285]  r4:00000000
> [  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
> [  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
> [  400.867053]  r4:c0735428
> [  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
> [  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
> [  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
> [  400.893747]  r4:c080eca0 r3:c0801f10
> [  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
> [  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
> [  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
> [  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
> [  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
> [  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
> [  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
> [  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
> [  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
> [  400.965400]  r7:ffffffff r4:00000002
> [  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
> [  400.976522]  r5:c081e4c0 r4:00000001
> [  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
> [  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---


NB: The warning shows up 310 seconds after the suspend attempt.

I rebooted, tried the same operation, and hit the same warning,
again 310 seconds later:

# echo mem > /sys/power/state
[   25.665905] PM: Syncing filesystems ... done.
[   25.672102] Freezing user space processes ... (elapsed 0.001 seconds) done.
[   25.680807] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) 
[   25.690494] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[   25.699118] Suspending console(s) (use no_console_suspend to debug)
[   25.707899] PM: suspend of devices complete after 1.639 msecs
[   25.708796] PM: late suspend of devices complete after 0.887 msecs
[   25.709460] PM: noirq suspend of devices complete after 0.657 msecs
[   25.709465] Disabling non-boot CPUs ...
[   25.729045] CPU1: shutdown
[   25.762704] Enabling non-boot CPUs ...
[   25.800416] CPU1 is up
[   25.801024] PM: noirq resume of devices complete after 0.595 msecs
[   25.801730] PM: early resume of devices complete after 0.678 msecs
[   25.803194] nb8800 26000.ethernet eth0: Link is Down
[   25.803311] PM: resume of devices complete after 1.571 msecs
[   25.858245] Restarting tasks ... done.
-sh: echo: write error: Input/output error
[   29.186902] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

[  335.865192] ------------[ cut here ]------------
[  335.869875] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  335.878284] Modules linked in:
[  335.881366] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
[  335.889321] Hardware name: Sigma Tango DT
[  335.893346] Backtrace: 
[  335.895815] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  335.903420]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  335.909125] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
[  335.916391] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
[  335.923386]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
[  335.929086] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
[  335.936691]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df75ec54 r5:df5af3c4 r4:df5af2c0
[  335.944491] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
[  335.953326] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
[  335.961542]  r7:c0801de8 r6:df75ec54 r5:df5af2c0 r4:df5af46c
[  335.967248] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
[  335.975901]  r5:00000000 r4:00000000
[  335.979499] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
[  335.988065]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
[  335.995947]  r4:00000000
[  335.998492] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
[  336.005835]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  336.013716]  r4:c0735428
[  336.016264] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
[  336.024136] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  336.032526]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
[  336.040406]  r4:c080eca0 r3:c0801f10
[  336.044001] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  336.051520] Exception stack(0xc0801f10 to 0xc0801f58)
[  336.056594] 1f00:                                     00000000 00000000 000079b2 c0117c80
[  336.064815] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
[  336.073034] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
[  336.079678]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
[  336.087483] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
[  336.095616] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
[  336.104277] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
[  336.112057]  r7:ffffffff r4:00000002
[  336.115662] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
[  336.123180]  r5:c081e4c0 r4:00000001
[  336.126777] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
[  336.133349] ---[ end trace d6b09977089e89b4 ]---


Does this 310 second lag ring a bell for anyone?
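
As a rough check, the lag can be read off the timestamps in the two logs
above. The sketch below subtracts the "PM: Syncing filesystems" timestamp
from the "cut here" timestamp of each run. The helper name lag_seconds is
only illustrative.

```python
from decimal import Decimal

def lag_seconds(suspend_ts: str, warning_ts: str) -> Decimal:
    """Seconds between two dmesg timestamps, given as strings."""
    return Decimal(warning_ts) - Decimal(suspend_ts)

# (suspend attempt, warning) timestamps copied from the two runs above
runs = [("90.322264", "400.718491"), ("25.665905", "335.865192")]
for suspend_ts, warning_ts in runs:
    print(lag_seconds(suspend_ts, warning_ts))
```

Both runs come out a little over 310 seconds, although the warnings were
printed at different uptimes (about 400 s and 335 s), so the delay appears
to be tied to the suspend attempt and not to boot time.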

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 14:50 ` Mason
@ 2016-07-05 15:28   ` Florian Fainelli
  2016-07-05 15:56     ` Mason
  2016-07-12  9:53     ` Mason
  0 siblings, 2 replies; 14+ messages in thread
From: Florian Fainelli @ 2016-07-05 15:28 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 07:50, Mason wrote:
> On 05/07/2016 15:33, Mason wrote:
> 
>> I was testing suspend/resume sequences where the suspend operation
>> fails and returns without having suspended the platform.
>>
>> # echo mem > /sys/power/state
>> [   90.322264] PM: Syncing filesystems ... done.
>> [   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [   90.337092] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
>> [   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [   90.355357] Suspending console(s) (use no_console_suspend to debug)
>> [   90.364590] PM: suspend of devices complete after 2.068 msecs
>> [   90.365554] PM: late suspend of devices complete after 0.954 msecs
>> [   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
>> [   90.366227] Disabling non-boot CPUs ...
>> [   90.379004] CPU1: shutdown
>> [   90.412661] Enabling non-boot CPUs ...
>> [   90.450385] CPU1 is up
>> [   90.450979] PM: noirq resume of devices complete after 0.584 msecs
>> [   90.451672] PM: early resume of devices complete after 0.667 msecs
>> [   90.453149] nb8800 26000.ethernet eth0: Link is Down
>> [   90.453264] PM: resume of devices complete after 1.583 msecs
>> [   90.508180] Restarting tasks ... done.
>> -sh: echo: write error: Input/output error
>> [   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
>>
>> (The error message is expected, as my suspend routine returns -EIO
>> on failure.)
>>
>> I left the system to idle at the prompt; then 5 minutes later,
>> the system printed the following trace.
>>
>> [  400.718491] ------------[ cut here ]------------
>> [  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
>> [  400.731582] Modules linked in:
>> [  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.7.0-rc6-00010-gd07031bdc433 #1
>> [  400.742646] Hardware name: Sigma Tango DT
>> [  400.746671] Backtrace: 
>> [  400.749141] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
>> [  400.756747]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
>> [  400.762454] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
>> [  400.769722] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
>> [  400.776717]  r7:00000009 r6:c05e3fbc r5:00000000 r4:00000000
>> [  400.782417] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
>> [  400.790022]  r9:dfbdd4e0 r8:0000000a r7:c0801de8 r6:df6f9514 r5:df5df144 r4:df5df040
>> [  400.797825] [<c011c070>] (warn_slowpath_null) from [<c0463c04>] (inet_sock_destruct+0x1c4/0x1dc)
>> [  400.806661] [<c0463a40>] (inet_sock_destruct) from [<c03e9c60>] (__sk_destruct+0x28/0xe0)
>> [  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
>> [  400.820584] [<c03e9c38>] (__sk_destruct) from [<c016f230>] (rcu_process_callbacks+0x488/0x59c)
>> [  400.829237]  r5:00000000 r4:00000000
>> [  400.832836] [<c016eda8>] (rcu_process_callbacks) from [<c01207e4>] (__do_softirq+0x138/0x264)
>> [  400.841402]  r10:c08020a0 r9:40000001 r8:00000101 r7:c0800000 r6:c08020a4 r5:00000009
>> [  400.849285]  r4:00000000
>> [  400.851829] [<c01206ac>] (__do_softirq) from [<c0120c04>] (irq_exit+0xc8/0x104)
>> [  400.859172]  r10:c0801f10 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
>> [  400.867053]  r4:c0735428
>> [  400.869601] [<c0120b3c>] (irq_exit) from [<c0162610>] (__handle_domain_irq+0x88/0xf4)
>> [  400.877473] [<c0162588>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
>> [  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c r5:c080277c
>> [  400.893747]  r4:c080eca0 r3:c0801f10
>> [  400.897342] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
>> [  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
>> [  400.909936] 1f00:                                     00000000 00000000 0000826a c0117c80
>> [  400.918156] 1f20: c0800000 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 dfffcdc0 c0801f6c
>> [  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 60000013 ffffffff
>> [  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6:ffffffff r5:60000013 r4:c01086b4
>> [  400.940826] [<c0108674>] (arch_cpu_idle) from [<c0155f54>] (default_idle_call+0x28/0x34)
>> [  400.948960] [<c0155f2c>] (default_idle_call) from [<c0156088>] (cpu_startup_entry+0x128/0x17c)
>> [  400.957620] [<c0155f60>] (cpu_startup_entry) from [<c04a3f54>] (rest_init+0x8c/0x90)
>> [  400.965400]  r7:ffffffff r4:00000002
>> [  400.969005] [<c04a3ec8>] (rest_init) from [<c0700cb4>] (start_kernel+0x310/0x31c)
>> [  400.976522]  r5:c081e4c0 r4:00000001
>> [  400.980121] [<c07009a4>] (start_kernel) from [<8000807c>] (0x8000807c)
>> [  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---
> 
> 
> NB: The warning shows up 310 seconds after the suspend attempt.
> 
> I rebooted, tried the same operation, and hit the same warning
> still 310 seconds later:
> 
> # echo mem > /sys/power/state
> [   25.665905] PM: Syncing filesystems ... done.
> [   25.672102] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [   25.680807] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds) 
> [   25.690494] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
> [   25.699118] Suspending console(s) (use no_console_suspend to debug)
> [   25.707899] PM: suspend of devices complete after 1.639 msecs
> [   25.708796] PM: late suspend of devices complete after 0.887 msecs
> [   25.709460] PM: noirq suspend of devices complete after 0.657 msecs
> [   25.709465] Disabling non-boot CPUs ...
> [   25.729045] CPU1: shutdown
> [   25.762704] Enabling non-boot CPUs ...
> [   25.800416] CPU1 is up
> [   25.801024] PM: noirq resume of devices complete after 0.595 msecs
> [   25.801730] PM: early resume of devices complete after 0.678 msecs
> [   25.803194] nb8800 26000.ethernet eth0: Link is Down
> [   25.803311] PM: resume of devices complete after 1.571 msecs
> [   25.858245] Restarting tasks ... done.
> -sh: echo: write error: Input/output error
> [   29.186902] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

nb8800.c does not currently implement suspend/resume hooks. Are you
positive that when you suspend, you properly tear down all HW, stop
transmit queues, etc., and do the opposite upon resumption?

Is your system clocksource also correctly saved/restored? Or, if you go
through firmware in between, could it be changing the counter values
and making Linux think that more time has elapsed than really has?
-- 
Florian


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 15:28   ` Florian Fainelli
@ 2016-07-05 15:56     ` Mason
  2016-07-05 16:20       ` Florian Fainelli
  2016-07-12  9:53     ` Mason
  1 sibling, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-05 15:56 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 17:28, Florian Fainelli wrote:

> nb8800.c does not currently show suspend/resume hooks implemented, are
> you positive that when you suspend, you properly tear down all HW, stop
> transmit queues, etc. and do the opposite upon resumption?

I am currently testing the error path for my suspend routine.
Firmware is, in fact, denying the suspend request, and immediately
returns control to Linux, without having powered anything down.

I expected not having to save any context in that situation.
Am I mistaken?

You mention "stop transmit queues". Can you say more about this?

> Is your system clocksource also correctly saved/restored, or if you go
> through a firmware in-between could it be changing the counter values
> and make Linux think that more time has elapsed than it really happened?

Thanks for pointing this out, I was not aware I was supposed to save
and restore the tick counter on suspend/resume. (This is not an issue
in this specific situation, as the platform is NOT suspended.)

However, your remark has brought some more confusion to my mind.
Linux is expecting time to stand still when it suspends?
What if the tick counter is in an always-on power domain, and other
processors depend on the counter? I can't just overwrite the reg
when Linux resumes...

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 15:56     ` Mason
@ 2016-07-05 16:20       ` Florian Fainelli
  2016-07-05 20:26         ` Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2016-07-05 16:20 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 08:56 AM, Mason wrote:
> On 05/07/2016 17:28, Florian Fainelli wrote:
> 
>> nb8800.c does not currently show suspend/resume hooks implemented, are
>> you positive that when you suspend, you properly tear down all HW, stop
>> transmit queues, etc. and do the opposite upon resumption?
> 
> I am currently testing the error path for my suspend routine.
> Firmware is, in fact, denying the suspend request, and immediately
> returns control to Linux, without having powered anything down.
> 
> I expected not having to save any context in that situation.
> Am I mistaken?

It depends on what power state you are going to and resuming from, and
how much of this is platform dependent. On the platforms I work with, S2
preserves register states for our On/Off domain, while S3 only keeps an
always-on power island and shuts off the On/Off domain. You therefore
need to have your drivers in the On/Off domain suspend any activity and
preserve important register states, or re-initialize them from scratch,
whichever is most convenient.


> 
> You mention "stop transmit queues". Can you say more about this?

See drivers/net/ethernet/broadcom/genet/bcmgenet.c, which is a driver
that takes care of that, for instance. Look for bcmgenet_{suspend,resume}.

> 
>> Is your system clocksource also correctly saved/restored, or if you go
>> through a firmware in-between could it be changing the counter values
>> and make Linux think that more time has elapsed than it really happened?
> 
> Thanks for pointing this out, I was not aware I was supposed to save
> and restore the tick counter on suspend/resume. (This is not an issue
> in this specific situation, as the platform is NOT suspended.)

You don't have to save and restore the clocksource counter, although if
you want proper time accounting to be done across suspend states, you
would want to use a clocksource which is persistent across these suspend
states.

> 
> However, your remark has brought some more confusion to my mind.
> Linux is expecting time to stand still when it suspends?
> What if the tick counter is in an always-on power domain, and other
> processors depend on the counter? I can't just overwrite the reg
> when Linux resumes...

The point is more that if the firmware initializes the timer, or even
re-initializes it, Linux could think that events expired because the
timebase has a big offset compared to where it was. Just pointing out
that this *could* be a problem. If your timer is in the always-on domain
and your firmware does not touch it, that should be fine without
anything specific (except maybe adding an "always-on" boolean property
to the timer nodes in DT).
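
To illustrate the concern, here is a toy model of how a counter that gets
reset behind the kernel's back can look like a large time jump. It assumes
a 32-bit free-running up-counter and a wrapping subtraction between two
readings; the function name and the values are made up for the example,
and this is not the kernel's timekeeping code.

```python
MASK = (1 << 32) - 1  # assumption: 32-bit free-running up-counter

def elapsed_ticks(last: int, now: int, mask: int = MASK) -> int:
    """Ticks between two readings, assuming the counter only counts up
    and wraps at most once between them."""
    return (now - last) & mask

# Normal case: the counter advanced by 1000 ticks.
print(elapsed_ticks(5_000, 6_000))

# Firmware reset the counter to 0 while the OS still remembers 5000:
# the wrapping subtraction reports almost a full counter period.
print(elapsed_ticks(5_000, 0))
```

The second call reports nearly 2^32 ticks even though almost no time has
passed, which is the kind of offset that could make timers look expired.
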
-- 
Florian


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 16:20       ` Florian Fainelli
@ 2016-07-05 20:26         ` Mason
  2016-07-05 21:22           ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-05 20:26 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 18:20, Florian Fainelli wrote:
> On 07/05/2016 08:56 AM, Mason wrote:
>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>
>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>> you positive that when you suspend, you properly tear down all HW, stop
>>> transmit queues, etc. and do the opposite upon resumption?
>>
>> I am currently testing the error path for my suspend routine.
>> Firmware is, in fact, denying the suspend request, and immediately
>> returns control to Linux, without having powered anything down.
>>
>> I expected not having to save any context in that situation.
>> Am I mistaken?
> 
> It depends what power state you are going to and resuming from, and how
> much of this is platform dependent, on the platforms I work with S2
> preserves register states for our On/Off domain, while S3 only keeps an
> always-on power island and shuts off the On/Off domain, you therefore
> need to have your drivers in the On/Off domain suspend any activity and
> preserve important register states, or re-initialize them from scratch
> whichever is the most convenient.

Thanks for bringing these details to my attention, they will
definitely prove useful when I test an actual suspend/resume
sequence. However, I must stress that the platform did NOT
power down in my test case, because the firmware currently
denies all suspend requests.

Therefore, loss of context cannot possibly explain the
warning I am seeing.

>> You mention "stop transmit queues". Can you say more about this?
> 
> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
> that takes care of that for instance, look for bcmgenet_{suspend,resume}

Thanks. I will look into it.

If I understand correctly, something is missing in the
network interface code? (My system is using an NFS root
filesystem, so network is an important subsystem.)

>>> Is your system clocksource also correctly saved/restored, or if you go
>>> through a firmware in-between could it be changing the counter values
>>> and make Linux think that more time has elapsed than it really happened?
>>
>> Thanks for pointing this out, I was not aware I was supposed to save
>> and restore the tick counter on suspend/resume. (This is not an issue
>> in this specific situation, as the platform is NOT suspended.)
> 
> You don't have to save and restore the clocksource counter, although if
> you want proper time accounting to be done across suspend states, you
> would want to use a clocksource which is persistent across these suspend
> states.

The clocksource is a 27 MHz 32-bit tick counter. In other words,
the counter wraps around every 159 seconds. If Linux suspends
for several hours, how can it determine how much time went by?
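
The 159-second figure follows from 2^32 ticks at 27 MHz. A small sketch of
the arithmetic, and of why the counter alone cannot tell a short sleep from
a long one (the function names are illustrative only):

```python
COUNTER_BITS = 32
COUNTER_HZ = 27_000_000  # 27 MHz, as stated above

def wrap_period_seconds(bits: int, hz: int) -> float:
    """Time for a free-running counter of the given width to wrap once."""
    return (1 << bits) / hz

def counter_value(elapsed_ticks: int, bits: int) -> int:
    """Reading of a counter that started at 0, after elapsed_ticks."""
    return elapsed_ticks & ((1 << bits) - 1)

print(wrap_period_seconds(COUNTER_BITS, COUNTER_HZ))  # about 159 seconds

# One hour of suspend: the counter wraps many times, and the final
# reading only reflects the remainder after the last wrap.
one_hour_ticks = 3600 * COUNTER_HZ
print(one_hour_ticks // (1 << COUNTER_BITS))       # number of full wraps
print(counter_value(one_hour_ticks, COUNTER_BITS))  # what can be observed
```

The number of full wraps is lost, so any two sleep durations that differ by
a multiple of the wrap period give the same reading.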

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 20:26         ` Mason
@ 2016-07-05 21:22           ` Florian Fainelli
  2016-07-05 21:51             ` Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Florian Fainelli @ 2016-07-05 21:22 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 01:26 PM, Mason wrote:
> On 05/07/2016 18:20, Florian Fainelli wrote:
>> On 07/05/2016 08:56 AM, Mason wrote:
>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>
>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>> transmit queues, etc. and do the opposite upon resumption?
>>>
>>> I am currently testing the error path for my suspend routine.
>>> Firmware is, in fact, denying the suspend request, and immediately
>>> returns control to Linux, without having powered anything down.
>>>
>>> I expected not having to save any context in that situation.
>>> Am I mistaken?
>>
>> It depends what power state you are going to and resuming from, and how
>> much of this is platform dependent, on the platforms I work with S2
>> preserves register states for our On/Off domain, while S3 only keeps an
>> always-on power island and shuts off the On/Off domain, you therefore
>> need to have your drivers in the On/Off domain suspend any activity and
>> preserve important register states, or re-initialize them from scratch
>> whichever is the most convenient.
> 
> Thanks for bringing these details to my attention, they will
> definitely prove useful when I test an actual suspend/resume
> sequence. However, I must stress that the platform did NOT
> power down in my test case, because the firmware currently
> denies all suspend requests.
> 
> Therefore, loss of context cannot possibly explain the
> warning I am seeing.

No, but if you go all the way down to trying to suspend and the last
step is the firmware failing, anything you have suspended needs to be
unwound. For your Ethernet driver, that means it went through a
successful suspend and then a resume cycle, even though the platform's
attempt to suspend failed later on.

> 
>>> You mention "stop transmit queues". Can you say more about this?
>>
>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
> 
> Thanks. I will look into it.
> 
> If I understand correctly, something is missing in the
> network interface code? (My system is using an NFS root
> filesystem, so network is an important subsystem.)

The typical things are detaching the network device and stopping
transmit queues, but without knowing what changes you have made to
nb8800.c, it is hard to tell what else is needed.

> 
>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>> through a firmware in-between could it be changing the counter values
>>>> and make Linux think that more time has elapsed than it really happened?
>>>
>>> Thanks for pointing this out, I was not aware I was supposed to save
>>> and restore the tick counter on suspend/resume. (This is not an issue
>>> in this specific situation, as the platform is NOT suspended.)
>>
>> You don't have to save and restore the clocksource counter, although if
>> you want proper time accounting to be done across suspend states, you
>> would want to use a clocksource which is persistent across these suspend
>> states.
> 
> The clocksource is a 27 MHz 32-bit tick counter. In other words,
> the counter wraps around every 159 seconds. If Linux suspends
> for several hours, how can it determine how much time went by?

Well, that's unfortunate. You are then pretty much doomed to either
lose time in between and rely on e.g. NTP to resync your time upon
resumption or, if you had smarter hardware, use a prescaler or
something similar that makes this counter wrap much later (days or
years later).
-- 
Florian


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 21:22           ` Florian Fainelli
@ 2016-07-05 21:51             ` Mason
  2016-07-05 21:55               ` Florian Fainelli
  0 siblings, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-05 21:51 UTC (permalink / raw)
  To: Florian Fainelli, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 05/07/2016 23:22, Florian Fainelli wrote:
> On 07/05/2016 01:26 PM, Mason wrote:
>> On 05/07/2016 18:20, Florian Fainelli wrote:
>>> On 07/05/2016 08:56 AM, Mason wrote:
>>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>>
>>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>>> transmit queues, etc. and do the opposite upon resumption?
>>>>
>>>> I am currently testing the error path for my suspend routine.
>>>> Firmware is, in fact, denying the suspend request, and immediately
>>>> returns control to Linux, without having powered anything down.
>>>>
>>>> I expected not having to save any context in that situation.
>>>> Am I mistaken?
>>>
>>> It depends what power state you are going to and resuming from, and how
>>> much of this is platform dependent, on the platforms I work with S2
>>> preserves register states for our On/Off domain, while S3 only keeps an
>>> always-on power island and shuts off the On/Off domain, you therefore
>>> need to have your drivers in the On/Off domain suspend any activity and
>>> preserve important register states, or re-initialize them from scratch
>>> whichever is the most convenient.
>>
>> Thanks for bringing these details to my attention, they will
>> definitely prove useful when I test an actual suspend/resume
>> sequence. However, I must stress that the platform did NOT
>> power down in my test case, because the firmware currently
>> denies all suspend requests.
>>
>> Therefore, loss of context cannot possibly explain the
>> warning I am seeing.
> 
> No, but if you go all the way down to trying to suspend and the last
> step is the firmware failing, anything you have suspended needs to be
> unwound. For your ethernet driver, that means it went through a
> successful suspend then resume cycle, even if the platform later failed
> to suspend.

So it is the driver's responsibility to "shut down" on resume?
(I had the vague impression that the suspend framework would
"disable" the device through the appropriate callback.)

>>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
>>
>> Thanks. I will look into it.
>>
>> If I understand correctly, something is missing in the
>> network interface code? (My system is using an NFS root
>> filesystem, so network is an important subsystem.)
> 
> The typical things are detaching the network device and stopping
> transmit queues, but without knowing what changes you have made to
> nb8800.c, it is hard to tell what else is needed.

I'm using the driver unaltered. So I guess I need to figure out
the exact steps required for suspending a network device.
(I'll look at bcmgenet.c tomorrow.)

>>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>>> through a firmware in-between could it be changing the counter values
>>>>> and make Linux think that more time has elapsed than it really happened?
>>>>
>>>> Thanks for pointing this out, I was not aware I was supposed to save
>>>> and restore the tick counter on suspend/resume. (This is not an issue
>>>> in this specific situation, as the platform is NOT suspended.)
>>>
>>> You don't have to save and restore the clocksource counter, although if
>>> you want proper time accounting to be done across suspend states, you
>>> would want to use a clocksource which is persistent across these suspend
>>> states.
>>
>> The clocksource is a 27 MHz 32-bit tick counter. In other words,
>> the counter wraps around every 159 seconds. If Linux suspends
>> for several hours, how can it determine how much time went by?
> 
> Well, that's unfortunate. You are then pretty much doomed to either
> lose time in between and rely on e.g. NTP to resync your time upon
> resumption or, if you had smarter hardware, use a prescaler or
> something similar that makes this counter wrap much later (days or
> years later).

Maybe the hardware devs thought of that problem, because they
"widened" the counter to 64 bits on newer platforms.

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 21:51             ` Mason
@ 2016-07-05 21:55               ` Florian Fainelli
  0 siblings, 0 replies; 14+ messages in thread
From: Florian Fainelli @ 2016-07-05 21:55 UTC (permalink / raw)
  To: Mason, linux-pm, netdev, LKML; +Cc: Sebastian Frias

On 07/05/2016 02:51 PM, Mason wrote:
>>> Therefore, loss of context cannot possibly explain the
>>> warning I am seeing.
>>
>> No, but if you go all the way down to trying to suspend and the last
>> step is the firmware failing, anything you have suspended needs to be
>> unwound. For your ethernet driver, that means it went through a
>> successful suspend then resume cycle, even if the platform later failed
>> to suspend.
> 
> So it is the driver's responsibility to "shut down" on resume?

It is the driver's responsibility to know how to suspend and resume a
device it manages, and it does that by implementing appropriate
suspend/resume callbacks.

> (I had the vague impression that the suspend framework would
> "disable" the device through the appropriate callback.)

The suspend framework knows which drivers implement suspend/resume and
calls them appropriately (based on parenting/bus hierarchy), but it
won't automagically do anything, because there is no such thing as magic
when it comes to suspending hardware: this needs to be a controlled
sequence.
-- 
Florian


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-05 15:28   ` Florian Fainelli
  2016-07-05 15:56     ` Mason
@ 2016-07-12  9:53     ` Mason
  2016-07-12 11:48       ` Mason
  2016-07-12 14:25       ` Eric Dumazet
  1 sibling, 2 replies; 14+ messages in thread
From: Mason @ 2016-07-12  9:53 UTC (permalink / raw)
  To: Florian Fainelli, netdev, LKML; +Cc: Linux ARM, Sebastian Frias

On 05/07/2016 17:28, Florian Fainelli wrote:

> On 05/07/2016 07:50, Mason wrote:
> 
>> On 05/07/2016 15:33, Mason wrote:
>> 
>>> I was testing suspend/resume sequences where the suspend operation
>>> fails and returns without having suspended the platform.

Please forget I ever mentioned suspend, that was a red herring.
The warning is displayed even if I never suspend.
(Dropping linux-pm from this discussion.)

>> I rebooted, tried the same operation, and hit the same warning
>> still 310 seconds later:

However, the 310 seconds time span still seems to be relevant.

Steps to reproduce: I booted the system, logged in as root,
mounted an NFS file system, then left the system idling at
the prompt.

(I don't remember seeing this warning in v4.1 and v4.4)

What's going wrong here? Is it related to NFS?

Here is the defconfig I'm using
http://pastebin.ubuntu.com/19160299/

Regards.


[  317.940133] ------------[ cut here ]------------
[  317.944815] WARNING: CPU: 1 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  317.953223] Modules linked in:
[  317.956305] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc6-00010-gd07031bdc433-dirty #2
[  317.964784] Hardware name: Sigma Tango DT
[  317.968809] Backtrace: 
[  317.971279] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  317.978884]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  317.984590] [<c010bb58>] (show_stack) from [<c02e9fc4>] (dump_stack+0x80/0x94)
[  317.991856] [<c02e9f44>] (dump_stack) from [<c011bfb0>] (__warn+0xec/0x104)
[  317.998849]  r7:00000009 r6:c05e3fc8 r5:00000000 r4:00000000
[  318.004549] [<c011bec4>] (__warn) from [<c011c080>] (warn_slowpath_null+0x28/0x30)
[  318.012154]  r9:dfbea4e0 r8:0000000a r7:df45fe30 r6:dec19594 r5:df68f144 r4:df68f040
[  318.019954] [<c011c058>] (warn_slowpath_null) from [<c0463be4>] (inet_sock_destruct+0x1c4/0x1dc)
[  318.028788] [<c0463a20>] (inet_sock_destruct) from [<c03e9c40>] (__sk_destruct+0x28/0xe0)
[  318.037005]  r7:df45fe30 r6:dec19594 r5:df68f040 r4:df68f1ec
[  318.042710] [<c03e9c18>] (__sk_destruct) from [<c016f218>] (rcu_process_callbacks+0x488/0x59c)
[  318.051363]  r5:00000000 r4:00000000
[  318.054962] [<c016ed90>] (rcu_process_callbacks) from [<c01207cc>] (__do_softirq+0x138/0x264)
[  318.063527]  r10:c08020a0 r9:40000001 r8:00000101 r7:df45e000 r6:c08020a4 r5:00000009
[  318.071408]  r4:00000000
[  318.073953] [<c0120694>] (__do_softirq) from [<c0120bec>] (irq_exit+0xc8/0x104)
[  318.081296]  r10:df45ff58 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  318.089176]  r4:c0735428
[  318.091723] [<c0120b24>] (irq_exit) from [<c01625f8>] (__handle_domain_irq+0x88/0xf4)
[  318.099595] [<c0162570>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  318.107986]  r10:00000000 r9:e0803100 r8:e0802100 r7:df45ff58 r6:e080210c r5:c080277c
[  318.115865]  r4:c080eca0 r3:df45ff58
[  318.119461] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  318.126980] Exception stack(0xdf45ff58 to 0xdf45ffa0)
[  318.132053] ff40:                                                       00000001 00000000
[  318.140273] ff60: 0000ab80 c0117c80 df45e000 c08024f8 c0802494 c081e2d6 c05b9550 413fc090
[  318.148492] ff80: 00000000 df45ffb4 df45ffb8 df45ffa8 c01086b0 c01086b4 60000013 ffffffff
[  318.156709]  r9:413fc090 r8:c05b9550 r7:df45ff8c r6:ffffffff r5:60000013 r4:c01086b4
[  318.164512] [<c0108674>] (arch_cpu_idle) from [<c0155f3c>] (default_idle_call+0x28/0x34)
[  318.172646] [<c0155f14>] (default_idle_call) from [<c0156070>] (cpu_startup_entry+0x128/0x17c)
[  318.181305] [<c0155f48>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[  318.190395]  r7:c081e7c8 r4:c080b4f0
[  318.193993] [<c010dabc>] (secondary_start_kernel) from [<8010158c>] (0x8010158c)
[  318.201423]  r5:00000051 r4:9f44006a
[  318.205024] ---[ end trace 6e04001434b19cb9 ]---


Just to be sure, I performed the same steps a second time:

[  316.238527] ------------[ cut here ]------------
[  316.243210] WARNING: CPU: 1 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
[  316.251619] Modules linked in:
[  316.254702] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc6-00010-gd07031bdc433-dirty #2
[  316.263182] Hardware name: Sigma Tango DT
[  316.267206] Backtrace: 
[  316.269675] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[  316.277280]  r7:60000113 r6:c080ea84 r5:00000000 r4:c080ea84
[  316.282986] [<c010bb58>] (show_stack) from [<c02e9fc4>] (dump_stack+0x80/0x94)
[  316.290254] [<c02e9f44>] (dump_stack) from [<c011bfb0>] (__warn+0xec/0x104)
[  316.297247]  r7:00000009 r6:c05e3fc8 r5:00000000 r4:00000000
[  316.302947] [<c011bec4>] (__warn) from [<c011c080>] (warn_slowpath_null+0x28/0x30)
[  316.310552]  r9:dfbea4e0 r8:0000000a r7:df45fe30 r6:dec15694 r5:df6063c4 r4:df6062c0
[  316.318354] [<c011c058>] (warn_slowpath_null) from [<c0463be4>] (inet_sock_destruct+0x1c4/0x1dc)
[  316.327190] [<c0463a20>] (inet_sock_destruct) from [<c03e9c40>] (__sk_destruct+0x28/0xe0)
[  316.335406]  r7:df45fe30 r6:dec15694 r5:df6062c0 r4:df60646c
[  316.341112] [<c03e9c18>] (__sk_destruct) from [<c016f218>] (rcu_process_callbacks+0x488/0x59c)
[  316.349765]  r5:00000000 r4:00000000
[  316.353363] [<c016ed90>] (rcu_process_callbacks) from [<c01207cc>] (__do_softirq+0x138/0x264)
[  316.361929]  r10:c08020a0 r9:40000001 r8:00000101 r7:df45e000 r6:c08020a4 r5:00000009
[  316.369811]  r4:00000000
[  316.372356] [<c0120694>] (__do_softirq) from [<c0120bec>] (irq_exit+0xc8/0x104)
[  316.379699]  r10:df45ff58 r9:df402400 r8:00000001 r7:00000000 r6:00000013 r5:00000000
[  316.387579]  r4:c0735428
[  316.390127] [<c0120b24>] (irq_exit) from [<c01625f8>] (__handle_domain_irq+0x88/0xf4)
[  316.397998] [<c0162570>] (__handle_domain_irq) from [<c01014ac>] (gic_handle_irq+0x50/0x94)
[  316.406388]  r10:00000000 r9:e0803100 r8:e0802100 r7:df45ff58 r6:e080210c r5:c080277c
[  316.414268]  r4:c080eca0 r3:df45ff58
[  316.417862] [<c010145c>] (gic_handle_irq) from [<c010c694>] (__irq_svc+0x54/0x90)
[  316.425382] Exception stack(0xdf45ff58 to 0xdf45ffa0)
[  316.430456] ff40:                                                       00000001 00000000
[  316.438676] ff60: 00009370 c0117c80 df45e000 c08024f8 c0802494 c081e2d6 c05b9550 413fc090
[  316.446897] ff80: 00000000 df45ffb4 df45ffb8 df45ffa8 c01086b0 c01086b4 60000013 ffffffff
[  316.455113]  r9:413fc090 r8:c05b9550 r7:df45ff8c r6:ffffffff r5:60000013 r4:c01086b4
[  316.462916] [<c0108674>] (arch_cpu_idle) from [<c0155f3c>] (default_idle_call+0x28/0x34)
[  316.471051] [<c0155f14>] (default_idle_call) from [<c0156070>] (cpu_startup_entry+0x128/0x17c)
[  316.479709] [<c0155f48>] (cpu_startup_entry) from [<c010dc14>] (secondary_start_kernel+0x158/0x164)
[  316.488799]  r7:c081e7c8 r4:c080b4f0
[  316.492395] [<c010dabc>] (secondary_start_kernel) from [<8010158c>] (0x8010158c)
[  316.499826]  r5:00000051 r4:9f44006a
[  316.503430] ---[ end trace 2dd53d8e86a1a69b ]---


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-12  9:53     ` Mason
@ 2016-07-12 11:48       ` Mason
  2016-07-12 14:25       ` Eric Dumazet
  1 sibling, 0 replies; 14+ messages in thread
From: Mason @ 2016-07-12 11:48 UTC (permalink / raw)
  To: Florian Fainelli, netdev, LKML, linux-nfs; +Cc: Linux ARM, Sebastian Frias

On 12/07/2016 11:53, Mason wrote:

> However, the 310 seconds time span still seems to be relevant.
> 
> Steps to reproduce: I booted the system, logged in as root,
> mounted an NFS file system, then left the system idling at
> the prompt.
> 
> (I don't remember seeing this warning in v4.1 and v4.4)
> 
> What's going wrong here? Is it related to NFS?
> 
> Here is the defconfig I'm using
> http://pastebin.ubuntu.com/19160299/
> 
> 

Adding linux-nfs in case it is NFS-related.
(I'm using an nfsroot file system, and the issue is triggered by
mounting an NFS file system.)

Similar warnings here:

http://thread.gmane.org/gmane.linux.kernel/2100812
WARNING: CPU: 1 PID: 81 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x188/0x1dc()
(different call stack though)

https://www.spinics.net/lists/linux-nfs/msg58133.html
WARNING: CPU: 9 PID: 31049 at net/ipv4/af_inet.c:155 .inet_sock_destruct+0x170/0x220
(different call stack though)

http://thread.gmane.org/gmane.linux.network/134151
WARNING: at net/ipv4/af_inet.c:155 inet_sock_destruct+0x122/0x13a()
(different call stack though)

http://oops.kernel.org/oops/?function=inet_sock_destruct&bugline=155&search=submit

None of these seem to mention rcu_process_callbacks() in the call stack.

I can reproduce the warning systematically.
What can I do to pinpoint the root of the issue?

Regards.


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-12  9:53     ` Mason
  2016-07-12 11:48       ` Mason
@ 2016-07-12 14:25       ` Eric Dumazet
  2016-07-12 14:38         ` Mason
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2016-07-12 14:25 UTC (permalink / raw)
  To: Mason; +Cc: Florian Fainelli, netdev, LKML, Linux ARM, Sebastian Frias

On Tue, 2016-07-12 at 11:53 +0200, Mason wrote:
> On 05/07/2016 17:28, Florian Fainelli wrote:
> 
> > On 05/07/2016 07:50, Mason wrote:
> > 
> >> On 05/07/2016 15:33, Mason wrote:
> >> 
> >>> I was testing suspend/resume sequences where the suspend operation
> >>> fails and returns without having suspended the platform.
> 
> Please forget I ever mentioned suspend, that was a red herring.
> The warning is displayed even if I never suspend.
> (Dropping linux-pm from this discussion.)
> 
> >> I rebooted, tried the same operation, and hit the same warning
> >> still 310 seconds later:
> 
> However, the 310 seconds time span still seems to be relevant.
> 
> Steps to reproduce: I booted the system, logged in as root,
> mounted an NFS file system, then left the system idling at
> the prompt.
> 
> (I don't remember seeing this warning in v4.1 and v4.4)
> 
> What's going wrong here? Is it related to NFS?
> 
> Here is the defconfig I'm using
> http://pastebin.ubuntu.com/19160299/
> 
> Regards.
> 

Could you try this debug patch ?

diff --git a/net/core/sock.c b/net/core/sock.c
index b7f12639c26a..7fb1aeadbda7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1465,6 +1465,7 @@ static void __sk_destruct(struct rcu_head *head)
 
 void sk_destruct(struct sock *sk)
 {
+	WARN_ON_ONCE(sk->sk_forward_alloc);
 	if (sock_flag(sk, SOCK_RCU_FREE))
 		call_rcu(&sk->sk_rcu, __sk_destruct);
 	else


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-12 14:25       ` Eric Dumazet
@ 2016-07-12 14:38         ` Mason
  2016-07-13 12:11           ` Mason
  0 siblings, 1 reply; 14+ messages in thread
From: Mason @ 2016-07-12 14:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Fainelli, netdev, LKML, Linux ARM, Sebastian Frias

On 12/07/2016 16:25, Eric Dumazet wrote:

> Could you try this debug patch ?

Note: I've been unable to trigger the warning again. Dunno what has changed...

With your patch applied, I get a warning at boot:

[    4.668309] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[    4.688609] Sending DHCP requests ., OK
[    4.711935] IP-Config: Got DHCP answer from 172.27.200.1, my address is 172.27.64.49
[    4.719956] IP-Config: Complete:
[    4.723221]      device=eth0, hwaddr=00:16:e8:02:08:42, ipaddr=172.27.64.49, mask=255.255.192.0, gw=172.27.64.1
[    4.733376]      host=toto5, domain=france.foo.com sac.foo.com asic.foo.com soft.sde, nis-domain=france.foo.com
[    4.745279]      bootserver=172.27.64.1, rootserver=172.27.64.1, rootpath=/export/roots/titi/6_2_0_8756,v3     nameserver0=172.27.0.17
[    4.759725] ------------[ cut here ]------------
[    4.764426] WARNING: CPU: 0 PID: 877 at net/core/sock.c:1468 sk_destruct+0x74/0x78
[    4.772056] Modules linked in:
[    4.775133] CPU: 0 PID: 877 Comm: kworker/0:1H Not tainted 4.7.0-rc6-00010-gd07031bdc433-dirty #6
[    4.784050] Hardware name: Sigma Tango DT
[    4.788084] Workqueue: rpciod rpc_async_schedule
[    4.792725] Backtrace: 
[    4.795196] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
[    4.802802]  r7:60000013 r6:c080ea84 r5:00000000 r4:c080ea84
[    4.808513] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
[    4.815781] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
[    4.822776]  r7:00000009 r6:c05e4d04 r5:00000000 r4:00000000
[    4.828482] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
[    4.836089]  r9:00000000 r8:df5edb58 r7:df711364 r6:df006c80 r5:df5edb58 r4:df5eda40
[    4.843898] [<c011c070>] (warn_slowpath_null) from [<c03fbd54>] (sk_destruct+0x74/0x78)
[    4.851945] [<c03fbce0>] (sk_destruct) from [<c03fbda8>] (__sk_free+0x50/0xbc)
[    4.859203]  r5:df5edb58 r4:df5eda40
[    4.862802] [<c03fbd58>] (__sk_free) from [<c03fbe50>] (sk_free+0x3c/0x40)
[    4.869710]  r5:df5edb58 r4:df5eda40
[    4.873310] [<c03fbe14>] (sk_free) from [<c03fbf80>] (sk_common_release+0xe8/0xf4)
[    4.880924] [<c03fbe98>] (sk_common_release) from [<c0467e50>] (udp_lib_close+0x10/0x14)
[    4.889054]  r5:df006c80 r4:df5eda40
[    4.892657] [<c0467e40>] (udp_lib_close) from [<c0473dbc>] (inet_release+0x4c/0x78)
[    4.900360] [<c0473d70>] (inet_release) from [<c03f605c>] (sock_release+0x28/0xa8)
[    4.907967]  r5:00000000 r4:df006c80
[    4.911575] [<c03f6034>] (sock_release) from [<c049a7f4>] (xs_reset_transport+0xac/0xbc)
[    4.919706]  r5:df5eda40 r4:df711000
[    4.923306] [<c049a748>] (xs_reset_transport) from [<c049a850>] (xs_destroy+0x24/0x54)
[    4.931262]  r9:00000000 r8:c049c614 r7:df7c3218 r6:df711000 r5:00000000 r4:df711000
[    4.939070] [<c049a82c>] (xs_destroy) from [<c0497c18>] (xprt_destroy+0x88/0x8c)
[    4.946502]  r5:df711218 r4:df711000
[    4.950102] [<c0497b90>] (xprt_destroy) from [<c0497c5c>] (xprt_put+0x40/0x44)
[    4.957358]  r5:df7c3200 r4:df613d00
[    4.960959] [<c0497c1c>] (xprt_put) from [<c04968e4>] (rpc_task_release_client+0x7c/0x80)
[    4.969181] [<c0496868>] (rpc_task_release_client) from [<c049c3ac>] (rpc_release_resources_task+0x34/0x38)
[    4.978971]  r7:c049c298 r6:00000001 r5:00000000 r4:df613d00
[    4.984674] [<c049c378>] (rpc_release_resources_task) from [<c049cdc8>] (__rpc_execute+0xb0/0x2a8)
[    4.993678]  r5:00000000 r4:df613d00
[    4.997277] [<c049cd18>] (__rpc_execute) from [<c049cfd4>] (rpc_async_schedule+0x14/0x18)
[    5.005496]  r10:df683500 r9:00000000 r8:dfbe4700 r7:00000000 r6:dfbdcc80 r5:df683500
[    5.013383]  r4:df613d24
[    5.015935] [<c049cfc0>] (rpc_async_schedule) from [<c0132640>] (process_one_work+0x128/0x3fc)
[    5.024595] [<c0132518>] (process_one_work) from [<c013296c>] (worker_thread+0x58/0x574)
[    5.032726]  r10:df683500 r9:df6d4000 r8:dfbdcc98 r7:c0802100 r6:00000008 r5:df683518
[    5.040614]  r4:dfbdcc80
[    5.043162] [<c0132914>] (worker_thread) from [<c0138470>] (kthread+0xe4/0xfc)
[    5.050419]  r10:00000000 r9:00000000 r8:00000000 r7:c0132914 r6:df683500 r5:df6cc600
[    5.058309]  r4:00000000
[    5.060858] [<c013838c>] (kthread) from [<c0107c18>] (ret_from_fork+0x14/0x3c)
[    5.068116]  r7:00000000 r6:00000000 r5:c013838c r4:df6cc600
[    5.073846] ---[ end trace 05b24e2dedd2f2a0 ]---
[    5.082009] VFS: Mounted root (nfs filesystem) readonly on device 0:11.
[    5.090328] Freeing unused kernel memory: 1024K (c0700000 - c0800000)
[    5.452552] random: udevd urandom read with 68 bits of entropy available


* Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc
  2016-07-12 14:38         ` Mason
@ 2016-07-13 12:11           ` Mason
  0 siblings, 0 replies; 14+ messages in thread
From: Mason @ 2016-07-13 12:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Fainelli, netdev, LKML, Linux ARM, Sebastian Frias

On 12/07/2016 16:38, Mason wrote:

> With Eric's patch applied, I get this warning at boot:
> 
> [    4.668309] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> [    4.688609] Sending DHCP requests ., OK
> [    4.711935] IP-Config: Got DHCP answer from 172.27.200.1, my address is 172.27.64.49
> [    4.719956] IP-Config: Complete:
> [    4.723221]      device=eth0, hwaddr=00:16:e8:02:08:42, ipaddr=172.27.64.49, mask=255.255.192.0, gw=172.27.64.1
> [    4.733376]      host=toto5, domain=france.foo.com sac.foo.com asic.foo.com soft.sde, nis-domain=france.foo.com
> [    4.745279]      bootserver=172.27.64.1, rootserver=172.27.64.1, rootpath=/export/roots/titi/6_2_0_8756,v3     nameserver0=172.27.0.17
> [    4.759725] ------------[ cut here ]------------
> [    4.764426] WARNING: CPU: 0 PID: 877 at net/core/sock.c:1468 sk_destruct+0x74/0x78
> [    4.772056] Modules linked in:
> [    4.775133] CPU: 0 PID: 877 Comm: kworker/0:1H Not tainted 4.7.0-rc6-00010-gd07031bdc433-dirty #6
> [    4.784050] Hardware name: Sigma Tango DT
> [    4.788084] Workqueue: rpciod rpc_async_schedule
> [    4.792725] Backtrace: 
> [    4.795196] [<c010b974>] (dump_backtrace) from [<c010bb70>] (show_stack+0x18/0x1c)
> [    4.802802]  r7:60000013 r6:c080ea84 r5:00000000 r4:c080ea84
> [    4.808513] [<c010bb58>] (show_stack) from [<c02e9fe4>] (dump_stack+0x80/0x94)
> [    4.815781] [<c02e9f64>] (dump_stack) from [<c011bfc8>] (__warn+0xec/0x104)
> [    4.822776]  r7:00000009 r6:c05e4d04 r5:00000000 r4:00000000
> [    4.828482] [<c011bedc>] (__warn) from [<c011c098>] (warn_slowpath_null+0x28/0x30)
> [    4.836089]  r9:00000000 r8:df5edb58 r7:df711364 r6:df006c80 r5:df5edb58 r4:df5eda40
> [    4.843898] [<c011c070>] (warn_slowpath_null) from [<c03fbd54>] (sk_destruct+0x74/0x78)
> [    4.851945] [<c03fbce0>] (sk_destruct) from [<c03fbda8>] (__sk_free+0x50/0xbc)
> [    4.859203]  r5:df5edb58 r4:df5eda40
> [    4.862802] [<c03fbd58>] (__sk_free) from [<c03fbe50>] (sk_free+0x3c/0x40)
> [    4.869710]  r5:df5edb58 r4:df5eda40
> [    4.873310] [<c03fbe14>] (sk_free) from [<c03fbf80>] (sk_common_release+0xe8/0xf4)
> [    4.880924] [<c03fbe98>] (sk_common_release) from [<c0467e50>] (udp_lib_close+0x10/0x14)
> [    4.889054]  r5:df006c80 r4:df5eda40
> [    4.892657] [<c0467e40>] (udp_lib_close) from [<c0473dbc>] (inet_release+0x4c/0x78)
> [    4.900360] [<c0473d70>] (inet_release) from [<c03f605c>] (sock_release+0x28/0xa8)
> [    4.907967]  r5:00000000 r4:df006c80
> [    4.911575] [<c03f6034>] (sock_release) from [<c049a7f4>] (xs_reset_transport+0xac/0xbc)
> [    4.919706]  r5:df5eda40 r4:df711000
> [    4.923306] [<c049a748>] (xs_reset_transport) from [<c049a850>] (xs_destroy+0x24/0x54)
> [    4.931262]  r9:00000000 r8:c049c614 r7:df7c3218 r6:df711000 r5:00000000 r4:df711000
> [    4.939070] [<c049a82c>] (xs_destroy) from [<c0497c18>] (xprt_destroy+0x88/0x8c)
> [    4.946502]  r5:df711218 r4:df711000
> [    4.950102] [<c0497b90>] (xprt_destroy) from [<c0497c5c>] (xprt_put+0x40/0x44)
> [    4.957358]  r5:df7c3200 r4:df613d00
> [    4.960959] [<c0497c1c>] (xprt_put) from [<c04968e4>] (rpc_task_release_client+0x7c/0x80)
> [    4.969181] [<c0496868>] (rpc_task_release_client) from [<c049c3ac>] (rpc_release_resources_task+0x34/0x38)
> [    4.978971]  r7:c049c298 r6:00000001 r5:00000000 r4:df613d00
> [    4.984674] [<c049c378>] (rpc_release_resources_task) from [<c049cdc8>] (__rpc_execute+0xb0/0x2a8)
> [    4.993678]  r5:00000000 r4:df613d00
> [    4.997277] [<c049cd18>] (__rpc_execute) from [<c049cfd4>] (rpc_async_schedule+0x14/0x18)
> [    5.005496]  r10:df683500 r9:00000000 r8:dfbe4700 r7:00000000 r6:dfbdcc80 r5:df683500
> [    5.013383]  r4:df613d24
> [    5.015935] [<c049cfc0>] (rpc_async_schedule) from [<c0132640>] (process_one_work+0x128/0x3fc)
> [    5.024595] [<c0132518>] (process_one_work) from [<c013296c>] (worker_thread+0x58/0x574)
> [    5.032726]  r10:df683500 r9:df6d4000 r8:dfbdcc98 r7:c0802100 r6:00000008 r5:df683518
> [    5.040614]  r4:dfbdcc80
> [    5.043162] [<c0132914>] (worker_thread) from [<c0138470>] (kthread+0xe4/0xfc)
> [    5.050419]  r10:00000000 r9:00000000 r8:00000000 r7:c0132914 r6:df683500 r5:df6cc600
> [    5.058309]  r4:00000000
> [    5.060858] [<c013838c>] (kthread) from [<c0107c18>] (ret_from_fork+0x14/0x3c)
> [    5.068116]  r7:00000000 r6:00000000 r5:c013838c r4:df6cc600
> [    5.073846] ---[ end trace 05b24e2dedd2f2a0 ]---
> [    5.082009] VFS: Mounted root (nfs filesystem) readonly on device 0:11.
> [    5.090328] Freeing unused kernel memory: 1024K (c0700000 - c0800000)
> [    5.452552] random: udevd urandom read with 68 bits of entropy available

As far as I can tell, I get the same call stack on every boot.
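
One way to check that claim mechanically is to reduce each boot log to its chain of function names and compare the chains. The following is only a minimal sketch, assuming the ARM backtrace line format seen in the trace above (`[<addr>] (callee) from [<addr>] (caller+0xoff/0xsize)`); the helper names `call_chain` and `same_stack` are hypothetical and not part of any kernel tooling.

```python
import re

# Matches ARM backtrace frames such as:
#   [<c03fbce0>] (sk_destruct) from [<c03fbda8>] (__sk_free+0x50/0xbc)
# Group 1 is the callee, group 2 is the caller (without its offset).
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\] \(([\w.]+)\) from "
    r"\[<[0-9a-f]+>\] \(([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+\)"
)


def call_chain(log_text):
    """Return the callee names, innermost frame first, found in a log.

    Timestamps, register dumps and mail quote prefixes ("> ") are ignored,
    because only lines containing a backtrace frame match FRAME_RE.
    """
    chain = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            chain.append(match.group(1))
    return chain


def same_stack(log_a, log_b):
    """True if both logs contain the same sequence of backtrace frames."""
    return call_chain(log_a) == call_chain(log_b)
```

Since timestamps and register contents differ from boot to boot, comparing only the function names avoids false mismatches; two saved `dmesg` outputs can be passed to `same_stack` as strings.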

Is this an unexpected call sequence?

Are there other tests I can run?

Regards.


end of thread

Thread overview: 14+ messages
2016-07-05 13:33 WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc Mason
2016-07-05 14:50 ` Mason
2016-07-05 15:28   ` Florian Fainelli
2016-07-05 15:56     ` Mason
2016-07-05 16:20       ` Florian Fainelli
2016-07-05 20:26         ` Mason
2016-07-05 21:22           ` Florian Fainelli
2016-07-05 21:51             ` Mason
2016-07-05 21:55               ` Florian Fainelli
2016-07-12  9:53     ` Mason
2016-07-12 11:48       ` Mason
2016-07-12 14:25       ` Eric Dumazet
2016-07-12 14:38         ` Mason
2016-07-13 12:11           ` Mason
