* [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
@ 2016-07-26 3:50 Fengguang Wu
2016-07-26 9:14 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-26 3:50 UTC (permalink / raw)
To: LKML
Cc: netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Fengguang Wu, Ye Xiaolong
Greetings,
This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running
modprobe netconsole netconsole=@/,$port@$server/
[ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
[ 39.943285] netpoll: netconsole: local port 6665
[ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[ 39.943609] netpoll: netconsole: interface 'eth0'
[ 39.943756] netpoll: netconsole: remote port 6672
[ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[ 39.944311] netpoll: netconsole: local IP 192.168.1.193
[ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
[ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
[ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
[ 39.944526] Call Trace:
[ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
[ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
[ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
[ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
[ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
[ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
[ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
[ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
[ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
[ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
[ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
[ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
[ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
[ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
[ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
[ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
[ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
[ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
[ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
[ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
[ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
[ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
[ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
[ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
[ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
[ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
[ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
[ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
[ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 39.946384] console [netcon0] enabled
[ 39.946514] netconsole: network logging started
Can this possibly be fixed?
Thanks,
Fengguang
^ permalink raw reply [flat|nested] 17+ messages in thread
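The call trace in the report shows disable_irq() calling synchronize_irq(), which may sleep, while the console path is running with in_atomic(): 1 and irqs_disabled(): 1. The following is a minimal userspace sketch of that check, not kernel code: every toy_* name is hypothetical, and the real ___might_sleep() logic is more involved than this model.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for per-task/per-CPU kernel state (hypothetical). */
struct toy_ctx {
    int  preempt_count;   /* > 0 models in_atomic() == 1 */
    bool irqs_disabled;   /* models irqs_disabled() == 1 */
    bool masked;          /* the modelled IRQ line is masked */
    int  warnings;        /* how many times the check fired */
};

/* Models might_sleep(): complain if the caller is not allowed to sleep. */
static void toy_might_sleep(struct toy_ctx *c, const char *where)
{
    if (c->preempt_count > 0 || c->irqs_disabled) {
        c->warnings++;
        fprintf(stderr,
                "BUG: sleeping function called from invalid context at %s\n",
                where);
    }
}

/* Models synchronize_irq(): it may wait for running handlers, so it may sleep. */
static void toy_synchronize_irq(struct toy_ctx *c)
{
    toy_might_sleep(c, "toy_synchronize_irq");
}

/* Models disable_irq_nosync(): only masks the line, never waits. */
static void toy_disable_irq_nosync(struct toy_ctx *c)
{
    c->masked = true;
}

/* Models disable_irq(): mask the line, then wait for in-flight handlers. */
static void toy_disable_irq(struct toy_ctx *c)
{
    toy_disable_irq_nosync(c);
    toy_synchronize_irq(c);
}
```

In this model, calling toy_disable_irq() with irqs_disabled set increments warnings, while the same call from a context that may sleep does not. That mirrors why the poll path, which is entered from console_unlock() in the trace, must not use the waiting variant.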
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-26 3:50 [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-26 9:14 ` Eric Dumazet
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-26 9:14 UTC (permalink / raw)
To: Fengguang Wu
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> Greetings,
>
> This BUG message can be found in recent kernels as well as v4.4 and
> linux-stable. It happens when running
>
> modprobe netconsole netconsole=@/,$port@$server/
>
> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> [ 39.943285] netpoll: netconsole: local port 6665
> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> [ 39.943609] netpoll: netconsole: interface 'eth0'
> [ 39.943756] netpoll: netconsole: remote port 6672
> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193
> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> [ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> [ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> [ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> [ 39.944526] Call Trace:
> [ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
> [ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> [ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> [ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> [ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> [ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> [ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
> [ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> [ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> [ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> [ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> [ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> [ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> [ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> [ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> [ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
> [ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> [ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
> [ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> [ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> [ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> [ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> [ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
> [ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> [ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> [ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> [ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> [ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> [ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [ 39.946384] console [netcon0] enabled
> [ 39.946514] netconsole: network logging started
>
> Can this be possibly fixed?
Could you try this?
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
{
struct e1000_adapter *adapter = netdev_priv(netdev);
- disable_irq(adapter->pdev->irq);
- e1000_intr(adapter->pdev->irq, netdev);
- enable_irq(adapter->pdev->irq);
+ if (napi_schedule_prep(&adapter->napi)) {
+ adapter->total_tx_bytes = 0;
+ adapter->total_tx_packets = 0;
+ adapter->total_rx_bytes = 0;
+ adapter->total_rx_packets = 0;
+ __napi_schedule(&adapter->napi);
+ }
}
#endif
^ permalink raw reply related [flat|nested] 17+ messages in thread
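The patch above drops the disable_irq()/e1000_intr()/enable_irq() sequence and instead calls napi_schedule_prep() followed by __napi_schedule(). A rough userspace model of that pattern is sketched below, assuming the prep step behaves like an atomic test-and-set of a "scheduled" flag so that only one caller hands the context to the poller. The toy_* names are hypothetical and the real NAPI code tracks more state than this.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct napi_struct (hypothetical). */
struct toy_napi {
    atomic_bool sched;   /* models the "already scheduled" state bit */
    int queued;          /* times the context was handed to the poller */
};

/* Models napi_schedule_prep(): true only for the caller that claims the flag. */
static bool toy_napi_schedule_prep(struct toy_napi *n)
{
    return !atomic_exchange(&n->sched, true);
}

/* Models __napi_schedule(): queue the context for later polling; never waits. */
static void toy_napi_schedule(struct toy_napi *n)
{
    n->queued++;
}

/* Models the poller finishing its work and releasing the flag. */
static void toy_napi_complete(struct toy_napi *n)
{
    atomic_store(&n->sched, false);
}

/* Models the patched netpoll hook: no IRQ masking, no waiting. */
static void toy_netpoll(struct toy_napi *n)
{
    if (toy_napi_schedule_prep(n))
        toy_napi_schedule(n);
}
```

Calling toy_netpoll() twice in a row queues the context only once; it can be queued again after toy_napi_complete(). Nothing in this path waits, which is the property the netpoll hook needs when it is reached with interrupts disabled.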
* Re: [PATCH] schedule function called for e1000 driver interrupt
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
@ 2016-07-26 9:45 ` kbuild test robot
2016-07-26 9:50 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
2016-07-26 9:50 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2016-07-26 9:45 UTC (permalink / raw)
To: Fengguang Wu
Cc: kbuild-all, Eric Dumazet, nick, netdev, LKML, Ye Xiaolong,
intel-wired-lan, Satyam Sharma, Thomas Gleixner, cc
[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]
Hi,
[auto build test ERROR on jkirsher-next-queue/dev-queue]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Fengguang-Wu/schedule-function-called-for-e1000-driver-interrupt/20160726-173521
base: https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: x86_64-randconfig-x011-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64
All errors (new ones prefixed by >>):
drivers/net/ethernet/intel/e1000/e1000_main.c: In function 'e1000_intr':
>> drivers/net/ethernet/intel/e1000/e1000_main.c:3800:22: error: 'struct e1000_adapter' has no member named 'watchdog_timer'; did you mean 'watchdog_task'?
mod_timer(&adapter->watchdog_timer, jiffies + 1);
^~
vim +3800 drivers/net/ethernet/intel/e1000/e1000_main.c
3794 return IRQ_HANDLED;
3795
3796 if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
3797 hw->get_link_status = 1;
3798 /* guard against interrupt when we're going down */
3799 if (!test_bit(__E1000_DOWN, &adapter->flags))
> 3800 mod_timer(&adapter->watchdog_timer, jiffies + 1);
3801 }
3802
3803 /* disable interrupts, without the synchronize_irq bit */
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 26488 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
2016-07-26 9:45 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
@ 2016-07-26 9:50 ` Thomas Gleixner
[not found] ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
2016-07-26 9:50 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-26 9:50 UTC (permalink / raw)
To: Fengguang Wu
Cc: Eric Dumazet, LKML, netdev, Satyam Sharma, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
On Tue, 26 Jul 2016, Fengguang Wu wrote:
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> hw->get_link_status = 1;
> /* guard against interrupt when we're going down */
> if (!test_bit(__E1000_DOWN, &adapter->flags))
> - schedule_delayed_work(&adapter->watchdog_task, 1);
> + mod_timer(&adapter->watchdog_timer, jiffies + 1);
ROTFL ....
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] schedule function called for e1000 driver interrupt
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
2016-07-26 9:45 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2016-07-26 9:50 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
@ 2016-07-26 9:50 ` kbuild test robot
2 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2016-07-26 9:50 UTC (permalink / raw)
To: Fengguang Wu
Cc: kbuild-all, Eric Dumazet, nick, netdev, LKML, Ye Xiaolong,
intel-wired-lan, Satyam Sharma, Thomas Gleixner, cc
[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]
Hi,
[auto build test ERROR on jkirsher-next-queue/dev-queue]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Fengguang-Wu/schedule-function-called-for-e1000-driver-interrupt/20160726-173521
base: https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64
All errors (new ones prefixed by >>):
drivers/net/ethernet/intel/e1000/e1000_main.c: In function 'e1000_intr':
>> drivers/net/ethernet/intel/e1000/e1000_main.c:3800:22: error: 'struct e1000_adapter' has no member named 'watchdog_timer'
mod_timer(&adapter->watchdog_timer, jiffies + 1);
^
vim +3800 drivers/net/ethernet/intel/e1000/e1000_main.c
3794 return IRQ_HANDLED;
3795
3796 if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
3797 hw->get_link_status = 1;
3798 /* guard against interrupt when we're going down */
3799 if (!test_bit(__E1000_DOWN, &adapter->flags))
> 3800 mod_timer(&adapter->watchdog_timer, jiffies + 1);
3801 }
3802
3803 /* disable interrupts, without the synchronize_irq bit */
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 46493 bytes --]
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-26 9:14 ` Eric Dumazet
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
@ 2016-07-26 15:32 ` Fengguang Wu
2016-07-26 16:28 ` Eric Dumazet
2016-07-27 21:38 ` Jeff Kirsher
2 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-26 15:32 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
Hi Eric,
It works!
On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> Greetings,
>>
>> This BUG message can be found in recent kernels as well as v4.4 and
>> linux-stable. It happens when running
>>
>> modprobe netconsole netconsole=@/,$port@$server/
>>
>> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
>> [ 39.943285] netpoll: netconsole: local port 6665
>> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> [ 39.943609] netpoll: netconsole: interface 'eth0'
>> [ 39.943756] netpoll: netconsole: remote port 6672
>> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193
>> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
>> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> [ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
>> [ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
>> [ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
>> [ 39.944526] Call Trace:
>> [ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
>> [ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
>> [ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
>> [ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
>> [ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
>> [ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
>> [ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
>> [ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
>> [ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
>> [ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
>> [ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
>> [ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
>> [ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
>> [ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
>> [ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
>> [ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
>> [ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
>> [ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
>> [ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
>> [ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
>> [ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
>> [ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
>> [ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
>> [ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
>> [ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
>> [ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
>> [ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
>> [ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
>> [ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [ 39.946384] console [netcon0] enabled
>> [ 39.946514] netconsole: network logging started
>>
>> Can this be possibly fixed?
>
>Could you try this ?
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>
>- disable_irq(adapter->pdev->irq);
>- e1000_intr(adapter->pdev->irq, netdev);
>- enable_irq(adapter->pdev->irq);
>+ if (napi_schedule_prep(&adapter->napi)) {
>+ adapter->total_tx_bytes = 0;
>+ adapter->total_tx_packets = 0;
>+ adapter->total_rx_bytes = 0;
>+ adapter->total_rx_packets = 0;
>+ __napi_schedule(&adapter->napi);
>+ }
The machines are actually running the e1000e driver, so I copied your
approach to e1000e and it works:
kern :info : [ 16.109647] netpoll: netconsole: local port 6665
kern :info : [ 16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern :info : [ 16.110346] netpoll: netconsole: interface 'eth0'
kern :info : [ 16.110672] netpoll: netconsole: remote port 6676
kern :info : [ 16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
kern :info : [ 16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
kern :info : [ 16.111845] netpoll: netconsole: local IP 192.168.2.3
kern :info : [ 16.114284] console [netcon0] enabled
kern :info : [ 16.114550] netconsole: network logging started
However, I'm not sure whether it will have side effects, because this
effectively bypasses the various checks in e1000_intr() and
e1000_intr_msi().
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..4f89873 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
case E1000E_INT_MODE_MSIX:
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
- case E1000E_INT_MODE_MSI:
- disable_irq(adapter->pdev->irq);
- e1000_intr_msi(adapter->pdev->irq, netdev);
- enable_irq(adapter->pdev->irq);
- break;
default: /* E1000E_INT_MODE_LEGACY */
- disable_irq(adapter->pdev->irq);
- e1000_intr(adapter->pdev->irq, netdev);
- enable_irq(adapter->pdev->irq);
+ if (napi_schedule_prep(&adapter->napi)) {
+ adapter->total_tx_bytes = 0;
+ adapter->total_tx_packets = 0;
+ adapter->total_rx_bytes = 0;
+ adapter->total_rx_packets = 0;
+ __napi_schedule(&adapter->napi);
+ }
break;
}
}
Thanks,
Fengguang
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-26 15:32 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-26 16:28 ` Eric Dumazet
2016-07-27 15:01 ` Fengguang Wu
0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-07-26 16:28 UTC (permalink / raw)
To: Fengguang Wu
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> Hi Eric,
>
> It works!
>
> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> Greetings,
> >>
> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> linux-stable. It happens when running
> >>
> >> modprobe netconsole netconsole=@/,$port@$server/
> >>
> >> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> >> [ 39.943285] netpoll: netconsole: local port 6665
> >> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> [ 39.943609] netpoll: netconsole: interface 'eth0'
> >> [ 39.943756] netpoll: netconsole: remote port 6672
> >> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> >> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> [ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> >> [ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> >> [ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> >> [ 39.944526] Call Trace:
> >> [ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
> >> [ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> >> [ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> >> [ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> >> [ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> >> [ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> >> [ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
> >> [ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> >> [ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> >> [ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> >> [ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> >> [ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> >> [ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> >> [ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> >> [ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> >> [ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
> >> [ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> >> [ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
> >> [ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> >> [ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> >> [ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> [ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> >> [ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
> >> [ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> >> [ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> >> [ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> >> [ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> >> [ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> >> [ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> [ 39.946384] console [netcon0] enabled
> >> [ 39.946514] netconsole: network logging started
> >>
> >> Can this be possibly fixed?
> >
> >Could you try this ?
> >
> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> > {
> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >
> >- disable_irq(adapter->pdev->irq);
> >- e1000_intr(adapter->pdev->irq, netdev);
> >- enable_irq(adapter->pdev->irq);
> >+ if (napi_schedule_prep(&adapter->napi)) {
> >+ adapter->total_tx_bytes = 0;
> >+ adapter->total_tx_packets = 0;
> >+ adapter->total_rx_bytes = 0;
> >+ adapter->total_rx_packets = 0;
> >+ __napi_schedule(&adapter->napi);
> >+ }
>
> The machines are actually running e1000e driver, so I copied your
> approach to e1000e and it works:
>
> kern :info : [ 16.109647] netpoll: netconsole: local port 6665
> kern :info : [ 16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> kern :info : [ 16.110346] netpoll: netconsole: interface 'eth0'
> kern :info : [ 16.110672] netpoll: netconsole: remote port 6676
> kern :info : [ 16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
> kern :info : [ 16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> kern :info : [ 16.111845] netpoll: netconsole: local IP 192.168.2.3
> kern :info : [ 16.114284] console [netcon0] enabled
> kern :info : [ 16.114550] netconsole: network logging started
>
> However I'm not sure if it'll have side effects, because this
> effectively disables the various checks in e1000_intr() and
> e1000_intr_msi().
>
As far as netpoll is concerned, this should not matter.
We only want to drain packets from TX rings.
I have no idea why you hit this issue only recently, since this looks like a
rather old bug to me?
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 9b4ec13..4f89873 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
> case E1000E_INT_MODE_MSIX:
> e1000_intr_msix(adapter->pdev->irq, netdev);
> break;
> - case E1000E_INT_MODE_MSI:
> - disable_irq(adapter->pdev->irq);
> - e1000_intr_msi(adapter->pdev->irq, netdev);
> - enable_irq(adapter->pdev->irq);
> - break;
> default: /* E1000E_INT_MODE_LEGACY */
> - disable_irq(adapter->pdev->irq);
> - e1000_intr(adapter->pdev->irq, netdev);
> - enable_irq(adapter->pdev->irq);
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_tx_bytes = 0;
> + adapter->total_tx_packets = 0;
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> break;
> }
> }
>
> Thanks,
> Fengguang
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-26 16:28 ` Eric Dumazet
@ 2016-07-27 15:01 ` Fengguang Wu
2016-07-27 18:50 ` Eric Dumazet
0 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-27 15:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
>> Hi Eric,
>>
>> It works!
>>
>> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> >> Greetings,
>> >>
>> >> This BUG message can be found in recent kernels as well as v4.4 and
>> >> linux-stable. It happens when running
>> >>
>> >> modprobe netconsole netconsole=@/,$port@$server/
>> >>
>> >> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
>> >> [ 39.943285] netpoll: netconsole: local port 6665
>> >> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> >> [ 39.943609] netpoll: netconsole: interface 'eth0'
>> >> [ 39.943756] netpoll: netconsole: remote port 6672
>> >> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> >> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> >> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193
>> >> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
>> >> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> >> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> >> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> >> [ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
>> >> [ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
>> >> [ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
>> >> [ 39.944526] Call Trace:
>> >> [ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
>> >> [ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
>> >> [ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
>> >> [ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
>> >> [ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
>> >> [ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
>> >> [ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
>> >> [ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
>> >> [ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
>> >> [ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
>> >> [ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
>> >> [ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
>> >> [ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
>> >> [ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
>> >> [ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
>> >> [ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
>> >> [ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
>> >> [ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
>> >> [ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
>> >> [ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
>> >> [ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
>> >> [ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
>> >> [ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
>> >> [ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
>> >> [ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
>> >> [ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
>> >> [ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
>> >> [ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
>> >> [ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> >> [ 39.946384] console [netcon0] enabled
>> >> [ 39.946514] netconsole: network logging started
>> >>
>> >> Can this be possibly fixed?
>> >
>> >Could you try this ?
>> >
>> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
>> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
>> > {
>> > struct e1000_adapter *adapter = netdev_priv(netdev);
>> >
>> >- disable_irq(adapter->pdev->irq);
>> >- e1000_intr(adapter->pdev->irq, netdev);
>> >- enable_irq(adapter->pdev->irq);
>> >+ if (napi_schedule_prep(&adapter->napi)) {
>> >+ adapter->total_tx_bytes = 0;
>> >+ adapter->total_tx_packets = 0;
>> >+ adapter->total_rx_bytes = 0;
>> >+ adapter->total_rx_packets = 0;
>> >+ __napi_schedule(&adapter->napi);
>> >+ }
>>
>> The machines are actually running e1000e driver, so I copied your
>> approach to e1000e and it works:
>>
>> kern :info : [ 16.109647] netpoll: netconsole: local port 6665
>> kern :info : [ 16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
>> kern :info : [ 16.110346] netpoll: netconsole: interface 'eth0'
>> kern :info : [ 16.110672] netpoll: netconsole: remote port 6676
>> kern :info : [ 16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
>> kern :info : [ 16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> kern :info : [ 16.111845] netpoll: netconsole: local IP 192.168.2.3
>> kern :info : [ 16.114284] console [netcon0] enabled
>> kern :info : [ 16.114550] netconsole: network logging started
>>
>> However I'm not sure if it'll have side effects, because this
>> effectively disables the various checks in e1000_intr() and
>> e1000_intr_msi().
>>
>
>As far as netpoll is concerned, this should not matter.
>
>We only want to drain packets from TX rings.

OK.

>I have no idea why you hit this issue only recently, since this looks a
>rather old bug to me ?

Yeah it's a rather old bug. It becomes obvious when we try to detect
and filter out buggy kernels for the machines that are expected to run
stable services. This BUG effectively blocks the stable machines from
booting because no clean kernel (v4.6.4, v4.6, v4.5, v4.4, ...) are
available at all. ;)

Thanks,
Fengguang

>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 9b4ec13..4f89873 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
>> case E1000E_INT_MODE_MSIX:
>> e1000_intr_msix(adapter->pdev->irq, netdev);
>> break;
>> - case E1000E_INT_MODE_MSI:
>> - disable_irq(adapter->pdev->irq);
>> - e1000_intr_msi(adapter->pdev->irq, netdev);
>> - enable_irq(adapter->pdev->irq);
>> - break;
>> default: /* E1000E_INT_MODE_LEGACY */
>> - disable_irq(adapter->pdev->irq);
>> - e1000_intr(adapter->pdev->irq, netdev);
>> - enable_irq(adapter->pdev->irq);
>> + if (napi_schedule_prep(&adapter->napi)) {
>> + adapter->total_tx_bytes = 0;
>> + adapter->total_tx_packets = 0;
>> + adapter->total_rx_bytes = 0;
>> + adapter->total_rx_packets = 0;
>> + __napi_schedule(&adapter->napi);
>> + }
>> break;
>> }
>> }
>>
>> Thanks,
>> Fengguang
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-27 15:01 ` Fengguang Wu
@ 2016-07-27 18:50 ` Eric Dumazet
0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-27 18:50 UTC (permalink / raw)
To: Fengguang Wu
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Jeff Kirsher, Ye Xiaolong
On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote:
> On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> >> Hi Eric,
> >>
> >> It works!
> >>
> >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> >> Greetings,
> >> >>
> >> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> >> linux-stable. It happens when running
> >> >>
> >> >> modprobe netconsole netconsole=@/,$port@$server/
> >> >>
> >> >> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> >> >> [ 39.943285] netpoll: netconsole: local port 6665
> >> >> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> >> [ 39.943609] netpoll: netconsole: interface 'eth0'
> >> >> [ 39.943756] netpoll: netconsole: remote port 6672
> >> >> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> >> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> >> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> >> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> >> >> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> >> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> >> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> >> [ 39.944522] 0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> >> >> [ 39.944524] 000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> >> >> [ 39.944525] ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> >> >> [ 39.944526] Call Trace:
> >> >> [ 39.944533] [<ffffffff813417d9>] dump_stack+0x63/0x8a
> >> >> [ 39.944536] [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> >> >> [ 39.944537] [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> >> >> [ 39.944541] [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> >> >> [ 39.944543] [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> >> >> [ 39.944545] [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> >> >> [ 39.944547] [<ffffffff810e492c>] disable_irq+0x1c/0x20
> >> >> [ 39.944559] [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> >> >> [ 39.944563] [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> >> >> [ 39.944567] [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> >> >> [ 39.944569] [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> >> >> [ 39.944572] [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> >> >> [ 39.944576] [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> >> >> [ 39.944578] [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> >> >> [ 39.944580] [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> >> >> [ 39.944583] [<ffffffff810e2c74>] register_console+0x1c4/0x380
> >> >> [ 39.944586] [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> >> >> [ 39.944588] [<ffffffffa004f000>] ? 0xffffffffa004f000
> >> >> [ 39.944591] [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> >> >> [ 39.944592] [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> >> >> [ 39.944596] [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> >> [ 39.944598] [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> >> >> [ 39.944602] [<ffffffff81114009>] load_module+0x1429/0x1b40
> >> >> [ 39.944604] [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> >> >> [ 39.944607] [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> >> >> [ 39.944608] [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> >> >> [ 39.944611] [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> >> >> [ 39.944614] [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> >> >> [ 39.944617] [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> >> [ 39.946384] console [netcon0] enabled
> >> >> [ 39.946514] netconsole: network logging started
> >> >>
> >> >> Can this be possibly fixed?
> >> >
> >> >Could you try this ?
> >> >
> >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
> >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> >> > {
> >> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >> >
> >> >- disable_irq(adapter->pdev->irq);
> >> >- e1000_intr(adapter->pdev->irq, netdev);
> >> >- enable_irq(adapter->pdev->irq);
> >> >+ if (napi_schedule_prep(&adapter->napi)) {
> >> >+ adapter->total_tx_bytes = 0;
> >> >+ adapter->total_tx_packets = 0;
> >> >+ adapter->total_rx_bytes = 0;
> >> >+ adapter->total_rx_packets = 0;
> >> >+ __napi_schedule(&adapter->napi);
> >> >+ }
> >>
> >> The machines are actually running e1000e driver, so I copied your
> >> approach to e1000e and it works:
> >>
> >> kern :info : [ 16.109647] netpoll: netconsole: local port 6665
> >> kern :info : [ 16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> kern :info : [ 16.110346] netpoll: netconsole: interface 'eth0'
> >> kern :info : [ 16.110672] netpoll: netconsole: remote port 6676
> >> kern :info : [ 16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
> >> kern :info : [ 16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> kern :info : [ 16.111845] netpoll: netconsole: local IP 192.168.2.3
> >> kern :info : [ 16.114284] console [netcon0] enabled
> >> kern :info : [ 16.114550] netconsole: network logging started
> >>
> >> However I'm not sure if it'll have side effects, because this
> >> effectively disables the various checks in e1000_intr() and
> >> e1000_intr_msi().
> >>
> >
> >As far as netpoll is concerned, this should not matter.
> >
> >We only want to drain packets from TX rings.
>
> OK.
>
> >I have no idea why you hit this issue only recently, since this looks a
> >rather old bug to me ?
>
> Yeah it's a rather old bug. It becomes obvious when we try to detect
> and filter out buggy kernels for the machines that are expected to run
> stable services. This BUG effectively blocks the stable machines from
> booting because no clean kernel (v4.6.4, v4.6, v4.5, v4.4, ...) are
> available at all. ;)
>
> Thanks,
> Fengguang
> >> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> index 9b4ec13..4f89873 100644
> >> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
> >> case E1000E_INT_MODE_MSIX:
> >> e1000_intr_msix(adapter->pdev->irq, netdev);
> >> break;
> >> - case E1000E_INT_MODE_MSI:
> >> - disable_irq(adapter->pdev->irq);
> >> - e1000_intr_msi(adapter->pdev->irq, netdev);
> >> - enable_irq(adapter->pdev->irq);
> >> - break;
> >> default: /* E1000E_INT_MODE_LEGACY */
> >> - disable_irq(adapter->pdev->irq);
> >> - e1000_intr(adapter->pdev->irq, netdev);
> >> - enable_irq(adapter->pdev->irq);
> >> + if (napi_schedule_prep(&adapter->napi)) {
> >> + adapter->total_tx_bytes = 0;
> >> + adapter->total_tx_packets = 0;
> >> + adapter->total_rx_bytes = 0;
> >> + adapter->total_rx_packets = 0;
> >> + __napi_schedule(&adapter->napi);
> >> + }
> >> break;
> >> }
> >> }
> >>
> >> Thanks,
> >> Fengguang
> >
>
About all netpoll implementations use disable_irq(), so I guess netpoll
is not compatible with threaded irqs.
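
For readers who want to see why the splat fires, here is a minimal
userspace sketch of the call chain in the trace above
(disable_irq -> synchronize_irq -> __might_sleep). It is not kernel
code: struct model_ctx and every model_* function are hypothetical
stand-ins, and the only assumption taken from the report is that the
netpoll path runs with in_atomic() and irqs_disabled() both set.

```c
#include <stdbool.h>

/* Hypothetical model of the calling context; not a kernel structure. */
struct model_ctx {
	bool in_atomic;        /* stands in for in_atomic() */
	bool irqs_disabled;    /* stands in for irqs_disabled() */
	int sleep_violations;  /* times the "may sleep" check tripped */
};

/* Stand-in for the might_sleep() debug check: it complains whenever the
 * caller is atomic or has interrupts disabled. */
static void model_might_sleep(struct model_ctx *ctx)
{
	if (ctx->in_atomic || ctx->irqs_disabled)
		ctx->sleep_violations++;
}

/* Stand-in for synchronize_irq(): it may have to wait for a handler,
 * so it runs the "may sleep" check before anything else. */
static void model_synchronize_irq(struct model_ctx *ctx)
{
	model_might_sleep(ctx);
}

/* Stand-in for disable_irq(): masking the line is omitted here, only
 * the synchronizing step that can sleep is modeled. */
static void model_disable_irq(struct model_ctx *ctx)
{
	model_synchronize_irq(ctx);
}
```

Calling model_disable_irq() with both flags set bumps sleep_violations,
which is the analogue of the BUG line in the report; with both flags
clear the counter stays at zero.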
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-26 9:14 ` Eric Dumazet
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
2016-07-26 15:32 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-27 21:38 ` Jeff Kirsher
2016-07-28 5:43 ` Eric Dumazet
2 siblings, 1 reply; 17+ messages in thread
From: Jeff Kirsher @ 2016-07-27 21:38 UTC (permalink / raw)
To: Eric Dumazet, Fengguang Wu
Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
Ye Xiaolong
On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> Could you try this ?
>
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index
> f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> 938b3820b 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> *netdev)
> {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>
> - disable_irq(adapter->pdev->irq);
> - e1000_intr(adapter->pdev->irq, netdev);
> - enable_irq(adapter->pdev->irq);
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_tx_bytes = 0;
> + adapter->total_tx_packets = 0;
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> }
> #endif
>
Since this fixes the issue Fengguang saw, will you be submitting a formal
patch Eric? (please) I can get this queued up for Dave's net tree as soon
as I receive the formal patch.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-27 21:38 ` Jeff Kirsher
@ 2016-07-28 5:43 ` Eric Dumazet
2016-07-28 10:19 ` Sabrina Dubroca
2016-07-28 23:28 ` [Intel-wired-lan] " Francois Romieu
0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-28 5:43 UTC (permalink / raw)
To: Jeff Kirsher
Cc: Fengguang Wu, LKML, netdev, Satyam Sharma, Thomas Gleixner,
intel-wired-lan, Ye Xiaolong
On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > Could you try this ?
> >
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index
> > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > 938b3820b 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > *netdev)
> > {
> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >
> > - disable_irq(adapter->pdev->irq);
> > - e1000_intr(adapter->pdev->irq, netdev);
> > - enable_irq(adapter->pdev->irq);
> > + if (napi_schedule_prep(&adapter->napi)) {
> > + adapter->total_tx_bytes = 0;
> > + adapter->total_tx_packets = 0;
> > + adapter->total_rx_bytes = 0;
> > + adapter->total_rx_packets = 0;
> > + __napi_schedule(&adapter->napi);
> > + }
> > }
> > #endif
> >
>
> Since this fixes the issue Fengguang saw, will you be submitting a formal
> patch Eric? (please) I can get this queued up for Dave's net tree as soon
> as I receive the formal patch.

I would prefer having a definitive advice from Thomas Gleixner and/or
others if disable_irq() is forbidden from IRQ path.

As I said, about all netpoll() methods in net drivers use disable_irq()
so a lot of patches would be needed.

disable_irq() should then test this condition earlier, so that we can
detect potential bug, even if the IRQ is not (yet) threaded.

Thanks.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[not found] ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
@ 2016-07-28 7:45 ` Thomas Gleixner
2016-07-28 9:46 ` Valdis.Kletnieks
0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-28 7:45 UTC (permalink / raw)
To: nick
Cc: Fengguang Wu, Eric Dumazet, LKML, netdev, Satyam Sharma,
intel-wired-lan, Jeff Kirsher, Ye Xiaolong
On Tue, 26 Jul 2016, nick wrote:
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..e1830af 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> hw->get_link_status = 1;
> /* guard against interrupt when we're going down */
> if (!test_bit(__E1000_DOWN, &adapter->flags))
> - schedule_delayed_work(&adapter->watchdog_task, 1);
> + mod_work(&adapter->watchdog_task, jiffies + 1);

And that's not even funny anymore. Are you using a random generator to create
these patches?
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-28 7:45 ` Thomas Gleixner
@ 2016-07-28 9:46 ` Valdis.Kletnieks
0 siblings, 0 replies; 17+ messages in thread
From: Valdis.Kletnieks @ 2016-07-28 9:46 UTC (permalink / raw)
To: Thomas Gleixner
Cc: nick, Fengguang Wu, Eric Dumazet, LKML, netdev, Satyam Sharma,
intel-wired-lan, Jeff Kirsher, Ye Xiaolong
On Thu, 28 Jul 2016 09:45:12 +0200, Thomas Gleixner said:
> On Tue, 26 Jul 2016, nick wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index f42129d..e1830af 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> > hw->get_link_status = 1;
> > /* guard against interrupt when we're going down */
> > if (!test_bit(__E1000_DOWN, &adapter->flags))
> > - schedule_delayed_work(&adapter->watchdog_task, 1);
> > + mod_work(&adapter->watchdog_task, jiffies + 1);
>
> And that's not even funny anymore. Are you using a random generator to create
> these patches?

At some point, we need to decide if the occasional accidentally-correct
trivial patch from Nick is worth all the wasted maintainer time.
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-28 5:43 ` Eric Dumazet
@ 2016-07-28 10:19 ` Sabrina Dubroca
2016-07-28 12:21 ` Thomas Gleixner
2016-07-28 13:30 ` Fengguang Wu
2016-07-28 23:28 ` [Intel-wired-lan] " Francois Romieu
1 sibling, 2 replies; 17+ messages in thread
From: Sabrina Dubroca @ 2016-07-28 10:19 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jeff Kirsher, Fengguang Wu, LKML, netdev, Satyam Sharma,
Thomas Gleixner, intel-wired-lan, Ye Xiaolong
2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> > On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > > Could you try this ?
> > >
> > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > index
> > > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > > 938b3820b 100644
> > > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > > *netdev)
> > > {
> > > struct e1000_adapter *adapter = netdev_priv(netdev);
> > >
> > > - disable_irq(adapter->pdev->irq);
> > > - e1000_intr(adapter->pdev->irq, netdev);
> > > - enable_irq(adapter->pdev->irq);
> > > + if (napi_schedule_prep(&adapter->napi)) {
> > > + adapter->total_tx_bytes = 0;
> > > + adapter->total_tx_packets = 0;
> > > + adapter->total_rx_bytes = 0;
> > > + adapter->total_rx_packets = 0;
> > > + __napi_schedule(&adapter->napi);
> > > + }
> > > }
> > > #endif
> > >
> >
> > Since this fixes the issue Fengguang saw, will you be submitting a formal
> > patch Eric? (please) I can get this queued up for Dave's net tree as soon
> > as I receive the formal patch.
>
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
>
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.
>
> disable_irq() should then test this condition earlier, so that we can
> detect potential bug, even if the IRQ is not (yet) threaded.

The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev&m=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
vector = 0;
msix_irq = adapter->msix_entries[vector].vector;
- disable_irq(msix_irq);
- e1000_intr_msix_rx(msix_irq, netdev);
+ if (disable_hardirq(msix_irq))
+ e1000_intr_msix_rx(msix_irq, netdev);
enable_irq(msix_irq);
vector++;
msix_irq = adapter->msix_entries[vector].vector;
- disable_irq(msix_irq);
- e1000_intr_msix_tx(msix_irq, netdev);
+ if (disable_hardirq(msix_irq))
+ e1000_intr_msix_tx(msix_irq, netdev);
enable_irq(msix_irq);
vector++;
msix_irq = adapter->msix_entries[vector].vector;
- disable_irq(msix_irq);
- e1000_msix_other(msix_irq, netdev);
+ if (disable_hardirq(msix_irq))
+ e1000_msix_other(msix_irq, netdev);
enable_irq(msix_irq);
}
@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
e1000_intr_msix(adapter->pdev->irq, netdev);
break;
case E1000E_INT_MODE_MSI:
- disable_irq(adapter->pdev->irq);
- e1000_intr_msi(adapter->pdev->irq, netdev);
+ if (disable_hardirq(adapter->pdev->irq))
+ e1000_intr_msi(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
default: /* E1000E_INT_MODE_LEGACY */
- disable_irq(adapter->pdev->irq);
- e1000_intr(adapter->pdev->irq, netdev);
+ if (disable_hardirq(adapter->pdev->irq))
+ e1000_intr(adapter->pdev->irq, netdev);
enable_irq(adapter->pdev->irq);
break;
}
--
Sabrina
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-28 10:19 ` Sabrina Dubroca
@ 2016-07-28 12:21 ` Thomas Gleixner
2016-07-28 13:30 ` Fengguang Wu
1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-28 12:21 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Eric Dumazet, Jeff Kirsher, Fengguang Wu, LKML, netdev,
Satyam Sharma, intel-wired-lan, Ye Xiaolong
On Thu, 28 Jul 2016, Sabrina Dubroca wrote:
> 2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> > I would prefer having a definitive advice from Thomas Gleixner and/or
> > others if disable_irq() is forbidden from IRQ path.

Yes it is. Before we added threaded interrupt handlers it was not an issue,
but with (possibly) threaded interrupts it's an absolute no-no.

> > As I said, about all netpoll() methods in net drivers use disable_irq()
> > so a lot of patches would be needed.
> >
> > disable_irq() should then test this condition earlier, so that we can
> > detect potential bug, even if the IRQ is not (yet) threaded.
>
> The idea when this first came up was to skip the sleeping part of
> disable_irq():
>
> http://marc.info/?l=linux-netdev&m=142314159626052
>
> This fell off my todolist and I didn't send the conversion patches,
> which would basically look like this:
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 41f32c0b341e..b022691e680b 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
>
> vector = 0;
> msix_irq = adapter->msix_entries[vector].vector;
> - disable_irq(msix_irq);
> - e1000_intr_msix_rx(msix_irq, netdev);
> + if (disable_hardirq(msix_irq))
> + e1000_intr_msix_rx(msix_irq, netdev);
> enable_irq(msix_irq);
That'll work nicely even when one of the affected interrupts is threaded.
Thanks,
tglx
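
For readers following along, the shape of the conversion quoted above
can be modeled in userspace. This is only a sketch, not kernel code:
struct model_irq and the model_* functions are hypothetical stand-ins.
It assumes what the thread states, that disable_hardirq() skips the
sleeping part of disable_irq(), plus the documented behaviour that it
returns false while a threaded handler is active, in which case the
poll path leaves the handler alone.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line; not a kernel structure. */
struct model_irq {
	int disable_depth;   /* nesting count of disable calls */
	int threads_active;  /* nonzero while a threaded handler runs */
	int handler_calls;   /* times the poll path invoked the handler */
};

/* Stand-in for disable_hardirq(): mask the line without sleeping and
 * report whether calling the handler directly is safe. */
static bool model_disable_hardirq(struct model_irq *irq)
{
	irq->disable_depth++;
	return irq->threads_active == 0;
}

/* Stand-in for enable_irq(): undo one level of disabling. */
static void model_enable_irq(struct model_irq *irq)
{
	if (irq->disable_depth > 0)
		irq->disable_depth--;
}

/* Stand-in for the driver's interrupt handler. */
static void model_handler(struct model_irq *irq)
{
	irq->handler_calls++;
}

/* Mirrors the converted netpoll branches: run the handler only when
 * the disable succeeded, but always re-enable the line. */
static void model_netpoll(struct model_irq *irq)
{
	if (model_disable_hardirq(irq))
		model_handler(irq);
	model_enable_irq(irq);
}
```

Note that enable_irq() stays outside the if, as in the diff: the line
is masked on both paths, so the disable depth has to be balanced even
when the handler is skipped.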
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-28 10:19 ` Sabrina Dubroca
2016-07-28 12:21 ` Thomas Gleixner
@ 2016-07-28 13:30 ` Fengguang Wu
1 sibling, 0 replies; 17+ messages in thread
From: Fengguang Wu @ 2016-07-28 13:30 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Eric Dumazet, Jeff Kirsher, LKML, netdev, Satyam Sharma,
Thomas Gleixner, intel-wired-lan, Ye Xiaolong
Hi Sabrina,
>The idea when this first came up was to skip the sleeping part of
>disable_irq():
>
>http://marc.info/?l=linux-netdev&m=142314159626052
>
>This fell off my todolist and I didn't send the conversion patches,
>which would basically look like this:

Yes it works in the several machines that had the BUG!

[ 23.806847] netpoll: netconsole: local port 6665
[ 23.807145] netpoll: netconsole: local IPv4 address 0.0.0.0
[ 23.807494] netpoll: netconsole: interface 'eth0'
[ 23.807799] netpoll: netconsole: remote port 6646
[ 23.808096] netpoll: netconsole: remote IPv4 address 192.168.1.1
[ 23.808474] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[ 23.808910] netpoll: netconsole: local IP 192.168.1.161
[ 23.811680] 28 Jul 19:42:10 ntpdate[376]: step time server 192.168.1.1 offset 1696.257557 sec
[ 23.811886] console [netcon0] enabled
[ 23.812131] netconsole: network logging started
Thanks,
Fengguang
>
>diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>index 41f32c0b341e..b022691e680b 100644
>--- a/drivers/net/ethernet/intel/e1000e/netdev.c
>+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
>
> vector = 0;
> msix_irq = adapter->msix_entries[vector].vector;
>- disable_irq(msix_irq);
>- e1000_intr_msix_rx(msix_irq, netdev);
>+ if (disable_hardirq(msix_irq))
>+ e1000_intr_msix_rx(msix_irq, netdev);
> enable_irq(msix_irq);
>
> vector++;
> msix_irq = adapter->msix_entries[vector].vector;
>- disable_irq(msix_irq);
>- e1000_intr_msix_tx(msix_irq, netdev);
>+ if (disable_hardirq(msix_irq))
>+ e1000_intr_msix_tx(msix_irq, netdev);
> enable_irq(msix_irq);
>
> vector++;
> msix_irq = adapter->msix_entries[vector].vector;
>- disable_irq(msix_irq);
>- e1000_msix_other(msix_irq, netdev);
>+ if (disable_hardirq(msix_irq))
>+ e1000_msix_other(msix_irq, netdev);
> enable_irq(msix_irq);
> }
>
>@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
> e1000_intr_msix(adapter->pdev->irq, netdev);
> break;
> case E1000E_INT_MODE_MSI:
>- disable_irq(adapter->pdev->irq);
>- e1000_intr_msi(adapter->pdev->irq, netdev);
>+ if (disable_hardirq(adapter->pdev->irq))
>+ e1000_intr_msi(adapter->pdev->irq, netdev);
> enable_irq(adapter->pdev->irq);
> break;
> default: /* E1000E_INT_MODE_LEGACY */
>- disable_irq(adapter->pdev->irq);
>- e1000_intr(adapter->pdev->irq, netdev);
>+ if (disable_hardirq(adapter->pdev->irq))
>+ e1000_intr(adapter->pdev->irq, netdev);
> enable_irq(adapter->pdev->irq);
> break;
> }
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
2016-07-28 5:43 ` Eric Dumazet
2016-07-28 10:19 ` Sabrina Dubroca
@ 2016-07-28 23:28 ` Francois Romieu
1 sibling, 0 replies; 17+ messages in thread
From: Francois Romieu @ 2016-07-28 23:28 UTC (permalink / raw)
To: Eric Dumazet
Cc: Jeff Kirsher, netdev, LKML, Ye Xiaolong, intel-wired-lan,
Satyam Sharma, Thomas Gleixner
Eric Dumazet <eric.dumazet@gmail.com> :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
>
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled(), ...) in netpoll_send_skb_on_dev()
for quite some time now, but it's apparently screened by too many tests to be
effective. :o/

--
Ueimor
^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2016-07-26 3:50 [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
2016-07-26 9:14 ` Eric Dumazet
[not found] ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
2016-07-26 9:45 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2016-07-26 9:50 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
[not found] ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
2016-07-28 7:45 ` Thomas Gleixner
2016-07-28 9:46 ` Valdis.Kletnieks
2016-07-26 9:50 ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2016-07-26 15:32 ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
2016-07-26 16:28 ` Eric Dumazet
2016-07-27 15:01 ` Fengguang Wu
2016-07-27 18:50 ` Eric Dumazet
2016-07-27 21:38 ` Jeff Kirsher
2016-07-28 5:43 ` Eric Dumazet
2016-07-28 10:19 ` Sabrina Dubroca
2016-07-28 12:21 ` Thomas Gleixner
2016-07-28 13:30 ` Fengguang Wu
2016-07-28 23:28 ` [Intel-wired-lan] " Francois Romieu