linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
@ 2016-07-26  3:50 Fengguang Wu
  2016-07-26  9:14 ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-26  3:50 UTC (permalink / raw)
  To: LKML
  Cc: netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Fengguang Wu, Ye Xiaolong

Greetings,

This BUG message can be found in recent kernels as well as v4.4 and
linux-stable. It happens when running

        modprobe netconsole netconsole=@/,$port@$server/ 

[   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
[   39.943285] netpoll: netconsole: local port 6665
[   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
[   39.943609] netpoll: netconsole: interface 'eth0'
[   39.943756] netpoll: netconsole: remote port 6672
[   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   39.944311] netpoll: netconsole: local IP 192.168.1.193
[   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
[   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
[   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
[   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
[   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
[   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
[   39.944526] Call Trace:
[   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
[   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
[   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
[   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
[   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
[   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
[   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
[   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
[   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
[   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
[   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
[   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
[   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
[   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
[   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
[   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
[   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
[   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
[   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
[   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
[   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
[   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
[   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
[   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
[   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
[   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
[   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
[   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
[   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
[   39.946384] console [netcon0] enabled
[   39.946514] netconsole: network logging started

Can this be possibly fixed?

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-26  3:50 [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-26  9:14 ` Eric Dumazet
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-26  9:14 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> Greetings,
> 
> This BUG message can be found in recent kernels as well as v4.4 and
> linux-stable. It happens when running
> 
>         modprobe netconsole netconsole=@/,$port@$server/ 
> 
> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> [   39.943285] netpoll: netconsole: local port 6665
> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> [   39.943609] netpoll: netconsole: interface 'eth0'
> [   39.943756] netpoll: netconsole: remote port 6672
> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> [   39.944526] Call Trace:
> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [   39.946384] console [netcon0] enabled
> [   39.946514] netconsole: network logging started
> 
> Can this be possibly fixed?

Could you try this ?

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 
-	disable_irq(adapter->pdev->irq);
-	e1000_intr(adapter->pdev->irq, netdev);
-	enable_irq(adapter->pdev->irq);
+	if (napi_schedule_prep(&adapter->napi)) {
+		adapter->total_tx_bytes = 0;
+		adapter->total_tx_packets = 0;
+		adapter->total_rx_bytes = 0;
+		adapter->total_rx_packets = 0;
+		__napi_schedule(&adapter->napi);
+	}
 }
 #endif
 

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] schedule function called for e1000 driver interrupt
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
@ 2016-07-26  9:45     ` kbuild test robot
  2016-07-26  9:50     ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
  2016-07-26  9:50     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
  2 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2016-07-26  9:45 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: kbuild-all, Eric Dumazet, nick, netdev, LKML, Ye Xiaolong,
	intel-wired-lan, Satyam Sharma, Thomas Gleixner, cc

[-- Attachment #1: Type: text/plain, Size: 1555 bytes --]

Hi,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Fengguang-Wu/schedule-function-called-for-e1000-driver-interrupt/20160726-173521
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: x86_64-randconfig-x011-201630 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/e1000/e1000_main.c: In function 'e1000_intr':
>> drivers/net/ethernet/intel/e1000/e1000_main.c:3800:22: error: 'struct e1000_adapter' has no member named 'watchdog_timer'; did you mean 'watchdog_task'?
       mod_timer(&adapter->watchdog_timer, jiffies + 1);
                         ^~

vim +3800 drivers/net/ethernet/intel/e1000/e1000_main.c

  3794			return IRQ_HANDLED;
  3795	
  3796		if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
  3797			hw->get_link_status = 1;
  3798			/* guard against interrupt when we're going down */
  3799			if (!test_bit(__E1000_DOWN, &adapter->flags))
> 3800				mod_timer(&adapter->watchdog_timer, jiffies + 1);
  3801		}
  3802	
  3803		/* disable interrupts, without the synchronize_irq bit */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 26488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
  2016-07-26  9:45     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
@ 2016-07-26  9:50     ` Thomas Gleixner
       [not found]       ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
  2016-07-26  9:50     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
  2 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-26  9:50 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: Eric Dumazet, LKML, netdev, Satyam Sharma, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

On Tue, 26 Jul 2016, Fengguang Wu wrote:
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> 		hw->get_link_status = 1;
> 		/* guard against interrupt when we're going down */
> 		if (!test_bit(__E1000_DOWN, &adapter->flags))
> -			schedule_delayed_work(&adapter->watchdog_task, 1);
> +			mod_timer(&adapter->watchdog_timer, jiffies + 1);

ROTFL ....

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] schedule function called for e1000 driver interrupt
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
  2016-07-26  9:45     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
  2016-07-26  9:50     ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
@ 2016-07-26  9:50     ` kbuild test robot
  2 siblings, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2016-07-26  9:50 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: kbuild-all, Eric Dumazet, nick, netdev, LKML, Ye Xiaolong,
	intel-wired-lan, Satyam Sharma, Thomas Gleixner, cc

[-- Attachment #1: Type: text/plain, Size: 1689 bytes --]

Hi,

[auto build test ERROR on jkirsher-next-queue/dev-queue]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Fengguang-Wu/schedule-function-called-for-e1000-driver-interrupt/20160726-173521
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: sparc64-allyesconfig (attached as .config)
compiler: sparc64-linux-gnu-gcc (Debian 5.4.0-6) 5.4.0 20160609
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/intel/e1000/e1000_main.c: In function 'e1000_intr':
>> drivers/net/ethernet/intel/e1000/e1000_main.c:3800:22: error: 'struct e1000_adapter' has no member named 'watchdog_timer'
       mod_timer(&adapter->watchdog_timer, jiffies + 1);
                         ^

vim +3800 drivers/net/ethernet/intel/e1000/e1000_main.c

  3794			return IRQ_HANDLED;
  3795	
  3796		if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
  3797			hw->get_link_status = 1;
  3798			/* guard against interrupt when we're going down */
  3799			if (!test_bit(__E1000_DOWN, &adapter->flags))
> 3800				mod_timer(&adapter->watchdog_timer, jiffies + 1);
  3801		}
  3802	
  3803		/* disable interrupts, without the synchronize_irq bit */

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 46493 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-26  9:14 ` Eric Dumazet
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
@ 2016-07-26 15:32   ` Fengguang Wu
  2016-07-26 16:28     ` Eric Dumazet
  2016-07-27 21:38   ` Jeff Kirsher
  2 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-26 15:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> Greetings,
>>
>> This BUG message can be found in recent kernels as well as v4.4 and
>> linux-stable. It happens when running
>>
>>         modprobe netconsole netconsole=@/,$port@$server/
>>
>> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
>> [   39.943285] netpoll: netconsole: local port 6665
>> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> [   39.943609] netpoll: netconsole: interface 'eth0'
>> [   39.943756] netpoll: netconsole: remote port 6672
>> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
>> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
>> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
>> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
>> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
>> [   39.944526] Call Trace:
>> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
>> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
>> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
>> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
>> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
>> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
>> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
>> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
>> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
>> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
>> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
>> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
>> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
>> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
>> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
>> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
>> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
>> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
>> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
>> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
>> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
>> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
>> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
>> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
>> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
>> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
>> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
>> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
>> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [   39.946384] console [netcon0] enabled
>> [   39.946514] netconsole: network logging started
>>
>> Can this be possibly fixed?
>
>Could you try this ?
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> {
> 	struct e1000_adapter *adapter = netdev_priv(netdev);
>
>-	disable_irq(adapter->pdev->irq);
>-	e1000_intr(adapter->pdev->irq, netdev);
>-	enable_irq(adapter->pdev->irq);
>+	if (napi_schedule_prep(&adapter->napi)) {
>+		adapter->total_tx_bytes = 0;
>+		adapter->total_tx_packets = 0;
>+		adapter->total_rx_bytes = 0;
>+		adapter->total_rx_packets = 0;
>+		__napi_schedule(&adapter->napi);
>+	}

The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
kern  :info  : [   16.114284] console [netcon0] enabled
kern  :info  : [   16.114550] netconsole: network logging started

However I'm not sure if it'll have side effects, because this
effectively disables the various checks in e1000_intr() and
e1000_intr_msi().

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..4f89873 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
 	case E1000E_INT_MODE_MSIX:
 		e1000_intr_msix(adapter->pdev->irq, netdev);
 		break;
-	case E1000E_INT_MODE_MSI:
-		disable_irq(adapter->pdev->irq);
-		e1000_intr_msi(adapter->pdev->irq, netdev);
-		enable_irq(adapter->pdev->irq);
-		break;
 	default:		/* E1000E_INT_MODE_LEGACY */
-		disable_irq(adapter->pdev->irq);
-		e1000_intr(adapter->pdev->irq, netdev);
-		enable_irq(adapter->pdev->irq);
+		if (napi_schedule_prep(&adapter->napi)) {
+			adapter->total_tx_bytes = 0;
+			adapter->total_tx_packets = 0;
+			adapter->total_rx_bytes = 0;
+			adapter->total_rx_packets = 0;
+			__napi_schedule(&adapter->napi);
+		}
 		break;
 	}
 }

Thanks,
Fengguang

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-26 15:32   ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-26 16:28     ` Eric Dumazet
  2016-07-27 15:01       ` Fengguang Wu
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2016-07-26 16:28 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> Hi Eric,
> 
> It works!
> 
> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> Greetings,
> >>
> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> linux-stable. It happens when running
> >>
> >>         modprobe netconsole netconsole=@/,$port@$server/
> >>
> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> >> [   39.943285] netpoll: netconsole: local port 6665
> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> >> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> >> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> >> [   39.944526] Call Trace:
> >> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
> >> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> >> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> >> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> >> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> >> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> >> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
> >> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> >> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> >> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> >> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> >> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> >> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> >> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> >> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> >> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
> >> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> >> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
> >> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> >> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> >> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> >> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
> >> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> >> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> >> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> >> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> >> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> >> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> [   39.946384] console [netcon0] enabled
> >> [   39.946514] netconsole: network logging started
> >>
> >> Can this be possibly fixed?
> >
> >Could you try this ?
> >
> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> > {
> > 	struct e1000_adapter *adapter = netdev_priv(netdev);
> >
> >-	disable_irq(adapter->pdev->irq);
> >-	e1000_intr(adapter->pdev->irq, netdev);
> >-	enable_irq(adapter->pdev->irq);
> >+	if (napi_schedule_prep(&adapter->napi)) {
> >+		adapter->total_tx_bytes = 0;
> >+		adapter->total_tx_packets = 0;
> >+		adapter->total_rx_bytes = 0;
> >+		adapter->total_rx_packets = 0;
> >+		__napi_schedule(&adapter->napi);
> >+	}
> 
> The machines are actually running e1000e driver, so I copied your
> approach to e1000e and it works:
> 
> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
> kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
> kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
> kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
> kern  :info  : [   16.114284] console [netcon0] enabled
> kern  :info  : [   16.114550] netconsole: network logging started
> 
> However I'm not sure if it'll have side effects, because this
> effectively disables the various checks in e1000_intr() and
> e1000_intr_msi().
> 

As far as netpoll is concerned, this should not matter.

We only want to drain packets from TX rings.

I have no idea why you hit this issue only recently, since this looks a
rather old bug to me ?

> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 9b4ec13..4f89873 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
>  	case E1000E_INT_MODE_MSIX:
>  		e1000_intr_msix(adapter->pdev->irq, netdev);
>  		break;
> -	case E1000E_INT_MODE_MSI:
> -		disable_irq(adapter->pdev->irq);
> -		e1000_intr_msi(adapter->pdev->irq, netdev);
> -		enable_irq(adapter->pdev->irq);
> -		break;
>  	default:		/* E1000E_INT_MODE_LEGACY */
> -		disable_irq(adapter->pdev->irq);
> -		e1000_intr(adapter->pdev->irq, netdev);
> -		enable_irq(adapter->pdev->irq);
> +		if (napi_schedule_prep(&adapter->napi)) {
> +			adapter->total_tx_bytes = 0;
> +			adapter->total_tx_packets = 0;
> +			adapter->total_rx_bytes = 0;
> +			adapter->total_rx_packets = 0;
> +			__napi_schedule(&adapter->napi);
> +		}
>  		break;
>  	}
>  }
> 
> Thanks,
> Fengguang

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-26 16:28     ` Eric Dumazet
@ 2016-07-27 15:01       ` Fengguang Wu
  2016-07-27 18:50         ` Eric Dumazet
  0 siblings, 1 reply; 17+ messages in thread
From: Fengguang Wu @ 2016-07-27 15:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
>> Hi Eric,
>>
>> It works!
>>
>> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> >> Greetings,
>> >>
>> >> This BUG message can be found in recent kernels as well as v4.4 and
>> >> linux-stable. It happens when running
>> >>
>> >>         modprobe netconsole netconsole=@/,$port@$server/
>> >>
>> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
>> >> [   39.943285] netpoll: netconsole: local port 6665
>> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> >> [   39.943609] netpoll: netconsole: interface 'eth0'
>> >> [   39.943756] netpoll: netconsole: remote port 6672
>> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> >> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
>> >> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
>> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> >> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> >> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
>> >> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
>> >> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
>> >> [   39.944526] Call Trace:
>> >> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
>> >> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
>> >> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
>> >> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
>> >> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
>> >> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
>> >> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
>> >> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
>> >> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
>> >> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
>> >> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
>> >> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
>> >> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
>> >> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
>> >> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
>> >> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
>> >> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
>> >> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
>> >> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
>> >> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
>> >> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
>> >> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
>> >> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
>> >> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
>> >> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
>> >> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
>> >> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
>> >> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
>> >> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> >> [   39.946384] console [netcon0] enabled
>> >> [   39.946514] netconsole: network logging started
>> >>
>> >> Can this be possibly fixed?
>> >
>> >Could you try this ?
>> >
>> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
>> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
>> > {
>> > 	struct e1000_adapter *adapter = netdev_priv(netdev);
>> >
>> >-	disable_irq(adapter->pdev->irq);
>> >-	e1000_intr(adapter->pdev->irq, netdev);
>> >-	enable_irq(adapter->pdev->irq);
>> >+	if (napi_schedule_prep(&adapter->napi)) {
>> >+		adapter->total_tx_bytes = 0;
>> >+		adapter->total_tx_packets = 0;
>> >+		adapter->total_rx_bytes = 0;
>> >+		adapter->total_rx_packets = 0;
>> >+		__napi_schedule(&adapter->napi);
>> >+	}
>>
>> The machines are actually running e1000e driver, so I copied your
>> approach to e1000e and it works:
>>
>> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
>> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
>> kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
>> kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
>> kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
>> kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
>> kern  :info  : [   16.114284] console [netcon0] enabled
>> kern  :info  : [   16.114550] netconsole: network logging started
>>
>> However I'm not sure if it'll have side effects, because this
>> effectively disables the various checks in e1000_intr() and
>> e1000_intr_msi().
>>
>
>As far as netpoll is concerned, this should not matter.
>
>We only want to drain packets from TX rings.

OK.

>I have no idea why you hit this issue only recently, since this looks a
>rather old bug to me ?

Yeah it's a rather old bug. It becomes obvious when we try to detect
and filter out buggy kernels for the machines that are expected to run
stable services. This BUG effectively blocks the stable machines from
booting because no clean kernel (v4.6.4, v4.6, v4.5, v4.4, ...) are
available at all. ;)

Thanks,
Fengguang

>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 9b4ec13..4f89873 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
>>  	case E1000E_INT_MODE_MSIX:
>>  		e1000_intr_msix(adapter->pdev->irq, netdev);
>>  		break;
>> -	case E1000E_INT_MODE_MSI:
>> -		disable_irq(adapter->pdev->irq);
>> -		e1000_intr_msi(adapter->pdev->irq, netdev);
>> -		enable_irq(adapter->pdev->irq);
>> -		break;
>>  	default:		/* E1000E_INT_MODE_LEGACY */
>> -		disable_irq(adapter->pdev->irq);
>> -		e1000_intr(adapter->pdev->irq, netdev);
>> -		enable_irq(adapter->pdev->irq);
>> +		if (napi_schedule_prep(&adapter->napi)) {
>> +			adapter->total_tx_bytes = 0;
>> +			adapter->total_tx_packets = 0;
>> +			adapter->total_rx_bytes = 0;
>> +			adapter->total_rx_packets = 0;
>> +			__napi_schedule(&adapter->napi);
>> +		}
>>  		break;
>>  	}
>>  }
>>
>> Thanks,
>> Fengguang
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-27 15:01       ` Fengguang Wu
@ 2016-07-27 18:50         ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-27 18:50 UTC (permalink / raw)
  To: Fengguang Wu
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Jeff Kirsher, Ye Xiaolong

On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote:
> On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> >> Hi Eric,
> >>
> >> It works!
> >>
> >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> >> Greetings,
> >> >>
> >> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> >> linux-stable. It happens when running
> >> >>
> >> >>         modprobe netconsole netconsole=@/,$port@$server/
> >> >>
> >> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> >> >> [   39.943285] netpoll: netconsole: local port 6665
> >> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> >> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> >> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> >> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> >> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> >> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> >> >> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> >> >> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> >> >> [   39.944526] Call Trace:
> >> >> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
> >> >> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> >> >> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> >> >> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> >> >> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> >> >> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> >> >> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
> >> >> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> >> >> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> >> >> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> >> >> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> >> >> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> >> >> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> >> >> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> >> >> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> >> >> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
> >> >> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> >> >> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
> >> >> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> >> >> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> >> >> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> >> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> >> >> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
> >> >> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> >> >> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> >> >> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> >> >> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> >> >> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> >> >> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> >> [   39.946384] console [netcon0] enabled
> >> >> [   39.946514] netconsole: network logging started
> >> >>
> >> >> Can this be possibly fixed?
> >> >
> >> >Could you try this ?
> >> >
> >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
> >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> >> > {
> >> > 	struct e1000_adapter *adapter = netdev_priv(netdev);
> >> >
> >> >-	disable_irq(adapter->pdev->irq);
> >> >-	e1000_intr(adapter->pdev->irq, netdev);
> >> >-	enable_irq(adapter->pdev->irq);
> >> >+	if (napi_schedule_prep(&adapter->napi)) {
> >> >+		adapter->total_tx_bytes = 0;
> >> >+		adapter->total_tx_packets = 0;
> >> >+		adapter->total_rx_bytes = 0;
> >> >+		adapter->total_rx_packets = 0;
> >> >+		__napi_schedule(&adapter->napi);
> >> >+	}
> >>
> >> The machines are actually running e1000e driver, so I copied your
> >> approach to e1000e and it works:
> >>
> >> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> >> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
> >> kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
> >> kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
> >> kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
> >> kern  :info  : [   16.114284] console [netcon0] enabled
> >> kern  :info  : [   16.114550] netconsole: network logging started
> >>
> >> However I'm not sure if it'll have side effects, because this
> >> effectively disables the various checks in e1000_intr() and
> >> e1000_intr_msi().
> >>
> >
> >As far as netpoll is concerned, this should not matter.
> >
> >We only want to drain packets from TX rings.
> 
> OK.
> 
> >I have no idea why you hit this issue only recently, since this looks a
> >rather old bug to me ?
> 
> Yeah it's a rather old bug. It becomes obvious when we try to detect
> and filter out buggy kernels for the machines that are expected to run
> stable services. This BUG effectively blocks the stable machines from
> booting because no clean kernel (v4.6.4, v4.6, v4.5, v4.4, ...) are
> available at all. ;)
> 
> Thanks,
> Fengguang
> >> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> index 9b4ec13..4f89873 100644
> >> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device
*netdev)
> >>  	case E1000E_INT_MODE_MSIX:
> >>  		e1000_intr_msix(adapter->pdev->irq, netdev);
> >>  		break;
> >> -	case E1000E_INT_MODE_MSI:
> >> -		disable_irq(adapter->pdev->irq);
> >> -		e1000_intr_msi(adapter->pdev->irq, netdev);
> >> -		enable_irq(adapter->pdev->irq);
> >> -		break;
> >>  	default:		/* E1000E_INT_MODE_LEGACY */
> >> -		disable_irq(adapter->pdev->irq);
> >> -		e1000_intr(adapter->pdev->irq, netdev);
> >> -		enable_irq(adapter->pdev->irq);
> >> +		if (napi_schedule_prep(&adapter->napi)) {
> >> +			adapter->total_tx_bytes = 0;
> >> +			adapter->total_tx_packets = 0;
> >> +			adapter->total_rx_bytes = 0;
> >> +			adapter->total_rx_packets = 0;
> >> +			__napi_schedule(&adapter->napi);
> >> +		}
> >>  		break;
> >>  	}
> >>  }
> >>
> >> Thanks,
> >> Fengguang
> >
> 

About all netpoll implementations use disable_irq(), so I guess netpoll
is not compatible with threaded irqs.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-26  9:14 ` Eric Dumazet
       [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
  2016-07-26 15:32   ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
@ 2016-07-27 21:38   ` Jeff Kirsher
  2016-07-28  5:43     ` Eric Dumazet
  2 siblings, 1 reply; 17+ messages in thread
From: Jeff Kirsher @ 2016-07-27 21:38 UTC (permalink / raw)
  To: Eric Dumazet, Fengguang Wu
  Cc: LKML, netdev, Satyam Sharma, Thomas Gleixner, intel-wired-lan,
	Ye Xiaolong

[-- Attachment #1: Type: text/plain, Size: 1319 bytes --]

On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> Could you try this ?
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index
> f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> 938b3820b 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> *netdev)
>  {
>         struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -       disable_irq(adapter->pdev->irq);
> -       e1000_intr(adapter->pdev->irq, netdev);
> -       enable_irq(adapter->pdev->irq);
> +       if (napi_schedule_prep(&adapter->napi)) {
> +               adapter->total_tx_bytes = 0;
> +               adapter->total_tx_packets = 0;
> +               adapter->total_rx_bytes = 0;
> +               adapter->total_rx_packets = 0;
> +               __napi_schedule(&adapter->napi);
> +       }
>  }
>  #endif
>  

Since this fixes the issue Fengguang saw, will you be submitting a formal
patch Eric? (please) I can get this queued up for Dave's net tree as soon
as I receive the formal patch.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-27 21:38   ` Jeff Kirsher
@ 2016-07-28  5:43     ` Eric Dumazet
  2016-07-28 10:19       ` Sabrina Dubroca
  2016-07-28 23:28       ` [Intel-wired-lan] " Francois Romieu
  0 siblings, 2 replies; 17+ messages in thread
From: Eric Dumazet @ 2016-07-28  5:43 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Fengguang Wu, LKML, netdev, Satyam Sharma, Thomas Gleixner,
	intel-wired-lan, Ye Xiaolong

On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > Could you try this ?
> > 
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index
> > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > 938b3820b 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > *netdev)
> >  {
> >         struct e1000_adapter *adapter = netdev_priv(netdev);
> >  
> > -       disable_irq(adapter->pdev->irq);
> > -       e1000_intr(adapter->pdev->irq, netdev);
> > -       enable_irq(adapter->pdev->irq);
> > +       if (napi_schedule_prep(&adapter->napi)) {
> > +               adapter->total_tx_bytes = 0;
> > +               adapter->total_tx_packets = 0;
> > +               adapter->total_rx_bytes = 0;
> > +               adapter->total_rx_packets = 0;
> > +               __napi_schedule(&adapter->napi);
> > +       }
> >  }
> >  #endif
> >  
> 
> Since this fixes the issue Fengguang saw, will you be submitting a formal
> patch Eric? (please) I can get this queued up for Dave's net tree as soon
> as I receive the formal patch.

I would prefer having a definitive advice from Thomas Gleixner and/or
others if disable_irq() is forbidden from IRQ path.

As I said, about all netpoll() methods in net drivers use disable_irq()
so a lot of patches would be needed.

disable_irq() should then test this condition earlier, so that we can
detect potential bug, even if the IRQ is not (yet) threaded.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
       [not found]       ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
@ 2016-07-28  7:45         ` Thomas Gleixner
  2016-07-28  9:46           ` Valdis.Kletnieks
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-28  7:45 UTC (permalink / raw)
  To: nick
  Cc: Fengguang Wu, Eric Dumazet, LKML, netdev, Satyam Sharma,
	intel-wired-lan, Jeff Kirsher, Ye Xiaolong

On Tue, 26 Jul 2016, nick wrote:
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index f42129d..e1830af 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
>  		hw->get_link_status = 1;
>  		/* guard against interrupt when we're going down */
>  		if (!test_bit(__E1000_DOWN, &adapter->flags))
> -			schedule_delayed_work(&adapter->watchdog_task, 1);
> +			mod_work(&adapter->watchdog_task, jiffies + 1);

And that's not even funny anymore. Are you using a random generator to create
these patches?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-28  7:45         ` Thomas Gleixner
@ 2016-07-28  9:46           ` Valdis.Kletnieks
  0 siblings, 0 replies; 17+ messages in thread
From: Valdis.Kletnieks @ 2016-07-28  9:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: nick, Fengguang Wu, Eric Dumazet, LKML, netdev, Satyam Sharma,
	intel-wired-lan, Jeff Kirsher, Ye Xiaolong

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

On Thu, 28 Jul 2016 09:45:12 +0200, Thomas Gleixner said:
> On Tue, 26 Jul 2016, nick wrote:
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index f42129d..e1830af 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -3797,7 +3797,7 @@ static irqreturn_t e1000_intr(int irq, void *data)
> >  		hw->get_link_status = 1;
> >  		/* guard against interrupt when we're going down */
> >  		if (!test_bit(__E1000_DOWN, &adapter->flags))
> > -			schedule_delayed_work(&adapter->watchdog_task, 1);
> > +			mod_work(&adapter->watchdog_task, jiffies + 1);
>
> And that's not even funny anymore. Are you using a random generator to create
> these patches?

At some point, we need to decide if the occasional accidentally-correct
trivial patch from Nick is worth all the wasted maintainer time.


[-- Attachment #2: Type: application/pgp-signature, Size: 848 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-28  5:43     ` Eric Dumazet
@ 2016-07-28 10:19       ` Sabrina Dubroca
  2016-07-28 12:21         ` Thomas Gleixner
  2016-07-28 13:30         ` Fengguang Wu
  2016-07-28 23:28       ` [Intel-wired-lan] " Francois Romieu
  1 sibling, 2 replies; 17+ messages in thread
From: Sabrina Dubroca @ 2016-07-28 10:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jeff Kirsher, Fengguang Wu, LKML, netdev, Satyam Sharma,
	Thomas Gleixner, intel-wired-lan, Ye Xiaolong

2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> > On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > > Could you try this ?
> > > 
> > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > index
> > > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > > 938b3820b 100644
> > > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > > *netdev)
> > >  {
> > >         struct e1000_adapter *adapter = netdev_priv(netdev);
> > >  
> > > -       disable_irq(adapter->pdev->irq);
> > > -       e1000_intr(adapter->pdev->irq, netdev);
> > > -       enable_irq(adapter->pdev->irq);
> > > +       if (napi_schedule_prep(&adapter->napi)) {
> > > +               adapter->total_tx_bytes = 0;
> > > +               adapter->total_tx_packets = 0;
> > > +               adapter->total_rx_bytes = 0;
> > > +               adapter->total_rx_packets = 0;
> > > +               __napi_schedule(&adapter->napi);
> > > +       }
> > >  }
> > >  #endif
> > >  
> > 
> > Since this fixes the issue Fengguang saw, will you be submitting a formal
> > patch Eric? (please) I can get this queued up for Dave's net tree as soon
> > as I receive the formal patch.
> 
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.
> 
> disable_irq() should then test this condition earlier, so that we can
> detect potential bug, even if the IRQ is not (yet) threaded.

The idea when this first came up was to skip the sleeping part of
disable_irq():

http://marc.info/?l=linux-netdev&m=142314159626052

This fell off my todolist and I didn't send the conversion patches,
which would basically look like this:


diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0b341e..b022691e680b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
 
 		vector = 0;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
-		e1000_intr_msix_rx(msix_irq, netdev);
+		if (disable_hardirq(msix_irq))
+			e1000_intr_msix_rx(msix_irq, netdev);
 		enable_irq(msix_irq);
 
 		vector++;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
-		e1000_intr_msix_tx(msix_irq, netdev);
+		if (disable_hardirq(msix_irq))
+			e1000_intr_msix_tx(msix_irq, netdev);
 		enable_irq(msix_irq);
 
 		vector++;
 		msix_irq = adapter->msix_entries[vector].vector;
-		disable_irq(msix_irq);
-		e1000_msix_other(msix_irq, netdev);
+		if (disable_hardirq(msix_irq))
+			e1000_msix_other(msix_irq, netdev);
 		enable_irq(msix_irq);
 	}
 
@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
 		e1000_intr_msix(adapter->pdev->irq, netdev);
 		break;
 	case E1000E_INT_MODE_MSI:
-		disable_irq(adapter->pdev->irq);
-		e1000_intr_msi(adapter->pdev->irq, netdev);
+		if (disable_hardirq(adapter->pdev->irq))
+			e1000_intr_msi(adapter->pdev->irq, netdev);
 		enable_irq(adapter->pdev->irq);
 		break;
 	default:		/* E1000E_INT_MODE_LEGACY */
-		disable_irq(adapter->pdev->irq);
-		e1000_intr(adapter->pdev->irq, netdev);
+		if (disable_hardirq(adapter->pdev->irq))
+			e1000_intr(adapter->pdev->irq, netdev);
 		enable_irq(adapter->pdev->irq);
 		break;
 	}


-- 
Sabrina

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-28 10:19       ` Sabrina Dubroca
@ 2016-07-28 12:21         ` Thomas Gleixner
  2016-07-28 13:30         ` Fengguang Wu
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2016-07-28 12:21 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Eric Dumazet, Jeff Kirsher, Fengguang Wu, LKML, netdev,
	Satyam Sharma, intel-wired-lan, Ye Xiaolong

On Thu, 28 Jul 2016, Sabrina Dubroca wrote:
> 2016-07-28, 07:43:55 +0200, Eric Dumazet wrote:
> > I would prefer having a definitive advice from Thomas Gleixner and/or
> > others if disable_irq() is forbidden from IRQ path.

Yes it is. Before we added threaded interrupt handlers it was not an issue,
but with (possibly) threaded interrupts it's an absolute no-no.

> > As I said, about all netpoll() methods in net drivers use disable_irq()
> > so a lot of patches would be needed.
> > 
> > disable_irq() should then test this condition earlier, so that we can
> > detect potential bug, even if the IRQ is not (yet) threaded.
> 
> The idea when this first came up was to skip the sleeping part of
> disable_irq():
> 
> http://marc.info/?l=linux-netdev&m=142314159626052
> 
> This fell off my todolist and I didn't send the conversion patches,
> which would basically look like this:
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 41f32c0b341e..b022691e680b 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
>  
>  		vector = 0;
>  		msix_irq = adapter->msix_entries[vector].vector;
> -		disable_irq(msix_irq);
> -		e1000_intr_msix_rx(msix_irq, netdev);
> +		if (disable_hardirq(msix_irq))
> +			e1000_intr_msix_rx(msix_irq, netdev);
>  		enable_irq(msix_irq);

That'll work nicely even when one of the affected interrupts is threaded.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-28 10:19       ` Sabrina Dubroca
  2016-07-28 12:21         ` Thomas Gleixner
@ 2016-07-28 13:30         ` Fengguang Wu
  1 sibling, 0 replies; 17+ messages in thread
From: Fengguang Wu @ 2016-07-28 13:30 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Eric Dumazet, Jeff Kirsher, LKML, netdev, Satyam Sharma,
	Thomas Gleixner, intel-wired-lan, Ye Xiaolong

Hi Sabrina,

>The idea when this first came up was to skip the sleeping part of
>disable_irq():
>
>http://marc.info/?l=linux-netdev&m=142314159626052
>
>This fell off my todolist and I didn't send the conversion patches,
>which would basically look like this:

Yes it works in the several machines that had the BUG!

[   23.806847] netpoll: netconsole: local port 6665
[   23.807145] netpoll: netconsole: local IPv4 address 0.0.0.0
[   23.807494] netpoll: netconsole: interface 'eth0'
[   23.807799] netpoll: netconsole: remote port 6646
[   23.808096] netpoll: netconsole: remote IPv4 address 192.168.1.1
[   23.808474] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
[   23.808910] netpoll: netconsole: local IP 192.168.1.161
[   23.811680] 28 Jul 19:42:10 ntpdate[376]: step time server 192.168.1.1 offset 1696.257557 sec
[   23.811886] console [netcon0] enabled
[   23.812131] netconsole: network logging started

Thanks,
Fengguang

>
>diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
>index 41f32c0b341e..b022691e680b 100644
>--- a/drivers/net/ethernet/intel/e1000e/netdev.c
>+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>@@ -6713,20 +6713,20 @@ static irqreturn_t e1000_intr_msix(int __always_unused irq, void *data)
>
> 		vector = 0;
> 		msix_irq = adapter->msix_entries[vector].vector;
>-		disable_irq(msix_irq);
>-		e1000_intr_msix_rx(msix_irq, netdev);
>+		if (disable_hardirq(msix_irq))
>+			e1000_intr_msix_rx(msix_irq, netdev);
> 		enable_irq(msix_irq);
>
> 		vector++;
> 		msix_irq = adapter->msix_entries[vector].vector;
>-		disable_irq(msix_irq);
>-		e1000_intr_msix_tx(msix_irq, netdev);
>+		if (disable_hardirq(msix_irq))
>+			e1000_intr_msix_tx(msix_irq, netdev);
> 		enable_irq(msix_irq);
>
> 		vector++;
> 		msix_irq = adapter->msix_entries[vector].vector;
>-		disable_irq(msix_irq);
>-		e1000_msix_other(msix_irq, netdev);
>+		if (disable_hardirq(msix_irq))
>+			e1000_msix_other(msix_irq, netdev);
> 		enable_irq(msix_irq);
> 	}
>
>@@ -6750,13 +6750,13 @@ static void e1000_netpoll(struct net_device *netdev)
> 		e1000_intr_msix(adapter->pdev->irq, netdev);
> 		break;
> 	case E1000E_INT_MODE_MSI:
>-		disable_irq(adapter->pdev->irq);
>-		e1000_intr_msi(adapter->pdev->irq, netdev);
>+		if (disable_hardirq(adapter->pdev->irq))
>+			e1000_intr_msi(adapter->pdev->irq, netdev);
> 		enable_irq(adapter->pdev->irq);
> 		break;
> 	default:		/* E1000E_INT_MODE_LEGACY */
>-		disable_irq(adapter->pdev->irq);
>-		e1000_intr(adapter->pdev->irq, netdev);
>+		if (disable_hardirq(adapter->pdev->irq))
>+			e1000_intr(adapter->pdev->irq, netdev);
> 		enable_irq(adapter->pdev->irq);
> 		br
>ak;
> 	}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Intel-wired-lan] [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
  2016-07-28  5:43     ` Eric Dumazet
  2016-07-28 10:19       ` Sabrina Dubroca
@ 2016-07-28 23:28       ` Francois Romieu
  1 sibling, 0 replies; 17+ messages in thread
From: Francois Romieu @ 2016-07-28 23:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jeff Kirsher, netdev, LKML, Ye Xiaolong, intel-wired-lan,
	Satyam Sharma, Thomas Gleixner

Eric Dumazet <eric.dumazet@gmail.com> :
[...]
> I would prefer having a definitive advice from Thomas Gleixner and/or
> others if disable_irq() is forbidden from IRQ path.
> 
> As I said, about all netpoll() methods in net drivers use disable_irq()
> so a lot of patches would be needed.

s/about all/many/

There has been a WARN_ONCE(!irqs_disabled() in netpoll_send_skb_on_dev for
quite some time now but it's apparently screened by too many tests to be
effective. :o/

-- 
Ueimor

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-07-28 23:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-26  3:50 [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
2016-07-26  9:14 ` Eric Dumazet
     [not found]   ` <20160726093224.GA10339@wfg-t540p.sh.intel.com>
2016-07-26  9:45     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2016-07-26  9:50     ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Thomas Gleixner
     [not found]       ` <8578bb16-cd04-e8a5-c7f4-be061ede95b4@gmail.com>
2016-07-28  7:45         ` Thomas Gleixner
2016-07-28  9:46           ` Valdis.Kletnieks
2016-07-26  9:50     ` [PATCH] schedule function called for e1000 driver interrupt kbuild test robot
2016-07-26 15:32   ` [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 Fengguang Wu
2016-07-26 16:28     ` Eric Dumazet
2016-07-27 15:01       ` Fengguang Wu
2016-07-27 18:50         ` Eric Dumazet
2016-07-27 21:38   ` Jeff Kirsher
2016-07-28  5:43     ` Eric Dumazet
2016-07-28 10:19       ` Sabrina Dubroca
2016-07-28 12:21         ` Thomas Gleixner
2016-07-28 13:30         ` Fengguang Wu
2016-07-28 23:28       ` [Intel-wired-lan] " Francois Romieu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).