linux-kernel.vger.kernel.org archive mirror
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Fengguang Wu <fengguang.wu@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	netdev@vger.kernel.org, Satyam Sharma <satyam@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	intel-wired-lan@lists.osuosl.org,
	Jeff Kirsher <jeffrey.t.kirsher@intel.com>,
	Ye Xiaolong <xiaolong.ye@intel.com>
Subject: Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
Date: Wed, 27 Jul 2016 20:50:33 +0200
Message-ID: <1469645433.17736.11.camel@edumazet-glaptop3.roam.corp.google.com>
In-Reply-To: <20160727150146.GA22982@wfg-t540p.sh.intel.com>

On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote:
> On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> >> Hi Eric,
> >>
> >> It works!
> >>
> >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> >> Greetings,
> >> >>
> >> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> >> linux-stable. It happens when running
> >> >>
> >> >>         modprobe netconsole netconsole=@/,$port@$server/
> >> >>
> >> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec
> >> >> [   39.943285] netpoll: netconsole: local port 6665
> >> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> >> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> >> [   39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
> >> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
> >> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> >> [   39.944518] Hardware name:                  /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> >> [   39.944522]  0000000000000000 ffffc90001f2f9e8 ffffffff813417d9 ffff88007faba5c0
> >> >> [   39.944524]  000000000000006e ffffc90001f2fa00 ffffffff810aec03 ffffffff81a25948
> >> >> [   39.944525]  ffffc90001f2fa28 ffffffff810aec9a ffff8803e5bd9400 ffff8803e50fbd68
> >> >> [   39.944526] Call Trace:
> >> >> [   39.944533]  [<ffffffff813417d9>] dump_stack+0x63/0x8a
> >> >> [   39.944536]  [<ffffffff810aec03>] ___might_sleep+0xd3/0x120
> >> >> [   39.944537]  [<ffffffff810aec9a>] __might_sleep+0x4a/0x80
> >> >> [   39.944541]  [<ffffffff810e4638>] synchronize_irq+0x38/0xa0
> >> >> [   39.944543]  [<ffffffff810e3c8e>] ? __irq_put_desc_unlock+0x1e/0x40
> >> >> [   39.944545]  [<ffffffff810e48e3>] ? __disable_irq_nosync+0x43/0x60
> >> >> [   39.944547]  [<ffffffff810e492c>] disable_irq+0x1c/0x20
> >> >> [   39.944559]  [<ffffffffa0220932>] e1000_netpoll+0xf2/0x120 [e1000e]
> >> >> [   39.944563]  [<ffffffff815f2bdc>] netpoll_poll_dev+0x5c/0x1a0
> >> >> [   39.944567]  [<ffffffff815bb361>] ? __kmalloc_reserve+0x31/0x90
> >> >> [   39.944569]  [<ffffffff815f2e8b>] netpoll_send_skb_on_dev+0x16b/0x250
> >> >> [   39.944572]  [<ffffffff815f325c>] netpoll_send_udp+0x2ec/0x450
> >> >> [   39.944576]  [<ffffffffa003cb62>] write_msg+0xb2/0xf0 [netconsole]
> >> >> [   39.944578]  [<ffffffff810e04e5>] call_console_drivers+0x115/0x120
> >> >> [   39.944580]  [<ffffffff810e1f13>] console_unlock+0x333/0x5c0
> >> >> [   39.944583]  [<ffffffff810e2c74>] register_console+0x1c4/0x380
> >> >> [   39.944586]  [<ffffffffa004f1c5>] init_netconsole+0x1c5/0x1000 [netconsole]
> >> >> [   39.944588]  [<ffffffffa004f000>] ? 0xffffffffa004f000
> >> >> [   39.944591]  [<ffffffff8100216d>] do_one_initcall+0x3d/0x150
> >> >> [   39.944592]  [<ffffffff810aec9a>] ? __might_sleep+0x4a/0x80
> >> >> [   39.944596]  [<ffffffff811f5098>] ? kmem_cache_alloc_trace+0x188/0x1e0
> >> >> [   39.944598]  [<ffffffff8118f871>] do_init_module+0x5f/0x1d8
> >> >> [   39.944602]  [<ffffffff81114009>] load_module+0x1429/0x1b40
> >> >> [   39.944604]  [<ffffffff81110cd0>] ? __symbol_put+0x40/0x40
> >> >> [   39.944607]  [<ffffffff8121f348>] ? kernel_read_file+0x178/0x1a0
> >> >> [   39.944608]  [<ffffffff8121f429>] ? kernel_read_file_from_fd+0x49/0x80
> >> >> [   39.944611]  [<ffffffff81114973>] SYSC_finit_module+0xc3/0xf0
> >> >> [   39.944614]  [<ffffffff811149be>] SyS_finit_module+0xe/0x10
> >> >> [   39.944617]  [<ffffffff816e5877>] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> >> [   39.946384] console [netcon0] enabled
> >> >> [   39.946514] netconsole: network logging started
> >> >>
> >> >> Can this be possibly fixed?
> >> >
> >> >Could you try this ?
> >> >
> >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644
> >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> >> > {
> >> > 	struct e1000_adapter *adapter = netdev_priv(netdev);
> >> >
> >> >-	disable_irq(adapter->pdev->irq);
> >> >-	e1000_intr(adapter->pdev->irq, netdev);
> >> >-	enable_irq(adapter->pdev->irq);
> >> >+	if (napi_schedule_prep(&adapter->napi)) {
> >> >+		adapter->total_tx_bytes = 0;
> >> >+		adapter->total_tx_packets = 0;
> >> >+		adapter->total_rx_bytes = 0;
> >> >+		adapter->total_rx_packets = 0;
> >> >+		__napi_schedule(&adapter->napi);
> >> >+	}
> >>
> >> The machines are actually running e1000e driver, so I copied your
> >> approach to e1000e and it works:
> >>
> >> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> >> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
> >> kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
> >> kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1
> >> kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
> >> kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
> >> kern  :info  : [   16.114284] console [netcon0] enabled
> >> kern  :info  : [   16.114550] netconsole: network logging started
> >>
> >> However I'm not sure if it'll have side effects, because this
> >> effectively disables the various checks in e1000_intr() and
> >> e1000_intr_msi().
> >>
> >
> >As far as netpoll is concerned, this should not matter.
> >
> >We only want to drain packets from TX rings.
> 
> OK.
> 
> >I have no idea why you hit this issue only recently, since this looks a
> >rather old bug to me ?
> 
> Yeah it's a rather old bug. It becomes obvious when we try to detect
> and filter out buggy kernels for the machines that are expected to run
> stable services. This BUG effectively blocks the stable machines from
> booting because no clean kernel (v4.6.4, v4.6, v4.5, v4.4, ...) are
> available at all. ;)
> 
> Thanks,
> Fengguang
> >> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> index 9b4ec13..4f89873 100644
> >> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> >> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> >> @@ -6711,15 +6711,14 @@ static void e1000_netpoll(struct net_device *netdev)
> >>  	case E1000E_INT_MODE_MSIX:
> >>  		e1000_intr_msix(adapter->pdev->irq, netdev);
> >>  		break;
> >> -	case E1000E_INT_MODE_MSI:
> >> -		disable_irq(adapter->pdev->irq);
> >> -		e1000_intr_msi(adapter->pdev->irq, netdev);
> >> -		enable_irq(adapter->pdev->irq);
> >> -		break;
> >>  	default:		/* E1000E_INT_MODE_LEGACY */
> >> -		disable_irq(adapter->pdev->irq);
> >> -		e1000_intr(adapter->pdev->irq, netdev);
> >> -		enable_irq(adapter->pdev->irq);
> >> +		if (napi_schedule_prep(&adapter->napi)) {
> >> +			adapter->total_tx_bytes = 0;
> >> +			adapter->total_tx_packets = 0;
> >> +			adapter->total_rx_bytes = 0;
> >> +			adapter->total_rx_packets = 0;
> >> +			__napi_schedule(&adapter->napi);
> >> +		}
> >>  		break;
> >>  	}
> >>  }
> >>
> >> Thanks,
> >> Fengguang
> >
> 

Almost all netpoll implementations use disable_irq(), so I guess netpoll
is not compatible with threaded IRQs.
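
For readers following the patches quoted above, here is a rough userspace
sketch of the "prep, reset counters, schedule" shape they use in place of
disable_irq(). The trace shows disable_irq() reaching synchronize_irq(),
which may sleep, while the netconsole write path runs with IRQs disabled.
This is only a toy model under stated assumptions: every toy_* name is
hypothetical, an atomic flag stands in for the NAPI "scheduled" state, and
a counter stands in for actually queueing the poll. It is not the driver's
code and not the kernel's NAPI implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's per-adapter state. */
struct toy_adapter {
	atomic_bool napi_scheduled;	/* models the NAPI "scheduled" bit */
	unsigned int total_tx_bytes;
	unsigned int total_tx_packets;
	unsigned int total_rx_bytes;
	unsigned int total_rx_packets;
	unsigned int polls_queued;	/* how many times a poll was queued */
};

/*
 * Models napi_schedule_prep(): only the caller that flips the flag from
 * false to true gets "true" back. It never blocks, so it is usable from
 * atomic context.
 */
static bool toy_schedule_prep(struct toy_adapter *a)
{
	return !atomic_exchange(&a->napi_scheduled, true);
}

/* Models __napi_schedule(): here it only counts the queued poll. */
static void toy_schedule(struct toy_adapter *a)
{
	a->polls_queued++;
}

/* Models the poll routine finishing and clearing the scheduled state. */
static void toy_complete(struct toy_adapter *a)
{
	atomic_store(&a->napi_scheduled, false);
}

/* Mirrors the shape of the patched e1000_netpoll() from the diffs above. */
static void toy_netpoll(struct toy_adapter *a)
{
	if (toy_schedule_prep(a)) {
		a->total_tx_bytes = 0;
		a->total_tx_packets = 0;
		a->total_rx_bytes = 0;
		a->total_rx_packets = 0;
		toy_schedule(a);
	}
	/* If a poll is already pending, do nothing: no waiting, no sleeping. */
}
```

The point of the sketch is that a second toy_netpoll() call made while a
poll is still pending returns immediately instead of waiting for anything,
which is the property the sleeping disable_irq() path lacks.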
