* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
@ 2017-09-28  8:51 Lijun Shen
  2017-09-28 15:33 ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Lijun Shen @ 2017-09-28  8:51 UTC
  To: intel-wired-lan

Hi,

Could you please look at the issue below? Any comments or suggestions would be appreciated.

Environment:
CPU         : Westmere-EP 6C
Memory      : 24G
PCH         : Tylersburg
Ether Ports : Intel Corporation 82576 Gigabit Network Connection [8086:10c9]

Problem:
After the OS boots, the kernel panics as soon as the Ethernet port (igb driver) receives any packet.

Log (whole log attached):
linux:~ # ifconfig ctrll0 10.163.177.16
linux:~ # ping 10.163.177.1
PING 10.163.177.[  303.711998] BUG: unable to handle kernel paging request at 00000000bf3c78ed
[  303.719064] IP: [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
[  303.725089] PGD 0
[  303.727117] Oops: 0000 [#1] SMP
[  303.730375] Modules linked in: af_packet iscsi_ibft iscsi_boot_sysfs gsio(OEN) eri_ipmi(OEN) gephwid(OEN) msr intel_powerclamp uas coretemp kvm_intel kvm usb_storage sd_mod ahci libahci crct10dif_pclmul iTCO_wdt crc32_pclmul gpio_ich iTCO_vendor_support e1000e crc32c_intel libata ehci_pci uhci_hcd i7core_edac ppdev igb ehci_hcd ipmi_si ixgbe mpt2sas parport_pc aesni_intel mdio edac_core ioatdma ptp aes_x86_64 pps_core i2c_algo_bit lrw gf128mul glue_helper ablk_helper usbcore cryptd serio_raw pcspkr lpc_ich i2c_i801 mfd_core raid_class scsi_transport_sas acpi_cpufreq usb_common parport ipmi_msghandler dca shpchp button processor sg scsi_mod autofs4
[  303.788756] Supported: No, Unsupported modules are loaded
[  303.794150] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE N  3.12.53-60.30-default #1
[  303.802666] Hardware name: Ericsson AB CXC1060259/ROJ208840/3 , BIOS R12A01 2017-07-06
[  303.810571] task: ffffffff81c11460 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[  303.818041] RIP: 0010:[<ffffffff814a1029>]  [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
[  303.826484] RSP: 0018:ffff88063f203d20  EFLAGS: 00010206
[  303.831789] RAX: 0000000000000001 RBX: ffff880602499880 RCX: 0000000000000054
[  303.838911] RDX: ffff88062273174e RSI: ffff880602499880 RDI: ffff8805e2d5ae88
[  303.846037] RBP: 0000000000000000 R08: 00000000bf3c78dd R09: 0000000000000000
[  303.853161] R10: 0000000000000000 R11: 0000000000001043 R12: 0000000000000000
[  303.860286] R13: ffff8805e2d5ae88 R14: 000000000000000e R15: 000000000000000e
[  303.867411] FS:  0000000000000000(0000) GS:ffff88063f200000(0000) knlGS:0000000000000000
[  303.875496] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  303.881232] CR2: 00000000bf3c78ed CR3: 0000000001c0c000 CR4: 00000000000007f0
[  303.888357] Stack:
[  303.890369]  ffff880602499880 0000000000000008 ffff8805e2d5ae50 ffffffff81d39e48
[  303.897821]  000000000000000e ffffffff81435ec4 ffff880622731700 ffffffff81d39e48
[  303.905275]  ffff880602499880 ffff8805e2d5ae50 ffff880602499880 0000000000000001
[  303.912727] Call Trace:
[  303.915182]  [<ffffffff81435ec4>] dev_gro_receive+0x1d4/0x290
[  303.920923]  [<ffffffff81436210>] napi_gro_receive+0x20/0x90
[  303.926588]  [<ffffffffa0252f60>] igb_clean_rx_irq+0x390/0x8c0 [igb]
[  303.932954]  [<ffffffffa025379a>] igb_poll+0x30a/0x770 [igb]
[  303.938620]  [<ffffffff81435bf0>] net_rx_action+0x140/0x240
[  303.944197]  [<ffffffff8105e2d5>] __do_softirq+0xe5/0x230
[  303.949593]  [<ffffffff8152d09c>] call_softirq+0x1c/0x30
[  303.954906]  [<ffffffff81004665>] do_softirq+0x55/0x90
[  303.960046]  [<ffffffff8105e575>] irq_exit+0x95/0xa0
[  303.965011]  [<ffffffff8152d88e>] do_IRQ+0x4e/0xb0
[  303.969805]  [<ffffffff8152386d>] common_interrupt+0x6d/0x6d
[  303.975467]  [<ffffffff813f220f>] cpuidle_enter_state+0x4f/0xc0
[  303.981383]  [<ffffffff813f2352>] cpuidle_idle_call+0xd2/0x210
[  303.987219]  [<ffffffff8100ba0a>] arch_cpu_idle+0xa/0x30
[  303.992531]  [<ffffffff810b06b1>] cpu_startup_entry+0xe1/0x270
[  303.998364]  [<ffffffff81d50ea1>] start_kernel+0x43e/0x449
[  304.003847]  [<ffffffff81d506a3>] x86_64_start_kernel+0x10c/0x11b
[  304.009940] Code: 55 53 8b 6e 34 48 89 f3 48 8b 56 28 41 89 ec 8d 45 14 4c 01 e2 3b 46 30 77 38 0f b6 42 09 4c 8b 04 c5 e0 8b d3 81 4d 85 c0 74 0c <49> 83 78 10 00 74 05 80 3a 45 74 53 ba 01 00 00 00 31 c0 66 09
[  304.029897] RIP  [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
[  304.035999]  RSP <ffff88063f203d20>
[  304.039482] CR2: 00000000bf3c78ed
1 (10.163.177.1)[  304.042850] ---[ end trace f18a2476a4cd658a ]---
56(84) bytes of[  304.044551] systemd-journald[171]: Compressed data object 651 -> 512
data.
[  304.193794] Kernel panic - not syncing: Fatal exception in interrupt
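
The register dump above can be cross-checked with a little arithmetic. The sketch below pulls R08 and CR2 out of two lines of the oops and tests whether CR2 equals R08 + 0x10. Reading the marked bytes `<49> 83 78 10 00` in the Code line as `cmpq $0x0,0x10(%r8)` is a hand decode and an assumption of this sketch, not something stated in the report; `registers` is a made-up helper name.

```python
import re

# Excerpt of the oops above; only the two lines needed for the check are kept.
OOPS = """\
[  303.846037] RBP: 0000000000000000 R08: 00000000bf3c78dd R09: 0000000000000000
[  303.881232] CR2: 00000000bf3c78ed CR3: 0000000001c0c000 CR4: 00000000000007f0
"""


def registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = registers(OOPS)
# Assumed displacement of the faulting instruction (hand-decoded, see above).
DISPLACEMENT = 0x10
fault_matches = regs["CR2"] == regs["R08"] + DISPLACEMENT
print(f"R08+0x10 = {regs['R08'] + DISPLACEMENT:#x}, "
      f"CR2 = {regs['CR2']:#x}, match = {fault_matches}")
```

A match would suggest that the faulting read went through the value held in R08, which does not look like a kernel pointer next to the other addresses in the dump.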

BR//Lijun
-------------- next part --------------
A non-text attachment was scrubbed...
Name: board2_G3_R12A02_092403_secboot.7z
Type: application/octet-stream
Size: 34344 bytes
Desc: board2_G3_R12A02_092403_secboot.7z
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20170928/df5247cb/attachment-0001.obj>

* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-28  8:51 [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received Lijun Shen
@ 2017-09-28 15:33 ` Alexander Duyck
  2017-09-29  6:05   ` Robin Nie
  2017-09-29  7:18   ` Lijun Shen
  0 siblings, 2 replies; 9+ messages in thread
From: Alexander Duyck @ 2017-09-28 15:33 UTC
  To: intel-wired-lan

Can you clarify which distribution and driver version you are working
with? Have any additional patches been applied? If so, have you tried
testing without those patches?

Based on your kernel version it looks like a SLES 12 kernel, though
it isn't the latest update to that kernel. If your distribution has
updates available, I would suggest checking with them first to make
sure you have the latest version of their kernel supported for your
release, as this trace looks like it could be a kernel bug rather
than a bug in the driver itself. For example, we have seen something
similar in the past when the flow dissector was initialized after the
network driver was loaded, which left the flow dissector in an
uninitialized state and caused issues.

One thing you might try as a debugging step is to boot the OS with
the igb driver blacklisted and then insert it manually with insmod
after the kernel has fully booted. If that makes the issue disappear,
then the kernel likely has a problem with the initialization order of
the network stack components.
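
The blacklist-then-load test described above could be laid out roughly as follows. This is a sketch that only builds and prints the steps; it does not run anything. The file name `/etc/modprobe.d/50-blacklist-igb.conf` is an arbitrary choice, `modprobe` is swapped in for `insmod` so that module dependencies get resolved, and if igb is also packed into the initrd, the initrd would likely need rebuilding before the blacklist takes effect.

```python
import shlex

MODULE = "igb"
# Hypothetical file name; any *.conf file under /etc/modprobe.d/ is read by modprobe.
BLACKLIST_PATH = "/etc/modprobe.d/50-blacklist-igb.conf"


def blacklist_conf(module):
    """Contents of a modprobe.d file that stops automatic loading of `module`."""
    return f"blacklist {module}\n"


def manual_load_steps(module):
    """Commands to run as root once the system has fully booted."""
    return [
        ["lsmod"],             # check that the module is absent from the list
        ["modprobe", module],  # explicit load by name; blacklist only blocks alias-based autoload
        ["dmesg"],             # look for the driver banner or a new oops
    ]


print(f"# write to {BLACKLIST_PATH}, then reboot:")
print(blacklist_conf(MODULE), end="")
for argv in manual_load_steps(MODULE):
    print(shlex.join(argv))
```

If traffic flows normally after the manual load, that would point toward an initialization-order problem rather than the driver's receive path.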

Secondly, we would need to know what version of the driver you are
using. Are you using the driver included with the distribution, or did
you build a separate driver for use with the hardware?

Thanks.

- Alex

* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-28 15:33 ` Alexander Duyck
@ 2017-09-29  6:05   ` Robin Nie
  2017-09-29  7:18   ` Lijun Shen
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Nie @ 2017-09-29  6:05 UTC
  To: intel-wired-lan

Hi Alexander,

Thanks a lot for your quick reply.
Below is the information you requested:

SLES12 SP1, kernel version:
Linux version 3.12.53-60.30-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Feb 10 14:41:46 UTC 2016 (e57129f)
Welcome to SUSE Linux Enterprise Server 12 SP1  (x86_64) - Kernel 3.12.53-60.30-default (ttyS0).

Igb driver version:
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
It should be the driver included with the distribution; we are not sure whether any additional patches have been applied.
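
One way to approach the in-tree versus separately built question is to look at the version string itself. The sketch below splits the banner line quoted above and checks for a trailing `-k`, which Intel's in-kernel drivers conventionally append; treat that convention as an assumption here, and note that `parse_banner` is a made-up helper rather than part of any tool.

```python
import re

# Banner line as quoted in this message.
BANNER = "igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k"


def parse_banner(line):
    """Split a driver banner into (driver, version); return None if it does not match."""
    m = re.match(r"^(\w+): .* - version (\S+)\s*$", line)
    return (m.group(1), m.group(2)) if m else None


driver, version = parse_banner(BANNER)
# Assumption: a trailing "-k" marks the driver that ships inside the kernel tree.
looks_in_tree = version.endswith("-k")
print(driver, version, "in-tree" if looks_in_tree else "possibly out-of-tree")
```

The suffix alone does not say whether the distribution applied extra patches on top, so it narrows the question rather than settling it.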

NIC info correction:

Intel Corporation I350 Gigabit Network Connection (rev 01)

Thanks
BR/Robin

* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-28 15:33 ` Alexander Duyck
  2017-09-29  6:05   ` Robin Nie
@ 2017-09-29  7:18   ` Lijun Shen
  2017-09-29 18:22     ` Alexander Duyck
  1 sibling, 1 reply; 9+ messages in thread
From: Lijun Shen @ 2017-09-29  7:18 UTC
  To: intel-wired-lan

Hi Alexander,

Thanks a lot for your quick reply.
Below is the information you requested:

SLES12 SP1, kernel version:
Linux version 3.12.53-60.30-default (geeko@buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Feb 10 14:41:46 UTC 2016 (e57129f)
Welcome to SUSE Linux Enterprise Server 12 SP1  (x86_64) - Kernel 3.12.53-60.30-default (ttyS0).

Igb driver version:
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
It should be the driver included with the distribution; we are not sure whether any additional patches have been applied.

NIC info correction:

Intel Corporation I350 Gigabit Network Connection (rev 01)

Thanks
BR/Robin

* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-29  7:18   ` Lijun Shen
@ 2017-09-29 18:22     ` Alexander Duyck
  2017-09-30  6:34       ` Lijun Shen
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2017-09-29 18:22 UTC
  To: intel-wired-lan

As I said in my earlier email, you should probably look at updating
your kernel before we proceed any further. Specifically, for SLES12
SP1 there are several issues in the GRO stack that could be causing
the problem you are seeing. A kernel update for SLES12 SP1,
3.12.67-60.64.18, included fixes for a few GRO issues. You may want
to update to at least that version and see whether the problem
persists.
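
Whether a given machine already runs at least the suggested update can be checked by comparing release strings. A minimal sketch, assuming a SLES release string can be compared numerically field by field once a flavor suffix such as `-default` is dropped; `release_key` is a hypothetical helper, not part of any tool.

```python
import re


def release_key(release):
    """Turn '3.12.53-60.30-default' into (3, 12, 53, 60, 30) for comparison."""
    # Drop a trailing flavor such as '-default', then collect the numeric fields.
    numeric_part = re.sub(r"-[a-z][a-z0-9_]*$", "", release)
    return tuple(int(n) for n in re.findall(r"\d+", numeric_part))


running = "3.12.53-60.30-default"  # from the oops in this thread
suggested = "3.12.67-60.64.18"     # update named above
needs_update = release_key(running) < release_key(suggested)
print(release_key(running), release_key(suggested), needs_update)
```

On a live system the running value could come from `platform.release()` instead of a hard-coded string.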

- Alex

>>
>> [  303.998364]  [<ffffffff81d50ea1>] start_kernel+0x43e/0x449
>>
>> [  304.003847]  [<ffffffff81d506a3>] x86_64_start_kernel+0x10c/0x11b
>>
>> [  304.009940] Code: 55 53 8b 6e 34 48 89 f3 48 8b 56 28 41 89 ec 8d
>> 45 14 4c 01 e2 3b 46 30 77 38 0f b6 42 09 4c 8b 04 c5 e0 8b d3 81 4d
>> 85 c0 74 0c <49> 83 78 10 00 74 05 80 3a 45 74 53 ba 01 00 00 00 31 c0
>> 66 09
>>
>> [  304.029897] RIP  [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
>>
>> [  304.035999]  RSP <ffff88063f203d20>
>>
>> [  304.039482] CR2: 00000000bf3c78ed
>>
>> 1 (10.163.177.1)[  304.042850] ---[ end trace f18a2476a4cd658a ]---
>>
>> 56(84) bytes of[  304.044551] systemd-journald[171]: Compressed data
>> object
>> 651 -> 512
>>
>> data.
>>
>> [  304.193794] Kernel panic - not syncing: Fatal exception in
>> interrupt
>>
>>
>>
>> BR//Lijun
>
> Can you clarify what distribution and driver version you are working with? Have any additional patches been applied? If so have you tried testing without those patches?
>
> Based on your kernel version it looks like a SLES 12 kernel, though it isn't the latest update to that kernel. If your distribution has updates available I would suggest checking with them first to make sure you have the latest version of their kernel supported for your release, as this trace looks like it could possibly be a kernel bug instead of a bug in the driver itself. For example, we have seen something similar in the past as a result of the flow dissector initializing after the network driver was loaded, which resulted in uninitialized state in the flow dissector causing issues.
>
> One thing you might try as a debugging step would be to boot the OS with the igb driver blacklisted and then manually insert it after the kernel has been fully booted using insmod. If that makes the issue disappear then the kernel likely has an issue with the initialization order for the components of the network stack.
>
> Secondly we would need to know what version of the driver you are using. Are you using the driver included with the distribution or did you build a separate driver for use with the hardware?
>
> Thanks.
>
> - Alex


* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-29 18:22     ` Alexander Duyck
@ 2017-09-30  6:34       ` Lijun Shen
  2017-09-30 17:30         ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Lijun Shen @ 2017-09-30  6:34 UTC (permalink / raw)
  To: intel-wired-lan

Hi Alex,

Thanks for your info.
We tried the following 2 kernel versions, and the kernel panic cannot be reproduced on either of them.
	SLES12SP1  (x86_64) - Kernel 3.12.69-60.64.29-default (ttyS0).
		Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
		Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
		Intel(R) PRO/1000 Network Driver - 3.2.5-k
	SLES12SP1  (x86_64) - Kernel 3.12.62-60.64.8-default (ttyS0).
		Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
		Intel(R) PRO/1000 Network Driver - 3.2.5-k
		Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k

BTW, what is the GRO stack? You mentioned several issues; which of them may cause a kernel panic in SLES12SP1 3.12.53-60.30?
Thanks a lot!

BR//Robin
-----Original Message-----
From: Alexander Duyck [mailto:alexander.duyck at gmail.com] 
Sent: Saturday, September 30, 2017 2:22 AM
To: Lijun Shen <lijun.shen@ericsson.com>
Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; Robin Nie <robin.nie@ericsson.com>; Sylar Tao <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received

As I said in my earlier email, you should probably look at updating your kernel first before we proceed any further. Specifically for SLES12 SP1, there are several issues in the GRO stack that could be causing the issue you are seeing. There was a kernel update, 3.12.67-60.64.18, for SLES12 SP1 that included fixes to GRO for a few issues. You may want to try updating to at least that version to see if the issues you are seeing persist.
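
As a rough illustration of the "at least that version" check, here is a minimal Python sketch. It assumes that a SLES kernel release string can be compared as a tuple of its numeric fields, which is a simplification and not how the package manager compares versions. The helper names parse_kernel_release and has_gro_fixes are hypothetical.

```python
import re


def parse_kernel_release(release):
    """Split a release such as '3.12.67-60.64.18-default' into a tuple of ints.

    Non-numeric parts (for example the '-default' flavour) are ignored.
    """
    return tuple(int(part) for part in re.findall(r"\d+", release))


def has_gro_fixes(release, minimum="3.12.67-60.64.18"):
    """True if `release` is at least the update named in this thread.

    Assumption: plain tuple comparison is good enough for these strings.
    """
    return parse_kernel_release(release) >= parse_kernel_release(minimum)


# The failing kernel and the two kernels reported as not reproducing the panic.
for release in ("3.12.53-60.30-default",
                "3.12.62-60.64.8-default",
                "3.12.69-60.64.29-default"):
    print(release, has_gro_fixes(release))
```

By this comparison only 3.12.69-60.64.29 is at or above the named update. 3.12.62-60.64.8 is older than it and was still reported as not reproducing the panic, so the update may not be the only relevant change.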

- Alex

On Fri, Sep 29, 2017 at 12:18 AM, Lijun Shen <lijun.shen@ericsson.com> wrote:
> Hi Alexander,
>
> Thanks a lot for your quick reply.
> Below is info you requested:
>
> SLES12 SP1, kernel version:
> Linux version 3.12.53-60.30-default (geeko at buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Feb 10 14:41:46 UTC 2016 (e57129f) Welcome to SUSE Linux Enterprise Server 12 SP1  (x86_64) - Kernel 3.12.53-60.30-default (ttyS0).
>
> Igb driver version:
> igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k 
> It should be the driver included in the distribution; we are not sure whether any additional patches are applied.
>
> NIC info Correction :
>
> Intel Corporation I350 Gigabit Network Connection (rev 01)
>
> Thanks
> BR/Robin
>


* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-30  6:34       ` Lijun Shen
@ 2017-09-30 17:30         ` Alexander Duyck
  2017-10-01  1:19           ` Robin Nie
  2017-10-01  2:04           ` Robin Nie
  0 siblings, 2 replies; 9+ messages in thread
From: Alexander Duyck @ 2017-09-30 17:30 UTC (permalink / raw)
  To: intel-wired-lan

Hi Robin,

GRO is an acronym for Generic Receive Offload. What it does is
essentially aggregate multiple TCP packets into one large packet. This
aggregation is performed in the stack.
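
To make the aggregation idea concrete, here is a toy Python sketch. It is not the kernel implementation: the Segment type, the flow tuple and gro_coalesce are hypothetical names, and it only merges segments that arrive back to back, whereas the real code keeps per-flow state and checks many more header fields.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple      # (src, dst, sport, dport), a hypothetical flow key
    seq: int         # TCP sequence number of the first payload byte
    payload: bytes


def gro_coalesce(segments):
    """Merge in-order, back-to-back segments of the same flow."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if (last is not None and last.flow == seg.flow
                and last.seq + len(last.payload) == seg.seq):
            # Contiguous data: grow the held packet instead of adding a new one.
            last.payload += seg.payload
        else:
            # New flow or a gap in sequence numbers: start a new packet.
            merged.append(Segment(seg.flow, seg.seq, seg.payload))
    return merged


flow_a = ("10.0.0.1", "10.0.0.2", 1234, 80)
flow_b = ("10.0.0.3", "10.0.0.2", 5678, 80)
packets = [Segment(flow_a, 0, b"abc"), Segment(flow_a, 3, b"def"),
           Segment(flow_b, 0, b"xyz")]
print(len(gro_coalesce(packets)))  # 2
```

The upper layers then process one larger packet per flow instead of several small ones, which is where the saving comes from.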

There have historically been several bugs in the code that can lead to
issues such as CVE-2016-8666
(http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-8666) which
results in a stack overrun or kernel panic if multiple tunnel headers
are present. As your trace included references to GRO my thought was
that there may have been an issue there which is why I had suggested
updating the kernel first.
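
One way to cross-check the trace is to compare the faulting address against the registers. The Python sketch below copies three lines from the posted oops. It assumes the marked bytes "<49> 83 78 10 00" in the Code: line decode to a compare against memory at 0x10(%r8); that decoding is a reading of the bytes, not something stated in the log. The helper parse_registers is a hypothetical name.

```python
import re

# Register lines copied from the oops posted in this thread.
OOPS = """\
RAX: 0000000000000001 RBX: ffff880602499880 RCX: 0000000000000054
RBP: 0000000000000000 R08: 00000000bf3c78dd R09: 0000000000000000
CR2: 00000000bf3c78ed CR3: 0000000001c0c000 CR4: 00000000000007f0
"""


def parse_registers(text):
    """Map register names to integer values from 'NAME: hex' pairs."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS)
DISP = 0x10  # assumed displacement, read from the bytes "<49> 83 78 10 00"

# CR2 holds the address that faulted; check it is R08 plus the displacement.
fault_matches = regs["R08"] + DISP == regs["CR2"]
# x86_64 kernel pointers live in the upper half of the address space.
looks_like_kernel_pointer = regs["R08"] >= 0xFFFF800000000000
print(fault_matches, looks_like_kernel_pointer)  # True False
```

If that reading is right, inet_gro_receive dereferenced a pointer in R08 that is not a valid kernel address. That would be consistent with corrupted or uninitialized state in the stack rather than bad data handed up by igb, but it does not by itself identify which bug is responsible.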

Thanks.

- Alex



* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-30 17:30         ` Alexander Duyck
@ 2017-10-01  1:19           ` Robin Nie
  2017-10-01  2:04           ` Robin Nie
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Nie @ 2017-10-01  1:19 UTC (permalink / raw)
  To: intel-wired-lan

Hi Alex,

Thanks for your info.
We will check the issues and bugs in this kernel version.

BR/Robin

-----Original Message-----
From: Alexander Duyck [mailto:alexander.duyck at gmail.com] 
Sent: Sunday, October 01, 2017 1:31 AM
To: Lijun Shen <lijun.shen@ericsson.com>
Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; Robin Nie <robin.nie@ericsson.com>; Sylar Tao <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received

Hi Robin,

GRO is an acronym for Generic Receive Offload. What it does is essentially aggregate multiple TCP packets into one large packet. This aggregation is performed in the stack.

There have historically been several bugs in the code that can lead to issues such as CVE-2016-8666
(http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-8666) which results in a stack overrun or kernel panic if multiple tunnel headers are present. As your trace included references to GRO my thought was that there may have been an issue there which is why I had suggested updating the kernel first.

Thanks.

- Alex

On Fri, Sep 29, 2017 at 11:34 PM, Lijun Shen <lijun.shen@ericsson.com> wrote:
> Hi Alex,
>
> Thanks for your info.
> We tried the following 2 kernel versions and kernel panic issue cannot be reproduced on these 2 kernel versions.
>         SLES12SP1  (x86_64) - Kernel 3.12.69-60.64.29-default (ttyS0).
>                 Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>                 Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
>                 Intel(R) PRO/1000 Network Driver - 3.2.5-k
>         SLES12SP1  (x86_64) - Kernel 3.12.62-60.64.8-default (ttyS0).
>                 Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>                 Intel(R) PRO/1000 Network Driver - 3.2.5-k
>                 Intel(R) 10 Gigabit PCI Express Network Driver - 
> version 4.0.1-k
>
> BTW, What is the GRO stack?  As you mentioned several issue, What issues may cause kernel panic in SLES12SP1 3.12.53-60.30?
> Thanks a lot!
>
> BR//Robin
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Saturday, September 30, 2017 2:22 AM
> To: Lijun Shen <lijun.shen@ericsson.com>
> Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H 
> <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; 
> Robin Nie <robin.nie@ericsson.com>; Sylar Tao 
> <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric 
> Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu 
> <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
> Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as any 
> packet received
>
> As I said in my earlier email. You should probably look at updating 
> your kernel first before we proceed any further. Specifically for
> SLES12 SP1 there are several issues in the GRO stack that could be causing the issue you are seeing. There was a kernel update, 3.12.67-60.64.18, for SLES12 SP1 that included fixes to GRO for a few issues. You may want to try updating to at least that version to see if the issues you are seeing persist.
>
> - Alex
>
> On Fri, Sep 29, 2017 at 12:18 AM, Lijun Shen <lijun.shen@ericsson.com> wrote:
>> Hi Alexander,
>>
>> Thanks a lot for your quick reply.
>> Below is info you requested:
>>
>> SLES12 SP1, kernel version:
>> Linux version 3.12.53-60.30-default (geeko at buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Feb 10 14:41:46 UTC 2016 (e57129f) Welcome to SUSE Linux Enterprise Server 12 SP1  (x86_64) - Kernel 3.12.53-60.30-default (ttyS0).
>>
>> Igb driver version:
>> igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k 
>> Should be included in the distribution and not sure about if any additional patches are used.
>>
>> NIC info Correction :
>>
>> Intel Corporation I350 Gigabit Network Connection (rev 01)
>>
>> Thanks
>> BR/Robin
>>
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> Sent: Thursday, September 28, 2017 11:33 PM
>> To: Lijun Shen <lijun.shen@ericsson.com>
>> Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H 
>> <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; 
>> Robin Nie <robin.nie@ericsson.com>; Sylar Tao 
>> <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric 
>> Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu 
>> <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
>> Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as 
>> any packet received
>>
>> On Thu, Sep 28, 2017 at 1:51 AM, Lijun Shen <lijun.shen@ericsson.com> wrote:
>>> Hi,
>>>
>>>
>>>
>>> Can you please look at the issue below? Thank you if any comments 
>>> and suggestions.
>>>
>>>
>>>
>>> Environment:
>>>
>>> CPU                       : Westmere-EP 6C
>>>
>>> Memory              : 24G
>>>
>>> PCH                        : Tylersburg
>>>
>>> Ether Ports         : Intel Corporation 82576 Gigabit Network Connection
>>> [8086:10c9]
>>>
>>>
>>>
>>> Problem:
>>>
>>>                 When the OS boot up, kernel panic as soon as the 
>>> ether port/igb driver receives any packets.
>>>
>>>
>>>
>>> Log (whole log attached):
>>>
>>> linux:~ # ifconfig ctrll0 10.163.177.16
>>>
>>> linux:~ # ping 10.163.177.1
>>>
>>> PING 10.163.177.[  303.711998] BUG: unable to handle kernel paging 
>>> request at 00000000bf3c78ed
>>>
>>> [  303.719064] IP: [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
>>>
>>> [  303.725089] PGD 0
>>>
>>> [  303.727117] Oops: 0000 [#1] SMP
>>>
>>> [  303.730375] Modules linked in: af_packet iscsi_ibft 
>>> iscsi_boot_sysfs
>>> gsio(OEN) eri_ipmi(OEN) gephwid(OEN) msr intel_powerclamp uas 
>>> coretemp kvm_intel kvm usb_storage sd_mod ahci libahci 
>>> crct10dif_pclmul iTCO_wdt crc32_pclmul gpio_ich iTCO_vendor_support 
>>> e1000e crc32c_intel libata ehci_pci uhci_hcd i7core_edac ppdev igb 
>>> ehci_hcd ipmi_si ixgbe mpt2sas parport_pc aesni_intel mdio edac_core 
>>> ioatdma ptp aes_x86_64 pps_core i2c_algo_bit lrw gf128mul 
>>> glue_helper ablk_helper usbcore cryptd serio_raw pcspkr lpc_ich 
>>> i2c_i801 mfd_core raid_class scsi_transport_sas acpi_cpufreq 
>>> usb_common parport ipmi_msghandler dca shpchp button processor sg 
>>> scsi_mod
>>> autofs4
>>>
>>> [  303.788756] Supported: No, Unsupported modules are loaded
>>>
>>> [  303.794150] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE N
>>> 3.12.53-60.30-default #1
>>>
>>> [  303.802666] Hardware name: Ericsson AB CXC1060259/ROJ208840/3 , 
>>> BIOS
>>> R12A01 2017-07-06
>>>
>>> [  303.810571] task: ffffffff81c11460 ti: ffffffff81c00000 task.ti:
>>> ffffffff81c00000
>>>
>>> [  303.818041] RIP: 0010:[<ffffffff814a1029>]  [<ffffffff814a1029>]
>>> inet_gro_receive+0x39/0x200
>>>
>>> [  303.826484] RSP: 0018:ffff88063f203d20  EFLAGS: 00010206
>>>
>>> [  303.831789] RAX: 0000000000000001 RBX: ffff880602499880 RCX:
>>> 0000000000000054
>>>
>>> [  303.838911] RDX: ffff88062273174e RSI: ffff880602499880 RDI:
>>> ffff8805e2d5ae88
>>>
>>> [  303.846037] RBP: 0000000000000000 R08: 00000000bf3c78dd R09:
>>> 0000000000000000
>>>
>>> [  303.853161] R10: 0000000000000000 R11: 0000000000001043 R12:
>>> 0000000000000000
>>>
>>> [  303.860286] R13: ffff8805e2d5ae88 R14: 000000000000000e R15:
>>> 000000000000000e
>>>
>>> [  303.867411] FS:  0000000000000000(0000) GS:ffff88063f200000(0000)
>>> knlGS:0000000000000000
>>>
>>> [  303.875496] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>
>>> [  303.881232] CR2: 00000000bf3c78ed CR3: 0000000001c0c000 CR4:
>>> 00000000000007f0
>>>
>>> [  303.888357] Stack:
>>>
>>> [  303.890369]  ffff880602499880 0000000000000008 ffff8805e2d5ae50
>>> ffffffff81d39e48
>>>
>>> [  303.897821]  000000000000000e ffffffff81435ec4 ffff880622731700
>>> ffffffff81d39e48
>>>
>>> [  303.905275]  ffff880602499880 ffff8805e2d5ae50 ffff880602499880
>>> 0000000000000001
>>>
>>> [  303.912727] Call Trace:
>>>
>>> [  303.915182]  [<ffffffff81435ec4>] dev_gro_receive+0x1d4/0x290
>>>
>>> [  303.920923]  [<ffffffff81436210>] napi_gro_receive+0x20/0x90
>>>
>>> [  303.926588]  [<ffffffffa0252f60>] igb_clean_rx_irq+0x390/0x8c0 
>>> [igb]
>>>
>>> [  303.932954]  [<ffffffffa025379a>] igb_poll+0x30a/0x770 [igb]
>>>
>>> [  303.938620]  [<ffffffff81435bf0>] net_rx_action+0x140/0x240
>>>
>>> [  303.944197]  [<ffffffff8105e2d5>] __do_softirq+0xe5/0x230
>>>
>>> [  303.949593]  [<ffffffff8152d09c>] call_softirq+0x1c/0x30
>>>
>>> [  303.954906]  [<ffffffff81004665>] do_softirq+0x55/0x90
>>>
>>> [  303.960046]  [<ffffffff8105e575>] irq_exit+0x95/0xa0
>>>
>>> [  303.965011]  [<ffffffff8152d88e>] do_IRQ+0x4e/0xb0
>>>
>>> [  303.969805]  [<ffffffff8152386d>] common_interrupt+0x6d/0x6d
>>>
>>> [  303.975467]  [<ffffffff813f220f>] cpuidle_enter_state+0x4f/0xc0
>>>
>>> [  303.981383]  [<ffffffff813f2352>] cpuidle_idle_call+0xd2/0x210
>>>
>>> [  303.987219]  [<ffffffff8100ba0a>] arch_cpu_idle+0xa/0x30
>>>
>>> [  303.992531]  [<ffffffff810b06b1>] cpu_startup_entry+0xe1/0x270
>>>
>>> [  303.998364]  [<ffffffff81d50ea1>] start_kernel+0x43e/0x449
>>>
>>> [  304.003847]  [<ffffffff81d506a3>] x86_64_start_kernel+0x10c/0x11b
>>>
>>> [  304.009940] Code: 55 53 8b 6e 34 48 89 f3 48 8b 56 28 41 89 ec 8d
>>> 45 14 4c 01 e2 3b 46 30 77 38 0f b6 42 09 4c 8b 04 c5 e0 8b d3 81 4d
>>> 85 c0 74 0c <49> 83 78 10 00 74 05 80 3a 45 74 53 ba 01 00 00 00 31
>>> c0
>>> 66 09
>>>
>>> [  304.029897] RIP  [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
>>>
>>> [  304.035999]  RSP <ffff88063f203d20>
>>>
>>> [  304.039482] CR2: 00000000bf3c78ed
>>>
>>> 1 (10.163.177.1)[  304.042850] ---[ end trace f18a2476a4cd658a ]---
>>>
>>> 56(84) bytes of[  304.044551] systemd-journald[171]: Compressed data 
>>> object
>>> 651 -> 512
>>>
>>> data.
>>>
>>> [  304.193794] Kernel panic - not syncing: Fatal exception in 
>>> interrupt
>>>
>>>
>>>
>>> BR//Lijun
>>
>> Can you clarify what distribution and driver version you are working with? Have any additional patches been applied? If so have you tried testing without those patches?
>>
>> Based on your kernel version it looks like a SLES 12 kernel, though it isn't the latest update to that kernel. If your distribution has updates available I would suggest checking with them first to make sure you have the latest version of their kernel supported for your release, as this trace looks like it could possibly be a kernel bug instead of a bug in the driver itself. For example, we have seen something similar in the past as a result of the flow dissector initializing after the network driver was loaded, which resulted in uninitialized state in the flow dissector causing issues.
>>
>> One thing you might try as a debugging step would be to boot the OS with the igb driver blacklisted and then manually insert it after the kernel has been fully booted using insmod. If that makes the issue disappear then the kernel likely has an issue with the initialization order for the components of the network stack.
>>
>> Secondly we would need to know what version of the driver you are using. Are you using the driver included with the distribution or did you build a separate driver for use with the hardware?
>>
>> Thanks.
>>
>> - Alex

* [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received
  2017-09-30 17:30         ` Alexander Duyck
  2017-10-01  1:19           ` Robin Nie
@ 2017-10-01  2:04           ` Robin Nie
  1 sibling, 0 replies; 9+ messages in thread
From: Robin Nie @ 2017-10-01  2:04 UTC (permalink / raw)
  To: intel-wired-lan

Hi Alex,

At the same time, we happened to find that the kernel panic can no longer be triggered after rebuilding our BIOS image with a different compiler and linker toolchain, without any code change.
According to our investigation and test results so far, the kernel panic seems to be triggered by the combination of this particular BIOS image and bugs in kernel 3.12.53-60.30.

Because we are from the BIOS team, we are more concerned about whether this is a BIOS issue or a BIOS build environment issue.
Do you have any comments or suggestions on the BIOS side?

The attachment shows the differences between the map files of the BIOS's CSMCORE module from the 2 build environments.
We can see that the .data and .bss segment sizes are clearly different.

Thanks,
BR/Robin

-----Original Message-----
From: Alexander Duyck [mailto:alexander.duyck at gmail.com] 
Sent: Sunday, October 01, 2017 1:31 AM
To: Lijun Shen <lijun.shen@ericsson.com>
Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; Robin Nie <robin.nie@ericsson.com>; Sylar Tao <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received

Hi Robin,

GRO is an acronym for Generic Receive Offload. What it does is essentially aggregate multiple TCP packets into one large packet. This aggregation is performed in the stack.

There have historically been several bugs in the code that can lead to issues such as CVE-2016-8666
(http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-8666) which results in a stack overrun or kernel panic if multiple tunnel headers are present. As your trace included references to GRO my thought was that there may have been an issue there which is why I had suggested updating the kernel first.
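
As a rough illustration of the aggregation described above, here is a toy Python model. It is not the kernel's GRO code: the Segment type, the flow key layout, and the gro_coalesce() helper are all hypothetical names made up for this sketch, and real GRO also checks headers, flags, and size limits that are ignored here. It only shows the basic idea of appending the next in-order TCP segment of a flow to the one already being held.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    flow: tuple      # hypothetical flow key: (src, dst, sport, dport)
    seq: int         # sequence number of the first payload byte
    payload: bytes


def gro_coalesce(segments):
    """Toy model: merge in-order segments of the same flow into larger ones."""
    held = {}   # flow key -> segment currently being aggregated
    out = []    # segments that were flushed because they could not be extended
    for seg in segments:
        cur = held.get(seg.flow)
        if cur is not None and cur.seq + len(cur.payload) == seg.seq:
            # Next in-order segment of this flow: append it to the held one.
            cur.payload += seg.payload
            continue
        if cur is not None:
            # Out-of-order or gapped segment: flush what was held so far.
            out.append(cur)
        # Hold a copy so the caller's input segments are never modified.
        held[seg.flow] = Segment(seg.flow, seg.seq, seg.payload)
    # At the end of the batch everything still held is passed up.
    out.extend(held.values())
    return out


if __name__ == "__main__":
    flow_a = ("10.0.0.1", "10.0.0.2", 1000, 80)
    batch = [Segment(flow_a, 0, b"aaaa"), Segment(flow_a, 4, b"bbbb")]
    for merged in gro_coalesce(batch):
        print(merged.seq, len(merged.payload))
```

In this model two 4-byte segments of the same flow leave as a single 8-byte segment, so the rest of the stack handles one packet instead of two.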

Thanks.

- Alex

On Fri, Sep 29, 2017 at 11:34 PM, Lijun Shen <lijun.shen@ericsson.com> wrote:
> Hi Alex,
>
> Thanks for your info.
> We tried the following 2 kernel versions, and the kernel panic cannot be reproduced on either of them.
>         SLES12SP1  (x86_64) - Kernel 3.12.69-60.64.29-default (ttyS0).
>                 Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>                 Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
>                 Intel(R) PRO/1000 Network Driver - 3.2.5-k
>         SLES12SP1  (x86_64) - Kernel 3.12.62-60.64.8-default (ttyS0).
>                 Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>                 Intel(R) PRO/1000 Network Driver - 3.2.5-k
>                 Intel(R) 10 Gigabit PCI Express Network Driver - version 4.0.1-k
>
> BTW, what is the GRO stack? You mentioned several issues; which of them may cause a kernel panic in SLES12SP1 3.12.53-60.30?
> Thanks a lot!
>
> BR//Robin
> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
> Sent: Saturday, September 30, 2017 2:22 AM
> To: Lijun Shen <lijun.shen@ericsson.com>
> Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H 
> <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; 
> Robin Nie <robin.nie@ericsson.com>; Sylar Tao 
> <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric 
> Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu 
> <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
> Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as any 
> packet received
>
> As I said in my earlier email, you should probably look at updating your kernel first before we proceed any further. Specifically, for SLES12 SP1 there are several issues in the GRO stack that could be causing the issue you are seeing. There was a kernel update, 3.12.67-60.64.18, for SLES12 SP1 that included fixes to GRO for a few issues. You may want to try updating to at least that version to see if the issues you are seeing persist.
>
> - Alex
>
> On Fri, Sep 29, 2017 at 12:18 AM, Lijun Shen <lijun.shen@ericsson.com> wrote:
>> Hi Alexander,
>>
>> Thanks a lot for your quick reply.
>> Below is info you requested:
>>
>> SLES12 SP1, kernel version:
>> Linux version 3.12.53-60.30-default (geeko at buildhost) (gcc version 4.8.5 (SUSE Linux) ) #1 SMP Wed Feb 10 14:41:46 UTC 2016 (e57129f)
>> Welcome to SUSE Linux Enterprise Server 12 SP1  (x86_64) - Kernel 3.12.53-60.30-default (ttyS0).
>>
>> Igb driver version:
>> igb: Intel(R) Gigabit Ethernet Network Driver - version 5.2.18-k
>> This should be the driver included in the distribution; we are not sure whether any additional patches have been applied.
>>
>> NIC info correction:
>>
>> Intel Corporation I350 Gigabit Network Connection (rev 01)
>>
>> Thanks
>> BR/Robin
>>
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.duyck at gmail.com]
>> Sent: Thursday, September 28, 2017 11:33 PM
>> To: Lijun Shen <lijun.shen@ericsson.com>
>> Cc: intel-wired-lan at lists.osuosl.org; Hui Yu H 
>> <hui.h.yu@ericsson.com>; Sean Zhang N <sean.n.zhang@ericsson.com>; 
>> Robin Nie <robin.nie@ericsson.com>; Sylar Tao 
>> <sylar.tao@ericsson.com>; Shaoxia Ma <shaoxia.ma@ericsson.com>; Eric 
>> Zhang X <eric.x.zhang@ericsson.com>; Longfei Wu 
>> <longfei.wu@ericsson.com>; Yufeng Pan <yufeng.pan@ericsson.com>
>> Subject: Re: [Intel-wired-lan] Igb driver: Kernel panic as soon as 
>> any packet received
>>
>> On Thu, Sep 28, 2017 at 1:51 AM, Lijun Shen <lijun.shen@ericsson.com> wrote:
>>> Hi,
>>>
>>>
>>>
>>> Can you please look at the issue below? Thank you if any comments 
>>> and suggestions.
>>>
>>>
>>>
>>> Environment:
>>>
>>> CPU                       : Westmere-EP 6C
>>>
>>> Memory              : 24G
>>>
>>> PCH                        : Tylersburg
>>>
>>> Ether Ports         : Intel Corporation 82576 Gigabit Network Connection
>>> [8086:10c9]
>>>
>>>
>>>
>>> Problem:
>>>
>>>                 When the OS boots up, the kernel panics as soon as the
>>> ether port/igb driver receives any packets.
>>>
>>>
>>>
>>> Log (whole log attached):
>>>
>>> linux:~ # ifconfig ctrll0 10.163.177.16
>>>
>>> linux:~ # ping 10.163.177.1
>>>
>>> PING 10.163.177.[  303.711998] BUG: unable to handle kernel paging 
>>> request at 00000000bf3c78ed
>>>
>>> [  303.719064] IP: [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
>>>
>>> [  303.725089] PGD 0
>>>
>>> [  303.727117] Oops: 0000 [#1] SMP
>>>
>>> [  303.730375] Modules linked in: af_packet iscsi_ibft 
>>> iscsi_boot_sysfs
>>> gsio(OEN) eri_ipmi(OEN) gephwid(OEN) msr intel_powerclamp uas 
>>> coretemp kvm_intel kvm usb_storage sd_mod ahci libahci 
>>> crct10dif_pclmul iTCO_wdt crc32_pclmul gpio_ich iTCO_vendor_support 
>>> e1000e crc32c_intel libata ehci_pci uhci_hcd i7core_edac ppdev igb 
>>> ehci_hcd ipmi_si ixgbe mpt2sas parport_pc aesni_intel mdio edac_core 
>>> ioatdma ptp aes_x86_64 pps_core i2c_algo_bit lrw gf128mul 
>>> glue_helper ablk_helper usbcore cryptd serio_raw pcspkr lpc_ich 
>>> i2c_i801 mfd_core raid_class scsi_transport_sas acpi_cpufreq 
>>> usb_common parport ipmi_msghandler dca shpchp button processor sg 
>>> scsi_mod
>>> autofs4
>>>
>>> [  303.788756] Supported: No, Unsupported modules are loaded
>>>
>>> [  303.794150] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           OE N
>>> 3.12.53-60.30-default #1
>>>
>>> [  303.802666] Hardware name: Ericsson AB CXC1060259/ROJ208840/3 , 
>>> BIOS
>>> R12A01 2017-07-06
>>>
>>> [  303.810571] task: ffffffff81c11460 ti: ffffffff81c00000 task.ti:
>>> ffffffff81c00000
>>>
>>> [  303.818041] RIP: 0010:[<ffffffff814a1029>]  [<ffffffff814a1029>]
>>> inet_gro_receive+0x39/0x200
>>>
>>> [  303.826484] RSP: 0018:ffff88063f203d20  EFLAGS: 00010206
>>>
>>> [  303.831789] RAX: 0000000000000001 RBX: ffff880602499880 RCX:
>>> 0000000000000054
>>>
>>> [  303.838911] RDX: ffff88062273174e RSI: ffff880602499880 RDI:
>>> ffff8805e2d5ae88
>>>
>>> [  303.846037] RBP: 0000000000000000 R08: 00000000bf3c78dd R09:
>>> 0000000000000000
>>>
>>> [  303.853161] R10: 0000000000000000 R11: 0000000000001043 R12:
>>> 0000000000000000
>>>
>>> [  303.860286] R13: ffff8805e2d5ae88 R14: 000000000000000e R15:
>>> 000000000000000e
>>>
>>> [  303.867411] FS:  0000000000000000(0000) GS:ffff88063f200000(0000)
>>> knlGS:0000000000000000
>>>
>>> [  303.875496] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>>>
>>> [  303.881232] CR2: 00000000bf3c78ed CR3: 0000000001c0c000 CR4:
>>> 00000000000007f0
>>>
>>> [  303.888357] Stack:
>>>
>>> [  303.890369]  ffff880602499880 0000000000000008 ffff8805e2d5ae50
>>> ffffffff81d39e48
>>>
>>> [  303.897821]  000000000000000e ffffffff81435ec4 ffff880622731700
>>> ffffffff81d39e48
>>>
>>> [  303.905275]  ffff880602499880 ffff8805e2d5ae50 ffff880602499880
>>> 0000000000000001
>>>
>>> [  303.912727] Call Trace:
>>>
>>> [  303.915182]  [<ffffffff81435ec4>] dev_gro_receive+0x1d4/0x290
>>>
>>> [  303.920923]  [<ffffffff81436210>] napi_gro_receive+0x20/0x90
>>>
>>> [  303.926588]  [<ffffffffa0252f60>] igb_clean_rx_irq+0x390/0x8c0 
>>> [igb]
>>>
>>> [  303.932954]  [<ffffffffa025379a>] igb_poll+0x30a/0x770 [igb]
>>>
>>> [  303.938620]  [<ffffffff81435bf0>] net_rx_action+0x140/0x240
>>>
>>> [  303.944197]  [<ffffffff8105e2d5>] __do_softirq+0xe5/0x230
>>>
>>> [  303.949593]  [<ffffffff8152d09c>] call_softirq+0x1c/0x30
>>>
>>> [  303.954906]  [<ffffffff81004665>] do_softirq+0x55/0x90
>>>
>>> [  303.960046]  [<ffffffff8105e575>] irq_exit+0x95/0xa0
>>>
>>> [  303.965011]  [<ffffffff8152d88e>] do_IRQ+0x4e/0xb0
>>>
>>> [  303.969805]  [<ffffffff8152386d>] common_interrupt+0x6d/0x6d
>>>
>>> [  303.975467]  [<ffffffff813f220f>] cpuidle_enter_state+0x4f/0xc0
>>>
>>> [  303.981383]  [<ffffffff813f2352>] cpuidle_idle_call+0xd2/0x210
>>>
>>> [  303.987219]  [<ffffffff8100ba0a>] arch_cpu_idle+0xa/0x30
>>>
>>> [  303.992531]  [<ffffffff810b06b1>] cpu_startup_entry+0xe1/0x270
>>>
>>> [  303.998364]  [<ffffffff81d50ea1>] start_kernel+0x43e/0x449
>>>
>>> [  304.003847]  [<ffffffff81d506a3>] x86_64_start_kernel+0x10c/0x11b
>>>
>>> [  304.009940] Code: 55 53 8b 6e 34 48 89 f3 48 8b 56 28 41 89 ec 8d
>>> 45 14 4c 01 e2 3b 46 30 77 38 0f b6 42 09 4c 8b 04 c5 e0 8b d3 81 4d
>>> 85 c0 74 0c <49> 83 78 10 00 74 05 80 3a 45 74 53 ba 01 00 00 00 31
>>> c0
>>> 66 09
>>>
>>> [  304.029897] RIP  [<ffffffff814a1029>] inet_gro_receive+0x39/0x200
>>>
>>> [  304.035999]  RSP <ffff88063f203d20>
>>>
>>> [  304.039482] CR2: 00000000bf3c78ed
>>>
>>> 1 (10.163.177.1)[  304.042850] ---[ end trace f18a2476a4cd658a ]---
>>>
>>> 56(84) bytes of[  304.044551] systemd-journald[171]: Compressed data 
>>> object
>>> 651 -> 512
>>>
>>> data.
>>>
>>> [  304.193794] Kernel panic - not syncing: Fatal exception in 
>>> interrupt
>>>
>>>
>>>
>>> BR//Lijun
>>
>> Can you clarify what distribution and driver version you are working with? Have any additional patches been applied? If so have you tried testing without those patches?
>>
>> Based on your kernel version it looks like a SLES 12 kernel, though it isn't the latest update to that kernel. If your distribution has updates available I would suggest checking with them first to make sure you have the latest version of their kernel supported for your release, as this trace looks like it could possibly be a kernel bug instead of a bug in the driver itself. For example, we have seen something similar in the past as a result of the flow dissector initializing after the network driver was loaded, which resulted in uninitialized state in the flow dissector causing issues.
>>
>> One thing you might try as a debugging step would be to boot the OS with the igb driver blacklisted and then manually insert it after the kernel has been fully booted using insmod. If that makes the issue disappear then the kernel likely has an issue with the initialization order for the components of the network stack.
>>
>> Secondly we would need to know what version of the driver you are using. Are you using the driver included with the distribution or did you build a separate driver for use with the hardware?
>>
>> Thanks.
>>
>> - Alex
-------------- next part --------------
A non-text attachment was scrubbed...
Name: csmcore.map.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 772080 bytes
Desc: csmcore.map.docx
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20171001/465ca147/attachment-0001.docx>

Thread overview: 9+ messages
2017-09-28  8:51 [Intel-wired-lan] Igb driver: Kernel panic as soon as any packet received Lijun Shen
2017-09-28 15:33 ` Alexander Duyck
2017-09-29  6:05   ` Robin Nie
2017-09-29  7:18   ` Lijun Shen
2017-09-29 18:22     ` Alexander Duyck
2017-09-30  6:34       ` Lijun Shen
2017-09-30 17:30         ` Alexander Duyck
2017-10-01  1:19           ` Robin Nie
2017-10-01  2:04           ` Robin Nie
