From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
    INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS autolearn=ham
    autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    by smtp.lore.kernel.org (Postfix) with ESMTP id E80EFC282CB
    for ; Tue, 5 Feb 2019 20:22:01 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
    by mail.kernel.org (Postfix) with ESMTP id B988F2175B
    for ; Tue, 5 Feb 2019 20:22:01 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S1730155AbfBEUWA (ORCPT ); Tue, 5 Feb 2019 15:22:00 -0500
Received: from pop3.seti.kr.ua ([91.202.132.4]:35409 "EHLO mail.seti.kr.ua"
    rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
    id S1726769AbfBEUWA (ORCPT ); Tue, 5 Feb 2019 15:22:00 -0500
Received: from [91.202.134.199] (helo=[192.168.0.145])
    by mail.seti.kr.ua with esmtpa (Exim 4.68) (envelope-from )
    id 1gr7ED-0007lJ-5v for netdev@vger.kernel.org;
    Tue, 05 Feb 2019 22:21:58 +0200
Subject: Re: Kernel panic in eth_header
To: Netdev
References: <18c17dde-5963-4412-2e98-ba44953f0ddd@seti.kr.ua>
    <19716555-3522-cbdd-a128-e2ec672f89cd@gmail.com>
    <16f5a810-f183-2874-c67d-d490f70f7bf6@gmail.com>
From: Andrew
Message-ID: <189be8e7-7126-06bf-67bf-53d56ea0723c@seti.kr.ua>
Date: Tue, 5 Feb 2019 22:21:47 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
    Thunderbird/60.4.0
MIME-Version: 1.0
In-Reply-To: <16f5a810-f183-2874-c67d-d490f70f7bf6@gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

On 05.02.2019 21:34, Florian Fainelli wrote:
> On 2/5/19 8:57 AM, Eric Dumazet wrote:
>>
>> On 02/05/2019 08:29 AM, Andrew wrote:
>>> Hi all.
>>>
>>> After upgrade on PPPoE BRAS to kernel 4.9.153 I've got an kernel panic after a 3 days of uptime.
>>>
>>> Unfortunately kernel is compiled w/o debug info; I rebuilt kernel with debug info enabled (kernel is compiled with same function addresses - I compare vmlinux symbol maps) - it says that panic is in net/ethernet/eth.c:88
>>>
>>> Below there is a kernel panic trace. igb is from vendor, ver. 5.3.5.4. What extra info is needed?
>>>
>>> [263565.106441] BUG: unable to handle kernel paging request at ffff88015a4d2dd4
>>> [263565.113527] IP: [] eth_header+0x3b/0xc0
>>> [263565.119030] PGD 1e8f067 [263565.121474] PUD 0
>>> [263565.123580]
>>> [263565.125166] Oops: 0002 [#1] SMP
>>> [263565.128398] Modules linked in: xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter xt_length xt_TCPMSS xt_tcpudp xt_mark xt_dscp iptable_mangle ip_tables x_tables nf_nat_pptp nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_proto_gre nf_nat nf_conntrack sch_sfq sch_htb cls_u32 sch_ingress sch_prio sch_tbf cls_flow cls_fw act_police ifb 8021q mrp garp stp llc softdog pppoe pppox ppp_generic slhc i2c_nforce2 i2c_core igb(O) parport_pc dca parport thermal asus_atk0110 fan ptp k10temp hwmon pps_core nv_tco
>>> [263565.176083] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G           O    4.9.153-x86_64 #1
>>> [263565.183996] Hardware name: System manufacturer System Product Name/M2N-E, BIOS ASUS M2N-E ACPI BIOS Revision 5001 03/23/2010
>>> [263565.195289] task: ffff88007d0f5200 task.stack: ffffc9000006c000
>>> [263565.201295] RIP: 0010:[] [] eth_header+0x3b/0xc0
>>> [263565.209225] RSP: 0018:ffff88007fa83c58  EFLAGS: 00010286
>>> [263565.214622] RAX: ffff88015a4d2dc8 RBX: 0000000000000008 RCX: ffff8800682434a0
>>> [263565.221843] RDX: ffff88015a4d2dc8 RSI: ffff88015a4d2dc8 RDI: ffff880077aab000
>>> [263565.229062] RBP: ffff88007b663d90 R08: ffff88007b663d90 R09: 0000000000000574
>>> [263565.236281] R10: ffff88007d1fa000 R11: 0000000000000000 R12: ffff8800682434a0
>>> [263565.243501] R13: ffff88007d1fa000 R14: 0000000000000574 R15: 0000000000000008
>>> [263565.250719] FS:  0000000000000000(0000) GS:ffff88007fa80000(0000) knlGS:0000000000000000
>>> [263565.258894] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [263565.264725] CR2: ffff88015a4d2dd4 CR3: 000000007ad73000 CR4: 00000000000006f0
>>> [263565.271944] Stack:
>>> [263565.274041]  ffff880077aab000 ffff880068243400 ffff88007a745000 ffff8800682434a0
>>> [263565.281582]  0000000000000002 ffffffff81571d09 ffff880068243400 ffff88007fa83d00
>>> [263565.289121]  ffff88007a745000 ffff880077aab000 ffff88007a712000 ffffffff815a8c61
>>> [263565.296661] Call Trace:
>>> [263565.299193]  [263565.301205] [] ? neigh_connected_output+0xa9/0x100
>>> [263565.307740]  [] ? ip_finish_output2+0x221/0x400
>>> [263565.313920]  [] ? nf_iterate+0x54/0x60
>>> [263565.319319]  [] ? ip_output+0x6a/0xf0
>>> [263565.324631]  [] ? nf_iterate+0x12/0x60
>>> [263565.330030]  [] ? ip_fragment.constprop.5+0x80/0x80
>>> [263565.336556]  [] ? ip_forward+0x396/0x480
>>> [263565.342128]  [] ? ip_check_defrag+0x1e0/0x1e0
>>> [263565.348134]  [] ? ip_rcv+0x2ae/0x370
>>> [263565.353361]  [] ? pppoe_rcv_core+0xd2/0x160 [pppoe]
>>> [263565.359888]  [] ? ip_local_deliver_finish+0x1d0/0x1d0
>>> [263565.366586]  [] ? __netif_receive_skb_core+0x527/0xa80
>>> [263565.373373]  [] ? process_backlog+0x92/0x130
>>> [263565.379291]  [] ? net_rx_action+0x24d/0x390
>>> [263565.385124]  [] ? __do_softirq+0xf4/0x2a0
>>> [263565.390784]  [] ? irq_exit+0xbc/0xd0
>>> [263565.396008]  [] ? call_function_single_interrupt+0x96/0xa0
>>> [263565.403141]  [263565.405153] [] ? __sched_text_end+0x2/0x2
>>> [263565.410907]  [] ? native_safe_halt+0x2/0x10
>>> [263565.416741]  [] ? default_idle+0x18/0xd0
>>> [263565.422314]  [] ? cpu_startup_entry+0x126/0x220
>>> [263565.428492]  [] ? start_secondary+0x161/0x180
>>> [263565.434496] Code: 0e 00 00 00 53 89 d3 49 89 cc 4c 89 c5 45 89 ce e8 bb 8a fc ff 66 83 fb 01 48 89 c6 74 44 66 83 fb 04 74 3e 66 c1 c3 08 48 85 ed <66> 89 58 0c 74 40 8b 45 00 4d 85 e4 89 46 06 0f b7 45 04 66 89
>>> [263565.454534] RIP  [] eth_header+0x3b/0xc0
>>> [263565.460124]  RSP
>>> [263565.463696] CR2: ffff88015a4d2dd4
>>> [263565.467104] ---[ end trace a1bcaf3618724adf ]---
>>> [263565.471807] Kernel panic - not syncing: Fatal exception in interrupt
>>> [263565.478245] Kernel Offset: disabled
>>> [263565.481818] Rebooting in 5 seconds..
>>>
>>
>> This is a well known issue, a fix should come shortly in stable branches
> Is Peter or yourself doing the backport? David would only take care of
> the most two recent stable kernels.
>
> Sorry about missing that change as part of the fragmenstack backport to
> 4.9...

I think the backport will be trivial: at least, the patch applies cleanly on 4.9 (only the line offsets differ). I'll test it.

Btw, are there any test conditions that would let me quickly check whether the patch helps? The crash is reproducible, but only at unpredictable intervals (tens of hours under quite heavy load).

>> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
>> index f8bbd693c19c247e41839c2d0b5318ca51b23ee8..d95b32af4a0e3f552405c9e61cc372729834160c 100644
>> --- a/net/ipv4/ip_fragment.c
>> +++ b/net/ipv4/ip_fragment.c
>> @@ -425,6 +425,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>> * fragment.
>> */
>>
>> + err = -EINVAL;
>> /* Find out where to put this fragment. */
>> prev_tail = qp->q.fragments_tail;
>> if (!prev_tail)
>> @@ -501,7 +502,6 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>>
>> discard_qp:
>> inet_frag_kill(&qp->q);
>> - err = -EINVAL;
>> __IP_INC_STATS(net, IPSTATS_MIB_REASM_OVERLAPS);
>> err:
>> kfree_skb(skb);
>>
>>
>>
>
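
For illustration only, here is a minimal userspace sketch of what the quoted patch appears to change. It is not kernel code: `model_frag_queue`, `struct frag_result`, `result_is_consistent` and `model_selfcheck` are made-up names, and the starting value of `err` (0 here) is an assumption, since the quoted hunks do not show it. The sketch only models the two labels visible in the diff, `discard_qp:` and `err:`, and any path that jumps straight to `err:`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result type for the model; not a kernel structure. */
struct frag_result {
    int  err;        /* value handed back to the caller */
    bool skb_freed;  /* true once the model has "freed" the buffer */
};

/*
 * Simplified model of the tail of ip_frag_queue() as seen in the diff.
 *   patched       - selects where the -EINVAL assignment lives
 *   direct_to_err - a path that jumps to err: without visiting discard_qp:
 *   overlap       - a path that goes through discard_qp:
 * ASSUMPTION: err starts out as 0; the quoted hunks do not show this.
 */
static struct frag_result model_frag_queue(bool patched, bool direct_to_err,
                                           bool overlap)
{
    struct frag_result r = { .err = 0, .skb_freed = false };

    if (patched)
        r.err = -EINVAL;    /* new position: before the fragment lookup */

    if (direct_to_err)
        goto err;
    if (overlap)
        goto discard_qp;

    r.err = 0;              /* fragment accepted, buffer stays alive */
    return r;

discard_qp:
    if (!patched)
        r.err = -EINVAL;    /* old position: only set on this path */
err:
    r.skb_freed = true;     /* stands in for kfree_skb(skb) */
    return r;
}

/* A result is consistent when a freed buffer always comes with an error. */
static bool result_is_consistent(struct frag_result r)
{
    return !r.skb_freed || r.err < 0;
}

/* Walks the four interesting cases of the model. */
static void model_selfcheck(void)
{
    /* Old layout: a direct jump to err: frees the buffer but reports 0. */
    assert(!result_is_consistent(model_frag_queue(false, true, false)));
    /* New layout: the same path now reports -EINVAL. */
    assert(result_is_consistent(model_frag_queue(true, true, false)));
    /* The discard_qp path reports an error in both layouts. */
    assert(result_is_consistent(model_frag_queue(false, false, true)));
    assert(result_is_consistent(model_frag_queue(true, false, true)));
}
```

Under that assumption, the old layout lets a caller see a non-error return for a buffer that has already been freed, which would be one plausible way to end up writing into freed memory later on. Whether this is exactly what happens in 4.9.153 is not something the quoted hunks alone can confirm.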