From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754551AbdIGHsE (ORCPT ); Thu, 7 Sep 2017 03:48:04 -0400
Received: from mail-wr0-f176.google.com ([209.85.128.176]:37291 "EHLO mail-wr0-f176.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754344AbdIGHsD (ORCPT ); Thu, 7 Sep 2017 03:48:03 -0400
X-Google-Smtp-Source: ADKCNb5mATxWOrfzG4+WAKlY8FXpv4O2bHRvtn07Tl/zf2TXAkkfCUYUD59MhPZu26zkzJxrqSaW5Q==
Message-ID: <96385FADCFE64551887990B8626A2064@alyakaslap>
From: "Alex Lyakas"
To:
Cc: ,
References: <39905A127A0F47DDA91035865D8C3319@alyakaslap> <20170906150201.GZ15437@linux.vnet.ibm.com>
In-Reply-To: <20170906150201.GZ15437@linux.vnet.ibm.com>
Subject: Re: cpu_needs_another_gp: unable to handle kernel paging request
Date: Thu, 7 Sep 2017 10:47:57 +0300
MIME-Version: 1.0
Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
Importance: Normal
X-Mailer: Microsoft Windows Live Mail 16.4.3528.331
X-MimeOLE: Produced By Microsoft MimeOLE V16.4.3528.331
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello Paul,

Thank you for your response. Can you give us a hint as to what this panic indicates? A random kernel memory corruption? An improper use of an RCU primitive? A hardware issue?

This happened only once, on one of our production systems, and unfortunately we don't have a reproduction scenario.

Thanks,
Alex.

-----Original Message-----
From: Paul E. McKenney
Sent: Wednesday, September 06, 2017 6:02 PM
To: Alex Lyakas
Cc: josh@joshtriplett.org ; linux-kernel@vger.kernel.org
Subject: Re: cpu_needs_another_gp: unable to handle kernel paging request

On Wed, Sep 06, 2017 at 12:53:42PM +0300, Alex Lyakas wrote:
> Hello,
>
> Kernel 3.18.19 hit the following panic[1].
> Can you please advise on
> how to debug this further, or if there is any known issue that you
> recognize.
>
> Thanks,
> Alex.
>
>
> [1]
> Sep 5 01:05:02.092499 vsa-0000000f-vc-0 kernel: [1294776.890064]
> BUG: unable to handle kernel paging request at fffffffffffffeda
> Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.890892]
> IP: [] cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092517 vsa-0000000f-vc-0 kernel: [1294776.891007]
> PGD 1c19067 PUD 1c1b067 PMD 0
> Sep 5 01:05:02.092518 vsa-0000000f-vc-0 kernel: [1294776.891007]
> Oops: 0002 [#1] PREEMPT SMP
> Sep 5 01:05:02.092520 vsa-0000000f-vc-0 kernel: [1294776.891007]
> Modules linked in: xt_nat(E) veth(E) xt_addrtype(E) br_netfilter(E)
> xfrm_user(E) xfrm4_tunnel(E) tunnel4(E) ipcomp(E) xfrm_ipcomp(E)
> esp4(E) ah4(E) 8021q(E) garp(E) mrp(E) xt_multiport(E) sd_mod(E)
> bonding(E) ib_iser(OE) iscsi_tcp(OE) libiscsi_tcp(OE) libiscsi(OE)
> scsi_transport_iscsi(OE) dm_zcache(OE) xfs(OE) btrfs(OE) raid456(OE)
> async_raid6_recov(E) async_memcpy(E) async_pq(E) async_xor(E) xor(E)
> async_tx(E) raid6_pq(E) raid1(OE) md_mod(OE) rdma_ucm(OE)
> ib_uverbs(OE) mlx4_ib(OE) mlx4_en(OE) ipt_MASQUERADE(E)
> nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_nat_ipv4(E) nf_nat(E)
> nf_conntrack_ipv4(E) nf_defrag_ipv4(E) xt_conntrack(E)
> nf_conntrack(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_CHECKSUM(E)
> iptable_mangle(E) xt_tcpudp(E) bridge(E) stp(E) llc(E) vxlan(E)
> ip6_udp_tunnel(E) udp_tunnel(E) ptp(E) pps_core(E)
> ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E)
> x_tables(E) mlx4_core(OE) deflate(E) ctr(E) twofish_generic(E)
> twofish_avx_x86_64(E) twofish_x86_64_3way(E) twofish_x86_64(E)
> twofish_common(E) camellia_generic(E) camellia_aesni_avx2(E)
> camellia_aesni_avx_x86_64(E) camellia_x86_64(E) serpent_avx2(E)
> serpent_avx_x86_64(E) serpent_sse2_x86_64(E) xts(E)
> serpent_generic(E) blowfish_generic(E) blowfish_x86_64(E)
> blowfish_common(E) cast5_avx_x86_64(E) cast5_generic(E)
> cast_common(E) des3_ede_x86_64(E) des_generic(E) cmac(E) xcbc(E)
> rmd160(E) isert_scst(OE) crypto_null(E) rdma_cm(OE) af_key(E)
> iw_cm(OE) xfrm_algo(E) ib_cm(OE) ib_sa(OE) ib_mad(OE) ib_core(OE)
> ib_addr(OE) compat(OE) iscsi_scst(OE) scst_utgt(OE) scst_vdisk(OE)
> libcrc32c(E) scst(OE) nls_iso8859_1(E) kvm_intel(E) kvm(E)
> crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E)
> aesni_intel(E) nfsd(OE) aes_x86_64(E) lrw(E) gf128mul(E)
> glue_helper(E) ablk_helper(E) cryptd(E) auth_rpcgss(E) nfs_acl(E)
> mac_hid(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E)
> dm_multipath(OE) scsi_dh(E) ttm(E) drm_kms_helper(E) serio_raw(E)
> drm(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) i2c_piix4(E)
> i6300esb(E) lp(E) parport(E) dm_iostat(OE) ata_generic(E)
> pata_acpi(E) ata_piix(E) libata(E) psmouse(E) scsi_mod(OE)
> Sep 5 01:05:02.092522 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CPU: 5 PID: 14385 Comm: aws Tainted: G W OE
> 3.18.19-zadara05 #1
> Sep 5 01:05:02.092523 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> Sep 5 01:05:02.092524 vsa-0000000f-vc-0 kernel: [1294776.892666]
> task: ffff880022da6540 ti: ffff88000a9a4000 task.ti:
> ffff88000a9a4000
> Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RIP: 0010:[] []
> cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092525 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RSP: 0000:ffff8808bfca3e88 EFLAGS: 00010097
> Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RAX: 0000000000000000 RBX: ffffffff81c55c40 RCX: fffffffffffffeda
> Sep 5 01:05:02.092526 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RDX: fffffffffffffeda RSI: ffff8808bfcad600 RDI: ffffffff81c55c40
> Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RBP: ffff8808bfca3e88 R08: 00000000000021ac R09: 0000000000000100
> Sep 5 01:05:02.092527 vsa-0000000f-vc-0 kernel: [1294776.892666]
> R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000246
> Sep 5 01:05:02.092529 vsa-0000000f-vc-0 kernel: [1294776.892666]
> R13: 0000000000000009 R14: 0000000000000100 R15: ffff8808bfcad600
> Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> FS: 00007f158f7fe700(0000) GS:ffff8808bfca0000(0000)
> knlGS:0000000000000000
> Sep 5 01:05:02.092531 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Sep 5 01:05:02.092533 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CR2: fffffffffffffeda CR3: 0000000741e12000 CR4: 00000000003407e0
> Sep 5 01:05:02.092554 vsa-0000000f-vc-0 kernel: [1294776.892666]
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 5 01:05:02.092566 vsa-0000000f-vc-0 kernel: [1294776.892666]
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Sep 5 01:05:02.092568 vsa-0000000f-vc-0 kernel: [1294776.892666] Stack:
> Sep 5 01:05:02.092569 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ffff8808bfca3ef8 ffffffff810d491c ffff88088e17d838 ffff88088e17d438
> Sep 5 01:05:02.092571 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ffff880022da6540 ffff88000a9a7fd8 ffff8808bfca3eb8 ffff880799cad868
> Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> 0000000000000004 0000000000000009 ffffffff81c0f0c8 0000000000000009
> Sep 5 01:05:02.092572 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Call Trace:
> Sep 5 01:05:02.092573 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Sep 5 01:05:02.092574 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] rcu_process_callbacks+0xcc/0x610
> Sep 5 01:05:02.092576 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] __do_softirq+0xf5/0x320
> Sep 5 01:05:02.092578 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] irq_exit+0x115/0x120
> Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] smp_apic_timer_interrupt+0x4a/0x60
> Sep 5 01:05:02.092579 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] apic_timer_interrupt+0x6d/0x80
> Sep 5 01:05:02.092580 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Sep 5 01:05:02.092581 vsa-0000000f-vc-0 kernel: [1294776.892666]
> [] ? system_call_fastpath+0x16/0x1b
> Sep 5 01:05:02.092582 vsa-0000000f-vc-0 kernel: [1294776.892666]
> Code: 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 8f 50 11 00 00 31 c0
> 48 8b 97 48 11 00 00 48 89 e5 48 39 d1 74 02 5d c3 48 8b 47 10 83
> 01 83 e0 01 48 83 c0 20 8b 44 87 20 85 c0 75 11 48 83 7e 48
> Sep 5 01:05:02.092585 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RIP [] cpu_needs_another_gp+0x25/0x80
> Sep 5 01:05:02.092586 vsa-0000000f-vc-0 kernel: [1294776.892666]
> RSP
> Sep 5 01:05:02.092587 vsa-0000000f-vc-0 kernel: [1294776.892666]
> CR2: fffffffffffffeda
> Sep 5 01:05:02.092588 vsa-0000000f-vc-0 kernel: [1294776.892666]
> ---[ end trace 9b3c5d4642bb89b5 ]---

New one on me!

If this is reproducible, and if you have some other version where it is not happening, do a bisection. If you have a set of patches that you carry on top of the stable kernel (for example, to support some new hardware), try reproducing on hardware that is supported natively by 3.18.19.

Either way, CONFIG_DEBUG_OBJECTS_RCU_HEAD can be helpful, as can any number of other debugging Kconfig options.

Thanx, Paul
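
As a reading aid for the oops quoted in this thread, the following is a minimal sketch of how the "Oops: 0002" error code and the CR2 value could be decoded. It assumes the conventional x86 page-fault error-code bit layout. The helper names `decode_page_fault_error_code` and `as_signed64` are illustrative only and do not come from the kernel or from the thread. The decoding only restates what the log already says (a kernel-mode write to a non-present page); it does not by itself identify the root cause.

```python
# Sketch: decode the x86 page-fault error code from "Oops: 0002" and
# reinterpret the faulting address (CR2) as a signed 64-bit integer.
# Helper names are hypothetical; bit meanings follow the usual x86 layout.

def decode_page_fault_error_code(code: int) -> dict:
    """Split a page-fault error code into its individual flag bits."""
    return {
        "page_present": bool(code & 0x01),       # 0 = page not present
        "write": bool(code & 0x02),              # 1 = write access, 0 = read
        "user_mode": bool(code & 0x04),          # 0 = fault taken in kernel mode
        "reserved_bit": bool(code & 0x08),       # reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10),  # fault on instruction fetch
    }


def as_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value & (1 << 63) else value


# Values copied from the log above.
oops_flags = decode_page_fault_error_code(0x0002)
cr2_signed = as_signed64(0xFFFFFFFFFFFFFEDA)

print(oops_flags)
print(cr2_signed)
```

Read this way, the fault is a write from kernel mode to an unmapped address whose value, taken as a signed number, is small and negative. The log also shows that same value in RCX and RDX. What that implies about the cause is not established anywhere in the thread.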