linux-kernel.vger.kernel.org archive mirror
From: Fengguang Wu <fengguang.wu@intel.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Stephane Eranian <eranian@google.com>,
	Kan Liang <kan.liang@intel.com>
Subject: Linux 4.14-rc6: WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
Date: Mon, 30 Oct 2017 07:27:36 +0100
Message-ID: <20171030062736.tv4el5mkwg6tkeup@wfg-t540p.sh.intel.com>
In-Reply-To: <20171029225155.qcum5i75awrt5tzm@wfg-t540p.sh.intel.com>

CC'ing the perf maintainers.

On Sun, Oct 29, 2017 at 11:51:55PM +0100, Fengguang Wu wrote:
>Hi Linus,
>
>So far we have seen the boot errors/warnings below when testing v4.14-rc6.
>
>They reached the RC release mainly because of various imperfections in
>0day's auto-bisection, so I am listing them manually here and will CC the
>ones that look easy to debug to the corresponding maintainers in follow-up emails.
>
>boot_successes: 4700
>boot_failures: 247
>
...
>WARNING:at_arch/x86/events/intel/core.c:#intel_pmu_handle_irq: 1

This happens rarely, so it is hard to bisect:

[  189.480568] perf: interrupt took too long (5132 > 4982), lowering kernel.perf_event_max_sample_rate to 38000
[  189.690660] perf: interrupt took too long (6582 > 6415), lowering kernel.perf_event_max_sample_rate to 30000
[  189.901706] perf: interrupt took too long (8268 > 8227), lowering kernel.perf_event_max_sample_rate to 24000
[  272.841032] perfevents: irq loop stuck!
[  272.841038] ------------[ cut here ]------------
[  272.841046] WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
[  272.841047] Modules linked in: xfs loop rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver btrfs xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod sg intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 kvm_intel ttm kvm irqbypass crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea crc32c_intel sysfillrect sysimgblt snd_pcm ghash_clmulni_intel fb_sys_fops ahci pcbc libahci snd_timer nvme aesni_intel snd crypto_simd ipmi_si glue_helper mxm_wmi soundcore drm nvme_core cryptd ipmi_devintf pcspkr libata shpchp ipmi_msghandler wmi acpi_power_meter acpi_pad ip_tables
[  272.841083] CPU: 9 PID: 5377 Comm: usemem Not tainted 4.14.0-rc6 #1
[  272.841084] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRRFSDP1.86B.0271.R00.1510301446 10/30/2015
[  272.841085] task: ffff881011adcd00 task.stack: ffffc90008fac000
[  272.841087] RIP: 0010:intel_pmu_handle_irq+0x4a8/0x4c0
[  272.841089] RSP: 0000:ffff88103f445c00 EFLAGS: 00010086
[  272.841090] RAX: 000000000000001b RBX: 0000000000000064 RCX: 0000000000000000
[  272.841091] RDX: ffff88103f456180 RSI: ffff88103f44e018 RDI: ffff88103f44e018
[  272.841092] RBP: ffff88103f445df0 R08: 0000000000000000 R09: 000000000000001b
[  272.841093] R10: ffff88103f445c00 R11: 00000000000da200 R12: ffff88103f44a3a0
[  272.841094] R13: ffff88103a27e000 R14: 0000000000000040 R15: ffff88103f44a5a0
[  272.841095] FS:  00007f8bed879700(0000) GS:ffff88103f440000(0000) knlGS:0000000000000000
[  272.841096] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  272.841097] CR2: 00007401a7800000 CR3: 0000001015908006 CR4: 00000000003606e0
[  272.841098] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  272.841099] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  272.841100] Call Trace:
[  272.841101]  <NMI>
[  272.841111]  perf_event_nmi_handler+0x2c/0x50
[  272.841116]  ? sched_clock+0x9/0x10
[  272.841118]  ? sched_clock+0x9/0x10
[  272.841120]  ? perf_event_nmi_handler+0x2c/0x50
[  272.841130]  nmi_handle+0x71/0x130
[  272.841132]  default_do_nmi+0x53/0x110
[  272.841133]  do_nmi+0xec/0x140
[  272.841138]  end_repeat_nmi+0x1a/0x1e
[  272.841141] RIP: 0010:native_write_msr+0x6/0x30
[  272.841142] RSP: 0000:ffff88103f443e60 EFLAGS: 00000046
[  272.841143] RAX: 00000000000000b0 RBX: ffff88103a27e000 RCX: 000000000000038d
[  272.841144] RDX: 0000000000000000 RSI: 00000000000000b0 RDI: 000000000000038d
[  272.841145] RBP: ffff88103f443e80 R08: 0000000000000007 R09: 0000000000000000
[  272.841146] R10: ffff88103f443df0 R11: 00007f8bed878c20 R12: 000000000000000b
[  272.841147] R13: 0000000000000004 R14: 0000000000000000 R15: ffff88103f4548e8
[  272.841150]  ? native_write_msr+0x6/0x30
[  272.841152]  ? native_write_msr+0x6/0x30
[  272.841152]  </NMI>
[  272.841153]  <IRQ>
[  272.841155]  ? intel_pmu_enable_event+0x19c/0x1d0
[  272.841157]  x86_pmu_start+0x7a/0xa0
[  272.841159]  x86_pmu_enable+0x272/0x2e0
[  272.841166]  ? __perf_install_in_context+0x160/0x160
[  272.841168]  perf_pmu_enable+0x7/0x10
[  272.841170]  perf_mux_hrtimer_handler+0x1bc/0x1f0
[  272.841175]  __hrtimer_run_queues+0xdd/0x230
[  272.841177]  hrtimer_interrupt+0xa3/0x1f0
[  272.841179]  smp_apic_timer_interrupt+0x5f/0x140
[  272.841181]  apic_timer_interrupt+0x9d/0xb0
[  272.841182]  </IRQ>
[  272.841184] RIP: 0033:0x5631f31497fd
[  272.841185] RSP: 002b:00007f8bed878c38 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
[  272.841186] RAX: 0000000000000000 RBX: 000000081aead944 RCX: 000000003a329d08
[  272.841187] RDX: 0000000000000001 RSI: 000000081aead944 RDI: 000073c0d038b000
[  272.841188] RBP: 000000081aead944 R08: 00007f8bed878cdc R09: 0000000000000001
[  272.841189] R10: 00007f8bed878c20 R11: 00007f8bed878c20 R12: 0000000000000001
[  272.841190] R13: 00007f8bed878cdc R14: 000073c0d038b000 R15: 00000040d756ca20
[  272.841192] Code: ff ff 48 89 c2 e8 b9 81 05 00 66 90 48 8b bd 40 fe ff ff 57 9d 0f 1f 44 00 00 e9 b2 fd ff ff 48 c7 c7 22 db c8 81 e8 89 44 0d 00 <0f> ff e8 f1 a7 ff ff c6 05 8a 16 2e 01 01 e9 50 fe ff ff 0f 1f 
[  272.841217] ---[ end trace 53c053df5268aee8 ]---
[  272.841218] 
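
For anyone who wants to check their own boot logs for the same pattern, here is a minimal sketch that pulls the perf throttling messages and the WARNING site out of dmesg-style text. The regular expressions are derived only from the lines quoted above, and the names `summarize` and `SAMPLE` are made up for illustration. This is not part of 0day, and other kernels may format these lines differently.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this mail.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")
THROTTLE_RE = re.compile(
    r"perf: interrupt took too long \((\d+) > (\d+)\), "
    r"lowering kernel\.perf_event_max_sample_rate to (\d+)"
)
WARNING_RE = re.compile(r"WARNING: CPU: (\d+) PID: (\d+) at (\S+):(\d+) (\S+)")


def summarize(log_text):
    """Collect perf throttle events and WARNING sites from dmesg-style text."""
    throttles, warnings = [], []
    for line in log_text.splitlines():
        m = TIMESTAMP_RE.match(line)
        if not m:
            continue  # skip lines without a "[ seconds.usecs]" prefix
        ts, msg = float(m.group(1)), m.group(2)
        t = THROTTLE_RE.search(msg)
        if t:
            # (timestamp, measured value, threshold, new sample rate)
            throttles.append((ts, int(t.group(1)), int(t.group(2)), int(t.group(3))))
            continue
        w = WARNING_RE.search(msg)
        if w:
            warnings.append({
                "time": ts,
                "cpu": int(w.group(1)),
                "pid": int(w.group(2)),
                "file": w.group(3),
                "line": int(w.group(4)),
                "symbol": w.group(5),
            })
    return throttles, warnings


# Sample input copied from the log above (shortened to four lines).
SAMPLE = """
[  189.480568] perf: interrupt took too long (5132 > 4982), lowering kernel.perf_event_max_sample_rate to 38000
[  189.690660] perf: interrupt took too long (6582 > 6415), lowering kernel.perf_event_max_sample_rate to 30000
[  272.841032] perfevents: irq loop stuck!
[  272.841046] WARNING: CPU: 9 PID: 5377 at arch/x86/events/intel/core.c:2228 intel_pmu_handle_irq+0x4a8/0x4c0
"""

throttles, warnings = summarize(SAMPLE)
for w in warnings:
    print(w["file"], w["line"], w["symbol"])
```

Running it over a directory of saved dmesg files and counting hits per `file:line` would give a rough idea of how often a given warning fires.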

Thanks,
Fengguang
