From: kernel test robot <oliver.sang@intel.com>
To: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Suleiman Souhlal <suleiman@google.com>,
	Steven Rostedt <rostedt@goodmis.org>, Hsin Yi <hsinyi@google.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Joel Fernandes <joel@joelfernandes.org>,
	<linux-kernel@vger.kernel.org>, <aubrey.li@linux.intel.com>,
	<yu.c.chen@intel.com>, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	"Daniel Bristot de Oliveira" <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	Vineeth Pillai <vineethrp@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB
Date: Thu, 19 Oct 2023 22:56:08 +0800
Message-ID: <202310192232.750e5c5b-oliver.sang@intel.com>
In-Reply-To: <20231005161727.1855004-1-joel@joelfernandes.org>



Hello,

kernel test robot noticed "WARNING:at_kernel/sched/core.c:#nohz_csd_func" on:

commit: 7b0c45f5095f8868fb14cc4e1745befdf58d173c ("[PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB")
url: https://github.com/intel-lab-lkp/linux/commits/Joel-Fernandes-Google/sched-fair-Avoid-unnecessary-IPIs-for-ILB/20231006-003907
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 3006adf3be79cde4d14b1800b963b82b6e5572e0
patch link: https://lore.kernel.org/all/20231005161727.1855004-1-joel@joelfernandes.org/
patch subject: [PATCH RFC] sched/fair: Avoid unnecessary IPIs for ILB

in testcase: blktests
version: blktests-x86_64-3f75e62-1_20231017
with the following parameters:

	disk: 1SSD
	test: nvme-group-00
	nvme_trtype: rdma



compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


+----------------+------------+------------+
|                | 3006adf3be | 7b0c45f509 |
+----------------+------------+------------+
| boot_successes | 0          | 3          |
+----------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310192232.750e5c5b-oliver.sang@intel.com


[   55.309389][    C1] ------------[ cut here ]------------
[ 55.315508][ C1] WARNING: CPU: 1 PID: 0 at kernel/sched/core.c:1182 nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[   55.325508][    C1] Modules linked in: intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp btrfs blake2b_generic kvm_intel xor kvm raid6_pq zstd_compress irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel libcrc32c sha512_ssse3 crc32c_intel ipmi_ssif rapl nvme intel_cstate nvme_core mei_me ast t10_pi dax_hmem drm_shmem_helper crc64_rocksoft_generic idxd crc64_rocksoft mei drm_kms_helper wmi idxd_bus joydev i2c_ismt crc64 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad drm fuse ip_tables
[   55.380240][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.0-rc4-00038-g7b0c45f5095f #1
[ 55.390037][ C1] RIP: 0010:nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[ 55.396018][ C1] Code: 84 c0 74 06 0f 8e d3 00 00 00 45 88 b4 24 28 0a 00 00 48 83 c4 08 bf 07 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d e9 22 b0 f6 ff <0f> 0b e9 1b fe ff ff e8 76 6e 72 00 e9 66 fd ff ff e8 cc 6e 72 00
All code
========
   0:	84 c0                	test   %al,%al
   2:	74 06                	je     0xa
   4:	0f 8e d3 00 00 00    	jle    0xdd
   a:	45 88 b4 24 28 0a 00 	mov    %r14b,0xa28(%r12)
  11:	00 
  12:	48 83 c4 08          	add    $0x8,%rsp
  16:	bf 07 00 00 00       	mov    $0x7,%edi
  1b:	5b                   	pop    %rbx
  1c:	41 5c                	pop    %r12
  1e:	41 5d                	pop    %r13
  20:	41 5e                	pop    %r14
  22:	41 5f                	pop    %r15
  24:	5d                   	pop    %rbp
  25:	e9 22 b0 f6 ff       	jmpq   0xfffffffffff6b04c
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e9 1b fe ff ff       	jmpq   0xfffffffffffffe4c
  31:	e8 76 6e 72 00       	callq  0x726eac
  36:	e9 66 fd ff ff       	jmpq   0xfffffffffffffda1
  3b:	e8 cc 6e 72 00       	callq  0x726f0c

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e9 1b fe ff ff       	jmpq   0xfffffffffffffe22
   7:	e8 76 6e 72 00       	callq  0x726e82
   c:	e9 66 fd ff ff       	jmpq   0xfffffffffffffd77
  11:	e8 cc 6e 72 00       	callq  0x726ee2
[   55.418037][    C1] RSP: 0018:ffa00000001f8f58 EFLAGS: 00010046
[   55.424802][    C1] RAX: 0000000000000000 RBX: 000000000003a100 RCX: ffffffff8444c928
[   55.433718][    C1] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ff110017fc8ba164
[   55.442631][    C1] RBP: ffa00000001f8f88 R08: 0000000000000001 R09: ffe21c02ff91742c
[   55.451542][    C1] R10: ff110017fc8ba167 R11: ffa00000001f8ff8 R12: ff110017fc8ba100
[   55.460461][    C1] R13: ff110017fc8ba164 R14: 0000000000000000 R15: 0000000000000001
[   55.470959][    C1] FS:  0000000000000000(0000) GS:ff110017fc880000(0000) knlGS:0000000000000000
[   55.482348][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   55.491067][    C1] CR2: 00007fabd7bff699 CR3: 000000407de46006 CR4: 0000000000f71ee0
[   55.501337][    C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   55.511601][    C1] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   55.521859][    C1] PKRU: 55555554
[   55.527131][    C1] Call Trace:
[   55.532072][    C1]  <IRQ>
[ 55.536527][ C1] ? __warn (kernel/panic.c:673) 
[ 55.542341][ C1] ? nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[ 55.548935][ C1] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
[ 55.555241][ C1] ? handle_bug (arch/x86/kernel/traps.c:237) 
[ 55.561323][ C1] ? exc_invalid_op (arch/x86/kernel/traps.c:258 (discriminator 1)) 
[ 55.567792][ C1] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) 
[ 55.574671][ C1] ? nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[ 55.581230][ C1] ? nohz_csd_func (arch/x86/include/asm/atomic.h:23 arch/x86/include/asm/atomic.h:135 include/linux/atomic/atomic-arch-fallback.h:1433 include/linux/atomic/atomic-arch-fallback.h:1565 include/linux/atomic/atomic-instrumented.h:862 kernel/sched/core.c:1181) 
[ 55.587667][ C1] ? task_mm_cid_work (kernel/sched/core.c:1173) 
[ 55.594511][ C1] __flush_smp_call_function_queue (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/csd.h:64 kernel/smp.c:134 kernel/smp.c:531) 
[ 55.602619][ C1] __sysvec_call_function_single (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:99 arch/x86/kernel/smp.c:293) 
[ 55.610431][ C1] sysvec_call_function_single (arch/x86/kernel/smp.c:287 (discriminator 14)) 
[   55.617918][    C1]  </IRQ>
[   55.622373][    C1]  <TASK>
[   55.624388][    C2] ------------[ cut here ]------------
[ 55.625607][ C1] asm_sysvec_call_function_single (arch/x86/include/asm/idtentry.h:652) 
[ 55.631669][ C2] WARNING: CPU: 2 PID: 0 at kernel/sched/core.c:1182 nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[ 55.638279][ C1] RIP: _nohz_idle_balance+0xd9/0x7f0 
[   55.648220][    C2] Modules linked in:
[ 55.655250][ C1] Code: 48 74 0a c7 05 c0 0f ce 04 00 00 00 00 8b 44 24 2c 83 e0 08 89 44 24 14 74 0a c7 05 ad 0f ce 04 00 00 00 00 f0 83 44 24 fc 00 <49> c7 c5 10 c4 3f 85 41 83 c4 01 48 b8 00 00 00 00 00 fc ff df 4c
All code
========
   0:	48 74 0a             	rex.W je 0xd
   3:	c7 05 c0 0f ce 04 00 	movl   $0x0,0x4ce0fc0(%rip)        # 0x4ce0fcd
   a:	00 00 00 
   d:	8b 44 24 2c          	mov    0x2c(%rsp),%eax
  11:	83 e0 08             	and    $0x8,%eax
  14:	89 44 24 14          	mov    %eax,0x14(%rsp)
  18:	74 0a                	je     0x24
  1a:	c7 05 ad 0f ce 04 00 	movl   $0x0,0x4ce0fad(%rip)        # 0x4ce0fd1
  21:	00 00 00 
  24:	f0 83 44 24 fc 00    	lock addl $0x0,-0x4(%rsp)
  2a:*	49 c7 c5 10 c4 3f 85 	mov    $0xffffffff853fc410,%r13		<-- trapping instruction
  31:	41 83 c4 01          	add    $0x1,%r12d
  35:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  3c:	fc ff df 
  3f:	4c                   	rex.WR

Code starting with the faulting instruction
===========================================
   0:	49 c7 c5 10 c4 3f 85 	mov    $0xffffffff853fc410,%r13
   7:	41 83 c4 01          	add    $0x1,%r12d
   b:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  12:	fc ff df 
  15:	4c                   	rex.WR
[   55.656534][    C2]  intel_rapl_msr
[   55.660852][    C1] RSP: 0018:ffa000000865fdb0 EFLAGS: 00000246
[   55.682783][    C2]  intel_rapl_common
[   55.686779][    C1] RAX: 0000000000000008 RBX: 0000000000000001 RCX: ffffffff812b76c7
[   55.693493][    C2]  x86_pkg_temp_thermal
[   55.697774][    C1] RDX: dffffc0000000000 RSI: 0000000000000008 RDI: 0000000000000001
[   55.706653][    C2]  intel_powerclamp
[   55.711232][    C1] RBP: ffa000000865fe90 R08: 0000000000000001 R09: ffe21c02ff91742c
[   55.720120][    C2]  coretemp btrfs
[   55.724298][    C1] R10: ff110017fc8ba167 R11: 0000000000000014 R12: 0000000000000001
[   55.733156][    C2]  blake2b_generic
[   55.737157][    C1] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   55.746019][    C2]  kvm_intel xor
[ 55.750115][ C1] ? nohz_run_idle_balance (arch/x86/include/asm/atomic.h:23 arch/x86/include/asm/atomic.h:135 include/linux/atomic/atomic-arch-fallback.h:1433 include/linux/atomic/atomic-arch-fallback.h:1565 include/linux/atomic/atomic-instrumented.h:862 kernel/sched/fair.c:11954) 
[   55.758991][    C2]  kvm
[ 55.762902][ C1] ? clockevents_program_event (kernel/time/clockevents.c:336 (discriminator 3)) 
[   55.768839][    C2]  raid6_pq zstd_compress
[ 55.771772][ C1] ? rebalance_domains (kernel/sched/fair.c:11826) 
[   55.778197][    C2]  irqbypass
[ 55.782972][ C1] ? __flush_smp_call_function_queue (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/csd.h:64 kernel/smp.c:134 kernel/smp.c:531) 
[   55.788612][    C2]  crct10dif_pclmul crc32_pclmul
[ 55.792132][ C1] do_idle (arch/x86/include/asm/current.h:41 include/linux/sched/idle.h:31 kernel/sched/idle.c:255) 
[   55.799153][    C2]  ghash_clmulni_intel
[ 55.804630][ C1] cpu_startup_entry (kernel/sched/idle.c:379 (discriminator 1)) 
[   55.809028][    C2]  libcrc32c sha512_ssse3
[ 55.813499][ C1] start_secondary (arch/x86/kernel/smpboot.c:210 arch/x86/kernel/smpboot.c:294) 
[   55.818765][    C2]  crc32c_intel
[ 55.823547][ C1] ? set_cpu_sibling_map (arch/x86/kernel/smpboot.c:240) 
[   55.828795][    C2]  ipmi_ssif
[ 55.832605][ C1] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:433) 
[   55.838652][    C2]  rapl
[   55.842150][    C1]  </TASK>
[   55.848889][    C2]  nvme
[   55.851904][    C1] ---[ end trace 0000000000000000 ]---
[   55.855226][    C2]  intel_cstate
[   55.856376][    T1] systemd[1]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set
[   55.929716][    C2]  nvme_core mei_me ast t10_pi dax_hmem drm_shmem_helper crc64_rocksoft_generic idxd crc64_rocksoft mei drm_kms_helper wmi idxd_bus joydev i2c_ismt crc64 acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad drm fuse ip_tables
[   55.958456][    C2] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G        W          6.6.0-rc4-00038-g7b0c45f5095f #1
[ 55.971146][ C2] RIP: 0010:nohz_csd_func (kernel/sched/core.c:1182 (discriminator 1)) 
[ 55.978359][ C2] Code: 84 c0 74 06 0f 8e d3 00 00 00 45 88 b4 24 28 0a 00 00 48 83 c4 08 bf 07 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d e9 22 b0 f6 ff <0f> 0b e9 1b fe ff ff e8 76 6e 72 00 e9 66 fd ff ff e8 cc 6e 72 00
All code
========
   0:	84 c0                	test   %al,%al
   2:	74 06                	je     0xa
   4:	0f 8e d3 00 00 00    	jle    0xdd
   a:	45 88 b4 24 28 0a 00 	mov    %r14b,0xa28(%r12)
  11:	00 
  12:	48 83 c4 08          	add    $0x8,%rsp
  16:	bf 07 00 00 00       	mov    $0x7,%edi
  1b:	5b                   	pop    %rbx
  1c:	41 5c                	pop    %r12
  1e:	41 5d                	pop    %r13
  20:	41 5e                	pop    %r14
  22:	41 5f                	pop    %r15
  24:	5d                   	pop    %rbp
  25:	e9 22 b0 f6 ff       	jmpq   0xfffffffffff6b04c
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e9 1b fe ff ff       	jmpq   0xfffffffffffffe4c
  31:	e8 76 6e 72 00       	callq  0x726eac
  36:	e9 66 fd ff ff       	jmpq   0xfffffffffffffda1
  3b:	e8 cc 6e 72 00       	callq  0x726f0c

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e9 1b fe ff ff       	jmpq   0xfffffffffffffe22
   7:	e8 76 6e 72 00       	callq  0x726e82
   c:	e9 66 fd ff ff       	jmpq   0xfffffffffffffd77
  11:	e8 cc 6e 72 00       	callq  0x726ee2
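
The two warnings above (tagged C1 and C2) are interleaved line by line, which
makes each backtrace hard to follow. As a reading aid only, here is a minimal
sketch that groups such lines by their bracketed context tag. The names
`PREFIX` and `split_by_context` are hypothetical and not part of any lkp or
kernel tooling, and the regular expression assumes the
"[ timestamp][ tag]" prefix seen in this log.

```python
import re
from collections import defaultdict

# Assumed prefix shape, taken from the log above, e.g.
# "[   55.624388][    C2] ..." or "[ 55.625607][ C1] ...".
PREFIX = re.compile(r"^\[\s*(\d+\.\d+)\]\[\s*([A-Z]\d+)\]\s*(.*)$")


def split_by_context(lines):
    """Group log lines by context tag (C1, C2, T1, ...), preserving order.

    Returns a dict mapping each tag to a list of (timestamp, message) tuples.
    Lines without the prefix (such as the decoded disassembly) are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        match = PREFIX.match(line)
        if match is None:
            continue
        timestamp, context, message = match.groups()
        groups[context].append((float(timestamp), message.rstrip()))
    return dict(groups)
```

Feeding the saved dmesg through `split_by_context` and printing `groups["C1"]`
followed by `groups["C2"]` should yield the two backtraces one after the other
instead of interleaved.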


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231019/202310192232.750e5c5b-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


