[mcgrof:20240131-test-xarray] [test_xarray] 3bc2ee924e: WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall

From: kernel test robot <oliver.sang@intel.com>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>,
	Daniel Gomez <da.gomez@samsung.com>,
	<linux-kernel@vger.kernel.org>, <oliver.sang@intel.com>
Subject: [mcgrof:20240131-test-xarray] [test_xarray] 3bc2ee924e: WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall
Date: Thu, 8 Feb 2024 10:32:01 +0800	[thread overview]
Message-ID: <202402071613.70f28243-lkp@intel.com> (raw)

Hello,

kernel test robot noticed "WARNING:at_kernel/rcu/tree_stall.h:#rcu_check_gp_start_stall" on:

commit: 3bc2ee924e22d16abb6483d266f4c461f665fadf ("test_xarray: add tests for advanced multi-index use")
https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git 20240131-test-xarray

in testcase: boot

compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202402071613.70f28243-lkp@intel.com

[  157.401342][    C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 59s!
[  157.403783][    C0] Showing busy workqueues and worker pools:
[  157.405264][    C0] workqueue events: flags=0x0
[  157.406458][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[  157.406481][    C0]     pending: psi_avgs_work, stop_one_cpu_nowait_workfn
[  157.406529][    C0] workqueue events_power_efficient: flags=0x80
[  157.411631][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[  157.411655][    C0]     pending: neigh_managed_work, neigh_periodic_work, do_cache_clean, check_lifetime
[  157.411741][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  188.121356][    C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 89s!
[  188.123611][    C0] Showing busy workqueues and worker pools:
[  188.124965][    C0] workqueue events: flags=0x0
[  188.126133][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 refcnt=3
[  188.126154][    C0]     pending: psi_avgs_work, stop_one_cpu_nowait_workfn
[  188.126199][    C0] workqueue events_power_efficient: flags=0x80
[  188.131195][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256 refcnt=5
[  188.131212][    C0]     pending: neigh_managed_work, neigh_periodic_work, do_cache_clean, check_lifetime
[  188.131260][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  202.629415][    C0] ------------[ cut here ]------------
[ 202.630948][ C0] WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:990 rcu_check_gp_start_stall+0x17f/0x2c0 
[  202.633356][    C0] Modules linked in:
[  202.633356][    C0] CPU: 0 PID: 1 Comm: swapper Tainted: G        W        N 6.8.0-rc2-00001-g3bc2ee924e22 #1 147c3e57e8173b4ddcb54882b8d3ddba616ae71f
[  202.633356][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 202.633356][ C0] RIP: rcu_check_gp_start_stall+0x17f/0x2c0 
[ 202.633356][ C0] Code: d5 00 00 00 48 8b 15 e0 25 c2 02 48 01 f2 48 39 d0 0f 88 c2 00 00 00 b8 01 00 00 00 87 05 f5 49 60 04 85 c0 0f 85 af 00 00 00 <0f> 0b 48 81 fb 00 65 3d 9a 74 2c 8b 15 04 44 f9 02 85 d2 74 16 8b
All code
========
   0:	d5                   	(bad)
   1:	00 00                	add    %al,(%rax)
   3:	00 48 8b             	add    %cl,-0x75(%rax)
   6:	15 e0 25 c2 02       	adc    $0x2c225e0,%eax
   b:	48 01 f2             	add    %rsi,%rdx
   e:	48 39 d0             	cmp    %rdx,%rax
  11:	0f 88 c2 00 00 00    	js     0xd9
  17:	b8 01 00 00 00       	mov    $0x1,%eax
  1c:	87 05 f5 49 60 04    	xchg   %eax,0x46049f5(%rip)        # 0x4604a17
  22:	85 c0                	test   %eax,%eax
  24:	0f 85 af 00 00 00    	jne    0xd9
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	48 81 fb 00 65 3d 9a 	cmp    $0xffffffff9a3d6500,%rbx
  33:	74 2c                	je     0x61
  35:	8b 15 04 44 f9 02    	mov    0x2f94404(%rip),%edx        # 0x2f9443f
  3b:	85 d2                	test   %edx,%edx
  3d:	74 16                	je     0x55
  3f:	8b                   	.byte 0x8b

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	48 81 fb 00 65 3d 9a 	cmp    $0xffffffff9a3d6500,%rbx
   9:	74 2c                	je     0x37
   b:	8b 15 04 44 f9 02    	mov    0x2f94404(%rip),%edx        # 0x2f94415
  11:	85 d2                	test   %edx,%edx
  13:	74 16                	je     0x2b
  15:	8b                   	.byte 0x8b
[  202.633356][    C0] RSP: 0000:ffffaaae80003f48 EFLAGS: 00010046
[  202.633356][    C0] RAX: 0000000000000000 RBX: ffffffff9a3d6500 RCX: 0000000000003b10
[  202.633356][    C0] RDX: 00000000ffff9d65 RSI: 000000000000668a RDI: 0000000000000000
[  202.633356][    C0] RBP: 0000000000000246 R08: 0000000000000000 R09: 0000000000000000
[  202.633356][    C0] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000009
[  202.633356][    C0] R13: 0000000000000100 R14: ffffffff9b1c4368 R15: 0000000000000009
[  202.633356][    C0] FS:  0000000000000000(0000) GS:ffffffff99e67000(0000) knlGS:0000000000000000
[  202.633356][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  202.633356][    C0] CR2: ffff96127ffff000 CR3: 00000002c7257000 CR4: 00000000000406b0
[  202.633356][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  202.633356][    C0] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  202.633356][    C0] Call Trace:
[  202.633356][    C0]  <IRQ>
[ 202.633356][ C0] ? show_regs (arch/x86/kernel/dumpstack.c:479) 
[ 202.633356][ C0] ? __warn (kernel/panic.c:677) 
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x17f/0x2c0 
[ 202.633356][ C0] ? report_bug (lib/bug.c:180 lib/bug.c:219) 
[ 202.633356][ C0] ? handle_bug (arch/x86/kernel/traps.c:238) 
[ 202.633356][ C0] ? exc_invalid_op (arch/x86/kernel/traps.c:259 (discriminator 1)) 
[ 202.633356][ C0] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) 
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x17f/0x2c0 
[ 202.633356][ C0] ? rcu_check_gp_start_stall+0x8b/0x2c0 
[ 202.633356][ C0] rcu_core (kernel/rcu/tree.c:2463) 
[ 202.633356][ C0] __do_softirq (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/irq.h:142 kernel/softirq.c:554) 
[ 202.633356][ C0] irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) 
[ 202.633356][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) 
[  202.633356][    C0]  </IRQ>
[  202.633356][    C0]  <TASK>
[ 202.633356][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:649) 
[ 202.633356][ C0] RIP: 0010:lock_is_held_type (kernel/locking/lockdep.c:5830) 
[ 202.633356][ C0] Code: 0f 00 00 b8 ff ff ff ff 0f c1 05 fe 65 22 01 83 f8 01 75 31 9c 58 f6 c4 02 75 44 48 f7 04 24 00 02 00 00 74 01 fb 48 83 c4 08 <44> 89 e0 5b 5d 41 5c 41 5d 41 5e 41 5f 31 d2 31 c9 31 f6 31 ff c3
All code
========
   0:	0f 00 00             	sldt   (%rax)
   3:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
   8:	0f c1 05 fe 65 22 01 	xadd   %eax,0x12265fe(%rip)        # 0x122660d
   f:	83 f8 01             	cmp    $0x1,%eax
  12:	75 31                	jne    0x45
  14:	9c                   	pushf
  15:	58                   	pop    %rax
  16:	f6 c4 02             	test   $0x2,%ah
  19:	75 44                	jne    0x5f
  1b:	48 f7 04 24 00 02 00 	testq  $0x200,(%rsp)
  22:	00 
  23:	74 01                	je     0x26
  25:	fb                   	sti
  26:	48 83 c4 08          	add    $0x8,%rsp
  2a:*	44 89 e0             	mov    %r12d,%eax		<-- trapping instruction
  2d:	5b                   	pop    %rbx
  2e:	5d                   	pop    %rbp
  2f:	41 5c                	pop    %r12
  31:	41 5d                	pop    %r13
  33:	41 5e                	pop    %r14
  35:	41 5f                	pop    %r15
  37:	31 d2                	xor    %edx,%edx
  39:	31 c9                	xor    %ecx,%ecx
  3b:	31 f6                	xor    %esi,%esi
  3d:	31 ff                	xor    %edi,%edi
  3f:	c3                   	ret

Code starting with the faulting instruction
===========================================
   0:	44 89 e0             	mov    %r12d,%eax
   3:	5b                   	pop    %rbx
   4:	5d                   	pop    %rbp
   5:	41 5c                	pop    %r12
   7:	41 5d                	pop    %r13
   9:	41 5e                	pop    %r14
   b:	41 5f                	pop    %r15
   d:	31 d2                	xor    %edx,%edx
   f:	31 c9                	xor    %ecx,%ecx
  11:	31 f6                	xor    %esi,%esi
  13:	31 ff                	xor    %edi,%edi
  15:	c3                   	ret
[  202.633356][    C0] RSP: 0000:ffffaaae80013cd8 EFLAGS: 00000286
[  202.633356][    C0] RAX: 0000000000000046 RBX: ffffffff9a3d4080 RCX: 0000000000000000
[  202.633356][    C0] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff99b17b04
[  202.633356][    C0] RBP: ffff960f44e56000 R08: 0000000000000000 R09: 0000000000000000
[  202.633356][    C0] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[  202.633356][    C0] R13: 00000000ffffffff R14: 0000000000000000 R15: ffff960f44e566f8
[ 202.633356][ C0] xas_descend (include/linux/xarray.h:1216 lib/xarray.c:206) 
[ 202.633356][ C0] xas_load (lib/xarray.c:245) 
[ 202.633356][ C0] test_get_entry+0x8f/0x1c0 
[ 202.633356][ C0] check_xa_multi_store_adv+0x41e/0x600 
[ 202.633356][ C0] check_multi_store_advanced+0x36/0x80 
[ 202.633356][ C0] xarray_checks (lib/test_xarray.c:1971) 
[ 202.633356][ C0] ? check_xas_retry+0xb00/0xb00 
[ 202.633356][ C0] do_one_initcall (init/main.c:1236) 
[ 202.633356][ C0] do_initcalls (init/main.c:1297 init/main.c:1314) 
[ 202.633356][ C0] kernel_init_freeable (init/main.c:1553) 
[ 202.633356][ C0] ? rest_init (init/main.c:1433) 
[ 202.633356][ C0] kernel_init (init/main.c:1443) 
[ 202.633356][ C0] ? _raw_spin_unlock_irq (arch/x86/include/asm/preempt.h:103 include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:202) 
[ 202.633356][ C0] ret_from_fork (arch/x86/kernel/process.c:153) 
[ 202.633356][ C0] ? rest_init (init/main.c:1433) 
[ 202.633356][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:250) 
[  202.633356][    C0]  </TASK>
[  202.633356][    C0] irq event stamp: 7770391
[ 202.633356][ C0] hardirqs last enabled at (7770390): rcu_core (arch/x86/include/asm/irqflags.h:26 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 kernel/rcu/tree_nocb.h:1738 kernel/rcu/tree.c:2457) 
[ 202.633356][ C0] hardirqs last disabled at (7770391): _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:108 kernel/locking/spinlock.c:162) 
[ 202.633356][ C0] softirqs last enabled at (7770364): __do_softirq (arch/x86/include/asm/preempt.h:26 kernel/softirq.c:400 kernel/softirq.c:582) 
[ 202.633356][ C0] softirqs last disabled at (7770379): irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) 
[  202.633356][    C0] ---[ end trace 0000000000000000 ]---
[  202.738070][    C0] rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 ->rt_priority 0 delta ->gp_start 26476 ->gp_activity 26475 ->gp_req_activity 26277 ->gp_wake_time 26277 ->gp_wake_seq 15120 ->gp_seq 15120 ->gp_seq_needed 15124 ->gp_max 2505 ->gp_flags 0x1
[  202.743977][    C0] rcu: 	rcu_node 0:0 ->gp_seq 15120 ->gp_seq_needed 15124 ->qsmask 0x0 .... ->n_boosts 0
[  202.746434][    C0] rcu: RCU callbacks invoked since boot: 121850
[  202.747976][    C0] rcu_tasks: RTGS_WAIT_CBS(11) since 46815 g:16 i:0/0 k.u. l:63
[  202.749977][    C0] rcu_tasks_rude: RTGS_WAIT_CBS(11) since 46816 g:16 i:0/4 k.u. l:63

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240207/202402071613.70f28243-lkp@intel.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki