All of lore.kernel.org
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+03464269af631f4a4bdf@syzkaller.appspotmail.com>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	syzkaller-bugs@googlegroups.com
Subject: Re: [syzbot] INFO: rcu detected stall in ext4_file_write_iter (4)
Date: Mon, 27 Dec 2021 22:14:23 -0500	[thread overview]
Message-ID: <6396f046-b292-1a73-8339-d32b611d9b7f@redhat.com> (raw)
In-Reply-To: <20211225005253.1962-1-hdanton@sina.com>


On 12/24/21 19:52, Hillf Danton wrote:
> On Wed, 22 Dec 2021 09:33:26 -0800
>> syzbot found the following issue on:
>>
>> HEAD commit:    9eaa88c7036e Merge tag 'libata-5.16-rc6' of git://git.kern..
>> git tree:       upstream
>> console output: https://syzkaller.appspot.com/x/log.txt?x=142d4143b00000
>> kernel config:  https://syzkaller.appspot.com/x/.config?x=e3bdfd29b408d1b6
>> dashboard link: https://syzkaller.appspot.com/bug?extid=03464269af631f4a4bdf
>> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11b1537db00000
>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=125de549b00000
>>
>> IMPORTANT: if you fix the issue, please add the following tag to the commit:
>> Reported-by: syzbot+03464269af631f4a4bdf@syzkaller.appspotmail.com
>>
>> rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
>> 	(detected by 0, t=10502 jiffies, g=5509, q=170)
>> rcu: All QSes seen, last rcu_preempt kthread activity 8642 (4295000735-4294992093), jiffies_till_next_fqs=1, root ->qsmask 0x0
>> rcu: rcu_preempt kthread starved for 8643 jiffies! g5509 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
>> rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
>> rcu: RCU grace-period kthread stack dump:
>> task:rcu_preempt     state:R  running task     stack:28208 pid:   14 ppid:     2 flags:0x00004000
>> Call Trace:
>>   <TASK>
>>   context_switch kernel/sched/core.c:4972 [inline]
>>   context_switch kernel/sched/core.c:4972 [inline] kernel/sched/core.c:6253
>>   __schedule+0xa9a/0x4940 kernel/sched/core.c:6253 kernel/sched/core.c:6253
>>   schedule+0xd2/0x260 kernel/sched/core.c:6326 kernel/sched/core.c:6326
>>   schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 kernel/time/timer.c:1881
>>   rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955 kernel/rcu/tree.c:1955
>>   rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128 kernel/rcu/tree.c:2128
>>   kthread+0x405/0x4f0 kernel/kthread.c:327 kernel/kthread.c:327
>>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 arch/x86/entry/entry_64.S:295
>>   </TASK>
>> rcu: Stack dump where RCU GP kthread last ran:
>> NMI backtrace for cpu 0
>> CPU: 0 PID: 3672 Comm: syz-executor549 Not tainted 5.16.0-rc5-syzkaller #0
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>> Call Trace:
>>   <IRQ>
>>   __dump_stack lib/dump_stack.c:88 [inline]
>>   __dump_stack lib/dump_stack.c:88 [inline] lib/dump_stack.c:106
>>   dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 lib/dump_stack.c:106
>>   nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:111 lib/nmi_backtrace.c:111
>>   nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62 lib/nmi_backtrace.c:62
>>   trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
>>   trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline] kernel/rcu/tree_stall.h:481
>>   rcu_check_gp_kthread_starvation.cold+0x1fb/0x200 kernel/rcu/tree_stall.h:481 kernel/rcu/tree_stall.h:481
>>   print_other_cpu_stall kernel/rcu/tree_stall.h:586 [inline]
>>   check_cpu_stall kernel/rcu/tree_stall.h:729 [inline]
>>   rcu_pending kernel/rcu/tree.c:3878 [inline]
>>   print_other_cpu_stall kernel/rcu/tree_stall.h:586 [inline] kernel/rcu/tree.c:2597
>>   check_cpu_stall kernel/rcu/tree_stall.h:729 [inline] kernel/rcu/tree.c:2597
>>   rcu_pending kernel/rcu/tree.c:3878 [inline] kernel/rcu/tree.c:2597
>>   rcu_sched_clock_irq+0x2125/0x2200 kernel/rcu/tree.c:2597 kernel/rcu/tree.c:2597
>>   update_process_times+0x16d/0x200 kernel/time/timer.c:1785 kernel/time/timer.c:1785
>>   tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:226 kernel/time/tick-sched.c:226
>>   tick_sched_timer+0x1b0/0x2d0 kernel/time/tick-sched.c:1428 kernel/time/tick-sched.c:1428
>>   __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
>>   __run_hrtimer kernel/time/hrtimer.c:1685 [inline] kernel/time/hrtimer.c:1749
>>   __hrtimer_run_queues+0x1c0/0xe50 kernel/time/hrtimer.c:1749 kernel/time/hrtimer.c:1749
>>   hrtimer_interrupt+0x31c/0x790 kernel/time/hrtimer.c:1811 kernel/time/hrtimer.c:1811
>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline]
>>   local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1086 [inline] arch/x86/kernel/apic/apic.c:1103
>>   __sysvec_apic_timer_interrupt+0x146/0x530 arch/x86/kernel/apic/apic.c:1103 arch/x86/kernel/apic/apic.c:1103
>>   sysvec_apic_timer_interrupt+0x8e/0xc0 arch/x86/kernel/apic/apic.c:1097 arch/x86/kernel/apic/apic.c:1097
>>   </IRQ>
>>   <TASK>
>>   asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:638 arch/x86/include/asm/idtentry.h:638
>> RIP: 0010:owner_on_cpu kernel/locking/rwsem.c:667 [inline]
> The lock acquirer was spinning long enough to make RCU stall, and the
> question is, if this is not reproducible, how was the report triggered?
> Anything that syzbot is special in this case?
>
> Thanks
> 		Hillf

The test was running on a CONFIG_PREEMPT kernel. So if the syzkaller 
kthread is running at a higher priority than the rcu_preempt kthread, it 
is possible for the rcu_preempt kthread to be starved of cpu time. The 
rwsem optimistic spinning code will relinquish the cpu if there is a 
higher priority thread running. Since rcu_preempt kthread did not seem 
to be able to get the cpu, I suspect that it is probably caused by the 
syzkaller thread having a higher priority.

Cheers,
Longman


  parent reply	other threads:[~2021-12-28  3:14 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 17:33 [syzbot] INFO: rcu detected stall in ext4_file_write_iter (4) syzbot
     [not found] ` <20211225005253.1962-1-hdanton@sina.com>
2021-12-28  3:14   ` Waiman Long [this message]
2021-12-29 21:29     ` Theodore Ts'o
     [not found]       ` <20211230125018.2272-1-hdanton@sina.com>
2022-02-08 10:32         ` Aleksandr Nogikh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6396f046-b292-1a73-8339-d32b611d9b7f@redhat.com \
    --to=longman@redhat.com \
    --cc=hdanton@sina.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=syzbot+03464269af631f4a4bdf@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.