linux-kernel.vger.kernel.org archive mirror
From: Dmitry Vyukov <dvyukov@google.com>
To: Borislav Petkov <bp@alien8.de>
Cc: syzbot <syzbot+111bc509cd9740d7e4aa@syzkaller.appspotmail.com>,
	Dou Liyang <douly.fnst@cn.fujitsu.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	konrad.wilk@oracle.com, Len Brown <len.brown@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	puwen@hygon.cn, syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	wang.yi59@zte.com.cn, "the arch/x86 maintainers" <x86@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: WARNING in rcu_check_gp_start_stall
Date: Sat, 23 Feb 2019 11:50:49 +0100	[thread overview]
Message-ID: <CACT4Y+ZaUe+_ZT4tLpXWUtfih4ocNgAsa4WdV4NRceLdK4tc4Q@mail.gmail.com> (raw)
In-Reply-To: <20190223103809.GB23376@zn.tnic>

On Sat, Feb 23, 2019 at 11:38 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Sat, Feb 23, 2019 at 11:33:33AM +0100, Dmitry Vyukov wrote:
> > On Fri, Feb 22, 2019 at 11:20 PM Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Fri, Feb 22, 2019 at 09:10:04AM -0800, syzbot wrote:
> > > > Hello,
> > > >
> > > > syzbot found the following crash on:
> > > >
> > > > HEAD commit:    8a61716ff2ab Merge tag 'ceph-for-5.0-rc8' of git://github...
> > > > git tree:       upstream
> > > > console output: https://syzkaller.appspot.com/x/log.txt?x=1531dd3f400000
> > > > kernel config:  https://syzkaller.appspot.com/x/.config?x=7132344728e7ec3f
> > > > dashboard link: https://syzkaller.appspot.com/bug?extid=111bc509cd9740d7e4aa
> > > > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> > > > syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=16d4966cc00000
> > > > C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=10c492d0c00000
> > >
> > > So I ran this for more than an hour in a guest here with the above
> > > .config but nothing happened. The compiler I used is 8.2, dunno of that
> > > makes the difference or I'm missing something else...
> >
> > I was able to reproduce this on the first run:
> >
> > # ./syz-execprog -procs=8 -repeat=0 hang
>
> Ok, this is what I'm missing: I ran the reproducer directly but you run
> a multithreaded thing with this syz-execprog. Where do I get that
> syz-thing?
>
> > There is a bunch of other bug reports about hangs where reproducers
> > mention perf_event_open and sched_setattr.
>
> This is a known issue, says peterz.


Peter, what is the canonical location to reference for this issue
(open bug or something)? When we get back to this report later, how
does one know if this is fixed or not and what's the status?

The C repro hung the machine even faster, so there must be some other difference.
I would not expect the compiler to make a difference. qemu, the number
of CPUs and maybe the host kernel config (at least for interrupt
granularity or something?) may be relevant.

But if you want to run the syzkaller reproducer with syz-execprog,
there is a docs link next to the "syz repro" link.
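For reference, building and running syz-execprog goes roughly like this (a sketch assuming the current syzkaller repo layout and a Go toolchain; "repro.syz" stands in for the downloaded "syz repro" file, and make target names may differ across syzkaller versions):

```shell
# Fetch and build syzkaller; 'make execprog executor' builds the
# syz-execprog replay tool and the syz-executor helper it drives.
git clone https://github.com/google/syzkaller
cd syzkaller
make execprog executor

# Copy bin/linux_amd64/syz-execprog and syz-executor into the test VM,
# then replay the reproducer with 8 parallel procs, looping forever:
./syz-execprog -procs=8 -repeat=0 repro.syz
```

The -repeat=0 loop matters here: the hang reproduces under sustained parallel load, not necessarily on a single run.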

# [   38.881989] hrtimer: interrupt took 28594 ns
[   91.730829] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 51s!
[   91.734505] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 51s!
[   91.735403] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=0 stuck for 51s!
[   91.736273] BUG: workqueue lockup - pool cpus=2 node=0 flags=0x0 nice=-20 stuck for 49s!
[   91.737173] BUG: workqueue lockup - pool cpus=3 node=0 flags=0x0 nice=0 stuck for 51s!
[   91.738044] BUG: workqueue lockup - pool cpus=0-3 flags=0x4 nice=0 stuck for 49s!
[   91.738887] Showing busy workqueues and worker pools:
[   91.739496] workqueue events: flags=0x0
[   91.740012]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=3/256
[   91.740784]     pending: defense_work_handler, e1000_watchdog, cache_reap
[   91.741584]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=2/256
[   91.742237]     pending: cache_reap, check_corruption
[   91.742802]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[   91.743451]     pending: cache_reap
[   91.743844]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
[   91.744497]     pending: vmstat_shepherd, cache_reap, psi_update_work
[   91.745286] workqueue events_unbound: flags=0x2
[   91.745872]   pwq 8: cpus=0-3 flags=0x4 nice=0 active=1/512
[   91.747586]     pending: flush_to_ldisc
[   91.748049]
[   91.748052] ======================================================
[   91.748055] WARNING: possible circular locking dependency detected
[   91.748056] 5.0.0-rc7+ #7 Not tainted
[   91.748058] ------------------------------------------------------
[   91.748060] a.out/6057 is trying to acquire lock:
[   91.748062] 00000000a11439e2 (console_owner){-.-.}, at: console_unlock+0x4d3/0x11e0
[   91.748068]
[   91.748069] but task is already holding lock:
[   91.748071] 0000000082e8b4f5 (&pool->lock/1){-.-.}, at: show_workqueue_state.cold+0xac5/0x15a8
[   91.748078]
[   91.748080] which lock already depends on the new lock.
[   91.748081]
[   91.748082]
[   91.748084] the existing dependency chain (in reverse order) is:
[   91.748085]
[   91.748086] -> #3 (&pool->lock/1){-.-.}:
[   91.748092]        _raw_spin_lock+0x2f/0x40
[   91.748094]        __queue_work+0x2d9/0x1450
[   91.748095]        queue_work_on+0x192/0x200
[   91.748097]        tty_schedule_flip+0x149/0x1e0
[   91.748099]        tty_flip_buffer_push+0x16/0x20
[   91.748101]        pty_write+0x1a6/0x200
[   91.748102]        n_tty_write+0xb9e/0x1220
[   91.748104]        tty_write+0x45b/0x7a0
[   91.748105]        __vfs_write+0x116/0xb40
[   91.748107]        vfs_write+0x20c/0x580
[   91.748109]        ksys_write+0x105/0x260
[   91.748110]        __x64_sys_write+0x73/0xb0
[   91.748112]        do_syscall_64+0x1a3/0x800
[   91.748114]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   91.748115]
[   91.748116] -> #2 (&(&port->lock)->rlock){-.-.}:
[   91.748122]        _raw_spin_lock_irqsave+0x95/0xd0
[   91.748123]        tty_port_tty_get+0x22/0x80
[   91.748125]        tty_port_default_wakeup+0x16/0x40
[   91.748127]        tty_port_tty_wakeup+0x5d/0x70
[   91.748129]        uart_write_wakeup+0x46/0x70
[   91.748131]        serial8250_tx_chars+0x4a4/0xcc0
[   91.748133]        serial8250_handle_irq.part.0+0x1be/0x2e0
[   91.748135]        serial8250_default_handle_irq+0xc5/0x150
[   91.748136]        serial8250_interrupt+0xfb/0x1a0
[   91.748138]        __handle_irq_event_percpu+0x1c6/0xb10
[   91.748140]        handle_irq_event_percpu+0xa0/0x1d0
[   91.748142]        handle_irq_event+0xa7/0x134
[   91.748144]        handle_edge_irq+0x232/0x8a0
[   91.748145]        handle_irq+0x252/0x3d8
[   91.748147]        do_IRQ+0x99/0x1d0
[   91.748148]        ret_from_intr+0x0/0x1e
[   91.748150]        native_safe_halt+0x2/0x10
[   91.748152]        arch_cpu_idle+0x10/0x20
[   91.748153]        default_idle_call+0x36/0x90
[   91.748155]        do_idle+0x386/0x5d0
[   91.748157]        cpu_startup_entry+0x1b/0x20
[   91.748158]        rest_init+0x245/0x37b
[   91.748160]        arch_call_rest_init+0xe/0x1b
[   91.748161]        start_kernel+0x877/0x8b2
[   91.748163]        x86_64_start_reservations+0x29/0x2b
[   91.748165]        x86_64_start_kernel+0x77/0x7b
[   91.748167]        secondary_startup_64+0xa4/0xb0
[   91.748168]
[   91.748169] -> #1 (&port_lock_key){-.-.}:
[   91.748175]        _raw_spin_lock_irqsave+0x95/0xd0
[   91.748177]        serial8250_console_write+0x253/0xab0
[   91.748178]        univ8250_console_write+0x5f/0x70
[   91.748180]        console_unlock+0xcff/0x11e0
[   91.748182]        vprintk_emit+0x370/0x960
[   91.748183]        vprintk_default+0x28/0x30
[   91.748185]        vprintk_func+0x7e/0x189
[   91.748187]        printk+0xba/0xed
[   91.748189]        register_console+0x74d/0xb50
[   91.748190]        univ8250_console_init+0x3e/0x4b
[   91.748192]        console_init+0x6af/0x9f3
[   91.748194]        start_kernel+0x5dc/0x8b2
[   91.748196]        x86_64_start_reservations+0x29/0x2b
[   91.748198]        x86_64_start_kernel+0x77/0x7b
[   91.748200]        secondary_startup_64+0xa4/0xb0
[   91.748200]
[   91.748201] -> #0 (console_owner){-.-.}:
[   91.748207]        lock_acquire+0x1db/0x570
[   91.748209]        console_unlock+0x53d/0x11e0
[   91.748211]        vprintk_emit+0x370/0x960
[   91.748212]        vprintk_default+0x28/0x30
[   91.748214]        vprintk_func+0x7e/0x189
[   91.748215]        printk+0xba/0xed
[   91.748217]        show_workqueue_state.cold+0xc5f/0x15a8
[   91.748219]        wq_watchdog_timer_fn+0x6bd/0x7e0
[   91.748221]        call_timer_fn+0x254/0x900
[   91.748222]        __run_timers+0x6fc/0xd50
[   91.748224]        run_timer_softirq+0x88/0xb0
[   91.748226]        __do_softirq+0x30b/0xb11
[   91.748227]        irq_exit+0x180/0x1d0
[   91.748229]        smp_apic_timer_interrupt+0x1b7/0x760
[   91.748231]        apic_timer_interrupt+0xf/0x20
[   91.748233]        __sanitizer_cov_trace_const_cmp8+0x13/0x20
[   91.748234]        sanity+0x109/0x330
[   91.748236]        copy_page_to_iter+0x634/0x1000
[   91.748238]        generic_file_read_iter+0xbb1/0x2d40
[   91.748240]        ext4_file_read_iter+0x180/0x3c0
[   91.748242]        generic_file_splice_read+0x5c4/0xa90
[   91.748243]        do_splice_to+0x12a/0x190
[   91.748245]        splice_direct_to_actor+0x31b/0x9d0
[   91.748247]        do_splice_direct+0x2c7/0x420
[   91.748249]        do_sendfile+0x61d/0xe60
[   91.748250]        __x64_sys_sendfile64+0x15a/0x240
[   91.748252]        do_syscall_64+0x1a3/0x800
[   91.748254]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   91.748255]
[   91.748257] other info that might help us debug this:
[   91.748258]
[   91.748259] Chain exists of:
[   91.748260]   console_owner --> &(&port->lock)->rlock --> &pool->lock/1
[   91.748268]
[   91.748270]  Possible unsafe locking scenario:
[   91.748271]
[   91.748273]        CPU0                    CPU1
[   91.748274]        ----                    ----
[   91.748275]   lock(&pool->lock/1);
[   91.748280]                                lock(&(&port->lock)->rlock);
[   91.748284]                                lock(&pool->lock/1);
[   91.748288]   lock(console_owner);
[   91.748291]
[   91.748293]  *** DEADLOCK ***
[   91.748293]
[   91.748295] 5 locks held by a.out/6057:
[   91.748296]  #0: 00000000878cae3a (sb_writers#3){.+.+}, at: do_sendfile+0xae0/0xe60
[   91.748304]  #1: 000000007b4589c2 ((&wq_watchdog_timer)){+.-.}, at: call_timer_fn+0x1b4/0x900
[   91.748311]  #2: 0000000085eca237 (rcu_read_lock_sched){....}, at: show_workqueue_state+0x0/0x180
[   91.748318]  #3: 0000000082e8b4f5 (&pool->lock/1){-.-.}, at: show_workqueue_state.cold+0xac5/0x15a8
[   91.748326]  #4: 00000000fffd7726 (console_lock){+.+.}, at: vprintk_emit+0x351/0x960
[   91.748332]
[   91.748334] stack backtrace:
[   91.748336] CPU: 0 PID: 6057 Comm: a.out Not tainted 5.0.0-rc7+ #7
[   91.748339] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   91.748340] Call Trace:
[   91.748341]  <IRQ>
[   91.748343]  dump_stack+0x1db/0x2d0
[   91.748345]  ? dump_stack_print_info.cold+0x20/0x20
[   91.748347]  ? print_stack_trace+0x77/0xb0
[   91.748348]  ? vprintk_func+0x86/0x189
[   91.748350]  print_circular_bug.isra.0.cold+0x1cc/0x28f
[   91.748352]  __lock_acquire+0x3014/0x4a30
[   91.748354]  ? mark_held_locks+0x100/0x100
[   91.748355]  ? memcpy+0x46/0x50
[   91.748357]  ? add_lock_to_list.isra.0+0x450/0x450
[   91.748358]  ? sprintf+0xc0/0x100
[   91.748360]  ? scnprintf+0x140/0x140
[   91.748362]  ? console_unlock+0x518/0x11e0
[   91.748364]  ? find_held_lock+0x35/0x120
[   91.748365]  lock_acquire+0x1db/0x570
[   91.748367]  ? console_unlock+0x4d3/0x11e0
[   91.748369]  ? lock_release+0xc40/0xc40
[   91.748371]  ? do_raw_spin_trylock+0x270/0x270
[   91.748373]  ? lock_acquire+0x1db/0x570
[   91.748374]  console_unlock+0x53d/0x11e0
[   91.748376]  ? console_unlock+0x4d3/0x11e0
[   91.748378]  ? kmsg_dump_rewind+0x2b0/0x2b0
[   91.748380]  ? _raw_spin_unlock_irqrestore+0xa4/0xe0
[   91.748382]  ? vprintk_emit+0x351/0x960
[   91.748384]  ? __down_trylock_console_sem+0x148/0x210
[   91.748385]  vprintk_emit+0x370/0x960
[   91.748387]  ? wake_up_klogd+0x180/0x180
[   91.748389]  ? lockdep_hardirqs_on+0x19b/0x5d0
[   91.748390]  ? retint_kernel+0x2d/0x2d
[   91.748392]  ? trace_hardirqs_on_caller+0xc0/0x310
[   91.748394]  vprintk_default+0x28/0x30
[   91.748396]  vprintk_func+0x7e/0x189
[   91.748397]  ? printk+0xba/0xed
[   91.748398]  printk+0xba/0xed
[   91.748400]  ? kmsg_dump_rewind_nolock+0xe4/0xe4
[   91.748402]  ? wq_watchdog_touch+0xb0/0x102
[   91.748404]  show_workqueue_state.cold+0xc5f/0x15a8
[   91.748406]  ? print_worker_info+0x540/0x540
[   91.748408]  ? add_lock_to_list.isra.0+0x450/0x450
[   91.748409]  ? retint_kernel+0x2d/0x2d
[   91.748411]  ? trace_hardirqs_on_thunk+0x1a/0x1c
[   91.748413]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748415]  ? lock_downgrade+0x910/0x910
[   91.748416]  ? kasan_check_read+0x11/0x20
[   91.748418]  ? rcu_dynticks_curr_cpu_in_eqs+0xa2/0x170
[   91.748420]  ? rcu_read_unlock_special+0x380/0x380
[   91.748422]  ? wq_watchdog_timer_fn+0x4f4/0x7e0
[   91.748424]  wq_watchdog_timer_fn+0x6bd/0x7e0
[   91.748426]  ? show_workqueue_state+0x180/0x180
[   91.748428]  ? add_lock_to_list.isra.0+0x450/0x450
[   91.748430]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748432]  ? check_preemption_disabled+0x48/0x290
[   91.748433]  ? __lock_is_held+0xb6/0x140
[   91.748435]  call_timer_fn+0x254/0x900
[   91.748437]  ? show_workqueue_state+0x180/0x180
[   91.748438]  ? process_timeout+0x40/0x40
[   91.748440]  ? retint_kernel+0x2d/0x2d
[   91.748442]  ? show_workqueue_state+0x180/0x180
[   91.748443]  ? _raw_spin_unlock_irq+0x54/0x90
[   91.748445]  ? show_workqueue_state+0x180/0x180
[   91.748447]  __run_timers+0x6fc/0xd50
[   91.748449]  ? __bpf_trace_timer_expire_entry+0x30/0x30
[   91.748451]  ? print_usage_bug+0xd0/0xd0
[   91.748452]  ? find_held_lock+0x35/0x120
[   91.748454]  ? clockevents_program_event+0x15f/0x380
[   91.748456]  ? add_lock_to_list.isra.0+0x450/0x450
[   91.748458]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748460]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748462]  ? check_preemption_disabled+0x48/0x290
[   91.748464]  ? __lock_is_held+0xb6/0x140
[   91.748466]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748468]  ? check_preemption_disabled+0x48/0x290
[   91.748469]  run_timer_softirq+0x88/0xb0
[   91.748471]  ? rcu_read_lock_sched_held+0x110/0x130
[   91.748473]  __do_softirq+0x30b/0xb11
[   91.748475]  ? __irqentry_text_end+0x1f96c2/0x1f96c2
[   91.748477]  ? kvm_clock_read+0x18/0x30
[   91.748478]  ? kvm_sched_clock_read+0x9/0x20
[   91.748480]  ? sched_clock+0x2e/0x50
[   91.748482]  ? __sanitizer_cov_trace_const_cmp4+0x16/0x20
[   91.748484]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[   91.748486]  ? check_preemption_disabled+0x48/0x290
[   91.748487]  irq_exit+0x180/0x1d0
[   91.748489]  smp_apic_timer_interrupt+0x1b7/0x760
[   91.748491]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   91.748493]  ? smp_call_function_single_interrupt+0x640/0x640
[   91.748495]  ? trace_hardirqs_off+0x310/0x310
[   91.748497]  ? task_prio+0x50/0x50
[   91.748499]  ? __sanitizer_cov_trace_const_cmp8+0x18/0x20
[   91.748501]  ? check_preemption_disabled+0x48/0x290
[   91.748502]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[   91.748504]  apic_timer_interrupt+0xf/0x20
[   91.748505]  </IRQ>
[   91.748508] RIP: 0010:__sanitizer_cov_trace_const_cmp8+0x13/0x20
[   91.748509] Code: 00
[   91.748512] Lost 71 message(s)!
[   91.866234] workqueue events_power_efficient: flags=0x80
[   91.866863]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=2/256
[   91.867568]     pending: gc_worker, neigh_periodic_work
[   91.868202]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=3/256
[   91.868911]     pending: crda_timeout_work, neigh_periodic_work, do_cache_clean
[   91.869781]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[   91.870488]     pending: fb_flashcursor
[   91.870963] workqueue mm_percpu_wq: flags=0x8
[   91.871482]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[   91.872188]     pending: vmstat_update
[   91.872644]   pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256
[   91.873353]     pending: vmstat_update
[   91.873806]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[   91.874512]     pending: vmstat_update
[   91.874964]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[   91.875670]     pending: vmstat_update
[   91.876131] workqueue writeback: flags=0x4e
[   91.876631]   pwq 8: cpus=0-3 flags=0x4 nice=0 active=1/256
[   91.877292]     pending: wb_workfn
[   91.877716] workqueue kblockd: flags=0x18
[   91.878201]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
[   91.878930]     pending: blk_mq_timeout_work
[   91.879513] workqueue dm_bufio_cache: flags=0x8
[   91.880053]   pwq 6: cpus=3 node=0 flags=0x0 nice=0 active=1/256
[   91.880761]     pending: work_fn

Thread overview: 9+ messages
2019-02-22 17:10 WARNING in rcu_check_gp_start_stall syzbot
2019-02-22 22:20 ` Borislav Petkov
2019-02-23 10:33   ` Dmitry Vyukov
2019-02-23 10:38     ` Borislav Petkov
2019-02-23 10:50       ` Dmitry Vyukov [this message]
2019-02-23 10:56         ` Borislav Petkov
2019-03-17 10:43 ` syzbot
2019-03-17 11:04   ` Greg KH
2019-03-18 12:18     ` Dmitry Vyukov
