* rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
@ 2016-03-16 21:39 Steven Haigh
From: Steven Haigh @ 2016-03-16 21:39 UTC (permalink / raw)
To: xen-devel
Hi all,
I've noticed the following problem that ends up with a non-responsive PV
DomU using kernel 4.4.5 under heavy disk IO:
INFO: rcu_sched self-detected stall on CPU
0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0 softirq=3244615/3244615 fqs=4
(t=6762321 jiffies g=2275626 c=2275625 q=54)
rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3 ->state=0x0
Task dump for CPU 0:
updatedb R running task 0 6027 6021 0x00000088
ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
Call Trace:
<IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[<ffffffff810d9749>] update_process_times+0x39/0x60
[<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
[<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
[<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
[<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
[<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
<EOI> [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
[<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
[<ffffffff811df221>] find_inode_fast+0x61/0xa0
[<ffffffff811df7ce>] iget_locked+0x6e/0x170
[<ffffffff81249203>] ext4_iget+0x33/0xae0
[<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
[<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
[<ffffffff81253415>] ext4_lookup+0xd5/0x140
[<ffffffff811ceffd>] lookup_real+0x1d/0x50
[<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
[<ffffffff811d1af7>] walk_component+0x177/0x280
[<ffffffff811d2c50>] path_lookupat+0x60/0x110
[<ffffffff811d47dc>] filename_lookup+0x9c/0x150
[<ffffffff811a982d>] ? kfree+0x10d/0x290
[<ffffffff8124122c>] ? call_filldir+0x9c/0x130
[<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
[<ffffffff811d4946>] user_path_at_empty+0x36/0x40
[<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
[<ffffffff811c7919>] ? __fput+0x169/0x1d0
[<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
[<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
[<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
[<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
[<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
[<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71
This leaves the system unresponsive, with CPU usage at 100%.
Has anyone else seen this using kernel 4.4.5 in a DomU?
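For scale, the jiffies counts above can be converted to wall-clock time once you know the guest kernel's tick rate. A rough sketch follows; HZ=250 is only an assumption (a common distro CONFIG_HZ default), so check the actual value in your kernel config before trusting the numbers:

```shell
# Rough sketch: convert the jiffies figures from the stall report above
# into hours. HZ=250 is an assumption (a common distro CONFIG_HZ);
# check /boot/config-$(uname -r) for the real value on your guest.
HZ=250

jiffies_to_hours() {
    # $1 = jiffies count; prints hours to one decimal place
    awk -v j="$1" -v hz="$HZ" 'BEGIN { printf "%.1f\n", j / hz / 3600 }'
}

jiffies_to_hours 6762321   # the t= value from the report
jiffies_to_hours 6762309   # the kthread starvation figure
```

At HZ=250 both figures come out to several hours, i.e. the guest had been wedged for a long time before the report was noticed.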
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
From: Steven Haigh @ 2016-03-18 21:40 UTC (permalink / raw)
To: xen-devel
Hi all,
So I'd just like to give this a prod. I'm still getting DomUs randomly
going to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as
the DomU does not induce these problems.
Latest crash message from today:
INFO: rcu_sched self-detected stall on CPU
0-...: (20869552 ticks this GP) idle=9c9/140000000000001/0 softirq=1440865/1440865 fqs=15068
(t=20874993 jiffies g=1354899 c=1354898 q=798)
rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0 s3 ->state=0x0
Task dump for CPU 0:
kworker/u4:1 R running task 0 5853 2 0x00000088
Workqueue: writeback wb_workfn (flush-202:0)
ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
Call Trace:
<IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[<ffffffff810d9749>] update_process_times+0x39/0x60
[<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
[<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
[<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
[<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
[<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
<EOI> [<ffffffff810b9ce2>] ? queued_spin_lock_slowpath+0x22/0x170
[<ffffffff8165bc70>] _raw_spin_lock+0x20/0x30
[<ffffffff811f1f34>] writeback_sb_inodes+0x124/0x560
[<ffffffff8165ba36>] ? _raw_spin_unlock_irqrestore+0x16/0x20
[<ffffffff811f23f6>] __writeback_inodes_wb+0x86/0xc0
[<ffffffff811f2606>] wb_writeback+0x1d6/0x2d0
[<ffffffff811f2e94>] wb_workfn+0x284/0x3e0
[<ffffffff81093831>] process_one_work+0x151/0x400
[<ffffffff8109437a>] worker_thread+0x11a/0x460
[<ffffffff8165848f>] ? __schedule+0x2bf/0x880
[<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
[<ffffffff810993d9>] kthread+0xc9/0xe0
[<ffffffff81099310>] ? kthread_park+0x60/0x60
[<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
[<ffffffff81099310>] ? kthread_park+0x60/0x60
This repeats over and over, causing 100% CPU usage (eventually on all
vcpus assigned to the DomU), and the only recovery is 'xl destroy'.
I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in
both the DomU and Dom0.
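For reference, the recovery amounts to the following, sketched here as a dry run that only echoes the xl commands rather than executing them; the domain name and config path are placeholders, not the actual guest on this box:

```shell
# Dry-run sketch of the only recovery that works here: tearing the
# stalled DomU down and recreating it. DOMAIN and the config path are
# placeholders - substitute your own. The commands are echoed rather
# than executed so nothing is destroyed by accident.
DOMAIN="stalled-guest"

recover_domu() {
    echo "xl destroy $DOMAIN"
    echo "xl create /etc/xen/$DOMAIN.cfg"
}

recover_domu
```

Dropping the echo turns the sketch into the real (destructive) sequence.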
On 17/03/2016 8:39 AM, Steven Haigh wrote:
> Hi all,
>
> I've noticed the following problem that ends up with a non-responsive PV
> DomU using kernel 4.4.5 under heavy disk IO:
>
> INFO: rcu_sched self-detected stall on CPU
> 0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0
> softirq=3244615/3244615 fqs=4
> (t=6762321 jiffies g=2275626 c=2275625 q=54)
> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3
> ->state=0x0
> Task dump for CPU 0:
> updatedb R running task 0 6027 6021 0x00000088
> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
> Call Trace:
> <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
> [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
> [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
> [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
> [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
> [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
> [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
> [<ffffffff810d9749>] update_process_times+0x39/0x60
> [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
> [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
> [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
> [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
> [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
> [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
> [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
> [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
> [<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
> [<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
> [<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
> [<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
> [<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
> <EOI> [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
> [<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
> [<ffffffff811df221>] find_inode_fast+0x61/0xa0
> [<ffffffff811df7ce>] iget_locked+0x6e/0x170
> [<ffffffff81249203>] ext4_iget+0x33/0xae0
> [<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
> [<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
> [<ffffffff81253415>] ext4_lookup+0xd5/0x140
> [<ffffffff811ceffd>] lookup_real+0x1d/0x50
> [<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
> [<ffffffff811d1af7>] walk_component+0x177/0x280
> [<ffffffff811d2c50>] path_lookupat+0x60/0x110
> [<ffffffff811d47dc>] filename_lookup+0x9c/0x150
> [<ffffffff811a982d>] ? kfree+0x10d/0x290
> [<ffffffff8124122c>] ? call_filldir+0x9c/0x130
> [<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
> [<ffffffff811d4946>] user_path_at_empty+0x36/0x40
> [<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
> [<ffffffff811c7919>] ? __fput+0x169/0x1d0
> [<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
> [<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
> [<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
> [<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
> [<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
> [<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71
>
> This ends up with the system not responding at 100% CPU usage.
>
> Has anyone else seen this using kernel 4.4.5 in a DomU?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
* Re: rcu_sched self-detected stall on CPU on kernel 4.4.5 in PV DomU
From: Steven Haigh @ 2016-03-18 21:46 UTC (permalink / raw)
To: xen-devel
On 19/03/2016 8:40 AM, Steven Haigh wrote:
> Hi all,
>
> So I'd just like to give this a prod. I'm still getting DomUs randomly
> go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as
> the DomU does not induce these problems.
Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't show
these errors. Only in the DomU.
>
> Latest crash message from today:
> INFO: rcu_sched self-detected stall on CPU
> 0-...: (20869552 ticks this GP) idle=9c9/140000000000001/0
> softirq=1440865/1440865 fqs=15068
> (t=20874993 jiffies g=1354899 c=1354898 q=798)
> rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0
> s3 ->state=0x0
> Task dump for CPU 0:
> kworker/u4:1 R running task 0 5853 2 0x00000088
> Workqueue: writeback wb_workfn (flush-202:0)
> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
> Call Trace:
> <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
> [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
> [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
> [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
> [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
> [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
> [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
> [<ffffffff810d9749>] update_process_times+0x39/0x60
> [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
> [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
> [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
> [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
> [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
> [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
> [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
> [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
> [<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
> [<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
> [<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
> [<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
> [<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
> <EOI> [<ffffffff810b9ce2>] ? queued_spin_lock_slowpath+0x22/0x170
> [<ffffffff8165bc70>] _raw_spin_lock+0x20/0x30
> [<ffffffff811f1f34>] writeback_sb_inodes+0x124/0x560
> [<ffffffff8165ba36>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> [<ffffffff811f23f6>] __writeback_inodes_wb+0x86/0xc0
> [<ffffffff811f2606>] wb_writeback+0x1d6/0x2d0
> [<ffffffff811f2e94>] wb_workfn+0x284/0x3e0
> [<ffffffff81093831>] process_one_work+0x151/0x400
> [<ffffffff8109437a>] worker_thread+0x11a/0x460
> [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
> [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
> [<ffffffff810993d9>] kthread+0xc9/0xe0
> [<ffffffff81099310>] ? kthread_park+0x60/0x60
> [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
> [<ffffffff81099310>] ? kthread_park+0x60/0x60
>
> This repeats over and over causing 100% CPU usage - eventually on all
> vcpus assigned to the DomU and the only recovery is 'xl destroy'.
>
> I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in
> both the DomU and Dom0.
>
> On 17/03/2016 8:39 AM, Steven Haigh wrote:
>> Hi all,
>>
>> I've noticed the following problem that ends up with a non-responsive PV
>> DomU using kernel 4.4.5 under heavy disk IO:
>>
>> INFO: rcu_sched self-detected stall on CPU
>> 0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0
>> softirq=3244615/3244615 fqs=4
>> (t=6762321 jiffies g=2275626 c=2275625 q=54)
>> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0 s3
>> ->state=0x0
>> Task dump for CPU 0:
>> updatedb R running task 0 6027 6021 0x00000088
>> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
>> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
>> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
>> Call Trace:
>> <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
>> [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
>> [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
>> [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
>> [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
>> [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
>> [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
>> [<ffffffff810d9749>] update_process_times+0x39/0x60
>> [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
>> [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
>> [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
>> [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
>> [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
>> [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
>> [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
>> [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
>> [<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
>> [<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
>> [<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
>> [<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
>> [<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
>> <EOI> [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
>> [<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
>> [<ffffffff811df221>] find_inode_fast+0x61/0xa0
>> [<ffffffff811df7ce>] iget_locked+0x6e/0x170
>> [<ffffffff81249203>] ext4_iget+0x33/0xae0
>> [<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
>> [<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
>> [<ffffffff81253415>] ext4_lookup+0xd5/0x140
>> [<ffffffff811ceffd>] lookup_real+0x1d/0x50
>> [<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
>> [<ffffffff811d1af7>] walk_component+0x177/0x280
>> [<ffffffff811d2c50>] path_lookupat+0x60/0x110
>> [<ffffffff811d47dc>] filename_lookup+0x9c/0x150
>> [<ffffffff811a982d>] ? kfree+0x10d/0x290
>> [<ffffffff8124122c>] ? call_filldir+0x9c/0x130
>> [<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
>> [<ffffffff811d4946>] user_path_at_empty+0x36/0x40
>> [<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
>> [<ffffffff811c7919>] ? __fput+0x169/0x1d0
>> [<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
>> [<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
>> [<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
>> [<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
>> [<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
>> [<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71
>>
>> This ends up with the system not responding at 100% CPU usage.
>>
>> Has anyone else seen this using kernel 4.4.5 in a DomU?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
* Re: rcu_sched self-detected stall on CPU on kernel 4.4.5/6 in PV DomU
From: Steven Haigh @ 2016-03-24 3:32 UTC (permalink / raw)
To: xen-devel
Just wanted to give this a bit of a poke. I'm currently running kernel
4.4.6 in a PV DomU and still occasionally getting hangs.
I also stumbled across this, which may be related:
https://lkml.org/lkml/2016/2/4/724
My latest hang shows:
[339844.594001] INFO: rcu_sched self-detected stall on CPU
[339844.594001] 1-...: (287557828 ticks this GP) idle=4cb/140000000000001/0 softirq=1340383/1340384 fqs=95372371
[339844.594001] (t=287566692 jiffies g=999283 c=999282 q=1725381)
[339844.594001] Task dump for CPU 1:
[339844.594001] find R running task 0 2840 2834 0x00000088
[339844.594001] ffffffff818d0c00 ffff88007fd03c58 ffffffff810a625f 0000000000000001
[339844.594001] ffffffff818d0c00 ffff88007fd03c70 ffffffff810a8699 0000000000000002
[339844.594001] ffff88007fd03ca0 ffffffff810d0e5a ffff88007fd170c0 ffffffff818d0c00
[339844.594001] Call Trace:
[339844.594001] <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
[339844.594001] [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
[339844.594001] [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
[339844.594001] [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
[339844.594001] [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
[339844.594001] [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
[339844.594001] [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
[339844.594001] [<ffffffff810d9749>] update_process_times+0x39/0x60
[339844.594001] [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
[339844.594001] [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
[339844.594001] [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
[339844.594001] [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
[339844.594001] [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
[339844.594001] [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
[339844.594001] [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
[339844.594001] [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
[339844.594001] [<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
[339844.594001] [<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
[339844.594001] [<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
[339844.594001] [<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
[339844.594001] [<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
[339844.594001] <EOI> [<ffffffff8165bc60>] ? _raw_spin_lock+0x10/0x3
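Every one of these stalls ends up in a spin-lock slowpath after the Xen event-channel upcall, so it seems worth confirming which spinlock options the DomU kernel was built with. A minimal sketch, using an inline sample config rather than my actual kernel (the values shown are assumptions; on a real guest you would grep /boot/config-$(uname -r) instead):

```shell
# Sketch: check spinlock-related options in a kernel config. The
# sample below is illustrative only, NOT taken from the affected
# guest; point the grep at /boot/config-$(uname -r) on a real system.
sample_config() {
    cat <<'EOF'
CONFIG_PARAVIRT_SPINLOCKS=y
CONFIG_QUEUED_SPINLOCKS=y
CONFIG_XEN=y
EOF
}

sample_config | grep -E 'SPINLOCKS|^CONFIG_XEN='
```

If CONFIG_PARAVIRT_SPINLOCKS differs between a kernel that hangs and one that doesn't, that would be an interesting data point for the lkml thread above.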
On 2016-03-19 08:46, Steven Haigh wrote:
> On 19/03/2016 8:40 AM, Steven Haigh wrote:
>> Hi all,
>>
>> So I'd just like to give this a prod. I'm still getting DomUs
>> randomly
>> go to 100% CPU usage using kernel 4.4.6 now. It seems running 4.4.6 as
>> the DomU does not induce these problems.
>
> Sorry - slight correction. Running 4.4.6 as the Dom0 kernel doesn't
> show
> these errors. Only in the DomU.
>
>>
>> Latest crash message from today:
>> INFO: rcu_sched self-detected stall on CPU
>> 0-...: (20869552 ticks this GP) idle=9c9/140000000000001/0
>> softirq=1440865/1440865 fqs=15068
>> (t=20874993 jiffies g=1354899 c=1354898 q=798)
>> rcu_sched kthread starved for 20829030 jiffies! g1354899 c1354898 f0x0
>> s3 ->state=0x0
>> Task dump for CPU 0:
>> kworker/u4:1 R running task 0 5853 2 0x00000088
>> Workqueue: writeback wb_workfn (flush-202:0)
>> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
>> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
>> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
>> Call Trace:
>> <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
>> [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
>> [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
>> [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
>> [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
>> [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
>> [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
>> [<ffffffff810d9749>] update_process_times+0x39/0x60
>> [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
>> [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
>> [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
>> [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
>> [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
>> [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
>> [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
>> [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
>> [<ffffffff813e50ff>] __evtchn_fifo_handle_events+0x15f/0x180
>> [<ffffffff813e5130>] evtchn_fifo_handle_events+0x10/0x20
>> [<ffffffff813e2233>] __xen_evtchn_do_upcall+0x43/0x80
>> [<ffffffff813e3ea0>] xen_evtchn_do_upcall+0x30/0x50
>> [<ffffffff8165deb2>] xen_hvm_callback_vector+0x82/0x90
>> <EOI> [<ffffffff810b9ce2>] ? queued_spin_lock_slowpath+0x22/0x170
>> [<ffffffff8165bc70>] _raw_spin_lock+0x20/0x30
>> [<ffffffff811f1f34>] writeback_sb_inodes+0x124/0x560
>> [<ffffffff8165ba36>] ? _raw_spin_unlock_irqrestore+0x16/0x20
>> [<ffffffff811f23f6>] __writeback_inodes_wb+0x86/0xc0
>> [<ffffffff811f2606>] wb_writeback+0x1d6/0x2d0
>> [<ffffffff811f2e94>] wb_workfn+0x284/0x3e0
>> [<ffffffff81093831>] process_one_work+0x151/0x400
>> [<ffffffff8109437a>] worker_thread+0x11a/0x460
>> [<ffffffff8165848f>] ? __schedule+0x2bf/0x880
>> [<ffffffff81094260>] ? rescuer_thread+0x2f0/0x2f0
>> [<ffffffff810993d9>] kthread+0xc9/0xe0
>> [<ffffffff81099310>] ? kthread_park+0x60/0x60
>> [<ffffffff8165c38f>] ret_from_fork+0x3f/0x70
>> [<ffffffff81099310>] ? kthread_park+0x60/0x60
>>
>> This repeats over and over causing 100% CPU usage - eventually on all
>> vcpus assigned to the DomU and the only recovery is 'xl destroy'.
>>
>> I'm currently running Xen 4.6.1 on this system - with kernel 4.4.6 in
>> both the DomU and Dom0.
>>
>> On 17/03/2016 8:39 AM, Steven Haigh wrote:
>>> Hi all,
>>>
>>> I've noticed the following problem that ends up with a non-responsive
>>> PV
>>> DomU using kernel 4.4.5 under heavy disk IO:
>>>
>>> INFO: rcu_sched self-detected stall on CPU
>>> 0-...: (6759098 ticks this GP) idle=cb3/140000000000001/0
>>> softirq=3244615/3244615 fqs=4
>>> (t=6762321 jiffies g=2275626 c=2275625 q=54)
>>> rcu_sched kthread starved for 6762309 jiffies! g2275626 c2275625 f0x0
>>> s3
>>> ->state=0x0
>>> Task dump for CPU 0:
>>> updatedb R running task 0 6027 6021 0x00000088
>>> ffffffff818d0c00 ffff88007fc03c58 ffffffff810a625f 0000000000000000
>>> ffffffff818d0c00 ffff88007fc03c70 ffffffff810a8699 0000000000000001
>>> ffff88007fc03ca0 ffffffff810d0e5a ffff88007fc170c0 ffffffff818d0c00
>>> Call Trace:
>>> <IRQ> [<ffffffff810a625f>] sched_show_task+0xaf/0x110
>>> [<ffffffff810a8699>] dump_cpu_task+0x39/0x40
>>> [<ffffffff810d0e5a>] rcu_dump_cpu_stacks+0x8a/0xc0
>>> [<ffffffff810d4884>] rcu_check_callbacks+0x424/0x7a0
>>> [<ffffffff810a91e1>] ? account_system_time+0x81/0x110
>>> [<ffffffff810a9481>] ? account_process_tick+0x61/0x160
>>> [<ffffffff810e8050>] ? tick_sched_do_timer+0x30/0x30
>>> [<ffffffff810d9749>] update_process_times+0x39/0x60
>>> [<ffffffff810e7aa6>] tick_sched_handle.isra.15+0x36/0x50
>>> [<ffffffff810e808d>] tick_sched_timer+0x3d/0x70
>>> [<ffffffff810da342>] __hrtimer_run_queues+0xf2/0x250
>>> [<ffffffff810da698>] hrtimer_interrupt+0xa8/0x190
>>> [<ffffffff8100c61e>] xen_timer_interrupt+0x2e/0x140
>>> [<ffffffff810c8555>] handle_irq_event_percpu+0x55/0x1e0
>>> [<ffffffff810cbbca>] handle_percpu_irq+0x3a/0x50
>>> [<ffffffff810c7d22>] generic_handle_irq+0x22/0x30
>>> [<ffffffff813e51af>] __evtchn_fifo_handle_events+0x15f/0x180
>>> [<ffffffff813e51e0>] evtchn_fifo_handle_events+0x10/0x20
>>> [<ffffffff813e22e3>] __xen_evtchn_do_upcall+0x43/0x80
>>> [<ffffffff813e3f50>] xen_evtchn_do_upcall+0x30/0x50
>>> [<ffffffff8165def2>] xen_hvm_callback_vector+0x82/0x90
>>> <EOI> [<ffffffff810b9cd0>] ? queued_spin_lock_slowpath+0x10/0x170
>>> [<ffffffff8165bcc0>] _raw_spin_lock+0x20/0x30
>>> [<ffffffff811df221>] find_inode_fast+0x61/0xa0
>>> [<ffffffff811df7ce>] iget_locked+0x6e/0x170
>>> [<ffffffff81249203>] ext4_iget+0x33/0xae0
>>> [<ffffffff81658f02>] ? out_of_line_wait_on_bit+0x72/0x80
>>> [<ffffffff81249ce0>] ext4_iget_normal+0x30/0x40
>>> [<ffffffff81253415>] ext4_lookup+0xd5/0x140
>>> [<ffffffff811ceffd>] lookup_real+0x1d/0x50
>>> [<ffffffff811cf3b3>] __lookup_hash+0x33/0x40
>>> [<ffffffff811d1af7>] walk_component+0x177/0x280
>>> [<ffffffff811d2c50>] path_lookupat+0x60/0x110
>>> [<ffffffff811d47dc>] filename_lookup+0x9c/0x150
>>> [<ffffffff811a982d>] ? kfree+0x10d/0x290
>>> [<ffffffff8124122c>] ? call_filldir+0x9c/0x130
>>> [<ffffffff811d441f>] ? getname_flags+0x4f/0x1f0
>>> [<ffffffff811d4946>] user_path_at_empty+0x36/0x40
>>> [<ffffffff811ca6f3>] vfs_fstatat+0x53/0xa0
>>> [<ffffffff811c7919>] ? __fput+0x169/0x1d0
>>> [<ffffffff811cab42>] SYSC_newlstat+0x22/0x40
>>> [<ffffffff8110cad0>] ? __audit_syscall_exit+0x1f0/0x270
>>> [<ffffffff81078115>] ? syscall_slow_exit_work+0x3f/0xc0
>>> [<ffffffff8110c88f>] ? __audit_syscall_entry+0xaf/0x100
>>> [<ffffffff811cad4e>] SyS_newlstat+0xe/0x10
>>> [<ffffffff8165c06e>] entry_SYSCALL_64_fastpath+0x12/0x71
>>>
>>> This ends up with the system not responding at 100% CPU usage.
>>>
>>> Has anyone else seen this using kernel 4.4.5 in a DomU?
--
Steven Haigh
Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897