* update_balloon_size_func blocked for more than 120 seconds
From: Luis Chamberlain @ 2021-11-11 22:48 UTC
To: mst, david, jasowang, virtualization, Michal Hocko, Vlastimil Babka
Cc: Luis Chamberlain, linux-kernel
I get the following splats with a kvm guest in idle, after a few seconds
it starts:
[ 242.412806] INFO: task kworker/6:2:271 blocked for more than 120 seconds.
[ 242.415790] Tainted: G E 5.15.0-next-20211111 #68
[ 242.417755] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 242.418332] task:kworker/6:2 state:D stack: 0 pid: 271 ppid: 2 flags:0x00004000
[ 242.418954] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[ 242.419518] Call Trace:
[ 242.419709] <TASK>
[ 242.419873] __schedule+0x2fd/0x990
[ 242.420142] schedule+0x4e/0xc0
[ 242.420382] tell_host+0xaa/0xf0 [virtio_balloon]
[ 242.420757] ? do_wait_intr_irq+0xa0/0xa0
[ 242.421065] update_balloon_size_func+0x2c9/0x2e0 [virtio_balloon]
[ 242.421527] process_one_work+0x1e5/0x3c0
[ 242.421833] worker_thread+0x50/0x3b0
[ 242.422204] ? rescuer_thread+0x370/0x370
[ 242.422507] kthread+0x169/0x190
[ 242.422754] ? set_kthread_struct+0x40/0x40
[ 242.423073] ret_from_fork+0x1f/0x30
[ 242.423347] </TASK>
And this goes on endlessly. The last one says it blocked for more than 1208
seconds. This was not happening until the last few weeks but I see no
relevant recent commits for virtio_balloon, so the related change could
be elsewhere.
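For reference, here is a simplified C model of the hung-task check, written for illustration only. It is loosely modeled on the kernel's hung-task detector, and struct task_model and hung_check() are hypothetical names, not kernel APIs. In this model a task is reported only while it sits in uninterruptible sleep (the "state:D" in the trace) and has not been scheduled for at least hung_task_timeout_secs seconds. The reported figure is the time since the task last ran. That is consistent with the same stuck worker first appearing at 120 seconds and later at 1208 seconds. It also explains why writing 0 to the timeout only silences the report and does not unblock anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task bookkeeping for the model; not the kernel's struct. */
struct task_model {
	bool uninterruptible;            /* TASK_UNINTERRUPTIBLE ("state:D") */
	unsigned long switch_count;      /* context switches so far */
	unsigned long last_switch_count; /* count seen at the previous check */
	unsigned long last_switch_time;  /* seconds; when the count last changed */
};

/*
 * One pass of the model's check at time `now` (seconds).
 * Returns the number of seconds to report as "blocked for more than N
 * seconds", or 0 when no warning is due.
 */
static unsigned long hung_check(struct task_model *t, unsigned long now,
				unsigned long timeout)
{
	assert(t != NULL);

	/* A timeout of 0 disables reporting; only D-state tasks are checked. */
	if (timeout == 0 || !t->uninterruptible)
		return 0;

	/* The task was scheduled since the last check: restart its clock. */
	if (t->switch_count != t->last_switch_count) {
		t->last_switch_count = t->switch_count;
		t->last_switch_time = now;
		return 0;
	}

	/* Not stuck long enough yet. */
	if (now - t->last_switch_time < timeout)
		return 0;

	/* Report how long the task has gone without running. */
	return now - t->last_switch_time;
}
```

Consider a task that has not run since time 0. In this model it is silent at 60 seconds and reported at 1208 seconds with a 120-second timeout. It is never reported once the timeout is 0.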
I could bisect but first I figured I'd check to see if someone already
had spotted this.
Luis
* Re: update_balloon_size_func blocked for more than 120 seconds
From: David Hildenbrand @ 2021-11-12 9:13 UTC
To: Luis Chamberlain
Cc: Michael Tsirkin, Jason Wang, virtualization, Michal Hocko,
Vlastimil Babka, Linux Kernel Mailing List
On Thu, Nov 11, 2021 at 11:49 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>
> I get the following splats with a kvm guest in idle, after a few seconds
> it starts:
>
> [...]
>
> And this goes on endlessly. The last one says it blocked for more than 1208
> seconds. This was not happening until the last few weeks but I see no
> relevant recent commits for virtio_balloon, so the related change could
> be elsewhere.
We're stuck somewhere in:
wq: update_balloon_size_func()->fill_balloon()->tell_host()
Most probably in wait_event().
I am no waitqueue expert, but my best guess would be that we're waiting more than 2 minutes on a host reply with TASK_UNINTERRUPTIBLE. At least that's my interpretation.
In case we're really stuck for more than 2 minutes, the hypervisor might not be processing our requests anymore, or they're not getting processed for some other reason (or the waitqueue is not getting woken up due to some other BUG).
IIUC, we can sleep longer via wait_event_interruptible(); TASK_UNINTERRUPTIBLE seems to be what triggers the warning. But changing that might just hide the fact that we're waiting more than 2 minutes on a reply.
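To make the suspected pattern concrete, here is a userspace analogue in C. It is a sketch using pthreads, and fake_vq, host_return_buffer() and wait_for_host() are hypothetical names, not virtio APIs. The guest side sleeps on a condition until the "host" marks the buffer used and wakes it. Like wait_event(), it re-checks the condition after every wakeup. If the host never returns the buffer, or the wakeup never arrives, an unbounded wait never returns. The timed variant exists only so the sketch can demonstrate a stall without hanging. It is not a proposed fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a virtqueue plus the "acked" waitqueue. */
struct fake_vq {
	pthread_mutex_t lock;
	pthread_cond_t acked; /* signalled when the host returns the buffer */
	bool used;            /* true once the host has handed the buffer back */
};

/* Host side: mark the buffer used and wake any waiter. */
static void host_return_buffer(struct fake_vq *vq)
{
	pthread_mutex_lock(&vq->lock);
	vq->used = true;
	pthread_cond_broadcast(&vq->acked);
	pthread_mutex_unlock(&vq->lock);
}

/*
 * Guest side: sleep until the buffer comes back. timeout_ms < 0 waits
 * forever, like wait_event(); otherwise give up after timeout_ms.
 * Returns 0 once the buffer is used, or ETIMEDOUT.
 */
static int wait_for_host(struct fake_vq *vq, long timeout_ms)
{
	struct timespec deadline = { 0, 0 };
	int err = 0;

	assert(vq != NULL);
	if (timeout_ms >= 0) {
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec += 1;
			deadline.tv_nsec -= 1000000000L;
		}
	}

	pthread_mutex_lock(&vq->lock);
	/* Re-check the condition after every wakeup, as wait_event() does. */
	while (!vq->used && err == 0) {
		if (timeout_ms < 0)
			err = pthread_cond_wait(&vq->acked, &vq->lock);
		else
			err = pthread_cond_timedwait(&vq->acked, &vq->lock, &deadline);
	}
	if (vq->used)
		err = 0; /* the buffer came back, even if right at the deadline */
	pthread_mutex_unlock(&vq->lock);
	return err;
}
```

With no host activity, wait_for_host(&vq, 50) returns ETIMEDOUT. After host_return_buffer(&vq), it returns 0.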
>
> I could bisect but first I figured I'd check to see if someone already
> had spotted this.
Bisecting would be awesome; then we might at least know whether this is a guest or a hypervisor issue.
Note that the environment matters: the hypervisor seems to be requesting the guest to inflate the balloon right when booting up, so you might not be able to reproduce this in a different environment.
* Re: update_balloon_size_func blocked for more than 120 seconds
From: Michael Ellerman @ 2021-11-24 1:32 UTC
To: David Hildenbrand, Luis Chamberlain
Cc: Michael Tsirkin, Jason Wang, virtualization, Michal Hocko,
Vlastimil Babka, Linux Kernel Mailing List
David Hildenbrand <david@redhat.com> writes:
> On Thu, Nov 11, 2021 at 11:49 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
>>
>> I get the following splats with a kvm guest in idle, after a few seconds
>> it starts:
>>
>> [...]
>
> [...]
>
> Bisecting would be awesome, then we might at least know if this is a
> guest or a hypervisor issue.
I see this on ppc64le also.
I bisected it to:
# first bad commit: [939779f5152d161b34f612af29e7dc1ac4472fcf] virtio_ring: validate used buffer length
I also reported it in the thread hanging off that patch:
https://lore.kernel.org/lkml/87zgpupcga.fsf@mpe.ellerman.id.au/
cheers
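As an illustration of how a length check could produce exactly this kind of hang, here is a deliberately simplified C model. It is an assumption, not the code from commit 939779f5152d. It assumes the check compares the length the device reports for a returned buffer against the device-writable length the driver recorded when queueing it. Suppose a device reports a nonzero length for a buffer with no device-writable part. The model then rejects the buffer, so a waiter polling for it never sees it come back and stays in D state, as the condition in a wait_event() would. struct used_elem and get_used_buf() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical used-ring entry: which buffer came back, and its length. */
struct used_elem {
	void *token;      /* driver cookie identifying the buffer */
	unsigned int len; /* bytes the device claims to have written */
};

/*
 * Return the buffer's token if the reported length fits within the
 * device-writable length recorded at queue time; otherwise reject the
 * entry and return NULL, so the caller never gets its buffer back.
 */
static void *get_used_buf(const struct used_elem *e, unsigned int writable_len,
			  unsigned int *len)
{
	assert(e != NULL && len != NULL);

	if (e->len > writable_len)
		return NULL; /* device claims to have written more than it could */

	*len = e->len;
	return e->token;
}
```

In this model, take a buffer the device only reads (writable_len = 0). Any nonzero reported length makes get_used_buf() return NULL.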