* mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Tetsuo Handa @ 2017-09-11 10:27 UTC
  To: mst, jasowang; +Cc: linux-mm, virtualization

Hello.

I noticed that virtio_balloon is using register_oom_notifier() and
leak_balloon() from virtballoon_oom_notify() might depend on
__GFP_DIRECT_RECLAIM memory allocation.

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
__GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
__alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
And leak_balloon() is called by virtballoon_oom_notify() via
blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
at leak_balloon().
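
Roughly sketched (simplified; details of the actual call chains elided):

  /* Task A */
  fill_balloon()
    mutex_lock(&vb->balloon_lock);
    alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY);
      /* waits for somebody else's reclaim to make progress */

  /* Task B: some __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation */
  __alloc_pages_may_oom()
    mutex_trylock(&oom_lock);                 /* succeeds */
    out_of_memory()
      blocking_notifier_call_chain()
        virtballoon_oom_notify()
          leak_balloon()
            mutex_lock(&vb->balloon_lock);    /* blocks behind Task A */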

Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
request from leak_balloon() from virtballoon_oom_notify() from
blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
because oom_lock mutex is already held before calling out_of_memory().
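
Sketched, the recursion never completes:

  __alloc_pages_may_oom()
    mutex_trylock(&oom_lock);                 /* taken */
    out_of_memory()
      blocking_notifier_call_chain()
        virtballoon_oom_notify()
          leak_balloon()
            tell_host()
              virtqueue_add_outbuf(..., GFP_KERNEL)
                __alloc_pages_may_oom()
                  mutex_trylock(&oom_lock);   /* always fails: lockup */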

OOM notifier callback should not (directly or indirectly) depend on
__GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?


* Re: mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Michael S. Tsirkin @ 2017-09-29  4:00 UTC
  To: Tetsuo Handa; +Cc: jasowang, virtualization, linux-mm

On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I noticed that virtio_balloon is using register_oom_notifier() and
> leak_balloon() from virtballoon_oom_notify() might depend on
> __GFP_DIRECT_RECLAIM memory allocation.
> 
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> And leak_balloon() is called by virtballoon_oom_notify() via
> blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> at leak_balloon().

That would be tricky to fix. I guess we'll need to drop the lock
while allocating memory - not an easy fix.

> Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> request from leak_balloon() from virtballoon_oom_notify() from
> blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> because oom_lock mutex is already held before calling out_of_memory().

I guess we should just do

GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
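
For illustration, assuming the tell_host() call site keeps its current
shape, that would be something like:

	virtqueue_add_outbuf(vq, &sg, 1, vb,
			     GFP_KERNEL & ~__GFP_DIRECT_RECLAIM);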


> 
> OOM notifier callback should not (directly or indirectly) depend on
> __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?



* Re: mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Tetsuo Handa @ 2017-09-29  4:44 UTC
  To: mst; +Cc: jasowang, virtualization, linux-mm

Michael S. Tsirkin wrote:
> On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > Hello.
> > 
> > I noticed that virtio_balloon is using register_oom_notifier() and
> > leak_balloon() from virtballoon_oom_notify() might depend on
> > __GFP_DIRECT_RECLAIM memory allocation.
> > 
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > And leak_balloon() is called by virtballoon_oom_notify() via
> > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > at leak_balloon().
> 
> That would be tricky to fix. I guess we'll need to drop the lock
> while allocating memory - not an easy fix.
> 
> > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > request from leak_balloon() from virtballoon_oom_notify() from
> > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > because oom_lock mutex is already held before calling out_of_memory().
> 
> I guess we should just do
> 
> GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?

Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
__GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
significantly increase the possibility of memory allocation failure.
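
For reference, with the include/linux/gfp.h definitions:

  #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
  #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
  /* __GFP_RECLAIM == __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM */

  GFP_KERNEL & ~__GFP_DIRECT_RECLAIM
    == __GFP_KSWAPD_RECLAIM | __GFP_IO | __GFP_FS
    == GFP_NOWAIT | __GFP_IO | __GFP_FS

where the extra __GFP_IO / __GFP_FS bits are only consulted on direct
reclaim paths.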

> 
> 
> > 
> > OOM notifier callback should not (directly or indirectly) depend on
> > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> 

Another idea would be to use a kernel thread (or workqueue) so that
virtballoon_oom_notify() can wait with timeout.

We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
call to a kernel thread (or workqueue) with timeout if MM folks agree.



* [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Tetsuo Handa @ 2017-10-01  5:44 UTC
  To: mst
  Cc: airlied, jasowang, jiangshanlai, josh, virtualization, linux-mm,
	mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck, intel-gfx

Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > Hello.
> > > 
> > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > __GFP_DIRECT_RECLAIM memory allocation.
> > > 
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > at leak_balloon().
> > 
> > That would be tricky to fix. I guess we'll need to drop the lock
> > while allocating memory - not an easy fix.
> > 
> > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > > request from leak_balloon() from virtballoon_oom_notify() from
> > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > > because oom_lock mutex is already held before calling out_of_memory().
> > 
> > I guess we should just do
> > 
> > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
> 
> Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
> __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
> significantly increase the possibility of memory allocation failure.
> 
> > 
> > 
> > > 
> > > OOM notifier callback should not (directly or indirectly) depend on
> > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> > 
> 
> Another idea would be to use a kernel thread (or workqueue) so that
> virtballoon_oom_notify() can wait with timeout.
> 
> We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
> call to a kernel thread (or workqueue) with timeout if MM folks agree.
> 

Below is a patch which offloads blocking_notifier_call_chain() call. What do you think?
----------------------------------------
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.

Since oom_notify_list is traversed via blocking_notifier_call_chain(),
it is legal to sleep inside an OOM notifier callback function.

However, since oom_notify_list is traversed with oom_lock held, a
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot
fail (it keeps retrying forever) while oom_notify_list entries are being
traversed. Therefore, an OOM notifier callback function should not
(directly or indirectly) depend on such an allocation attempt.

Currently there are 5 register_oom_notifier() users in the mainline kernel.

  arch/powerpc/platforms/pseries/cmm.c
  arch/s390/mm/cmm.c
  drivers/gpu/drm/i915/i915_gem_shrinker.c
  drivers/virtio/virtio_balloon.c
  kernel/rcu/tree_plugin.h

Among these users, at least virtio_balloon.c has the possibility of OOM
lockup because it uses a mutex which can depend on GFP_KERNEL memory
allocations. (Both cmm.c files seem to be safe as they use spinlocks. I'm
not sure about tree_plugin.h and i915_gem_shrinker.c . Please check.)

But converting such allocations to use GFP_NOWAIT is not only prone to
allocation failures under memory pressure but also difficult to audit,
for it is hard to verify that every location has been converted.

Therefore, this patch offloads the blocking_notifier_call_chain() call to
a dedicated kernel thread and waits for completion with a timeout of 5
seconds, so that we can completely forget about the possibility of OOM
lockup due to an OOM notifier callback function.

(5 seconds is chosen based on my guess that blocking_notifier_call_chain()
should not take long, for we are using mutex_trylock(&oom_lock) at
__alloc_pages_may_oom() based on an assumption that out_of_memory() should
reclaim memory shortly.)

The kernel thread is created upon the first register_oom_notifier() call.
Thus, environments which do not use register_oom_notifier() will not
waste resources on a dedicated kernel thread.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dee0f75..d9744f7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc,
 }
 
 static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
+static bool oom_notifier_requested;
+static unsigned long oom_notifier_freed;
+static struct task_struct *oom_notifier_th;
+static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait);
+static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait);
+
+static int oom_notifier(void *unused)
+{
+	while (true) {
+		wait_event_freezable(oom_notifier_request_wait,
+				     oom_notifier_requested);
+		blocking_notifier_call_chain(&oom_notify_list, 0,
+					     &oom_notifier_freed);
+		oom_notifier_requested = false;
+		wake_up(&oom_notifier_response_wait);
+	}
+	return 0;
+}
 
 int register_oom_notifier(struct notifier_block *nb)
 {
+	if (!oom_notifier_th) {
+		struct task_struct *th = kthread_run(oom_notifier, NULL,
+						     "oom_notifier");
+
+		if (IS_ERR(th)) {
+			pr_err("Unable to start OOM notifier thread.\n");
+			return (int) PTR_ERR(th);
+		}
+		oom_notifier_th = th;
+	}
 	return blocking_notifier_chain_register(&oom_notify_list, nb);
 }
 EXPORT_SYMBOL_GPL(register_oom_notifier);
@@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
  */
 bool out_of_memory(struct oom_control *oc)
 {
-	unsigned long freed = 0;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 
 	if (oom_killer_disabled)
 		return false;
 
-	if (!is_memcg_oom(oc)) {
-		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-		if (freed > 0)
+	if (!is_memcg_oom(oc) && oom_notifier_th) {
+		oom_notifier_requested = true;
+		wake_up(&oom_notifier_request_wait);
+		wait_event_timeout(oom_notifier_response_wait,
+				   !oom_notifier_requested, 5 * HZ);
+		if (oom_notifier_freed) {
+			oom_notifier_freed = 0;
 			/* Got some memory back in the last second. */
 			return true;
+		}
 	}
 
 	/*
-- 
1.8.3.1

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
From: Michael S. Tsirkin @ 2017-10-02  3:59 UTC
  To: Tetsuo Handa
  Cc: jasowang, jani.nikula, joonas.lahtinen, rodrigo.vivi, airlied,
	paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	virtualization, intel-gfx, linux-mm

On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Michael S. Tsirkin wrote:
> > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > Hello.
> > > > 
> > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > 
> > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > at leak_balloon().
> > > 
> > > That would be tricky to fix. I guess we'll need to drop the lock
> > > while allocating memory - not an easy fix.
> > > 
> > > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > > > request from leak_balloon() from virtballoon_oom_notify() from
> > > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > > > because oom_lock mutex is already held before calling out_of_memory().
> > > 
> > > I guess we should just do
> > > 
> > > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
> > 
> > Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
> > __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
> > significantly increase the possibility of memory allocation failure.
> > 
> > > 
> > > 
> > > > 
> > > > OOM notifier callback should not (directly or indirectly) depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> > > 
> > 
> > Another idea would be to use a kernel thread (or workqueue) so that
> > virtballoon_oom_notify() can wait with timeout.
> > 
> > We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
> > call to a kernel thread (or workqueue) with timeout if MM folks agree.
> > 
> 
> Below is a patch which offloads blocking_notifier_call_chain() call. What do you think?
> ----------------------------------------
> [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
> 
> Since oom_notify_list is traversed via blocking_notifier_call_chain(),
> it is legal to sleep inside an OOM notifier callback function.
> 
> However, since oom_notify_list is traversed with oom_lock held, a
> __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot
> fail (it keeps retrying forever) while oom_notify_list entries are being
> traversed. Therefore, an OOM notifier callback function should not
> (directly or indirectly) depend on such an allocation attempt.
> 
> Currently there are 5 register_oom_notifier() users in the mainline kernel.
> 
>   arch/powerpc/platforms/pseries/cmm.c
>   arch/s390/mm/cmm.c
>   drivers/gpu/drm/i915/i915_gem_shrinker.c
>   drivers/virtio/virtio_balloon.c
>   kernel/rcu/tree_plugin.h
> 
> Among these users, at least virtio_balloon.c has the possibility of OOM
> lockup because it uses a mutex which can depend on GFP_KERNEL memory
> allocations. (Both cmm.c files seem to be safe as they use spinlocks. I'm
> not sure about tree_plugin.h and i915_gem_shrinker.c . Please check.)
> 
> But converting such allocations to use GFP_NOWAIT is not only prone to
> allocation failures under memory pressure but also difficult to audit,
> for it is hard to verify that every location has been converted.
> 
> Therefore, this patch offloads the blocking_notifier_call_chain() call to
> a dedicated kernel thread and waits for completion with a timeout of 5
> seconds, so that we can completely forget about the possibility of OOM
> lockup due to an OOM notifier callback function.
> 
> (5 seconds is chosen based on my guess that blocking_notifier_call_chain()
> should not take long, for we are using mutex_trylock(&oom_lock) at
> __alloc_pages_may_oom() based on an assumption that out_of_memory() should
> reclaim memory shortly.)
> 
> The kernel thread is created upon the first register_oom_notifier() call.
> Thus, environments which do not use register_oom_notifier() will not
> waste resources on a dedicated kernel thread.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dee0f75..d9744f7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc,
>  }
>  
>  static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
> +static bool oom_notifier_requested;
> +static unsigned long oom_notifier_freed;
> +static struct task_struct *oom_notifier_th;
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait);
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait);
> +
> +static int oom_notifier(void *unused)
> +{
> +	while (true) {
> +		wait_event_freezable(oom_notifier_request_wait,
> +				     oom_notifier_requested);
> +		blocking_notifier_call_chain(&oom_notify_list, 0,
> +					     &oom_notifier_freed);
> +		oom_notifier_requested = false;
> +		wake_up(&oom_notifier_response_wait);
> +	}
> +	return 0;
> +}
>  
>  int register_oom_notifier(struct notifier_block *nb)
>  {
> +	if (!oom_notifier_th) {
> +		struct task_struct *th = kthread_run(oom_notifier, NULL,
> +						     "oom_notifier");
> +
> +		if (IS_ERR(th)) {
> +			pr_err("Unable to start OOM notifier thread.\n");
> +			return (int) PTR_ERR(th);
> +		}
> +		oom_notifier_th = th;
> +	}
>  	return blocking_notifier_chain_register(&oom_notify_list, nb);
>  }
>  EXPORT_SYMBOL_GPL(register_oom_notifier);
> @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
>   */
>  bool out_of_memory(struct oom_control *oc)
>  {
> -	unsigned long freed = 0;
>  	enum oom_constraint constraint = CONSTRAINT_NONE;
>  
>  	if (oom_killer_disabled)
>  		return false;
>  
> -	if (!is_memcg_oom(oc)) {
> -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> -		if (freed > 0)
> +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> +		oom_notifier_requested = true;
> +		wake_up(&oom_notifier_request_wait);
> +		wait_event_timeout(oom_notifier_response_wait,
> +				   !oom_notifier_requested, 5 * HZ);

I guess this means what was earlier a deadlock will free up after 5
seconds, but a 5 sec downtime is still a lot, isn't it?


> +		if (oom_notifier_freed) {
> +			oom_notifier_freed = 0;
>  			/* Got some memory back in the last second. */
>  			return true;
> +		}
>  	}
>  
>  	/*
> -- 
> 1.8.3.1



* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
From: Michal Hocko @ 2017-10-02  9:06 UTC
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

[Hmm, I do not see the original patch which this has been a reply to]

On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > > Michael S. Tsirkin wrote:
> > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > Hello.
> > > > > 
> > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > 
> > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > at leak_balloon().

This is really nasty! And I would argue that this is an abuse of the oom
notifier interface from the virtio code. OOM notifiers are an ugly hack
on their own, but all their users have to be really careful not to depend on
any allocation request because that is a straight deadlock situation.

I do not think that making the oom notifier API more complex is the way to
go. Can we simply change the lock to try_lock? If the lock is held we
would simply fall back to the normal OOM handling. As a follow-up it
would be great if virtio could use some other form of aging, e.g. a
shrinker.
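
Something like this, for illustration (a rough sketch only; the
__leak_balloon() helper, which would contain the currently locked
section, is made up here):

	/* OOM notifier path: do not wait for balloon_lock */
	static unsigned leak_balloon_oom(struct virtio_balloon *vb, size_t num)
	{
		unsigned num_freed_pages = 0;

		if (mutex_trylock(&vb->balloon_lock)) {
			num_freed_pages = __leak_balloon(vb, num);
			mutex_unlock(&vb->balloon_lock);
		}
		/* 0 means no progress; normal OOM handling follows */
		return num_freed_pages;
	}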
-- 
Michal Hocko
SUSE Labs



* Re: [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Tetsuo Handa @ 2017-10-02 11:33 UTC
  To: mhocko, mst
  Cc: airlied, jasowang, jiangshanlai, josh, virtualization, linux-mm,
	mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck, intel-gfx

Michal Hocko wrote:
> [Hmm, I do not see the original patch which this has been a reply to]

urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
and the rest were rejected with "Recipient address rejected: Greylisted" or
"Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
and the mails were eventually dropped at the servers. Sad...

> 
> On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > Tetsuo Handa wrote:
> > > > Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > Hello.
> > > > > > 
> > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > 
> > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > at leak_balloon().
> 
> This is really nasty! And I would argue that this is an abuse of the oom
> notifier interface from the virtio code. OOM notifiers are an ugly hack
> on their own, but all their users have to be really careful not to depend on
> any allocation request because that is a straight deadlock situation.

Please describe such a warning at the
"int register_oom_notifier(struct notifier_block *nb)" definition.

> 
> I do not think that making the oom notifier API more complex is the way to
> go. Can we simply change the lock to try_lock?

Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
mutex, a __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is made
which will fail to make progress because oom_lock is already held. Therefore,
virtballoon_oom_notify() needs to guarantee that all allocation attempts in
its call chain use GFP_NOWAIT.

virtballoon_oom_notify() can guarantee GFP_NOIO using memalloc_noio_{save,restore}()
(which is currently missing because blocking_notifier_call_chain() might be called by
a GFP_NOIO allocation request (e.g. disk_events_workfn)). But there is no
memalloc_nodirectreclaim_{save,restore}() for guaranteeing GFP_NOWAIT is used.
virtballoon_oom_notify() will need to use some flag and switch between GFP_NOWAIT and
GFP_KERNEL based on that flag. I worry that such an approach is prone to oversight.
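
The NOIO part would look roughly like this (a sketch only; the callback
body is reconstructed from memory and the feature check is omitted):

	static int virtballoon_oom_notify(struct notifier_block *self,
					  unsigned long dummy, void *parm)
	{
		struct virtio_balloon *vb = container_of(self,
					struct virtio_balloon, nb);
		unsigned long *freed = parm;
		unsigned int noio_flags = memalloc_noio_save();

		*freed += leak_balloon(vb, oom_pages);
		update_balloon_size(vb);
		memalloc_noio_restore(noio_flags);
		return NOTIFY_OK;
	}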

>                                                If the lock is held we
> would simply fall back to the normal OOM handling. As a follow-up it
> would be great if virtio could use some other form of aging, e.g. a
> shrinker.
> -- 
> Michal Hocko
> SUSE Labs
> 

* Re: [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Michal Hocko @ 2017-10-02 11:50 UTC
  To: Tetsuo Handa
  Cc: mst, airlied, jasowang, jiangshanlai, josh, virtualization,
	linux-mm, mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck,
	intel-gfx

On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > [Hmm, I do not see the original patch which this has been a reply to]
> 
> urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
> and the rest were rejected with "Recipient address rejected: Greylisted" or
> "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
> and the mails were eventually dropped at the servers. Sad...
> 
> > 
> > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > Tetsuo Handa wrote:
> > > > > Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > Hello.
> > > > > > > 
> > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > 
> > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > at leak_balloon().
> > 
> > This is really nasty! And I would argue that this is an abuse of the oom
> > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > on its own but all its users have to be really careful to not depend on
> > any allocation request because that is a straight deadlock situation.
> 
> Please document such a warning at the
> "int register_oom_notifier(struct notifier_block *nb)" definition.

Yes, we can and should do that. Although I would prefer to simply
document this API as deprecated. Care to send a patch? I am quite busy
with other stuff.

> > I do not think that making oom notifier API more complex is the way to
> > go. Can we simply change the lock to try_lock?
> 
> Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> which will fail to make progress due to oom_lock already held. Therefore,
> virtballoon_oom_notify() needs to guarantee that every allocation attempt
> made (directly or indirectly) from that path uses GFP_NOWAIT.

Ohh, I missed your point and thought the dependency is indirect and some
other call path is allocating while holding the lock. But you seem to be
right and
leak_balloon
  tell_host
    virtqueue_add_outbuf
      virtqueue_add

can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
try to allocate while we are in the OOM path. Michael, is there any way
to drop this?
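
For reference, the path in question looks approximately like this in the
current driver (quoted from memory, not verbatim):

  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
  {
          struct scatterlist sg;
          unsigned int len;

          sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

          /* This GFP_KERNEL is what ends up inside virtqueue_add(). */
          virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
          virtqueue_kick(vq);

          /* When the host has read the buffer, balloon_ack() wakes us. */
          wait_event(vb->acked, virtqueue_get_buf(vq, &len));
  }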
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 11:50             ` Michal Hocko
@ 2017-10-02 13:05               ` Tetsuo Handa
  2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 14:11               ` Michael S. Tsirkin
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
  2 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 13:05 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

(Reducing recipients in the hope of not being filtered at the servers.)

Michal Hocko wrote:
> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > Hello.
> > > > > > > > 
> > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > 
> > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > at leak_balloon().
> > > 
> > > This is really nasty! And I would argue that this is an abuse of the oom
> > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > on its own but all its users have to be really careful to not depend on
> > > any allocation request because that is a straight deadlock situation.
> > 
> > > I do not think that making oom notifier API more complex is the way to
> > > go. Can we simply change the lock to try_lock?
> > 
> > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > which will fail to make progress due to oom_lock already held. Therefore,
> > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > made (directly or indirectly) from that path uses GFP_NOWAIT.
> 
> Ohh, I missed your point and thought the dependency is indirect and some
> other call path is allocating while holding the lock. But you seem to be
> right and
> leak_balloon
>   tell_host
>     virtqueue_add_outbuf
>       virtqueue_add
> 
> can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> try to allocate while we are in the OOM path. Michael, is there any way
> to drop this?

Michael already said

  That would be tricky to fix. I guess we'll need to drop the lock
  while allocating memory - not an easy fix.

and I think that it would be possible for virtio to locally offload
virtballoon_oom_notify() using this patch's approach, if you don't like
globally offloading at the OOM notifier API level.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:05               ` [RFC] [PATCH] mm,oom: " Tetsuo Handa
@ 2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 13:52                   ` Tetsuo Handa
  2017-10-02 14:15                   ` Michael S. Tsirkin
  0 siblings, 2 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 13:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> (Reducing recipients in the hope of not being filtered at the servers.)
> 
> Michal Hocko wrote:
> > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > > Tetsuo Handa wrote:
> > > > > > > Michael S. Tsirkin wrote:
> > > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > > Hello.
> > > > > > > > > 
> > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > > 
> > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > > at leak_balloon().
> > > > 
> > > > This is really nasty! And I would argue that this is an abuse of the oom
> > > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > > on its own but all its users have to be really careful to not depend on
> > > > any allocation request because that is a straight deadlock situation.
> > > 
> > > > I do not think that making oom notifier API more complex is the way to
> > > > go. Can we simply change the lock to try_lock?
> > > 
> > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > which will fail to make progress due to oom_lock already held. Therefore,
> > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > 
> > Ohh, I missed your point and thought the dependency is indirect and some
> > other call path is allocating while holding the lock. But you seem to be
> > right and
> > leak_balloon
> >   tell_host
> >     virtqueue_add_outbuf
> >       virtqueue_add
> > 
> > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > try to allocate while we are in the OOM path. Michael, is there any way
> > to drop this?
> 
> Michael already said
> 
>   That would be tricky to fix. I guess we'll need to drop the lock
>   while allocating memory - not an easy fix.

We are OOM, we cannot allocate _any_ memory! This is just broken.

> and I think that it would be possible for virtio to locally offload
> virtballoon_oom_notify() using this patch's approach, if you don't like
> globally offloading at the OOM notifier API level.

Even if the allocation is offloaded to a different context we are still
OOM, and we would have to block waiting for it, which is just error prone.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:13                 ` Michal Hocko
@ 2017-10-02 13:52                   ` Tetsuo Handa
  2017-10-02 14:23                     ` Michael S. Tsirkin
  2017-10-02 14:15                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 13:52 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > I do not think that making oom notifier API more complex is the way to
> > > > > go. Can we simply change the lock to try_lock?
> > > > 
> > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > 
> > > Ohh, I missed your point and thought the dependency is indirect and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Michael already said
> > 
> >   That would be tricky to fix. I guess we'll need to drop the lock
> >   while allocating memory - not an easy fix.
> 
> We are OOM, we cannot allocate _any_ memory! This is just broken.
> 
> > and I think that it would be possible for virtio to locally offload
> > virtballoon_oom_notify() using this patch's approach, if you don't like
> > globally offloading at the OOM notifier API level.
> 
> Even if the allocation is offloaded to a different context we are still
> OOM, and we would have to block waiting for it, which is just error prone.

As I comment below, I assume that this deadlock is rare in the first
place. Since the GFP_KERNEL allocation is conditional, we might be able
to avoid the allocation from virtballoon_oom_notify() entirely.

Michael S. Tsirkin wrote:
> > @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
> >   */
> >  bool out_of_memory(struct oom_control *oc)
> >  {
> > -	unsigned long freed = 0;
> >  	enum oom_constraint constraint = CONSTRAINT_NONE;
> >  
> >  	if (oom_killer_disabled)
> >  		return false;
> >  
> > -	if (!is_memcg_oom(oc)) {
> > -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > -		if (freed > 0)
> > +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> > +		oom_notifier_requested = true;
> > +		wake_up(&oom_notifier_request_wait);
> > +		wait_event_timeout(oom_notifier_response_wait,
> > +				   !oom_notifier_requested, 5 * HZ);
> 
> I guess this means what was earlier a deadlock will free up after 5
> seconds,

Yes.

>          but a 5 sec downtime is still a lot, isn't it?

This timeout is unlikely to expire. Please note that this offloading is
intended to handle the worst-case scenario, that is, "out_of_memory() is
called while somebody is already holding the vb->balloon_lock mutex" combined
with "a GFP_KERNEL allocation is attempted from virtballoon_oom_notify()".

As far as I know, this lock is held only while fill_balloon() or leak_balloon()
is running. The majority of OOM events call out_of_memory() without holding
this lock. Thus, "out_of_memory() is called while somebody is already holding
vb->balloon_lock" should be rare in the first place.

If this deadlock can be triggered artificially (i.e. a user-triggerable OOM
DoS), a patch fixing this problem would need to be backported to
older/distributor kernels...

Yes, conditional GFP_KERNEL allocation attempt from virtqueue_add() might
still cause this deadlock. But that depends on whether you can trigger this
deadlock. As far as I know, there is no report. Thus, I think that avoiding
theoretical deadlock using timeout will be sufficient.

> 
> 
> > +		if (oom_notifier_freed) {
> > +			oom_notifier_freed = 0;
> >  			/* Got some memory back in the last second. */
> >  			return true;
> > +		}
> >  	}
> >  
> >  	/*
> > -- 
> > 1.8.3.1
> 
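
For completeness, the kthread side that would pair with the hunk quoted
above would look roughly as follows (my sketch of the RFC's intent; the
variable names follow the hunk, the loop body is a reconstruction):

  static int oom_notifier_thread(void *unused)
  {
          while (!kthread_should_stop()) {
                  unsigned long freed = 0;

                  wait_event(oom_notifier_request_wait,
                             oom_notifier_requested ||
                             kthread_should_stop());
                  if (!oom_notifier_requested)
                          continue;
                  /* Run the possibly-blocking callbacks outside oom_lock. */
                  blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
                  oom_notifier_freed += freed;
                  oom_notifier_requested = false;
                  wake_up(&oom_notifier_response_wait);
          }
          return 0;
  }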


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 11:50             ` Michal Hocko
@ 2017-10-02 14:11                 ` Michael S. Tsirkin
  2017-10-02 14:11               ` Michael S. Tsirkin
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
  2 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > [Hmm, I do not see the original patch which this has been a reply to]
> > 
> > urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
> > and the rest got "Recipient address rejected: Greylisted" or
> > "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
> > and were ultimately dropped at the servers. Sad...
> > 
> > > 
> > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > Hello.
> > > > > > > > 
> > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > 
> > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > at leak_balloon().
> > > 
> > > This is really nasty! And I would argue that this is an abuse of the oom
> > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > on its own but all its users have to be really careful to not depend on
> > > any allocation request because that is a straight deadlock situation.
> > 
> > Please document such a warning at the
> > "int register_oom_notifier(struct notifier_block *nb)" definition.
> 
> Yes, we can and should do that. Although I would prefer to simply
> document this API as deprecated. Care to send a patch? I am quite busy
> with other stuff.
> 
> > > I do not think that making oom notifier API more complex is the way to
> > > go. Can we simply change the lock to try_lock?
> > 
> > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > which will fail to make progress due to oom_lock already held. Therefore,
> > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > made (directly or indirectly) from that path uses GFP_NOWAIT.
> 
> Ohh, I missed your point and thought the dependency is indirect

I do think this is the case. See below.


> and some
> other call path is allocating while holding the lock. But you seem to be
> right and
> leak_balloon
>   tell_host
>     virtqueue_add_outbuf
>       virtqueue_add
> 
> can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> try to allocate while we are in the OOM path. Michael, is there any way
> to drop this?

Yes - in practice it won't ever allocate - that path is never taken
with add_outbuf - it is for add_sgs only.
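
To spell it out, the 4.13-era virtqueue_add() only consults gfp when it
goes indirect, approximately:

          /* If the host supports indirect descriptor tables, and we
           * have multiple buffers, then go indirect. */
          if (vq->indirect && total_sg > 1 && vq->vq.num_free)
                  desc = alloc_indirect(_vq, total_sg, gfp);

and alloc_indirect() is the only consumer of the gfp argument. The
balloon's tell_host() submits a single scatterlist entry, so total_sg == 1
and the allocation is never attempted.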

IMHO the issue is balloon inflation which needs to allocate
memory. It does it under a mutex, and oom handler tries to take the
same mutex.


> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 13:52                   ` Tetsuo Handa
@ 2017-10-02 14:15                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:15 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, linux-mm

On Mon, Oct 02, 2017 at 03:13:30PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > (Reducing recipients in the hope of not being filtered at the servers.)
> > 
> > Michal Hocko wrote:
> > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > > > Tetsuo Handa wrote:
> > > > > > > > Michael S. Tsirkin wrote:
> > > > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > > > Hello.
> > > > > > > > > > 
> > > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > > > 
> > > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > > > at leak_balloon().
> > > > > 
> > > > > This is really nasty! And I would argue that this is an abuse of the oom
> > > > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > > > on its own but all its users have to be really careful to not depend on
> > > > > any allocation request because that is a straight deadlock situation.
> > > > 
> > > > > I do not think that making oom notifier API more complex is the way to
> > > > > go. Can we simply change the lock to try_lock?
> > > > 
> > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > 
> > > Ohh, I missed your point and thought the dependency is indirect and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Michael already said
> > 
> >   That would be tricky to fix. I guess we'll need to drop the lock
> >   while allocating memory - not an easy fix.
> 
> We are OOM, we cannot allocate _any_ memory! This is just broken.

I think we don't. What allocates memory is fill_balloon only.
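
That is, approximately (4.13-era driver, from memory; error handling
trimmed):

  static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
  {
          unsigned num_allocated_pages;

          /* We can only do one array worth at a time. */
          num = min(num, ARRAY_SIZE(vb->pfns));

          mutex_lock(&vb->balloon_lock);
          for (vb->num_pfns = 0; vb->num_pfns < num;
               vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
                  /* balloon_page_enqueue() is where alloc_page() happens,
                   * i.e. a __GFP_DIRECT_RECLAIM allocation under the mutex */
                  struct page *page = balloon_page_enqueue(&vb->vb_dev_info);

                  if (!page)
                          break;
                  set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
                  vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
          }

          num_allocated_pages = vb->num_pfns;
          if (vb->num_pfns != 0)
                  tell_host(vb, vb->inflate_vq);
          mutex_unlock(&vb->balloon_lock);

          return num_allocated_pages;
  }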



> > and I think that it would be possible for virtio to locally offload
> > virtballoon_oom_notify() using this patch's approach, if you don't like
> > globally offloading at the OOM notifier API level.
> 
> Even if the allocation is offloaded to a different context we are sill
> OOM and we would have to block waiting for it which is just error prone.
> 
> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
@ 2017-10-02 14:19                   ` Michal Hocko
  -1 siblings, 0 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 14:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
[...]
> > and some
> > other call path is allocating while holding the lock. But you seem to be
> > right and
> > leak_balloon
> >   tell_host
> >     virtqueue_add_outbuf
> >       virtqueue_add
> > 
> > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > try to allocate while we are in the OOM path. Michael, is there any way
> > to drop this?
> 
> Yes - in practice it won't ever allocate - that path is never taken
> with add_outbuf - it is for add_sgs only.
> 
> IMHO the issue is balloon inflation which needs to allocate
> memory. It does it under a mutex, and oom handler tries to take the
> same mutex.

try_lock for the oom notifier path should heal the problem then, right?
At least as a quick fix.
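
Something along these lines (a sketch only; in the current driver the
mutex is taken inside leak_balloon(), so this assumes a hypothetical
variant that expects the lock to be held):

  static int virtballoon_oom_notify(struct notifier_block *self,
                                    unsigned long dummy, void *parm)
  {
          struct virtio_balloon *vb = container_of(self,
                                          struct virtio_balloon, nb);
          unsigned long *freed = parm;

          if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
                  return NOTIFY_OK;

          /* Back off instead of deadlocking against fill_balloon(). */
          if (!mutex_trylock(&vb->balloon_lock))
                  return NOTIFY_OK;

          *freed += leak_balloon_locked(vb, oom_pages);   /* hypothetical */
          update_balloon_size(vb);
          mutex_unlock(&vb->balloon_lock);

          return NOTIFY_OK;
  }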
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:52                   ` Tetsuo Handa
@ 2017-10-02 14:23                     ` Michael S. Tsirkin
  2017-10-02 14:44                       ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:23 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, linux-mm

On Mon, Oct 02, 2017 at 10:52:55PM +0900, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > > I do not think that making oom notifier API more complex is the way to
> > > > > > go. Can we simply change the lock to try_lock?
> > > > > 
> > > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > > 
> > > > Ohh, I missed your point and thought the dependency is indirect and some
> > > > other call path is allocating while holding the lock. But you seem to be
> > > > right and
> > > > leak_balloon
> > > >   tell_host
> > > >     virtqueue_add_outbuf
> > > >       virtqueue_add
> > > > 
> > > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > > try to allocate while we are in the OOM path. Michael, is there any way
> > > > to drop this?
> > > 
> > > Michael already said
> > > 
> > >   That would be tricky to fix. I guess we'll need to drop the lock
> > >   while allocating memory - not an easy fix.
> > 
> > We are OOM, we cannot allocate _any_ memory! This is just broken.
> > 
> > > and I think that it would be possible for virtio to locally offload
> > > virtballoon_oom_notify() using this patch's approach, if you don't like
> > > globally offloading at the OOM notifier API level.
> > 
> > Even if the allocation is offloaded to a different context we are still
> > OOM, and we would have to block waiting for it, which is just error prone.
> 
> As I comment below, I assume that this deadlock is rare in the first
> place. Since the GFP_KERNEL allocation is conditional, we might be able
> to avoid the allocation from virtballoon_oom_notify() entirely.
> 
> Michael S. Tsirkin wrote:
> > > @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
> > >   */
> > >  bool out_of_memory(struct oom_control *oc)
> > >  {
> > > -	unsigned long freed = 0;
> > >  	enum oom_constraint constraint = CONSTRAINT_NONE;
> > >  
> > >  	if (oom_killer_disabled)
> > >  		return false;
> > >  
> > > -	if (!is_memcg_oom(oc)) {
> > > -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > -		if (freed > 0)
> > > +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> > > +		oom_notifier_requested = true;
> > > +		wake_up(&oom_notifier_request_wait);
> > > +		wait_event_timeout(oom_notifier_response_wait,
> > > +				   !oom_notifier_requested, 5 * HZ);
> > 
> > I guess this means what was earlier a deadlock will free up after 5
> > seconds,
> 
> Yes.
> 
> >          but a 5 sec downtime is still a lot, isn't it?
> 
> This timeout is unlikely to expire. Please note that this offloading is
> intended to handle the worst-case scenario, that is, "out_of_memory() is
> called while somebody is already holding the vb->balloon_lock mutex" combined
> with "a GFP_KERNEL allocation is attempted from virtballoon_oom_notify()".
> 
> As far as I know, this lock is held only while fill_balloon() or leak_balloon()
> is running. The majority of OOM events call out_of_memory() without holding
> this lock. Thus, "out_of_memory() is called while somebody is already holding
> vb->balloon_lock" should be rare in the first place.
> 
> If this deadlock can be triggered artificially (i.e. a user-triggerable OOM
> DoS), a patch fixing this problem would need to be backported to
> older/distributor kernels...
> 
> Yes, conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> still cause this deadlock. But that depends on whether you can trigger this
> deadlock. As far as I know, there is no report. Thus, I think that avoiding
> theoretical deadlock using timeout will be sufficient.


So first of all IMHO GFP_KERNEL allocations do not happen in
virtqueue_add_outbuf at all. They only trigger through add_sgs.

IMHO this is an API bug, we should just drop the gfp parameter
from this API.


So the issue is balloon_page_enqueue only.
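
For reference, balloon_page_enqueue() opens with the allocation
(4.13-era mm/balloon_compaction.c, from memory):

  struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
  {
          unsigned long flags;
          struct page *page = alloc_page(balloon_mapping_gfp_mask() |
                                         __GFP_NOMEMALLOC | __GFP_NORETRY);
          if (!page)
                  return NULL;

          /* ... inserts the page into b_dev_info->pages under a lock ... */
  }

which is exactly the GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC |
__GFP_NORETRY attempt from the original report.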


> > 
> > 
> > > +		if (oom_notifier_freed) {
> > > +			oom_notifier_freed = 0;
> > >  			/* Got some memory back in the last second. */
> > >  			return true;
> > > +		}
> > >  	}
> > >  
> > >  	/*
> > > -- 
> > > 1.8.3.1
> > 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:19                   ` [RFC] [PATCH] mm, oom: " Michal Hocko
@ 2017-10-02 14:29                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon, Oct 02, 2017 at 04:19:00PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> > On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> [...]
> > > and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Yes - in practice it won't ever allocate - that path is never taken
> > with add_outbuf - it is for add_sgs only.
> > 
> > IMHO the issue is balloon inflation which needs to allocate
> > memory. It does it under a mutex, and oom handler tries to take the
> > same mutex.
> 
> try_lock for the oom notifier path should heal the problem then, right?
> At least as a quick fix.

IMHO it definitely fixes the deadlock. But it does not fix the bug
that the balloon sometimes isn't deflated on OOM even though the
deflate-on-oom flag is set.

> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:29                     ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
@ 2017-10-02 14:31                     ` Michal Hocko
  -1 siblings, 0 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 14:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon 02-10-17 17:29:47, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2017 at 04:19:00PM +0200, Michal Hocko wrote:
> > On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> > > On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> > [...]
> > > > and some
> > > > other call path is allocating while holding the lock. But you seem to be
> > > > right and
> > > > leak_balloon
> > > >   tell_host
> > > >     virtqueue_add_outbuf
> > > >       virtqueue_add
> > > > 
> > > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > > try to allocate while we are in the OOM path. Michael, is there any way
> > > > to drop this?
> > > 
> > > Yes - in practice it won't ever allocate - that path is never taken
> > > with add_outbuf - it is for add_sgs only.
> > > 
> > > IMHO the issue is balloon inflation which needs to allocate
> > > memory. It does it under a mutex, and oom handler tries to take the
> > > same mutex.
> > 
> > try_lock for the oom notifier path should heal the problem then, right?
> > At least as a quick fix.
> 
> IMHO it definitely fixes the deadlock. But it does not fix the bug
> that the balloon sometimes isn't deflated on OOM even though the
> deflate-on-oom flag is set.

Yes, that would require a more sophisticated fix. And I would argue it
would require dropping the oom handler and moving the deflate logic to a
shrinker, to scale better with memory pressure rather than acting at the
very last moment.
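
I.e. something of this shape (a sketch; the shrinker field and the
deflate_balloon_pages() helper are hypothetical):

  static unsigned long virtio_balloon_shrink_count(struct shrinker *s,
                                                   struct shrink_control *sc)
  {
          struct virtio_balloon *vb = container_of(s,
                                          struct virtio_balloon, shrinker);

          /* How many pages we could hand back to the guest. */
          return vb->num_pages;
  }

  static unsigned long virtio_balloon_shrink_scan(struct shrinker *s,
                                                  struct shrink_control *sc)
  {
          struct virtio_balloon *vb = container_of(s,
                                          struct virtio_balloon, shrinker);

          /* Deflate in proportion to reclaim pressure, well before OOM. */
          return deflate_balloon_pages(vb, sc->nr_to_scan);
  }

  /* at probe time: */
  vb->shrinker.count_objects = virtio_balloon_shrink_count;
  vb->shrinker.scan_objects  = virtio_balloon_shrink_scan;
  vb->shrinker.seeks = DEFAULT_SEEKS;
  register_shrinker(&vb->shrinker);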
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:23                     ` Michael S. Tsirkin
@ 2017-10-02 14:44                       ` Tetsuo Handa
  2017-10-07 11:30                         ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 14:44 UTC (permalink / raw)
  To: mst; +Cc: mhocko, linux-mm

Michael S. Tsirkin wrote:
> > Yes, a conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> > still cause this deadlock. But that depends on whether you can trigger this
> > deadlock. As far as I know, there is no report. Thus, I think that avoiding
> > the theoretical deadlock using a timeout will be sufficient.
> 
> 
> So first of all IMHO GFP_KERNEL allocations do not happen in
> virtqueue_add_outbuf at all. They only trigger through add_sgs.

I did not notice that total_sg == 1 is true for virtqueue_add_outbuf().
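
(For reference, abridged from drivers/virtio/virtio_ring.c as I read it;
the gfp argument is only consumed when an indirect descriptor table is
built, and that path is skipped when total_sg == 1:)

	/* virtqueue_add(), abridged */
	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
		desc = alloc_indirect(_vq, total_sg, gfp); /* only gfp user */
	else
		desc = NULL;                               /* no allocation */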

> 
> IMHO this is an API bug, we should just drop the gfp parameter
> from this API.

OK.

> 
> 
> so the issue is balloon_page_enqueue only.
> 

Since you explained that there is "the deflate on OOM flag", we don't
want to skip deflating upon lock contention.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:44                       ` Tetsuo Handa
@ 2017-10-07 11:30                         ` Tetsuo Handa
  2017-10-09  7:46                           ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-07 11:30 UTC (permalink / raw)
  To: mst; +Cc: mhocko, linux-mm

Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > > Yes, a conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> > > still cause this deadlock. But that depends on whether you can trigger this
> > > deadlock. As far as I know, there is no report. Thus, I think that avoiding
> > > the theoretical deadlock using a timeout will be sufficient.
> > 
> > 
> > So first of all IMHO GFP_KERNEL allocations do not happen in
> > virtqueue_add_outbuf at all. They only trigger through add_sgs.
> 
> I did not notice that total_sg == 1 is true for virtqueue_add_outbuf().
> 
> > 
> > IMHO this is an API bug, we should just drop the gfp parameter
> > from this API.
> 
> OK.
> 
> > 
> > 
> > so the issue is balloon_page_enqueue only.
> > 
> 
> Since you explained that there is "the deflate on OOM flag", we don't
> want to skip deflating upon lock contention.
> 

I tested the virtballoon_oom_notify() path using artificial stress
(with an inverted virtio_has_feature(VIRTIO_BALLOON_F_DEFLATE_ON_OOM) check,
because the QEMU I have does not seem to support the deflate-on-oom option),
but I could not trigger the OOM lockup. Thus, I think that we don't need
to worry about

  IMHO it definitely fixes the deadlock. But it does not fix the bug
  that the balloon sometimes isn't deflated on oom even though the
  deflate-on-oom flag is set.

because this is not easy to trigger.

Even if we make sure that the balloon is guaranteed to deflate on oom by
doing the memory allocation for inflation outside of balloon_lock, we would
after all have to inflate the balloon back to the size the host has
requested, by OOM-killing processes in the guest, wouldn't we?

Then, I think that skipping deflation upon lock contention is acceptable
for now. Below is a patch. What do you think?



From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 7 Oct 2017 19:29:21 +0900
Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
being specified, this allocation attempt might depend on somebody else's
__GFP_DIRECT_RECLAIM memory allocation. And such __GFP_DIRECT_RECLAIM
memory allocation might call leak_balloon() via virtballoon_oom_notify()
via blocking_notifier_call_chain() callback via out_of_memory() when it
reached __alloc_pages_may_oom() and held oom_lock mutex. Since
vb->balloon_lock mutex is already held by fill_balloon(), it will cause
OOM lockup. Thus, do not wait for vb->balloon_lock mutex if
leak_balloon() is called from out_of_memory().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/virtio/virtio_balloon.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..7dbacfb 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
 {
 	unsigned num_freed_pages;
 	struct page *page;
@@ -202,7 +202,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
-	mutex_lock(&vb->balloon_lock);
+	if (wait)
+		mutex_lock(&vb->balloon_lock);
+	else if (!mutex_trylock(&vb->balloon_lock))
+		return 0;
 	/* We can't release more pages than taken */
 	num = min(num, (size_t)vb->num_pages);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -367,7 +370,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+	num_freed_pages = leak_balloon(vb, oom_pages, false);
 	update_balloon_size(vb);
 	*freed += num_freed_pages;
 
@@ -395,7 +398,7 @@ static void update_balloon_size_func(struct work_struct *work)
 	if (diff > 0)
 		diff -= fill_balloon(vb, diff);
 	else if (diff < 0)
-		diff += leak_balloon(vb, -diff);
+		diff += leak_balloon(vb, -diff, true);
 	update_balloon_size(vb);
 
 	if (diff)
@@ -597,7 +600,7 @@ static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, true);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-07 11:30                         ` Tetsuo Handa
@ 2017-10-09  7:46                           ` Michal Hocko
  2017-10-09  8:06                             ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09  7:46 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
[...]
> From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 7 Oct 2017 19:29:21 +0900
> Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> 
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> being specified, this allocation attempt might depend on somebody else's
> __GFP_DIRECT_RECLAIM memory allocation.

What would that dependency look like? Is the holder of the lock doing
only __GFP_NORETRY?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09  7:46                           ` Michal Hocko
@ 2017-10-09  8:06                             ` Tetsuo Handa
  2017-10-09 12:28                               ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-09  8:06 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> [...]
> > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > 
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > being specified, this allocation attempt might depend on somebody else's
> > __GFP_DIRECT_RECLAIM memory allocation.
> 
> What would that dependency look like? Is the holder of the lock doing
> only __GFP_NORETRY?

__GFP_NORETRY makes a difference only after a reclaim attempt has failed.

A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
(e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
can call the OOM notifier callback and try to hold balloon_lock at
leak_balloon(), which fill_balloon() has already held before doing the
GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
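
For reference, this follows from how the flags are composed in
include/linux/gfp.h (simplified; the cast annotations are omitted):

  #define __GFP_RECLAIM        (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
  #define GFP_USER             (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
  #define GFP_HIGHUSER         (GFP_USER | __GFP_HIGHMEM)
  #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)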

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09  8:06                             ` Tetsuo Handa
@ 2017-10-09 12:28                               ` Michal Hocko
  2017-10-09 13:31                                 ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09 12:28 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > [...]
> > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > 
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > being specified, this allocation attempt might depend on somebody else's
> > > __GFP_DIRECT_RECLAIM memory allocation.
> > 
> > How would that dependency look like? Is the holder of the lock doing
> > only __GFP_NORETRY?
> 
> __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> 
> A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> can call the OOM notifier callback and try to hold balloon_lock at
> leak_balloon(), which fill_balloon() has already held before doing the
> GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.

OK, so let me decipher.
 Thread1                               Thread2                                       Thread3
 alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
   out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
     blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
       leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
         balloon_lock #2                         direct_reclaim (__GFP_FS context)
                                                   fs_lock #1

in other words, let's make the description understandable even for
somebody not really familiar with the allocation&reclaim internals.
The whole point is that the dependency is indirect and requires more
actors, so an example call graph should be easier to follow.

One more nit. If there is a way to estimate how much memory the notifier
could free when the trylock succeeds, I would print that value for
debugging purposes.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 12:28                               ` Michal Hocko
@ 2017-10-09 13:31                                 ` Tetsuo Handa
  2017-10-09 13:37                                   ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-09 13:31 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > [...]
> > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > 
> > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > being specified, this allocation attempt might depend on somebody else's
> > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > 
> > > What would that dependency look like? Is the holder of the lock doing
> > > only __GFP_NORETRY?
> > 
> > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > 
> > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > can call the OOM notifier callback and try to hold balloon_lock at
> > leak_balloon(), which fill_balloon() has already held before doing the
> > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> 
> OK, so let me decipher.
>  Thread1                               Thread2                                       Thread3
>  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
>    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
>      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
>        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
>          balloon_lock #2                         direct_reclaim (__GFP_FS context)
>                                                    fs_lock #1
> 
> in other words, let's make the description understandable even for
> somebody not really familiar with the allocation&reclaim internals.
> The whole point is that the dependency is indirect and requires more
> actors, so an example call graph should be easier to follow.


Yes. But it is simpler: only two threads are needed.

  Thread1                                       Thread2
    fill_balloon
      balloon_lock #1
      balloon_page_enqueue
        alloc_page(GFP_HIGHUSER_MOVABLE)
          direct reclaim (__GFP_FS context)       fs lock #2
            fs lock #2                              alloc_page(GFP_NOFS)
                                                      __alloc_pages_may_oom()
                                                        oom_lock
                                                        out_of_memory()
                                                          blocking_notifier_call_chain()
                                                            leak_balloon
                                                              balloon_lock #1     # dead lock

And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
retrying forever because oom_lock is held by Thread2.

> 
> One more nit. If there is a way to estimate how much memory the notifier
> could free when the trylock succeeds, I would print that value for
> debugging purposes.

I don't know the internals of virtio-balloon.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 13:31                                 ` Tetsuo Handa
@ 2017-10-09 13:37                                   ` Michal Hocko
  2017-10-09 14:24                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09 13:37 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 09-10-17 22:31:18, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > > [...]
> > > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > > 
> > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > > being specified, this allocation attempt might depend on somebody else's
> > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > 
> > > > What would that dependency look like? Is the holder of the lock doing
> > > > only __GFP_NORETRY?
> > > 
> > > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > > 
> > > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > > can call the OOM notifier callback and try to hold balloon_lock at
> > > leak_balloon(), which fill_balloon() has already held before doing the
> > > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> > 
> > OK, so let me decipher.
> >  Thread1                               Thread2                                       Thread3
> >  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
> >    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
> >      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
> >        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
> >          balloon_lock #2                         direct_reclaim (__GFP_FS context)
> >                                                    fs_lock #1
> > 
> > in other words, let's make the description understandable even for
> > somebody not really familiar with the allocation&reclaim internals.
> > The whole point is that the dependency is indirect and requires more
> > actors, so an example call graph should be easier to follow.
> 
> 
> Yes. But it is simpler: only two threads are needed.
> 
>   Thread1                                       Thread2
>     fill_balloon
>       balloon_lock #1
>       balloon_page_enqueue
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)       fs lock #2
>             fs lock #2                              alloc_page(GFP_NOFS)
>                                                       __alloc_pages_may_oom()
>                                                         oom_lock
>                                                         out_of_memory()
>                                                           blocking_notifier_call_chain()
>                                                             leak_balloon
>                                                               balloon_lock #1     # dead lock

Oh, right. I forgot that oom notifiers are allowed from NOFS context.
 
> And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
> retrying forever because oom_lock is held by Thread2.
> 
> > 
> > One more nit. If there is a way to estimate how much memory the notifier
> > could free when the trylock succeeds, I would print that value for
> > debugging purposes.
> 
> I don't know the internals of virtio-balloon.

Maybe Michael can help here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 13:37                                   ` Michal Hocko
@ 2017-10-09 14:24                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-09 14:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, linux-mm

On Mon, Oct 09, 2017 at 03:37:34PM +0200, Michal Hocko wrote:
> On Mon 09-10-17 22:31:18, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > > > [...]
> > > > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > > > 
> > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > > > being specified, this allocation attempt might depend on somebody else's
> > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > 
> > > > > What would that dependency look like? Is the holder of the lock doing
> > > > > only __GFP_NORETRY?
> > > > 
> > > > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > > > 
> > > > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > > > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > > > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > > > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > > > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > > > can call the OOM notifier callback and try to hold balloon_lock at
> > > > leak_balloon(), which fill_balloon() has already held before doing the
> > > > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> > > 
> > > OK, so let me decipher.
> > >  Thread1                               Thread2                                       Thread3
> > >  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
> > >    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
> > >      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
> > >        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
> > >          balloon_lock #2                         direct_reclaim (__GFP_FS context)
> > >                                                    fs_lock #1
> > > 
> > > in other words, let's make the description understandable even for
> > > somebody not really familiar with the allocation&reclaim internals.
> > > The whole point is that the dependency is indirect and requires more
> > > actors, so an example call graph should be easier to follow.
> > 
> > 
> > Yes. But it is simpler: only two threads are needed.
> > 
> >   Thread1                                       Thread2
> >     fill_balloon
> >       balloon_lock #1
> >       balloon_page_enqueue
> >         alloc_page(GFP_HIGHUSER_MOVABLE)
> >           direct reclaim (__GFP_FS context)       fs lock #2
> >             fs lock #2                              alloc_page(GFP_NOFS)
> >                                                       __alloc_pages_may_oom()
> >                                                         oom_lock
> >                                                         out_of_memory()
> >                                                           blocking_notifier_call_chain()
> >                                                             leak_balloon
> >                                                               balloon_lock #1     # dead lock
> 
> Oh, right. I forgot that oom notifiers are allowed from NOFS context.
>  
> > And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
> > retrying forever because oom_lock is held by Thread2.
> > 
> > > 
> > > One more nit. If there is a way to estimate how much memory the notifier
> > > could free when the trylock succeeds, I would print that value for
> > > debugging purposes.
> > 
> > I don't know the internals of virtio-balloon.
> 
> Maybe Michael can help here.

num_pages is a way to estimate that.
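
As an illustration only (an untested sketch on top of Tetsuo's patch; the
unlocked read of vb->num_pages is racy but good enough for debugging):

	else if (!mutex_trylock(&vb->balloon_lock)) {
		/* Rough upper bound on what deflation could have freed. */
		pr_warn("virtio_balloon: OOM deflate skipped, %u pages held\n",
			vb->num_pages);
		return 0;
	}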

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread

Thread overview: 38+ messages
2017-09-11 10:27 mm, virtio: possible OOM lockup at virtballoon_oom_notify() Tetsuo Handa
2017-09-29  4:00 ` Michael S. Tsirkin
2017-09-29  4:00   ` Michael S. Tsirkin
2017-09-29  4:44   ` Tetsuo Handa
2017-10-01  5:44     ` [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread Tetsuo Handa
2017-10-02  3:59       ` [RFC] [PATCH] mm,oom: " Michael S. Tsirkin
2017-10-02  3:59         ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02  9:06         ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02  9:06         ` Michal Hocko
2017-10-02 11:33           ` [RFC] [PATCH] mm, oom: " Tetsuo Handa
2017-10-02 11:50             ` Michal Hocko
2017-10-02 13:05               ` [RFC] [PATCH] mm,oom: " Tetsuo Handa
2017-10-02 13:13                 ` Michal Hocko
2017-10-02 13:52                   ` Tetsuo Handa
2017-10-02 14:23                     ` Michael S. Tsirkin
2017-10-02 14:44                       ` Tetsuo Handa
2017-10-07 11:30                         ` Tetsuo Handa
2017-10-09  7:46                           ` Michal Hocko
2017-10-09  8:06                             ` Tetsuo Handa
2017-10-09 12:28                               ` Michal Hocko
2017-10-09 13:31                                 ` Tetsuo Handa
2017-10-09 13:37                                   ` Michal Hocko
2017-10-09 14:24                                     ` Michael S. Tsirkin
2017-10-02 14:15                   ` Michael S. Tsirkin
2017-10-02 14:11               ` Michael S. Tsirkin
2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02 14:19                 ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02 14:19                   ` [RFC] [PATCH] mm, oom: " Michal Hocko
2017-10-02 14:29                   ` [RFC] [PATCH] mm,oom: " Michael S. Tsirkin
2017-10-02 14:29                   ` Michael S. Tsirkin
2017-10-02 14:29                     ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02 14:31                     ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02 14:31                     ` Michal Hocko
2017-10-02 14:19                 ` Michal Hocko
2017-10-02 11:50             ` Michal Hocko
2017-10-02  3:59       ` Michael S. Tsirkin
