* mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Tetsuo Handa @ 2017-09-11 10:27 UTC
  To: mst, jasowang; +Cc: linux-mm, virtualization

Hello.

I noticed that virtio_balloon is using register_oom_notifier() and
leak_balloon() from virtballoon_oom_notify() might depend on
__GFP_DIRECT_RECLAIM memory allocation.

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
__GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
__alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
And leak_balloon() is called by virtballoon_oom_notify() via
blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
at leak_balloon().
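
Roughly sketched (simplified; details of the actual call chains elided):

  /* Task A */
  fill_balloon()
    mutex_lock(&vb->balloon_lock);
    alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY);
      /* waits for somebody else's reclaim to make progress */

  /* Task B: some __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation */
  __alloc_pages_may_oom()
    mutex_trylock(&oom_lock);                 /* succeeds */
    out_of_memory()
      blocking_notifier_call_chain()
        virtballoon_oom_notify()
          leak_balloon()
            mutex_lock(&vb->balloon_lock);    /* blocks behind Task A */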

Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
request from leak_balloon() from virtballoon_oom_notify() from
blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
because oom_lock mutex is already held before calling out_of_memory().
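
Sketched, the recursion never completes:

  __alloc_pages_may_oom()
    mutex_trylock(&oom_lock);                 /* taken */
    out_of_memory()
      blocking_notifier_call_chain()
        virtballoon_oom_notify()
          leak_balloon()
            tell_host()
              virtqueue_add_outbuf(..., GFP_KERNEL)
                __alloc_pages_may_oom()
                  mutex_trylock(&oom_lock);   /* always fails: lockup */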

OOM notifier callback should not (directly or indirectly) depend on
__GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?


* Re: mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Michael S. Tsirkin @ 2017-09-29  4:00 UTC
  To: Tetsuo Handa; +Cc: jasowang, virtualization, linux-mm

On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> Hello.
> 
> I noticed that virtio_balloon is using register_oom_notifier() and
> leak_balloon() from virtballoon_oom_notify() might depend on
> __GFP_DIRECT_RECLAIM memory allocation.
> 
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> And leak_balloon() is called by virtballoon_oom_notify() via
> blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> at leak_balloon().

That would be tricky to fix. I guess we'll need to drop the lock
while allocating memory - not an easy fix.

> Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> request from leak_balloon() from virtballoon_oom_notify() from
> blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> because oom_lock mutex is already held before calling out_of_memory().

I guess we should just do

GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
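
For illustration, assuming the tell_host() call site keeps its current
shape, that would be something like:

	virtqueue_add_outbuf(vq, &sg, 1, vb,
			     GFP_KERNEL & ~__GFP_DIRECT_RECLAIM);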


> 
> OOM notifier callback should not (directly or indirectly) depend on
> __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?



* Re: mm, virtio: possible OOM lockup at virtballoon_oom_notify()
From: Tetsuo Handa @ 2017-09-29  4:44 UTC
  To: mst; +Cc: jasowang, virtualization, linux-mm

Michael S. Tsirkin wrote:
> On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > Hello.
> > 
> > I noticed that virtio_balloon is using register_oom_notifier() and
> > leak_balloon() from virtballoon_oom_notify() might depend on
> > __GFP_DIRECT_RECLAIM memory allocation.
> > 
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > And leak_balloon() is called by virtballoon_oom_notify() via
> > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > at leak_balloon().
> 
> That would be tricky to fix. I guess we'll need to drop the lock
> while allocating memory - not an easy fix.
> 
> > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > request from leak_balloon() from virtballoon_oom_notify() from
> > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > because oom_lock mutex is already held before calling out_of_memory().
> 
> I guess we should just do
> 
> GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?

Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
__GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
significantly increase the possibility of memory allocation failure.
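
For reference, with the include/linux/gfp.h definitions:

  #define GFP_KERNEL	(__GFP_RECLAIM | __GFP_IO | __GFP_FS)
  #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)
  /* __GFP_RECLAIM == __GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM */

  GFP_KERNEL & ~__GFP_DIRECT_RECLAIM
    == __GFP_KSWAPD_RECLAIM | __GFP_IO | __GFP_FS
    == GFP_NOWAIT | __GFP_IO | __GFP_FS

where the extra __GFP_IO / __GFP_FS bits are only consulted on direct
reclaim paths.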

> 
> 
> > 
> > OOM notifier callback should not (directly or indirectly) depend on
> > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> 

Another idea would be to use a kernel thread (or workqueue) so that
virtballoon_oom_notify() can wait with timeout.

We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
call to a kernel thread (or workqueue) with timeout if MM folks agree.



* [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Tetsuo Handa @ 2017-10-01  5:44 UTC
  To: mst
  Cc: airlied, jasowang, jiangshanlai, josh, virtualization, linux-mm,
	mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck, intel-gfx

Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > Hello.
> > > 
> > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > __GFP_DIRECT_RECLAIM memory allocation.
> > > 
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > at leak_balloon().
> > 
> > That would be tricky to fix. I guess we'll need to drop the lock
> > while allocating memory - not an easy fix.
> > 
> > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > > request from leak_balloon() from virtballoon_oom_notify() from
> > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > > because oom_lock mutex is already held before calling out_of_memory().
> > 
> > I guess we should just do
> > 
> > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
> 
> Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
> __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
> significantly increase the possibility of memory allocation failure.
> 
> > 
> > 
> > > 
> > > OOM notifier callback should not (directly or indirectly) depend on
> > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> > 
> 
> Another idea would be to use a kernel thread (or workqueue) so that
> virtballoon_oom_notify() can wait with timeout.
> 
> We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
> call to a kernel thread (or workqueue) with timeout if MM folks agree.
> 

Below is a patch which offloads blocking_notifier_call_chain() call. What do you think?
----------------------------------------
[RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.

Since oom_notify_list is traversed via blocking_notifier_call_chain(),
it is legal to sleep inside an OOM notifier callback function.

However, since oom_notify_list is traversed with oom_lock held, a
__GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot
fail (it keeps retrying forever) while oom_notify_list entries are being
traversed. Therefore, an OOM notifier callback function should not
(directly or indirectly) depend on such an allocation attempt.

Currently there are 5 register_oom_notifier() users in the mainline kernel.

  arch/powerpc/platforms/pseries/cmm.c
  arch/s390/mm/cmm.c
  drivers/gpu/drm/i915/i915_gem_shrinker.c
  drivers/virtio/virtio_balloon.c
  kernel/rcu/tree_plugin.h

Among these users, at least virtio_balloon.c has the possibility of OOM
lockup because it uses a mutex which can depend on GFP_KERNEL memory
allocations. (Both cmm.c files seem to be safe as they use spinlocks. I'm
not sure about tree_plugin.h and i915_gem_shrinker.c . Please check.)

But converting such allocations to use GFP_NOWAIT is not only prone to
allocation failures under memory pressure but also difficult to audit,
for it is hard to verify that every location has been converted.

Therefore, this patch offloads the blocking_notifier_call_chain() call to
a dedicated kernel thread and waits for completion with a timeout of 5
seconds, so that we can completely forget about the possibility of OOM
lockup due to an OOM notifier callback function.

(5 seconds is chosen based on my guess that blocking_notifier_call_chain()
should not take long, for we are using mutex_trylock(&oom_lock) at
__alloc_pages_may_oom() based on an assumption that out_of_memory() should
reclaim memory shortly.)

The kernel thread is created upon the first register_oom_notifier() call.
Thus, environments which do not use register_oom_notifier() will not
waste resources on a dedicated kernel thread.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index dee0f75..d9744f7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc,
 }
 
 static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
+static bool oom_notifier_requested;
+static unsigned long oom_notifier_freed;
+static struct task_struct *oom_notifier_th;
+static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait);
+static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait);
+
+static int oom_notifier(void *unused)
+{
+	while (true) {
+		wait_event_freezable(oom_notifier_request_wait,
+				     oom_notifier_requested);
+		blocking_notifier_call_chain(&oom_notify_list, 0,
+					     &oom_notifier_freed);
+		oom_notifier_requested = false;
+		wake_up(&oom_notifier_response_wait);
+	}
+	return 0;
+}
 
 int register_oom_notifier(struct notifier_block *nb)
 {
+	if (!oom_notifier_th) {
+		struct task_struct *th = kthread_run(oom_notifier, NULL,
+						     "oom_notifier");
+
+		if (IS_ERR(th)) {
+			pr_err("Unable to start OOM notifier thread.\n");
+			return (int) PTR_ERR(th);
+		}
+		oom_notifier_th = th;
+	}
 	return blocking_notifier_chain_register(&oom_notify_list, nb);
 }
 EXPORT_SYMBOL_GPL(register_oom_notifier);
@@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
  */
 bool out_of_memory(struct oom_control *oc)
 {
-	unsigned long freed = 0;
 	enum oom_constraint constraint = CONSTRAINT_NONE;
 
 	if (oom_killer_disabled)
 		return false;
 
-	if (!is_memcg_oom(oc)) {
-		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
-		if (freed > 0)
+	if (!is_memcg_oom(oc) && oom_notifier_th) {
+		oom_notifier_requested = true;
+		wake_up(&oom_notifier_request_wait);
+		wait_event_timeout(oom_notifier_response_wait,
+				   !oom_notifier_requested, 5 * HZ);
+		if (oom_notifier_freed) {
+			oom_notifier_freed = 0;
 			/* Got some memory back in the last second. */
 			return true;
+		}
 	}
 
 	/*
-- 
1.8.3.1

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
From: Michael S. Tsirkin @ 2017-10-02  3:59 UTC
  To: Tetsuo Handa
  Cc: jasowang, jani.nikula, joonas.lahtinen, rodrigo.vivi, airlied,
	paulmck, josh, rostedt, mathieu.desnoyers, jiangshanlai,
	virtualization, intel-gfx, linux-mm

On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> Tetsuo Handa wrote:
> > Michael S. Tsirkin wrote:
> > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > Hello.
> > > > 
> > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > 
> > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > at leak_balloon().
> > > 
> > > That would be tricky to fix. I guess we'll need to drop the lock
> > > while allocating memory - not an easy fix.
> > > 
> > > > Also, in leak_balloon(), virtqueue_add_outbuf(GFP_KERNEL) is called via
> > > > tell_host(). Reaching __alloc_pages_may_oom() from this virtqueue_add_outbuf()
> > > > request from leak_balloon() from virtballoon_oom_notify() from
> > > > blocking_notifier_call_chain() from out_of_memory() leads to OOM lockup
> > > > because oom_lock mutex is already held before calling out_of_memory().
> > > 
> > > I guess we should just do
> > > 
> > > GFP_KERNEL & ~__GFP_DIRECT_RECLAIM there then?
> > 
> > Yes, but GFP_KERNEL & ~__GFP_DIRECT_RECLAIM will effectively be GFP_NOWAIT, for
> > __GFP_IO and __GFP_FS won't make sense without __GFP_DIRECT_RECLAIM. It might
> > significantly increase the possibility of memory allocation failure.
> > 
> > > 
> > > 
> > > > 
> > > > OOM notifier callback should not (directly or indirectly) depend on
> > > > __GFP_DIRECT_RECLAIM memory allocation attempt. Can you fix this dependency?
> > > 
> > 
> > Another idea would be to use a kernel thread (or workqueue) so that
> > virtballoon_oom_notify() can wait with timeout.
> > 
> > We could offload entire blocking_notifier_call_chain(&oom_notify_list, 0, &freed)
> > call to a kernel thread (or workqueue) with timeout if MM folks agree.
> > 
> 
> Below is a patch which offloads blocking_notifier_call_chain() call. What do you think?
> ----------------------------------------
> [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
> 
> Since oom_notify_list is traversed via blocking_notifier_call_chain(),
> it is legal to sleep inside an OOM notifier callback function.
> 
> However, since oom_notify_list is traversed with oom_lock held, a
> __GFP_DIRECT_RECLAIM && !__GFP_NORETRY memory allocation attempt cannot
> fail (it keeps retrying forever) while oom_notify_list entries are being
> traversed. Therefore, an OOM notifier callback function should not
> (directly or indirectly) depend on such an allocation attempt.
> 
> Currently there are 5 register_oom_notifier() users in the mainline kernel.
> 
>   arch/powerpc/platforms/pseries/cmm.c
>   arch/s390/mm/cmm.c
>   drivers/gpu/drm/i915/i915_gem_shrinker.c
>   drivers/virtio/virtio_balloon.c
>   kernel/rcu/tree_plugin.h
> 
> Among these users, at least virtio_balloon.c has the possibility of OOM
> lockup because it uses a mutex which can depend on GFP_KERNEL memory
> allocations. (Both cmm.c files seem to be safe as they use spinlocks. I'm
> not sure about tree_plugin.h and i915_gem_shrinker.c . Please check.)
> 
> But converting such allocations to use GFP_NOWAIT is not only prone to
> allocation failures under memory pressure but also difficult to audit,
> for it is hard to verify that every location has been converted.
> 
> Therefore, this patch offloads the blocking_notifier_call_chain() call to
> a dedicated kernel thread and waits for completion with a timeout of 5
> seconds, so that we can completely forget about the possibility of OOM
> lockup due to an OOM notifier callback function.
> 
> (5 seconds is chosen based on my guess that blocking_notifier_call_chain()
> should not take long, for we are using mutex_trylock(&oom_lock) at
> __alloc_pages_may_oom() based on an assumption that out_of_memory() should
> reclaim memory shortly.)
> 
> The kernel thread is created upon the first register_oom_notifier() call.
> Thus, environments which do not use register_oom_notifier() will not
> waste resources on a dedicated kernel thread.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> ---
>  mm/oom_kill.c | 40 ++++++++++++++++++++++++++++++++++++----
>  1 file changed, 36 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index dee0f75..d9744f7 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -981,9 +981,37 @@ static void check_panic_on_oom(struct oom_control *oc,
>  }
>  
>  static BLOCKING_NOTIFIER_HEAD(oom_notify_list);
> +static bool oom_notifier_requested;
> +static unsigned long oom_notifier_freed;
> +static struct task_struct *oom_notifier_th;
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_request_wait);
> +static DECLARE_WAIT_QUEUE_HEAD(oom_notifier_response_wait);
> +
> +static int oom_notifier(void *unused)
> +{
> +	while (true) {
> +		wait_event_freezable(oom_notifier_request_wait,
> +				     oom_notifier_requested);
> +		blocking_notifier_call_chain(&oom_notify_list, 0,
> +					     &oom_notifier_freed);
> +		oom_notifier_requested = false;
> +		wake_up(&oom_notifier_response_wait);
> +	}
> +	return 0;
> +}
>  
>  int register_oom_notifier(struct notifier_block *nb)
>  {
> +	if (!oom_notifier_th) {
> +		struct task_struct *th = kthread_run(oom_notifier, NULL,
> +						     "oom_notifier");
> +
> +		if (IS_ERR(th)) {
> +			pr_err("Unable to start OOM notifier thread.\n");
> +			return (int) PTR_ERR(th);
> +		}
> +		oom_notifier_th = th;
> +	}
>  	return blocking_notifier_chain_register(&oom_notify_list, nb);
>  }
>  EXPORT_SYMBOL_GPL(register_oom_notifier);
> @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
>   */
>  bool out_of_memory(struct oom_control *oc)
>  {
> -	unsigned long freed = 0;
>  	enum oom_constraint constraint = CONSTRAINT_NONE;
>  
>  	if (oom_killer_disabled)
>  		return false;
>  
> -	if (!is_memcg_oom(oc)) {
> -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> -		if (freed > 0)
> +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> +		oom_notifier_requested = true;
> +		wake_up(&oom_notifier_request_wait);
> +		wait_event_timeout(oom_notifier_response_wait,
> +				   !oom_notifier_requested, 5 * HZ);

I guess this means what was earlier a deadlock will free up after 5
seconds, but a 5 sec downtime is still a lot, isn't it?


> +		if (oom_notifier_freed) {
> +			oom_notifier_freed = 0;
>  			/* Got some memory back in the last second. */
>  			return true;
> +		}
>  	}
>  
>  	/*
> -- 
> 1.8.3.1



* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
From: Michal Hocko @ 2017-10-02  9:06 UTC
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

[Hmm, I do not see the original patch which this has been a reply to]

On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > Tetsuo Handa wrote:
> > > Michael S. Tsirkin wrote:
> > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > Hello.
> > > > > 
> > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > 
> > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > at leak_balloon().

This is really nasty! And I would argue that this is an abuse of the oom
notifier interface from the virtio code. OOM notifiers are an ugly hack
on their own, but all their users have to be really careful not to depend on
any allocation request because that is a straight deadlock situation.

I do not think that making the oom notifier API more complex is the way to
go. Can we simply change the lock to try_lock? If the lock is held we
would simply fall back to the normal OOM handling. As a follow-up it
would be great if virtio could use some other form of aging, e.g. a
shrinker.
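
Something like this, for illustration (a rough sketch only; the
__leak_balloon() helper, which would contain the currently locked
section, is made up here):

	/* OOM notifier path: do not wait for balloon_lock */
	static unsigned leak_balloon_oom(struct virtio_balloon *vb, size_t num)
	{
		unsigned num_freed_pages = 0;

		if (mutex_trylock(&vb->balloon_lock)) {
			num_freed_pages = __leak_balloon(vb, num);
			mutex_unlock(&vb->balloon_lock);
		}
		/* 0 means no progress; normal OOM handling follows */
		return num_freed_pages;
	}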
-- 
Michal Hocko
SUSE Labs



* Re: [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Tetsuo Handa @ 2017-10-02 11:33 UTC
  To: mhocko, mst
  Cc: airlied, jasowang, jiangshanlai, josh, virtualization, linux-mm,
	mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck, intel-gfx

Michal Hocko wrote:
> [Hmm, I do not see the original patch which this has been a reply to]

urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
and the rest were rejected with "Recipient address rejected: Greylisted" or
"Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
and the mails were eventually dropped at the servers. Sad...

> 
> On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > Tetsuo Handa wrote:
> > > > Michael S. Tsirkin wrote:
> > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > Hello.
> > > > > > 
> > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > 
> > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > at leak_balloon().
> 
> This is really nasty! And I would argue that this is an abuse of the oom
> notifier interface from the virtio code. OOM notifiers are an ugly hack
> on their own, but all their users have to be really careful not to depend on
> any allocation request because that is a straight deadlock situation.

Please describe such a warning at the
"int register_oom_notifier(struct notifier_block *nb)" definition.

> 
> I do not think that making the oom notifier API more complex is the way to
> go. Can we simply change the lock to try_lock?

Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
mutex, a __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is made
which will fail to make progress because oom_lock is already held. Therefore,
virtballoon_oom_notify() needs to guarantee that all allocation attempts in
its call chain use GFP_NOWAIT.

virtballoon_oom_notify() can guarantee GFP_NOIO using memalloc_noio_{save,restore}()
(which is currently missing because blocking_notifier_call_chain() might be called by
a GFP_NOIO allocation request (e.g. disk_events_workfn)). But there is no
memalloc_nodirectreclaim_{save,restore}() for guaranteeing GFP_NOWAIT is used.
virtballoon_oom_notify() will need to use some flag and switch between GFP_NOWAIT and
GFP_KERNEL based on that flag. I worry that such an approach is prone to oversight.
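
The NOIO part would look roughly like this (a sketch only; the callback
body is reconstructed from memory and the feature check is omitted):

	static int virtballoon_oom_notify(struct notifier_block *self,
					  unsigned long dummy, void *parm)
	{
		struct virtio_balloon *vb = container_of(self,
					struct virtio_balloon, nb);
		unsigned long *freed = parm;
		unsigned int noio_flags = memalloc_noio_save();

		*freed += leak_balloon(vb, oom_pages);
		update_balloon_size(vb);
		memalloc_noio_restore(noio_flags);
		return NOTIFY_OK;
	}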

>                                                If the lock is held we
> would simply fall back to the normal OOM handling. As a follow-up it
> would be great if virtio could use some other form of aging, e.g. a
> shrinker.
> -- 
> Michal Hocko
> SUSE Labs
> 

* Re: [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread.
From: Michal Hocko @ 2017-10-02 11:50 UTC
  To: Tetsuo Handa
  Cc: mst, airlied, jasowang, jiangshanlai, josh, virtualization,
	linux-mm, mathieu.desnoyers, rostedt, rodrigo.vivi, paulmck,
	intel-gfx

On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > [Hmm, I do not see the original patch which this has been a reply to]
> 
> urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
> and the rest were rejected with "Recipient address rejected: Greylisted" or
> "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
> and the mails were eventually dropped at the servers. Sad...
> 
> > 
> > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > Tetsuo Handa wrote:
> > > > > Michael S. Tsirkin wrote:
> > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > Hello.
> > > > > > > 
> > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > 
> > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY being specified,
> > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > at leak_balloon().
> > 
> > This is really nasty! And I would argue that this is an abuse of the oom
> > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > on its own but all its users have to be really careful to not depend on
> > any allocation request because that is a straight deadlock situation.
> 
> Please document such a warning at the
> "int register_oom_notifier(struct notifier_block *nb)" definition.

Yes, we can and should do that. Although I would prefer to simply
document this API as deprecated. Care to send a patch? I am quite busy
with other stuff.

> > I do not think that making oom notifier API more complex is the way to
> > go. Can we simply change the lock to try_lock?
> 
> Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> which will fail to make progress due to oom_lock already held. Therefore,
> virtballoon_oom_notify() needs to guarantee that every allocation attempt
> made (directly or indirectly) from that path uses GFP_NOWAIT.

Ohh, I missed your point and thought the dependency is indirect and some
other call path is allocating while holding the lock. But you seem to be
right and
leak_balloon
  tell_host
    virtqueue_add_outbuf
      virtqueue_add

can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
try to allocate while we are in the OOM path. Michael, is there any way
to drop this?
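
For reference, the path in question looks approximately like this in the
current driver (quoted from memory, not verbatim):

  static void tell_host(struct virtio_balloon *vb, struct virtqueue *vq)
  {
          struct scatterlist sg;
          unsigned int len;

          sg_init_one(&sg, vb->pfns, sizeof(vb->pfns[0]) * vb->num_pfns);

          /* This GFP_KERNEL is what ends up inside virtqueue_add(). */
          virtqueue_add_outbuf(vq, &sg, 1, vb, GFP_KERNEL);
          virtqueue_kick(vq);

          /* When the host has read the buffer, balloon_ack() wakes us. */
          wait_event(vb->acked, virtqueue_get_buf(vq, &len));
  }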
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 11:50             ` Michal Hocko
@ 2017-10-02 13:05               ` Tetsuo Handa
  2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 14:11               ` Michael S. Tsirkin
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
  2 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 13:05 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

(Reducing recipients in the hope of not being filtered at the servers.)

Michal Hocko wrote:
> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > Hello.
> > > > > > > > 
> > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > 
> > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > at leak_balloon().
> > > 
> > > This is really nasty! And I would argue that this is an abuse of the oom
> > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > on its own but all its users have to be really careful to not depend on
> > > any allocation request because that is a straight deadlock situation.
> > 
> > > I do not think that making oom notifier API more complex is the way to
> > > go. Can we simply change the lock to try_lock?
> > 
> > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > which will fail to make progress due to oom_lock already held. Therefore,
> > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > made (directly or indirectly) from that path uses GFP_NOWAIT.
> 
> Ohh, I missed your point and thought the dependency is indirect and some
> other call path is allocating while holding the lock. But you seem to be
> right and
> leak_balloon
>   tell_host
>     virtqueue_add_outbuf
>       virtqueue_add
> 
> can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> try to allocate while we are in the OOM path. Michael, is there any way
> to drop this?

Michael already said

  That would be tricky to fix. I guess we'll need to drop the lock
  while allocating memory - not an easy fix.

and I think that it would be possible for virtio to locally offload
virtballoon_oom_notify() using this patch's approach, if you don't like
globally offloading at the OOM notifier API level.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:05               ` [RFC] [PATCH] mm,oom: " Tetsuo Handa
@ 2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 13:52                   ` Tetsuo Handa
  2017-10-02 14:15                   ` Michael S. Tsirkin
  0 siblings, 2 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 13:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> (Reducing recipients in the hope of not being filtered at the servers.)
> 
> Michal Hocko wrote:
> > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > > Tetsuo Handa wrote:
> > > > > > > Michael S. Tsirkin wrote:
> > > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > > Hello.
> > > > > > > > > 
> > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > > 
> > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > > at leak_balloon().
> > > > 
> > > > This is really nasty! And I would argue that this is an abuse of the oom
> > > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > > on its own but all its users have to be really careful to not depend on
> > > > any allocation request because that is a straight deadlock situation.
> > > 
> > > > I do not think that making oom notifier API more complex is the way to
> > > > go. Can we simply change the lock to try_lock?
> > > 
> > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > which will fail to make progress due to oom_lock already held. Therefore,
> > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > 
> > Ohh, I missed your point and thought the dependency is indirect and some
> > other call path is allocating while holding the lock. But you seem to be
> > right and
> > leak_balloon
> >   tell_host
> >     virtqueue_add_outbuf
> >       virtqueue_add
> > 
> > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > try to allocate while we are in the OOM path. Michael, is there any way
> > to drop this?
> 
> Michael already said
> 
>   That would be tricky to fix. I guess we'll need to drop the lock
>   while allocating memory - not an easy fix.

We are OOM, we cannot allocate _any_ memory! This is just broken.

> and I think that it would be possible for virtio to locally offload
> virtballoon_oom_notify() using this patch's approach, if you don't like
> globally offloading at the OOM notifier API level.

Even if the allocation is offloaded to a different context we are still
OOM, and we would have to block waiting for it, which is just error prone.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:13                 ` Michal Hocko
@ 2017-10-02 13:52                   ` Tetsuo Handa
  2017-10-02 14:23                     ` Michael S. Tsirkin
  2017-10-02 14:15                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 13:52 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > I do not think that making oom notifier API more complex is the way to
> > > > > go. Can we simply change the lock to try_lock?
> > > > 
> > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > 
> > > Ohh, I missed your point and thought the dependency is indirect and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Michael already said
> > 
> >   That would be tricky to fix. I guess we'll need to drop the lock
> >   while allocating memory - not an easy fix.
> 
> We are OOM, we cannot allocate _any_ memory! This is just broken.
> 
> > and I think that it would be possible for virtio to locally offload
> > virtballoon_oom_notify() using this patch's approach, if you don't like
> > globally offloading at the OOM notifier API level.
> 
> Even if the allocation is offloaded to a different context we are still
> OOM, and we would have to block waiting for it, which is just error prone.

As I comment below, I assume that this deadlock is rare in the first
place. Since the GFP_KERNEL allocation is conditional, we might be able
to avoid the allocation from virtballoon_oom_notify() entirely.

Michael S. Tsirkin wrote:
> > @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
> >   */
> >  bool out_of_memory(struct oom_control *oc)
> >  {
> > -	unsigned long freed = 0;
> >  	enum oom_constraint constraint = CONSTRAINT_NONE;
> >  
> >  	if (oom_killer_disabled)
> >  		return false;
> >  
> > -	if (!is_memcg_oom(oc)) {
> > -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > -		if (freed > 0)
> > +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> > +		oom_notifier_requested = true;
> > +		wake_up(&oom_notifier_request_wait);
> > +		wait_event_timeout(oom_notifier_response_wait,
> > +				   !oom_notifier_requested, 5 * HZ);
> 
> I guess this means what was earlier a deadlock will free up after 5
> seconds,

Yes.

>          but a 5 sec downtime is still a lot, isn't it?

This timeout is unlikely to expire. Please note that this offloading is
intended to handle the worst-case scenario, that is, "out_of_memory() is
called while somebody is already holding the vb->balloon_lock mutex" combined
with "a GFP_KERNEL allocation is attempted from virtballoon_oom_notify()".

As far as I know, this lock is held only while fill_balloon() or leak_balloon()
is running. The majority of OOM events call out_of_memory() without holding
this lock. Thus, "out_of_memory() is called while somebody is already holding
vb->balloon_lock" should be rare in the first place.

If this deadlock can be triggered artificially (i.e. a user-triggerable OOM
DoS), a patch fixing this problem would need to be backported to
older/distributor kernels...

Yes, conditional GFP_KERNEL allocation attempt from virtqueue_add() might
still cause this deadlock. But that depends on whether you can trigger this
deadlock. As far as I know, there is no report. Thus, I think that avoiding
theoretical deadlock using timeout will be sufficient.

> 
> 
> > +		if (oom_notifier_freed) {
> > +			oom_notifier_freed = 0;
> >  			/* Got some memory back in the last second. */
> >  			return true;
> > +		}
> >  	}
> >  
> >  	/*
> > -- 
> > 1.8.3.1
> 
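
For completeness, the kthread side that would pair with the hunk quoted
above would look roughly as follows (my sketch of the RFC's intent; the
variable names follow the hunk, the loop body is a reconstruction):

  static int oom_notifier_thread(void *unused)
  {
          while (!kthread_should_stop()) {
                  unsigned long freed = 0;

                  wait_event(oom_notifier_request_wait,
                             oom_notifier_requested ||
                             kthread_should_stop());
                  if (!oom_notifier_requested)
                          continue;
                  /* Run the possibly-blocking callbacks outside oom_lock. */
                  blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
                  oom_notifier_freed += freed;
                  oom_notifier_requested = false;
                  wake_up(&oom_notifier_response_wait);
          }
          return 0;
  }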


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 11:50             ` Michal Hocko
@ 2017-10-02 14:11                 ` Michael S. Tsirkin
  2017-10-02 14:11               ` Michael S. Tsirkin
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
  2 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > [Hmm, I do not see the original patch which this has been a reply to]
> > 
> > urbl.hostedemail.com and b.barracudacentral.org blocked my IP address,
> > and the rest got "Recipient address rejected: Greylisted" or
> > "Deferred: 451-4.3.0 Multiple destination domains per transaction is unsupported.",
> > and were ultimately dropped at the servers. Sad...
> > 
> > > 
> > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Michael S. Tsirkin wrote:
> > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > Hello.
> > > > > > > > 
> > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > 
> > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > at leak_balloon().
> > > 
> > > This is really nasty! And I would argue that this is an abuse of the oom
> > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > on its own but all its users have to be really careful to not depend on
> > > any allocation request because that is a straight deadlock situation.
> > 
> > Please document such a warning at the
> > "int register_oom_notifier(struct notifier_block *nb)" definition.
> 
> Yes, we can and should do that. Although I would prefer to simply
> document this API as deprecated. Care to send a patch? I am quite busy
> with other stuff.
> 
> > > I do not think that making oom notifier API more complex is the way to
> > > go. Can we simply change the lock to try_lock?
> > 
> > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > which will fail to make progress due to oom_lock already held. Therefore,
> > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > made (directly or indirectly) from that path uses GFP_NOWAIT.
> 
> Ohh, I missed your point and thought the dependency is indirect

I do think this is the case. See below.


> and some
> other call path is allocating while holding the lock. But you seem to be
> right and
> leak_balloon
>   tell_host
>     virtqueue_add_outbuf
>       virtqueue_add
> 
> can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> try to allocate while we are in the OOM path. Michael, is there any way
> to drop this?

Yes - in practice it won't ever allocate - that path is never taken
with add_outbuf - it is for add_sgs only.
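
To spell it out, the 4.13-era virtqueue_add() only consults gfp when it
goes indirect, approximately:

          /* If the host supports indirect descriptor tables, and we
           * have multiple buffers, then go indirect. */
          if (vq->indirect && total_sg > 1 && vq->vq.num_free)
                  desc = alloc_indirect(_vq, total_sg, gfp);

and alloc_indirect() is the only consumer of the gfp argument. The
balloon's tell_host() submits a single scatterlist entry, so total_sg == 1
and the allocation is never attempted.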

IMHO the issue is balloon inflation which needs to allocate
memory. It does it under a mutex, and oom handler tries to take the
same mutex.


> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:13                 ` Michal Hocko
  2017-10-02 13:52                   ` Tetsuo Handa
@ 2017-10-02 14:15                   ` Michael S. Tsirkin
  1 sibling, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:15 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, linux-mm

On Mon, Oct 02, 2017 at 03:13:30PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > (Reducing recipients in the hope of not being filtered at the servers.)
> > 
> > Michal Hocko wrote:
> > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > On Mon 02-10-17 06:59:12, Michael S. Tsirkin wrote:
> > > > > > On Sun, Oct 01, 2017 at 02:44:34PM +0900, Tetsuo Handa wrote:
> > > > > > > Tetsuo Handa wrote:
> > > > > > > > Michael S. Tsirkin wrote:
> > > > > > > > > On Mon, Sep 11, 2017 at 07:27:19PM +0900, Tetsuo Handa wrote:
> > > > > > > > > > Hello.
> > > > > > > > > > 
> > > > > > > > > > I noticed that virtio_balloon is using register_oom_notifier() and
> > > > > > > > > > leak_balloon() from virtballoon_oom_notify() might depend on
> > > > > > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > > > > > > 
> > > > > > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE] implies
> > > > > > > > > > __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, this allocation attempt might
> > > > > > > > > > depend on somebody else's __GFP_DIRECT_RECLAIM | !__GFP_NORETRY memory
> > > > > > > > > > allocation. Such __GFP_DIRECT_RECLAIM | !__GFP_NORETRY allocation can reach
> > > > > > > > > > __alloc_pages_may_oom() and hold oom_lock mutex and call out_of_memory().
> > > > > > > > > > And leak_balloon() is called by virtballoon_oom_notify() via
> > > > > > > > > > blocking_notifier_call_chain() callback when vb->balloon_lock mutex is already
> > > > > > > > > > held by fill_balloon(). As a result, despite __GFP_NORETRY is specified,
> > > > > > > > > > fill_balloon() can indirectly get stuck waiting for vb->balloon_lock mutex
> > > > > > > > > > at leak_balloon().
> > > > > 
> > > > > This is really nasty! And I would argue that this is an abuse of the oom
> > > > > notifier interface from the virtio code. OOM notifiers are an ugly hack
> > > > > on its own but all its users have to be really careful to not depend on
> > > > > any allocation request because that is a straight deadlock situation.
> > > > 
> > > > > I do not think that making oom notifier API more complex is the way to
> > > > > go. Can we simply change the lock to try_lock?
> > > > 
> > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > 
> > > Ohh, I missed your point and thought the dependency is indirect and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Michael already said
> > 
> >   That would be tricky to fix. I guess we'll need to drop the lock
> >   while allocating memory - not an easy fix.
> 
> We are OOM, we cannot allocate _any_ memory! This is just broken.

I think we don't. What allocates memory is fill_balloon only.
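
That is, approximately (4.13-era driver, from memory; error handling
trimmed):

  static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
  {
          unsigned num_allocated_pages;

          /* We can only do one array worth at a time. */
          num = min(num, ARRAY_SIZE(vb->pfns));

          mutex_lock(&vb->balloon_lock);
          for (vb->num_pfns = 0; vb->num_pfns < num;
               vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) {
                  /* balloon_page_enqueue() is where alloc_page() happens,
                   * i.e. a __GFP_DIRECT_RECLAIM allocation under the mutex */
                  struct page *page = balloon_page_enqueue(&vb->vb_dev_info);

                  if (!page)
                          break;
                  set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
                  vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
          }

          num_allocated_pages = vb->num_pfns;
          if (vb->num_pfns != 0)
                  tell_host(vb, vb->inflate_vq);
          mutex_unlock(&vb->balloon_lock);

          return num_allocated_pages;
  }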



> > and I think that it would be possible for virtio to locally offload
> > virtballoon_oom_notify() using this patch's approach, if you don't like
> > globally offloading at the OOM notifier API level.
> 
> Even if the allocation is offloaded to a different context we are sill
> OOM and we would have to block waiting for it which is just error prone.
> 
> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
@ 2017-10-02 14:19                   ` Michal Hocko
  -1 siblings, 0 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 14:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
[...]
> > and some
> > other call path is allocating while holding the lock. But you seem to be
> > right and
> > leak_balloon
> >   tell_host
> >     virtqueue_add_outbuf
> >       virtqueue_add
> > 
> > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > try to allocate while we are in the OOM path. Michael, is there any way
> > to drop this?
> 
> Yes - in practice it won't ever allocate - that path is never taken
> with add_outbuf - it is for add_sgs only.
> 
> IMHO the issue is balloon inflation which needs to allocate
> memory. It does it under a mutex, and oom handler tries to take the
> same mutex.

try_lock for the oom notifier path should heal the problem then, right?
At least as a quick fix.
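
Something along these lines (a sketch only; in the current driver the
mutex is taken inside leak_balloon(), so this assumes a hypothetical
variant that expects the lock to be held):

  static int virtballoon_oom_notify(struct notifier_block *self,
                                    unsigned long dummy, void *parm)
  {
          struct virtio_balloon *vb = container_of(self,
                                          struct virtio_balloon, nb);
          unsigned long *freed = parm;

          if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
                  return NOTIFY_OK;

          /* Back off instead of deadlocking against fill_balloon(). */
          if (!mutex_trylock(&vb->balloon_lock))
                  return NOTIFY_OK;

          *freed += leak_balloon_locked(vb, oom_pages);   /* hypothetical */
          update_balloon_size(vb);
          mutex_unlock(&vb->balloon_lock);

          return NOTIFY_OK;
  }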
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 13:52                   ` Tetsuo Handa
@ 2017-10-02 14:23                     ` Michael S. Tsirkin
  2017-10-02 14:44                       ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:23 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mhocko, linux-mm

On Mon, Oct 02, 2017 at 10:52:55PM +0900, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 02-10-17 22:05:17, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Mon 02-10-17 20:33:52, Tetsuo Handa wrote:
> > > > > > I do not think that making oom notifier API more complex is the way to
> > > > > > go. Can we simply change the lock to try_lock?
> > > > > 
> > > > > Using mutex_trylock(&vb->balloon_lock) alone is not sufficient. Inside the
> > > > > mutex, __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocation attempt is used
> > > > > which will fail to make progress due to oom_lock already held. Therefore,
> > > > > virtballoon_oom_notify() needs to guarantee that every allocation attempt
> > > > > made (directly or indirectly) from that path uses GFP_NOWAIT.
> > > > 
> > > > Ohh, I missed your point and thought the dependency is indirect and some
> > > > other call path is allocating while holding the lock. But you seem to be
> > > > right and
> > > > leak_balloon
> > > >   tell_host
> > > >     virtqueue_add_outbuf
> > > >       virtqueue_add
> > > > 
> > > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > > try to allocate while we are in the OOM path. Michael, is there any way
> > > > to drop this?
> > > 
> > > Michael already said
> > > 
> > >   That would be tricky to fix. I guess we'll need to drop the lock
> > >   while allocating memory - not an easy fix.
> > 
> > We are OOM, we cannot allocate _any_ memory! This is just broken.
> > 
> > > and I think that it would be possible for virtio to locally offload
> > > virtballoon_oom_notify() using this patch's approach, if you don't like
> > > globally offloading at the OOM notifier API level.
> > 
> > Even if the allocation is offloaded to a different context we are still
> > OOM, and we would have to block waiting for it, which is just error prone.
> 
> As I comment below, I assume that this deadlock is rare in the first
> place. Since the GFP_KERNEL allocation is conditional, we might be able
> to avoid the allocation from virtballoon_oom_notify() entirely.
> 
> Michael S. Tsirkin wrote:
> > > @@ -1005,17 +1033,21 @@ int unregister_oom_notifier(struct notifier_block *nb)
> > >   */
> > >  bool out_of_memory(struct oom_control *oc)
> > >  {
> > > -	unsigned long freed = 0;
> > >  	enum oom_constraint constraint = CONSTRAINT_NONE;
> > >  
> > >  	if (oom_killer_disabled)
> > >  		return false;
> > >  
> > > -	if (!is_memcg_oom(oc)) {
> > > -		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
> > > -		if (freed > 0)
> > > +	if (!is_memcg_oom(oc) && oom_notifier_th) {
> > > +		oom_notifier_requested = true;
> > > +		wake_up(&oom_notifier_request_wait);
> > > +		wait_event_timeout(oom_notifier_response_wait,
> > > +				   !oom_notifier_requested, 5 * HZ);
> > 
> > I guess this means what was earlier a deadlock will free up after 5
> > seconds,
> 
> Yes.
> 
> >          but a 5 sec downtime is still a lot, isn't it?
> 
> This timeout is unlikely to expire. Please note that this offloading is
> intended to handle the worst-case scenario, that is, "out_of_memory() is
> called while somebody is already holding the vb->balloon_lock mutex" combined
> with "a GFP_KERNEL allocation is attempted from virtballoon_oom_notify()".
> 
> As far as I know, this lock is held only while fill_balloon() or leak_balloon()
> is running. The majority of OOM events call out_of_memory() without holding
> this lock. Thus, "out_of_memory() is called while somebody is already holding
> vb->balloon_lock" should be rare in the first place.
> 
> If this deadlock can be triggered artificially (i.e. a user-triggerable OOM
> DoS), a patch fixing this problem would need to be backported to
> older/distributor kernels...
> 
> Yes, conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> still cause this deadlock. But that depends on whether you can trigger this
> deadlock. As far as I know, there is no report. Thus, I think that avoiding
> theoretical deadlock using timeout will be sufficient.


So first of all IMHO GFP_KERNEL allocations do not happen in
virtqueue_add_outbuf at all. They only trigger through add_sgs.

IMHO this is an API bug, we should just drop the gfp parameter
from this API.


So the issue is balloon_page_enqueue only.
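
For reference, balloon_page_enqueue() opens with the allocation
(4.13-era mm/balloon_compaction.c, from memory):

  struct page *balloon_page_enqueue(struct balloon_dev_info *b_dev_info)
  {
          unsigned long flags;
          struct page *page = alloc_page(balloon_mapping_gfp_mask() |
                                         __GFP_NOMEMALLOC | __GFP_NORETRY);
          if (!page)
                  return NULL;

          /* ... inserts the page into b_dev_info->pages under a lock ... */
  }

which is exactly the GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC |
__GFP_NORETRY attempt from the original report.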


> > 
> > 
> > > +		if (oom_notifier_freed) {
> > > +			oom_notifier_freed = 0;
> > >  			/* Got some memory back in the last second. */
> > >  			return true;
> > > +		}
> > >  	}
> > >  
> > >  	/*
> > > -- 
> > > 1.8.3.1
> > 


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:19                   ` [RFC] [PATCH] mm, oom: " Michal Hocko
@ 2017-10-02 14:29                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-02 14:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon, Oct 02, 2017 at 04:19:00PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> > On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> [...]
> > > and some
> > > other call path is allocating while holding the lock. But you seem to be
> > > right and
> > > leak_balloon
> > >   tell_host
> > >     virtqueue_add_outbuf
> > >       virtqueue_add
> > > 
> > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > try to allocate while we are in the OOM path. Michael, is there any way
> > > to drop this?
> > 
> > Yes - in practice it won't ever allocate - that path is never taken
> > with add_outbuf - it is for add_sgs only.
> > 
> > IMHO the issue is balloon inflation which needs to allocate
> > memory. It does it under a mutex, and oom handler tries to take the
> > same mutex.
> 
> try_lock for the oom notifier path should heal the problem then, right?
> At least as a quick fix.

IMHO it definitely fixes the deadlock. But it does not fix the bug
that the balloon sometimes isn't deflated on OOM even though the
deflate-on-oom flag is set.

> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:29                     ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
@ 2017-10-02 14:31                     ` Michal Hocko
  -1 siblings, 0 replies; 38+ messages in thread
From: Michal Hocko @ 2017-10-02 14:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Tetsuo Handa, jasowang, jani.nikula, joonas.lahtinen,
	rodrigo.vivi, airlied, paulmck, josh, rostedt, mathieu.desnoyers,
	jiangshanlai, virtualization, intel-gfx, linux-mm

On Mon 02-10-17 17:29:47, Michael S. Tsirkin wrote:
> On Mon, Oct 02, 2017 at 04:19:00PM +0200, Michal Hocko wrote:
> > On Mon 02-10-17 17:11:55, Michael S. Tsirkin wrote:
> > > On Mon, Oct 02, 2017 at 01:50:35PM +0200, Michal Hocko wrote:
> > [...]
> > > > and some
> > > > other call path is allocating while holding the lock. But you seem to be
> > > > right and
> > > > leak_balloon
> > > >   tell_host
> > > >     virtqueue_add_outbuf
> > > >       virtqueue_add
> > > > 
> > > > can do GFP_KERNEL allocation and this is clearly wrong. Nobody should
> > > > try to allocate while we are in the OOM path. Michael, is there any way
> > > > to drop this?
> > > 
> > > Yes - in practice it won't ever allocate - that path is never taken
> > > with add_outbuf - it is for add_sgs only.
> > > 
> > > IMHO the issue is balloon inflation which needs to allocate
> > > memory. It does it under a mutex, and oom handler tries to take the
> > > same mutex.
> > 
> > try_lock for the oom notifier path should heal the problem then, right?
> > At least as a quick fix.
> 
> IMHO it definitely fixes the deadlock. But it does not fix the bug
> that the balloon sometimes isn't deflated on OOM even though the
> deflate-on-oom flag is set.

Yes, that would require a more sophisticated fix. And I would argue it
would require dropping the oom handler and moving the deflate logic to a
shrinker, to scale better with memory pressure rather than acting at the
very last moment.
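
I.e. something of this shape (a sketch; the shrinker field and the
deflate_balloon_pages() helper are hypothetical):

  static unsigned long virtio_balloon_shrink_count(struct shrinker *s,
                                                   struct shrink_control *sc)
  {
          struct virtio_balloon *vb = container_of(s,
                                          struct virtio_balloon, shrinker);

          /* How many pages we could hand back to the guest. */
          return vb->num_pages;
  }

  static unsigned long virtio_balloon_shrink_scan(struct shrinker *s,
                                                  struct shrink_control *sc)
  {
          struct virtio_balloon *vb = container_of(s,
                                          struct virtio_balloon, shrinker);

          /* Deflate in proportion to reclaim pressure, well before OOM. */
          return deflate_balloon_pages(vb, sc->nr_to_scan);
  }

  /* at probe time: */
  vb->shrinker.count_objects = virtio_balloon_shrink_count;
  vb->shrinker.scan_objects  = virtio_balloon_shrink_scan;
  vb->shrinker.seeks = DEFAULT_SEEKS;
  register_shrinker(&vb->shrinker);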
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:23                     ` Michael S. Tsirkin
@ 2017-10-02 14:44                       ` Tetsuo Handa
  2017-10-07 11:30                         ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-02 14:44 UTC (permalink / raw)
  To: mst; +Cc: mhocko, linux-mm

Michael S. Tsirkin wrote:
> > Yes, a conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> > still cause this deadlock. But that depends on whether you can trigger this
> > deadlock. As far as I know, there is no report. Thus, I think that avoiding
> > the theoretical deadlock using a timeout will be sufficient.
> 
> 
> So first of all IMHO GFP_KERNEL allocations do not happen in
> virtqueue_add_outbuf at all. They only trigger through add_sgs.

I did not notice that total_sg == 1 is true for virtqueue_add_outbuf().
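
(For reference, abridged from drivers/virtio/virtio_ring.c as I read it;
the gfp argument is only consumed when an indirect descriptor table is
built, and that path is skipped when total_sg == 1:)

	/* virtqueue_add(), abridged */
	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
		desc = alloc_indirect(_vq, total_sg, gfp); /* only gfp user */
	else
		desc = NULL;                               /* no allocation */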

> 
> IMHO this is an API bug, we should just drop the gfp parameter
> from this API.

OK.

> 
> 
> so the issue is balloon_page_enqueue only.
> 

Since you explained that there is "the deflate on OOM flag", we don't
want to skip deflating upon lock contention.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-02 14:44                       ` Tetsuo Handa
@ 2017-10-07 11:30                         ` Tetsuo Handa
  2017-10-09  7:46                           ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-07 11:30 UTC (permalink / raw)
  To: mst; +Cc: mhocko, linux-mm

Tetsuo Handa wrote:
> Michael S. Tsirkin wrote:
> > > Yes, a conditional GFP_KERNEL allocation attempt from virtqueue_add() might
> > > still cause this deadlock. But that depends on whether you can trigger this
> > > deadlock. As far as I know, there is no report. Thus, I think that avoiding
> > > the theoretical deadlock using a timeout will be sufficient.
> > 
> > 
> > So first of all IMHO GFP_KERNEL allocations do not happen in
> > virtqueue_add_outbuf at all. They only trigger through add_sgs.
> 
> I did not notice that total_sg == 1 is true for virtqueue_add_outbuf().
> 
> > 
> > IMHO this is an API bug, we should just drop the gfp parameter
> > from this API.
> 
> OK.
> 
> > 
> > 
> > so the issue is balloon_page_enqueue only.
> > 
> 
> Since you explained that there is "the deflate on OOM flag", we don't
> want to skip deflating upon lock contention.
> 

I tested the virtballoon_oom_notify() path using artificial stress
(with an inverted virtio_has_feature(VIRTIO_BALLOON_F_DEFLATE_ON_OOM) check,
because the QEMU I have does not seem to support the deflate-on-oom option),
but I could not trigger the OOM lockup. Thus, I think that we don't need
to worry about

  IMHO it definitely fixes the deadlock. But it does not fix the bug
  that the balloon sometimes isn't deflated on oom even though the
  deflate-on-oom flag is set.

because this is not easy to trigger.

Even if we make sure that the balloon is guaranteed to deflate on oom by
doing the memory allocation for inflation outside of balloon_lock, we would
after all have to inflate the balloon back to the size the host has
requested, by OOM-killing processes in the guest, wouldn't we?

Then, I think that skipping deflation upon lock contention is acceptable
for now. Below is a patch. What do you think?



From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Sat, 7 Oct 2017 19:29:21 +0900
Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()

In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
serialize against fill_balloon(). But in fill_balloon(),
alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
being specified, this allocation attempt might depend on somebody else's
__GFP_DIRECT_RECLAIM memory allocation. And such __GFP_DIRECT_RECLAIM
memory allocation might call leak_balloon() via virtballoon_oom_notify()
via blocking_notifier_call_chain() callback via out_of_memory() when it
reached __alloc_pages_may_oom() and held oom_lock mutex. Since
vb->balloon_lock mutex is already held by fill_balloon(), it will cause
OOM lockup. Thus, do not wait for vb->balloon_lock mutex if
leak_balloon() is called from out_of_memory().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/virtio/virtio_balloon.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index f0b3a0b..7dbacfb 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -192,7 +192,7 @@ static void release_pages_balloon(struct virtio_balloon *vb,
 	}
 }
 
-static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
+static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool wait)
 {
 	unsigned num_freed_pages;
 	struct page *page;
@@ -202,7 +202,10 @@ static unsigned leak_balloon(struct virtio_balloon *vb, size_t num)
 	/* We can only do one array worth at a time. */
 	num = min(num, ARRAY_SIZE(vb->pfns));
 
-	mutex_lock(&vb->balloon_lock);
+	if (wait)
+		mutex_lock(&vb->balloon_lock);
+	else if (!mutex_trylock(&vb->balloon_lock))
+		return 0;
 	/* We can't release more pages than taken */
 	num = min(num, (size_t)vb->num_pages);
 	for (vb->num_pfns = 0; vb->num_pfns < num;
@@ -367,7 +370,7 @@ static int virtballoon_oom_notify(struct notifier_block *self,
 		return NOTIFY_OK;
 
 	freed = parm;
-	num_freed_pages = leak_balloon(vb, oom_pages);
+	num_freed_pages = leak_balloon(vb, oom_pages, false);
 	update_balloon_size(vb);
 	*freed += num_freed_pages;
 
@@ -395,7 +398,7 @@ static void update_balloon_size_func(struct work_struct *work)
 	if (diff > 0)
 		diff -= fill_balloon(vb, diff);
 	else if (diff < 0)
-		diff += leak_balloon(vb, -diff);
+		diff += leak_balloon(vb, -diff, true);
 	update_balloon_size(vb);
 
 	if (diff)
@@ -597,7 +600,7 @@ static void remove_common(struct virtio_balloon *vb)
 {
 	/* There might be pages left in the balloon: free them. */
 	while (vb->num_pages)
-		leak_balloon(vb, vb->num_pages);
+		leak_balloon(vb, vb->num_pages, true);
 	update_balloon_size(vb);
 
 	/* Now we reset the device so we can clean up the queues. */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-07 11:30                         ` Tetsuo Handa
@ 2017-10-09  7:46                           ` Michal Hocko
  2017-10-09  8:06                             ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09  7:46 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
[...]
> From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Sat, 7 Oct 2017 19:29:21 +0900
> Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> 
> In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> serialize against fill_balloon(). But in fill_balloon(),
> alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> being specified, this allocation attempt might depend on somebody else's
> __GFP_DIRECT_RECLAIM memory allocation.

What would that dependency look like? Is the holder of the lock doing
only __GFP_NORETRY?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09  7:46                           ` Michal Hocko
@ 2017-10-09  8:06                             ` Tetsuo Handa
  2017-10-09 12:28                               ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-09  8:06 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> [...]
> > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > 
> > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > serialize against fill_balloon(). But in fill_balloon(),
> > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > being specified, this allocation attempt might depend on somebody else's
> > __GFP_DIRECT_RECLAIM memory allocation.
> 
> What would that dependency look like? Is the holder of the lock doing
> only __GFP_NORETRY?

__GFP_NORETRY makes a difference only after a reclaim attempt has failed.

A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
(e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
can call the OOM notifier callback and try to hold balloon_lock at
leak_balloon(), which fill_balloon() has already held before doing the
GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
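
For reference, this follows from how the flags are composed in
include/linux/gfp.h (simplified; the cast annotations are omitted):

  #define __GFP_RECLAIM        (__GFP_DIRECT_RECLAIM | __GFP_KSWAPD_RECLAIM)
  #define GFP_USER             (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
  #define GFP_HIGHUSER         (GFP_USER | __GFP_HIGHMEM)
  #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE)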

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09  8:06                             ` Tetsuo Handa
@ 2017-10-09 12:28                               ` Michal Hocko
  2017-10-09 13:31                                 ` Tetsuo Handa
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09 12:28 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > [...]
> > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > 
> > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > serialize against fill_balloon(). But in fill_balloon(),
> > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > being specified, this allocation attempt might depend on somebody else's
> > > __GFP_DIRECT_RECLAIM memory allocation.
> > 
> > How would that dependency look like? Is the holder of the lock doing
> > only __GFP_NORETRY?
> 
> __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> 
> A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> can call the OOM notifier callback and try to hold balloon_lock at
> leak_balloon(), which fill_balloon() has already held before doing the
> GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.

OK, so let me decipher.
 Thread1                               Thread2                                       Thread3
 alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
   out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
     blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
       leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
         balloon_lock #2                         direct_reclaim (__GFP_FS context)
                                                   fs_lock #1

in other words, let's make the description understandable even for
somebody not really familiar with the allocation&reclaim internals.
The whole point is that the dependency is indirect and requires more
actors, so an example call graph should be easier to follow.

One more nit. If there is a way to estimate how much memory the notifier
could free when the trylock succeeds, I would print that value for
debugging purposes.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 12:28                               ` Michal Hocko
@ 2017-10-09 13:31                                 ` Tetsuo Handa
  2017-10-09 13:37                                   ` Michal Hocko
  0 siblings, 1 reply; 38+ messages in thread
From: Tetsuo Handa @ 2017-10-09 13:31 UTC (permalink / raw)
  To: mhocko; +Cc: mst, linux-mm

Michal Hocko wrote:
> On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > [...]
> > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > 
> > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > being specified, this allocation attempt might depend on somebody else's
> > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > 
> > > What would that dependency look like? Is the holder of the lock doing
> > > only __GFP_NORETRY?
> > 
> > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > 
> > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > can call the OOM notifier callback and try to hold balloon_lock at
> > leak_balloon(), which fill_balloon() has already held before doing the
> > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> 
> OK, so let me decipher.
>  Thread1                               Thread2                                       Thread3
>  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
>    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
>      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
>        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
>          balloon_lock #2                         direct_reclaim (__GFP_FS context)
>                                                    fs_lock #1
> 
> in other words, let's make the description understandable even for
> somebody not really familiar with the allocation&reclaim internals.
> The whole point is that the dependency is indirect and requires more
> actors, so an example call graph should be easier to follow.


Yes. But it is simpler: only two threads are needed.

  Thread1                                       Thread2
    fill_balloon
      balloon_lock #1
      balloon_page_enqueue
        alloc_page(GFP_HIGHUSER_MOVABLE)
          direct reclaim (__GFP_FS context)       fs lock #2
            fs lock #2                              alloc_page(GFP_NOFS)
                                                      __alloc_pages_may_oom()
                                                        oom_lock
                                                        out_of_memory()
                                                          blocking_notifier_call_chain()
                                                            leak_balloon
                                                              balloon_lock #1     # dead lock

And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
retrying forever because oom_lock is held by Thread2.

> 
> One more nit. If there is a way to estimate how much memory the notifier
> could free when the trylock succeeds, I would print that value for
> debugging purposes.

I don't know the internals of virtio-balloon.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 13:31                                 ` Tetsuo Handa
@ 2017-10-09 13:37                                   ` Michal Hocko
  2017-10-09 14:24                                     ` Michael S. Tsirkin
  0 siblings, 1 reply; 38+ messages in thread
From: Michal Hocko @ 2017-10-09 13:37 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: mst, linux-mm

On Mon 09-10-17 22:31:18, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > > [...]
> > > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > > 
> > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > > being specified, this allocation attempt might depend on somebody else's
> > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > 
> > > > What would that dependency look like? Is the holder of the lock doing
> > > > only __GFP_NORETRY?
> > > 
> > > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > > 
> > > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > > can call the OOM notifier callback and try to hold balloon_lock at
> > > leak_balloon(), which fill_balloon() has already held before doing the
> > > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> > 
> > OK, so let me decipher.
> >  Thread1                               Thread2                                       Thread3
> >  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
> >    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
> >      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
> >        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
> >          balloon_lock #2                         direct_reclaim (__GFP_FS context)
> >                                                    fs_lock #1
> > 
> > in other words, let's make the description understandable even for
> > somebody not really familiar with the allocation&reclaim internals.
> > The whole point is that the dependency is indirect and requires more
> > actors, so an example call graph should be easier to follow.
> 
> 
> Yes. But it is simpler: only two threads are needed.
> 
>   Thread1                                       Thread2
>     fill_balloon
>       balloon_lock #1
>       balloon_page_enqueue
>         alloc_page(GFP_HIGHUSER_MOVABLE)
>           direct reclaim (__GFP_FS context)       fs lock #2
>             fs lock #2                              alloc_page(GFP_NOFS)
>                                                       __alloc_pages_may_oom()
>                                                         oom_lock
>                                                         out_of_memory()
>                                                           blocking_notifier_call_chain()
>                                                             leak_balloon
>                                                               balloon_lock #1     # dead lock

Oh, right. I forgot that oom notifiers are allowed from NOFS context.
 
> And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
> retrying forever because oom_lock is held by Thread2.
> 
> > 
> > One more nit. If there is a way to estimate how much memory the notifier
> > could free when the trylock succeeds, I would print that value for
> > debugging purposes.
> 
> I don't know the internals of virtio-balloon.

Maybe Michael can help here.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC] [PATCH] mm,oom: Offload OOM notify callback to a kernel thread.
  2017-10-09 13:37                                   ` Michal Hocko
@ 2017-10-09 14:24                                     ` Michael S. Tsirkin
  0 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2017-10-09 14:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Tetsuo Handa, linux-mm

On Mon, Oct 09, 2017 at 03:37:34PM +0200, Michal Hocko wrote:
> On Mon 09-10-17 22:31:18, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> > > On Mon 09-10-17 17:06:51, Tetsuo Handa wrote:
> > > > Michal Hocko wrote:
> > > > > On Sat 07-10-17 20:30:19, Tetsuo Handa wrote:
> > > > > [...]
> > > > > > From 6a0fd8a5e013ac63a6bcd06bd2ae6fdb25a4f3de Mon Sep 17 00:00:00 2001
> > > > > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > > > > Date: Sat, 7 Oct 2017 19:29:21 +0900
> > > > > > Subject: [PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()
> > > > > > 
> > > > > > In leak_balloon(), mutex_lock(&vb->balloon_lock) is called in order to
> > > > > > serialize against fill_balloon(). But in fill_balloon(),
> > > > > > alloc_page(GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY) is
> > > > > > called with vb->balloon_lock mutex held. Since GFP_HIGHUSER[_MOVABLE]
> > > > > > implies __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS, despite __GFP_NORETRY
> > > > > > being specified, this allocation attempt might depend on somebody else's
> > > > > > __GFP_DIRECT_RECLAIM memory allocation.
> > > > > 
> > > > > What would that dependency look like? Is the holder of the lock doing
> > > > > only __GFP_NORETRY?
> > > > 
> > > > __GFP_NORETRY makes a difference only after a reclaim attempt has failed.
> > > > 
> > > > A reclaim attempt for a __GFP_DIRECT_RECLAIM | __GFP_IO | __GFP_FS request
> > > > can indirectly wait for somebody else's GFP_NOFS and/or GFP_NOIO request
> > > > (e.g. one blocked on a filesystem's fs lock). And such an indirect GFP_NOFS
> > > > and/or GFP_NOIO request can reach __alloc_pages_may_oom() unless it also
> > > > has __GFP_NORETRY. And such an indirect GFP_NOFS and/or GFP_NOIO request
> > > > can call the OOM notifier callback and try to hold balloon_lock at
> > > > leak_balloon(), which fill_balloon() has already held before doing the
> > > > GFP_HIGHUSER[_MOVABLE] | __GFP_NOMEMALLOC | __GFP_NORETRY request.
> > > 
> > > OK, so let me decipher.
> > >  Thread1                               Thread2                                       Thread3
> > >  alloc_pages(GFP_KERNEL)                 fill_balloon                                fs_lock #1
> > >    out_of_memory                           balloon_lock #2                           alloc_page(GFP_NOFS)
> > >      blocking_notifier_call_chain            balloon_page_enqueue                      # keep retrying
> > >        leak_balloon                            alloc_page(GFP_HIGHUSER_MOVABLE)
> > >          balloon_lock #2                         direct_reclaim (__GFP_FS context)
> > >                                                    fs_lock #1
> > > 
> > > in other words, let's make the description understandable even for
> > > somebody not really familiar with the allocation&reclaim internals.
> > > The whole point is that the dependency is indirect and requires more
> > > actors, so an example call graph should be easier to follow.
> > 
> > 
> > Yes. But it is simpler: only two threads are needed.
> > 
> >   Thread1                                       Thread2
> >     fill_balloon
> >       balloon_lock #1
> >       balloon_page_enqueue
> >         alloc_page(GFP_HIGHUSER_MOVABLE)
> >           direct reclaim (__GFP_FS context)       fs lock #2
> >             fs lock #2                              alloc_page(GFP_NOFS)
> >                                                       __alloc_pages_may_oom()
> >                                                         oom_lock
> >                                                         out_of_memory()
> >                                                           blocking_notifier_call_chain()
> >                                                             leak_balloon
> >                                                               balloon_lock #1     # dead lock
> 
> Oh, right. I forgot that oom notifiers are allowed from NOFS context.
>  
> > And other __GFP_DIRECT_RECLAIM && !__GFP_NORETRY allocations (if any) will keep
> > retrying forever because oom_lock is held by Thread2.
> > 
> > > 
> > > One more nit. If there is a way to estimate how much memory the notifier
> > > could free when the trylock succeeds, I would print that value for
> > > debugging purposes.
> > 
> > I don't know the internals of virtio-balloon.
> 
> Maybe Michael can help here.

num_pages is a way to estimate that.
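
As an illustration only (an untested sketch on top of Tetsuo's patch; the
unlocked read of vb->num_pages is racy but good enough for debugging):

	else if (!mutex_trylock(&vb->balloon_lock)) {
		/* Rough upper bound on what deflation could have freed. */
		pr_warn("virtio_balloon: OOM deflate skipped, %u pages held\n",
			vb->num_pages);
		return 0;
	}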

> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread

Thread overview: 38+ messages
2017-09-11 10:27 mm, virtio: possible OOM lockup at virtballoon_oom_notify() Tetsuo Handa
2017-09-29  4:00 ` Michael S. Tsirkin
2017-09-29  4:00   ` Michael S. Tsirkin
2017-09-29  4:44   ` Tetsuo Handa
2017-10-01  5:44     ` [RFC] [PATCH] mm, oom: Offload OOM notify callback to a kernel thread Tetsuo Handa
2017-10-02  3:59       ` [RFC] [PATCH] mm,oom: " Michael S. Tsirkin
2017-10-02  3:59         ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02  9:06         ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02  9:06         ` Michal Hocko
2017-10-02 11:33           ` [RFC] [PATCH] mm, oom: " Tetsuo Handa
2017-10-02 11:50             ` Michal Hocko
2017-10-02 13:05               ` [RFC] [PATCH] mm,oom: " Tetsuo Handa
2017-10-02 13:13                 ` Michal Hocko
2017-10-02 13:52                   ` Tetsuo Handa
2017-10-02 14:23                     ` Michael S. Tsirkin
2017-10-02 14:44                       ` Tetsuo Handa
2017-10-07 11:30                         ` Tetsuo Handa
2017-10-09  7:46                           ` Michal Hocko
2017-10-09  8:06                             ` Tetsuo Handa
2017-10-09 12:28                               ` Michal Hocko
2017-10-09 13:31                                 ` Tetsuo Handa
2017-10-09 13:37                                   ` Michal Hocko
2017-10-09 14:24                                     ` Michael S. Tsirkin
2017-10-02 14:15                   ` Michael S. Tsirkin
2017-10-02 14:11               ` Michael S. Tsirkin
2017-10-02 14:11                 ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02 14:19                 ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02 14:19                   ` [RFC] [PATCH] mm, oom: " Michal Hocko
2017-10-02 14:29                   ` [RFC] [PATCH] mm,oom: " Michael S. Tsirkin
2017-10-02 14:29                   ` Michael S. Tsirkin
2017-10-02 14:29                     ` [RFC] [PATCH] mm, oom: " Michael S. Tsirkin
2017-10-02 14:31                     ` [RFC] [PATCH] mm,oom: " Michal Hocko
2017-10-02 14:31                     ` Michal Hocko
2017-10-02 14:19                 ` Michal Hocko
2017-10-02 11:50             ` Michal Hocko
2017-10-02  3:59       ` Michael S. Tsirkin
