From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Michal Hocko <mhocko@kernel.org>, David Rientjes <rientjes@google.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm,oom: Bring OOM notifier callbacks to outside of OOM killer.
Date: Thu, 21 Jun 2018 20:27:41 +0900
Message-ID: <2d8c3056-1bc2-9a32-d745-ab328fd587a1@i-love.sakura.ne.jp>
In-Reply-To: <20180621073142.GA10465@dhcp22.suse.cz>

On 2018/06/21 7:36, David Rientjes wrote:
>> @@ -1010,6 +1010,33 @@ int unregister_oom_notifier(struct notifier_block *nb)
>>  EXPORT_SYMBOL_GPL(unregister_oom_notifier);
>>  
>>  /**
>> + * try_oom_notifier - Try to reclaim memory from OOM notifier list.
>> + *
>> + * Returns non-zero if notifier callbacks released something, zero otherwise.
>> + */
>> +unsigned long try_oom_notifier(void)
> 
> It certainly is tried, but based on its usage it would probably be better 
> to describe what is being returned (it's going to set *did_some_progress, 
> which is a page count).

Well, it depends on what the callbacks are doing. Currently, we have 5 users.

  arch/powerpc/platforms/pseries/cmm.c
  arch/s390/mm/cmm.c
  drivers/gpu/drm/i915/i915_gem_shrinker.c
  drivers/virtio/virtio_balloon.c
  kernel/rcu/tree_plugin.h

Speaking of rcu_oom_notify() in kernel/rcu/tree_plugin.h, we can't tell whether
the callback helped release memory, because it does not update the "freed" argument.

>> +{
>> +	static DEFINE_MUTEX(oom_notifier_lock);
>> +	unsigned long freed = 0;
>> +
>> +	/*
>> +	 * Since OOM notifier callbacks must not depend on __GFP_DIRECT_RECLAIM
>> +	 * && !__GFP_NORETRY memory allocation, waiting for mutex here is safe.
>> +	 * If lockdep reports possible deadlock dependency, it will be a bug in
>> +	 * OOM notifier callbacks.
>> +	 *
>> +	 * If SIGKILL is pending, it is likely that current thread was selected
>> +	 * as an OOM victim. In that case, current thread should return as soon
>> +	 * as possible using memory reserves.
>> +	 */
>> +	if (mutex_lock_killable(&oom_notifier_lock))
>> +		return 0;
>> +	blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
>> +	mutex_unlock(&oom_notifier_lock);
>> +	return freed;
>> +}
> 
> If __blocking_notifier_call_chain() used down_read_killable(), could we 
> eliminate oom_notifier_lock?

I don't think we can eliminate it now, because it acts as a serialization lock
(while still responding to SIGKILL as soon as possible). That serialization is
currently provided by mutex_trylock(&oom_lock).

(1) rcu_oom_notify() in kernel/rcu/tree_plugin.h is not prepared for being
    called concurrently.

----------
static int rcu_oom_notify(struct notifier_block *self,
			  unsigned long notused, void *nfreed)
{
	int cpu;

	/* Wait for callbacks from earlier instance to complete. */
	wait_event(oom_callback_wq, atomic_read(&oom_callback_count) == 0); // <= Multiple threads can pass this line at the same time.
	smp_mb(); /* Ensure callback reuse happens after callback invocation. */

	/*
	 * Prevent premature wakeup: ensure that all increments happen
	 * before there is a chance of the counter reaching zero.
	 */
	atomic_set(&oom_callback_count, 1); // <= Multiple threads can execute this line at the same time.

	for_each_online_cpu(cpu) {
		smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
		cond_resched_tasks_rcu_qs();
	}

	/* Unconditionally decrement: no need to wake ourselves up. */
	atomic_dec(&oom_callback_count); // <= Multiple threads can execute this line at the same time, making oom_callback_count < 0 ?

	return NOTIFY_OK;
}
----------

    The counter inconsistency problem could be fixed by

-	atomic_set(&oom_callback_count, 1);
+	atomic_inc(&oom_callback_count);

    but who would benefit if rcu_oom_notify() could be called concurrently?
    We want to wait for the callback to complete before proceeding to the
    OOM killer. I think that we should save CPU resources by serializing
    concurrent callers.

(2) i915_gem_shrinker_oom() in drivers/gpu/drm/i915/i915_gem_shrinker.c depends
    on mutex_trylock() (reached via i915_gem_shrink_all() -> i915_gem_shrink()
    -> shrinker_lock()) returning 1 (i.e. succeeding) before need_resched()
    becomes true. Otherwise it returns without reclaiming memory.
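
For illustration, the counter problem described in (1) can be replayed in
user space. The sketch below is not kernel code: it uses C11 atomics in place
of the kernel's atomic_t helpers, callback_count and replay_two_callers() are
hypothetical names, and the per-CPU callbacks that also touch the real counter
are left out. It replays one fixed interleaving in which two callers have both
passed the wait_event() check.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for oom_callback_count. */
static atomic_int callback_count;

/*
 * Replay one fixed interleaving of two callers, A and B, that both passed
 * the wait_event() check while the counter was 0. A non-zero use_inc
 * selects the atomic_inc() variant. Returns the final counter value.
 */
static int replay_two_callers(int use_inc)
{
	atomic_store(&callback_count, 0);

	/* A and B each take their initial reference. */
	if (use_inc) {
		atomic_fetch_add(&callback_count, 1);	/* A */
		atomic_fetch_add(&callback_count, 1);	/* B */
	} else {
		atomic_store(&callback_count, 1);	/* A */
		atomic_store(&callback_count, 1);	/* B overwrites A's reference */
	}

	/* A and B each drop their reference at the end. */
	atomic_fetch_sub(&callback_count, 1);	/* A */
	atomic_fetch_sub(&callback_count, 1);	/* B */

	return atomic_load(&callback_count);
}
```

With the atomic_set() style (use_inc == 0) the second decrement takes the
counter below zero. With the atomic_inc() style the counter returns to zero,
but both callers still do the full amount of work, which is why serializing
them looks preferable.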

> This patch is certainly an improvement because it does the last 
> get_page_from_freelist() call after invoking the oom notifiers that can 
> free memory and we've otherwise pointlessly redirected it elsewhere.

Thanks, but this patch might break the subtle balance which is currently
achieved by mutex_trylock(&oom_lock) serialization/exclusion.

(3) virtballoon_oom_notify() in drivers/virtio/virtio_balloon.c by default
    tries to release 256 pages. Since this value is configurable, someone
    might set it to 1048576 pages. If virtballoon_oom_notify() is called
    concurrently by many threads, it might needlessly deflate the memory balloon.

We might want to remember and reuse the last result among serialized callers
(feedback mechanism) like

{
	static DEFINE_MUTEX(oom_notifier_lock);
	static unsigned long last_freed;
	unsigned long freed = 0;
	if (mutex_trylock(&oom_notifier_lock)) {
		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
		last_freed = freed;
	} else {
		mutex_lock(&oom_notifier_lock);
		freed = last_freed;
	}
	mutex_unlock(&oom_notifier_lock);
	return freed;
}

or

{
	static DEFINE_MUTEX(oom_notifier_lock);
	static unsigned long last_freed;
	unsigned long freed = 0;
	if (mutex_lock_killable(&oom_notifier_lock)) {
		freed = last_freed;
		last_freed >>= 1;
		return freed;
	} else if (last_freed) {
		freed = last_freed;
		last_freed >>= 1;
	} else {
		blocking_notifier_call_chain(&oom_notify_list, 0, &freed);
		last_freed = freed;
	}
	mutex_unlock(&oom_notifier_lock);
	return freed;
}

Without a feedback mechanism, mutex_lock_killable(&oom_notifier_lock) serialization
could still needlessly deflate the memory balloon compared to mutex_trylock(&oom_lock)
serialization/exclusion. Maybe virtballoon_oom_notify() (and the two CMM users) should
implement a feedback mechanism themselves, by examining watermarks from the OOM
notifier hooks.
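
To make the behaviour of the second (decaying) variant easier to follow, here
is a minimal user-space sketch. It is only an approximation under stated
assumptions: a pthread mutex replaces mutex_lock_killable() (so there is no
SIGKILL path), and call_notifier_chain() is a hypothetical stand-in for
blocking_notifier_call_chain() that always reports 256 freed pages, the
virtio_balloon default mentioned above. chain_calls is also hypothetical and
exists only to count how often the callbacks really run.

```c
#include <pthread.h>

static pthread_mutex_t notifier_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long last_freed;	/* remembered result, halved on each reuse */
static unsigned long chain_calls;	/* how many times the callbacks really ran */

/* Hypothetical stand-in for blocking_notifier_call_chain(). */
static void call_notifier_chain(unsigned long *freed)
{
	chain_calls++;
	*freed += 256;	/* assumed: every invocation releases 256 pages */
}

/* Serialized callers reuse a decaying copy of the last result. */
static unsigned long try_oom_notifier_feedback(void)
{
	unsigned long freed = 0;

	pthread_mutex_lock(&notifier_lock);
	if (last_freed) {
		/* Someone freed memory recently: report it, do not free more. */
		freed = last_freed;
		last_freed >>= 1;
	} else {
		/* Remembered value has decayed to zero: run the callbacks. */
		call_notifier_chain(&freed);
		last_freed = freed;
	}
	pthread_mutex_unlock(&notifier_lock);
	return freed;
}
```

Under these assumptions, the first caller runs the callbacks and the next nine
callers see 256, 128, ..., 1 without deflating the balloon any further. Only
the eleventh call runs the callbacks again.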



On 2018/06/21 16:31, Michal Hocko wrote:
> On Wed 20-06-18 15:36:45, David Rientjes wrote:
> [...]
>> That makes me think that "oom_notify_list" isn't very intuitive: it can 
>> free memory as a last step prior to oom kill.  OOM notify, to me, sounds 
>> like its only notifying some callbacks about the condition.  Maybe 
>> oom_reclaim_list and then rename this to oom_reclaim_pages()?
> 
> Yes agreed and that is the reason I keep saying we want to get rid of
> this yet-another-reclaim mechanism. We already have shrinkers which are
> the main source of non-lru pages reclaim. Why do we even need
> oom_reclaim_pages? What is fundamentally different here? Sure those
> pages should be reclaimed as the last resort but we already do have
> priority for slab shrinking so we know that the system is struggling
> when reaching the lowest priority. Isn't that enough to express the need
> for current oom notifier implementations?
> 

Even if we update OOM notifier users to use shrinker hooks, they will still need
the subtle balance which is currently achieved by mutex_trylock(&oom_lock).

Removing the OOM notifier is not doable right now, and it is not suitable as a
regression fix for commit 27ae357fa82be5ab ("mm, oom: fix concurrent munlock and
oom reaper unmap, v3"). What we can afford for this regression is
https://patchwork.kernel.org/patch/9842889/ which is exactly what you suggested
in a thread at https://www.spinics.net/lists/linux-mm/msg117896.html .
