From: Yafang Shao <laoar.shao@gmail.com>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Linux MM <linux-mm@kvack.org>
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
Date: Wed, 15 Jul 2020 11:10:07 +0800
Message-ID: <CALOAHbB3wHgUPVJvg6trwWpNzeM=atgvoJ4wzih0g0DFdmStYw@mail.gmail.com>
In-Reply-To: <alpine.DEB.2.23.453.2007141934320.2615810@chino.kir.corp.google.com>

On Wed, Jul 15, 2020 at 10:44 AM David Rientjes <rientjes@google.com> wrote:
>
> On Wed, 15 Jul 2020, Yafang Shao wrote:
>
> > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > > index 1962232..15e0e18 100644
> > > > --- a/mm/memcontrol.c
> > > > +++ b/mm/memcontrol.c
> > > > @@ -1560,15 +1560,21 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > >               .gfp_mask = gfp_mask,
> > > >               .order = order,
> > > >       };
> > > > -     bool ret;
> > > > +     bool ret = true;
> > > >
> > > >       if (mutex_lock_killable(&oom_lock))
> > > >               return true;
> > > > +
> > > > +     if (mem_cgroup_margin(memcg) >= (1 << order))
> > > > +             goto unlock;
> > > > +
> > > >       /*
> > > >        * A few threads which were not waiting at mutex_lock_killable() can
> > > >        * fail to bail out. Therefore, check again after holding oom_lock.
> > > >        */
> > > >       ret = should_force_charge() || out_of_memory(&oc);
> > > > +
> > > > +unlock:
> > > >       mutex_unlock(&oom_lock);
> > > >       return ret;
> > > >  }
> > >
> > > Hi Yafang,
> > >
> > > We've run with a patch very much like this for several years and it works
> > > quite successfully to prevent the unnecessary oom killing of processes.
> > >
> > > We do this in out_of_memory() directly, however, because we found that we
> > > could prevent even *more* unnecessary killing if we checked this at the
> > > "point of no return" because the selection of processes takes some
> > > additional time when we might resolve the oom condition.
> > >
> >
> > Hi David,
> >
> > Your proposal could also resolve the issue,
>
> It has successfully resolved it for several years in our kernel; we tried
> an approach similar to yours but saw many instances where memcg oom kills
> continued to proceed even though the memcg information dumped to the
> kernel log showed memory available.
>
> If this was a page or two that became available due to memory freeing,
> it's not a significant difference.  Instead, if this races with an oom
> notification and a process exiting or being SIGKILL'd, it becomes much
> harder to explain to a user why their process was oom killed when there
> are tens of megabytes of memory available as shown by the kernel log (the
> freeing/exiting happened during a particularly long iteration of processes
> attached to the memcg, for example).
>
> That's what motivated the change to do this in out_of_memory() directly;
> we found that it prevented even more unnecessary oom kills, which is a
> very good thing.  It may only be easily observable and make a significant
> difference at very large scale, however.
>

Thanks for the clarification.

If it is this race that causes the issue and we want to reduce the
race window, I am not sure whether it is better to check the memcg
margin in out_of_memory() or to do it just before calling
do_send_sig_info(). Per my understanding, dump_header() always takes
much more time than select_bad_process(), especially if there are
slow consoles, so the race can easily happen while dump_header() or
other information is being dumped. But if we only check the memcg
margin after dumping all of this oom information, it would look
strange to print so many oom logs without killing any process.
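
To make the placement I am describing concrete, here is a rough sketch
(illustration only, not a real patch) of what such a late re-check just
before the SIGKILL could look like. The function names follow
mm/oom_kill.c and mm/memcontrol.c, and it assumes mem_cgroup_margin()
has been made visible to the oom code as in your patch below; the
early-return path and its exact condition are only my assumption:

static void oom_kill_process(struct oom_control *oc, const char *message)
{
	struct task_struct *victim = oc->chosen;

	/* task_will_free_mem() shortcut and ratelimiting elided */

	dump_header(oc, victim);	/* slow, especially with serial consoles */

	/*
	 * Hypothetical late check: the margin may have grown while the oom
	 * report was being printed, but bailing out here means a full oom
	 * report has already been dumped without any kill -- the "strange"
	 * outcome mentioned above.
	 */
	if (is_memcg_oom(oc) &&
	    mem_cgroup_margin(oc->memcg) >= (1 << oc->order)) {
		put_task_struct(victim);
		return;
	}

	/* __oom_kill_process() is where do_send_sig_info(SIGKILL, ...) happens */
	__oom_kill_process(victim, message);
}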

> > but I'm wondering why do
> > it specifically for memcg oom?
> > Doesn't it apply to global oom?
> > For example, in the global oom, when selecting the processes, the
> > others might free some pages and then it might allocate pages
> > successfully.
> >
>
> It's more complex because memory being allocated from the page allocator
> must be physically contiguous; it's not a simple matter of comparing the
> margin of available memory to the memory being charged.  It could
> theoretically be done but I haven't seen a use case where it has actually
> mattered as opposed to memcg oom where it can happen quite readily at
> scale.  When memory is uncharged to a memcg because of large freeing or a
> process exiting, that's immediately chargeable by another process in the
> same hierarchy because of its isolation as opposed to the page allocator
> where that memory is up for grabs and anything on the system could
> allocate it.
>
> > > Some may argue that this is unnecessarily exposing mem_cgroup_margin() to
> > > generic mm code, but in the interest of preventing any unnecessary oom
> > > kill we've found it to be helpful.
> > >
> > > I proposed a variant of this in https://lkml.org/lkml/2020/3/11/1089.
> > >
> > > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > > --- a/include/linux/memcontrol.h
> > > +++ b/include/linux/memcontrol.h
> > > @@ -798,6 +798,8 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> > >  void mem_cgroup_split_huge_fixup(struct page *head);
> > >  #endif
> > >
> > > +unsigned long mem_cgroup_margin(struct mem_cgroup *memcg);
> > > +
> > >  #else /* CONFIG_MEMCG */
> > >
> > >  #define MEM_CGROUP_ID_SHIFT    0
> > > @@ -825,6 +827,11 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
> > >  {
> > >  }
> > >
> > > +static inline unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> > > +{
> > > +	return 0;
> > > +}
> > > +
> > >  static inline unsigned long mem_cgroup_protection(struct mem_cgroup *memcg,
> > >                                                   bool in_low_reclaim)
> > >  {
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1282,7 +1282,7 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
> > >   * Returns the maximum amount of memory @mem can be charged with, in
> > >   * pages.
> > >   */
> > > -static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> > > +unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> > >  {
> > >         unsigned long margin = 0;
> > >         unsigned long count;
> > > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > > --- a/mm/oom_kill.c
> > > +++ b/mm/oom_kill.c
> > > @@ -1109,9 +1109,23 @@ bool out_of_memory(struct oom_control *oc)
> > >                 if (!is_sysrq_oom(oc) && !is_memcg_oom(oc))
> > >                         panic("System is deadlocked on memory\n");
> > >         }
> > > -       if (oc->chosen && oc->chosen != (void *)-1UL)
> > > +       if (oc->chosen && oc->chosen != (void *)-1UL) {
> > > +               if (is_memcg_oom(oc)) {
> > > +                       /*
> > > +                        * If a memcg is now under its limit or current will be
> > > +                        * exiting and freeing memory, avoid needlessly killing
> > > +                        * chosen.
> > > +                        */
> > > +                       if (mem_cgroup_margin(oc->memcg) >= (1 << oc->order) ||
> > > +                           task_will_free_mem(current)) {
> > > +                               put_task_struct(oc->chosen);
> > > +                               return true;
> > > +                       }
> > > +               }
> > > +
> > >                 oom_kill_process(oc, !is_memcg_oom(oc) ? "Out of memory" :
> > >                                  "Memory cgroup out of memory");
> > > +       }
> > >         return !!oc->chosen;
> > >  }
> > >
> >
> >
> > --
> > Thanks
> > Yafang
> >



-- 
Thanks
Yafang

