All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
	Minchan Kim <minchan@kernel.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Christian Brauner <christian@brauner.io>,
	Christoph Hellwig <hch@infradead.org>,
	Oleg Nesterov <oleg@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Jann Horn <jannh@google.com>, Shakeel Butt <shakeelb@google.com>,
	Andy Lutomirski <luto@kernel.org>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Florian Weimer <fweimer@redhat.com>,
	Jan Engelhardt <jengelh@inai.de>,
	Tim Murray <timmurray@google.com>, linux-mm <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team <kernel-team@android.com>
Subject: Re: [PATCH 4/3] mm: drop MMF_OOM_SKIP from exit_mmap
Date: Mon, 3 Jan 2022 13:11:39 +0100	[thread overview]
Message-ID: <YdLn+192/0HfNJyl@dhcp22.suse.cz> (raw)
In-Reply-To: <CAJuCfpGNCX=Z=Bi0N7DAj=CXdLqJOqQ_8kq_HQNaLhAvA5tjPw@mail.gmail.com>

On Thu 30-12-21 09:29:40, Suren Baghdasaryan wrote:
> On Thu, Dec 30, 2021 at 12:24 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 29-12-21 21:59:55, Suren Baghdasaryan wrote:
> > [...]
> > > After some more digging I think there are two acceptable options:
> > >
> > > 1. Call unlock_range() under mmap_write_lock and then downgrade it to
> > > read lock so that both exit_mmap() and __oom_reap_task_mm() can unmap
> > > vmas in parallel like this:
> > >
> > >     if (mm->locked_vm) {
> > >         mmap_write_lock(mm);
> > >         unlock_range(mm->mmap, ULONG_MAX);
> > >         mmap_write_downgrade(mm);
> > >     } else
> > >         mmap_read_lock(mm);
> > > ...
> > >     unmap_vmas(&tlb, vma, 0, -1);
> > >     mmap_read_unlock(mm);
> > >     mmap_write_lock(mm);
> > >     free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
> > > ...
> > >     mm->mmap = NULL;
> > >     mmap_write_unlock(mm);
> > >
> > > This way exit_mmap() might block __oom_reap_task_mm() but for a much
> > > shorter time during unlock_range() call.
> >
> > IIRC unlock_range depends on page lock at some stage and that can mean
> > this will block for a long time or for ever when the holder of the lock
> > depends on a memory allocation. This was the primary problem why the oom
> > reaper skips over mlocked vmas.
> 
> Oh, I missed that detail. I thought __oom_reap_task_mm() skips locked
> vmas only to avoid destroying pgds from under follow_page().
> 
> >
> > > 2. Introduce another vm_flag mask similar to VM_LOCKED which is set
> > > before munlock_vma_pages_range() clears VM_LOCKED so that
> > > __oom_reap_task_mm() can identify vmas being unlocked and skip them.
> > >
> > > Option 1 seems cleaner to me because it keeps the locking pattern
> > > around unlock_range() in exit_mmap() consistent with all other places
> > > it is used (in mremap() and munmap()) with mmap_write_lock taken.
> > > WDYT?
> >
> > It would be really great to make unlock_range oom reaper aware IMHO.
> 
> What exactly do you envision? Say unlock_range() knows that it's
> racing with __oom_reap_task_mm() and that calling follow_page() is
> unsafe without locking, what should it do?

My original plan was to make the page lock conditional and use
trylocking from the oom reaper (aka lockless context). It is OK to
simply bail out and leave some mlocked memory behind if there is a
contention on a specific page. The overall objective is to free as much
memory as possible, not all of it.

IIRC Hugh was not a fan of this approach and he has mentioned that the
lock might not be even really needed and that the area would benefit
from a clean up rather than oom reaper specific hacks. I do tend to
agree with that. I just never managed to find any spare time for that
though and heavily mlocked oom victims tend to be really rare.
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2022-01-03 12:11 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-08 21:22 [PATCH v4 1/3] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Suren Baghdasaryan
2021-12-08 21:22 ` [PATCH v4 2/3] mm: document locking restrictions for vm_operations_struct::close Suren Baghdasaryan
2021-12-09  8:55   ` Michal Hocko
2021-12-08 21:22 ` [PATCH v4 3/3] mm/oom_kill: allow process_mrelease to run under mmap_lock protection Suren Baghdasaryan
2021-12-09  8:59   ` Michal Hocko
2021-12-09 19:03     ` Suren Baghdasaryan
2021-12-09  8:55 ` [PATCH v4 1/3] mm: protect free_pgtables with mmap_lock write lock in exit_mmap Michal Hocko
2021-12-09 19:03   ` Suren Baghdasaryan
2021-12-10  9:20     ` Michal Hocko
2021-12-09  9:12 ` [PATCH 4/3] mm: drop MMF_OOM_SKIP from exit_mmap Michal Hocko
2021-12-09 16:24   ` Suren Baghdasaryan
2021-12-09 16:47     ` Michal Hocko
2021-12-09 17:06       ` Suren Baghdasaryan
2021-12-16  2:26         ` Suren Baghdasaryan
2021-12-16 11:49           ` Johannes Weiner
2021-12-16 17:23             ` Suren Baghdasaryan
2021-12-30  5:59               ` Suren Baghdasaryan
2021-12-30  8:24                 ` Michal Hocko
2021-12-30 17:29                   ` Suren Baghdasaryan
2022-01-03 12:11                     ` Michal Hocko [this message]
2022-01-03 21:16                       ` Hugh Dickins
2022-01-04 22:24                         ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YdLn+192/0HfNJyl@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=christian.brauner@ubuntu.com \
    --cc=christian@brauner.io \
    --cc=david@redhat.com \
    --cc=fweimer@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hch@infradead.org \
    --cc=jannh@google.com \
    --cc=jengelh@inai.de \
    --cc=kernel-team@android.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=minchan@kernel.org \
    --cc=oleg@redhat.com \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=timmurray@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.