linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Rik van Riel <riel@surriel.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Yu Zhao <yuzhao@google.com>, Greg Thelen <gthelen@google.com>,
	Shakeel Butt <shakeelb@google.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 01/13] mm/munlock: delete page_mlock() and all its works
Date: Wed, 9 Feb 2022 19:31:27 +0100	[thread overview]
Message-ID: <4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz> (raw)
In-Reply-To: <5ed1f01-3e7e-7e26-cc1-2b7a574e2147@google.com>

On 2/6/22 22:30, Hugh Dickins wrote:
> We have recommended some applications to mlock their userspace, but that
> turns out to be counter-productive: when many processes mlock the same
> file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it
> is needed for write, to remove any vma mapping that file from rmap's tree;
> but hogged for read by those with mlocks calling page_mlock() (formerly
> known as try_to_munlock()) on *each* page mapped from the file (the
> purpose being to find out whether another process has the page mlocked,
> so therefore it should not be unmlocked yet).
> 
> Several optimizations have been made in the past: one is to skip
> page_mlock() when mapcount tells that nothing else has this page
> mapped; but that doesn't help at all when others do have it mapped.
> This time around, I initially intended to add a preliminary search
> of the rmap tree for overlapping VM_LOCKED ranges; but that gets
> messy with locking order, when in doubt whether a page is actually
> present; and risks adding even more contention on the i_mmap_rwsem.
> 
> A solution would be much easier, if only there were space in struct page
> for an mlock_count... but actually, most of the time, there is space for
> it - an mlocked page spends most of its life on an unevictable LRU, but
> since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has
> been redundant.  Let's try to reuse its page->lru.
> 
> But leave that until a later patch: in this patch, clear the ground by
> removing page_mlock(), and all the infrastructure that has gathered
> around it - which mostly hinders understanding, and will make reviewing
> new additions harder.  Don't mind those old comments about THPs, they
> date from before 4.5's refcounting rework: splitting is not a risk here.
> 
> Just keep a minimal version of munlock_vma_page(), as reminder of what it
> should attend to (in particular, the odd way PGSTRANDED is counted out of
> PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range().  Move
> unchanged __mlock_posix_error_return() out of the way, down to above its
> caller: this series then makes no further change after mlock_fixup().
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>

While I understand the reasons to clear the ground first, wonder what are
the implications for bisectability - is there a risk of surprising failures?
Maybe we should at least explicitly spell out the implications here?
IIUC, pages that once become mlocked, will stay mlocked, implicating the
Mlocked meminfo counter and inability to reclaim them. But if e.g. a process
that did mlockall() exits, its exclusive pages will be freed anyway, so it's
not a catastrophic kind of leak, right?
Yet it differs from the existing "failure modes" where pages would be left
as "stranded" due to failure of being isolated, because they would at least
go through TestClearPageMlocked and counters update.

>  
>  /*
> @@ -413,75 +136,11 @@ static unsigned long __munlock_pagevec_fill(struct pagevec *pvec,
>   *
>   * Returns with VM_LOCKED cleared.  Callers must be prepared to
>   * deal with this.
> - *
> - * We don't save and restore VM_LOCKED here because pages are
> - * still on lru.  In unmap path, pages might be scanned by reclaim
> - * and re-mlocked by page_mlock/try_to_unmap before we unmap and
> - * free them.  This will result in freeing mlocked pages.
>   */
>  void munlock_vma_pages_range(struct vm_area_struct *vma,
>  			     unsigned long start, unsigned long end)
>  {
> -	vma->vm_flags &= VM_LOCKED_CLEAR_MASK;

Should we at least keep doing the flags clearing? I haven't check if there
are some VM_BUG_ONs that would trip on not cleared, but wouldn't be entirely
surprised.

  reply	other threads:[~2022-02-09 18:31 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-06 21:27 [PATCH 00/13] mm/munlock: rework of mlock+munlock page handling Hugh Dickins
2022-02-06 21:30 ` [PATCH 01/13] mm/munlock: delete page_mlock() and all its works Hugh Dickins
2022-02-09 18:31   ` Vlastimil Babka [this message]
2022-02-09 22:28     ` Hugh Dickins
2022-02-10  9:52       ` Vlastimil Babka
2022-02-14  6:52         ` Hugh Dickins
2022-02-14  6:59           ` [PATCH v2 " Hugh Dickins
2022-02-14 10:07             ` Vlastimil Babka
2022-02-06 21:32 ` [PATCH 02/13] mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE Hugh Dickins
2022-02-10 11:35   ` Vlastimil Babka
2022-02-06 21:34 ` [PATCH 03/13] mm/munlock: delete munlock_vma_pages_all(), allow oomreap Hugh Dickins
2022-02-10 15:30   ` Vlastimil Babka
2022-02-06 21:36 ` [PATCH 04/13] mm/munlock: rmap call mlock_vma_page() munlock_vma_page() Hugh Dickins
2022-02-11 10:29   ` Vlastimil Babka
2022-02-14  7:05     ` [PATCH v2 " Hugh Dickins
2022-02-06 21:38 ` [PATCH 05/13] mm/munlock: replace clear_page_mlock() by final clearance Hugh Dickins
2022-02-11 11:42   ` Vlastimil Babka
2022-02-06 21:38 ` [PATCH 00/13] mm/munlock: rework of mlock+munlock page handling Matthew Wilcox
2022-02-07 18:20   ` Hugh Dickins
2022-02-06 21:40 ` [PATCH 06/13] mm/munlock: maintain page->mlock_count while unevictable Hugh Dickins
2022-02-11 12:27   ` Vlastimil Babka
2022-02-14  5:42     ` Hugh Dickins
2022-02-11 18:07   ` Vlastimil Babka
2022-02-14  6:28     ` Hugh Dickins
2022-02-06 21:42 ` [PATCH 07/13] mm/munlock: mlock_pte_range() when mlocking or munlocking Hugh Dickins
2022-02-11 16:45   ` Vlastimil Babka
2022-02-14  6:32     ` Hugh Dickins
2022-02-14  7:09       ` [PATCH v2 " Hugh Dickins
2022-02-06 21:43 ` [PATCH 08/13] mm/migrate: __unmap_and_move() push good newpage to LRU Hugh Dickins
2022-02-11 17:19   ` Vlastimil Babka
2022-02-06 21:45 ` [PATCH 09/13] mm/munlock: delete smp_mb() from __pagevec_lru_add_fn() Hugh Dickins
2022-02-11 17:43   ` Vlastimil Babka
2022-02-06 21:47 ` [PATCH 10/13] mm/munlock: mlock_page() munlock_page() batch by pagevec Hugh Dickins
2022-02-09  8:15   ` Geert Uytterhoeven
2022-02-09 15:45     ` Hugh Dickins
2022-02-11 18:26   ` Vlastimil Babka
2022-02-14  7:15     ` [PATCH v2 " Hugh Dickins
2022-02-06 21:49 ` [PATCH 11/13] mm/munlock: page migration needs mlock pagevec drained Hugh Dickins
2022-02-11 18:49   ` Vlastimil Babka
2022-02-14  5:34     ` Hugh Dickins
2022-02-14  7:17       ` [PATCH v2 " Hugh Dickins
2022-02-06 21:51 ` [PATCH 12/13] mm/thp: collapse_file() do try_to_unmap(TTU_BATCH_FLUSH) Hugh Dickins
2022-02-06 21:53 ` [PATCH 13/13] mm/thp: shrink_page_list() avoid splitting VM_LOCKED THP Hugh Dickins
     [not found] ` <20220207033518.2602-1-hdanton@sina.com>
2022-02-07 18:46   ` [PATCH 07/13] mm/munlock: mlock_pte_range() when mlocking or munlocking Hugh Dickins
2022-02-09 15:35 ` [PATCH 00/13] mm/munlock: rework of mlock+munlock page handling Michal Hocko
2022-02-09 16:21   ` Hugh Dickins
2022-02-09 21:01     ` Michal Hocko
2022-02-09 22:59       ` Hugh Dickins
2022-02-10  7:49         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a5bc989-e59a-d421-faf4-8156f700ec99@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=apopple@nvidia.com \
    --cc=david@redhat.com \
    --cc=gthelen@google.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).