From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Joonsoo Kim <js1304@gmail.com>,
Alex Shi <alex.shi@linux.alibaba.com>,
Shakeel Butt <shakeelb@google.com>,
Michal Hocko <mhocko@suse.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Roman Gushchin <guro@fb.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, kernel-team@fb.com,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
Date: Mon, 11 May 2020 11:06:48 -0400
Message-ID: <20200511150648.GA306292@cmpxchg.org>
In-Reply-To: <alpine.LSU.2.11.2005102350360.2769@eggly.anvils>
On Mon, May 11, 2020 at 12:38:04AM -0700, Hugh Dickins wrote:
> On Fri, 8 May 2020, Johannes Weiner wrote:
> >
> > I looked at this some more, as well as compared it to non-shmem
> > swapping. My conclusion is - and Hugh may correct me on this - that
> > the deletion looks mandatory but is actually an optimization. Page
> > reclaim will ultimately pick these pages up.
> >
> > When non-shmem pages are swapped in by readahead (locked until IO
> > completes) and their page tables are simultaneously unmapped, the
> > zap_pte_range() code calls free_swap_and_cache() and the locked pages
> > are stranded in the swap cache with no page table references. We rely
> > on page reclaim to pick them up later on.
> >
> > The same appears to be true for shmem. If the references to the swap
> > page are zapped while we're trying to swap in, we can strand the page
> > in the swap cache. But it's not up to swapin to detect this reliably,
> > it just frees the page more quickly than having to wait for reclaim.
>
> I think you've got all that exactly right, thanks for working it out.
> It originates from v3.7's 215c02bc33bb ("tmpfs: fix shmem_getpage_gfp()
> VM_BUG_ON") - in which I also had to thank you.
I should have looked where it actually came from - I had forgotten
about that patch!
> I think I chose to do the delete_from_swap_cache() right there, partly
> because of following shmem_unuse_inode() code which already did that,
> partly on the basis that since we have to observe the case anyway, it's
> better to clean it up, and partly out of guilt that our page lock here
> is what had prevented shmem_undo_range() from completing its job; but
> I believe you're right that unused swapcache reclaim would sort it out
> eventually.
That makes sense to me.
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index e80167927dce..236642775f89 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -640,7 +640,7 @@ static int shmem_add_to_page_cache(struct page *page,
> > xas_lock_irq(&xas);
> > entry = xas_find_conflict(&xas);
> > if (entry != expected)
> > - xas_set_err(&xas, -EEXIST);
> > + xas_set_err(&xas, expected ? -ENOENT : -EEXIST);
>
> Two things on this.
>
> Minor matter of taste, I'd prefer that as
> xas_set_err(&xas, entry ? -EEXIST : -ENOENT);
> which would be more general and more understandable -
> but what you have written should be fine for the actual callers.
Yes, checking `expected' was to differentiate the behavior depending
on the callsite. But testing `entry' is more obvious in that location.
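For illustration, the conflict check boils down to this decision; here is a
hypothetical userspace sketch (not the kernel code itself, and the function
name is made up):

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the error selection in
 * shmem_add_to_page_cache(): @expected is the swap entry we believe
 * occupies the slot, @entry is what xas_find_conflict() actually found.
 */
static int conflict_error(void *expected, void *entry)
{
	if (entry == expected)
		return 0;	/* slot holds what we expected */
	if (entry)
		return -EEXIST;	/* a different entry is in the way */
	return -ENOENT;		/* our entry was truncated/holepunched */
}
```

Testing `entry' covers both callsites: page cache insertion (expected ==
NULL) can only ever see -EEXIST, while swapin replacement distinguishes
"somebody else got there first" from "the entry vanished under us".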
> Except... I think returning -ENOENT there will not work correctly,
> in the case of a punched hole. Because (unless you've reworked it
> and I just haven't looked) shmem_getpage_gfp() knows to retry in
> the case of -EEXIST, but -ENOENT will percolate up to shmem_fault()
> and result in a SIGBUS, or a read/write error, when the hole should
> just get refilled instead.
Good catch, I had indeed missed that. I'm going to make it retry on
-ENOENT as well.
We could have it go directly to allocating a new page, but it seems
unnecessarily complicated: we've already been retrying in this
situation until now, so I would stick to "there was a race, retry."
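The retry shape then looks like the following hypothetical sketch (names
and the simulated race are illustrative, mirroring how shmem_getpage_gfp()
loops back to re-inspect the mapping):

```c
#include <errno.h>

static int swapin_attempts;

/* Simulate a swapin that loses one race, then succeeds. */
static int fake_swapin(void)
{
	return swapin_attempts++ ? 0 : -ENOENT;
}

/*
 * Hypothetical sketch of the fault-side retry: on -EEXIST (someone
 * else installed a page) or -ENOENT (the entry was truncated or
 * holepunched under us), go back and re-inspect the mapping rather
 * than failing the fault with SIGBUS.
 */
static int getpage(void)
{
	int error;
repeat:
	error = fake_swapin();
	if (error == -EEXIST || error == -ENOENT)
		goto repeat;	/* there was a race: retry */
	return error;
}
```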
> Not something that needs fixing in a hurry (it took trinity to
> generate this racy case in the first place), I'll take another look
> once I've pulled it into a tree (or collected next mmotm) - unless
> you've already changed it around by then.
Attaching a delta fix based on your observations.
Andrew, barring any objections to this, could you please fold it into
the version you have in your tree already?
---
From 33d03ceebce0a6261d472ddc9c5a07940f44714c Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Mon, 11 May 2020 10:45:14 -0400
Subject: [PATCH] mm: memcontrol: convert page cache to a new
mem_cgroup_charge() API fix
Incorporate Hugh's feedback:
- shmem_getpage_gfp() needs to handle the new -ENOENT that was
previously implied in the -EEXIST when a swap entry changed under us
in any way. Otherwise hole punching could cause a racing fault to
SIGBUS instead of allocating a new page.
- It is indeed page reclaim that picks up any swapcache we leave
stranded when free_swap_and_cache() runs on a page locked by
somebody else. Document that our delete_from_swap_cache() is an
optimization, not something we rely on for correctness.
- Style cleanup: testing `expected' to decide on -EEXIST vs -ENOENT
differentiates the callsites, but is a bit awkward to read. Test
`entry' instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---
mm/shmem.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index afd5a057ebb7..00fb001e8f3e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -638,7 +638,7 @@ static int shmem_add_to_page_cache(struct page *page,
xas_lock_irq(&xas);
entry = xas_find_conflict(&xas);
if (entry != expected)
- xas_set_err(&xas, expected ? -ENOENT : -EEXIST);
+ xas_set_err(&xas, entry ? -EEXIST : -ENOENT);
xas_create_range(&xas);
if (xas_error(&xas))
goto unlock;
@@ -1686,10 +1686,13 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
* We already confirmed swap under page lock, but
* free_swap_and_cache() only trylocks a page, so it
* is just possible that the entry has been truncated
- * or holepunched since swap was confirmed.
- * shmem_undo_range() will have done some of the
- * unaccounting, now delete_from_swap_cache() will do
- * the rest.
+ * or holepunched since swap was confirmed. This could
+ * occur at any time while the page is locked, and
+ * usually page reclaim will take care of the stranded
+ * swapcache page. But when we catch it, we may as
+ * well clean up after ourselves: shmem_undo_range()
+ * will have done some of the unaccounting, now
+ * delete_from_swap_cache() will do the rest.
*/
if (error == -ENOENT)
delete_from_swap_cache(page);
@@ -1765,7 +1768,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
if (xa_is_value(page)) {
error = shmem_swapin_page(inode, index, &page,
sgp, gfp, vma, fault_type);
- if (error == -EEXIST)
+ if (error == -EEXIST || error == -ENOENT)
goto repeat;
*pagep = page;
--
2.26.2