linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	Shakeel Butt <shakeelb@google.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@suse.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Roman Gushchin <guro@fb.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API
Date: Mon, 11 May 2020 10:57:32 +0900	[thread overview]
Message-ID: <20200511015732.GA7749@js1304-desktop> (raw)
In-Reply-To: <20200508160122.GB181181@cmpxchg.org>

On Fri, May 08, 2020 at 12:01:22PM -0400, Johannes Weiner wrote:
> On Thu, Apr 23, 2020 at 02:25:06PM +0900, Joonsoo Kim wrote:
> > On Wed, Apr 22, 2020 at 08:09:46AM -0400, Johannes Weiner wrote:
> > > On Wed, Apr 22, 2020 at 03:40:41PM +0900, Joonsoo Kim wrote:
> > > > On Mon, Apr 20, 2020 at 06:11:13PM -0400, Johannes Weiner wrote:
> > > > > @@ -1664,29 +1678,22 @@ static int shmem_swapin_page(struct inode *inode, pgoff_t index,
> > > > >  			goto failed;
> > > > >  	}
> > > > >  
> > > > > -	error = mem_cgroup_try_charge_delay(page, charge_mm, gfp, &memcg);
> > > > > -	if (!error) {
> > > > > -		error = shmem_add_to_page_cache(page, mapping, index,
> > > > > -						swp_to_radix_entry(swap), gfp);
> > > > > -		/*
> > > > > -		 * We already confirmed swap under page lock, and make
> > > > > -		 * no memory allocation here, so usually no possibility
> > > > > -		 * of error; but free_swap_and_cache() only trylocks a
> > > > > -		 * page, so it is just possible that the entry has been
> > > > > -		 * truncated or holepunched since swap was confirmed.
> > > > > -		 * shmem_undo_range() will have done some of the
> > > > > -		 * unaccounting, now delete_from_swap_cache() will do
> > > > > -		 * the rest.
> > > > > -		 */
> > > > > -		if (error) {
> > > > > -			mem_cgroup_cancel_charge(page, memcg);
> > > > > -			delete_from_swap_cache(page);
> > > > > -		}
> > > > > -	}
> > > > > -	if (error)
> > > > > +	error = shmem_add_to_page_cache(page, mapping, index,
> > > > > +					swp_to_radix_entry(swap), gfp,
> > > > > +					charge_mm);
> > > > > +	/*
> > > > > +	 * We already confirmed swap under page lock, and make no
> > > > > +	 * memory allocation here, so usually no possibility of error;
> > > > > +	 * but free_swap_and_cache() only trylocks a page, so it is
> > > > > +	 * just possible that the entry has been truncated or
> > > > > +	 * holepunched since swap was confirmed.  shmem_undo_range()
> > > > > +	 * will have done some of the unaccounting, now
> > > > > +	 * delete_from_swap_cache() will do the rest.
> > > > > +	 */
> > > > > +	if (error) {
> > > > > +		delete_from_swap_cache(page);
> > > > >  		goto failed;
> > > > 
> > > > -EEXIST (from swap cache) and -ENOMEM (from memcg) should be handled
> > > > differently. delete_from_swap_cache() is for -EEXIST case.
> > > 
> > > Good catch, I accidentally changed things here.
> > > 
> > > I was just going to change it back, but now I'm trying to understand
> > > how it actually works.
> > > 
> > > Who is removing the page from swap cache if shmem_undo_range() races
> > > but we fail to charge the page?
> > > 
> > > Here is how this race is supposed to be handled: The page is in the
> > > swapcache, we have it locked and confirmed that the entry in i_pages
> > > is indeed a swap entry. We charge the page, then we try to replace the
> > > swap entry in i_pages with the actual page. If we determine, under
> > > tree lock now, that shmem_undo_range has raced with us, unaccounted
> > > the swap space, but must have failed to get the page lock, we remove
> > > the page from swap cache on our side, to free up swap slot and page.
> > > 
> > > But what if shmem_undo_range() raced with us, deleted the swap entry
> > > from i_pages while we had the page locked, but then we simply failed
> > > to charge? We unlock the page and return -EEXIST (shmem_confirm_swap
> > > at the exit). The page with its userdata is now in swapcache, but no
> > > corresponding swap entry in i_pages. shmem_getpage_gfp() sees the
> > > -EEXIST, retries, finds nothing in i_pages and allocates a new, empty
> > > page.
> > > 
> > > Aren't we leaking the swap slot and the page?
> > 
> > Yes, you're right! It seems that it's possible to leak the swap slot
> > and the page. Race could happen for all the places after lock_page()
> > and shmem_confirm_swap() are done. And, I think that it's not possible
> > to fix the problem in shmem_swapin_page() side since we can't know the
> > timing that trylock_page() is called. Maybe, solution would be,
> > instead of using free_swap_and_cache() in shmem_undo_range() that
> > calls trylock_page(), to use another function that calls lock_page().
> 
> I looked at this some more, as well as compared it to non-shmem
> swapping. My conclusion is - and Hugh may correct me on this - that
> the deletion looks mandatory but is actually an optimization. Page
> reclaim will ultimately pick these pages up.
> 
> When non-shmem pages are swapped in by readahead (locked until IO
> completes) and their page tables are simultaneously unmapped, the
> zap_pte_range() code calls free_swap_and_cache() and the locked pages
> are stranded in the swap cache with no page table references. We rely
> on page reclaim to pick them up later on.
> 
> The same appears to be true for shmem. If the references to the swap
> page are zapped while we're trying to swap in, we can strand the page
> in the swap cache. But it's not up to swapin to detect this reliably,
> it just frees the page more quickly than having to wait for reclaim.
> 
> That being said, my patch introduces potentially undesirable behavior
> (although AFAICS no correctness problem): We should only delete the
> page from swapcache when we actually raced with undo_range - which we
> see from the swap entry having been purged from the page cache
> tree. If we delete the page from swapcache just because we failed to
> charge it, the next fault has to read the still-valid page again from
> the swap device.

I got it! Thanks for explanation.

Thanks.



  reply	other threads:[~2020-05-11  1:57 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-20 22:11 [PATCH 00/18] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
2020-04-20 22:11 ` [PATCH 01/18] mm: fix NUMA node file count error in replace_page_cache() Johannes Weiner
2020-04-21  8:28   ` Alex Shi
2020-04-21 19:13   ` Shakeel Butt
2020-04-22  6:34   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 02/18] mm: memcontrol: fix theoretical race in charge moving Johannes Weiner
2020-04-22  6:36   ` Joonsoo Kim
2020-04-22 16:51   ` Shakeel Butt
2020-04-22 17:42     ` Johannes Weiner
2020-04-22 18:01       ` Shakeel Butt
2020-04-22 18:02   ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 03/18] mm: memcontrol: drop @compound parameter from memcg charging API Johannes Weiner
2020-04-21  9:11   ` Alex Shi
2020-04-22  6:37   ` Joonsoo Kim
2020-04-22 17:30   ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 04/18] mm: memcontrol: move out cgroup swaprate throttling Johannes Weiner
2020-04-21  9:11   ` Alex Shi
2020-04-22  6:37   ` Joonsoo Kim
2020-04-22 22:20   ` Shakeel Butt
2020-04-20 22:11 ` [PATCH 05/18] mm: memcontrol: convert page cache to a new mem_cgroup_charge() API Johannes Weiner
2020-04-21  9:12   ` Alex Shi
2020-04-22  6:40   ` Joonsoo Kim
2020-04-22 12:09     ` Johannes Weiner
2020-04-23  5:25       ` Joonsoo Kim
2020-05-08 16:01         ` Johannes Weiner
2020-05-11  1:57           ` Joonsoo Kim [this message]
2020-05-11  7:38           ` Hugh Dickins
2020-05-11 15:06             ` Johannes Weiner
2020-05-11 16:32               ` Hugh Dickins
2020-05-11 18:10                 ` Johannes Weiner
2020-05-11 18:12                   ` Johannes Weiner
2020-05-11 18:44                   ` Hugh Dickins
2020-04-20 22:11 ` [PATCH 06/18] mm: memcontrol: prepare uncharging for removal of private page type counters Johannes Weiner
2020-04-21  9:12   ` Alex Shi
2020-04-22  6:41   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 07/18] mm: memcontrol: prepare move_account " Johannes Weiner
2020-04-21  9:13   ` Alex Shi
2020-04-22  6:41   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 08/18] mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters Johannes Weiner
2020-04-22  6:42   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 09/18] mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters Johannes Weiner
2020-04-22  6:42   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 10/18] mm: memcontrol: switch to native NR_ANON_MAPPED counter Johannes Weiner
2020-04-22  6:51   ` Joonsoo Kim
2020-04-22 12:28     ` Johannes Weiner
2020-04-23  5:27       ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 11/18] mm: memcontrol: switch to native NR_ANON_THPS counter Johannes Weiner
2020-04-24  0:29   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 12/18] mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API Johannes Weiner
2020-04-24  0:29   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 13/18] mm: memcontrol: drop unused try/commit/cancel charge API Johannes Weiner
2020-04-24  0:30   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 14/18] mm: memcontrol: prepare swap controller setup for integration Johannes Weiner
2020-04-24  0:30   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 15/18] mm: memcontrol: make swap tracking an integral part of memory control Johannes Weiner
2020-04-21  9:27   ` Alex Shi
2020-04-21 14:39     ` Johannes Weiner
2020-04-22  3:14       ` Alex Shi
2020-04-22 13:30         ` Johannes Weiner
2020-04-22 13:40           ` Alex Shi
2020-04-22 13:43           ` Alex Shi
2020-04-24  0:30   ` Joonsoo Kim
2020-04-24  3:01   ` Johannes Weiner
2020-04-20 22:11 ` [PATCH 16/18] mm: memcontrol: charge swapin pages on instantiation Johannes Weiner
2020-04-21  9:21   ` Alex Shi
2020-04-24  0:44   ` Joonsoo Kim
2020-04-24  2:51     ` Johannes Weiner
2020-04-28  6:49       ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 17/18] mm: memcontrol: delete unused lrucare handling Johannes Weiner
2020-04-24  0:46   ` Joonsoo Kim
2020-04-20 22:11 ` [PATCH 18/18] mm: memcontrol: update page->mem_cgroup stability rules Johannes Weiner
2020-04-21  9:20   ` Alex Shi
2020-04-24  0:48   ` Joonsoo Kim
2020-04-21  9:10 ` Hillf Danton
2020-04-21 14:34   ` Johannes Weiner
2020-04-21  9:32 ` [PATCH 00/18] mm: memcontrol: charge swapin pages on instantiation Alex Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200511015732.GA7749@js1304-desktop \
    --to=iamjoonsoo.kim@lge.com \
    --cc=alex.shi@linux.alibaba.com \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@fb.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=shakeelb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).