All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] shmem: fix negative rss in memcg memory.stat
Date: Tue, 10 Jul 2012 14:41:08 +0200	[thread overview]
Message-ID: <20120710124107.GE1779@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1207091541310.2051@eggly.anvils>

On Mon, Jul 09, 2012 at 03:44:24PM -0700, Hugh Dickins wrote:
> When adding the page_private checks before calling shmem_replace_page(),
> I did realize that there is a further race, but thought it too unlikely
> to need a hurried fix.
> 
> But independently I've been chasing why a mem cgroup's memory.stat
> sometimes shows negative rss after all tasks have gone: I expected it
> to be a stats gathering bug, but actually it's shmem swapping's fault.
> 
> It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
> the page may have been removed from swapcache before getting the lock; 
> or it may have been freed and reused and be back in swapcache; and it
> can even be using the same swap location as before (page_private same).
> 
> The swapoff case is already secure against this (swap cannot be reused
> until the whole area has been swapped off, and a new swapped on); and
> shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check
> for the expected radix_tree entry - but a little too late.
> 
> By that time, we might have already decided to shmem_replace_page():
> I don't know of a problem from that, but I'd feel more at ease not to
> do so spuriously.  And we have already done mem_cgroup_cache_charge(),
> on perhaps the wrong mem cgroup: and this charge is not then undone on
> the error path, because PageSwapCache ends up preventing that.

I couldn't see anything wrong with shmem_replace_page(), either, but
maybe the comment in its error path could be updated as the callsite
does not rely on page_private alone anymore to confirm correct swap.

> It's this last case which causes the occasional negative rss in
> memory.stat: the page is charged here as cache, but (sometimes) found
> to be anon when eventually it's uncharged - and in between, it's an
> undeserved charge on the wrong memcg.
> 
> Fix this by adding an earlier check on the radix_tree entry: it's
> inelegant to descend the tree twice, but swapping is not the fast path,
> and a better solution would need a pair (try+commit) of memcg calls,
> and a rework of shmem_replace_page() to keep out of the swapcache.
> 
> We can use the added shmem_confirm_swap() function to replace the
> find_get_page+page_cache_release we were already doing on the error
> path.  And add a comment on that -EEXIST: it seems a peculiar errno
> to be using, but originates from its use in radix_tree_insert().
> 
> [It can be surprising to see positive rss left in a memcg's memory.stat
> after all tasks have gone, since it is supposed to count anonymous but
> not shmem.  Aside from sharing anon pages via fork with a task in some
> other memcg, it often happens after swapping: because a swap page can't
> be freed while under writeback, nor while locked.  So it's not an error,
> and these residual pages are easily freed once pressure demands.]
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

WARNING: multiple messages have this Message-ID (diff)
From: Johannes Weiner <hannes@cmpxchg.org>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] shmem: fix negative rss in memcg memory.stat
Date: Tue, 10 Jul 2012 14:41:08 +0200	[thread overview]
Message-ID: <20120710124107.GE1779@cmpxchg.org> (raw)
In-Reply-To: <alpine.LSU.2.00.1207091541310.2051@eggly.anvils>

On Mon, Jul 09, 2012 at 03:44:24PM -0700, Hugh Dickins wrote:
> When adding the page_private checks before calling shmem_replace_page(),
> I did realize that there is a further race, but thought it too unlikely
> to need a hurried fix.
> 
> But independently I've been chasing why a mem cgroup's memory.stat
> sometimes shows negative rss after all tasks have gone: I expected it
> to be a stats gathering bug, but actually it's shmem swapping's fault.
> 
> It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
> the page may have been removed from swapcache before getting the lock; 
> or it may have been freed and reused and be back in swapcache; and it
> can even be using the same swap location as before (page_private same).
> 
> The swapoff case is already secure against this (swap cannot be reused
> until the whole area has been swapped off, and a new swapped on); and
> shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check
> for the expected radix_tree entry - but a little too late.
> 
> By that time, we might have already decided to shmem_replace_page():
> I don't know of a problem from that, but I'd feel more at ease not to
> do so spuriously.  And we have already done mem_cgroup_cache_charge(),
> on perhaps the wrong mem cgroup: and this charge is not then undone on
> the error path, because PageSwapCache ends up preventing that.

I couldn't see anything wrong with shmem_replace_page(), either, but
maybe the comment in its error path could be updated as the callsite
does not rely on page_private alone anymore to confirm correct swap.

> It's this last case which causes the occasional negative rss in
> memory.stat: the page is charged here as cache, but (sometimes) found
> to be anon when eventually it's uncharged - and in between, it's an
> undeserved charge on the wrong memcg.
> 
> Fix this by adding an earlier check on the radix_tree entry: it's
> inelegant to descend the tree twice, but swapping is not the fast path,
> and a better solution would need a pair (try+commit) of memcg calls,
> and a rework of shmem_replace_page() to keep out of the swapcache.
> 
> We can use the added shmem_confirm_swap() function to replace the
> find_get_page+page_cache_release we were already doing on the error
> path.  And add a comment on that -EEXIST: it seems a peculiar errno
> to be using, but originates from its use in radix_tree_insert().
> 
> [It can be surprising to see positive rss left in a memcg's memory.stat
> after all tasks have gone, since it is supposed to count anonymous but
> not shmem.  Aside from sharing anon pages via fork with a task in some
> other memcg, it often happens after swapping: because a swap page can't
> be freed while under writeback, nor while locked.  So it's not an error,
> and these residual pages are easily freed once pressure demands.]
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Michal Hocko <mhocko@suse.cz>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-07-10 12:41 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-09 22:35 [PATCH 0/3] shmem/tmpfs: three late patches Hugh Dickins
2012-07-09 22:35 ` Hugh Dickins
2012-07-09 22:41 ` [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE Hugh Dickins
2012-07-09 22:41   ` Hugh Dickins
2012-07-11  6:07   ` Cong Wang
2012-07-11  6:07     ` Cong Wang
2012-07-11 18:55     ` Hugh Dickins
2012-07-11 18:55       ` Hugh Dickins
2012-07-11 23:01       ` Dave Chinner
2012-07-11 23:01         ` Dave Chinner
2012-07-12  2:50         ` Hugh Dickins
2012-07-12  2:50           ` Hugh Dickins
2012-07-12  3:21         ` Jeff Liu
2012-07-12  3:21           ` Jeff Liu
2012-07-16  9:28           ` Hugh Dickins
2012-07-16  9:28             ` Hugh Dickins
2012-07-17  6:15             ` Jeff Liu
2012-07-17  6:15               ` Jeff Liu
2012-07-31 14:30             ` Jim Meyering
2012-07-31 14:42               ` Jim Meyering
2012-08-08  2:08               ` Hugh Dickins
2012-08-14 17:03                 ` Paul Eggert
2012-07-09 22:44 ` [PATCH 2/3] shmem: fix negative rss in memcg memory.stat Hugh Dickins
2012-07-09 22:44   ` Hugh Dickins
2012-07-10 12:41   ` Johannes Weiner [this message]
2012-07-10 12:41     ` Johannes Weiner
2012-07-11 18:15     ` Hugh Dickins
2012-07-11 18:15       ` Hugh Dickins
2012-07-09 22:46 ` [PATCH 3/3] shmem: cleanup shmem_add_to_page_cache Hugh Dickins
2012-07-09 22:46   ` Hugh Dickins
2012-07-10 13:01   ` Johannes Weiner
2012-07-10 13:01     ` Johannes Weiner
2012-07-09 23:39 ` [PATCH 0/3] shmem/tmpfs: three late patches Andrew Morton
2012-07-09 23:39   ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120710124107.GE1779@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.