All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: 3.13-rc breaks MEMCG_SWAP
Date: Tue, 17 Dec 2013 11:25:17 +0100	[thread overview]
Message-ID: <20131217102517.GA28991@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.LNX.2.00.1312161742540.2037@eggly.anvils>

On Mon 16-12-13 18:26:18, Hugh Dickins wrote:
> On Mon, 16 Dec 2013, Michal Hocko wrote:
> > On Mon 16-12-13 00:36:05, Hugh Dickins wrote:
[...]
> > > +/**
> > > + * swap_cgroup_reassign - assign all old entries to new (before old is freed).
> > > + * @old: id of emptied memcg whose entries are now to be reassigned
> > > + * @new: id of parent memcg to which those entries are to be assigned
> > > + *
> > > + * Returns number of entries reassigned, for debugging or for statistics.
> > > + */
> > > +long swap_cgroup_reassign(unsigned short old, unsigned short new)
> > > +{
> > > +	long reassigned = 0;
> > > +	int type;
> > > +
> > > +	for (type = 0; type < MAX_SWAPFILES; type++) {
> > > +		struct swap_cgroup_ctrl *ctrl = &swap_cgroup_ctrl[type];
> > > +		unsigned long idx;
> > > +
> > > +		for (idx = 0; idx < ACCESS_ONCE(ctrl->length); idx++) {
> > > +			struct swap_cgroup *sc, *scend;
> > > +
> > > +			spin_lock(&swap_cgroup_lock);
> > > +			if (idx >= ACCESS_ONCE(ctrl->length))
> > > +				goto unlock;
> > > +			sc = page_address(ctrl->map[idx]);
> > > +			for (scend = sc + SC_PER_PAGE; sc < scend; sc++) {
> > > +				if (sc->id != old)
> > > +					continue;
> > 
> > Is this safe? What prevents from race when id is set to old?
> 
> I am assuming that when this is called, we shall not be making any
> new charges to old (if you see what I mean :) - this is called after
> mem_cgroup_reparent_charges() (the main call - I've not yet wrapped
> my head around Hannes's later safety-net call; perhaps these patches
> would even make that one redundant - dunno).

I have to think about this some more. I was playing with an alternative
fix for this race and few trace points shown me that the races are quite
common.

Anyway, wouldn't be it easier to take the lock for a batch of sc's and
then release it followed by cond_resched? Doing 1024 of iterations
doesn't sound too bad to me (we would take the lock 2 times for 4k pages).

> Quite a lot is made simpler by the fact that we do not need to call
> this from mem_cgroup_force_empty(), with all the races that would
> entail: there was never any attempt to move these swap charges before,
> so it's a pleasure not to have to deal with that possibility now.
>
> > > +				spin_lock(&ctrl->lock);
> > > +				if (sc->id == old) {
> > 
> > Also it seems that compiler is free to optimize this test away, no?
> > You need ACCESS_ONCE here as well, I guess.
> 
> Oh dear, you're asking me to read again through memory-barriers.txt
> (Xmas 2013 edition), and come to a conclusion.  I think this pattern
> of test outside spinlock, spinlock, test again inside spinlock is
> used in very many places, without any ACCESS_ONCE.  I'll have to
> go and search through the precedents.
> 
> I've probably brought this upon myself with the ACCESS_ONCE(ctrl->length)
> a few lines above, which I added at a late stage to match the one above
> it; but now I'm arguing that's unnecessary.

Yeah, that triggered the red flag ;)

> One of the problems with ACCESS_ONCE is that one easily falls into
> a mistaken state in which it seems to be necessary everywhere;
> but that illusion must be resisted.
> 
> The spinlock should make it unnecessary, but I'll have to muse on
> semi-permeable membranes, osmosis, stuff like that.

OK, I have checked that and you are right. ACCESS_ONCE is not needed.
Compiler is not allowed to optimize across barrier() which is a part of
spin_lock. So this should be ok.
[...]
-- 
Michal Hocko
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: 3.13-rc breaks MEMCG_SWAP
Date: Tue, 17 Dec 2013 11:25:17 +0100	[thread overview]
Message-ID: <20131217102517.GA28991@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.LNX.2.00.1312161742540.2037@eggly.anvils>

On Mon 16-12-13 18:26:18, Hugh Dickins wrote:
> On Mon, 16 Dec 2013, Michal Hocko wrote:
> > On Mon 16-12-13 00:36:05, Hugh Dickins wrote:
[...]
> > > +/**
> > > + * swap_cgroup_reassign - assign all old entries to new (before old is freed).
> > > + * @old: id of emptied memcg whose entries are now to be reassigned
> > > + * @new: id of parent memcg to which those entries are to be assigned
> > > + *
> > > + * Returns number of entries reassigned, for debugging or for statistics.
> > > + */
> > > +long swap_cgroup_reassign(unsigned short old, unsigned short new)
> > > +{
> > > +	long reassigned = 0;
> > > +	int type;
> > > +
> > > +	for (type = 0; type < MAX_SWAPFILES; type++) {
> > > +		struct swap_cgroup_ctrl *ctrl = &swap_cgroup_ctrl[type];
> > > +		unsigned long idx;
> > > +
> > > +		for (idx = 0; idx < ACCESS_ONCE(ctrl->length); idx++) {
> > > +			struct swap_cgroup *sc, *scend;
> > > +
> > > +			spin_lock(&swap_cgroup_lock);
> > > +			if (idx >= ACCESS_ONCE(ctrl->length))
> > > +				goto unlock;
> > > +			sc = page_address(ctrl->map[idx]);
> > > +			for (scend = sc + SC_PER_PAGE; sc < scend; sc++) {
> > > +				if (sc->id != old)
> > > +					continue;
> > 
> > Is this safe? What prevents from race when id is set to old?
> 
> I am assuming that when this is called, we shall not be making any
> new charges to old (if you see what I mean :) - this is called after
> mem_cgroup_reparent_charges() (the main call - I've not yet wrapped
> my head around Hannes's later safety-net call; perhaps these patches
> would even make that one redundant - dunno).

I have to think about this some more. I was playing with an alternative
fix for this race and few trace points shown me that the races are quite
common.

Anyway, wouldn't be it easier to take the lock for a batch of sc's and
then release it followed by cond_resched? Doing 1024 of iterations
doesn't sound too bad to me (we would take the lock 2 times for 4k pages).

> Quite a lot is made simpler by the fact that we do not need to call
> this from mem_cgroup_force_empty(), with all the races that would
> entail: there was never any attempt to move these swap charges before,
> so it's a pleasure not to have to deal with that possibility now.
>
> > > +				spin_lock(&ctrl->lock);
> > > +				if (sc->id == old) {
> > 
> > Also it seems that compiler is free to optimize this test away, no?
> > You need ACCESS_ONCE here as well, I guess.
> 
> Oh dear, you're asking me to read again through memory-barriers.txt
> (Xmas 2013 edition), and come to a conclusion.  I think this pattern
> of test outside spinlock, spinlock, test again inside spinlock is
> used in very many places, without any ACCESS_ONCE.  I'll have to
> go and search through the precedents.
> 
> I've probably brought this upon myself with the ACCESS_ONCE(ctrl->length)
> a few lines above, which I added at a late stage to match the one above
> it; but now I'm arguing that's unnecessary.

Yeah, that triggered the red flag ;)

> One of the problems with ACCESS_ONCE is that one easily falls into
> a mistaken state in which it seems to be necessary everywhere;
> but that illusion must be resisted.
> 
> The spinlock should make it unnecessary, but I'll have to muse on
> semi-permeable membranes, osmosis, stuff like that.

OK, I have checked that and you are right. ACCESS_ONCE is not needed.
Compiler is not allowed to optimize across barrier() which is a part of
spin_lock. So this should be ok.
[...]
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
To: Hugh Dickins <hughd-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	KAMEZAWA Hiroyuki
	<kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: 3.13-rc breaks MEMCG_SWAP
Date: Tue, 17 Dec 2013 11:25:17 +0100	[thread overview]
Message-ID: <20131217102517.GA28991@dhcp22.suse.cz> (raw)
In-Reply-To: <alpine.LNX.2.00.1312161742540.2037-fupSdm12i1nKWymIFiNcPA@public.gmane.org>

On Mon 16-12-13 18:26:18, Hugh Dickins wrote:
> On Mon, 16 Dec 2013, Michal Hocko wrote:
> > On Mon 16-12-13 00:36:05, Hugh Dickins wrote:
[...]
> > > +/**
> > > + * swap_cgroup_reassign - assign all old entries to new (before old is freed).
> > > + * @old: id of emptied memcg whose entries are now to be reassigned
> > > + * @new: id of parent memcg to which those entries are to be assigned
> > > + *
> > > + * Returns number of entries reassigned, for debugging or for statistics.
> > > + */
> > > +long swap_cgroup_reassign(unsigned short old, unsigned short new)
> > > +{
> > > +	long reassigned = 0;
> > > +	int type;
> > > +
> > > +	for (type = 0; type < MAX_SWAPFILES; type++) {
> > > +		struct swap_cgroup_ctrl *ctrl = &swap_cgroup_ctrl[type];
> > > +		unsigned long idx;
> > > +
> > > +		for (idx = 0; idx < ACCESS_ONCE(ctrl->length); idx++) {
> > > +			struct swap_cgroup *sc, *scend;
> > > +
> > > +			spin_lock(&swap_cgroup_lock);
> > > +			if (idx >= ACCESS_ONCE(ctrl->length))
> > > +				goto unlock;
> > > +			sc = page_address(ctrl->map[idx]);
> > > +			for (scend = sc + SC_PER_PAGE; sc < scend; sc++) {
> > > +				if (sc->id != old)
> > > +					continue;
> > 
> > Is this safe? What prevents from race when id is set to old?
> 
> I am assuming that when this is called, we shall not be making any
> new charges to old (if you see what I mean :) - this is called after
> mem_cgroup_reparent_charges() (the main call - I've not yet wrapped
> my head around Hannes's later safety-net call; perhaps these patches
> would even make that one redundant - dunno).

I have to think about this some more. I was playing with an alternative
fix for this race and few trace points shown me that the races are quite
common.

Anyway, wouldn't be it easier to take the lock for a batch of sc's and
then release it followed by cond_resched? Doing 1024 of iterations
doesn't sound too bad to me (we would take the lock 2 times for 4k pages).

> Quite a lot is made simpler by the fact that we do not need to call
> this from mem_cgroup_force_empty(), with all the races that would
> entail: there was never any attempt to move these swap charges before,
> so it's a pleasure not to have to deal with that possibility now.
>
> > > +				spin_lock(&ctrl->lock);
> > > +				if (sc->id == old) {
> > 
> > Also it seems that compiler is free to optimize this test away, no?
> > You need ACCESS_ONCE here as well, I guess.
> 
> Oh dear, you're asking me to read again through memory-barriers.txt
> (Xmas 2013 edition), and come to a conclusion.  I think this pattern
> of test outside spinlock, spinlock, test again inside spinlock is
> used in very many places, without any ACCESS_ONCE.  I'll have to
> go and search through the precedents.
> 
> I've probably brought this upon myself with the ACCESS_ONCE(ctrl->length)
> a few lines above, which I added at a late stage to match the one above
> it; but now I'm arguing that's unnecessary.

Yeah, that triggered the red flag ;)

> One of the problems with ACCESS_ONCE is that one easily falls into
> a mistaken state in which it seems to be necessary everywhere;
> but that illusion must be resisted.
> 
> The spinlock should make it unnecessary, but I'll have to muse on
> semi-permeable membranes, osmosis, stuff like that.

OK, I have checked that and you are right. ACCESS_ONCE is not needed.
Compiler is not allowed to optimize across barrier() which is a part of
spin_lock. So this should be ok.
[...]
-- 
Michal Hocko
SUSE Labs

  reply	other threads:[~2013-12-17 10:25 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-16  8:36 3.13-rc breaks MEMCG_SWAP Hugh Dickins
2013-12-16  8:36 ` Hugh Dickins
2013-12-16  9:36 ` Li Zefan
2013-12-16  9:36   ` Li Zefan
2013-12-16  9:53   ` Michal Hocko
2013-12-16  9:53     ` Michal Hocko
2013-12-16 10:40     ` Michal Hocko
2013-12-16 10:40       ` Michal Hocko
2013-12-16 16:35       ` Tejun Heo
2013-12-16 16:35         ` Tejun Heo
2013-12-16 17:19         ` Michal Hocko
2013-12-16 17:19           ` Michal Hocko
2013-12-16 17:21           ` Tejun Heo
2013-12-16 17:21             ` Tejun Heo
2013-12-17  1:41             ` Hugh Dickins
2013-12-17  1:41               ` Hugh Dickins
2013-12-17  3:13               ` Li Zefan
2013-12-17  3:13                 ` Li Zefan
2013-12-17  7:09                 ` Hugh Dickins
2013-12-17  7:09                   ` Hugh Dickins
2013-12-17 13:11                   ` Michal Hocko
2013-12-17 13:11                     ` Michal Hocko
2013-12-17 13:14                     ` Tejun Heo
2013-12-17 13:14                       ` Tejun Heo
2013-12-17 12:29                 ` Tejun Heo
2013-12-17 12:29                   ` Tejun Heo
2013-12-17 13:12                   ` Michal Hocko
2013-12-17 13:12                     ` Michal Hocko
2013-12-17 13:12                     ` Michal Hocko
2013-12-17 12:48                 ` Michal Hocko
2013-12-17 12:48                   ` Michal Hocko
2013-12-17 13:05                 ` Michal Hocko
2013-12-17 13:05                   ` Michal Hocko
2013-12-17 13:15                 ` [PATCH cgroup/for-3.13-fixes] cgroup: don't recycle cgroup id until all csses' have been destroyed Tejun Heo
2013-12-17 13:15                   ` Tejun Heo
2013-12-17 13:15                   ` Tejun Heo
2013-12-17 13:14               ` 3.13-rc breaks MEMCG_SWAP Michal Hocko
2013-12-17 13:14                 ` Michal Hocko
2013-12-16 16:41       ` Johannes Weiner
2013-12-16 16:41         ` Johannes Weiner
2013-12-16 17:15         ` Michal Hocko
2013-12-16 17:15           ` Michal Hocko
2013-12-16 17:15           ` Michal Hocko
2013-12-16 17:19           ` Tejun Heo
2013-12-16 17:19             ` Tejun Heo
2013-12-16  9:49 ` Michal Hocko
2013-12-16  9:49   ` Michal Hocko
2013-12-16 16:20 ` Michal Hocko
2013-12-16 16:20   ` Michal Hocko
2013-12-17  2:26   ` Hugh Dickins
2013-12-17  2:26     ` Hugh Dickins
2013-12-17 10:25     ` Michal Hocko [this message]
2013-12-17 10:25       ` Michal Hocko
2013-12-17 10:25       ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131217102517.GA28991@dhcp22.suse.cz \
    --to=mhocko@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.