From: "Huang, Ying" <ying.huang@intel.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,  <linux-mm@kvack.org>,
	 <linux-kernel@vger.kernel.org>,  Hugh Dickins <hughd@google.com>,
	 "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>,
	 Minchan Kim <minchan@kernel.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Tim Chen <tim.c.chen@linux.intel.com>,
	 Mel Gorman <mgorman@techsingularity.net>,
	 Jérôme Glisse <jglisse@redhat.com>,
	 Michal Hocko <mhocko@suse.com>,
	 David Rientjes <rientjes@google.com>,
	 Rik van Riel <riel@redhat.com>,  Jan Kara <jack@suse.cz>,
	 Dave Jiang <dave.jiang@intel.com>,
	 Daniel Jordan <daniel.m.jordan@oracle.com>,
	 Andrea Parri <andrea.parri@amarulasolutions.com>
Subject: Re: [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations
Date: Thu, 14 Feb 2019 16:07:37 +0800	[thread overview]
Message-ID: <87k1i2oks6.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20190214023805.GA19090@redhat.com> (Andrea Arcangeli's message of "Wed, 13 Feb 2019 21:38:05 -0500")

Hi, Andrea,

Andrea Arcangeli <aarcange@redhat.com> writes:

> Hello everyone,
>
> On Mon, Feb 11, 2019 at 04:38:46PM +0800, Huang, Ying wrote:
>> @@ -2386,7 +2463,17 @@ static void enable_swap_info(struct swap_info_struct *p, int prio,
>>  	frontswap_init(p->type, frontswap_map);
>>  	spin_lock(&swap_lock);
>>  	spin_lock(&p->lock);
>> -	 _enable_swap_info(p, prio, swap_map, cluster_info);
>> +	setup_swap_info(p, prio, swap_map, cluster_info);
>> +	spin_unlock(&p->lock);
>> +	spin_unlock(&swap_lock);
>> +	/*
>> +	 * Guarantee swap_map, cluster_info, etc. fields are used
>> +	 * between get/put_swap_device() only if SWP_VALID bit is set
>> +	 */
>> +	stop_machine(swap_onoff_stop, NULL, cpu_online_mask);
>
> Should cpu_online_mask be read while holding cpus_read_lock?
>
> 	cpus_read_lock();
> 	err = __stop_machine(swap_onoff_stop, NULL, cpu_online_mask);
> 	cpus_read_unlock();

Thanks for pointing this out.  Because swap_onoff_stop() is just a
dummy function, something like the following should be sufficient:

stop_machine(swap_onoff_stop, NULL, NULL);

> I missed what the exact motivation was for the switch from
> rcu_read_lock()/synchronize_rcu() to preempt_disable()/stop_machine().
>
> It looks like all the above stop_machine() does is reach a
> quiescent point, when RCU can already reach the quiescent point
> without an explicit stop_machine.
>
> The reason both implementations look basically the same is that a
> stop_machine() dummy call of swap_onoff_stop() { /* noop */ } only
> reaches a quiescent point faster than RCU; it is otherwise
> functionally identical to RCU, just far more expensive. If it
> weren't functionally identical, stop_machine() couldn't be used as a
> drop-in replacement for synchronize_sched() in the previous patch.
>
> I don't see the point of worrying about the synchronize_rcu latency in
> swapoff when RCU is basically identical and not more complex.
>
> So to be clear, I'm not against stop_machine(); with a
> stop_machine() method invoked on all CPUs, you can actually do more
> than RCU: you can remove real locking, not just reach a quiescent
> point.
>
> With stop_machine() the code would need reshuffling around so that the
> actual p->swap_map = NULL happens inside stop_machine, not outside
> like with RCU.
>
> With RCU all code stays concurrent at all times, only the race is
> controlled, whereas with stop_machine() you can fully serialize and
> temporarily run as if on UP (at least with respect to all
> preempt_disable() sections).
>
> For example nr_swapfiles could in theory become a constant under
> preempt_disable() with stop_machine() without having to take a
> swap_lock.
>
> swap_onoff_stop can be implemented like this:
>
> enum {
> 	FIRST_STOP_MACHINE_INIT,
> 	FIRST_STOP_MACHINE_START,
> 	FIRST_STOP_MACHINE_END,
> };
> static int first_stop_machine;
> static int swap_onoff_stop(void *data)
> {
> 	struct swap_stop_machine *swsm = (struct swap_stop_machine *)data;
> 	int first;
>
> 	first = cmpxchg(&first_stop_machine, FIRST_STOP_MACHINE_INIT,
> 			FIRST_STOP_MACHINE_START);
> 	if (first == FIRST_STOP_MACHINE_INIT) {
> 		swsm->p->swap_map = NULL;
> 		/* add more stuff here until swap_lock goes away */
> 		smp_wmb();
> 		WRITE_ONCE(first_stop_machine, FIRST_STOP_MACHINE_END);
> 	} else {
> 		do {
> 			cpu_relax();
> 		} while (READ_ONCE(first_stop_machine) !=
> 			 FIRST_STOP_MACHINE_END);
> 		smp_rmb();
> 	}
>
> 	return 0;
> }
>
> stop_machine() invoked with a method like the above will guarantee
> that while we set p->swap_map to NULL (and while we do
> nr_swapfiles++) nothing else can run, not even interrupts, so some
> locks may just disappear. Only NMIs and SMIs could possibly run
> concurrently with the swsm->p->swap_map = NULL operation.
>
> If we have to keep swap_onoff_stop() a dummy function run on all
> CPUs just to reach a quiescent point, then I don't see why the
> synchronize_rcu() (or synchronize_sched() or synchronize_kernel() or
> whatever it is called right now, but still RCU) solution isn't
> preferable.

We only need to keep swap_onoff_stop() a dummy function as above, so
from a functionality point of view RCU works perfectly for us too.
Paul pointed that out before as well.

Originally we chose stop_machine() to reduce the overhead of the hot
path (the page fault handler) as much as possible.  But now I have
found that rcu_read_lock_sched() is just a wrapper around
preempt_disable(), so maybe we can switch to the RCU version now.
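
If we do switch, the reader/writer pairing would look roughly like
the following kernel-style sketch (not compilable standalone; names
such as SWP_VALID and swp_swap_info() follow the -V7 patch, and
whether the grace-period primitive is spelled synchronize_rcu() or
synchronize_sched() depends on the kernel version):

```c
/* Reader side (e.g. page fault path): pin the swap device. */
struct swap_info_struct *get_swap_device(swp_entry_t entry)
{
	struct swap_info_struct *si = swp_swap_info(entry);

	if (!si)
		return NULL;
	rcu_read_lock_sched();		/* just preempt_disable() underneath */
	if (!(si->flags & SWP_VALID)) {
		rcu_read_unlock_sched();
		return NULL;
	}
	return si;
}

/* Writer side (swapoff), replacing the stop_machine() call: */
	spin_lock(&swap_lock);
	spin_lock(&p->lock);
	p->flags &= ~SWP_VALID;		/* new readers now bail out */
	spin_unlock(&p->lock);
	spin_unlock(&swap_lock);
	synchronize_rcu();	/* wait out rcu_read_lock_sched() readers */
	/* Now swap_map, cluster_info, etc. can be freed safely. */
```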

Best Regards,
Huang, Ying

> Thanks,
> Andrea



Thread overview: 20+ messages
2019-02-11  8:38 [PATCH -mm -V7] mm, swap: fix race between swapoff and some swap operations Huang, Ying
2019-02-11 19:06 ` Daniel Jordan
2019-02-12  3:21   ` Andrea Parri
2019-02-12  6:47     ` Huang, Ying
2019-02-12 17:58       ` Tim Chen
2019-02-13  3:23         ` Huang, Ying
2019-02-12 20:06     ` Daniel Jordan
2019-02-12  6:40   ` Huang, Ying
2019-02-12 10:13 ` Andrea Parri
2019-02-15  6:34   ` Huang, Ying
2019-02-14  2:38 ` Andrea Arcangeli
2019-02-14  8:07   ` Huang, Ying [this message]
2019-02-14 21:47     ` Andrea Arcangeli
2019-02-15  7:50       ` Huang, Ying
2019-02-14 14:33 ` Michal Hocko
2019-02-14 20:30   ` Andrew Morton
2019-02-14 21:22     ` Andrea Arcangeli
2019-02-15  7:08   ` Huang, Ying
2019-02-15 13:11     ` Michal Hocko
2019-02-18  0:51       ` Huang, Ying
