All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
@ 2021-05-13  6:48 Huang Ying
  2021-05-13  8:49 ` Miaohe Lin
  2021-05-13 12:46 ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Huang Ying @ 2021-05-13  6:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, Huang Ying, Daniel Jordan, Dan Carpenter,
	Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array
accesses to avoid NULL derefs"), the typical code to reference the
swap_info[] is as follows,

  type = swp_type(swp_entry);
  if (type >= nr_swapfiles)
          /* handle invalid swp_entry */;
  p = swap_info[type];
  /* access fields of *p.  OOPS! p may be NULL! */

Because the ordering isn't guaranteed, it's possible that "p" is read
before checking "type".  And that may result in NULL pointer
dereference.

So in commit c10d38cc8d3e, the code becomes,

  struct swap_info_struct *swap_type_to_swap_info(int type)
  {
	  if (type >= READ_ONCE(nr_swapfiles))
		  return NULL;
	  smp_rmb();
	  return READ_ONCE(swap_info[type]);
  }

  /* users */
  type = swp_type(swp_entry);
  p = swap_type_to_swap_info(type);
  if (!p)
	  /* handle invalid swp_entry */;
  /* access fields of *p */

Because "p" is checked to be non-zero before dereference, smp_rmb()
isn't needed anymore.

We still need to guarantee swap_info[type] is read before dereference.
That can be satisfied via the data dependency ordering of
READ_ONCE(swap_info[type]).  The corresponding smp_wmb() is adjusted
in alloc_swap_info() too.

And, we don't need to read "nr_swapfiles" too.  Because if
"type >= nr_swapfiles", swap_info[type] will be NULL.  We just need
to make sure we will not access out of the boundary of the array.
With that change, nr_swapfiles will only be accessed with swap_lock
held, except in swapcache_free_entries().  Where the absolute
correctness of the value isn't needed, as described in the comments.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/swapfile.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2aad85751991..4c1fb28bbe0e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
-	if (type >= READ_ONCE(nr_swapfiles))
+	if (type >= MAX_SWAPFILES)
 		return NULL;
 
-	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
+	/*
+	 * The data dependency ordering from the READ_ONCE() pairs
+	 * with smp_wmb() in alloc_swap_info() to guarantee the
+	 * swap_info_struct fields are read after swap_info[type].
+	 */
 	return READ_ONCE(swap_info[type]);
 }
 
@@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		WRITE_ONCE(swap_info[type], p);
-		/*
-		 * Write swap_info[type] before nr_swapfiles, in case a
-		 * racing procfs swap_start() or swap_next() is reading them.
-		 * (We never shrink nr_swapfiles, we never free this entry.)
-		 */
+		/* Paired with READ_ONCE() in swap_type_to_swap_info() */
 		smp_wmb();
-		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
+		WRITE_ONCE(swap_info[type], p);
+		nr_swapfiles++;
 	} else {
 		defer = p;
 		p = swap_info[type];
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13  6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying
@ 2021-05-13  8:49 ` Miaohe Lin
  2021-05-13  9:54     ` Muchun Song
  2021-05-13 12:46 ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Miaohe Lin @ 2021-05-13  8:49 UTC (permalink / raw)
  To: Huang Ying, Andrew Morton
  Cc: linux-mm, linux-kernel, Daniel Jordan, Dan Carpenter,
	Andrea Parri, Peter Zijlstra, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon

On 2021/5/13 14:48, Huang Ying wrote:
> Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array
> accesses to avoid NULL derefs"), the typical code to reference the
> swap_info[] is as follows,
> 
>   type = swp_type(swp_entry);
>   if (type >= nr_swapfiles)
>           /* handle invalid swp_entry */;
>   p = swap_info[type];
>   /* access fields of *p.  OOPS! p may be NULL! */
> 
> Because the ordering isn't guaranteed, it's possible that "p" is read
> before checking "type".  And that may result in NULL pointer
> dereference.
> 
> So in commit c10d38cc8d3e, the code becomes,
> 
>   struct swap_info_struct *swap_type_to_swap_info(int type)
>   {
> 	  if (type >= READ_ONCE(nr_swapfiles))
> 		  return NULL;
> 	  smp_rmb();
> 	  return READ_ONCE(swap_info[type]);
>   }
> 
>   /* users */
>   type = swp_type(swp_entry);
>   p = swap_type_to_swap_info(type);
>   if (!p)
> 	  /* handle invalid swp_entry */;
>   /* access fields of *p */
> 
> Because "p" is checked to be non-zero before dereference, smp_rmb()
> isn't needed anymore.
> 
> We still need to guarantee swap_info[type] is read before dereference.
> That can be satisfied via the data dependency ordering of
> READ_ONCE(swap_info[type]).  The corresponding smp_wmb() is adjusted
> in alloc_swap_info() too.
> 
> And, we don't need to read "nr_swapfiles" too.  Because if
> "type >= nr_swapfiles", swap_info[type] will be NULL.  We just need
> to make sure we will not access out of the boundary of the array.
> With that change, nr_swapfiles will only be accessed with swap_lock
> held, except in swapcache_free_entries().  Where the absolute
> correctness of the value isn't needed, as described in the comments.
> 
> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Paul McKenney <paulmck@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/swapfile.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2aad85751991..4c1fb28bbe0e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>  {
> -	if (type >= READ_ONCE(nr_swapfiles))
> +	if (type >= MAX_SWAPFILES)
>  		return NULL;
>  
> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	/*
> +	 * The data dependency ordering from the READ_ONCE() pairs
> +	 * with smp_wmb() in alloc_swap_info() to guarantee the
> +	 * swap_info_struct fields are read after swap_info[type].
> +	 */
>  	return READ_ONCE(swap_info[type]);
>  }
>  
> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		WRITE_ONCE(swap_info[type], p);
> -		/*
> -		 * Write swap_info[type] before nr_swapfiles, in case a
> -		 * racing procfs swap_start() or swap_next() is reading them.
> -		 * (We never shrink nr_swapfiles, we never free this entry.)
> -		 */
> +		/* Paired with READ_ONCE() in swap_type_to_swap_info() */
>  		smp_wmb();

Many thank for your patch. The patch looks fine to me. There is one question:

There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ?
Could you please have a explanation ?

Thanks again!

> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> +		WRITE_ONCE(swap_info[type], p);
> +		nr_swapfiles++;
>  	} else {
>  		defer = p;
>  		p = swap_info[type];
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13  8:49 ` Miaohe Lin
@ 2021-05-13  9:54     ` Muchun Song
  0 siblings, 0 replies; 15+ messages in thread
From: Muchun Song @ 2021-05-13  9:54 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen,
	Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo,
	Will Deacon

On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2021/5/13 14:48, Huang Ying wrote:
> > Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array
> > accesses to avoid NULL derefs"), the typical code to reference the
> > swap_info[] is as follows,
> >
> >   type = swp_type(swp_entry);
> >   if (type >= nr_swapfiles)
> >           /* handle invalid swp_entry */;
> >   p = swap_info[type];
> >   /* access fields of *p.  OOPS! p may be NULL! */
> >
> > Because the ordering isn't guaranteed, it's possible that "p" is read
> > before checking "type".  And that may result in NULL pointer
> > dereference.
> >
> > So in commit c10d38cc8d3e, the code becomes,
> >
> >   struct swap_info_struct *swap_type_to_swap_info(int type)
> >   {
> >         if (type >= READ_ONCE(nr_swapfiles))
> >                 return NULL;
> >         smp_rmb();
> >         return READ_ONCE(swap_info[type]);
> >   }
> >
> >   /* users */
> >   type = swp_type(swp_entry);
> >   p = swap_type_to_swap_info(type);
> >   if (!p)
> >         /* handle invalid swp_entry */;
> >   /* access fields of *p */
> >
> > Because "p" is checked to be non-zero before dereference, smp_rmb()
> > isn't needed anymore.
> >
> > We still need to guarantee swap_info[type] is read before dereference.
> > That can be satisfied via the data dependency ordering of
> > READ_ONCE(swap_info[type]).  The corresponding smp_wmb() is adjusted
> > in alloc_swap_info() too.
> >
> > And, we don't need to read "nr_swapfiles" too.  Because if
> > "type >= nr_swapfiles", swap_info[type] will be NULL.  We just need
> > to make sure we will not access out of the boundary of the array.
> > With that change, nr_swapfiles will only be accessed with swap_lock
> > held, except in swapcache_free_entries().  Where the absolute
> > correctness of the value isn't needed, as described in the comments.
> >
> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> > Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Paul McKenney <paulmck@kernel.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 2aad85751991..4c1fb28bbe0e 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
> >
> >  static struct swap_info_struct *swap_type_to_swap_info(int type)
> >  {
> > -     if (type >= READ_ONCE(nr_swapfiles))
> > +     if (type >= MAX_SWAPFILES)
> >               return NULL;
> >
> > -     smp_rmb();      /* Pairs with smp_wmb in alloc_swap_info. */
> > +     /*
> > +      * The data dependency ordering from the READ_ONCE() pairs
> > +      * with smp_wmb() in alloc_swap_info() to guarantee the
> > +      * swap_info_struct fields are read after swap_info[type].
> > +      */
> >       return READ_ONCE(swap_info[type]);
> >  }
> >
> > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
> >       }
> >       if (type >= nr_swapfiles) {
> >               p->type = type;
> > -             WRITE_ONCE(swap_info[type], p);
> > -             /*
> > -              * Write swap_info[type] before nr_swapfiles, in case a
> > -              * racing procfs swap_start() or swap_next() is reading them.
> > -              * (We never shrink nr_swapfiles, we never free this entry.)
> > -              */
> > +             /* Paired with READ_ONCE() in swap_type_to_swap_info() */
> >               smp_wmb();
>
> Many thank for your patch. The patch looks fine to me. There is one question:
>
> There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ?
> Could you please have a explanation ?

The comment is very clear, it matches READ_ONCE() which implies a
data dependence barrier on some archs.

Thanks.

>
> Thanks again!
>
> > -             WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> > +             WRITE_ONCE(swap_info[type], p);
> > +             nr_swapfiles++;
> >       } else {
> >               defer = p;
> >               p = swap_info[type];
> >
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
@ 2021-05-13  9:54     ` Muchun Song
  0 siblings, 0 replies; 15+ messages in thread
From: Muchun Song @ 2021-05-13  9:54 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen,
	Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo,
	Will Deacon

On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>
> On 2021/5/13 14:48, Huang Ying wrote:
> > Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array
> > accesses to avoid NULL derefs"), the typical code to reference the
> > swap_info[] is as follows,
> >
> >   type = swp_type(swp_entry);
> >   if (type >= nr_swapfiles)
> >           /* handle invalid swp_entry */;
> >   p = swap_info[type];
> >   /* access fields of *p.  OOPS! p may be NULL! */
> >
> > Because the ordering isn't guaranteed, it's possible that "p" is read
> > before checking "type".  And that may result in NULL pointer
> > dereference.
> >
> > So in commit c10d38cc8d3e, the code becomes,
> >
> >   struct swap_info_struct *swap_type_to_swap_info(int type)
> >   {
> >         if (type >= READ_ONCE(nr_swapfiles))
> >                 return NULL;
> >         smp_rmb();
> >         return READ_ONCE(swap_info[type]);
> >   }
> >
> >   /* users */
> >   type = swp_type(swp_entry);
> >   p = swap_type_to_swap_info(type);
> >   if (!p)
> >         /* handle invalid swp_entry */;
> >   /* access fields of *p */
> >
> > Because "p" is checked to be non-zero before dereference, smp_rmb()
> > isn't needed anymore.
> >
> > We still need to guarantee swap_info[type] is read before dereference.
> > That can be satisfied via the data dependency ordering of
> > READ_ONCE(swap_info[type]).  The corresponding smp_wmb() is adjusted
> > in alloc_swap_info() too.
> >
> > And, we don't need to read "nr_swapfiles" too.  Because if
> > "type >= nr_swapfiles", swap_info[type] will be NULL.  We just need
> > to make sure we will not access out of the boundary of the array.
> > With that change, nr_swapfiles will only be accessed with swap_lock
> > held, except in swapcache_free_entries().  Where the absolute
> > correctness of the value isn't needed, as described in the comments.
> >
> > Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
> > Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
> > Cc: Dan Carpenter <dan.carpenter@oracle.com>
> > Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> > Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Cc: Andi Kleen <ak@linux.intel.com>
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Omar Sandoval <osandov@fb.com>
> > Cc: Paul McKenney <paulmck@kernel.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > Cc: Miaohe Lin <linmiaohe@huawei.com>
> > ---
> >  mm/swapfile.c | 18 +++++++++---------
> >  1 file changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 2aad85751991..4c1fb28bbe0e 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
> >
> >  static struct swap_info_struct *swap_type_to_swap_info(int type)
> >  {
> > -     if (type >= READ_ONCE(nr_swapfiles))
> > +     if (type >= MAX_SWAPFILES)
> >               return NULL;
> >
> > -     smp_rmb();      /* Pairs with smp_wmb in alloc_swap_info. */
> > +     /*
> > +      * The data dependency ordering from the READ_ONCE() pairs
> > +      * with smp_wmb() in alloc_swap_info() to guarantee the
> > +      * swap_info_struct fields are read after swap_info[type].
> > +      */
> >       return READ_ONCE(swap_info[type]);
> >  }
> >
> > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
> >       }
> >       if (type >= nr_swapfiles) {
> >               p->type = type;
> > -             WRITE_ONCE(swap_info[type], p);
> > -             /*
> > -              * Write swap_info[type] before nr_swapfiles, in case a
> > -              * racing procfs swap_start() or swap_next() is reading them.
> > -              * (We never shrink nr_swapfiles, we never free this entry.)
> > -              */
> > +             /* Paired with READ_ONCE() in swap_type_to_swap_info() */
> >               smp_wmb();
>
> Many thank for your patch. The patch looks fine to me. There is one question:
>
> There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ?
> Could you please have a explanation ?

The comment is very clear, it matches READ_ONCE() which implies a
data dependence barrier on some archs.

Thanks.

>
> Thanks again!
>
> > -             WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> > +             WRITE_ONCE(swap_info[type], p);
> > +             nr_swapfiles++;
> >       } else {
> >               defer = p;
> >               p = swap_info[type];
> >
>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13  9:54     ` Muchun Song
  (?)
@ 2021-05-13 11:27     ` Miaohe Lin
  -1 siblings, 0 replies; 15+ messages in thread
From: Miaohe Lin @ 2021-05-13 11:27 UTC (permalink / raw)
  To: Muchun Song
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Peter Zijlstra, Andi Kleen,
	Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo,
	Will Deacon

On 2021/5/13 17:54, Muchun Song wrote:
> On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
>>
>> On 2021/5/13 14:48, Huang Ying wrote:
>>> Before commit c10d38cc8d3e ("mm, swap: bounds check swap_info array
>>> accesses to avoid NULL derefs"), the typical code to reference the
>>> swap_info[] is as follows,
>>>
>>>   type = swp_type(swp_entry);
>>>   if (type >= nr_swapfiles)
>>>           /* handle invalid swp_entry */;
>>>   p = swap_info[type];
>>>   /* access fields of *p.  OOPS! p may be NULL! */
>>>
>>> Because the ordering isn't guaranteed, it's possible that "p" is read
>>> before checking "type".  And that may result in NULL pointer
>>> dereference.
>>>
>>> So in commit c10d38cc8d3e, the code becomes,
>>>
>>>   struct swap_info_struct *swap_type_to_swap_info(int type)
>>>   {
>>>         if (type >= READ_ONCE(nr_swapfiles))
>>>                 return NULL;
>>>         smp_rmb();
>>>         return READ_ONCE(swap_info[type]);
>>>   }
>>>
>>>   /* users */
>>>   type = swp_type(swp_entry);
>>>   p = swap_type_to_swap_info(type);
>>>   if (!p)
>>>         /* handle invalid swp_entry */;
>>>   /* access fields of *p */
>>>
>>> Because "p" is checked to be non-zero before dereference, smp_rmb()
>>> isn't needed anymore.
>>>
>>> We still need to guarantee swap_info[type] is read before dereference.
>>> That can be satisfied via the data dependency ordering of
>>> READ_ONCE(swap_info[type]).  The corresponding smp_wmb() is adjusted
>>> in alloc_swap_info() too.
>>>
>>> And, we don't need to read "nr_swapfiles" too.  Because if
>>> "type >= nr_swapfiles", swap_info[type] will be NULL.  We just need
>>> to make sure we will not access out of the boundary of the array.
>>> With that change, nr_swapfiles will only be accessed with swap_lock
>>> held, except in swapcache_free_entries().  Where the absolute
>>> correctness of the value isn't needed, as described in the comments.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
>>> Cc: Dan Carpenter <dan.carpenter@oracle.com>
>>> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
>>> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
>>> Cc: Andi Kleen <ak@linux.intel.com>
>>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>>> Cc: Omar Sandoval <osandov@fb.com>
>>> Cc: Paul McKenney <paulmck@kernel.org>
>>> Cc: Tejun Heo <tj@kernel.org>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> Cc: Miaohe Lin <linmiaohe@huawei.com>
>>> ---
>>>  mm/swapfile.c | 18 +++++++++---------
>>>  1 file changed, 9 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>>> index 2aad85751991..4c1fb28bbe0e 100644
>>> --- a/mm/swapfile.c
>>> +++ b/mm/swapfile.c
>>> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>>>
>>>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>>>  {
>>> -     if (type >= READ_ONCE(nr_swapfiles))
>>> +     if (type >= MAX_SWAPFILES)
>>>               return NULL;
>>>
>>> -     smp_rmb();      /* Pairs with smp_wmb in alloc_swap_info. */
>>> +     /*
>>> +      * The data dependency ordering from the READ_ONCE() pairs
>>> +      * with smp_wmb() in alloc_swap_info() to guarantee the
>>> +      * swap_info_struct fields are read after swap_info[type].
>>> +      */
>>>       return READ_ONCE(swap_info[type]);
>>>  }
>>>
>>> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
>>>       }
>>>       if (type >= nr_swapfiles) {
>>>               p->type = type;
>>> -             WRITE_ONCE(swap_info[type], p);
>>> -             /*
>>> -              * Write swap_info[type] before nr_swapfiles, in case a
>>> -              * racing procfs swap_start() or swap_next() is reading them.
>>> -              * (We never shrink nr_swapfiles, we never free this entry.)
>>> -              */
>>> +             /* Paired with READ_ONCE() in swap_type_to_swap_info() */
>>>               smp_wmb();
>>
>> Many thank for your patch. The patch looks fine to me. There is one question:
>>
>> There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ?
>> Could you please have a explanation ?
> 
> The comment is very clear, it matches READ_ONCE() which implies a
> data dependence barrier on some archs.
> 
> Thanks.

Got it. I misunderstood it... Thanks!

> 
>>
>> Thanks again!
>>
>>> -             WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>>> +             WRITE_ONCE(swap_info[type], p);
>>> +             nr_swapfiles++;
>>>       } else {
>>>               defer = p;
>>>               p = swap_info[type];
>>>
>>
> .
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13  9:54     ` Muchun Song
  (?)
  (?)
@ 2021-05-13 12:34     ` Peter Zijlstra
  -1 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2021-05-13 12:34 UTC (permalink / raw)
  To: Muchun Song
  Cc: Miaohe Lin, Huang Ying, Andrew Morton, linux-mm, linux-kernel,
	Daniel Jordan, Dan Carpenter, Andrea Parri, Andi Kleen,
	Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo,
	Will Deacon

On Thu, May 13, 2021 at 05:54:42PM +0800, Muchun Song wrote:
> On Thu, May 13, 2021 at 5:11 PM Miaohe Lin <linmiaohe@huawei.com> wrote:
> > On 2021/5/13 14:48, Huang Ying wrote:

> > >  mm/swapfile.c | 18 +++++++++---------
> > >  1 file changed, 9 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index 2aad85751991..4c1fb28bbe0e 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
> > >
> > >  static struct swap_info_struct *swap_type_to_swap_info(int type)
> > >  {
> > > -     if (type >= READ_ONCE(nr_swapfiles))
> > > +     if (type >= MAX_SWAPFILES)
> > >               return NULL;
> > >
> > > -     smp_rmb();      /* Pairs with smp_wmb in alloc_swap_info. */
> > > +     /*
> > > +      * The data dependency ordering from the READ_ONCE() pairs
> > > +      * with smp_wmb() in alloc_swap_info() to guarantee the
> > > +      * swap_info_struct fields are read after swap_info[type].
> > > +      */
> > >       return READ_ONCE(swap_info[type]);
> > >  }
> > >
> > > @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
> > >       }
> > >       if (type >= nr_swapfiles) {
> > >               p->type = type;
> > > -             WRITE_ONCE(swap_info[type], p);
> > > -             /*
> > > -              * Write swap_info[type] before nr_swapfiles, in case a
> > > -              * racing procfs swap_start() or swap_next() is reading them.
> > > -              * (We never shrink nr_swapfiles, we never free this entry.)
> > > -              */
> > > +             /* Paired with READ_ONCE() in swap_type_to_swap_info() */
> > >               smp_wmb();
> >
> > Many thank for your patch. The patch looks fine to me. There is one question:
> >
> > There is no smp_rmb() paired with above smp_wmb(). What is this smp_wmb() used for ?
> > Could you please have a explanation ?
> 
> The comment is very clear, it matches READ_ONCE() which implies a
> data dependence barrier on some archs.

This statement doesn't make sense; this isn't code that needs to be
correct on 'some' archs, it needs to be unconditionally correct.

Also, you cannot pair with a single memop, there is no order in a set of
one element.

And if you depend on a data dependency, you need a store order; but you
just removed the store order. in which case the data dependency is also
moot.

All of this is utter confusion. Possibly correct, but a complete
trainwreck non-the-less.

Either you say ordering is irrelevant, because we only ever increase the
number of swapfiles and therefore any load is either NULL or the correct
pointer, as guaranteed by WRITE_ONCE()/READ_ONCE() avoiding load/store
tearing.

Or you need the data dependency, but then you also need the store order
like:

	CPU0					CPU1

	if (type >= READ_ONCE(nr_swapfiles))	WRITE_ONCE(swap_info[type], p);
		return NULL;
	/* data-dependency on type */		smp_wmb();
	return READ_ONCE(swap_info[type]);	WRITE_ONCE(nr_swapfiles, nr_swapfiles+1);

But you cannot have half of both and expect any of it to make sense.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13  6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying
  2021-05-13  8:49 ` Miaohe Lin
@ 2021-05-13 12:46 ` Peter Zijlstra
  2021-05-14  1:59   ` Daniel Jordan
  2021-05-14  3:27     ` Huang, Ying
  1 sibling, 2 replies; 15+ messages in thread
From: Peter Zijlstra @ 2021-05-13 12:46 UTC (permalink / raw)
  To: Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote:
>  mm/swapfile.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2aad85751991..4c1fb28bbe0e 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>  {
> -	if (type >= READ_ONCE(nr_swapfiles))
> +	if (type >= MAX_SWAPFILES)
>  		return NULL;
>  
> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	/*
> +	 * The data dependency ordering from the READ_ONCE() pairs
> +	 * with smp_wmb() in alloc_swap_info() to guarantee the
> +	 * swap_info_struct fields are read after swap_info[type].
> +	 */
>  	return READ_ONCE(swap_info[type]);
>  }
>  
> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		WRITE_ONCE(swap_info[type], p);
> -		/*
> -		 * Write swap_info[type] before nr_swapfiles, in case a
> -		 * racing procfs swap_start() or swap_next() is reading them.
> -		 * (We never shrink nr_swapfiles, we never free this entry.)
> -		 */
> +		/* Paired with READ_ONCE() in swap_type_to_swap_info() */
>  		smp_wmb();
> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> +		WRITE_ONCE(swap_info[type], p);
> +		nr_swapfiles++;

Ah, I think I see what you meant to say, it would perhaps help if you
write it like so:


diff --git a/mm/swapfile.c b/mm/swapfile.c
index 149e77454e3c..94735248dcd2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
-	if (type >= READ_ONCE(nr_swapfiles))
+	if (type >= MAX_SWAPFILES)
 		return NULL;
 
-	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
-	return READ_ONCE(swap_info[type]);
+	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
 }
 
 static inline unsigned char swap_count(unsigned char ent)
@@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		WRITE_ONCE(swap_info[type], p);
 		/*
-		 * Write swap_info[type] before nr_swapfiles, in case a
-		 * racing procfs swap_start() or swap_next() is reading them.
-		 * (We never shrink nr_swapfiles, we never free this entry.)
+		 * Publish the swap_info_struct.
 		 */
-		smp_wmb();
-		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
+		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
+		nr_swapfiles++;
 	} else {
 		defer = p;
 		p = swap_info[type];

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13 12:46 ` Peter Zijlstra
@ 2021-05-14  1:59   ` Daniel Jordan
  2021-05-14  4:02       ` Huang, Ying
  2021-05-14 12:04     ` Peter Zijlstra
  2021-05-14  3:27     ` Huang, Ying
  1 sibling, 2 replies; 15+ messages in thread
From: Daniel Jordan @ 2021-05-14  1:59 UTC (permalink / raw)
  To: Peter Zijlstra, Huang Ying
  Cc: Andrew Morton, linux-mm, linux-kernel, Dan Carpenter,
	Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval,
	Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote:
> Ah, I think I see what you meant to say, it would perhaps help if you
> write it like so:
> 
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 149e77454e3c..94735248dcd2 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>  {
> -	if (type >= READ_ONCE(nr_swapfiles))
> +	if (type >= MAX_SWAPFILES)
>  		return NULL;
>  
> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> -	return READ_ONCE(swap_info[type]);
> +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
>  }
>  
>  static inline unsigned char swap_count(unsigned char ent)
> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		WRITE_ONCE(swap_info[type], p);
>  		/*
> -		 * Write swap_info[type] before nr_swapfiles, in case a
> -		 * racing procfs swap_start() or swap_next() is reading them.
> -		 * (We never shrink nr_swapfiles, we never free this entry.)
> +		 * Publish the swap_info_struct.
>  		 */
> -		smp_wmb();
> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
> +		nr_swapfiles++;

Yes, this does help, I didn't understand why smp_wmb stayed around in
the original post.

I think the only access smp_store_release() orders is p->type.  Wouldn't
it be kinda inconsistent to only initialize that one field before
publishing when many others would be done at the end of
alloc_swap_info() after the fact?  p->type doesn't seem special.  For
instance, get_swap_page_of_type() touches si->lock soon after it calls
swap_type_to_swap_info(), so there could be a small window where there's
a non-NULL si with an uninitialized lock.

It's not as if this is likely to be a problem in practice, it would just
make it harder to understand why smp_store_release is there.  Maybe all
we need is a WRITE_ONCE, or if it's really necessary for certain fields
to be set before publication then move them up and explain?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-13 12:46 ` Peter Zijlstra
@ 2021-05-14  3:27     ` Huang, Ying
  2021-05-14  3:27     ` Huang, Ying
  1 sibling, 0 replies; 15+ messages in thread
From: Huang, Ying @ 2021-05-14  3:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote:
>>  mm/swapfile.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 2aad85751991..4c1fb28bbe0e 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>>  
>>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>>  {
>> -	if (type >= READ_ONCE(nr_swapfiles))
>> +	if (type >= MAX_SWAPFILES)
>>  		return NULL;
>>  
>> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
>> +	/*
>> +	 * The data dependency ordering from the READ_ONCE() pairs
>> +	 * with smp_wmb() in alloc_swap_info() to guarantee the
>> +	 * swap_info_struct fields are read after swap_info[type].
>> +	 */
>>  	return READ_ONCE(swap_info[type]);
>>  }
>>  
>> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  	}
>>  	if (type >= nr_swapfiles) {
>>  		p->type = type;
>> -		WRITE_ONCE(swap_info[type], p);
>> -		/*
>> -		 * Write swap_info[type] before nr_swapfiles, in case a
>> -		 * racing procfs swap_start() or swap_next() is reading them.
>> -		 * (We never shrink nr_swapfiles, we never free this entry.)
>> -		 */
>> +		/* Paired with READ_ONCE() in swap_type_to_swap_info() */
>>  		smp_wmb();
>> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>> +		WRITE_ONCE(swap_info[type], p);
>> +		nr_swapfiles++;
>
> Ah, I think I see what you meant to say, it would perhaps help if you
> write it like so:
>
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 149e77454e3c..94735248dcd2 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>  {
> -	if (type >= READ_ONCE(nr_swapfiles))
> +	if (type >= MAX_SWAPFILES)
>  		return NULL;
>  
> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> -	return READ_ONCE(swap_info[type]);
> +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
>  }
>  
>  static inline unsigned char swap_count(unsigned char ent)
> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		WRITE_ONCE(swap_info[type], p);
>  		/*
> -		 * Write swap_info[type] before nr_swapfiles, in case a
> -		 * racing procfs swap_start() or swap_next() is reading them.
> -		 * (We never shrink nr_swapfiles, we never free this entry.)
> +		 * Publish the swap_info_struct.
>  		 */
> -		smp_wmb();
> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
> +		nr_swapfiles++;
>  	} else {
>  		defer = p;
>  		p = swap_info[type];

OK.  It seems that this helps people to understand.  I will use this in
the next version.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
@ 2021-05-14  3:27     ` Huang, Ying
  0 siblings, 0 replies; 15+ messages in thread
From: Huang, Ying @ 2021-05-14  3:27 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, linux-mm, linux-kernel, Daniel Jordan,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, May 13, 2021 at 02:48:37PM +0800, Huang Ying wrote:
>>  mm/swapfile.c | 18 +++++++++---------
>>  1 file changed, 9 insertions(+), 9 deletions(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 2aad85751991..4c1fb28bbe0e 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -100,10 +100,14 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>>  
>>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>>  {
>> -	if (type >= READ_ONCE(nr_swapfiles))
>> +	if (type >= MAX_SWAPFILES)
>>  		return NULL;
>>  
>> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
>> +	/*
>> +	 * The data dependency ordering from the READ_ONCE() pairs
>> +	 * with smp_wmb() in alloc_swap_info() to guarantee the
>> +	 * swap_info_struct fields are read after swap_info[type].
>> +	 */
>>  	return READ_ONCE(swap_info[type]);
>>  }
>>  
>> @@ -2884,14 +2888,10 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  	}
>>  	if (type >= nr_swapfiles) {
>>  		p->type = type;
>> -		WRITE_ONCE(swap_info[type], p);
>> -		/*
>> -		 * Write swap_info[type] before nr_swapfiles, in case a
>> -		 * racing procfs swap_start() or swap_next() is reading them.
>> -		 * (We never shrink nr_swapfiles, we never free this entry.)
>> -		 */
>> +		/* Paired with READ_ONCE() in swap_type_to_swap_info() */
>>  		smp_wmb();
>> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>> +		WRITE_ONCE(swap_info[type], p);
>> +		nr_swapfiles++;
>
> Ah, I think I see what you meant to say, it would perhaps help if you
> write it like so:
>
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 149e77454e3c..94735248dcd2 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>  {
> -	if (type >= READ_ONCE(nr_swapfiles))
> +	if (type >= MAX_SWAPFILES)
>  		return NULL;
>  
> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> -	return READ_ONCE(swap_info[type]);
> +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
>  }
>  
>  static inline unsigned char swap_count(unsigned char ent)
> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		WRITE_ONCE(swap_info[type], p);
>  		/*
> -		 * Write swap_info[type] before nr_swapfiles, in case a
> -		 * racing procfs swap_start() or swap_next() is reading them.
> -		 * (We never shrink nr_swapfiles, we never free this entry.)
> +		 * Publish the swap_info_struct.
>  		 */
> -		smp_wmb();
> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
> +		nr_swapfiles++;
>  	} else {
>  		defer = p;
>  		p = swap_info[type];

OK.  It seems that this helps people to understand.  I will use this in
the next version.

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-14  1:59   ` Daniel Jordan
@ 2021-05-14  4:02       ` Huang, Ying
  2021-05-14 12:04     ` Peter Zijlstra
  1 sibling, 0 replies; 15+ messages in thread
From: Huang, Ying @ 2021-05-14  4:02 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote:
>> Ah, I think I see what you meant to say, it would perhaps help if you
>> write it like so:
>> 
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 149e77454e3c..94735248dcd2 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>>  
>>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>>  {
>> -	if (type >= READ_ONCE(nr_swapfiles))
>> +	if (type >= MAX_SWAPFILES)
>>  		return NULL;
>>  
>> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
>> -	return READ_ONCE(swap_info[type]);
>> +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
>>  }
>>  
>>  static inline unsigned char swap_count(unsigned char ent)
>> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  	}
>>  	if (type >= nr_swapfiles) {
>>  		p->type = type;
>> -		WRITE_ONCE(swap_info[type], p);
>>  		/*
>> -		 * Write swap_info[type] before nr_swapfiles, in case a
>> -		 * racing procfs swap_start() or swap_next() is reading them.
>> -		 * (We never shrink nr_swapfiles, we never free this entry.)
>> +		 * Publish the swap_info_struct.
>>  		 */
>> -		smp_wmb();
>> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>> +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
>> +		nr_swapfiles++;
>
> Yes, this does help, I didn't understand why smp_wmb stayed around in
> the original post.
>
> I think the only access smp_store_release() orders is p->type.  Wouldn't
> it be kinda inconsistent to only initialize that one field before
> publishing when many others would be done at the end of
> alloc_swap_info() after the fact?

In addition to p->type, *p is zeroed via kvzalloc().

> p->type doesn't seem special.  For
> instance, get_swap_page_of_type() touches si->lock soon after it calls
> swap_type_to_swap_info(), so there could be a small window where there's
> a non-NULL si with an uninitialized lock.

We usually check the state of swap_info_struct before other operations.
For example, we check si->swap_map in swap_start().

> It's not as if this is likely to be a problem in practice, it would just
> make it harder to understand why smp_store_release is there.  Maybe all
> we need is a WRITE_ONCE, or if it's really necessary for certain fields
> to be set before publication then move them up and explain?

I think we have initialized all fields before publication :-).

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
@ 2021-05-14  4:02       ` Huang, Ying
  0 siblings, 0 replies; 15+ messages in thread
From: Huang, Ying @ 2021-05-14  4:02 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote:
>> Ah, I think I see what you meant to say, it would perhaps help if you
>> write it like so:
>> 
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 149e77454e3c..94735248dcd2 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>>  
>>  static struct swap_info_struct *swap_type_to_swap_info(int type)
>>  {
>> -	if (type >= READ_ONCE(nr_swapfiles))
>> +	if (type >= MAX_SWAPFILES)
>>  		return NULL;
>>  
>> -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
>> -	return READ_ONCE(swap_info[type]);
>> +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
>>  }
>>  
>>  static inline unsigned char swap_count(unsigned char ent)
>> @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  	}
>>  	if (type >= nr_swapfiles) {
>>  		p->type = type;
>> -		WRITE_ONCE(swap_info[type], p);
>>  		/*
>> -		 * Write swap_info[type] before nr_swapfiles, in case a
>> -		 * racing procfs swap_start() or swap_next() is reading them.
>> -		 * (We never shrink nr_swapfiles, we never free this entry.)
>> +		 * Publish the swap_info_struct.
>>  		 */
>> -		smp_wmb();
>> -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>> +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
>> +		nr_swapfiles++;
>
> Yes, this does help, I didn't understand why smp_wmb stayed around in
> the original post.
>
> I think the only access smp_store_release() orders is p->type.  Wouldn't
> it be kinda inconsistent to only initialize that one field before
> publishing when many others would be done at the end of
> alloc_swap_info() after the fact?

In addition to p->type, *p is zeroed via kvzalloc().

> p->type doesn't seem special.  For
> instance, get_swap_page_of_type() touches si->lock soon after it calls
> swap_type_to_swap_info(), so there could be a small window where there's
> a non-NULL si with an uninitialized lock.

We usually check the state of swap_info_struct before other operations.
For example, we check si->swap_map in swap_start().

> It's not as if this is likely to be a problem in practice, it would just
> make it harder to understand why smp_store_release is there.  Maybe all
> we need is a WRITE_ONCE, or if it's really necessary for certain fields
> to be set before publication then move them up and explain?

I think we have initialized all fields before publication :-).

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-14  1:59   ` Daniel Jordan
  2021-05-14  4:02       ` Huang, Ying
@ 2021-05-14 12:04     ` Peter Zijlstra
  2021-05-14 20:51       ` Daniel Jordan
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2021-05-14 12:04 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter,
	Andrea Parri, Andi Kleen, Dave Hansen, Omar Sandoval,
	Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

On Thu, May 13, 2021 at 09:59:46PM -0400, Daniel Jordan wrote:
> On Thu, May 13, 2021 at 02:46:10PM +0200, Peter Zijlstra wrote:
> > Ah, I think I see what you meant to say, it would perhaps help if you
> > write it like so:
> > 
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 149e77454e3c..94735248dcd2 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -99,11 +99,10 @@ atomic_t nr_rotate_swap = ATOMIC_INIT(0);
> >  
> >  static struct swap_info_struct *swap_type_to_swap_info(int type)
> >  {
> > -	if (type >= READ_ONCE(nr_swapfiles))
> > +	if (type >= MAX_SWAPFILES)
> >  		return NULL;
> >  
> > -	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> > -	return READ_ONCE(swap_info[type]);
> > +	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> >  }
> >  
> >  static inline unsigned char swap_count(unsigned char ent)
> > @@ -2869,14 +2868,11 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  	}
> >  	if (type >= nr_swapfiles) {
> >  		p->type = type;
> > -		WRITE_ONCE(swap_info[type], p);
> >  		/*
> > -		 * Write swap_info[type] before nr_swapfiles, in case a
> > -		 * racing procfs swap_start() or swap_next() is reading them.
> > -		 * (We never shrink nr_swapfiles, we never free this entry.)
> > +		 * Publish the swap_info_struct.
> >  		 */
> > -		smp_wmb();
> > -		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> > +		smp_store_release(&swap_info[type], p); /* rcu_assign_pointer() */
> > +		nr_swapfiles++;
> 
> Yes, this does help, I didn't understand why smp_wmb stayed around in
> the original post.
> 
> I think the only access smp_store_release() orders is p->type.  Wouldn't
> it be kinda inconsistent to only initialize that one field before
> publishing when many others would be done at the end of
> alloc_swap_info() after the fact?  p->type doesn't seem special.  For
> instance, get_swap_page_of_type() touches si->lock soon after it calls
> swap_type_to_swap_info(), so there could be a small window where there's
> a non-NULL si with an uninitialized lock.
> 
> It's not as if this is likely to be a problem in practice, it would just
> make it harder to understand why smp_store_release is there.  Maybe all
> we need is a WRITE_ONCE, or if it's really necessary for certain fields
> to be set before publication then move them up and explain?

You also care about the zero fill from kvzalloc(). Without the
smp_store_release() the zero-fill from the memset() might only be
visible 'late'.

Unless that also isn't a problem?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-14  4:02       ` Huang, Ying
  (?)
@ 2021-05-14 20:49       ` Daniel Jordan
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Jordan @ 2021-05-14 20:49 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Peter Zijlstra, Andrew Morton, linux-mm, linux-kernel,
	Dan Carpenter, Andrea Parri, Andi Kleen, Dave Hansen,
	Omar Sandoval, Paul McKenney, Tejun Heo, Will Deacon, Miaohe Lin

On Fri, May 14, 2021 at 12:02:05PM +0800, Huang, Ying wrote:
> Daniel Jordan <daniel.m.jordan@oracle.com> writes:
> > Yes, this does help, I didn't understand why smp_wmb stayed around in
> > the original post.
> >
> > I think the only access smp_store_release() orders is p->type.  Wouldn't
> > it be kinda inconsistent to only initialize that one field before
> > publishing when many others would be done at the end of
> > alloc_swap_info() after the fact?
> 
> In addition to p->type, *p is zeroed via kvzalloc().

So it is, good point.

> > p->type doesn't seem special.  For
> > instance, get_swap_page_of_type() touches si->lock soon after it calls
> > swap_type_to_swap_info(), so there could be a small window where there's
> > a non-NULL si with an uninitialized lock.
> 
> We usually check the state of swap_info_struct before other operations.
> For example, we check si->swap_map in swap_start().

Yes, we usually do.

> > It's not as if this is likely to be a problem in practice, it would just
> > make it harder to understand why smp_store_release is there.  Maybe all
> > we need is a WRITE_ONCE, or if it's really necessary for certain fields
> > to be set before publication then move them up and explain?
> 
> I think we have initialized all fields before publication :-).

Probably all the ones that matter in practice, yes :-)

Still feeling slightly uneasy about the theoretical p->lock, but that
was possible before this change too so it's out of scope.

A comment explaining the pairing and that we care mostly about the zero
init would be nice.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info()
  2021-05-14 12:04     ` Peter Zijlstra
@ 2021-05-14 20:51       ` Daniel Jordan
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Jordan @ 2021-05-14 20:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Ying, Andrew Morton, linux-mm, linux-kernel, Dan Carpenter,
	Andi Kleen, Dave Hansen, Omar Sandoval, Paul McKenney, Tejun Heo,
	Will Deacon, Miaohe Lin

On Fri, May 14, 2021 at 02:04:14PM +0200, Peter Zijlstra wrote:
> On Thu, May 13, 2021 at 09:59:46PM -0400, Daniel Jordan wrote:
> > Yes, this does help, I didn't understand why smp_wmb stayed around in
> > the original post.
> > 
> > I think the only access smp_store_release() orders is p->type.  Wouldn't
> > it be kinda inconsistent to only initialize that one field before
> > publishing when many others would be done at the end of
> > alloc_swap_info() after the fact?  p->type doesn't seem special.  For
> > instance, get_swap_page_of_type() touches si->lock soon after it calls
> > swap_type_to_swap_info(), so there could be a small window where there's
> > a non-NULL si with an uninitialized lock.
> > 
> > It's not as if this is likely to be a problem in practice, it would just
> > make it harder to understand why smp_store_release is there.  Maybe all
> > we need is a WRITE_ONCE, or if it's really necessary for certain fields
> > to be set before publication then move them up and explain?
> 
> You also care about the zero fill from kvzalloc(). Without the
> smp_store_release() the zero-fill from the memset() might only be
> visible 'late'.

Aha, yes, didn't consider that!

> Unless that also isn't a problem?

No, you're right, we need that for p->flags at least.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-14 20:52 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-13  6:48 [PATCH] mm, swap: Remove unnecessary smp_rmb() in swap_type_to_swap_info() Huang Ying
2021-05-13  8:49 ` Miaohe Lin
2021-05-13  9:54   ` Muchun Song
2021-05-13  9:54     ` Muchun Song
2021-05-13 11:27     ` Miaohe Lin
2021-05-13 12:34     ` Peter Zijlstra
2021-05-13 12:46 ` Peter Zijlstra
2021-05-14  1:59   ` Daniel Jordan
2021-05-14  4:02     ` Huang, Ying
2021-05-14  4:02       ` Huang, Ying
2021-05-14 20:49       ` Daniel Jordan
2021-05-14 12:04     ` Peter Zijlstra
2021-05-14 20:51       ` Daniel Jordan
2021-05-14  3:27   ` Huang, Ying
2021-05-14  3:27     ` Huang, Ying

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.