All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-11  9:59 ` Dan Carpenter
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-11  9:59 UTC (permalink / raw)
  To: Andrew Morton, Shaohua Li
  Cc: Huang Ying, Daniel Jordan, Dave Hansen, Stephen Rothwell,
	Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm, kernel-janitors

Smatch complains that the NULL checks on "si" aren't consistent.  This
seems like a real bug because we have not ensured that the type is
valid and so "si" can be NULL.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
 mm/swapfile.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f0edf7244256..21e92c757205 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
 	struct swap_info_struct *si;
 	pgoff_t offset;
 
+	if (type >= nr_swapfiles)
+		goto fail;
+
 	si = swap_info[type];
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-11  9:59 ` Dan Carpenter
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-11  9:59 UTC (permalink / raw)
  To: Andrew Morton, Shaohua Li
  Cc: Huang Ying, Daniel Jordan, Dave Hansen, Stephen Rothwell,
	Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm, kernel-janitors

Smatch complains that the NULL checks on "si" aren't consistent.  This
seems like a real bug because we have not ensured that the type is
valid and so "si" can be NULL.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
---
 mm/swapfile.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f0edf7244256..21e92c757205 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
 	struct swap_info_struct *si;
 	pgoff_t offset;
 
+	if (type >= nr_swapfiles)
+		goto fail;
+
 	si = swap_info[type];
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-11  9:59 ` Dan Carpenter
@ 2019-01-11 17:41   ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-11 17:41 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Andrew Morton, Shaohua Li, Huang Ying, Daniel Jordan,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, andrea.parri

On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> Smatch complains that the NULL checks on "si" aren't consistent.  This
> seems like a real bug because we have not ensured that the type is
> valid and so "si" can be NULL.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
>  mm/swapfile.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f0edf7244256..21e92c757205 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
>  	struct swap_info_struct *si;
>  	pgoff_t offset;
>  
> +	if (type >= nr_swapfiles)
> +		goto fail;
> +

As long as we're worrying about NULL, I think there should be an smp_rmb here
to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
per LKMM.

I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
all) who can correct me if I'm wrong about any of this.

>  	si = swap_info[type];
>  	spin_lock(&si->lock);
> -	if (si && (si->flags & SWP_WRITEOK)) {
> +	if (si->flags & SWP_WRITEOK) {
>  		atomic_long_dec(&nr_swap_pages);
>  		/* This is called for allocating swap entry, not cache */
>  		offset = scan_swap_map(si, 1);
> @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
>  		atomic_long_inc(&nr_swap_pages);
>  	}
>  	spin_unlock(&si->lock);
> +fail:
>  	return (swp_entry_t) {0};
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-11 17:41   ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-11 17:41 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Andrew Morton, Shaohua Li, Huang Ying, Daniel Jordan,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, andrea.parri

On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> Smatch complains that the NULL checks on "si" aren't consistent.  This
> seems like a real bug because we have not ensured that the type is
> valid and so "si" can be NULL.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> ---
>  mm/swapfile.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f0edf7244256..21e92c757205 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
>  	struct swap_info_struct *si;
>  	pgoff_t offset;
>  
> +	if (type >= nr_swapfiles)
> +		goto fail;
> +

As long as we're worrying about NULL, I think there should be an smp_rmb here
to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
per LKMM.

I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
all) who can correct me if I'm wrong about any of this.

>  	si = swap_info[type];
>  	spin_lock(&si->lock);
> -	if (si && (si->flags & SWP_WRITEOK)) {
> +	if (si->flags & SWP_WRITEOK) {
>  		atomic_long_dec(&nr_swap_pages);
>  		/* This is called for allocating swap entry, not cache */
>  		offset = scan_swap_map(si, 1);
> @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
>  		atomic_long_inc(&nr_swap_pages);
>  	}
>  	spin_unlock(&si->lock);
> +fail:
>  	return (swp_entry_t) {0};
>  }
>  
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-11 17:41   ` Daniel Jordan
@ 2019-01-11 23:20     ` Andrea Parri
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-11 23:20 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, Paul E. McKenney,
	Alan Stern, Peter Zijlstra, Will Deacon

Hi Daniel,

On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > Smatch complains that the NULL checks on "si" aren't consistent.  This
> > seems like a real bug because we have not ensured that the type is
> > valid and so "si" can be NULL.
> > 
> > Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> >  mm/swapfile.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index f0edf7244256..21e92c757205 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> >  	struct swap_info_struct *si;
> >  	pgoff_t offset;
> >  
> > +	if (type >= nr_swapfiles)
> > +		goto fail;
> > +
> 
> As long as we're worrying about NULL, I think there should be an smp_rmb here
> to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> per LKMM.
> 
> I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> all) who can correct me if I'm wrong about any of this.

This is to confirm that your analysis seems correct to me: the barriers
should guarantee that get_swap_page_of_type() will observe the store to
swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
provided get_swap_page_of_type() observes the increment of nr_swapfiles
performed by the (same instance of) alloc_swap_info().

One clarification about the READ_ONCE() matter: the LKMM cannot handle
plain or unmarked (shared memory) accesses in their generality at the
moment (patches providing support for these accesses are in the making,
but they will take some time); IAC, I'm confident to anticipate that,
for the particular pattern in question (aka, MP), marking the accesses
to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
to convert nr_swapfiles to an atomic_t type...).

I take the liberty of adding other LKMM folks (so that they can blame
me for "the spam"! ;-) ): I've learnt from experience that four or more
eyes are better than two when it comes to discuss these matters... ;-)

  Andrea


> 
> >  	si = swap_info[type];
> >  	spin_lock(&si->lock);
> > -	if (si && (si->flags & SWP_WRITEOK)) {
> > +	if (si->flags & SWP_WRITEOK) {
> >  		atomic_long_dec(&nr_swap_pages);
> >  		/* This is called for allocating swap entry, not cache */
> >  		offset = scan_swap_map(si, 1);
> > @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
> >  		atomic_long_inc(&nr_swap_pages);
> >  	}
> >  	spin_unlock(&si->lock);
> > +fail:
> >  	return (swp_entry_t) {0};
> >  }
> >  
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-11 23:20     ` Andrea Parri
  0 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-11 23:20 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, Paul E. McKenney,
	Alan Stern, Peter Zijlstra, Will Deacon

Hi Daniel,

On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > Smatch complains that the NULL checks on "si" aren't consistent.  This
> > seems like a real bug because we have not ensured that the type is
> > valid and so "si" can be NULL.
> > 
> > Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> > Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> > ---
> >  mm/swapfile.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index f0edf7244256..21e92c757205 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> >  	struct swap_info_struct *si;
> >  	pgoff_t offset;
> >  
> > +	if (type >= nr_swapfiles)
> > +		goto fail;
> > +
> 
> As long as we're worrying about NULL, I think there should be an smp_rmb here
> to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> per LKMM.
> 
> I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> all) who can correct me if I'm wrong about any of this.

This is to confirm that your analysis seems correct to me: the barriers
should guarantee that get_swap_page_of_type() will observe the store to
swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
provided get_swap_page_of_type() observes the increment of nr_swapfiles
performed by the (same instance of) alloc_swap_info().

One clarification about the READ_ONCE() matter: the LKMM cannot handle
plain or unmarked (shared memory) accesses in their generality at the
moment (patches providing support for these accesses are in the making,
but they will take some time); IAC, I'm confident to anticipate that,
for the particular pattern in question (aka, MP), marking the accesses
to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
to convert nr_swapfiles to an atomic_t type...).

I take the liberty of adding other LKMM folks (so that they can blame
me for "the spam"! ;-) ): I've learnt from experience that four or more
eyes are better than two when it comes to discuss these matters... ;-)

  Andrea


> 
> >  	si = swap_info[type];
> >  	spin_lock(&si->lock);
> > -	if (si && (si->flags & SWP_WRITEOK)) {
> > +	if (si->flags & SWP_WRITEOK) {
> >  		atomic_long_dec(&nr_swap_pages);
> >  		/* This is called for allocating swap entry, not cache */
> >  		offset = scan_swap_map(si, 1);
> > @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
> >  		atomic_long_inc(&nr_swap_pages);
> >  	}
> >  	spin_unlock(&si->lock);
> > +fail:
> >  	return (swp_entry_t) {0};
> >  }
> >  
> > -- 
> > 2.17.1
> > 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-11 17:41   ` Daniel Jordan
  (?)
@ 2019-01-14  2:12     ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-01-14  2:12 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Dave Hansen,
	Stephen Rothwell, Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm,
	kernel-janitors, andrea.parri

Hi, Daniel,

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
>> Smatch complains that the NULL checks on "si" aren't consistent.  This
>> seems like a real bug because we have not ensured that the type is
>> valid and so "si" can be NULL.
>> 
>> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>> ---
>>  mm/swapfile.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f0edf7244256..21e92c757205 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
>>  	struct swap_info_struct *si;
>>  	pgoff_t offset;
>>  
>> +	if (type >= nr_swapfiles)
>> +		goto fail;
>> +
>
> As long as we're worrying about NULL, I think there should be an smp_rmb here
> to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> per LKMM.

I think you are right here.  And smp_rmb() for nr_swapfiles are missing
in many other places in swapfile.c too (e.g. __swap_info_get(),
swapdev_block(), etc.).

In theory, I think we need to fix this.

Best Regards,
Huang, Ying

> I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> all) who can correct me if I'm wrong about any of this.
>
>>  	si = swap_info[type];
>>  	spin_lock(&si->lock);
>> -	if (si && (si->flags & SWP_WRITEOK)) {
>> +	if (si->flags & SWP_WRITEOK) {
>>  		atomic_long_dec(&nr_swap_pages);
>>  		/* This is called for allocating swap entry, not cache */
>>  		offset = scan_swap_map(si, 1);
>> @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
>>  		atomic_long_inc(&nr_swap_pages);
>>  	}
>>  	spin_unlock(&si->lock);
>> +fail:
>>  	return (swp_entry_t) {0};
>>  }
>>  
>> -- 
>> 2.17.1
>> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-14  2:12     ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-01-14  2:12 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Dave Hansen,
	Stephen Rothwell, Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm,
	kernel-janitors, andrea.parri

Hi, Daniel,

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
>> Smatch complains that the NULL checks on "si" aren't consistent.  This
>> seems like a real bug because we have not ensured that the type is
>> valid and so "si" can be NULL.
>> 
>> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>> ---
>>  mm/swapfile.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f0edf7244256..21e92c757205 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
>>  	struct swap_info_struct *si;
>>  	pgoff_t offset;
>>  
>> +	if (type >= nr_swapfiles)
>> +		goto fail;
>> +
>
> As long as we're worrying about NULL, I think there should be an smp_rmb here
> to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> per LKMM.

I think you are right here.  And smp_rmb() for nr_swapfiles are missing
in many other places in swapfile.c too (e.g. __swap_info_get(),
swapdev_block(), etc.).

In theory, I think we need to fix this.

Best Regards,
Huang, Ying

> I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> all) who can correct me if I'm wrong about any of this.
>
>>  	si = swap_info[type];
>>  	spin_lock(&si->lock);
>> -	if (si && (si->flags & SWP_WRITEOK)) {
>> +	if (si->flags & SWP_WRITEOK) {
>>  		atomic_long_dec(&nr_swap_pages);
>>  		/* This is called for allocating swap entry, not cache */
>>  		offset = scan_swap_map(si, 1);
>> @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
>>  		atomic_long_inc(&nr_swap_pages);
>>  	}
>>  	spin_unlock(&si->lock);
>> +fail:
>>  	return (swp_entry_t) {0};
>>  }
>>  
>> -- 
>> 2.17.1
>> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-14  2:12     ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-01-14  2:12 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Dave Hansen,
	Stephen Rothwell, Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm,
	kernel-janitors, andrea.parri

Hi, Daniel,

Daniel Jordan <daniel.m.jordan@oracle.com> writes:

> On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
>> Smatch complains that the NULL checks on "si" aren't consistent.  This
>> seems like a real bug because we have not ensured that the type is
>> valid and so "si" can be NULL.
>> 
>> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>> ---
>>  mm/swapfile.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index f0edf7244256..21e92c757205 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
>>  	struct swap_info_struct *si;
>>  	pgoff_t offset;
>>  
>> +	if (type >= nr_swapfiles)
>> +		goto fail;
>> +
>
> As long as we're worrying about NULL, I think there should be an smp_rmb here
> to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> per LKMM.

I think you are right here.  And smp_rmb() for nr_swapfiles are missing
in many other places in swapfile.c too (e.g. __swap_info_get(),
swapdev_block(), etc.).

In theory, I think we need to fix this.

Best Regards,
Huang, Ying

> I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> all) who can correct me if I'm wrong about any of this.
>
>>  	si = swap_info[type];
>>  	spin_lock(&si->lock);
>> -	if (si && (si->flags & SWP_WRITEOK)) {
>> +	if (si->flags & SWP_WRITEOK) {
>>  		atomic_long_dec(&nr_swap_pages);
>>  		/* This is called for allocating swap entry, not cache */
>>  		offset = scan_swap_map(si, 1);
>> @@ -1061,6 +1064,7 @@ swp_entry_t get_swap_page_of_type(int type)
>>  		atomic_long_inc(&nr_swap_pages);
>>  	}
>>  	spin_unlock(&si->lock);
>> +fail:
>>  	return (swp_entry_t) {0};
>>  }
>>  
>> -- 
>> 2.17.1
>> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-11 17:41   ` Daniel Jordan
@ 2019-01-14  8:43     ` Dan Carpenter
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-14  8:43 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Andrew Morton, Shaohua Li, Huang Ying, Dave Hansen,
	Stephen Rothwell, Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm,
	kernel-janitors, andrea.parri

I'm really terribly ignorant when it comes to things like this...  To me
it looked like the barrier in alloc_swap_info() was enough but when so
many smarter people disagree then I must be wrong.  I'd like to help,
but I sort of feel unqualified.

Could someone else take care of it?

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-14  8:43     ` Dan Carpenter
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-14  8:43 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Andrew Morton, Shaohua Li, Huang Ying, Dave Hansen,
	Stephen Rothwell, Omar Sandoval, Tejun Heo, Andi Kleen, linux-mm,
	kernel-janitors, andrea.parri

I'm really terribly ignorant when it comes to things like this...  To me
it looked like the barrier in alloc_swap_info() was enough but when so
many smarter people disagree then I must be wrong.  I'd like to help,
but I sort of feel unqualified.

Could someone else take care of it?

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-11 23:20     ` Andrea Parri
@ 2019-01-14 22:25       ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-14 22:25 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Daniel Jordan, Dan Carpenter, Andrew Morton, Shaohua Li,
	Huang Ying, Dave Hansen, Stephen Rothwell, Omar Sandoval,
	Tejun Heo, Andi Kleen, linux-mm, kernel-janitors,
	Paul E. McKenney, Alan Stern, Peter Zijlstra, Will Deacon

On Sat, Jan 12, 2019 at 12:20:07AM +0100, Andrea Parri wrote:
> On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> > On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index f0edf7244256..21e92c757205 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> > >  	struct swap_info_struct *si;
> > >  	pgoff_t offset;
> > >  
> > > +	if (type >= nr_swapfiles)
> > > +		goto fail;
> > > +
> > 
> > As long as we're worrying about NULL, I think there should be an smp_rmb here
> > to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> > swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> > matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> > per LKMM.
> > 
> > I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> > all) who can correct me if I'm wrong about any of this.
> 
> This is to confirm that your analysis seems correct to me: the barriers
> should guarantee that get_swap_page_of_type() will observe the store to
> swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
> provided get_swap_page_of_type() observes the increment of nr_swapfiles
> performed by the (same instance of) alloc_swap_info().

That's good to hear, thanks for looking into it.

> One clarification about the READ_ONCE() matter: the LKMM cannot handle
> plain or unmarked (shared memory) accesses in their generality at the
> moment (patches providing support for these accesses are in the making,
> but they will take some time); IAC, I'm confident to anticipate that,
> for the particular pattern in question (aka, MP), marking the accesses
> to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
> to convert nr_swapfiles to an atomic_t type...).

I guess you mean we could either use READ_ONCE or make nr_swapfiles atomic,
they're different ways of achieving the same thing.

> I take the liberty of adding other LKMM folks (so that they can blame
> me for "the spam"! ;-) ): I've learnt from experience that four or more
> eyes are better than two when it comes to discuss these matters... ;-)

Ok, it's fine with me as long as they blame you :)

> > >  	si = swap_info[type];

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-14 22:25       ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-14 22:25 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Daniel Jordan, Dan Carpenter, Andrew Morton, Shaohua Li,
	Huang Ying, Dave Hansen, Stephen Rothwell, Omar Sandoval,
	Tejun Heo, Andi Kleen, linux-mm, kernel-janitors,
	Paul E. McKenney, Alan Stern, Peter Zijlstra, Will Deacon

On Sat, Jan 12, 2019 at 12:20:07AM +0100, Andrea Parri wrote:
> On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> > On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > index f0edf7244256..21e92c757205 100644
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> > >  	struct swap_info_struct *si;
> > >  	pgoff_t offset;
> > >  
> > > +	if (type >= nr_swapfiles)
> > > +		goto fail;
> > > +
> > 
> > As long as we're worrying about NULL, I think there should be an smp_rmb here
> > to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> > swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> > matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> > per LKMM.
> > 
> > I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> > all) who can correct me if I'm wrong about any of this.
> 
> This is to confirm that your analysis seems correct to me: the barriers
> should guarantee that get_swap_page_of_type() will observe the store to
> swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
> provided get_swap_page_of_type() observes the increment of nr_swapfiles
> performed by the (same instance of) alloc_swap_info().

That's good to hear, thanks for looking into it.

> One clarification about the READ_ONCE() matter: the LKMM cannot handle
> plain or unmarked (shared memory) accesses in their generality at the
> moment (patches providing support for these accesses are in the making,
> but they will take some time); IAC, I'm confident to anticipate that,
> for the particular pattern in question (aka, MP), marking the accesses
> to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
> to convert nr_swapfiles to an atomic_t type...).

I guess you mean we could either use READ_ONCE or make nr_swapfiles atomic,
they're different ways of achieving the same thing.

> I take the liberty of adding other LKMM folks (so that they can blame
> me for "the spam"! ;-) ): I've learnt from experience that four or more
> eyes are better than two when it comes to discuss these matters... ;-)

Ok, it's fine with me as long as they blame you :)

> > >  	si = swap_info[type];

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-14  8:43     ` Dan Carpenter
@ 2019-01-14 23:40       ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-14 23:40 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Daniel Jordan, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, andrea.parri

On Mon, Jan 14, 2019 at 11:43:10AM +0300, Dan Carpenter wrote:
> I'm really terribly ignorant when it comes to things like this...  To me
> it looked like the barrier in alloc_swap_info() was enough but when so
> many smarter people disagree then I must be wrong.  I'd like to help,
> but I sort of feel unqualified.
> 
> Could someone else take care of it?

I'm not the most qualified person either, but I gave it a try anyway.  Patch to
follow.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-14 23:40       ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-14 23:40 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Daniel Jordan, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, andrea.parri

On Mon, Jan 14, 2019 at 11:43:10AM +0300, Dan Carpenter wrote:
> I'm really terribly ignorant when it comes to things like this...  To me
> it looked like the barrier in alloc_swap_info() was enough but when so
> many smarter people disagree then I must be wrong.  I'd like to help,
> but I sort of feel unqualified.
> 
> Could someone else take care of it?

I'm not the most qualified person either, but I gave it a try anyway.  Patch to
follow.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-14 22:25       ` Daniel Jordan
@ 2019-01-15  0:23         ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-15  0:23 UTC (permalink / raw)
  To: akpm
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon, daniel.m.jordan

Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:

  Smatch complains that the NULL checks on "si" aren't consistent.  This
  seems like a real bug because we have not ensured that the type is
  valid and so "si" can be NULL.

Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:

        CPU0                           CPU1
        alloc_swap_info()              if (type >= nr_swapfiles)
          swap_info[type] = p              /* handle invalid entry */
          smp_wmb()                    smp_rmb()
          ++nr_swapfiles               p = swap_info[type]

Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].

Ying Huang noticed that other places don't order these reads properly.
Introduce swap_type_to_swap_info to encourage correct usage.

Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).

This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.

This is a theoretical problem, no actual reports of it exist.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>

---

I'd appreciate it if someone more familiar with memory barriers could
check this over.  Thanks.

Probably no need for stable, this is all theoretical.

Against linux-mmotm tag v5.0-rc1-mmotm-2019-01-09-13-40

 mm/swapfile.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f0edf7244256..dad52fc67045 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -99,6 +99,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+static struct swap_info_struct *swap_type_to_swap_info(int type)
+{
+	if (type >= READ_ONCE(nr_swapfiles))
+		return NULL;
+
+	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
+	return READ_ONCE(swap_info[type]);
+}
+
 static inline unsigned char swap_count(unsigned char ent)
 {
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
@@ -1045,12 +1054,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 /* The only caller of this function is now suspend routine */
 swp_entry_t get_swap_page_of_type(int type)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 	pgoff_t offset;
 
-	si = swap_info[type];
+	if (!si)
+		goto fail;
+
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1061,6 +1072,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
@@ -1072,9 +1084,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swap_type_to_swap_info(type);
+	if (!p)
 		goto bad_nofile;
-	p = swap_info[type];
 	if (!(p->flags & SWP_USED))
 		goto bad_device;
 	offset = swp_offset(entry);
@@ -1212,9 +1224,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	si = swap_type_to_swap_info(type);
+	if (!si)
 		goto bad_nofile;
-	si = swap_info[type];
 
 	preempt_disable();
 	if (!(si->flags & SWP_VALID))
@@ -1765,10 +1777,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
 sector_t swapdev_block(int type, pgoff_t offset)
 {
 	struct block_device *bdev;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 
-	if ((unsigned int)type >= nr_swapfiles)
-		return 0;
-	if (!(swap_info[type]->flags & SWP_WRITEOK))
+	if (!si || !(si->flags & SWP_WRITEOK))
 		return 0;
 	return map_swap_entry(swp_entry(type, offset), &bdev);
 }
@@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
 	if (!l)
 		return SEQ_START_TOKEN;
 
-	for (type = 0; type < nr_swapfiles; type++) {
+	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
 		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+		si = READ_ONCE(swap_info[type]);
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		if (!--l)
@@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
 	else
 		type = si->type + 1;
 
-	for (; type < nr_swapfiles; type++) {
+	for (; type < READ_ONCE(nr_swapfiles); type++) {
 		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+		si = READ_ONCE(swap_info[type]);
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		++*pos;
@@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		swap_info[type] = p;
+		WRITE_ONCE(swap_info[type], p);
 		/*
 		 * Write swap_info[type] before nr_swapfiles, in case a
 		 * racing procfs swap_start() or swap_next() is reading them.
 		 * (We never shrink nr_swapfiles, we never free this entry.)
 		 */
 		smp_wmb();
-		nr_swapfiles++;
+		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
 	} else {
 		kvfree(p);
 		p = swap_info[type];
-- 
2.20.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-15  0:23         ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-15  0:23 UTC (permalink / raw)
  To: akpm
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon, daniel.m.jordan

Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:

  Smatch complains that the NULL checks on "si" aren't consistent.  This
  seems like a real bug because we have not ensured that the type is
  valid and so "si" can be NULL.

Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:

        CPU0                           CPU1
        alloc_swap_info()              if (type >= nr_swapfiles)
          swap_info[type] = p              /* handle invalid entry */
          smp_wmb()                    smp_rmb()
          ++nr_swapfiles               p = swap_info[type]

Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].

Ying Huang noticed that other places don't order these reads properly.
Introduce swap_type_to_swap_info to encourage correct usage.

Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).

This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.

This is a theoretical problem, no actual reports of it exist.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>

---

I'd appreciate it if someone more familiar with memory barriers could
check this over.  Thanks.

Probably no need for stable, this is all theoretical.

Against linux-mmotm tag v5.0-rc1-mmotm-2019-01-09-13-40

 mm/swapfile.c | 43 +++++++++++++++++++++++++++----------------
 1 file changed, 27 insertions(+), 16 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index f0edf7244256..dad52fc67045 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -99,6 +99,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+static struct swap_info_struct *swap_type_to_swap_info(int type)
+{
+	if (type >= READ_ONCE(nr_swapfiles))
+		return NULL;
+
+	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
+	return READ_ONCE(swap_info[type]);
+}
+
 static inline unsigned char swap_count(unsigned char ent)
 {
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
@@ -1045,12 +1054,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 /* The only caller of this function is now suspend routine */
 swp_entry_t get_swap_page_of_type(int type)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 	pgoff_t offset;
 
-	si = swap_info[type];
+	if (!si)
+		goto fail;
+
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1061,6 +1072,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
@@ -1072,9 +1084,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swap_type_to_swap_info(type);
+	if (!p)
 		goto bad_nofile;
-	p = swap_info[type];
 	if (!(p->flags & SWP_USED))
 		goto bad_device;
 	offset = swp_offset(entry);
@@ -1212,9 +1224,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	si = swap_type_to_swap_info(type);
+	if (!si)
 		goto bad_nofile;
-	si = swap_info[type];
 
 	preempt_disable();
 	if (!(si->flags & SWP_VALID))
@@ -1765,10 +1777,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
 sector_t swapdev_block(int type, pgoff_t offset)
 {
 	struct block_device *bdev;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 
-	if ((unsigned int)type >= nr_swapfiles)
-		return 0;
-	if (!(swap_info[type]->flags & SWP_WRITEOK))
+	if (!si || !(si->flags & SWP_WRITEOK))
 		return 0;
 	return map_swap_entry(swp_entry(type, offset), &bdev);
 }
@@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
 	if (!l)
 		return SEQ_START_TOKEN;
 
-	for (type = 0; type < nr_swapfiles; type++) {
+	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
 		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+		si = READ_ONCE(swap_info[type]);
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		if (!--l)
@@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
 	else
 		type = si->type + 1;
 
-	for (; type < nr_swapfiles; type++) {
+	for (; type < READ_ONCE(nr_swapfiles); type++) {
 		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+		si = READ_ONCE(swap_info[type]);
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		++*pos;
@@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		swap_info[type] = p;
+		WRITE_ONCE(swap_info[type], p);
 		/*
 		 * Write swap_info[type] before nr_swapfiles, in case a
 		 * racing procfs swap_start() or swap_next() is reading them.
 		 * (We never shrink nr_swapfiles, we never free this entry.)
 		 */
 		smp_wmb();
-		nr_swapfiles++;
+		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
 	} else {
 		kvfree(p);
 		p = swap_info[type];
-- 
2.20.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
  2019-01-14 22:25       ` Daniel Jordan
@ 2019-01-15  0:28         ` Andrea Parri
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-15  0:28 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, Paul E. McKenney,
	Alan Stern, Peter Zijlstra, Will Deacon

On Mon, Jan 14, 2019 at 02:25:29PM -0800, Daniel Jordan wrote:
> On Sat, Jan 12, 2019 at 12:20:07AM +0100, Andrea Parri wrote:
> > On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> > > On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > index f0edf7244256..21e92c757205 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> > > >  	struct swap_info_struct *si;
> > > >  	pgoff_t offset;
> > > >  
> > > > +	if (type >= nr_swapfiles)
> > > > +		goto fail;
> > > > +
> > > 
> > > As long as we're worrying about NULL, I think there should be an smp_rmb here
> > > to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> > > swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> > > matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> > > per LKMM.
> > > 
> > > I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> > > all) who can correct me if I'm wrong about any of this.
> > 
> > This is to confirm that your analysis seems correct to me: the barriers
> > should guarantee that get_swap_page_of_type() will observe the store to
> > swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
> > provided get_swap_page_of_type() observes the increment of nr_swapfiles
> > performed by the (same instance of) alloc_swap_info().
> 
> That's good to hear, thanks for looking into it.
> 
> > One clarification about the READ_ONCE() matter: the LKMM cannot handle
> > plain or unmarked (shared memory) accesses in their generality at the
> > moment (patches providing support for these accesses are in the making,
> > but they will take some time); IAC, I'm confident to anticipate that,
> > for the particular pattern in question (aka, MP), marking the accesses
> > to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
> > to convert nr_swapfiles to an atomic_t type...).
> 
> I guess you mean we could either use READ_ONCE or make nr_swapfiles atomic,
> they're different ways of achieving the same thing.

Indeed: I was suggesting to mark the read _and the increment of
nr_swapfiles, as I see you did in the patch you just submitted. 

  Andrea


> > swap_info[type] performed by alloc_swap_info() (or a "co"-later store),

> 
> > I take the liberty of adding other LKMM folks (so that they can blame
> > me for "the spam"! ;-) ): I've learnt from experience that four or more
> > eyes are better than two when it comes to discuss these matters... ;-)
> 
> Ok, it's fine with me as long as they blame you :)
> 
> > > >  	si = swap_info[type];

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type()
@ 2019-01-15  0:28         ` Andrea Parri
  0 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-15  0:28 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: Dan Carpenter, Andrew Morton, Shaohua Li, Huang Ying,
	Dave Hansen, Stephen Rothwell, Omar Sandoval, Tejun Heo,
	Andi Kleen, linux-mm, kernel-janitors, Paul E. McKenney,
	Alan Stern, Peter Zijlstra, Will Deacon

On Mon, Jan 14, 2019 at 02:25:29PM -0800, Daniel Jordan wrote:
> On Sat, Jan 12, 2019 at 12:20:07AM +0100, Andrea Parri wrote:
> > On Fri, Jan 11, 2019 at 09:41:28AM -0800, Daniel Jordan wrote:
> > > On Fri, Jan 11, 2019 at 12:59:19PM +0300, Dan Carpenter wrote:
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > index f0edf7244256..21e92c757205 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -1048,9 +1048,12 @@ swp_entry_t get_swap_page_of_type(int type)
> > > >  	struct swap_info_struct *si;
> > > >  	pgoff_t offset;
> > > >  
> > > > +	if (type >= nr_swapfiles)
> > > > +		goto fail;
> > > > +
> > > 
> > > As long as we're worrying about NULL, I think there should be an smp_rmb here
> > > to ensure swap_info[type] isn't NULL in case of an (admittedly unlikely) racing
> > > swapon that increments nr_swapfiles.  See smp_wmb in alloc_swap_info and the
> > > matching smp_rmb's in the file.  And READ_ONCE's on either side of the barrier
> > > per LKMM.
> > > 
> > > I'm adding Andrea (randomly selected from the many LKMM folks to avoid spamming
> > > all) who can correct me if I'm wrong about any of this.
> > 
> > This is to confirm that your analysis seems correct to me: the barriers
> > should guarantee that get_swap_page_of_type() will observe the store to
> > swap_info[type] performed by alloc_swap_info() (or a "co"-later store),
> > provided get_swap_page_of_type() observes the increment of nr_swapfiles
> > performed by the (same instance of) alloc_swap_info().
> 
> That's good to hear, thanks for looking into it.
> 
> > One clarification about the READ_ONCE() matter: the LKMM cannot handle
> > plain or unmarked (shared memory) accesses in their generality at the
> > moment (patches providing support for these accesses are in the making,
> > but they will take some time); IAC, I'm confident to anticipate that,
> > for the particular pattern in question (aka, MP), marking the accesses
> > to nr_swapfiles will be "LKMM-sane" (one way to achieve this would be
> > to convert nr_swapfiles to an atomic_t type...).
> 
> I guess you mean we could either use READ_ONCE or make nr_swapfiles atomic,
> they're different ways of achieving the same thing.

Indeed: I was suggesting to mark the read _and the increment of
nr_swapfiles, as I see you did in the patch you just submitted. 

  Andrea


> > swap_info[type] performed by alloc_swap_info() (or a "co"-later store),

> 
> > I take the liberty of adding other LKMM folks (so that they can blame
> > me for "the spam"! ;-) ): I've learnt from experience that four or more
> > eyes are better than two when it comes to discuss these matters... ;-)
> 
> Ok, it's fine with me as long as they blame you :)
> 
> > > >  	si = swap_info[type];

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-15  0:23         ` Daniel Jordan
@ 2019-01-15  1:17           ` Andrea Parri
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-15  1:17 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, dan.carpenter, shli, ying.huang, dave.hansen, sfr, osandov,
	tj, ak, linux-mm, kernel-janitors, paulmck, stern, peterz,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>

Please feel free to add:

Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>

  Andrea


> 
> ---
> 
> I'd appreciate it if someone more familiar with memory barriers could
> check this over.  Thanks.
> 
> Probably no need for stable, this is all theoretical.
> 
> Against linux-mmotm tag v5.0-rc1-mmotm-2019-01-09-13-40
> 
>  mm/swapfile.c | 43 +++++++++++++++++++++++++++----------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f0edf7244256..dad52fc67045 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -99,6 +99,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
>  
>  atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
> +static struct swap_info_struct *swap_type_to_swap_info(int type)
> +{
> +	if (type >= READ_ONCE(nr_swapfiles))
> +		return NULL;
> +
> +	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	return READ_ONCE(swap_info[type]);
> +}
> +
>  static inline unsigned char swap_count(unsigned char ent)
>  {
>  	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
> @@ -1045,12 +1054,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  /* The only caller of this function is now suspend routine */
>  swp_entry_t get_swap_page_of_type(int type)
>  {
> -	struct swap_info_struct *si;
> +	struct swap_info_struct *si = swap_type_to_swap_info(type);
>  	pgoff_t offset;
>  
> -	si = swap_info[type];
> +	if (!si)
> +		goto fail;
> +
>  	spin_lock(&si->lock);
> -	if (si && (si->flags & SWP_WRITEOK)) {
> +	if (si->flags & SWP_WRITEOK) {
>  		atomic_long_dec(&nr_swap_pages);
>  		/* This is called for allocating swap entry, not cache */
>  		offset = scan_swap_map(si, 1);
> @@ -1061,6 +1072,7 @@ swp_entry_t get_swap_page_of_type(int type)
>  		atomic_long_inc(&nr_swap_pages);
>  	}
>  	spin_unlock(&si->lock);
> +fail:
>  	return (swp_entry_t) {0};
>  }
>  
> @@ -1072,9 +1084,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
>  	if (!entry.val)
>  		goto out;
>  	type = swp_type(entry);
> -	if (type >= nr_swapfiles)
> +	p = swap_type_to_swap_info(type);
> +	if (!p)
>  		goto bad_nofile;
> -	p = swap_info[type];
>  	if (!(p->flags & SWP_USED))
>  		goto bad_device;
>  	offset = swp_offset(entry);
> @@ -1212,9 +1224,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
>  	if (!entry.val)
>  		goto out;
>  	type = swp_type(entry);
> -	if (type >= nr_swapfiles)
> +	si = swap_type_to_swap_info(type);
> +	if (!si)
>  		goto bad_nofile;
> -	si = swap_info[type];
>  
>  	preempt_disable();
>  	if (!(si->flags & SWP_VALID))
> @@ -1765,10 +1777,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
>  sector_t swapdev_block(int type, pgoff_t offset)
>  {
>  	struct block_device *bdev;
> +	struct swap_info_struct *si = swap_type_to_swap_info(type);
>  
> -	if ((unsigned int)type >= nr_swapfiles)
> -		return 0;
> -	if (!(swap_info[type]->flags & SWP_WRITEOK))
> +	if (!si || !(si->flags & SWP_WRITEOK))
>  		return 0;
>  	return map_swap_entry(swp_entry(type, offset), &bdev);
>  }
> @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
>  	if (!l)
>  		return SEQ_START_TOKEN;
>  
> -	for (type = 0; type < nr_swapfiles; type++) {
> +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		if (!--l)
> @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
>  	else
>  		type = si->type + 1;
>  
> -	for (; type < nr_swapfiles; type++) {
> +	for (; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		++*pos;
> @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		swap_info[type] = p;
> +		WRITE_ONCE(swap_info[type], p);
>  		/*
>  		 * Write swap_info[type] before nr_swapfiles, in case a
>  		 * racing procfs swap_start() or swap_next() is reading them.
>  		 * (We never shrink nr_swapfiles, we never free this entry.)
>  		 */
>  		smp_wmb();
> -		nr_swapfiles++;
> +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>  	} else {
>  		kvfree(p);
>  		p = swap_info[type];
> -- 
> 2.20.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-15  1:17           ` Andrea Parri
  0 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-01-15  1:17 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, dan.carpenter, shli, ying.huang, dave.hansen, sfr, osandov,
	tj, ak, linux-mm, kernel-janitors, paulmck, stern, peterz,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>

Please feel free to add:

Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>

  Andrea


> 
> ---
> 
> I'd appreciate it if someone more familiar with memory barriers could
> check this over.  Thanks.
> 
> Probably no need for stable, this is all theoretical.
> 
> Against linux-mmotm tag v5.0-rc1-mmotm-2019-01-09-13-40
> 
>  mm/swapfile.c | 43 +++++++++++++++++++++++++++----------------
>  1 file changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index f0edf7244256..dad52fc67045 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -99,6 +99,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
>  
>  atomic_t nr_rotate_swap = ATOMIC_INIT(0);
>  
> +static struct swap_info_struct *swap_type_to_swap_info(int type)
> +{
> +	if (type >= READ_ONCE(nr_swapfiles))
> +		return NULL;
> +
> +	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	return READ_ONCE(swap_info[type]);
> +}
> +
>  static inline unsigned char swap_count(unsigned char ent)
>  {
>  	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
> @@ -1045,12 +1054,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
>  /* The only caller of this function is now suspend routine */
>  swp_entry_t get_swap_page_of_type(int type)
>  {
> -	struct swap_info_struct *si;
> +	struct swap_info_struct *si = swap_type_to_swap_info(type);
>  	pgoff_t offset;
>  
> -	si = swap_info[type];
> +	if (!si)
> +		goto fail;
> +
>  	spin_lock(&si->lock);
> -	if (si && (si->flags & SWP_WRITEOK)) {
> +	if (si->flags & SWP_WRITEOK) {
>  		atomic_long_dec(&nr_swap_pages);
>  		/* This is called for allocating swap entry, not cache */
>  		offset = scan_swap_map(si, 1);
> @@ -1061,6 +1072,7 @@ swp_entry_t get_swap_page_of_type(int type)
>  		atomic_long_inc(&nr_swap_pages);
>  	}
>  	spin_unlock(&si->lock);
> +fail:
>  	return (swp_entry_t) {0};
>  }
>  
> @@ -1072,9 +1084,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
>  	if (!entry.val)
>  		goto out;
>  	type = swp_type(entry);
> -	if (type >= nr_swapfiles)
> +	p = swap_type_to_swap_info(type);
> +	if (!p)
>  		goto bad_nofile;
> -	p = swap_info[type];
>  	if (!(p->flags & SWP_USED))
>  		goto bad_device;
>  	offset = swp_offset(entry);
> @@ -1212,9 +1224,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
>  	if (!entry.val)
>  		goto out;
>  	type = swp_type(entry);
> -	if (type >= nr_swapfiles)
> +	si = swap_type_to_swap_info(type);
> +	if (!si)
>  		goto bad_nofile;
> -	si = swap_info[type];
>  
>  	preempt_disable();
>  	if (!(si->flags & SWP_VALID))
> @@ -1765,10 +1777,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
>  sector_t swapdev_block(int type, pgoff_t offset)
>  {
>  	struct block_device *bdev;
> +	struct swap_info_struct *si = swap_type_to_swap_info(type);
>  
> -	if ((unsigned int)type >= nr_swapfiles)
> -		return 0;
> -	if (!(swap_info[type]->flags & SWP_WRITEOK))
> +	if (!si || !(si->flags & SWP_WRITEOK))
>  		return 0;
>  	return map_swap_entry(swp_entry(type, offset), &bdev);
>  }
> @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
>  	if (!l)
>  		return SEQ_START_TOKEN;
>  
> -	for (type = 0; type < nr_swapfiles; type++) {
> +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		if (!--l)
> @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
>  	else
>  		type = si->type + 1;
>  
> -	for (; type < nr_swapfiles; type++) {
> +	for (; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		++*pos;
> @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		swap_info[type] = p;
> +		WRITE_ONCE(swap_info[type], p);
>  		/*
>  		 * Write swap_info[type] before nr_swapfiles, in case a
>  		 * racing procfs swap_start() or swap_next() is reading them.
>  		 * (We never shrink nr_swapfiles, we never free this entry.)
>  		 */
>  		smp_wmb();
> -		nr_swapfiles++;
> +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>  	} else {
>  		kvfree(p);
>  		p = swap_info[type];
> -- 
> 2.20.0
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-15  0:23         ` Daniel Jordan
@ 2019-01-30  6:26           ` Andrew Morton
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2019-01-30  6:26 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Mon, 14 Jan 2019 19:23:05 -0500 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 

LGTM, but like most people I'm afraid to ack it ;)

mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
stuck so can you please redo this against mainline?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-30  6:26           ` Andrew Morton
  0 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2019-01-30  6:26 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Mon, 14 Jan 2019 19:23:05 -0500 Daniel Jordan <daniel.m.jordan@oracle.com> wrote:

> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 

LGTM, but like most people I'm afraid to ack it ;)

mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
stuck so can you please redo this against mainline?


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-15  0:23         ` Daniel Jordan
@ 2019-01-30  7:28           ` Dan Carpenter
  -1 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-30  7:28 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, andrea.parri, shli, ying.huang, dave.hansen, sfr, osandov,
	tj, ak, linux-mm, kernel-janitors, paulmck, stern, peterz,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>
> 
> ---
> 
> I'd appreciate it if someone more familiar with memory barriers could
> check this over.  Thanks.
> 
> Probably no need for stable, this is all theoretical.
> 

The NULL dereference part is not theoretical.  It require CAP_SYS_ADMIN
so it's not a huge deal, but you could trigger it from snapshot_ioctl()
with SNAPSHOT_ALLOC_SWAP_PAGE.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-30  7:28           ` Dan Carpenter
  0 siblings, 0 replies; 55+ messages in thread
From: Dan Carpenter @ 2019-01-30  7:28 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, andrea.parri, shli, ying.huang, dave.hansen, sfr, osandov,
	tj, ak, linux-mm, kernel-janitors, paulmck, stern, peterz,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Cc: Alan Stern <stern@rowland.harvard.edu>
> Cc: Andi Kleen <ak@linux.intel.com>
> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Dan Carpenter <dan.carpenter@oracle.com>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>
> Cc: Huang Ying <ying.huang@intel.com>
> Cc: Omar Sandoval <osandov@fb.com>
> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
> Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
> Cc: Shaohua Li <shli@kernel.org>
> Cc: Stephen Rothwell <sfr@canb.auug.org.au>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Will Deacon <will.deacon@arm.com>
> 
> ---
> 
> I'd appreciate it if someone more familiar with memory barriers could
> check this over.  Thanks.
> 
> Probably no need for stable, this is all theoretical.
> 

The NULL dereference part is not theoretical.  It require CAP_SYS_ADMIN
so it's not a huge deal, but you could trigger it from snapshot_ioctl()
with SNAPSHOT_ALLOC_SWAP_PAGE.

regards,
dan carpenter



^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-15  0:23         ` Daniel Jordan
@ 2019-01-30  9:13           ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2019-01-30  9:13 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>

A few comments below, but:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> +static struct swap_info_struct *swap_type_to_swap_info(int type)
> +{
> +	if (type >= READ_ONCE(nr_swapfiles))
> +		return NULL;
> +
> +	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	return READ_ONCE(swap_info[type]);
> +}

> @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
>  	if (!l)
>  		return SEQ_START_TOKEN;
>  
> -	for (type = 0; type < nr_swapfiles; type++) {
> +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		if (!--l)
> @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
>  	else
>  		type = si->type + 1;
>  
> -	for (; type < nr_swapfiles; type++) {
> +	for (; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		++*pos;

You could write those like:

	for (; (si = swap_type_to_swap_info(type)); type++)

> @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		swap_info[type] = p;
> +		WRITE_ONCE(swap_info[type], p);
>  		/*
>  		 * Write swap_info[type] before nr_swapfiles, in case a
>  		 * racing procfs swap_start() or swap_next() is reading them.
>  		 * (We never shrink nr_swapfiles, we never free this entry.)
>  		 */
>  		smp_wmb();
> -		nr_swapfiles++;
> +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>  	} else {
>  		kvfree(p);
>  		p = swap_info[type];

It is also possible to write this with smp_load_acquire() /
smp_store_release(). ARM64/RISC-V might benefit from that, OTOH ARM
won't like that much.

Dunno what would be better.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-30  9:13           ` Peter Zijlstra
  0 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2019-01-30  9:13 UTC (permalink / raw)
  To: Daniel Jordan
  Cc: akpm, dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	will.deacon

On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> Dan Carpenter reports a potential NULL dereference in
> get_swap_page_of_type:
> 
>   Smatch complains that the NULL checks on "si" aren't consistent.  This
>   seems like a real bug because we have not ensured that the type is
>   valid and so "si" can be NULL.
> 
> Add the missing check for NULL, taking care to use a read barrier to
> ensure CPU1 observes CPU0's updates in the correct order:
> 
>         CPU0                           CPU1
>         alloc_swap_info()              if (type >= nr_swapfiles)
>           swap_info[type] = p              /* handle invalid entry */
>           smp_wmb()                    smp_rmb()
>           ++nr_swapfiles               p = swap_info[type]
> 
> Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
> CPU0's write to swap_info[type] and read NULL from swap_info[type].
> 
> Ying Huang noticed that other places don't order these reads properly.
> Introduce swap_type_to_swap_info to encourage correct usage.
> 
> Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
> (see tools/memory-model/Documentation/explanation.txt).
> 
> This ordering need not be enforced in places where swap_lock is held
> (e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
> and the swap_info array.
> 
> This is a theoretical problem, no actual reports of it exist.
> 
> Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>

A few comments below, but:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> +static struct swap_info_struct *swap_type_to_swap_info(int type)
> +{
> +	if (type >= READ_ONCE(nr_swapfiles))
> +		return NULL;
> +
> +	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
> +	return READ_ONCE(swap_info[type]);
> +}

> @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
>  	if (!l)
>  		return SEQ_START_TOKEN;
>  
> -	for (type = 0; type < nr_swapfiles; type++) {
> +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		if (!--l)
> @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
>  	else
>  		type = si->type + 1;
>  
> -	for (; type < nr_swapfiles; type++) {
> +	for (; type < READ_ONCE(nr_swapfiles); type++) {
>  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> -		si = swap_info[type];
> +		si = READ_ONCE(swap_info[type]);
>  		if (!(si->flags & SWP_USED) || !si->swap_map)
>  			continue;
>  		++*pos;

You could write those like:

	for (; (si = swap_type_to_swap_info(type)); type++)

> @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
>  	}
>  	if (type >= nr_swapfiles) {
>  		p->type = type;
> -		swap_info[type] = p;
> +		WRITE_ONCE(swap_info[type], p);
>  		/*
>  		 * Write swap_info[type] before nr_swapfiles, in case a
>  		 * racing procfs swap_start() or swap_next() is reading them.
>  		 * (We never shrink nr_swapfiles, we never free this entry.)
>  		 */
>  		smp_wmb();
> -		nr_swapfiles++;
> +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
>  	} else {
>  		kvfree(p);
>  		p = swap_info[type];

It is also possible to write this with smp_load_acquire() /
smp_store_release(). ARM64/RISC-V might benefit from that, OTOH ARM
won't like that much.

Dunno what would be better.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-30  6:26           ` Andrew Morton
@ 2019-01-31  1:52             ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, ying.huang,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, will.deacon

On Tue, Jan 29, 2019 at 10:26:22PM -0800, Andrew Morton wrote:
> LGTM, but like most people I'm afraid to ack it ;)
> 
> mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> stuck so can you please redo this against mainline?

Yep, I can do that.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-31  1:52             ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  1:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, ying.huang,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, will.deacon

On Tue, Jan 29, 2019 at 10:26:22PM -0800, Andrew Morton wrote:
> LGTM, but like most people I'm afraid to ack it ;)
> 
> mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> stuck so can you please redo this against mainline?

Yep, I can do that.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-30  7:28           ` Dan Carpenter
@ 2019-01-31  1:55             ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  1:55 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Daniel Jordan, akpm, andrea.parri, shli, ying.huang, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Wed, Jan 30, 2019 at 10:28:46AM +0300, Dan Carpenter wrote:
> On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> > Probably no need for stable, this is all theoretical.
> 
> The NULL dereference part is not theoretical.  It require CAP_SYS_ADMIN
> so it's not a huge deal, but you could trigger it from snapshot_ioctl()
> with SNAPSHOT_ALLOC_SWAP_PAGE.

Ok, I'll amend the changelog for v2.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-31  1:55             ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  1:55 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Daniel Jordan, akpm, andrea.parri, shli, ying.huang, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Wed, Jan 30, 2019 at 10:28:46AM +0300, Dan Carpenter wrote:
> On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> > Probably no need for stable, this is all theoretical.
> 
> The NULL dereference part is not theoretical.  It require CAP_SYS_ADMIN
> so it's not a huge deal, but you could trigger it from snapshot_ioctl()
> with SNAPSHOT_ALLOC_SWAP_PAGE.

Ok, I'll amend the changelog for v2.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
  2019-01-30  9:13           ` Peter Zijlstra
@ 2019-01-31  2:00             ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  2:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Jordan, akpm, dan.carpenter, andrea.parri, shli,
	ying.huang, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, will.deacon

On Wed, Jan 30, 2019 at 10:13:16AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> 
> A few comments below, but:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks.

> > @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
> >  	if (!l)
> >  		return SEQ_START_TOKEN;
> >  
> > -	for (type = 0; type < nr_swapfiles; type++) {
> > +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
> >  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> > -		si = swap_info[type];
> > +		si = READ_ONCE(swap_info[type]);
> >  		if (!(si->flags & SWP_USED) || !si->swap_map)
> >  			continue;
> >  		if (!--l)
> > @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
> >  	else
> >  		type = si->type + 1;
> >  
> > -	for (; type < nr_swapfiles; type++) {
> > +	for (; type < READ_ONCE(nr_swapfiles); type++) {
> >  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> > -		si = swap_info[type];
> > +		si = READ_ONCE(swap_info[type]);
> >  		if (!(si->flags & SWP_USED) || !si->swap_map)
> >  			continue;
> >  		++*pos;
> 
> You could write those like:
> 
> 	for (; (si = swap_type_to_swap_info(type)); type++)

That's clever, and way better than the ugly iterator macro I wrote and then
deleted in disgust.

> > @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  	}
> >  	if (type >= nr_swapfiles) {
> >  		p->type = type;
> > -		swap_info[type] = p;
> > +		WRITE_ONCE(swap_info[type], p);
> >  		/*
> >  		 * Write swap_info[type] before nr_swapfiles, in case a
> >  		 * racing procfs swap_start() or swap_next() is reading them.
> >  		 * (We never shrink nr_swapfiles, we never free this entry.)
> >  		 */
> >  		smp_wmb();
> > -		nr_swapfiles++;
> > +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> >  	} else {
> >  		kvfree(p);
> >  		p = swap_info[type];
> 
> It is also possible to write this with smp_load_acquire() /
> smp_store_release(). ARM64/RISC-V might benefit from that, OTOH ARM
> won't like that much.
> 
> Dunno what would be better.

I just left it as-is for now.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs
@ 2019-01-31  2:00             ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  2:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Daniel Jordan, akpm, dan.carpenter, andrea.parri, shli,
	ying.huang, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, will.deacon

On Wed, Jan 30, 2019 at 10:13:16AM +0100, Peter Zijlstra wrote:
> On Mon, Jan 14, 2019 at 07:23:05PM -0500, Daniel Jordan wrote:
> 
> A few comments below, but:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks.

> > @@ -2799,9 +2810,9 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
> >  	if (!l)
> >  		return SEQ_START_TOKEN;
> >  
> > -	for (type = 0; type < nr_swapfiles; type++) {
> > +	for (type = 0; type < READ_ONCE(nr_swapfiles); type++) {
> >  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> > -		si = swap_info[type];
> > +		si = READ_ONCE(swap_info[type]);
> >  		if (!(si->flags & SWP_USED) || !si->swap_map)
> >  			continue;
> >  		if (!--l)
> > @@ -2821,9 +2832,9 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
> >  	else
> >  		type = si->type + 1;
> >  
> > -	for (; type < nr_swapfiles; type++) {
> > +	for (; type < READ_ONCE(nr_swapfiles); type++) {
> >  		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
> > -		si = swap_info[type];
> > +		si = READ_ONCE(swap_info[type]);
> >  		if (!(si->flags & SWP_USED) || !si->swap_map)
> >  			continue;
> >  		++*pos;
> 
> You could write those like:
> 
> 	for (; (si = swap_type_to_swap_info(type)); type++)

That's clever, and way better than the ugly iterator macro I wrote and then
deleted in disgust.

> > @@ -2930,14 +2941,14 @@ static struct swap_info_struct *alloc_swap_info(void)
> >  	}
> >  	if (type >= nr_swapfiles) {
> >  		p->type = type;
> > -		swap_info[type] = p;
> > +		WRITE_ONCE(swap_info[type], p);
> >  		/*
> >  		 * Write swap_info[type] before nr_swapfiles, in case a
> >  		 * racing procfs swap_start() or swap_next() is reading them.
> >  		 * (We never shrink nr_swapfiles, we never free this entry.)
> >  		 */
> >  		smp_wmb();
> > -		nr_swapfiles++;
> > +		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
> >  	} else {
> >  		kvfree(p);
> >  		p = swap_info[type];
> 
> It is also possible to write this with smp_load_acquire() /
> smp_store_release(). ARM64/RISC-V might benefit from that, OTOH ARM
> won't like that much.
> 
> Dunno what would be better.

I just left it as-is for now.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2] mm, swap: bounds check swap_info array accesses to avoid NULL derefs
  2019-01-31  1:52             ` Daniel Jordan
@ 2019-01-31  2:44               ` Daniel Jordan
  -1 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  2:44 UTC (permalink / raw)
  To: akpm
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon, daniel.m.jordan

Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:

  Smatch complains that the NULL checks on "si" aren't consistent.  This
  seems like a real bug because we have not ensured that the type is
  valid and so "si" can be NULL.

Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:

     CPU0                           CPU1
     alloc_swap_info()              if (type >= nr_swapfiles)
       swap_info[type] = p              /* handle invalid entry */
       smp_wmb()                    smp_rmb()
       ++nr_swapfiles               p = swap_info[type]

Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].

Ying Huang noticed other places in swapfile.c don't order these reads
properly.  Introduce swap_type_to_swap_info to encourage correct usage.

Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).

This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
---

v1 -> v2
 - Rebased to latest mainline per Andrew
 - Added tags from Andrea and Peter (thanks)
 - Fixed loop conditions as Peter suggested and converted a few more
   places to use swap_type_to_swap_info; retained tags since changes
   were minor
 - Added a Suggested-by for Ying
 - Amended changelog after Dan's comment

 mm/swapfile.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index dbac1d49469d..67f60e051814 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -98,6 +98,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+static struct swap_info_struct *swap_type_to_swap_info(int type)
+{
+	if (type >= READ_ONCE(nr_swapfiles))
+		return NULL;
+
+	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
+	return READ_ONCE(swap_info[type]);
+}
+
 static inline unsigned char swap_count(unsigned char ent)
 {
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
@@ -1044,12 +1053,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 /* The only caller of this function is now suspend routine */
 swp_entry_t get_swap_page_of_type(int type)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 	pgoff_t offset;
 
-	si = swap_info[type];
+	if (!si)
+		goto fail;
+
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1060,6 +1071,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
@@ -1071,9 +1083,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swap_type_to_swap_info(type);
+	if (!p)
 		goto bad_nofile;
-	p = swap_info[type];
 	if (!(p->flags & SWP_USED))
 		goto bad_device;
 	offset = swp_offset(entry);
@@ -1697,10 +1709,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
 sector_t swapdev_block(int type, pgoff_t offset)
 {
 	struct block_device *bdev;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 
-	if ((unsigned int)type >= nr_swapfiles)
-		return 0;
-	if (!(swap_info[type]->flags & SWP_WRITEOK))
+	if (!si || !(si->flags & SWP_WRITEOK))
 		return 0;
 	return map_swap_entry(swp_entry(type, offset), &bdev);
 }
@@ -2258,7 +2269,7 @@ static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
 	struct swap_extent *se;
 	pgoff_t offset;
 
-	sis = swap_info[swp_type(entry)];
+	sis = swp_swap_info(entry);
 	*bdev = sis->bdev;
 
 	offset = swp_offset(entry);
@@ -2700,9 +2711,7 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
 	if (!l)
 		return SEQ_START_TOKEN;
 
-	for (type = 0; type < nr_swapfiles; type++) {
-		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+	for (type = 0; (si = swap_type_to_swap_info(type)); type++) {
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		if (!--l)
@@ -2722,9 +2731,7 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
 	else
 		type = si->type + 1;
 
-	for (; type < nr_swapfiles; type++) {
-		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+	for (; (si = swap_type_to_swap_info(type)); type++) {
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		++*pos;
@@ -2831,14 +2838,14 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		swap_info[type] = p;
+		WRITE_ONCE(swap_info[type], p);
 		/*
 		 * Write swap_info[type] before nr_swapfiles, in case a
 		 * racing procfs swap_start() or swap_next() is reading them.
 		 * (We never shrink nr_swapfiles, we never free this entry.)
 		 */
 		smp_wmb();
-		nr_swapfiles++;
+		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
 	} else {
 		kvfree(p);
 		p = swap_info[type];
@@ -3358,7 +3365,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
-	unsigned long offset, type;
+	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
 	int err = -EINVAL;
@@ -3366,10 +3373,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	if (non_swap_entry(entry))
 		goto out;
 
-	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swp_swap_info(entry);
+	if (!p)
 		goto bad_file;
-	p = swap_info[type];
+
 	offset = swp_offset(entry);
 	if (unlikely(offset >= p->max))
 		goto out;
@@ -3466,7 +3473,7 @@ int swapcache_prepare(swp_entry_t entry)
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
-	return swap_info[swp_type(entry)];
+	return swap_type_to_swap_info(swp_type(entry));
 }
 
 struct swap_info_struct *page_swap_info(struct page *page)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2] mm, swap: bounds check swap_info array accesses to avoid NULL derefs
@ 2019-01-31  2:44               ` Daniel Jordan
  0 siblings, 0 replies; 55+ messages in thread
From: Daniel Jordan @ 2019-01-31  2:44 UTC (permalink / raw)
  To: akpm
  Cc: dan.carpenter, andrea.parri, shli, ying.huang, dave.hansen, sfr,
	osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon, daniel.m.jordan

Dan Carpenter reports a potential NULL dereference in
get_swap_page_of_type:

  Smatch complains that the NULL checks on "si" aren't consistent.  This
  seems like a real bug because we have not ensured that the type is
  valid and so "si" can be NULL.

Add the missing check for NULL, taking care to use a read barrier to
ensure CPU1 observes CPU0's updates in the correct order:

     CPU0                           CPU1
     alloc_swap_info()              if (type >= nr_swapfiles)
       swap_info[type] = p              /* handle invalid entry */
       smp_wmb()                    smp_rmb()
       ++nr_swapfiles               p = swap_info[type]

Without smp_rmb, CPU1 might observe CPU0's write to nr_swapfiles before
CPU0's write to swap_info[type] and read NULL from swap_info[type].

Ying Huang noticed other places in swapfile.c don't order these reads
properly.  Introduce swap_type_to_swap_info to encourage correct usage.

Use READ_ONCE and WRITE_ONCE to follow the Linux Kernel Memory Model
(see tools/memory-model/Documentation/explanation.txt).

This ordering need not be enforced in places where swap_lock is held
(e.g. si_swapinfo) because swap_lock serializes updates to nr_swapfiles
and the swap_info array.

Fixes: ec8acf20afb8 ("swap: add per-partition lock for swapfile")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
---

v1 -> v2
 - Rebased to latest mainline per Andrew
 - Added tags from Andrea and Peter (thanks)
 - Fixed loop conditions as Peter suggested and converted a few more
   places to use swap_type_to_swap_info; retained tags since changes
   were minor
 - Added a Suggested-by for Ying
 - Amended changelog after Dan's comment

 mm/swapfile.c | 51 +++++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 22 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index dbac1d49469d..67f60e051814 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -98,6 +98,15 @@ static atomic_t proc_poll_event = ATOMIC_INIT(0);
 
 atomic_t nr_rotate_swap = ATOMIC_INIT(0);
 
+static struct swap_info_struct *swap_type_to_swap_info(int type)
+{
+	if (type >= READ_ONCE(nr_swapfiles))
+		return NULL;
+
+	smp_rmb();	/* Pairs with smp_wmb in alloc_swap_info. */
+	return READ_ONCE(swap_info[type]);
+}
+
 static inline unsigned char swap_count(unsigned char ent)
 {
 	return ent & ~SWAP_HAS_CACHE;	/* may include COUNT_CONTINUED flag */
@@ -1044,12 +1053,14 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_size)
 /* The only caller of this function is now suspend routine */
 swp_entry_t get_swap_page_of_type(int type)
 {
-	struct swap_info_struct *si;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 	pgoff_t offset;
 
-	si = swap_info[type];
+	if (!si)
+		goto fail;
+
 	spin_lock(&si->lock);
-	if (si && (si->flags & SWP_WRITEOK)) {
+	if (si->flags & SWP_WRITEOK) {
 		atomic_long_dec(&nr_swap_pages);
 		/* This is called for allocating swap entry, not cache */
 		offset = scan_swap_map(si, 1);
@@ -1060,6 +1071,7 @@ swp_entry_t get_swap_page_of_type(int type)
 		atomic_long_inc(&nr_swap_pages);
 	}
 	spin_unlock(&si->lock);
+fail:
 	return (swp_entry_t) {0};
 }
 
@@ -1071,9 +1083,9 @@ static struct swap_info_struct *__swap_info_get(swp_entry_t entry)
 	if (!entry.val)
 		goto out;
 	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swap_type_to_swap_info(type);
+	if (!p)
 		goto bad_nofile;
-	p = swap_info[type];
 	if (!(p->flags & SWP_USED))
 		goto bad_device;
 	offset = swp_offset(entry);
@@ -1697,10 +1709,9 @@ int swap_type_of(dev_t device, sector_t offset, struct block_device **bdev_p)
 sector_t swapdev_block(int type, pgoff_t offset)
 {
 	struct block_device *bdev;
+	struct swap_info_struct *si = swap_type_to_swap_info(type);
 
-	if ((unsigned int)type >= nr_swapfiles)
-		return 0;
-	if (!(swap_info[type]->flags & SWP_WRITEOK))
+	if (!si || !(si->flags & SWP_WRITEOK))
 		return 0;
 	return map_swap_entry(swp_entry(type, offset), &bdev);
 }
@@ -2258,7 +2269,7 @@ static sector_t map_swap_entry(swp_entry_t entry, struct block_device **bdev)
 	struct swap_extent *se;
 	pgoff_t offset;
 
-	sis = swap_info[swp_type(entry)];
+	sis = swp_swap_info(entry);
 	*bdev = sis->bdev;
 
 	offset = swp_offset(entry);
@@ -2700,9 +2711,7 @@ static void *swap_start(struct seq_file *swap, loff_t *pos)
 	if (!l)
 		return SEQ_START_TOKEN;
 
-	for (type = 0; type < nr_swapfiles; type++) {
-		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+	for (type = 0; (si = swap_type_to_swap_info(type)); type++) {
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		if (!--l)
@@ -2722,9 +2731,7 @@ static void *swap_next(struct seq_file *swap, void *v, loff_t *pos)
 	else
 		type = si->type + 1;
 
-	for (; type < nr_swapfiles; type++) {
-		smp_rmb();	/* read nr_swapfiles before swap_info[type] */
-		si = swap_info[type];
+	for (; (si = swap_type_to_swap_info(type)); type++) {
 		if (!(si->flags & SWP_USED) || !si->swap_map)
 			continue;
 		++*pos;
@@ -2831,14 +2838,14 @@ static struct swap_info_struct *alloc_swap_info(void)
 	}
 	if (type >= nr_swapfiles) {
 		p->type = type;
-		swap_info[type] = p;
+		WRITE_ONCE(swap_info[type], p);
 		/*
 		 * Write swap_info[type] before nr_swapfiles, in case a
 		 * racing procfs swap_start() or swap_next() is reading them.
 		 * (We never shrink nr_swapfiles, we never free this entry.)
 		 */
 		smp_wmb();
-		nr_swapfiles++;
+		WRITE_ONCE(nr_swapfiles, nr_swapfiles + 1);
 	} else {
 		kvfree(p);
 		p = swap_info[type];
@@ -3358,7 +3365,7 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 {
 	struct swap_info_struct *p;
 	struct swap_cluster_info *ci;
-	unsigned long offset, type;
+	unsigned long offset;
 	unsigned char count;
 	unsigned char has_cache;
 	int err = -EINVAL;
@@ -3366,10 +3373,10 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage)
 	if (non_swap_entry(entry))
 		goto out;
 
-	type = swp_type(entry);
-	if (type >= nr_swapfiles)
+	p = swp_swap_info(entry);
+	if (!p)
 		goto bad_file;
-	p = swap_info[type];
+
 	offset = swp_offset(entry);
 	if (unlikely(offset >= p->max))
 		goto out;
@@ -3466,7 +3473,7 @@ int swapcache_prepare(swp_entry_t entry)
 
 struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
-	return swap_info[swp_type(entry)];
+	return swap_type_to_swap_info(swp_type(entry));
 }
 
 struct swap_info_struct *page_swap_info(struct page *page)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL d
  2019-01-30  6:26           ` Andrew Morton
@ 2019-01-31  2:48             ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-01-31  2:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

Andrew Morton <akpm@linux-foundation.org> writes:
> mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> stuck so can you please redo this against mainline?

Allow me to be off topic, this patch has been in mm tree for quite some
time, what can I do to help this be merged upstream?

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 55+ messages in thread

* About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-01-31  2:48             ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-01-31  2:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

Andrew Morton <akpm@linux-foundation.org> writes:
> mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> stuck so can you please redo this against mainline?

Allow me to be off topic, this patch has been in mm tree for quite some
time, what can I do to help this be merged upstream?

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-01-31  2:48             ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
@ 2019-01-31 20:46               ` Andrew Morton
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2019-01-31 20:46 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > stuck so can you please redo this against mainline?
> 
> Allow me to be off topic, this patch has been in mm tree for quite some
> time, what can I do to help this be merged upstream?

I have no evidence that it has been reviewed, for a start.  I've asked
Hugh to look at it.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-01-31 20:46               ` Andrew Morton
  0 siblings, 0 replies; 55+ messages in thread
From: Andrew Morton @ 2019-01-31 20:46 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > stuck so can you please redo this against mainline?
> 
> Allow me to be off topic, this patch has been in mm tree for quite some
> time, what can I do to help this be merged upstream?

I have no evidence that it has been reviewed, for a start.  I've asked
Hugh to look at it.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-01-31 20:46               ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Andrew Morton
@ 2019-02-02  7:14                 ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-02  7:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>> > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > stuck so can you please redo this against mainline?
>> 
>> Allow me to be off topic, this patch has been in mm tree for quite some
>> time, what can I do to help this be merged upstream?
>
> I have no evidence that it has been reviewed, for a start.  I've asked
> Hugh to look at it.

Got it!  I will try to find some people to review it.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-02  7:14                 ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-02  7:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Daniel Jordan, dan.carpenter, andrea.parri, shli, dave.hansen,
	sfr, osandov, tj, ak, linux-mm, kernel-janitors, paulmck, stern,
	peterz, will.deacon

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>> > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > stuck so can you please redo this against mainline?
>> 
>> Allow me to be off topic, this patch has been in mm tree for quite some
>> time, what can I do to help this be merged upstream?
>
> I have no evidence that it has been reviewed, for a start.  I've asked
> Hugh to look at it.

Got it!  I will try to find some people to review it.

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-01-31 20:46               ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Andrew Morton
@ 2019-02-04 21:37                 ` Hugh Dickins
  -1 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2019-02-04 21:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang, Ying, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon, hughd

On Thu, 31 Jan 2019, Andrew Morton wrote:
> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > Andrew Morton <akpm@linux-foundation.org> writes:
> > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > stuck so can you please redo this against mainline?
> > 
> > Allow me to be off topic, this patch has been in mm tree for quite some
> > time, what can I do to help this be merged upstream?

Wow, yes, it's about a year old.

> 
> I have no evidence that it has been reviewed, for a start.  I've asked
> Hugh to look at it.

I tried at the weekend.  Usual story: I don't like it at all, the
ever-increasing complexity there, but certainly understand the need
for that fix, and have not managed to think up anything better -
and now I need to switch away, sorry.

The multiple dynamically allocated and freed swapper address spaces
have indeed broken what used to make it safe.  If those imaginary
address spaces did not have to be virtually contiguous, I'd say
cache them and reuse them, instead of freeing.  But I don't see
how to do that as it stands.

find_get_page(swapper_address_space(entry), swp_offset(entry)) has
become an unsafe construct, where it used to be safe against corrupted
page tables.  Maybe we don't care so much about crashing on corrupted
page tables nowadays (I haven't heard recent complaints), and I think
Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
happen to be the only instances that need to be guarded against swapoff
(the others are working with page table locked).

The array of arrays of swapper spaces is all just to get a separate
lock for separate extents of the swapfile: I wonder whether Matthew has
anything in mind for that in XArray (I think Peter once got it working
in radix-tree, but the overhead not so good).

(I was originally horrified by the stop_machine() added in swapon and
swapoff, but perhaps I'm remembering a distant past of really stopping
the machine: stop_machine() today looked reasonable, something to avoid
generally like lru_add_drain_all(), but not as shameful as I thought.)

Hugh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-04 21:37                 ` Hugh Dickins
  0 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2019-02-04 21:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang, Ying, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon, hughd

On Thu, 31 Jan 2019, Andrew Morton wrote:
> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > Andrew Morton <akpm@linux-foundation.org> writes:
> > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > stuck so can you please redo this against mainline?
> > 
> > Allow me to be off topic, this patch has been in mm tree for quite some
> > time, what can I do to help this be merged upstream?

Wow, yes, it's about a year old.

> 
> I have no evidence that it has been reviewed, for a start.  I've asked
> Hugh to look at it.

I tried at the weekend.  Usual story: I don't like it at all, the
ever-increasing complexity there, but certainly understand the need
for that fix, and have not managed to think up anything better -
and now I need to switch away, sorry.

The multiple dynamically allocated and freed swapper address spaces
have indeed broken what used to make it safe.  If those imaginary
address spaces did not have to be virtually contiguous, I'd say
cache them and reuse them, instead of freeing.  But I don't see
how to do that as it stands.

find_get_page(swapper_address_space(entry), swp_offset(entry)) has
become an unsafe construct, where it used to be safe against corrupted
page tables.  Maybe we don't care so much about crashing on corrupted
page tables nowadays (I haven't heard recent complaints), and I think
Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
happen to be the only instances that need to be guarded against swapoff
(the others are working with page table locked).

The array of arrays of swapper spaces is all just to get a separate
lock for separate extents of the swapfile: I wonder whether Matthew has
anything in mind for that in XArray (I think Peter once got it working
in radix-tree, but the overhead not so good).

(I was originally horrified by the stop_machine() added in swapon and
swapoff, but perhaps I'm remembering a distant past of really stopping
the machine: stop_machine() today looked reasonable, something to avoid
generally like lru_add_drain_all(), but not as shameful as I thought.)

Hugh


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-04 21:37                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
@ 2019-02-04 22:26                   ` Matthew Wilcox
  -1 siblings, 0 replies; 55+ messages in thread
From: Matthew Wilcox @ 2019-02-04 22:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Huang, Ying, Daniel Jordan, dan.carpenter,
	andrea.parri, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, peterz, will.deacon

On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
> On Thu, 31 Jan 2019, Andrew Morton wrote:
> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > > Andrew Morton <akpm@linux-foundation.org> writes:
> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > > stuck so can you please redo this against mainline?
> > 
> > I have no evidence that it has been reviewed, for a start.  I've asked
> > Hugh to look at it.
> 
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.
> 
> The multiple dynamically allocated and freed swapper address spaces
> have indeed broken what used to make it safe.  If those imaginary
> address spaces did not have to be virtually contiguous, I'd say
> cache them and reuse them, instead of freeing.  But I don't see
> how to do that as it stands.
> 
> find_get_page(swapper_address_space(entry), swp_offset(entry)) has
> become an unsafe construct, where it used to be safe against corrupted
> page tables.  Maybe we don't care so much about crashing on corrupted
> page tables nowadays (I haven't heard recent complaints), and I think
> Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
> happen to be the only instances that need to be guarded against swapoff
> (the others are working with page table locked).
> 
> The array of arrays of swapper spaces is all just to get a separate
> lock for separate extents of the swapfile: I wonder whether Matthew has
> anything in mind for that in XArray (I think Peter once got it working
> in radix-tree, but the overhead not so good).

Hi Hugh, thanks for putting me on the cc.

I've certainly noticed what's been going on with the swapper code, but
I've generally had a lack of tuits (round or otherwise) to really dig in
and figure out what's going on.  I've had some ideas about embedding a
spinlock in each leaf node (giving one lock per 64 slots), but I know I've
got about 800 things that I've actually promised to do ahead of looking
at doing that.

I have a suspicion that the swapper code could probably be replaced with
an allocating XArray (like the IDR) and it doesn't really need to be a
full on address_space, but I'm probably wrong because I haven't studied
the swap code in depth.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-04 22:26                   ` Matthew Wilcox
  0 siblings, 0 replies; 55+ messages in thread
From: Matthew Wilcox @ 2019-02-04 22:26 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Huang, Ying, Daniel Jordan, dan.carpenter,
	andrea.parri, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, peterz, will.deacon

On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
> On Thu, 31 Jan 2019, Andrew Morton wrote:
> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > > Andrew Morton <akpm@linux-foundation.org> writes:
> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > > stuck so can you please redo this against mainline?
> > 
> > I have no evidence that it has been reviewed, for a start.  I've asked
> > Hugh to look at it.
> 
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.
> 
> The multiple dynamically allocated and freed swapper address spaces
> have indeed broken what used to make it safe.  If those imaginary
> address spaces did not have to be virtually contiguous, I'd say
> cache them and reuse them, instead of freeing.  But I don't see
> how to do that as it stands.
> 
> find_get_page(swapper_address_space(entry), swp_offset(entry)) has
> become an unsafe construct, where it used to be safe against corrupted
> page tables.  Maybe we don't care so much about crashing on corrupted
> page tables nowadays (I haven't heard recent complaints), and I think
> Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
> happen to be the only instances that need to be guarded against swapoff
> (the others are working with page table locked).
> 
> The array of arrays of swapper spaces is all just to get a separate
> lock for separate extents of the swapfile: I wonder whether Matthew has
> anything in mind for that in XArray (I think Peter once got it working
> in radix-tree, but the overhead not so good).

Hi Hugh, thanks for putting me on the cc.

I've certainly noticed what's been going on with the swapper code, but
I've generally had a lack of tuits (round or otherwise) to really dig in
and figure out what's going on.  I've had some ideas about embedding a
spinlock in each leaf node (giving one lock per 64 slots), but I know I've
got about 800 things that I've actually promised to do ahead of looking
at doing that.

I have a suspicion that the swapper code could probably be replaced with
an allocating XArray (like the IDR) and it doesn't really need to be a
full on address_space, but I'm probably wrong because I haven't studied
the swap code in depth.


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-04 21:37                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
@ 2019-02-06  0:14                   ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-06  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hi, Hugh,

Hugh Dickins <hughd@google.com> writes:

> On Thu, 31 Jan 2019, Andrew Morton wrote:
>> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>> > Andrew Morton <akpm@linux-foundation.org> writes:
>> > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > > stuck so can you please redo this against mainline?
>> > 
>> > Allow me to be off topic, this patch has been in mm tree for quite some
>> > time, what can I do to help this be merged upstream?
>
> Wow, yes, it's about a year old.
>
>> 
>> I have no evidence that it has been reviewed, for a start.  I've asked
>> Hugh to look at it.
>
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.
>
> The multiple dynamically allocated and freed swapper address spaces
> have indeed broken what used to make it safe.  If those imaginary
> address spaces did not have to be virtually contiguous, I'd say
> cache them and reuse them, instead of freeing.  But I don't see
> how to do that as it stands.
>
> find_get_page(swapper_address_space(entry), swp_offset(entry)) has
> become an unsafe construct, where it used to be safe against corrupted
> page tables.  Maybe we don't care so much about crashing on corrupted
> page tables nowadays (I haven't heard recent complaints), and I think
> Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
> happen to be the only instances that need to be guarded against swapoff
> (the others are working with page table locked).
>
> The array of arrays of swapper spaces is all just to get a separate
> lock for separate extents of the swapfile: I wonder whether Matthew has
> anything in mind for that in XArray (I think Peter once got it working
> in radix-tree, but the overhead not so good).
>
> (I was originally horrified by the stop_machine() added in swapon and
> swapoff, but perhaps I'm remembering a distant past of really stopping
> the machine: stop_machine() today looked reasonable, something to avoid
> generally like lru_add_drain_all(), but not as shameful as I thought.)

Thanks a lot for your review and comments!

It appears that you have no strong objection for this patch?  Could I
have your "Acked-by"?

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-06  0:14                   ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-06  0:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hi, Hugh,

Hugh Dickins <hughd@google.com> writes:

> On Thu, 31 Jan 2019, Andrew Morton wrote:
>> On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>> > Andrew Morton <akpm@linux-foundation.org> writes:
>> > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > > stuck so can you please redo this against mainline?
>> > 
>> > Allow me to be off topic, this patch has been in mm tree for quite some
>> > time, what can I do to help this be merged upstream?
>
> Wow, yes, it's about a year old.
>
>> 
>> I have no evidence that it has been reviewed, for a start.  I've asked
>> Hugh to look at it.
>
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.
>
> The multiple dynamically allocated and freed swapper address spaces
> have indeed broken what used to make it safe.  If those imaginary
> address spaces did not have to be virtually contiguous, I'd say
> cache them and reuse them, instead of freeing.  But I don't see
> how to do that as it stands.
>
> find_get_page(swapper_address_space(entry), swp_offset(entry)) has
> become an unsafe construct, where it used to be safe against corrupted
> page tables.  Maybe we don't care so much about crashing on corrupted
> page tables nowadays (I haven't heard recent complaints), and I think
> Huang is correct that lookup_swap_cache() and __read_swap_cache_async()
> happen to be the only instances that need to be guarded against swapoff
> (the others are working with page table locked).
>
> The array of arrays of swapper spaces is all just to get a separate
> lock for separate extents of the swapfile: I wonder whether Matthew has
> anything in mind for that in XArray (I think Peter once got it working
> in radix-tree, but the overhead not so good).
>
> (I was originally horrified by the stop_machine() added in swapon and
> swapoff, but perhaps I'm remembering a distant past of really stopping
> the machine: stop_machine() today looked reasonable, something to avoid
> generally like lru_add_drain_all(), but not as shameful as I thought.)

Thanks a lot for your review and comments!

It appears that you have no strong objection for this patch?  Could I
have your "Acked-by"?

Best Regards,
Huang, Ying


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-06  0:14                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
@ 2019-02-06  0:36                     ` Hugh Dickins
  -1 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2019-02-06  0:36 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Hugh Dickins, Andrew Morton, Daniel Jordan, dan.carpenter,
	andrea.parri, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, peterz, willy, will.deacon

On Wed, 6 Feb 2019, Huang, Ying wrote:
> 
> Thanks a lot for your review and comments!
> 
> It appears that you have no strong objection for this patch?

That much is correct.

> Could I have your "Acked-by"?

Sorry to be so begrudging, but I have to save my Acks for when I feel
more confident in my opinion.  Here I don't think I can get beyond

Not-Nacked-by: Hugh Dickins <hughd@google.com>

I imagine Daniel would ask for some barriers in there: maybe you can
get a more generous response from him when he looks over the result.

Warmly but meanly,
Hugh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-06  0:36                     ` Hugh Dickins
  0 siblings, 0 replies; 55+ messages in thread
From: Hugh Dickins @ 2019-02-06  0:36 UTC (permalink / raw)
  To: Huang, Ying
  Cc: Hugh Dickins, Andrew Morton, Daniel Jordan, dan.carpenter,
	andrea.parri, dave.hansen, sfr, osandov, tj, ak, linux-mm,
	kernel-janitors, paulmck, stern, peterz, willy, will.deacon

On Wed, 6 Feb 2019, Huang, Ying wrote:
> 
> Thanks a lot for your review and comments!
> 
> It appears that you have no strong objection for this patch?

That much is correct.

> Could I have your "Acked-by"?

Sorry to be so begrudging, but I have to save my Acks for when I feel
more confident in my opinion.  Here I don't think I can get beyond

Not-Nacked-by: Hugh Dickins <hughd@google.com>

I imagine Daniel would ask for some barriers in there: maybe you can
get a more generous response from him when he looks over the result.

Warmly but meanly,
Hugh


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-06  0:36                     ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
@ 2019-02-06  0:58                       ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-06  0:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hugh Dickins <hughd@google.com> writes:

> On Wed, 6 Feb 2019, Huang, Ying wrote:
>> 
>> Thanks a lot for your review and comments!
>> 
>> It appears that you have no strong objection for this patch?
>
> That much is correct.
>
>> Could I have your "Acked-by"?
>
> Sorry to be so begrudging, but I have to save my Acks for when I feel
> more confident in my opinion.  Here I don't think I can get beyond
>
> Not-Nacked-by: Hugh Dickins <hughd@google.com>
>
> I imagine Daniel would ask for some barriers in there: maybe you can
> get a more generous response from him when he looks over the result.

Thanks a lot for your help!  Will ask him help too.

Best Regards,
Huang, Ying

> Warmly but meanly,
> Hugh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-06  0:58                       ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-06  0:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Daniel Jordan, dan.carpenter, andrea.parri,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hugh Dickins <hughd@google.com> writes:

> On Wed, 6 Feb 2019, Huang, Ying wrote:
>> 
>> Thanks a lot for your review and comments!
>> 
>> It appears that you have no strong objection for this patch?
>
> That much is correct.
>
>> Could I have your "Acked-by"?
>
> Sorry to be so begrudging, but I have to save my Acks for when I feel
> more confident in my opinion.  Here I don't think I can get beyond
>
> Not-Nacked-by: Hugh Dickins <hughd@google.com>
>
> I imagine Daniel would ask for some barriers in there: maybe you can
> get a more generous response from him when he looks over the result.

Thanks a lot for your help!  Will ask him help too.

Best Regards,
Huang, Ying

> Warmly but meanly,
> Hugh


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-04 21:37                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
@ 2019-02-08  0:28                   ` Andrea Parri
  -1 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-02-08  0:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Huang, Ying, Daniel Jordan, dan.carpenter,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hi Huang, Ying,

On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
> On Thu, 31 Jan 2019, Andrew Morton wrote:
> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > > Andrew Morton <akpm@linux-foundation.org> writes:
> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > > stuck so can you please redo this against mainline?
> > > 
> > > Allow me to be off topic, this patch has been in mm tree for quite some
> > > time, what can I do to help this be merged upstream?

[...]


> 
> Wow, yes, it's about a year old.
> 
> > 
> > I have no evidence that it has been reviewed, for a start.  I've asked
> > Hugh to look at it.
> 
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.

FWIW, I do agree with Hugh about "the need for that fix": AFAIU, that
(mainline) code is naively buggy _and_ "this patch":

  http://lkml.kernel.org/r/20180223060010.954-1-ying.huang@intel.com

"redone on top of mainline" seems both correct and appropriate to me.


> (I was originally horrified by the stop_machine() added in swapon and
> swapoff, but perhaps I'm remembering a distant past of really stopping
> the machine: stop_machine() today looked reasonable, something to avoid
> generally like lru_add_drain_all(), but not as shameful as I thought.)

AFAIC_find_on_LKML, we have three different fixes (at least!): resp.,

  1. refcount(-based),
  2. RCU,
  3. stop_machine();

(3) appears to be the less documented/relied-upon/tested among these;
I'm not aware of definitive reasons forcing us to reject (1) and (2).

  Andrea


> 
> Hugh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-08  0:28                   ` Andrea Parri
  0 siblings, 0 replies; 55+ messages in thread
From: Andrea Parri @ 2019-02-08  0:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Huang, Ying, Daniel Jordan, dan.carpenter,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Hi Huang, Ying,

On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
> On Thu, 31 Jan 2019, Andrew Morton wrote:
> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
> > > Andrew Morton <akpm@linux-foundation.org> writes:
> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
> > > > stuck so can you please redo this against mainline?
> > > 
> > > Allow me to be off topic, this patch has been in mm tree for quite some
> > > time, what can I do to help this be merged upstream?

[...]


> 
> Wow, yes, it's about a year old.
> 
> > 
> > I have no evidence that it has been reviewed, for a start.  I've asked
> > Hugh to look at it.
> 
> I tried at the weekend.  Usual story: I don't like it at all, the
> ever-increasing complexity there, but certainly understand the need
> for that fix, and have not managed to think up anything better -
> and now I need to switch away, sorry.

FWIW, I do agree with Hugh about "the need for that fix": AFAIU, that
(mainline) code is naively buggy _and_ "this patch":

  http://lkml.kernel.org/r/20180223060010.954-1-ying.huang@intel.com

"redone on top of mainline" seems both correct and appropriate to me.


> (I was originally horrified by the stop_machine() added in swapon and
> swapoff, but perhaps I'm remembering a distant past of really stopping
> the machine: stop_machine() today looked reasonable, something to avoid
> generally like lru_add_drain_all(), but not as shameful as I thought.)

AFAIC_find_on_LKML, we have three different fixes (at least!): resp.,

  1. refcount(-based),
  2. RCU,
  3. stop_machine();

(3) appears to be the less documented/relied-upon/tested among these;
I'm not aware of definitive reasons forcing us to reject (1) and (2).

  Andrea


> 
> Hugh


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU
  2019-02-08  0:28                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Andrea Parri
@ 2019-02-11  1:02                     ` Huang, Ying
  -1 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-11  1:02 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Hugh Dickins, Andrew Morton, Daniel Jordan, dan.carpenter,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Andrea Parri <andrea.parri@amarulasolutions.com> writes:

> Hi Huang, Ying,
>
> On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
>> On Thu, 31 Jan 2019, Andrew Morton wrote:
>> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>> > > Andrew Morton <akpm@linux-foundation.org> writes:
>> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > > > stuck so can you please redo this against mainline?
>> > > 
>> > > Allow me to be off topic, this patch has been in mm tree for quite some
>> > > time, what can I do to help this be merged upstream?
>
> [...]
>
>
>> 
>> Wow, yes, it's about a year old.
>> 
>> > 
>> > I have no evidence that it has been reviewed, for a start.  I've asked
>> > Hugh to look at it.
>> 
>> I tried at the weekend.  Usual story: I don't like it at all, the
>> ever-increasing complexity there, but certainly understand the need
>> for that fix, and have not managed to think up anything better -
>> and now I need to switch away, sorry.
>
> FWIW, I do agree with Hugh about "the need for that fix": AFAIU, that
> (mainline) code is naively buggy _and_ "this patch":
>
>   http://lkml.kernel.org/r/20180223060010.954-1-ying.huang@intel.com
>
> "redone on top of mainline" seems both correct and appropriate to me.

Thanks!  Because the patch needs to go through -mm tree, so I will
rebase the patch on top of the head of -mm tree.
>
>> (I was originally horrified by the stop_machine() added in swapon and
>> swapoff, but perhaps I'm remembering a distant past of really stopping
>> the machine: stop_machine() today looked reasonable, something to avoid
>> generally like lru_add_drain_all(), but not as shameful as I thought.)
>
> AFAIC_find_on_LKML, we have three different fixes (at least!): resp.,
>
>   1. refcount(-based),
>   2. RCU,
>   3. stop_machine();
>
> (3) appears to be the less documented/relied-upon/tested among these;
> I'm not aware of definitive reasons forcing us to reject (1) and (2).

Because swapoff() is a really cold path, while page fault handler is a
really hot path.  (3) can minimize the overhead of the hot path.

Best Regards,
Huang, Ying

>   Andrea
>
>
>> 
>> Hugh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: About swapoff race patch  (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs)
@ 2019-02-11  1:02                     ` Huang, Ying
  0 siblings, 0 replies; 55+ messages in thread
From: Huang, Ying @ 2019-02-11  1:02 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Hugh Dickins, Andrew Morton, Daniel Jordan, dan.carpenter,
	dave.hansen, sfr, osandov, tj, ak, linux-mm, kernel-janitors,
	paulmck, stern, peterz, willy, will.deacon

Andrea Parri <andrea.parri@amarulasolutions.com> writes:

> Hi Huang, Ying,
>
> On Mon, Feb 04, 2019 at 01:37:00PM -0800, Hugh Dickins wrote:
>> On Thu, 31 Jan 2019, Andrew Morton wrote:
>> > On Thu, 31 Jan 2019 10:48:29 +0800 "Huang\, Ying" <ying.huang@intel.com> wrote:
>> > > Andrew Morton <akpm@linux-foundation.org> writes:
>> > > > mm-swap-fix-race-between-swapoff-and-some-swap-operations.patch is very
>> > > > stuck so can you please redo this against mainline?
>> > > 
>> > > Allow me to be off topic, this patch has been in mm tree for quite some
>> > > time, what can I do to help this be merged upstream?
>
> [...]
>
>
>> 
>> Wow, yes, it's about a year old.
>> 
>> > 
>> > I have no evidence that it has been reviewed, for a start.  I've asked
>> > Hugh to look at it.
>> 
>> I tried at the weekend.  Usual story: I don't like it at all, the
>> ever-increasing complexity there, but certainly understand the need
>> for that fix, and have not managed to think up anything better -
>> and now I need to switch away, sorry.
>
> FWIW, I do agree with Hugh about "the need for that fix": AFAIU, that
> (mainline) code is naively buggy _and_ "this patch":
>
>   http://lkml.kernel.org/r/20180223060010.954-1-ying.huang@intel.com
>
> "redone on top of mainline" seems both correct and appropriate to me.

Thanks!  Because the patch needs to go through -mm tree, so I will
rebase the patch on top of the head of -mm tree.
>
>> (I was originally horrified by the stop_machine() added in swapon and
>> swapoff, but perhaps I'm remembering a distant past of really stopping
>> the machine: stop_machine() today looked reasonable, something to avoid
>> generally like lru_add_drain_all(), but not as shameful as I thought.)
>
> AFAIC_find_on_LKML, we have three different fixes (at least!): resp.,
>
>   1. refcount(-based),
>   2. RCU,
>   3. stop_machine();
>
> (3) appears to be the less documented/relied-upon/tested among these;
> I'm not aware of definitive reasons forcing us to reject (1) and (2).

Because swapoff() is a really cold path, while page fault handler is a
really hot path.  (3) can minimize the overhead of the hot path.

Best Regards,
Huang, Ying

>   Andrea
>
>
>> 
>> Hugh


^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2019-02-11  1:02 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11  9:59 [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type() Dan Carpenter
2019-01-11  9:59 ` Dan Carpenter
2019-01-11 17:41 ` Daniel Jordan
2019-01-11 17:41   ` Daniel Jordan
2019-01-11 23:20   ` Andrea Parri
2019-01-11 23:20     ` Andrea Parri
2019-01-14 22:25     ` Daniel Jordan
2019-01-14 22:25       ` Daniel Jordan
2019-01-15  0:23       ` [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs Daniel Jordan
2019-01-15  0:23         ` Daniel Jordan
2019-01-15  1:17         ` Andrea Parri
2019-01-15  1:17           ` Andrea Parri
2019-01-30  6:26         ` Andrew Morton
2019-01-30  6:26           ` Andrew Morton
2019-01-31  1:52           ` Daniel Jordan
2019-01-31  1:52             ` Daniel Jordan
2019-01-31  2:44             ` [PATCH v2] mm, swap: bounds check swap_info array " Daniel Jordan
2019-01-31  2:44               ` Daniel Jordan
2019-01-31  2:48           ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL d Huang, Ying
2019-01-31  2:48             ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
2019-01-31 20:46             ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Andrew Morton
2019-01-31 20:46               ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Andrew Morton
2019-02-02  7:14               ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Huang, Ying
2019-02-02  7:14                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
2019-02-04 21:37               ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Hugh Dickins
2019-02-04 21:37                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
2019-02-04 22:26                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Matthew Wilcox
2019-02-04 22:26                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Matthew Wilcox
2019-02-06  0:14                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Huang, Ying
2019-02-06  0:14                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
2019-02-06  0:36                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Hugh Dickins
2019-02-06  0:36                     ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Hugh Dickins
2019-02-06  0:58                     ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Huang, Ying
2019-02-06  0:58                       ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
2019-02-08  0:28                 ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Andrea Parri
2019-02-08  0:28                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Andrea Parri
2019-02-11  1:02                   ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NU Huang, Ying
2019-02-11  1:02                     ` About swapoff race patch (was Re: [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs) Huang, Ying
2019-01-30  7:28         ` [PATCH] mm, swap: bounds check swap_info accesses to avoid NULL derefs Dan Carpenter
2019-01-30  7:28           ` Dan Carpenter
2019-01-31  1:55           ` Daniel Jordan
2019-01-31  1:55             ` Daniel Jordan
2019-01-30  9:13         ` Peter Zijlstra
2019-01-30  9:13           ` Peter Zijlstra
2019-01-31  2:00           ` Daniel Jordan
2019-01-31  2:00             ` Daniel Jordan
2019-01-15  0:28       ` [PATCH] mm, swap: Potential NULL dereference in get_swap_page_of_type() Andrea Parri
2019-01-15  0:28         ` Andrea Parri
2019-01-14  2:12   ` Huang, Ying
2019-01-14  2:12     ` Huang, Ying
2019-01-14  2:12     ` Huang, Ying
2019-01-14  8:43   ` Dan Carpenter
2019-01-14  8:43     ` Dan Carpenter
2019-01-14 23:40     ` Daniel Jordan
2019-01-14 23:40       ` Daniel Jordan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.