All of lore.kernel.org
 help / color / mirror / Atom feed
* A special case in write_flush which cause the umount busy
@ 2018-02-11  6:55 Wang, Alan 1. (NSB - CN/Hangzhou)
  2018-02-13 19:57 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Wang, Alan 1. (NSB - CN/Hangzhou) @ 2018-02-11  6:55 UTC (permalink / raw)
  To: linux-nfs; +Cc: neilb

Hi,

I have a test case on mount/umount on a partition from nfs server side. And=
 encounter a problem of umount busy in a low probability.

The Linux version is 3.10.64 with the patch "sunrpc/cache: make cache flush=
ing more reliable".
https://patchwork.kernel.org/patch/7410021/

After some analysis and test in many times, I find that when it failed to m=
ount, the time "then" and "now" are different, which caused the last_refres=
h is far beyond the flush_time. So this cache is not expired and won't be c=
lean at once.=20
Then the ref in cache_head won't be released, and mntput_no_expire didn't b=
e called to decrease the count. That caused the umount busy.

Below are logs in my test.

kernel: [  292.767801] write_flush 1480 then =3D 249, now =3D 250
kernel: [  292.767817] cache_clean 451, cd name nfsd.fh expiry_time 7904852=
53, cd flush_time 249, last_refresh 369, seconds_since_boot 250
kernel: [  292.767913] write_flush 1480 then =3D 249, now =3D 250
kernel: [  292.767928] cache_clean 451, cd name nfsd.export expiry_time 204=
9, cd flush_time 249, last_refresh 369, seconds_since_boot 250
kernel: [  292.773229] do_refcount_check 283 mycount 4
kernel: [  292.773245] do_umount 1344 retval -16

I think this happens in such case that the exportfs writes the flush with c=
urrent time, the time of "then". But when seconds_since_boot being called i=
n function write_flush, the time is on the next second, so the "now" is one=
 second after "then".
Because "then" is less than "now", the flush_time is set directly to origin=
al "then", rather than "cd->flush_time + 1".


And I want to change the condition as below. I'm not sure it's OK and has n=
o effects to other part.

--------------------------------------------------------------------------
       then =3D get_expiry(&bp);
       now =3D seconds_since_boot();
       cd->nextcheck =3D now;
       /* Can only set flush_time to 1 second beyond "now", or
       * possibly 1 second beyond flushtime.  This is because
       * flush_time never goes backwards so it mustn't get too far
       * ahead of time.
       */
       if (then !=3D now)
              printk("%s %d then =3D %d, now =3D %d\n", __func__, __LINE__,=
 then, now);
-      if (then >=3D now) {
+     if (then >=3D now - 1) {
              /* Want to flush everything, so behave like cache_purge() */
              if (cd->flush_time >=3D now)
                     now =3D cd->flush_time + 1;
              then =3D now;
       }

       cd->flush_time =3D then;
--------------------------------------------------------------------------

Best Regards,
Alan Wang


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: A special case in write_flush which cause the umount busy
  2018-02-11  6:55 A special case in write_flush which cause the umount busy Wang, Alan 1. (NSB - CN/Hangzhou)
@ 2018-02-13 19:57 ` NeilBrown
  2018-02-13 21:03   ` J. Bruce Fields
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2018-02-13 19:57 UTC (permalink / raw)
  To: Wang, Alan 1. (NSB - CN/Hangzhou), linux-nfs, J. Bruce Fields

[-- Attachment #1: Type: text/plain, Size: 4649 bytes --]

On Sun, Feb 11 2018, Wang, Alan 1. (NSB - CN/Hangzhou) wrote:

> Hi,
>
> I have a test case on mount/umount on a partition from nfs server side. And encounter a problem of umount busy in a low probability.
>
> The Linux version is 3.10.64 with the patch "sunrpc/cache: make cache flushing more reliable".
> https://patchwork.kernel.org/patch/7410021/
>
> After some analysis and test in many times, I find that when it failed to mount, the time "then" and "now" are different, which caused the last_refresh is far beyond the flush_time. So this cache is not expired and won't be clean at once. 
> Then the ref in cache_head won't be released, and mntput_no_expire didn't be called to decrease the count. That caused the umount busy.
>
> Below are logs in my test.
>
> kernel: [  292.767801] write_flush 1480 then = 249, now = 250
> kernel: [  292.767817] cache_clean 451, cd name nfsd.fh expiry_time 790485253, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [  292.767913] write_flush 1480 then = 249, now = 250
> kernel: [  292.767928] cache_clean 451, cd name nfsd.export expiry_time 2049, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [  292.773229] do_refcount_check 283 mycount 4
> kernel: [  292.773245] do_umount 1344 retval -16
>
> I think this happens in such case that the exportfs writes the flush with current time, the time of "then". But when seconds_since_boot being called in function write_flush, the time is on the next second, so the "now" is one second after "then".
> Because "then" is less than "now", the flush_time is set directly to original "then", rather than "cd->flush_time + 1".
>
>
> And I want to change the condition as below. I'm not sure it's OK and has no effects to other part.

I think this change is safe.  It is a bit of a hack, but the "flush"
interface was actually very poorly designed and I don't think it *can*
be implemented cleanly.
Maybe we should just ignore the number written in and *always* flush
the cache completely.  That is was it always wanted.

See below.

Bruce: do you have any thoughts on this?

Thanks,
NeilBrown

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index aa36dad32db1..43f117d1547e 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1450,7 +1450,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 			   struct cache_detail *cd)
 {
 	char tbuf[20];
-	char *bp, *ep;
+	char *ep;
 	time_t then, now;
 
 	if (*ppos || count > sizeof(tbuf)-1)
@@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 	simple_strtoul(tbuf, &ep, 0);
 	if (*ep && *ep != '\n')
 		return -EINVAL;
+	/* Note that while we check that 'buf' holds a valid number,
+	 * we always ignore the value and just flush everything.
+	 * Making use of the number leads to races.
+	 */
 
-	bp = tbuf;
-	then = get_expiry(&bp);
 	now = seconds_since_boot();
-	cd->nextcheck = now;
-	/* Can only set flush_time to 1 second beyond "now", or
+	/* Always flush everything, so behave like cache_purge()
+	 * Can only set flush_time to 1 second beyond "now", or
 	 * possibly 1 second beyond flushtime.  This is because
 	 * flush_time never goes backwards so it mustn't get too far
 	 * ahead of time.
 	 */
-	if (then >= now) {
-		/* Want to flush everything, so behave like cache_purge() */
-		if (cd->flush_time >= now)
-			now = cd->flush_time + 1;
-		then = now;
-	}
 
-	cd->flush_time = then;
+	if (cd->flush_time >= now)
+		now = cd->flush_time + 1;
+
+	cd->flush_time = now;
+	cd->nextcheck = now;
 	cache_flush();
 
 	*ppos += count;


>
> --------------------------------------------------------------------------
>        then = get_expiry(&bp);
>        now = seconds_since_boot();
>        cd->nextcheck = now;
>        /* Can only set flush_time to 1 second beyond "now", or
>        * possibly 1 second beyond flushtime.  This is because
>        * flush_time never goes backwards so it mustn't get too far
>        * ahead of time.
>        */
>        if (then != now)
>               printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now);
> -      if (then >= now) {
> +     if (then >= now - 1) {
>               /* Want to flush everything, so behave like cache_purge() */
>               if (cd->flush_time >= now)
>                      now = cd->flush_time + 1;
>               then = now;
>        }
>
>        cd->flush_time = then;
> --------------------------------------------------------------------------
>
> Best Regards,
> Alan Wang

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: A special case in write_flush which cause the umount busy
  2018-02-13 19:57 ` NeilBrown
@ 2018-02-13 21:03   ` J. Bruce Fields
  2018-02-14  1:15     ` [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: J. Bruce Fields @ 2018-02-13 21:03 UTC (permalink / raw)
  To: NeilBrown; +Cc: Wang, Alan 1. (NSB - CN/Hangzhou), linux-nfs

On Wed, Feb 14, 2018 at 06:57:06AM +1100, NeilBrown wrote:
> I think this change is safe.  It is a bit of a hack, but the "flush"
> interface was actually very poorly designed and I don't think it *can*
> be implemented cleanly.
> Maybe we should just ignore the number written in and *always* flush
> the cache completely.  That is was it always wanted.
> 
> See below.
> 
> Bruce: do you have any thoughts on this?

I'm fine with ignoring the number--I agree that the chance anyone is
doing anything fancier is vanishingly small.

--b.

> 
> Thanks,
> NeilBrown
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index aa36dad32db1..43f117d1547e 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1450,7 +1450,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  			   struct cache_detail *cd)
>  {
>  	char tbuf[20];
> -	char *bp, *ep;
> +	char *ep;
>  	time_t then, now;
>  
>  	if (*ppos || count > sizeof(tbuf)-1)
> @@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  	simple_strtoul(tbuf, &ep, 0);
>  	if (*ep && *ep != '\n')
>  		return -EINVAL;
> +	/* Note that while we check that 'buf' holds a valid number,
> +	 * we always ignore the value and just flush everything.
> +	 * Making use of the number leads to races.
> +	 */
>  
> -	bp = tbuf;
> -	then = get_expiry(&bp);
>  	now = seconds_since_boot();
> -	cd->nextcheck = now;
> -	/* Can only set flush_time to 1 second beyond "now", or
> +	/* Always flush everything, so behave like cache_purge()
> +	 * Can only set flush_time to 1 second beyond "now", or
>  	 * possibly 1 second beyond flushtime.  This is because
>  	 * flush_time never goes backwards so it mustn't get too far
>  	 * ahead of time.
>  	 */
> -	if (then >= now) {
> -		/* Want to flush everything, so behave like cache_purge() */
> -		if (cd->flush_time >= now)
> -			now = cd->flush_time + 1;
> -		then = now;
> -	}
>  
> -	cd->flush_time = then;
> +	if (cd->flush_time >= now)
> +		now = cd->flush_time + 1;
> +
> +	cd->flush_time = now;
> +	cd->nextcheck = now;
>  	cache_flush();
>  
>  	*ppos += count;
> 
> 
> >
> > --------------------------------------------------------------------------
> >        then = get_expiry(&bp);
> >        now = seconds_since_boot();
> >        cd->nextcheck = now;
> >        /* Can only set flush_time to 1 second beyond "now", or
> >        * possibly 1 second beyond flushtime.  This is because
> >        * flush_time never goes backwards so it mustn't get too far
> >        * ahead of time.
> >        */
> >        if (then != now)
> >               printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now);
> > -      if (then >= now) {
> > +     if (then >= now - 1) {
> >               /* Want to flush eve



^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file.
  2018-02-13 21:03   ` J. Bruce Fields
@ 2018-02-14  1:15     ` NeilBrown
  2018-02-22  0:46       ` J. Bruce Fields
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2018-02-14  1:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Wang, Alan 1. (NSB - CN/Hangzhou), linux-nfs

[-- Attachment #1: Type: text/plain, Size: 3012 bytes --]


The interface for flushing the sunrpc auth cache was poorly
designed and has caused problems a number of times.

The design is that you write a timestamp, and all entries
created before that time are discarded.
The most obvious problem is that this is not what people
actually want.  They want to just flush the whole cache.
The 1-second granularity can be a problem, as can the use
of wall-clock time.

A current problem is that code will write the current time to
this file - expecting it to clear everything - and if the
seconds number ticks over before this timestamp is checked,
the test "then >= now" fails, and a full flush isn't forced.

So lets just drop the subtleties and always flush the whole
cache.  The worst this could do is impose an extra cost
refilling it, but that would require someone to be using
non-standard tools.

We still report an error if the string written is not a number,
but we cause any valid number to flush the whole cache.

Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com>
Signed-off-by: NeilBrown <neilb@suse.com>
---
 net/sunrpc/cache.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 8a7e1c774f9c..26582e27be6a 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1450,8 +1450,8 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 			   struct cache_detail *cd)
 {
 	char tbuf[20];
-	char *bp, *ep;
-	time_t then, now;
+	char *ep;
+	time_t now;
 
 	if (*ppos || count > sizeof(tbuf)-1)
 		return -EINVAL;
@@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 	simple_strtoul(tbuf, &ep, 0);
 	if (*ep && *ep != '\n')
 		return -EINVAL;
+	/* Note that while we check that 'buf' holds a valid number,
+	 * we always ignore the value and just flush everything.
+	 * Making use of the number leads to races.
+	 */
 
-	bp = tbuf;
-	then = get_expiry(&bp);
 	now = seconds_since_boot();
-	cd->nextcheck = now;
-	/* Can only set flush_time to 1 second beyond "now", or
-	 * possibly 1 second beyond flushtime.  This is because
-	 * flush_time never goes backwards so it mustn't get too far
-	 * ahead of time.
+	/* Always flush everything, so behave like cache_purge()
+	 * Do this by advancing flush_time to the current time,
+	 * or by one second if it has already reached the current time.
+	 * Newly added cache entries will always have ->last_refresh greater
+	 * that ->flush_time, so they don't get flushed prematurely.
 	 */
-	if (then >= now) {
-		/* Want to flush everything, so behave like cache_purge() */
-		if (cd->flush_time >= now)
-			now = cd->flush_time + 1;
-		then = now;
-	}
 
-	cd->flush_time = then;
+	if (cd->flush_time >= now)
+		now = cd->flush_time + 1;
+
+	cd->flush_time = now;
+	cd->nextcheck = now;
 	cache_flush();
 
 	*ppos += count;
-- 
2.14.0.rc0.dirty


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file.
  2018-02-14  1:15     ` [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file NeilBrown
@ 2018-02-22  0:46       ` J. Bruce Fields
  0 siblings, 0 replies; 5+ messages in thread
From: J. Bruce Fields @ 2018-02-22  0:46 UTC (permalink / raw)
  To: NeilBrown; +Cc: Wang, Alan 1. (NSB - CN/Hangzhou), linux-nfs

On Wed, Feb 14, 2018 at 12:15:06PM +1100, NeilBrown wrote:
> 
> The interface for flushing the sunrpc auth cache was poorly
> designed and has caused problems a number of times.
> 
> The design is that you write a timestamp, and all entries
> created before that time are discarded.
> The most obvious problem is that this is not what people
> actually want.  They want to just flush the whole cache.
> The 1-second granularity can be a problem, as can the use
> of wall-clock time.
> 
> A current problem is that code will write the current time to
> this file - expecting it to clear everything - and if the
> seconds number ticks over before this timestamp is checked,
> the test "then >= now" fails, and a full flush isn't forced.
> 
> So lets just drop the subtleties and always flush the whole
> cache.  The worst this could do is impose an extra cost
> refilling it, but that would require someone to be using
> non-standard tools.
> 
> We still report an error if the string written is not a number,
> but we cause any valid number to flush the whole cache.

Thanks, applying for 4.17 (unless someone thinks it's more urgent).

--b.

> 
> Reported-by: "Wang, Alan 1. (NSB - CN/Hangzhou)" <alan.1.wang@nokia-sbell.com>
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
>  net/sunrpc/cache.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 8a7e1c774f9c..26582e27be6a 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -1450,8 +1450,8 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  			   struct cache_detail *cd)
>  {
>  	char tbuf[20];
> -	char *bp, *ep;
> -	time_t then, now;
> +	char *ep;
> +	time_t now;
>  
>  	if (*ppos || count > sizeof(tbuf)-1)
>  		return -EINVAL;
> @@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
>  	simple_strtoul(tbuf, &ep, 0);
>  	if (*ep && *ep != '\n')
>  		return -EINVAL;
> +	/* Note that while we check that 'buf' holds a valid number,
> +	 * we always ignore the value and just flush everything.
> +	 * Making use of the number leads to races.
> +	 */
>  
> -	bp = tbuf;
> -	then = get_expiry(&bp);
>  	now = seconds_since_boot();
> -	cd->nextcheck = now;
> -	/* Can only set flush_time to 1 second beyond "now", or
> -	 * possibly 1 second beyond flushtime.  This is because
> -	 * flush_time never goes backwards so it mustn't get too far
> -	 * ahead of time.
> +	/* Always flush everything, so behave like cache_purge()
> +	 * Do this by advancing flush_time to the current time,
> +	 * or by one second if it has already reached the current time.
> +	 * Newly added cache entries will always have ->last_refresh greater
> +	 * that ->flush_time, so they don't get flushed prematurely.
>  	 */
> -	if (then >= now) {
> -		/* Want to flush everything, so behave like cache_purge() */
> -		if (cd->flush_time >= now)
> -			now = cd->flush_time + 1;
> -		then = now;
> -	}
>  
> -	cd->flush_time = then;
> +	if (cd->flush_time >= now)
> +		now = cd->flush_time + 1;
> +
> +	cd->flush_time = now;
> +	cd->nextcheck = now;
>  	cache_flush();
>  
>  	*ppos += count;
> -- 
> 2.14.0.rc0.dirty
> 



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-02-22  0:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-11  6:55 A special case in write_flush which cause the umount busy Wang, Alan 1. (NSB - CN/Hangzhou)
2018-02-13 19:57 ` NeilBrown
2018-02-13 21:03   ` J. Bruce Fields
2018-02-14  1:15     ` [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file NeilBrown
2018-02-22  0:46       ` J. Bruce Fields

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.