LKML Archive on lore.kernel.org
* [PATCH] autofs: don't fail mount for transient error
@ 2017-11-03  1:40 NeilBrown
  2017-11-03 12:45 ` Ian Kent
  2017-12-05 22:21 ` [PATCH] autofs: fix careless error in recent commit NeilBrown
  0 siblings, 2 replies; 3+ messages in thread
From: NeilBrown @ 2017-11-03  1:40 UTC
  To: Ian Kent, Andrew Morton; +Cc: lkml, linux-fsdevel



Currently if the autofs kernel module gets an error when
writing to the pipe which links to the daemon, then it
marks the whole mountpoint as catatonic, and it will stop working.

It is possible that the error is transient.  This can happen
if the daemon is slow and more than 16 requests queue up.
If a subsequent process tries to queue a request, and is then signalled,
the write to the pipe will return -ERESTARTSYS and autofs
will take that as total failure.

So change the code to treat -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole
mountpoint.

Signed-off-by: NeilBrown <neilb@suse.com>
---

Do people think this should go to -stable?
It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.

Thanks,
NeilBrown


 fs/autofs4/waitq.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 4ac49d038bf3..8fc41705c7cd 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
 		spin_unlock_irqrestore(&current->sighand->siglock, flags);
 	}
 
-	return (bytes > 0);
+	/* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
+	return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
 }
 
 static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
@@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 	} pkt;
 	struct file *pipe = NULL;
 	size_t pktsz;
+	int ret;
 
 	pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
 		 (unsigned long) wq->wait_queue_token,
@@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 	mutex_unlock(&sbi->wq_mutex);
 
 	if (autofs4_write(sbi, pipe, &pkt, pktsz))
+	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
+	case 0:
+		break;
+	case -ENOMEM:
+	case -ERESTARTSYS:
+		/* Just fail this one */
+		autofs4_wait_release(sbi, wq->wait_queue_token, ret);
+		break;
+	default:
 		autofs4_catatonic_mode(sbi);
+		break;
+	}
 	fput(pipe);
 }
 
-- 
2.14.0.rc0.dirty




* Re: [PATCH] autofs: don't fail mount for transient error
  2017-11-03  1:40 [PATCH] autofs: don't fail mount for transient error NeilBrown
@ 2017-11-03 12:45 ` Ian Kent
  2017-12-05 22:21 ` [PATCH] autofs: fix careless error in recent commit NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: Ian Kent @ 2017-11-03 12:45 UTC
  To: NeilBrown, Andrew Morton; +Cc: lkml, linux-fsdevel

On 03/11/17 09:40, NeilBrown wrote:
> 

Hi Neil, and thanks for taking the time to post the patch.

> Currently if the autofs kernel module gets an error when
> writing to the pipe which links to the daemon, then it
> marks the whole mountpoint as catatonic, and it will stop working.
> 
> It is possible that the error is transient.  This can happen
> if the daemon is slow and more than 16 requests queue up.
> If a subsequent process tries to queue a request, and is then signalled,
> the write to the pipe will return -ERESTARTSYS and autofs
> will take that as total failure.

Indeed it does.

And given the problems with a half dozen (or so) user space
applications consuming large amounts of CPU under heavy mount
and umount activity this could happen more easily than we
expect.

> 
> So change the code to treat -ERESTARTSYS and -ENOMEM as transient
> failures which only abort the current request, not the whole
> mountpoint.

This looks good to me.

> 
> Signed-off-by: NeilBrown <neilb@suse.com>
> ---
> 
> Do people think this should go to -stable?
> It isn't a crash or a data corruption, but having autofs mountpoints
> suddenly stop working is rather inconvenient.

Perhaps that's a good idea given the CPU usage problem I refer
to above has been around for a while now.

> 
> Thanks,
> NeilBrown
> 
> 
>  fs/autofs4/waitq.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index 4ac49d038bf3..8fc41705c7cd 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_sb_info *sbi,
>  		spin_unlock_irqrestore(&current->sighand->siglock, flags);
>  	}
>  
> -	return (bytes > 0);
> +	/* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
> +	return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
>  }
>  
>  static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
> @@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	} pkt;
>  	struct file *pipe = NULL;
>  	size_t pktsz;
> +	int ret;
>  
>  	pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
>  		 (unsigned long) wq->wait_queue_token,
> @@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>  	mutex_unlock(&sbi->wq_mutex);
>  
>  	if (autofs4_write(sbi, pipe, &pkt, pktsz))
> +	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
> +	case 0:
> +		break;
> +	case -ENOMEM:
> +	case -ERESTARTSYS:
> +		/* Just fail this one */
> +		autofs4_wait_release(sbi, wq->wait_queue_token, ret);
> +		break;
> +	default:
>  		autofs4_catatonic_mode(sbi);
> +		break;
> +	}
>  	fput(pipe);
>  }
>  
> 


* [PATCH] autofs: fix careless error in recent commit.
  2017-11-03  1:40 [PATCH] autofs: don't fail mount for transient error NeilBrown
  2017-11-03 12:45 ` Ian Kent
@ 2017-12-05 22:21 ` NeilBrown
  1 sibling, 0 replies; 3+ messages in thread
From: NeilBrown @ 2017-12-05 22:21 UTC
  To: Ian Kent, Andrew Morton; +Cc: lkml, linux-fsdevel



Commit ecc0c469f277 was meant to replace an 'if' with
a 'switch', but instead added the 'switch' leaving
the 'if' in place.

Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: stable@vger.kernel.org
Fixes: ecc0c469f277 ("autofs: don't fail mount for transient error")
Signed-off-by: NeilBrown <neilb@suse.com>
---
 fs/autofs4/waitq.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 8fc41705c7cd..961a12dc6dc8 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -170,7 +170,6 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 
 	mutex_unlock(&sbi->wq_mutex);
 
-	if (autofs4_write(sbi, pipe, &pkt, pktsz))
 	switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
 	case 0:
 		break;
-- 
2.14.0.rc0.dirty



