ceph-devel.vger.kernel.org archive mirror
* [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
@ 2020-09-25 14:08 Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-25 14:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

Ilya noticed that he would get spurious EACCES errors on calls done just
after blocklisting the client on mounts with recover_session=clean. The
session would get marked as REJECTED and that caused in-flight calls to
die with EACCES. This patchset seems to smooth over the problem, but I'm
not fully convinced it's the right approach.

The potential issue I see is that the client could take cap references to
do a call on a session that has been blocklisted. We then queue the
message and reestablish the session, but we may not have been granted
the same caps by the MDS at that point.

If this is a problem, then we probably need to rework it so that we
return a distinct error code in this situation and have the upper layers
issue a completely new mds request (with new cap refs, etc.)

Obviously, that's a much more invasive approach though, so it would be
nice to avoid that if this would suffice.
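
For reference, the check that produces the spurious EACCES looks like
this (abridged from the current __do_request() in fs/ceph/mds_client.c;
the diff in patch 4/4 below changes exactly this spot):

	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
	    session->s_state != CEPH_MDS_SESSION_HUNG) {
		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
			err = -EACCES;  /* in-flight calls die here */
			goto out_session;
		}
		/* ... otherwise open or wait for a session ... */
	}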

Jeff Layton (4):
  ceph: don't WARN when removing caps due to blocklisting
  ceph: don't mark mount as SHUTDOWN when recovering session
  ceph: remove timeout on allowing reconnect after blocklisting
  ceph: queue request when CLEANRECOVER is set

 fs/ceph/caps.c       |  2 +-
 fs/ceph/mds_client.c | 10 ++++------
 fs/ceph/super.c      | 13 +++++++++----
 fs/ceph/super.h      |  1 -
 4 files changed, 14 insertions(+), 12 deletions(-)

-- 
2.26.2



* [RFC PATCH 1/4] ceph: don't WARN when removing caps due to blocklisting
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
@ 2020-09-25 14:08 ` Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-25 14:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

We expect to remove dirty caps when the client is blocklisted. Don't
throw a warning in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index c7e69547628e..2ee3f316afcf 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1149,7 +1149,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 	/* remove from inode's cap rbtree, and clear auth cap */
 	rb_erase(&cap->ci_node, &ci->i_caps);
 	if (ci->i_auth_cap == cap) {
-		WARN_ON_ONCE(!list_empty(&ci->i_dirty_item));
+		WARN_ON_ONCE(!list_empty(&ci->i_dirty_item) && !mdsc->fsc->blocklisted);
 		ci->i_auth_cap = NULL;
 	}
 
-- 
2.26.2



* [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
@ 2020-09-25 14:08 ` Jeff Layton
  2020-09-29  8:20   ` Yan, Zheng
  2020-09-25 14:08 ` [RFC PATCH 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2020-09-25 14:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

When recovering a session (à la recover_session=clean), we want to do
all of the operations that we do on a forced umount, but changing the
mount state to SHUTDOWN is wrong and can cause queued MDS requests to
fail when the session comes back.

Only mark it as SHUTDOWN when umount_begin is called.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/super.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 2516304379d3..46a0e4e1b177 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -832,6 +832,13 @@ static void destroy_caches(void)
 	ceph_fscache_unregister();
 }
 
+static void __ceph_umount_begin(struct ceph_fs_client *fsc)
+{
+	ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
+	ceph_mdsc_force_umount(fsc->mdsc);
+	fsc->filp_gen++; // invalidate open files
+}
+
 /*
  * ceph_umount_begin - initiate forced umount.  Tear down the
  * mount, skipping steps that may hang while waiting for server(s).
@@ -844,9 +851,7 @@ static void ceph_umount_begin(struct super_block *sb)
 	if (!fsc)
 		return;
 	fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
-	ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
-	ceph_mdsc_force_umount(fsc->mdsc);
-	fsc->filp_gen++; // invalidate open files
+	__ceph_umount_begin(fsc);
 }
 
 static const struct super_operations ceph_super_ops = {
@@ -1235,7 +1240,7 @@ int ceph_force_reconnect(struct super_block *sb)
 	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
 	int err = 0;
 
-	ceph_umount_begin(sb);
+	__ceph_umount_begin(fsc);
 
 	/* Make sure all page caches get invalidated.
 	 * see remove_session_caps_cb() */
-- 
2.26.2



* [RFC PATCH 3/4] ceph: remove timeout on allowing reconnect after blocklisting
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
@ 2020-09-25 14:08 ` Jeff Layton
  2020-09-25 14:08 ` [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set Jeff Layton
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-25 14:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

30 minutes is a long time to wait, and this makes it difficult to test
the feature by manually blocklisting clients. Remove the timeout
infrastructure and just allow the client to reconnect at will.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 5 -----
 fs/ceph/super.h      | 1 -
 2 files changed, 6 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 08f1c0c31dc2..fd16db6ecb0a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4374,12 +4374,7 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
 	if (!READ_ONCE(fsc->blocklisted))
 		return;
 
-	if (fsc->last_auto_reconnect &&
-	    time_before(jiffies, fsc->last_auto_reconnect + HZ * 60 * 30))
-		return;
-
 	pr_info("auto reconnect after blocklisted\n");
-	fsc->last_auto_reconnect = jiffies;
 	ceph_force_reconnect(fsc->sb);
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 582694899130..cb138e218ab4 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -108,7 +108,6 @@ struct ceph_fs_client {
 
 	unsigned long mount_state;
 
-	unsigned long last_auto_reconnect;
 	bool blocklisted;
 
 	bool have_copy_from2;
-- 
2.26.2



* [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
                   ` (2 preceding siblings ...)
  2020-09-25 14:08 ` [RFC PATCH 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
@ 2020-09-25 14:08 ` Jeff Layton
  2020-09-29  8:31   ` Yan, Zheng
  2020-09-29 19:55   ` Jeff Layton
  2020-09-29  8:28 ` [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Yan, Zheng
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
  5 siblings, 2 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-25 14:08 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

Ilya noticed that the first access to a blacklisted mount would often
get back -EACCES, but then subsequent calls would be OK. The problem is
in __do_request. If the session is marked as REJECTED, a hard error is
returned instead of waiting for a new session to come into being.

When the session is REJECTED and the mount was done with
recover_session=clean, queue the request to the waiting_for_map queue,
which will be awoken after tearing down the old session.

URL: https://tracker.ceph.com/issues/47385
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd16db6ecb0a..b07e7adf146f 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
 	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
 	    session->s_state != CEPH_MDS_SESSION_HUNG) {
 		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
-			err = -EACCES;
+			if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
+				list_add(&req->r_wait, &mdsc->waiting_for_map);
+			else
+				err = -EACCES;
 			goto out_session;
 		}
 		/*
-- 
2.26.2



* Re: [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session
  2020-09-25 14:08 ` [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
@ 2020-09-29  8:20   ` Yan, Zheng
  2020-09-29 12:30     ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Yan, Zheng @ 2020-09-29  8:20 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, Ilya Dryomov, Patrick Donnelly

On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> When recovering a session (à la recover_session=clean), we want to do
> all of the operations that we do on a forced umount, but changing the
> mount state to SHUTDOWN is wrong and can cause queued MDS requests to
> fail when the session comes back.
>

The code that cleans up the page cache checks the SHUTDOWN state.

> Only mark it as SHUTDOWN when umount_begin is called.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/super.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> index 2516304379d3..46a0e4e1b177 100644
> --- a/fs/ceph/super.c
> +++ b/fs/ceph/super.c
> @@ -832,6 +832,13 @@ static void destroy_caches(void)
>         ceph_fscache_unregister();
>  }
>
> +static void __ceph_umount_begin(struct ceph_fs_client *fsc)
> +{
> +       ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
> +       ceph_mdsc_force_umount(fsc->mdsc);
> +       fsc->filp_gen++; // invalidate open files
> +}
> +
>  /*
>   * ceph_umount_begin - initiate forced umount.  Tear down the
>   * mount, skipping steps that may hang while waiting for server(s).
> @@ -844,9 +851,7 @@ static void ceph_umount_begin(struct super_block *sb)
>         if (!fsc)
>                 return;
>         fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
> -       ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
> -       ceph_mdsc_force_umount(fsc->mdsc);
> -       fsc->filp_gen++; // invalidate open files
> +       __ceph_umount_begin(fsc);
>  }
>
>  static const struct super_operations ceph_super_ops = {
> @@ -1235,7 +1240,7 @@ int ceph_force_reconnect(struct super_block *sb)
>         struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
>         int err = 0;
>
> -       ceph_umount_begin(sb);
> +       __ceph_umount_begin(fsc);
>
>         /* Make sure all page caches get invalidated.
>          * see remove_session_caps_cb() */
> --
> 2.26.2
>


* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
                   ` (3 preceding siblings ...)
  2020-09-25 14:08 ` [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set Jeff Layton
@ 2020-09-29  8:28 ` Yan, Zheng
  2020-09-29  8:54   ` Ilya Dryomov
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
  5 siblings, 1 reply; 23+ messages in thread
From: Yan, Zheng @ 2020-09-29  8:28 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, Ilya Dryomov, Patrick Donnelly

On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Ilya noticed that he would get spurious EACCES errors on calls done just
> after blocklisting the client on mounts with recover_session=clean. The
> session would get marked as REJECTED and that caused in-flight calls to
> die with EACCES. This patchset seems to smooth over the problem, but I'm
> not fully convinced it's the right approach.
>

The root cause is that the client does not recover the session instantly
after getting rejected by the MDS. Until the session is recovered, the
client continues to return errors.


> The potential issue I see is that the client could take cap references to
> do a call on a session that has been blocklisted. We then queue the
> message and reestablish the session, but we may not have been granted
> the same caps by the MDS at that point.
>
> If this is a problem, then we probably need to rework it so that we
> return a distinct error code in this situation and have the upper layers
> issue a completely new mds request (with new cap refs, etc.)
>
> Obviously, that's a much more invasive approach though, so it would be
> nice to avoid that if this would suffice.
>
> Jeff Layton (4):
>   ceph: don't WARN when removing caps due to blocklisting
>   ceph: don't mark mount as SHUTDOWN when recovering session
>   ceph: remove timeout on allowing reconnect after blocklisting
>   ceph: queue request when CLEANRECOVER is set
>
>  fs/ceph/caps.c       |  2 +-
>  fs/ceph/mds_client.c | 10 ++++------
>  fs/ceph/super.c      | 13 +++++++++----
>  fs/ceph/super.h      |  1 -
>  4 files changed, 14 insertions(+), 12 deletions(-)
>
> --
> 2.26.2
>


* Re: [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set
  2020-09-25 14:08 ` [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set Jeff Layton
@ 2020-09-29  8:31   ` Yan, Zheng
  2020-09-29 12:46     ` Jeff Layton
  2020-09-29 19:55   ` Jeff Layton
  1 sibling, 1 reply; 23+ messages in thread
From: Yan, Zheng @ 2020-09-29  8:31 UTC (permalink / raw)
  To: Jeff Layton; +Cc: ceph-devel, Ilya Dryomov, Patrick Donnelly

On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> Ilya noticed that the first access to a blacklisted mount would often
> get back -EACCES, but then subsequent calls would be OK. The problem is
> in __do_request. If the session is marked as REJECTED, a hard error is
> returned instead of waiting for a new session to come into being.
>
> When the session is REJECTED and the mount was done with
> recover_session=clean, queue the request to the waiting_for_map queue,
> which will be awoken after tearing down the old session.
>
> URL: https://tracker.ceph.com/issues/47385
> Reported-by: Ilya Dryomov <idryomov@gmail.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/mds_client.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index fd16db6ecb0a..b07e7adf146f 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
>         if (session->s_state != CEPH_MDS_SESSION_OPEN &&
>             session->s_state != CEPH_MDS_SESSION_HUNG) {
>                 if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> -                       err = -EACCES;
> +                       if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> +                               list_add(&req->r_wait, &mdsc->waiting_for_map);
> +                       else
> +                               err = -EACCES;

While recovering the session, the client drops all dirty caps and aborts
all OSD requests. It does not make sense for some operations to wait
while others get aborted.

>                         goto out_session;
>                 }
>                 /*
> --
> 2.26.2
>


* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29  8:28 ` [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Yan, Zheng
@ 2020-09-29  8:54   ` Ilya Dryomov
  2020-09-29 10:44     ` Yan, Zheng
  0 siblings, 1 reply; 23+ messages in thread
From: Ilya Dryomov @ 2020-09-29  8:54 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Jeff Layton, ceph-devel, Patrick Donnelly

On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > Ilya noticed that he would get spurious EACCES errors on calls done just
> > after blocklisting the client on mounts with recover_session=clean. The
> > session would get marked as REJECTED and that caused in-flight calls to
> > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > not fully convinced it's the right approach.
> >
>
> The root cause is that the client does not recover the session instantly
> after getting rejected by the MDS. Until the session is recovered, the
> client continues to return errors.

Hi Zheng,

I don't think it's about whether that happens instantly or not.
In the example from [1], the first "ls" would fail even if issued
minutes after the session reject message and the reconnect.  From
the user's POV it is well after the automatic recovery promised by
recover_session=clean.

[1] https://tracker.ceph.com/issues/47385

Thanks,

                Ilya

>
>
> > The potential issue I see is that the client could take cap references to
> > do a call on a session that has been blocklisted. We then queue the
> > message and reestablish the session, but we may not have been granted
> > the same caps by the MDS at that point.
> >
> > If this is a problem, then we probably need to rework it so that we
> > return a distinct error code in this situation and have the upper layers
> > issue a completely new mds request (with new cap refs, etc.)
> >
> > Obviously, that's a much more invasive approach though, so it would be
> > nice to avoid that if this would suffice.
> >
> > Jeff Layton (4):
> >   ceph: don't WARN when removing caps due to blocklisting
> >   ceph: don't mark mount as SHUTDOWN when recovering session
> >   ceph: remove timeout on allowing reconnect after blocklisting
> >   ceph: queue request when CLEANRECOVER is set
> >
> >  fs/ceph/caps.c       |  2 +-
> >  fs/ceph/mds_client.c | 10 ++++------
> >  fs/ceph/super.c      | 13 +++++++++----
> >  fs/ceph/super.h      |  1 -
> >  4 files changed, 14 insertions(+), 12 deletions(-)
> >
> > --
> > 2.26.2
> >


* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29  8:54   ` Ilya Dryomov
@ 2020-09-29 10:44     ` Yan, Zheng
  2020-09-29 10:58       ` Ilya Dryomov
  2020-09-29 19:50       ` Jeff Layton
  0 siblings, 2 replies; 23+ messages in thread
From: Yan, Zheng @ 2020-09-29 10:44 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Jeff Layton, ceph-devel, Patrick Donnelly

On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> >
> > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > >
> > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > after blocklisting the client on mounts with recover_session=clean. The
> > > session would get marked as REJECTED and that caused in-flight calls to
> > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > not fully convinced it's the right approach.
> > >
> >
> > The root cause is that the client does not recover the session instantly
> > after getting rejected by the MDS. Until the session is recovered, the
> > client continues to return errors.
>
> Hi Zheng,
>
> I don't think it's about whether that happens instantly or not.
> In the example from [1], the first "ls" would fail even if issued
> minutes after the session reject message and the reconnect.  From
> the user's POV it is well after the automatic recovery promised by
> recover_session=clean.
>
> [1] https://tracker.ceph.com/issues/47385

Reconnect should close all old sessions. It's likely that the
client didn't detect it was blacklisted.

>
> Thanks,
>
>                 Ilya
>
> >
> >
> > > The potential issue I see is that the client could take cap references to
> > > do a call on a session that has been blocklisted. We then queue the
> > > message and reestablish the session, but we may not have been granted
> > > the same caps by the MDS at that point.
> > >
> > > If this is a problem, then we probably need to rework it so that we
> > > return a distinct error code in this situation and have the upper layers
> > > issue a completely new mds request (with new cap refs, etc.)
> > >
> > > Obviously, that's a much more invasive approach though, so it would be
> > > nice to avoid that if this would suffice.
> > >
> > > Jeff Layton (4):
> > >   ceph: don't WARN when removing caps due to blocklisting
> > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > >   ceph: remove timeout on allowing reconnect after blocklisting
> > >   ceph: queue request when CLEANRECOVER is set
> > >
> > >  fs/ceph/caps.c       |  2 +-
> > >  fs/ceph/mds_client.c | 10 ++++------
> > >  fs/ceph/super.c      | 13 +++++++++----
> > >  fs/ceph/super.h      |  1 -
> > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > >
> > > --
> > > 2.26.2
> > >


* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29 10:44     ` Yan, Zheng
@ 2020-09-29 10:58       ` Ilya Dryomov
  2020-09-29 12:48         ` Jeff Layton
  2020-09-29 19:50       ` Jeff Layton
  1 sibling, 1 reply; 23+ messages in thread
From: Ilya Dryomov @ 2020-09-29 10:58 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Jeff Layton, ceph-devel, Patrick Donnelly

On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > >
> > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > not fully convinced it's the right approach.
> > > >
> > >
> > > The root cause is that the client does not recover the session instantly
> > > after getting rejected by the MDS. Until the session is recovered, the
> > > client continues to return errors.
> >
> > Hi Zheng,
> >
> > I don't think it's about whether that happens instantly or not.
> > In the example from [1], the first "ls" would fail even if issued
> > minutes after the session reject message and the reconnect.  From
> > the user's POV it is well after the automatic recovery promised by
> > recover_session=clean.
> >
> > [1] https://tracker.ceph.com/issues/47385
>
> Reconnect should close all old sessions. It's likely that the
> client didn't detect it was blacklisted.

Sorry, I should have pasted dmesg there as well.  It _does_ detect
blacklisting -- notice that I wrote "after the session reject message
and the reconnect".

Thanks,

                Ilya


* Re: [RFC PATCH 2/4] ceph: don't mark mount as SHUTDOWN when recovering session
  2020-09-29  8:20   ` Yan, Zheng
@ 2020-09-29 12:30     ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-29 12:30 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, Ilya Dryomov, Patrick Donnelly

On Tue, 2020-09-29 at 16:20 +0800, Yan, Zheng wrote:
> On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > When recovering a session (à la recover_session=clean), we want to do
> > all of the operations that we do on a forced umount, but changing the
> > mount state to SHUTDOWN is wrong and can cause queued MDS requests to
> > fail when the session comes back.
> > 
> 
> The code that cleans up the page cache checks the SHUTDOWN state.
> 

Ok, so we do need to do something else there if we don't mark the thing
SHUTDOWN. Maybe we ought to declare a new mount_state for
this... CEPH_MOUNT_RECOVERING?
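
Something like this, maybe (purely a sketch -- the existing states are
the ones in include/linux/ceph/libceph.h, and the last entry is
hypothetical):

	enum {
		CEPH_MOUNT_MOUNTING,
		CEPH_MOUNT_MOUNTED,
		CEPH_MOUNT_UNMOUNTING,
		CEPH_MOUNT_UNMOUNTED,
		CEPH_MOUNT_SHUTDOWN,
		CEPH_MOUNT_RECOVERING,  /* blocklisted, reconnect pending */
	};

The page cache cleanup paths could then treat RECOVERING like SHUTDOWN,
while __do_request() would only hard-fail on a real SHUTDOWN.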

> > Only mark it as SHUTDOWN when umount_begin is called.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/super.c | 13 +++++++++----
> >  1 file changed, 9 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/ceph/super.c b/fs/ceph/super.c
> > index 2516304379d3..46a0e4e1b177 100644
> > --- a/fs/ceph/super.c
> > +++ b/fs/ceph/super.c
> > @@ -832,6 +832,13 @@ static void destroy_caches(void)
> >         ceph_fscache_unregister();
> >  }
> > 
> > +static void __ceph_umount_begin(struct ceph_fs_client *fsc)
> > +{
> > +       ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
> > +       ceph_mdsc_force_umount(fsc->mdsc);
> > +       fsc->filp_gen++; // invalidate open files
> > +}
> > +
> >  /*
> >   * ceph_umount_begin - initiate forced umount.  Tear down the
> >   * mount, skipping steps that may hang while waiting for server(s).
> > @@ -844,9 +851,7 @@ static void ceph_umount_begin(struct super_block *sb)
> >         if (!fsc)
> >                 return;
> >         fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
> > -       ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
> > -       ceph_mdsc_force_umount(fsc->mdsc);
> > -       fsc->filp_gen++; // invalidate open files
> > +       __ceph_umount_begin(fsc);
> >  }
> > 
> >  static const struct super_operations ceph_super_ops = {
> > @@ -1235,7 +1240,7 @@ int ceph_force_reconnect(struct super_block *sb)
> >         struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
> >         int err = 0;
> > 
> > -       ceph_umount_begin(sb);
> > +       __ceph_umount_begin(fsc);
> > 
> >         /* Make sure all page caches get invalidated.
> >          * see remove_session_caps_cb() */
> > --
> > 2.26.2
> > 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set
  2020-09-29  8:31   ` Yan, Zheng
@ 2020-09-29 12:46     ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-29 12:46 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: ceph-devel, Ilya Dryomov, Patrick Donnelly

On Tue, 2020-09-29 at 16:31 +0800, Yan, Zheng wrote:
> On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > Ilya noticed that the first access to a blacklisted mount would often
> > get back -EACCES, but then subsequent calls would be OK. The problem is
> > in __do_request. If the session is marked as REJECTED, a hard error is
> > returned instead of waiting for a new session to come into being.
> > 
> > When the session is REJECTED and the mount was done with
> > recover_session=clean, queue the request to the waiting_for_map queue,
> > which will be awoken after tearing down the old session.
> > 
> > URL: https://tracker.ceph.com/issues/47385
> > Reported-by: Ilya Dryomov <idryomov@gmail.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/ceph/mds_client.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> > index fd16db6ecb0a..b07e7adf146f 100644
> > --- a/fs/ceph/mds_client.c
> > +++ b/fs/ceph/mds_client.c
> > @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
> >         if (session->s_state != CEPH_MDS_SESSION_OPEN &&
> >             session->s_state != CEPH_MDS_SESSION_HUNG) {
> >                 if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> > -                       err = -EACCES;
> > +                       if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> > +                               list_add(&req->r_wait, &mdsc->waiting_for_map);
> > +                       else
> > +                               err = -EACCES;
> 
> While recovering the session, the client drops all dirty caps and aborts
> all OSD requests. It does not make sense for some operations to wait
> while others get aborted.
> 


It makes sense to drop the caps and fail writeback of pages that were
dirty. The issue here is what to do with MDS (metadata) requests that
come in just after we notice the blocklisting but before the session has
been reestablished. Most of those aren't going to have any dependency on
the state of the pagecache.

They also (for the most part) won't have a dependency on caps. The main
exception I see is async unlink (async creates will be saved by the fact
we'll be dropping our delegated inode number range). An async unlink
could end up stalling across a recovery. The new MDS probably won't have
granted Du caps by the time the call goes out. We could cancel that but
likely would have already returned success on the unlink() call.

Granted, with all of this we're _way_ outside the realm of POSIX
behavior, so ultimately the right behavior is whatever we decide it
should be. Anyone who turns this stuff on should be prepared for some of
the operations leading up to the blocklisting to vaporize.

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29 10:58       ` Ilya Dryomov
@ 2020-09-29 12:48         ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-29 12:48 UTC (permalink / raw)
  To: Ilya Dryomov, Yan, Zheng; +Cc: ceph-devel, Patrick Donnelly

On Tue, 2020-09-29 at 12:58 +0200, Ilya Dryomov wrote:
> On Tue, Sep 29, 2020 at 12:44 PM Yan, Zheng <ukernel@gmail.com> wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > > 
> > > > 
> > > > The root cause is that the client does not recover the session instantly
> > > > after getting rejected by the MDS. Until the session is recovered, the
> > > > client continues to return errors.
> > > 
> > > Hi Zheng,
> > > 
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect.  From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > > 
> > > [1] https://tracker.ceph.com/issues/47385
> > 
> > Reconnect should close all old sessions. It's likely that the
> > client didn't detect it was blacklisted.
> 
> Sorry, I should have pasted dmesg there as well.  It _does_ detect
> blacklisting -- notice that I wrote "after the session reject message
> and the reconnect".
> 

Yep, this is pretty easy to reproduce too (as Ilya points out in the tracker).

I'm open to other ways of smoothing this over. If we end up with a small
window where errors can occur, then so be it, but I think we can
probably do better than we have now.

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29 10:44     ` Yan, Zheng
  2020-09-29 10:58       ` Ilya Dryomov
@ 2020-09-29 19:50       ` Jeff Layton
  2020-09-30  8:45         ` Yan, Zheng
  1 sibling, 1 reply; 23+ messages in thread
From: Jeff Layton @ 2020-09-29 19:50 UTC (permalink / raw)
  To: Yan, Zheng, Ilya Dryomov; +Cc: ceph-devel, Patrick Donnelly

On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > not fully convinced it's the right approach.
> > > > 
> > > 
> > > The root cause is that the client does not recover the session instantly
> > > after getting rejected by the MDS. Until the session is recovered, the
> > > client continues to return errors.
> > 
> > Hi Zheng,
> > 
> > I don't think it's about whether that happens instantly or not.
> > In the example from [1], the first "ls" would fail even if issued
> > minutes after the session reject message and the reconnect.  From
> > the user's POV it is well after the automatic recovery promised by
> > recover_session=clean.
> > 
> > [1] https://tracker.ceph.com/issues/47385
> 
> Reconnect should close all old sessions. It's likely that the
> client didn't detect it was blacklisted.
> 

I should have described this better -- let me explain:

It did detect that it was blocklisted (almost immediately) because the
MDS shuts down the session. I think it immediately sends a
SESSION_REJECT message when blacklisting and indicates that it has been
blocklisted.

At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
calls that come through would see that it was in that state and would return
-EACCES. Eventually, the delayed work runs and then the session gets
reconnected, and further calls proceed normally.

So, I think this is just a timing thing for the most part. The workqueue
job runs on a delay of round_jiffies_relative(HZ * 5), and that's long
enough for the disruption to be noticeable.
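
To illustrate (a simplified sketch of schedule_delayed() in
fs/ceph/mds_client.c; details abridged):

	/* delayed_work() is what eventually calls maybe_recover_session(),
	 * and it is only scheduled to run roughly every 5 seconds: */
	schedule_delayed_work(&mdsc->delayed_work,
			      round_jiffies_relative(HZ * 5));

Any MDS request issued inside that window still sees the REJECTED
session.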

While this was happening during 'ls' for Ilya, it could happen in
anything that involves sending a request to the MDS. I think we want to
prevent new opens from erroring out during this window if we can.

The real question is whether this is safe in all cases. For instance, if
the call that we're idling is dependent on holding certain caps, then
it's possible we will have lost them when we got REJECTED.

Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
a reply to that patch.

> > Thanks,
> > 
> >                 Ilya
> > 
> > > 
> > > > The potential issue I see is that the client could take cap references to
> > > > do a call on a session that has been blocklisted. We then queue the
> > > > message and reestablish the session, but we may not have been granted
> > > > the same caps by the MDS at that point.
> > > > 
> > > > If this is a problem, then we probably need to rework it so that we
> > > > return a distinct error code in this situation and have the upper layers
> > > > issue a completely new mds request (with new cap refs, etc.)
> > > > 
> > > > Obviously, that's a much more invasive approach though, so it would be
> > > > nice to avoid that if this would suffice.
> > > > 
> > > > Jeff Layton (4):
> > > >   ceph: don't WARN when removing caps due to blocklisting
> > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > >   ceph: queue request when CLEANRECOVER is set
> > > > 
> > > >  fs/ceph/caps.c       |  2 +-
> > > >  fs/ceph/mds_client.c | 10 ++++------
> > > >  fs/ceph/super.c      | 13 +++++++++----
> > > >  fs/ceph/super.h      |  1 -
> > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > 
> > > > --
> > > > 2.26.2
> > > > 

-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set
  2020-09-25 14:08 ` [RFC PATCH 4/4] ceph: queue request when CLEANRECOVER is set Jeff Layton
  2020-09-29  8:31   ` Yan, Zheng
@ 2020-09-29 19:55   ` Jeff Layton
  1 sibling, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-29 19:55 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

On Fri, 2020-09-25 at 10:08 -0400, Jeff Layton wrote:
> Ilya noticed that the first access to a blacklisted mount would often
> get back -EACCES, but then subsequent calls would be OK. The problem is
> in __do_request. If the session is marked as REJECTED, a hard error is
> returned instead of waiting for a new session to come into being.
> 
> When the session is REJECTED and the mount was done with
> recover_session=clean, queue the request to the waiting_for_map queue,
> which will be awoken after tearing down the old session.
> 
> URL: https://tracker.ceph.com/issues/47385
> Reported-by: Ilya Dryomov <idryomov@gmail.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/ceph/mds_client.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index fd16db6ecb0a..b07e7adf146f 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -2819,7 +2819,10 @@ static void __do_request(struct ceph_mds_client *mdsc,
>  	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
>  	    session->s_state != CEPH_MDS_SESSION_HUNG) {
>  		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
> -			err = -EACCES;
> +			if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
> +				list_add(&req->r_wait, &mdsc->waiting_for_map);
> +			else
> +				err = -EACCES;
>  			goto out_session;
>  		}
>  		/*

I think this is wrong when CEPH_MDS_R_ASYNC is set. I'll send a v2 set
if we end up staying with this approach.
-- 
Jeff Layton <jlayton@kernel.org>



* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-29 19:50       ` Jeff Layton
@ 2020-09-30  8:45         ` Yan, Zheng
  2020-09-30 17:55           ` Jeff Layton
  0 siblings, 1 reply; 23+ messages in thread
From: Yan, Zheng @ 2020-09-30  8:45 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Ilya Dryomov, ceph-devel, Patrick Donnelly

On Wed, Sep 30, 2020 at 3:50 AM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > not fully convinced it's the right approach.
> > > > >
> > > >
> > > > The root cause is that the client does not recover the session instantly
> > > > after getting rejected by the MDS. Until the session is recovered, the
> > > > client continues to return errors.
> > >
> > > Hi Zheng,
> > >
> > > I don't think it's about whether that happens instantly or not.
> > > In the example from [1], the first "ls" would fail even if issued
> > > minutes after the session reject message and the reconnect.  From
> > > the user's POV it is well after the automatic recovery promised by
> > > recover_session=clean.
> > >
> > > [1] https://tracker.ceph.com/issues/47385
> >
> > Reconnect should close all old sessions. It's likely that the
> > client didn't detect it was blacklisted.
> >
>
> I should have described this better -- let me explain:
>
> It did detect that it was blocklisted (almost immediately) because the
> MDS shuts down the session. I think it immediately sends a
> SESSION_REJECT message when blacklisting and indicates that it has been
> blocklisted.
>
> At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
> calls that come through would see that it was in that state and would return
> -EACCES. Eventually, the delayed work runs and then the session gets
> reconnected, and further calls proceed normally.
>
> So, I think this is just a timing thing for the most part. The workqueue
> job runs on a delay of round_jiffies_relative(HZ * 5), and that's long
> enough for the disruption to be noticeable.
>
> While this was happening during 'ls' for Ilya, it could happen in
> anything that involves sending a request to the MDS. I think we want to
> prevent new opens from erroring out during this window if we can.
>
> The real question is whether this is safe in all cases. For instance, if
> the call that we're idling is dependent on holding certain caps, then
> it's possible we will have lost them when we got REJECTED.
>

The session in the rejected state is a new session. It should hold no caps.

> Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
> a reply to that patch.
>
> > > Thanks,
> > >
> > >                 Ilya
> > >
> > > >
> > > > > The potential issue I see is that the client could take cap references to
> > > > > do a call on a session that has been blocklisted. We then queue the
> > > > > message and reestablish the session, but we may not have been granted
> > > > > the same caps by the MDS at that point.
> > > > >
> > > > > If this is a problem, then we probably need to rework it so that we
> > > > > return a distinct error code in this situation and have the upper layers
> > > > > issue a completely new mds request (with new cap refs, etc.)
> > > > >
> > > > > Obviously, that's a much more invasive approach though, so it would be
> > > > > nice to avoid that if this would suffice.
> > > > >
> > > > > Jeff Layton (4):
> > > > >   ceph: don't WARN when removing caps due to blocklisting
> > > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > > >   ceph: queue request when CLEANRECOVER is set
> > > > >
> > > > >  fs/ceph/caps.c       |  2 +-
> > > > >  fs/ceph/mds_client.c | 10 ++++------
> > > > >  fs/ceph/super.c      | 13 +++++++++----
> > > > >  fs/ceph/super.h      |  1 -
> > > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > >
> > > > > --
> > > > > 2.26.2
> > > > >
>
> --
> Jeff Layton <jlayton@kernel.org>
>


* [RFC PATCH v2 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-25 14:08 [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Jeff Layton
                   ` (4 preceding siblings ...)
  2020-09-29  8:28 ` [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors Yan, Zheng
@ 2020-09-30 12:10 ` Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
                     ` (3 more replies)
  5 siblings, 4 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 12:10 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

v2: fix the handling of async requests in the request-queueing patch

This is basically the same patchset as before, with a small revision to
the last patch to fix up the handling of async requests. With this
version, an async request will not be queued but instead will return
-EJUKEBOX so the caller can drive a synchronous request.
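
The caller side then looks roughly like this (illustrative sketch; the
real call sites, e.g. the async unlink path in fs/ceph/dir.c, handle a
few more details):

	err = ceph_mdsc_submit_request(mdsc, dir, req);  /* async attempt */
	if (err == -EJUKEBOX) {
		/* session unusable for async ops -- redrive synchronously */
		try_async = false;
		ceph_mdsc_put_request(req);
		goto retry;
	}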

Original cover letter follows:

Ilya noticed that he would get spurious EACCES errors on calls done just
after blocklisting the client on mounts with recover_session=clean. The
session would get marked as REJECTED and that caused in-flight calls to
die with EACCES. This patchset seems to smooth over the problem, but I'm
not fully convinced it's the right approach.

The potential issue I see is that the client could take cap references to
do a call on a session that has been blocklisted. We then queue the
message and reestablish the session, but we may not have been granted
the same caps by the MDS at that point.

If this is a problem, then we probably need to rework it so that we
return a distinct error code in this situation and have the upper layers
issue a completely new mds request (with new cap refs, etc.)

Obviously, that's a much more invasive approach though, so it would be
nice to avoid that if this would suffice.

Jeff Layton (4):
  ceph: don't WARN when removing caps due to blocklisting
  ceph: don't mark mount as SHUTDOWN when recovering session
  ceph: remove timeout on allowing reconnect after blocklisting
  ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set

 fs/ceph/caps.c       |  2 +-
 fs/ceph/mds_client.c | 23 ++++++++++++++---------
 fs/ceph/super.c      | 13 +++++++++----
 fs/ceph/super.h      |  1 -
 4 files changed, 24 insertions(+), 15 deletions(-)

-- 
2.26.2



* [RFC PATCH v2 1/4] ceph: don't WARN when removing caps due to blocklisting
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
@ 2020-09-30 12:10   ` Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 12:10 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

We expect to remove dirty caps when the client is blocklisted. Don't
throw a warning in that case.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/caps.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index c7e69547628e..2ee3f316afcf 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -1149,7 +1149,7 @@ void __ceph_remove_cap(struct ceph_cap *cap, bool queue_release)
 	/* remove from inode's cap rbtree, and clear auth cap */
 	rb_erase(&cap->ci_node, &ci->i_caps);
 	if (ci->i_auth_cap == cap) {
-		WARN_ON_ONCE(!list_empty(&ci->i_dirty_item));
+		WARN_ON_ONCE(!list_empty(&ci->i_dirty_item) && !mdsc->fsc->blocklisted);
 		ci->i_auth_cap = NULL;
 	}
 
-- 
2.26.2



* [RFC PATCH v2 2/4] ceph: don't mark mount as SHUTDOWN when recovering session
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
@ 2020-09-30 12:10   ` Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 4/4] ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set Jeff Layton
  3 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 12:10 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

When recovering a session (à la recover_session=clean), we want to do
all of the operations that we do on a forced umount, but changing the
mount state to SHUTDOWN is wrong and can cause queued MDS requests to
fail when the session comes back.

Only mark it as SHUTDOWN when umount_begin is called.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/super.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/super.c b/fs/ceph/super.c
index 2516304379d3..46a0e4e1b177 100644
--- a/fs/ceph/super.c
+++ b/fs/ceph/super.c
@@ -832,6 +832,13 @@ static void destroy_caches(void)
 	ceph_fscache_unregister();
 }
 
+static void __ceph_umount_begin(struct ceph_fs_client *fsc)
+{
+	ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
+	ceph_mdsc_force_umount(fsc->mdsc);
+	fsc->filp_gen++; // invalidate open files
+}
+
 /*
  * ceph_umount_begin - initiate forced umount.  Tear down the
  * mount, skipping steps that may hang while waiting for server(s).
@@ -844,9 +851,7 @@ static void ceph_umount_begin(struct super_block *sb)
 	if (!fsc)
 		return;
 	fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
-	ceph_osdc_abort_requests(&fsc->client->osdc, -EIO);
-	ceph_mdsc_force_umount(fsc->mdsc);
-	fsc->filp_gen++; // invalidate open files
+	__ceph_umount_begin(fsc);
 }
 
 static const struct super_operations ceph_super_ops = {
@@ -1235,7 +1240,7 @@ int ceph_force_reconnect(struct super_block *sb)
 	struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
 	int err = 0;
 
-	ceph_umount_begin(sb);
+	__ceph_umount_begin(fsc);
 
 	/* Make sure all page caches get invalidated.
 	 * see remove_session_caps_cb() */
-- 
2.26.2



* [RFC PATCH v2 3/4] ceph: remove timeout on allowing reconnect after blocklisting
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 1/4] ceph: don't WARN when removing caps due to blocklisting Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 2/4] ceph: don't mark mount as SHUTDOWN when recovering session Jeff Layton
@ 2020-09-30 12:10   ` Jeff Layton
  2020-09-30 12:10   ` [RFC PATCH v2 4/4] ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set Jeff Layton
  3 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 12:10 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

30 minutes is a long time to wait, and this makes it difficult to test
the feature by manually blocklisting clients. Remove the timeout
infrastructure and just allow the client to reconnect at will.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 5 -----
 fs/ceph/super.h      | 1 -
 2 files changed, 6 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 08f1c0c31dc2..fd16db6ecb0a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -4374,12 +4374,7 @@ static void maybe_recover_session(struct ceph_mds_client *mdsc)
 	if (!READ_ONCE(fsc->blocklisted))
 		return;
 
-	if (fsc->last_auto_reconnect &&
-	    time_before(jiffies, fsc->last_auto_reconnect + HZ * 60 * 30))
-		return;
-
 	pr_info("auto reconnect after blocklisted\n");
-	fsc->last_auto_reconnect = jiffies;
 	ceph_force_reconnect(fsc->sb);
 }
 
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 582694899130..cb138e218ab4 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -108,7 +108,6 @@ struct ceph_fs_client {
 
 	unsigned long mount_state;
 
-	unsigned long last_auto_reconnect;
 	bool blocklisted;
 
 	bool have_copy_from2;
-- 
2.26.2



* [RFC PATCH v2 4/4] ceph: queue MDS requests to REJECTED sessions when CLEANRECOVER is set
  2020-09-30 12:10 ` [RFC PATCH v2 " Jeff Layton
                     ` (2 preceding siblings ...)
  2020-09-30 12:10   ` [RFC PATCH v2 3/4] ceph: remove timeout on allowing reconnect after blocklisting Jeff Layton
@ 2020-09-30 12:10   ` Jeff Layton
  3 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 12:10 UTC (permalink / raw)
  To: ceph-devel; +Cc: idryomov, ukernel, pdonnell

Ilya noticed that the first access to a blacklisted mount would often
get back -EACCES, but then subsequent calls would be OK. The problem is
in __do_request. If the session is marked as REJECTED, a hard error is
returned instead of waiting for a new session to come into being.

When the session is REJECTED and the mount was done with
recover_session=clean, queue the request to the waiting_for_map queue,
which will be awoken after tearing down the old session. We can only
do this for sync requests though, so check for async ones first and
just let the callers redrive a sync request.

URL: https://tracker.ceph.com/issues/47385
Reported-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/mds_client.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index fd16db6ecb0a..2e1b3313f69b 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2818,10 +2818,6 @@ static void __do_request(struct ceph_mds_client *mdsc,
 	     ceph_session_state_name(session->s_state));
 	if (session->s_state != CEPH_MDS_SESSION_OPEN &&
 	    session->s_state != CEPH_MDS_SESSION_HUNG) {
-		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
-			err = -EACCES;
-			goto out_session;
-		}
 		/*
 		 * We cannot queue async requests since the caps and delegated
 		 * inodes are bound to the session. Just return -EJUKEBOX and
@@ -2831,6 +2827,20 @@ static void __do_request(struct ceph_mds_client *mdsc,
 			err = -EJUKEBOX;
 			goto out_session;
 		}
+
+		/*
+		 * If the session has been REJECTED, then return a hard error,
+		 * unless it's a CLEANRECOVER mount, in which case we'll queue
+		 * it to the mdsc queue.
+		 */
+		if (session->s_state == CEPH_MDS_SESSION_REJECTED) {
+			if (ceph_test_mount_opt(mdsc->fsc, CLEANRECOVER))
+				list_add(&req->r_wait, &mdsc->waiting_for_map);
+			else
+				err = -EACCES;
+			goto out_session;
+		}
+
 		if (session->s_state == CEPH_MDS_SESSION_NEW ||
 		    session->s_state == CEPH_MDS_SESSION_CLOSING) {
 			err = __open_session(mdsc, session);
-- 
2.26.2



* Re: [RFC PATCH 0/4] ceph: fix spurious recover_session=clean errors
  2020-09-30  8:45         ` Yan, Zheng
@ 2020-09-30 17:55           ` Jeff Layton
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Layton @ 2020-09-30 17:55 UTC (permalink / raw)
  To: Yan, Zheng; +Cc: Ilya Dryomov, ceph-devel, Patrick Donnelly

On Wed, 2020-09-30 at 16:45 +0800, Yan, Zheng wrote:
> On Wed, Sep 30, 2020 at 3:50 AM Jeff Layton <jlayton@kernel.org> wrote:
> > On Tue, 2020-09-29 at 18:44 +0800, Yan, Zheng wrote:
> > > On Tue, Sep 29, 2020 at 4:55 PM Ilya Dryomov <idryomov@gmail.com> wrote:
> > > > On Tue, Sep 29, 2020 at 10:28 AM Yan, Zheng <ukernel@gmail.com> wrote:
> > > > > On Fri, Sep 25, 2020 at 10:08 PM Jeff Layton <jlayton@kernel.org> wrote:
> > > > > > Ilya noticed that he would get spurious EACCES errors on calls done just
> > > > > > after blocklisting the client on mounts with recover_session=clean. The
> > > > > > session would get marked as REJECTED and that caused in-flight calls to
> > > > > > die with EACCES. This patchset seems to smooth over the problem, but I'm
> > > > > > not fully convinced it's the right approach.
> > > > > > 
> > > > > 
> > > > > The root cause is that the client does not recover the session instantly
> > > > > after getting rejected by the MDS. Until the session is recovered, the
> > > > > client continues to return errors.
> > > > 
> > > > Hi Zheng,
> > > > 
> > > > I don't think it's about whether that happens instantly or not.
> > > > In the example from [1], the first "ls" would fail even if issued
> > > > minutes after the session reject message and the reconnect.  From
> > > > the user's POV it is well after the automatic recovery promised by
> > > > recover_session=clean.
> > > > 
> > > > [1] https://tracker.ceph.com/issues/47385
> > > 
> > > Reconnect should close all old sessions. It's likely that the
> > > client didn't detect it was blacklisted.
> > > 
> > 
> > I should have described this better -- let me explain:
> > 
> > It did detect that it was blocklisted (almost immediately) because the
> > MDS shuts down the session. I think it immediately sends a
> > SESSION_REJECT message when blacklisting and indicates that it has been
> > blocklisted.
> > 
> > At that point the session is CEPH_MDS_SESSION_REJECTED. The next MDS
> > calls that come through would see that it was in that state and would return
> > -EACCES. Eventually, the delayed work runs and then the session gets
> > reconnected, and further calls proceed normally.
> > 
> > So, I think this is just a timing thing for the most part. The workqueue
> > job runs on a delay of round_jiffies_relative(HZ * 5), and that's long
> > enough for the disruption to be noticeable.
> > 
> > While this was happening during 'ls' for Ilya, it could happen in
> > anything that involves sending a request to the MDS. I think we want to
> > prevent new opens from erroring out during this window if we can.
> > 
> > The real question is whether this is safe in all cases. For instance, if
> > the call that we're idling is dependent on holding certain caps, then
> > it's possible we will have lost them when we got REJECTED.
> > 
> 
> The session in the rejected state is a new session. It should hold no caps.
> 

Right.

We're actually OK here with respect to async requests, as they will return
EJUKEBOX and the caller will redrive a synchronous request. Other
MClientRequest calls don't require that the client hold any caps,
AFAICT, so idling them until we can establish a new session should be
OK, no?

> > Hmm...so that means patch 4/4 is probably wrong. I'll comment further in
> > a reply to that patch.
> > 
> > > > Thanks,
> > > > 
> > > >                 Ilya
> > > > 
> > > > > > The potential issue I see is that the client could take cap references to
> > > > > > do a call on a session that has been blocklisted. We then queue the
> > > > > > message and reestablish the session, but we may not have been granted
> > > > > > the same caps by the MDS at that point.
> > > > > > 
> > > > > > If this is a problem, then we probably need to rework it so that we
> > > > > > return a distinct error code in this situation and have the upper layers
> > > > > > issue a completely new mds request (with new cap refs, etc.)
> > > > > > 
> > > > > > Obviously, that's a much more invasive approach though, so it would be
> > > > > > nice to avoid that if this would suffice.
> > > > > > 
> > > > > > Jeff Layton (4):
> > > > > >   ceph: don't WARN when removing caps due to blocklisting
> > > > > >   ceph: don't mark mount as SHUTDOWN when recovering session
> > > > > >   ceph: remove timeout on allowing reconnect after blocklisting
> > > > > >   ceph: queue request when CLEANRECOVER is set
> > > > > > 
> > > > > >  fs/ceph/caps.c       |  2 +-
> > > > > >  fs/ceph/mds_client.c | 10 ++++------
> > > > > >  fs/ceph/super.c      | 13 +++++++++----
> > > > > >  fs/ceph/super.h      |  1 -
> > > > > >  4 files changed, 14 insertions(+), 12 deletions(-)
> > > > > > 
> > > > > > --
> > > > > > 2.26.2
> > > > > > 
> > 
> > --
> > Jeff Layton <jlayton@kernel.org>
> > 

-- 
Jeff Layton <jlayton@kernel.org>

