All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 0/3] Misc withdraw patches
@ 2022-07-27 16:02 Bob Peterson
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Bob Peterson @ 2022-07-27 16:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch set fixes a few bugs in how gfs2 handles file systems after
withdraw. In an ideal world, after a file system is withdrawn, users
should be able to unmount the file system without problems. However, we
discovered three problems that prevented clean unmounts:

1. A duplicate iput of the journal after attempted recovery caused
   kernel panics after withdraw.
2. After withdraw, unmount would hang for its alloted timeout period
   when glocks had waiters queued that, due to the withdraw, could
   never be granted.
3. Unmount would similarly hang when the withdraw prevented an outgoing
   request to dlm, but so the glock was never unlocked.

Bob Peterson (3):
  gfs2: Prevent double iput for journal on error
  gfs2: Dequeue waiters when withdrawn
  gfs2: Clear GLF_LOCK when withdraw prevents xmote

 fs/gfs2/glock.c | 33 ++++++++++++++++++++++++++++++++-
 fs/gfs2/glock.h |  1 +
 fs/gfs2/util.c  | 16 ++++++++++++++++
 3 files changed, 49 insertions(+), 1 deletion(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error
  2022-07-27 16:02 [Cluster-devel] [PATCH 0/3] Misc withdraw patches Bob Peterson
@ 2022-07-27 16:02 ` Bob Peterson
  2022-08-02 14:37   ` Andreas Gruenbacher
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn Bob Peterson
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 3/3] gfs2: Clear GLF_LOCK when withdraw prevents xmote Bob Peterson
  2 siblings, 1 reply; 7+ messages in thread
From: Bob Peterson @ 2022-07-27 16:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a gfs2 file system is withdrawn, it requests recovery from another
cluster node. To do that, it evicts its journal from memory, but it
keeps the journal entry queued to the journals queue, jindex_list. After
recovery it tries to grab a new inode for its (recovered) journal. If it
cannot, it skips further recovery but its evicted journal is still on
the jindex list, which means unmount will try to iput it a second time
after it's been evicted. This second iput causes vfs to complain and BUG
out:

kernel BUG at fs/inode.c:1680!

To prevent this, this patch takes steps to dequeue the journal
descriptor from the list when it cannot get a replacement inode. So
unmount won't find it on the list and try to iput it again.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/util.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 8241029a2a5d..78cb12d0fba1 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -275,6 +275,17 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	if (IS_ERR(inode)) {
 		fs_warn(sdp, "Reprocessing of jid %d failed with %ld.\n",
 			sdp->sd_lockstruct.ls_jid, PTR_ERR(inode));
+		/*
+		 * We couldn't get a replacement inode for our journal but we
+		 * evicted the old one. So dequeue it from the journals queue,
+		 * jindex_list, so that unmount doesn't do iput on it twice.
+		 */
+		spin_lock(&sdp->sd_jindex_spin);
+		list_del(&sdp->sd_jdesc->jd_list);
+		sdp->sd_journals--;
+		spin_unlock(&sdp->sd_jindex_spin);
+		kfree(sdp->sd_jdesc);
+		sdp->sd_jdesc = NULL;
 		goto skip_recovery;
 	}
 	sdp->sd_jdesc->jd_inode = inode;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn
  2022-07-27 16:02 [Cluster-devel] [PATCH 0/3] Misc withdraw patches Bob Peterson
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson
@ 2022-07-27 16:02 ` Bob Peterson
  2022-08-02 14:47   ` Andreas Gruenbacher
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 3/3] gfs2: Clear GLF_LOCK when withdraw prevents xmote Bob Peterson
  2 siblings, 1 reply; 7+ messages in thread
From: Bob Peterson @ 2022-07-27 16:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a withdraw occurs, ordinary (not system) glocks may not be granted
anymore. This patch takes measures to remove any pending waiters from
the glocks that will never be granted.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 14 ++++++++++++++
 fs/gfs2/glock.h |  1 +
 fs/gfs2/util.c  |  5 +++++
 3 files changed, 20 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index e540d1290099..0bfecffd71f1 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2223,6 +2223,20 @@ static void dump_glock_func(struct gfs2_glock *gl)
 	dump_glock(NULL, gl, true);
 }
 
+static void withdraw_dq(struct gfs2_glock *gl)
+{
+	spin_lock(&gl->gl_lockref.lock);
+	if (!__lockref_is_dead(&gl->gl_lockref) &&
+	    glock_blocked_by_withdraw(gl))
+		do_error(gl, LM_OUT_ERROR); /* remove pending waiters */
+	spin_unlock(&gl->gl_lockref.lock);
+}
+
+void gfs2_gl_dq_holders(struct gfs2_sbd *sdp)
+{
+	glock_hash_walk(withdraw_dq, sdp);
+}
+
 /**
  * gfs2_gl_hash_clear - Empty out the glock hash table
  * @sdp: the filesystem
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index c0ae9100a0bc..d6fc449aa7ff 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -273,6 +273,7 @@ extern void gfs2_cancel_delete_work(struct gfs2_glock *gl);
 extern bool gfs2_delete_work_queued(const struct gfs2_glock *gl);
 extern void gfs2_flush_delete_work(struct gfs2_sbd *sdp);
 extern void gfs2_gl_hash_clear(struct gfs2_sbd *sdp);
+extern void gfs2_gl_dq_holders(struct gfs2_sbd *sdp);
 extern void gfs2_glock_finish_truncate(struct gfs2_inode *ip);
 extern void gfs2_glock_thaw(struct gfs2_sbd *sdp);
 extern void gfs2_glock_add_to_lru(struct gfs2_glock *gl);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 78cb12d0fba1..acf94c83b29b 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -164,6 +164,11 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 		}
 		if (!ret)
 			gfs2_make_fs_ro(sdp);
+		/*
+		 * Dequeue any pending non-system glock holders that can no
+		 * longer be granted because the file system is withdrawn.
+		 */
+		gfs2_gl_dq_holders(sdp);
 		gfs2_freeze_unlock(&freeze_gh);
 	}
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 3/3] gfs2: Clear GLF_LOCK when withdraw prevents xmote
  2022-07-27 16:02 [Cluster-devel] [PATCH 0/3] Misc withdraw patches Bob Peterson
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn Bob Peterson
@ 2022-07-27 16:02 ` Bob Peterson
  2 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2022-07-27 16:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

There are a couple places in function do_xmote where normal processing
is circumvented due to withdraws in progress. However, since we bypass
most of do_xmote() we bypass telling dlm to lock the dlm lock, which
means dlm will never respond with a completion callback. Since the
completion callback ordinarily clears GLF_LOCK, this patch changes
function do_xmote to handle those situations more gracefully so the
file system may be unmounted after withdraw.

A very similar situation happens with the GLF_DEMOTE_IN_PROGRESS flag,
which is cleared by function finish_xmote(). Since the withdraw causes
us to skip the majority of do_xmote, it therefore also skips the call
to finish_xmote() so the DEMOTE_IN_PROGRESS flag needs to be cleared
manually.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 0bfecffd71f1..d508d8fa0838 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -59,6 +59,8 @@ typedef void (*glock_examiner) (struct gfs2_glock * gl);
 
 static void do_xmote(struct gfs2_glock *gl, struct gfs2_holder *gh, unsigned int target);
 static void __gfs2_glock_dq(struct gfs2_holder *gh);
+static void handle_callback(struct gfs2_glock *gl, unsigned int state,
+			    unsigned long delay, bool remote);
 
 static struct dentry *gfs2_root;
 static struct workqueue_struct *glock_workqueue;
@@ -762,8 +764,21 @@ __acquires(&gl->gl_lockref.lock)
 	int ret;
 
 	if (target != LM_ST_UNLOCKED && glock_blocked_by_withdraw(gl) &&
-	    gh && !(gh->gh_flags & LM_FLAG_NOEXP))
+	    gh && !(gh->gh_flags & LM_FLAG_NOEXP)) {
+		/*
+		 * We won't tell dlm to perform the lock, so we won't get a
+		 * reply that would otherwise clear GLF_LOCK. So we clear it.
+		 */
+		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
+		clear_bit(GLF_LOCK, &gl->gl_flags);
+		clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
+		/*
+		 * Don't increment lockref here. The next time the worker runs it will do
+		 * glock_put, which will decrement it to 0, and free the glock.
+		 */
+		__gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
 		return;
+	}
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
 		      LM_FLAG_PRIORITY);
 	GLOCK_BUG_ON(gl, gl->gl_state == target);
@@ -848,6 +863,8 @@ __acquires(&gl->gl_lockref.lock)
 	    (target != LM_ST_UNLOCKED ||
 	     test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags))) {
 		if (!is_system_glock(gl)) {
+			clear_bit(GLF_LOCK, &gl->gl_flags);
+			clear_bit(GLF_DEMOTE_IN_PROGRESS, &gl->gl_flags);
 			gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
 			goto out;
 		} else {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson
@ 2022-08-02 14:37   ` Andreas Gruenbacher
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2022-08-02 14:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 27, 2022 at 6:32 PM Bob Peterson <rpeterso@redhat.com> wrote:
> When a gfs2 file system is withdrawn, it requests recovery from another
> cluster node. To do that, it evicts its journal from memory, but it
> keeps the journal entry queued to the journals queue, jindex_list. After
> recovery it tries to grab a new inode for its (recovered) journal. If it
> cannot, it skips further recovery but its evicted journal is still on
> the jindex list, which means unmount will try to iput it a second time
> after it's been evicted. This second iput causes vfs to complain and BUG
> out:
>
> kernel BUG at fs/inode.c:1680!
>
> To prevent this, this patch takes steps to dequeue the journal
> descriptor from the list when it cannot get a replacement inode. So
> unmount won't find it on the list and try to iput it again.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/util.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 8241029a2a5d..78cb12d0fba1 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -275,6 +275,17 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
>         if (IS_ERR(inode)) {
>                 fs_warn(sdp, "Reprocessing of jid %d failed with %ld.\n",
>                         sdp->sd_lockstruct.ls_jid, PTR_ERR(inode));
> +               /*
> +                * We couldn't get a replacement inode for our journal but we
> +                * evicted the old one. So dequeue it from the journals queue,
> +                * jindex_list, so that unmount doesn't do iput on it twice.
> +                */
> +               spin_lock(&sdp->sd_jindex_spin);
> +               list_del(&sdp->sd_jdesc->jd_list);
> +               sdp->sd_journals--;
> +               spin_unlock(&sdp->sd_jindex_spin);
> +               kfree(sdp->sd_jdesc);
> +               sdp->sd_jdesc = NULL;

Wouldn't it make more sense to set sdp->sd_jdesc->jd_inode to NULL
where we call iput() on that inode? An iput(NULL) is a no-op, so we'd
not need to change gfs2_jindex_free() to make that work.

Thanks,
Andreas

>                 goto skip_recovery;
>         }
>         sdp->sd_jdesc->jd_inode = inode;
> --
> 2.36.1
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn
  2022-07-27 16:02 ` [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn Bob Peterson
@ 2022-08-02 14:47   ` Andreas Gruenbacher
  0 siblings, 0 replies; 7+ messages in thread
From: Andreas Gruenbacher @ 2022-08-02 14:47 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jul 27, 2022 at 6:24 PM Bob Peterson <rpeterso@redhat.com> wrote:
> When a withdraw occurs, ordinary (not system) glocks may not be granted
> anymore. This patch takes measures to remove any pending waiters from
> the glocks that will never be granted.

You kind of explain why this patch is useful in the cover letter, but
not here. Could you please add that description here?

Thanks,
Andreas

> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/glock.c | 14 ++++++++++++++
>  fs/gfs2/glock.h |  1 +
>  fs/gfs2/util.c  |  5 +++++
>  3 files changed, 20 insertions(+)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index e540d1290099..0bfecffd71f1 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -2223,6 +2223,20 @@ static void dump_glock_func(struct gfs2_glock *gl)
>         dump_glock(NULL, gl, true);
>  }
>
> +static void withdraw_dq(struct gfs2_glock *gl)
> +{
> +       spin_lock(&gl->gl_lockref.lock);
> +       if (!__lockref_is_dead(&gl->gl_lockref) &&
> +           glock_blocked_by_withdraw(gl))
> +               do_error(gl, LM_OUT_ERROR); /* remove pending waiters */
> +       spin_unlock(&gl->gl_lockref.lock);
> +}
> +
> +void gfs2_gl_dq_holders(struct gfs2_sbd *sdp)
> +{
> +       glock_hash_walk(withdraw_dq, sdp);
> +}
> +
>  /**
>   * gfs2_gl_hash_clear - Empty out the glock hash table
>   * @sdp: the filesystem
> diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
> index c0ae9100a0bc..d6fc449aa7ff 100644
> --- a/fs/gfs2/glock.h
> +++ b/fs/gfs2/glock.h
> @@ -273,6 +273,7 @@ extern void gfs2_cancel_delete_work(struct gfs2_glock *gl);
>  extern bool gfs2_delete_work_queued(const struct gfs2_glock *gl);
>  extern void gfs2_flush_delete_work(struct gfs2_sbd *sdp);
>  extern void gfs2_gl_hash_clear(struct gfs2_sbd *sdp);
> +extern void gfs2_gl_dq_holders(struct gfs2_sbd *sdp);
>  extern void gfs2_glock_finish_truncate(struct gfs2_inode *ip);
>  extern void gfs2_glock_thaw(struct gfs2_sbd *sdp);
>  extern void gfs2_glock_add_to_lru(struct gfs2_glock *gl);
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 78cb12d0fba1..acf94c83b29b 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -164,6 +164,11 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
>                 }
>                 if (!ret)
>                         gfs2_make_fs_ro(sdp);
> +               /*
> +                * Dequeue any pending non-system glock holders that can no
> +                * longer be granted because the file system is withdrawn.
> +                */
> +               gfs2_gl_dq_holders(sdp);
>                 gfs2_freeze_unlock(&freeze_gh);
>         }
>
> --
> 2.36.1
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error
  2022-08-02 17:58 [Cluster-devel] [PATCH v2 0/3] gfs2: Misc withdraw patches Bob Peterson
@ 2022-08-02 17:58 ` Bob Peterson
  0 siblings, 0 replies; 7+ messages in thread
From: Bob Peterson @ 2022-08-02 17:58 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a gfs2 file system is withdrawn it does iput on its journal to
allow recovery from another cluster node. If it's unable to get a
replacement inode for whatever reason, the journal descriptor would
still be pointing at the evicted inode. So when unmount clears out the
list of journals, it would do a second iput referencing the pointer.
To avoid this, set the journal inode pointer to NULL.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/util.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 8241029a2a5d..95c79a3ec161 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -204,6 +204,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 * exception code in glock_dq.
 	 */
 	iput(inode);
+	sdp->sd_jdesc->jd_inode = NULL;
 	/*
 	 * Wait until the journal inode's glock is freed. This allows try locks
 	 * on other nodes to be successful, otherwise we remain the owner of
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-02 17:58 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27 16:02 [Cluster-devel] [PATCH 0/3] Misc withdraw patches Bob Peterson
2022-07-27 16:02 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson
2022-08-02 14:37   ` Andreas Gruenbacher
2022-07-27 16:02 ` [Cluster-devel] [PATCH 2/3] gfs2: Dequeue waiters when withdrawn Bob Peterson
2022-08-02 14:47   ` Andreas Gruenbacher
2022-07-27 16:02 ` [Cluster-devel] [PATCH 3/3] gfs2: Clear GLF_LOCK when withdraw prevents xmote Bob Peterson
2022-08-02 17:58 [Cluster-devel] [PATCH v2 0/3] gfs2: Misc withdraw patches Bob Peterson
2022-08-02 17:58 ` [Cluster-devel] [PATCH 1/3] gfs2: Prevent double iput for journal on error Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.