[Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection
@ 2019-05-23 13:03 Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
                   ` (25 more replies)
  0 siblings, 26 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here is version 6 of the patch set I posted on 23 April. It is revised
based on additional bugs I found testing with xfstests, and problems
found by Andreas.

The first 8 are cleanups, the rest are bug fixes.

This is a collection of patches I've been using to address the myriad
of recovery problems I've found. There aren't many other dependencies
between patches, so many could be accepted or rejected individually.


Bob Peterson (26):
  gfs2: kthread and remount improvements
  gfs2: eliminate tr_num_revoke_rm
  gfs2: log which portion of the journal is replayed
  gfs2: Warn when a journal replay overwrites a rgrp with buffers
  gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN
  gfs2: simplify gfs2_freeze by removing case
  gfs2: dump fsid when dumping glock problems
  gfs2: replace more printk with calls to fs_info and friends
  gfs2: Introduce concept of a pending withdraw
  gfs2: fix infinite loop in gfs2_ail1_flush on io error
  gfs2: log error reform
  gfs2: Only complain the first time an io error occurs in quota or log
  gfs2: Stop ail1 wait loop when withdrawn
  gfs2: Ignore dlm recovery requests if gfs2 is withdrawn
  gfs2: move check_journal_clean to util.c for future use
  gfs2: Allow some glocks to be used during withdraw
  gfs2: Don't loop forever in gfs2_freeze if withdrawn
  gfs2: Make secondary withdrawers wait for first withdrawer
  gfs2: Don't write log headers after file system withdraw
  gfs2: Force withdraw to replay journals and wait for it to finish
  gfs2: fix infinite loop when checking ail item count before go_inval
  gfs2: Add verbose option to check_journal_clean
  gfs2: Abort gfs2_freeze if io error is seen
  gfs2: Issue revokes more intelligently
  gfs2: Prepare to withdraw as soon as an IO error occurs in log write
  gfs2: Check for log write errors before telling dlm to unlock

 fs/gfs2/aops.c       |   4 +-
 fs/gfs2/bmap.c       |   2 +-
 fs/gfs2/file.c       |   2 +-
 fs/gfs2/glock.c      | 121 +++++++++++++++++-----
 fs/gfs2/glock.h      |  12 ++-
 fs/gfs2/glops.c      | 108 ++++++++++++++++++--
 fs/gfs2/glops.h      |   3 +-
 fs/gfs2/incore.h     |  30 ++++--
 fs/gfs2/inode.c      |  14 ++-
 fs/gfs2/lock_dlm.c   |  52 ++++++++++
 fs/gfs2/log.c        | 113 ++++++++++----------
 fs/gfs2/lops.c       |  28 ++++-
 fs/gfs2/meta_io.c    |   6 +-
 fs/gfs2/ops_fstype.c |  65 +++---------
 fs/gfs2/quota.c      |  10 +-
 fs/gfs2/recovery.c   |   8 +-
 fs/gfs2/rgrp.c       |  48 +++++----
 fs/gfs2/rgrp.h       |   3 +-
 fs/gfs2/super.c      | 125 ++++++++++++++++-------
 fs/gfs2/super.h      |   1 +
 fs/gfs2/sys.c        |  14 ++-
 fs/gfs2/trans.c      |   6 +-
 fs/gfs2/util.c       | 238 +++++++++++++++++++++++++++++++++++++++++--
 fs/gfs2/util.h       |  15 +++
 24 files changed, 785 insertions(+), 243 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
@ 2019-05-23 13:03 ` Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm Bob Peterson
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, gfs2 saved the pointers to the two daemon threads
(logd and quotad) in the superblock, but they were never cleared,
even if the threads were stopped (e.g. on remount -o ro). That meant
that certain error conditions (like a withdrawn file system) could
race. For example, xfstests generic/361 caused an IO error during
remount -o ro, which caused the kthreads to be stopped, then the
error flagged. Later, when the test unmounted the file system, it
would try to stop the threads a second time with kthread_stop.

This patch does two things: First, every time it stops the threads
it zeroes out the thread pointer, and also checks whether it's NULL
before trying to stop it. Second, in function gfs2_remount_fs, it
was returning if an error was logged by either of the two functions
for gfs2_make_fs_ro and _rw, which caused it to bypass the online
uevent at the bottom of the function. This removes that bypass in
favor of just running the whole function, then returning the error.
That way, unmounts and remounts won't hang forever.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index fbf6b1fd330b..a468e58fcda4 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -397,6 +397,7 @@ static int init_threads(struct gfs2_sbd *sdp)

 fail:
 	kthread_stop(sdp->sd_logd_process);
+	sdp->sd_logd_process = NULL;
 	return error;
 }

@@ -454,8 +455,12 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 	freeze_gh.gh_flags |= GL_NOCACHE;
 	gfs2_glock_dq_uninit(&freeze_gh);
 fail_threads:
-	kthread_stop(sdp->sd_quotad_process);
-	kthread_stop(sdp->sd_logd_process);
+	if (sdp->sd_quotad_process)
+		kthread_stop(sdp->sd_quotad_process);
+	sdp->sd_quotad_process = NULL;
+	if (sdp->sd_logd_process)
+		kthread_stop(sdp->sd_logd_process);
+	sdp->sd_logd_process = NULL;
 	return error;
 }

@@ -856,8 +861,12 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 		return error;

 	flush_workqueue(gfs2_delete_workqueue);
-	kthread_stop(sdp->sd_quotad_process);
-	kthread_stop(sdp->sd_logd_process);
+	if (sdp->sd_quotad_process)
+		kthread_stop(sdp->sd_quotad_process);
+	sdp->sd_quotad_process = NULL;
+	if (sdp->sd_logd_process)
+		kthread_stop(sdp->sd_logd_process);
+	sdp->sd_logd_process = NULL;

 	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
@@ -1276,8 +1285,6 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
 			error = gfs2_make_fs_ro(sdp);
 		else
 			error = gfs2_make_fs_rw(sdp);
-		if (error)
-			return error;
 	}

 	sdp->sd_args = args;
@@ -1303,7 +1310,7 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
 	spin_unlock(&gt->gt_spin);

 	gfs2_online_uevent(sdp);
-	return 0;
+	return error;
 }

 /**
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
@ 2019-05-23 13:03 ` Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed Bob Peterson
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed in their own list, linked from the superblock.
So it's entirely unnecessary to keep separate per-transaction counts
for revokes added and removed. A single count will do the same job.
Therefore, this patch combines the transaction revokes into a single
count.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h | 1 -
 fs/gfs2/log.c    | 3 +--
 fs/gfs2/trans.c  | 6 +++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b15755068593..e22756214570 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -507,7 +507,6 @@ struct gfs2_trans {
 	unsigned int tr_num_buf_rm;
 	unsigned int tr_num_databuf_rm;
 	unsigned int tr_num_revoke;
-	unsigned int tr_num_revoke_rm;
 
 	struct list_head tr_list;
 	struct list_head tr_databuf;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index a2e1df488df0..ed64a3e48a66 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -887,7 +887,6 @@ static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
 	old->tr_num_buf_rm	+= new->tr_num_buf_rm;
 	old->tr_num_databuf_rm	+= new->tr_num_databuf_rm;
 	old->tr_num_revoke	+= new->tr_num_revoke;
-	old->tr_num_revoke_rm	+= new->tr_num_revoke_rm;
 
 	list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
 	list_splice_tail_init(&new->tr_buf, &old->tr_buf);
@@ -909,7 +908,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 		set_bit(TR_ATTACHED, &tr->tr_flags);
 	}
 
-	sdp->sd_log_commited_revoke += tr->tr_num_revoke - tr->tr_num_revoke_rm;
+	sdp->sd_log_commited_revoke += tr->tr_num_revoke;
 	reserved = calc_reserved(sdp);
 	maxres = sdp->sd_log_blks_reserved + tr->tr_reserved;
 	gfs2_assert_withdraw(sdp, maxres >= reserved);
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index d328da7cde36..7ebc0f99deb1 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -80,10 +80,10 @@ static void gfs2_print_trans(struct gfs2_sbd *sdp, const struct gfs2_trans *tr)
 	fs_warn(sdp, "blocks=%u revokes=%u reserved=%u touched=%u\n",
 		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved,
 		test_bit(TR_TOUCHED, &tr->tr_flags));
-	fs_warn(sdp, "Buf %u/%u Databuf %u/%u Revoke %u/%u\n",
+	fs_warn(sdp, "Buf %u/%u Databuf %u/%u Revoke %u\n",
 		tr->tr_num_buf_new, tr->tr_num_buf_rm,
 		tr->tr_num_databuf_new, tr->tr_num_databuf_rm,
-		tr->tr_num_revoke, tr->tr_num_revoke_rm);
+		tr->tr_num_revoke);
 }
 
 void gfs2_trans_end(struct gfs2_sbd *sdp)
@@ -266,7 +266,7 @@ void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
-			tr->tr_num_revoke_rm++;
+			tr->tr_num_revoke--;
 			if (--n == 0)
 				break;
 		}
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm Bob Peterson
@ 2019-05-23 13:03 ` Bob Peterson
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a journal is replayed, gfs2 logs a message similar to:

jid=X: Replaying journal...

This patch adds the tail and block number so that the range of the
replayed block is also printed. These values will match the values
shown if the journal is dumped with gfs2_edit -p journalX. The
resulting output looks something like this:

jid=1: Replaying journal...0x28b7 to 0x2beb

This will allow us to better debug file system corruption problems.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/recovery.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 389b3ef77e20..4ce2bfdbefdc 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -391,7 +391,8 @@ void gfs2_recover_func(struct work_struct *work)
 		}
 
 		t_tlck = ktime_get();
-		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);
+		fs_info(sdp, "jid=%u: Replaying journal...0x%x to 0x%x\n",
+			jd->jd_jid, head.lh_tail, head.lh_blkno);
 
 		for (pass = 0; pass < 2; pass++) {
 			lops_before_scan(jd, &head, pass);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (2 preceding siblings ...)
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed Bob Peterson
@ 2019-05-23 13:03 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN Bob Peterson
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.

When this problem occurs, it's a situation in which this node has
been granted a rgrp glock and subsequently read in buffer_heads for
it, and possibly even made changes to the rgrp bits and/or
allocation values. But now another node has failed and forced us to
replay its journal, but its journal contains a copy of the same
rgrp, without a revoke, which means we're about to overwrite a
rgrp that we now rightfully own, with an obsolete copy. That is
always a problem. It means the other node (which failed and left
its journal to be replayed) failed to flush out its rgrp buffers,
write out the revoke, and invalidate its copy before it released
the glock to our possession.

No node should ever release a glock until its metadata has been
written to the journal and revoked and invalidated..

We also kludge around the problem and refuse to replace our good
copy with the journals bad copy by not marking the buffer dirty,
but never do it silently. That's wallpapering over a larger problem
that still exists. IOW, if this situation can happen to this node,
it can also happen to a different node and we wouldn't even know it
or be able to circumvent it: Suppose we have a 3-node cluster:
Node 1 fails, leaving an obsolete rgrp block in its journal without
a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
released and starts making changes, allocating and freeing blocks
from the rgrp, etc. Node 3 replays the journal from node 1,
oblivious and unaware that it's about to overwrite node 2's changes.
So we still need to be vocal and log the error to make it apparent
that a corruption path still exists in gfs2.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 33ab662c9aac..ac73aa48674b 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -762,9 +762,27 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, u32 start,

 		if (gfs2_meta_check(sdp, bh_ip))
 			error = -EIO;
-		else
+		else {
+			struct gfs2_meta_header *mh =
+				(struct gfs2_meta_header *)bh_ip->b_data;
+
+			if (mh->mh_type == cpu_to_be32(GFS2_METATYPE_RG)) {
+				struct gfs2_rgrpd *rgd;
+
+				rgd = gfs2_blk2rgrpd(sdp, blkno, false);
+				if (rgd && rgd->rd_addr == blkno &&
+				    rgd->rd_bits && rgd->rd_bits->bi_bh) {
+					fs_info(sdp, "Replaying 0x%llx but we "
+						"already have a bh!\n",
+						(unsigned long long)blkno);
+					fs_info(sdp, "busy:%d, pinned:%d\n",
+						buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0,
+						buffer_pinned(rgd->rd_bits->bi_bh));
+					gfs2_dump_glock(NULL, rgd->rd_gl);
+				}
+			}
 			mark_buffer_dirty(bh_ip);
-
+		}
 		brelse(bh_log);
 		brelse(bh_ip);

-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (3 preceding siblings ...)
  2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case Bob Peterson
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, the superblock flag indicating when a file system
is withdrawn was called SDF_SHUTDOWN. This patch simply renames it to
the more obvious SDF_WITHDRAWN.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c       | 4 ++--
 fs/gfs2/file.c       | 2 +-
 fs/gfs2/glock.c      | 6 +++---
 fs/gfs2/glops.c      | 2 +-
 fs/gfs2/incore.h     | 2 +-
 fs/gfs2/meta_io.c    | 6 +++---
 fs/gfs2/ops_fstype.c | 2 +-
 fs/gfs2/quota.c      | 2 +-
 fs/gfs2/super.c      | 6 +++---
 fs/gfs2/sys.c        | 2 +-
 fs/gfs2/util.c       | 4 ++--
 11 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 6210d4429d84..f296d8e67c20 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page)
 		error = mpage_readpage(page, gfs2_block_map);
 	}
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		return -EIO;
 
 	return error;
@@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		ret = -EIO;
 	return ret;
 }
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 58a768e59712..699ab831ec65 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
 		cmd = F_SETLK;
 		fl->fl_type = F_UNLCK;
 	}
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags))) {
 		if (fl->fl_type == F_UNLCK)
 			locks_lock_file_wait(file, fl);
 		return -EIO;
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 15c605cfcfc8..5e074fbf3796 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -547,7 +547,7 @@ __acquires(&gl->gl_lockref.lock)
 	unsigned int lck_flags = (unsigned int)(gh ? gh->gh_flags : 0);
 	int ret;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) &&
 	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
@@ -584,7 +584,7 @@ __acquires(&gl->gl_lockref.lock)
 		}
 		else if (ret) {
 			fs_err(sdp, "lm_lock ret %d\n", ret);
-			GLOCK_BUG_ON(gl, !test_bit(SDF_SHUTDOWN,
+			GLOCK_BUG_ON(gl, !test_bit(SDF_WITHDRAWN,
 						   &sdp->sd_flags));
 		}
 	} else { /* lock_nolock */
@@ -1097,7 +1097,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 24ada3ccc525..5cecde79c63f 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -539,7 +539,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh)
 			gfs2_consist(sdp);
 
 		/*  Initialize some head of the log stuff  */
-		if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+		if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
 			sdp->sd_log_sequence = head.lh_sequence + 1;
 			gfs2_log_pointers_init(sdp, head.lh_blkno);
 		}
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index e22756214570..a4a2c7e07ba7 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -611,7 +611,7 @@ struct gfs2_tune {
 enum {
 	SDF_JOURNAL_CHECKED	= 0,
 	SDF_JOURNAL_LIVE	= 1,
-	SDF_SHUTDOWN		= 2,
+	SDF_WITHDRAWN		= 2,
 	SDF_NOBARRIERS		= 3,
 	SDF_NORECOVERY		= 4,
 	SDF_DEMOTE		= 5,
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index ff86e1d4f8ff..df8193d48177 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -254,7 +254,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	struct buffer_head *bh, *bhs[2];
 	int num = 0;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags))) {
 		*bhp = NULL;
 		return -EIO;
 	}
@@ -312,7 +312,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 
 int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 {
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		return -EIO;
 
 	wait_on_buffer(bh);
@@ -323,7 +323,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 			gfs2_io_error_bh_wd(sdp, bh);
 		return -EIO;
 	}
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		return -EIO;
 
 	return 0;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 46f6615eaf12..f836ae4f7fce 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -999,7 +999,7 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
 void gfs2_lm_unmount(struct gfs2_sbd *sdp)
 {
 	const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
-	if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
+	if (likely(!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) &&
 	    lm->lm_unmount)
 		lm->lm_unmount(sdp);
 }
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 2ae5a109eea7..33d5063e2c7f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1478,7 +1478,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 {
 	if (error == 0 || error == -EROFS)
 		return;
-	if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+	if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
 		fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
 		sdp->sd_log_error = error;
 		wake_up(&sdp->sd_logd_waitq);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a468e58fcda4..01137635fb44 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -808,7 +808,7 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 
 	if (!(flags & I_DIRTY_INODE))
 		return;
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
 		return;
 	if (!gfs2_glock_is_locked_by_me(ip->i_gl)) {
 		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
@@ -857,7 +857,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, GL_NOCACHE,
 				   &freeze_gh);
-	if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+	if (error && !test_bit(SDF_WITHDRAWN, &sdp->sd_flags))
 		return error;
 
 	flush_workqueue(gfs2_delete_workqueue);
@@ -1016,7 +1016,7 @@ static int gfs2_freeze(struct super_block *sb)
 	if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN)
 		goto out;
 
-	if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+	if (test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
 		error = -EINVAL;
 		goto out;
 	}
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 08e4996adc23..4448e269aa12 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -121,7 +121,7 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 
 static ssize_t withdraw_show(struct gfs2_sbd *sdp, char *buf)
 {
-	unsigned int b = test_bit(SDF_SHUTDOWN, &sdp->sd_flags);
+	unsigned int b = test_bit(SDF_WITHDRAWN, &sdp->sd_flags);
 	return snprintf(buf, PAGE_SIZE, "%u\n", b);
 }
 
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 0a814ccac41d..3b729f49fbb1 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -44,7 +44,7 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 	struct va_format vaf;
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW &&
-	    test_and_set_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+	    test_and_set_bit(SDF_WITHDRAWN, &sdp->sd_flags))
 		return 0;
 
 	if (fmt) {
@@ -259,7 +259,7 @@ void gfs2_io_error_bh_i(struct gfs2_sbd *sdp, struct buffer_head *bh,
 			const char *function, char *file, unsigned int line,
 			bool withdraw)
 {
-	if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+	if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags))
 		fs_err(sdp,
 		       "fatal: I/O error\n"
 		       "  block = %llu\n"
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (4 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems Bob Peterson
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function gfs2_freeze had a case statement that simply checked the
error code, but the break statements just made the logic hard to
read. This patch simplifies the logic in favor of a simple if.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 01137635fb44..6e3a318edccd 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1026,20 +1026,14 @@ static int gfs2_freeze(struct super_block *sb)
 		if (!error)
 			break;
 
-		switch (error) {
-		case -EBUSY:
+		if (error == -EBUSY)
 			fs_err(sdp, "waiting for recovery before freeze\n");
-			break;
-
-		default:
+		else
 			fs_err(sdp, "error freezing FS: %d\n", error);
-			break;
-		}
 
 		fs_err(sdp, "retrying...\n");
 		msleep(1000);
 	}
-	error = 0;
 	set_bit(SDF_FS_FROZEN, &sdp->sd_flags);
 out:
 	mutex_unlock(&sdp->sd_freeze_mutex);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (5 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, if a glock error was encountered, the glock with
the problem was dumped. But sometimes you may have lots of file systems
mounted, and that doesn't tell you which file system it was for.

This patch adds a new boolean parameter fsid to the dump_glock family
of functions. For non-error cases, such as dumping the glocks debugfs
file, the fsid is not dumped in order to keep lock dumps and glocktop
as clean as possible. For all error cases, such as GLOCK_BUG_ON, the
file system id is now printed. This will make it easier to debug.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c  | 34 +++++++++++++++++++++-------------
 fs/gfs2/glock.h  | 11 +++++++----
 fs/gfs2/glops.c  |  7 +++++--
 fs/gfs2/incore.h |  3 ++-
 fs/gfs2/lops.c   |  2 +-
 fs/gfs2/rgrp.c   | 21 ++++++++++++++-------
 fs/gfs2/rgrp.h   |  3 ++-
 fs/gfs2/util.c   |  4 +++-
 8 files changed, 55 insertions(+), 30 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5e074fbf3796..3663643dc80d 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1078,7 +1078,7 @@ __acquires(&gl->gl_lockref.lock)
 	fs_err(sdp, "pid: %d\n", pid_nr(gh->gh_owner_pid));
 	fs_err(sdp, "lock type: %d req lock state : %d\n",
 	       gh->gh_gl->gl_name.ln_type, gh->gh_state);
-	gfs2_dump_glock(NULL, gl);
+	gfs2_dump_glock(NULL, gl, true);
 	BUG();
 }
 
@@ -1613,16 +1613,16 @@ void gfs2_glock_thaw(struct gfs2_sbd *sdp)
 	glock_hash_walk(thaw_glock, sdp);
 }
 
-static void dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
+static void dump_glock(struct seq_file *seq, struct gfs2_glock *gl, bool fsid)
 {
 	spin_lock(&gl->gl_lockref.lock);
-	gfs2_dump_glock(seq, gl);
+	gfs2_dump_glock(seq, gl, fsid);
 	spin_unlock(&gl->gl_lockref.lock);
 }
 
 static void dump_glock_func(struct gfs2_glock *gl)
 {
-	dump_glock(NULL, gl);
+	dump_glock(NULL, gl, true);
 }
 
 /**
@@ -1707,10 +1707,12 @@ static const char *hflags2str(char *buf, u16 flags, unsigned long iflags)
  * dump_holder - print information about a glock holder
  * @seq: the seq_file struct
  * @gh: the glock holder
+ * @fs_id_buf: pointer to file system id (if requested)
  *
  */
 
-static void dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
+static void dump_holder(struct gfs2_sbd *sdp, struct seq_file *seq,
+			const struct gfs2_holder *gh, const char *fs_id_buf)
 {
 	struct task_struct *gh_owner = NULL;
 	char flags_buf[32];
@@ -1718,8 +1720,8 @@ static void dump_holder(struct seq_file *seq, const struct gfs2_holder *gh)
 	rcu_read_lock();
 	if (gh->gh_owner_pid)
 		gh_owner = pid_task(gh->gh_owner_pid, PIDTYPE_PID);
-	gfs2_print_dbg(seq, " H: s:%s f:%s e:%d p:%ld [%s] %pS\n",
-		       state2str(gh->gh_state),
+	gfs2_print_dbg(seq, "%s H: s:%s f:%s e:%d p:%ld [%s] %pS\n",
+		       fs_id_buf, state2str(gh->gh_state),
 		       hflags2str(flags_buf, gh->gh_flags, gh->gh_iflags),
 		       gh->gh_error,
 		       gh->gh_owner_pid ? (long)pid_nr(gh->gh_owner_pid) : -1,
@@ -1769,6 +1771,7 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
  * gfs2_dump_glock - print information about a glock
  * @seq: The seq_file struct
  * @gl: the glock
+ * @fsid: If true, also dump the file system id
  *
  * The file format is as follows:
  * One line per object, capital letters are used to indicate objects
@@ -1782,19 +1785,24 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
  *
  */
 
-void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
+void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl, bool fsid)
 {
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	unsigned long long dtime;
 	const struct gfs2_holder *gh;
 	char gflags_buf[32];
+	char fs_id_buf[GFS2_FSNAME_LEN + 3 * sizeof(int) + 2];
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
+	memset(fs_id_buf, 0, sizeof(fs_id_buf));
+	if (fsid && sdp) /* safety precaution */
+		sprintf(fs_id_buf, "fsid=%s: ", sdp->sd_fsname);
 	dtime = jiffies - gl->gl_demote_time;
 	dtime *= 1000000/HZ; /* demote time in uSec */
 	if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
 		dtime = 0;
-	gfs2_print_dbg(seq, "G:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d v:%d r:%d m:%ld\n",
-		  state2str(gl->gl_state),
+	gfs2_print_dbg(seq, "%sG:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d "
+		       "v:%d r:%d m:%ld\n", fs_id_buf, state2str(gl->gl_state),
 		  gl->gl_name.ln_type,
 		  (unsigned long long)gl->gl_name.ln_number,
 		  gflags2str(gflags_buf, gl),
@@ -1805,10 +1813,10 @@ void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl)
 		  (int)gl->gl_lockref.count, gl->gl_hold_time);
 
 	list_for_each_entry(gh, &gl->gl_holders, gh_list)
-		dump_holder(seq, gh);
+		dump_holder(sdp, seq, gh, fs_id_buf);
 
 	if (gl->gl_state != LM_ST_UNLOCKED && glops->go_dump)
-		glops->go_dump(seq, gl);
+		glops->go_dump(seq, gl, fs_id_buf);
 }
 
 static int gfs2_glstats_seq_show(struct seq_file *seq, void *iter_ptr)
@@ -2009,7 +2017,7 @@ static void gfs2_glock_seq_stop(struct seq_file *seq, void *iter_ptr)
 
 static int gfs2_glock_seq_show(struct seq_file *seq, void *iter_ptr)
 {
-	dump_glock(seq, iter_ptr);
+	dump_glock(seq, iter_ptr, false);
 	return 0;
 }
 
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 936b3295839c..ea1568db0452 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -202,8 +202,11 @@ extern int gfs2_glock_nq_num(struct gfs2_sbd *sdp, u64 number,
 			     struct gfs2_holder *gh);
 extern int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 extern void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
-extern void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl);
-#define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { gfs2_dump_glock(NULL, gl); BUG(); } } while(0)
+extern void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl,
+			    bool fsid);
+#define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) {		\
+			gfs2_dump_glock(NULL, gl, true);	\
+			BUG(); } } while(0)
 extern __printf(2, 3)
 void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);
 
@@ -269,7 +272,7 @@ static inline void glock_set_object(struct gfs2_glock *gl, void *object)
 {
 	spin_lock(&gl->gl_lockref.lock);
 	if (gfs2_assert_warn(gl->gl_name.ln_sbd, gl->gl_object == NULL))
-		gfs2_dump_glock(NULL, gl);
+		gfs2_dump_glock(NULL, gl, true);
 	gl->gl_object = object;
 	spin_unlock(&gl->gl_lockref.lock);
 }
@@ -281,7 +284,7 @@ static inline void glock_set_object(struct gfs2_glock *gl, void *object)
  *
  * I'd love to similarly add this:
  *	else if (gfs2_assert_warn(gl->gl_sbd, gl->gl_object == object))
- *		gfs2_dump_glock(NULL, gl);
+ *		gfs2_dump_glock(NULL, gl, true);
  * Unfortunately, that's not possible because as soon as gfs2_delete_inode
  * frees the block in the rgrp, another process can reassign it for an I_NEW
  * inode in gfs2_create_inode because that calls new_inode, not gfs2_iget.
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 5cecde79c63f..84a403ed6e77 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -464,10 +464,12 @@ static int inode_go_lock(struct gfs2_holder *gh)
  * inode_go_dump - print information about an inode
  * @seq: The iterator
  * @ip: the inode
+ * @fs_id_buf: file system id (may be empty)
  *
  */
 
-static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl)
+static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl,
+			  const char *fs_id_buf)
 {
 	struct gfs2_inode *ip = gl->gl_object;
 	struct inode *inode = &ip->i_inode;
@@ -480,7 +482,8 @@ static void inode_go_dump(struct seq_file *seq, struct gfs2_glock *gl)
 	nrpages = inode->i_data.nrpages;
 	xa_unlock_irq(&inode->i_data.i_pages);
 
-	gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu p:%lu\n",
+	gfs2_print_dbg(seq, "%s I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu "
+		       "p:%lu\n", fs_id_buf,
 		  (unsigned long long)ip->i_no_formal_ino,
 		  (unsigned long long)ip->i_no_addr,
 		  IF2DT(ip->i_inode.i_mode), ip->i_flags,
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a4a2c7e07ba7..52c5777289f5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -243,7 +243,8 @@ struct gfs2_glock_operations {
 	int (*go_demote_ok) (const struct gfs2_glock *gl);
 	int (*go_lock) (struct gfs2_holder *gh);
 	void (*go_unlock) (struct gfs2_holder *gh);
-	void (*go_dump)(struct seq_file *seq, struct gfs2_glock *gl);
+	void (*go_dump)(struct seq_file *seq, struct gfs2_glock *gl,
+			const char *fs_id_buf);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
 	const int go_type;
 	const unsigned long go_flags;
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index ac73aa48674b..fd2e452ad327 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -778,7 +778,7 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, u32 start,
 					fs_info(sdp, "busy:%d, pinned:%d\n",
 						buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0,
 						buffer_pinned(rgd->rd_bits->bi_bh));
-					gfs2_dump_glock(NULL, rgd->rd_gl);
+					gfs2_dump_glock(NULL, rgd->rd_gl, true);
 				}
 			}
 			mark_buffer_dirty(bh_ip);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 15d6e32de55f..ed3a6d3973a9 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -613,11 +613,12 @@ int gfs2_rsqa_alloc(struct gfs2_inode *ip)
 	return gfs2_qa_alloc(ip);
 }
 
-static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs)
+static void dump_rs(struct seq_file *seq, const struct gfs2_blkreserv *rs,
+		    const char *fs_id_buf)
 {
 	struct gfs2_inode *ip = container_of(rs, struct gfs2_inode, i_res);
 
-	gfs2_print_dbg(seq, "  B: n:%llu s:%llu b:%u f:%u\n",
+	gfs2_print_dbg(seq, "%s  B: n:%llu s:%llu b:%u f:%u\n", fs_id_buf,
 		       (unsigned long long)ip->i_no_addr,
 		       (unsigned long long)gfs2_rbm_to_block(&rs->rs_rbm),
 		       rs->rs_rbm.offset, rs->rs_free);
@@ -2249,10 +2250,12 @@ static void rgblk_free(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd,
  * gfs2_rgrp_dump - print out an rgrp
  * @seq: The iterator
  * @gl: The glock in question
+ * @fs_id_buf: pointer to file system id (if requested)
  *
  */
 
-void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl)
+void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl,
+		    const char *fs_id_buf)
 {
 	struct gfs2_rgrpd *rgd = gl->gl_object;
 	struct gfs2_blkreserv *trs;
@@ -2260,14 +2263,15 @@ void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl)
 
 	if (rgd == NULL)
 		return;
-	gfs2_print_dbg(seq, " R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
+	gfs2_print_dbg(seq, "%s R: n:%llu f:%02x b:%u/%u i:%u r:%u e:%u\n",
+		       fs_id_buf,
 		       (unsigned long long)rgd->rd_addr, rgd->rd_flags,
 		       rgd->rd_free, rgd->rd_free_clone, rgd->rd_dinodes,
 		       rgd->rd_reserved, rgd->rd_extfail_pt);
 	if (rgd->rd_sbd->sd_args.ar_rgrplvb) {
 		struct gfs2_rgrp_lvb *rgl = rgd->rd_rgl;
 
-		gfs2_print_dbg(seq, "  L: f:%02x b:%u i:%u\n",
+		gfs2_print_dbg(seq, "%s  L: f:%02x b:%u i:%u\n", fs_id_buf,
 			       be32_to_cpu(rgl->rl_flags),
 			       be32_to_cpu(rgl->rl_free),
 			       be32_to_cpu(rgl->rl_dinodes));
@@ -2275,7 +2279,7 @@ void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl)
 	spin_lock(&rgd->rd_rsspin);
 	for (n = rb_first(&rgd->rd_rstree); n; n = rb_next(&trs->rs_node)) {
 		trs = rb_entry(n, struct gfs2_blkreserv, rs_node);
-		dump_rs(seq, trs);
+		dump_rs(seq, trs, fs_id_buf);
 	}
 	spin_unlock(&rgd->rd_rsspin);
 }
@@ -2283,10 +2287,13 @@ void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl)
 static void gfs2_rgrp_error(struct gfs2_rgrpd *rgd)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
+	char fs_id_buf[GFS2_FSNAME_LEN + 3 * sizeof(int) + 2];
+
 	fs_warn(sdp, "rgrp %llu has an error, marking it readonly until umount\n",
 		(unsigned long long)rgd->rd_addr);
 	fs_warn(sdp, "umount on all nodes and run fsck.gfs2 to fix the error\n");
-	gfs2_rgrp_dump(NULL, rgd->rd_gl);
+	sprintf(fs_id_buf, "fsid=%s: ", sdp->sd_fsname);
+	gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
 	rgd->rd_flags |= GFS2_RDF_ERROR;
 }
 
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 499079a9dbbe..c809f4654718 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -72,7 +72,8 @@ extern void gfs2_rlist_add(struct gfs2_inode *ip, struct gfs2_rgrp_list *rlist,
 extern void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist);
 extern void gfs2_rlist_free(struct gfs2_rgrp_list *rlist);
 extern u64 gfs2_ri_total(struct gfs2_sbd *sdp);
-extern void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl);
+extern void gfs2_rgrp_dump(struct seq_file *seq, struct gfs2_glock *gl,
+			   const char *fs_id_buf);
 extern int gfs2_rgrp_send_discards(struct gfs2_sbd *sdp, u64 offset,
 				   struct buffer_head *bh,
 				   const struct gfs2_bitmap *bi, unsigned minlen, u64 *ptrimmed);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 3b729f49fbb1..084a76ffaf65 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -181,9 +181,11 @@ int gfs2_consist_rgrpd_i(struct gfs2_rgrpd *rgd, int cluster_wide,
 			 const char *function, char *file, unsigned int line)
 {
 	struct gfs2_sbd *sdp = rgd->rd_sbd;
+	char fs_id_buf[GFS2_FSNAME_LEN + 3 * sizeof(int) + 2];
 	int rv;
 
-	gfs2_rgrp_dump(NULL, rgd->rd_gl);
+	sprintf(fs_id_buf, "fsid=%s: ", sdp->sd_fsname);
+	gfs2_rgrp_dump(NULL, rgd->rd_gl, fs_id_buf);
 	rv = gfs2_lm_withdraw(sdp,
 			      "fatal: filesystem consistency error\n"
 			      "  RG = %llu\n"
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (6 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-29 16:20   ` Andreas Gruenbacher
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw Bob Peterson
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch replaces a few leftover printk errors with calls to
fs_info and similar, so that the file system having the error is
properly logged.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/bmap.c  |  2 +-
 fs/gfs2/glops.c |  3 ++-
 fs/gfs2/rgrp.c  | 27 ++++++++++++++-------------
 fs/gfs2/super.c |  6 +++---
 4 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
index f42718dd292f..a809aa887521 100644
--- a/fs/gfs2/bmap.c
+++ b/fs/gfs2/bmap.c
@@ -1862,7 +1862,7 @@ static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length)
 			gfs2_assert_withdraw(sdp, bh);
 			if (gfs2_assert_withdraw(sdp,
 						 prev_bnr != bh->b_blocknr)) {
-				printk(KERN_EMERG "GFS2: fsid=%s:inode %llu, "
+				fs_emerg(sdp, "GFS2: fsid=%s:inode %llu, "
 				       "block:%llu, i_h:%u, s_h:%u, mp_h:%u\n",
 				       sdp->sd_fsname,
 				       (unsigned long long)ip->i_no_addr,
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 84a403ed6e77..1e5720e50e9c 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -509,7 +509,8 @@ static void freeze_go_sync(struct gfs2_glock *gl)
 		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
 		error = freeze_super(sdp->sd_vfs);
 		if (error) {
-			printk(KERN_INFO "GFS2: couldn't freeze filesystem: %d\n", error);
+			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
+				error);
 			gfs2_assert_withdraw(sdp, 0);
 		}
 		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index ed3a6d3973a9..985d968b042c 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1115,32 +1115,33 @@ static int gfs2_rgrp_lvb_valid(struct gfs2_rgrpd *rgd)
 {
 	struct gfs2_rgrp_lvb *rgl = rgd->rd_rgl;
 	struct gfs2_rgrp *str = (struct gfs2_rgrp *)rgd->rd_bits[0].bi_bh->b_data;
+	struct gfs2_sbd *sdp = rgd->rd_sbd;
 	int valid = 1;
 
 	if (rgl->rl_flags != str->rg_flags) {
-		printk(KERN_WARNING "GFS2: rgd: %llu lvb flag mismatch %u/%u",
-		       (unsigned long long)rgd->rd_addr,
+		fs_warn(sdp, "GFS2: rgd: %llu lvb flag mismatch %u/%u",
+			(unsigned long long)rgd->rd_addr,
 		       be32_to_cpu(rgl->rl_flags), be32_to_cpu(str->rg_flags));
 		valid = 0;
 	}
 	if (rgl->rl_free != str->rg_free) {
-		printk(KERN_WARNING "GFS2: rgd: %llu lvb free mismatch %u/%u",
-		       (unsigned long long)rgd->rd_addr,
-		       be32_to_cpu(rgl->rl_free), be32_to_cpu(str->rg_free));
+		fs_warn(sdp, "GFS2: rgd: %llu lvb free mismatch %u/%u",
+			(unsigned long long)rgd->rd_addr,
+			be32_to_cpu(rgl->rl_free), be32_to_cpu(str->rg_free));
 		valid = 0;
 	}
 	if (rgl->rl_dinodes != str->rg_dinodes) {
-		printk(KERN_WARNING "GFS2: rgd: %llu lvb dinode mismatch %u/%u",
-		       (unsigned long long)rgd->rd_addr,
-		       be32_to_cpu(rgl->rl_dinodes),
-		       be32_to_cpu(str->rg_dinodes));
+		fs_warn(sdp, "GFS2: rgd: %llu lvb dinode mismatch %u/%u",
+			(unsigned long long)rgd->rd_addr,
+			be32_to_cpu(rgl->rl_dinodes),
+			be32_to_cpu(str->rg_dinodes));
 		valid = 0;
 	}
 	if (rgl->rl_igeneration != str->rg_igeneration) {
-		printk(KERN_WARNING "GFS2: rgd: %llu lvb igen mismatch "
-		       "%llu/%llu", (unsigned long long)rgd->rd_addr,
-		       (unsigned long long)be64_to_cpu(rgl->rl_igeneration),
-		       (unsigned long long)be64_to_cpu(str->rg_igeneration));
+		fs_warn(sdp, "GFS2: rgd: %llu lvb igen mismatch %llu/%llu",
+			(unsigned long long)rgd->rd_addr,
+			(unsigned long long)be64_to_cpu(rgl->rl_igeneration),
+			(unsigned long long)be64_to_cpu(str->rg_igeneration));
 		valid = 0;
 	}
 	return valid;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 6e3a318edccd..62cc451f30d5 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -981,14 +981,14 @@ void gfs2_freeze_func(struct work_struct *work)
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, 0,
 				   &freeze_gh);
 	if (error) {
-		printk(KERN_INFO "GFS2: couldn't get freeze lock : %d\n", error);
+		fs_info(sdp, "GFS2: couldn't get freeze lock : %d\n", error);
 		gfs2_assert_withdraw(sdp, 0);
 	} else {
 		atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
 		error = thaw_super(sb);
 		if (error) {
-			printk(KERN_INFO "GFS2: couldn't thaw filesystem: %d\n",
-			       error);
+			fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n",
+				error);
 			gfs2_assert_withdraw(sdp, 0);
 		}
 		if (!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (7 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error Bob Peterson
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

File system withdraws can be delayed when inconsistencies are
discovered when we cannot withdraw immediately, for example, when
critical spin_locks are held. But delaying the withdraw can cause
gfs2 to ignore the error and keep running for a short period of time.
For example, an rgrp glock may be dequeued and demoted while there
are still buffers that haven't been properly revoked, due to io
errors writing to the journal.

This patch introduces a new concept of a pending withdraw, which
means an inconsistency has been discovered and we need to withdraw
at the earliest possible opportunity. In these cases, we aren't
quite withdrawn yet, but we still need to not dequeue glocks and
other critical things. If we dequeue the glocks and the withdraw
results in our journal being replayed, the replay could overwrite
data that's been modified by a different node that acquired the
glock in the meantime.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c       |  4 ++--
 fs/gfs2/file.c       |  2 +-
 fs/gfs2/glock.c      |  7 +++----
 fs/gfs2/glops.c      |  2 +-
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/log.c        | 20 ++++++++------------
 fs/gfs2/meta_io.c    |  6 +++---
 fs/gfs2/ops_fstype.c |  3 +--
 fs/gfs2/quota.c      |  2 +-
 fs/gfs2/super.c      |  6 +++---
 fs/gfs2/sys.c        |  2 +-
 fs/gfs2/util.c       | 14 +++++++-------
 fs/gfs2/util.h       | 12 ++++++++++++
 13 files changed, 44 insertions(+), 37 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index f296d8e67c20..c3c328b782fa 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page)
 		error = mpage_readpage(page, gfs2_block_map);
 	}
 
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		return -EIO;
 
 	return error;
@@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		ret = -EIO;
 	return ret;
 }
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 699ab831ec65..bba39c73a43c 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
 		cmd = F_SETLK;
 		fl->fl_type = F_UNLCK;
 	}
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags))) {
+	if (unlikely(gfs2_withdrawn(sdp))) {
 		if (fl->fl_type == F_UNLCK)
 			locks_lock_file_wait(file, fl);
 		return -EIO;
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 3663643dc80d..b03f784c982f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -547,7 +547,7 @@ __acquires(&gl->gl_lockref.lock)
 	unsigned int lck_flags = (unsigned int)(gh ? gh->gh_flags : 0);
 	int ret;
 
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) &&
+	if (unlikely(gfs2_withdrawn(sdp)) &&
 	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
@@ -584,8 +584,7 @@ __acquires(&gl->gl_lockref.lock)
 		}
 		else if (ret) {
 			fs_err(sdp, "lm_lock ret %d\n", ret);
-			GLOCK_BUG_ON(gl, !test_bit(SDF_WITHDRAWN,
-						   &sdp->sd_flags));
+			GLOCK_BUG_ON(gl, !gfs2_withdrawn(sdp));
 		}
 	} else { /* lock_nolock */
 		finish_xmote(gl, target);
@@ -1097,7 +1096,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 1e5720e50e9c..4825a00b7dfa 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -543,7 +543,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh)
 			gfs2_consist(sdp);
 
 		/*  Initialize some head of the log stuff  */
-		if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
+		if (!gfs2_withdrawn(sdp)) {
 			sdp->sd_log_sequence = head.lh_sequence + 1;
 			gfs2_log_pointers_init(sdp, head.lh_blkno);
 		}
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 52c5777289f5..b261168be298 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -622,6 +622,7 @@ enum {
 	SDF_FORCE_AIL_FLUSH     = 9,
 	SDF_AIL1_IO_ERROR	= 10,
 	SDF_FS_FROZEN           = 11,
+	SDF_WITHDRAWING		= 12, /* Will withdraw eventually */
 };
 
 enum gfs2_freeze_state {
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ed64a3e48a66..6ccc5d01b813 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -92,8 +92,7 @@ static void gfs2_remove_from_ail(struct gfs2_bufdata *bd)
 
 static int gfs2_ail1_start_one(struct gfs2_sbd *sdp,
 			       struct writeback_control *wbc,
-			       struct gfs2_trans *tr,
-			       bool *withdraw)
+			       struct gfs2_trans *tr)
 __releases(&sdp->sd_ail_lock)
 __acquires(&sdp->sd_ail_lock)
 {
@@ -112,7 +111,7 @@ __acquires(&sdp->sd_ail_lock)
 			    !test_and_set_bit(SDF_AIL1_IO_ERROR,
 					      &sdp->sd_flags)) {
 				gfs2_io_error_bh(sdp, bh);
-				*withdraw = true;
+				set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 			}
 			list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
 			continue;
@@ -153,7 +152,6 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 	struct list_head *head = &sdp->sd_ail1_list;
 	struct gfs2_trans *tr;
 	struct blk_plug plug;
-	bool withdraw = false;
 
 	trace_gfs2_ail_flush(sdp, wbc, 1);
 	blk_start_plug(&plug);
@@ -162,12 +160,12 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 	list_for_each_entry_reverse(tr, head, tr_list) {
 		if (wbc->nr_to_write <= 0)
 			break;
-		if (gfs2_ail1_start_one(sdp, wbc, tr, &withdraw))
+		if (gfs2_ail1_start_one(sdp, wbc, tr))
 			goto restart;
 	}
 	spin_unlock(&sdp->sd_ail_lock);
 	blk_finish_plug(&plug);
-	if (withdraw)
+	if (test_bit(SDF_WITHDRAWING, &sdp->sd_flags))
 		gfs2_lm_withdraw(sdp, NULL);
 	trace_gfs2_ail_flush(sdp, wbc, 0);
 }
@@ -196,8 +194,7 @@ static void gfs2_ail1_start(struct gfs2_sbd *sdp)
  *
  */
 
-static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
-				bool *withdraw)
+static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 {
 	struct gfs2_bufdata *bd, *s;
 	struct buffer_head *bh;
@@ -211,7 +208,7 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
 		if (!buffer_uptodate(bh) &&
 		    !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
 			gfs2_io_error_bh(sdp, bh);
-			*withdraw = true;
+			set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 		}
 		list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
 	}
@@ -229,11 +226,10 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 	struct gfs2_trans *tr, *s;
 	int oldest_tr = 1;
 	int ret;
-	bool withdraw = false;
 
 	spin_lock(&sdp->sd_ail_lock);
 	list_for_each_entry_safe_reverse(tr, s, &sdp->sd_ail1_list, tr_list) {
-		gfs2_ail1_empty_one(sdp, tr, &withdraw);
+		gfs2_ail1_empty_one(sdp, tr);
 		if (list_empty(&tr->tr_ail1_list) && oldest_tr)
 			list_move(&tr->tr_list, &sdp->sd_ail2_list);
 		else
@@ -242,7 +238,7 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 	ret = list_empty(&sdp->sd_ail1_list);
 	spin_unlock(&sdp->sd_ail_lock);
 
-	if (withdraw)
+	if (test_bit(SDF_WITHDRAWING, &sdp->sd_flags))
 		gfs2_lm_withdraw(sdp, "fatal: I/O error(s)\n");
 
 	return ret;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index df8193d48177..6f2ef841a4bb 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -254,7 +254,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	struct buffer_head *bh, *bhs[2];
 	int num = 0;
 
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags))) {
+	if (unlikely(gfs2_withdrawn(sdp))) {
 		*bhp = NULL;
 		return -EIO;
 	}
@@ -312,7 +312,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 
 int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 {
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		return -EIO;
 
 	wait_on_buffer(bh);
@@ -323,7 +323,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 			gfs2_io_error_bh_wd(sdp, bh);
 		return -EIO;
 	}
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		return -EIO;
 
 	return 0;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index f836ae4f7fce..56616997ab24 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -999,8 +999,7 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
 void gfs2_lm_unmount(struct gfs2_sbd *sdp)
 {
 	const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
-	if (likely(!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) &&
-	    lm->lm_unmount)
+	if (likely(!gfs2_withdrawn(sdp)) && lm->lm_unmount)
 		lm->lm_unmount(sdp);
 }
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 33d5063e2c7f..a8dfc86fd682 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1478,7 +1478,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 {
 	if (error == 0 || error == -EROFS)
 		return;
-	if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
+	if (!gfs2_withdrawn(sdp)) {
 		fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
 		sdp->sd_log_error = error;
 		wake_up(&sdp->sd_logd_waitq);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 62cc451f30d5..5ea8d45e989d 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -808,7 +808,7 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 
 	if (!(flags & I_DIRTY_INODE))
 		return;
-	if (unlikely(test_bit(SDF_WITHDRAWN, &sdp->sd_flags)))
+	if (unlikely(gfs2_withdrawn(sdp)))
 		return;
 	if (!gfs2_glock_is_locked_by_me(ip->i_gl)) {
 		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
@@ -857,7 +857,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, GL_NOCACHE,
 				   &freeze_gh);
-	if (error && !test_bit(SDF_WITHDRAWN, &sdp->sd_flags))
+	if (error && !gfs2_withdrawn(sdp))
 		return error;
 
 	flush_workqueue(gfs2_delete_workqueue);
@@ -1016,7 +1016,7 @@ static int gfs2_freeze(struct super_block *sb)
 	if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN)
 		goto out;
 
-	if (test_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
+	if (gfs2_withdrawn(sdp)) {
 		error = -EINVAL;
 		goto out;
 	}
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 4448e269aa12..8f50c2921e6f 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -121,7 +121,7 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 
 static ssize_t withdraw_show(struct gfs2_sbd *sdp, char *buf)
 {
-	unsigned int b = test_bit(SDF_WITHDRAWN, &sdp->sd_flags);
+	unsigned int b = gfs2_withdrawn(sdp);
 	return snprintf(buf, PAGE_SIZE, "%u\n", b);
 }
 
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 084a76ffaf65..bd3c91fa446b 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -261,13 +261,13 @@ void gfs2_io_error_bh_i(struct gfs2_sbd *sdp, struct buffer_head *bh,
 			const char *function, char *file, unsigned int line,
 			bool withdraw)
 {
-	if (!test_bit(SDF_WITHDRAWN, &sdp->sd_flags))
-		fs_err(sdp,
-		       "fatal: I/O error\n"
-		       "  block = %llu\n"
-		       "  function = %s, file = %s, line = %u\n",
-		       (unsigned long long)bh->b_blocknr,
-		       function, file, line);
+	if (gfs2_withdrawn(sdp))
+		return;
+
+	fs_err(sdp, "fatal: I/O error\n"
+	       "  block = %llu\n"
+	       "  function = %s, file = %s, line = %u\n",
+	       (unsigned long long)bh->b_blocknr, function, file, line);
 	if (withdraw)
 		gfs2_lm_withdraw(sdp, NULL);
 }
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index 9278fecba632..8d9085c0ecde 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -167,6 +167,18 @@ static inline unsigned int gfs2_tune_get_i(struct gfs2_tune *gt,
 	return x;
 }
 
+/**
+ * gfs2_withdrawn - test whether the file system is withdrawing or withdrawn
+ * @sdp: the superblock
+ */
+static inline bool gfs2_withdrawn(struct gfs2_sbd *sdp)
+{
+	if (test_bit(SDF_WITHDRAWN, &sdp->sd_flags) ||
+	    test_bit(SDF_WITHDRAWING, &sdp->sd_flags))
+		return true;
+	return false;
+}
+
 #define gfs2_tune_get(sdp, field) \
 gfs2_tune_get_i(&(sdp)->sd_tune, &(sdp)->sd_tune.field)
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (8 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, an IO error encountered in function gfs2_ail1_flush
would get cause a deadlock: because of the io error (and its resulting
withdrawn state), buffers stopped being written to the journal.
Buffers would remain on the ail1 list, so gfs2_ail1_start_one would
return 1 to indicate dirty buffers were still on the ail1 list.
However, when function gfs2_ail1_flush got a non-zero return code,
it would goto restart to retry the writes, which meant it would never
finish, and thus the infinite loop.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 6ccc5d01b813..0fe11bde796b 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -160,7 +160,7 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 	list_for_each_entry_reverse(tr, head, tr_list) {
 		if (wbc->nr_to_write <= 0)
 			break;
-		if (gfs2_ail1_start_one(sdp, wbc, tr))
+		if (gfs2_ail1_start_one(sdp, wbc, tr) && !gfs2_withdrawn(sdp))
 			goto restart;
 	}
 	spin_unlock(&sdp->sd_ail_lock);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (9 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-08-20 14:09   ` Andreas Gruenbacher
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, gfs2 kept track of journal io errors in two
places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
This patch consolidates the two into sd_log_error so that it
reflects the first error encountered writing to the journal.
In future patches, we will take advantage of this by checking
this value rather than having to check both when reacting to
io errors.

In addition, this fixes a tight loop in unmount: If buffers
get on the ail1 list and an io error occurs elsewhere, the
ail1 list would never be cleared because they were always busy.
So unmount would hang, waiting for the ail1 list to empty.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h |  7 +++----
 fs/gfs2/log.c    | 20 +++++++++++++++-----
 fs/gfs2/quota.c  |  2 +-
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b261168be298..39cec5361ba5 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -620,9 +620,8 @@ enum {
 	SDF_RORECOVERY		= 7, /* read only recovery */
 	SDF_SKIP_DLM_UNLOCK	= 8,
 	SDF_FORCE_AIL_FLUSH     = 9,
-	SDF_AIL1_IO_ERROR	= 10,
-	SDF_FS_FROZEN           = 11,
-	SDF_WITHDRAWING		= 12, /* Will withdraw eventually */
+	SDF_FS_FROZEN           = 10,
+	SDF_WITHDRAWING		= 11, /* Will withdraw eventually */
 };
 
 enum gfs2_freeze_state {
@@ -831,7 +830,7 @@ struct gfs2_sbd {
 	atomic_t sd_log_in_flight;
 	struct bio *sd_log_bio;
 	wait_queue_head_t sd_log_flush_wait;
-	int sd_log_error;
+	int sd_log_error; /* First log error */
 
 	atomic_t sd_reserving_log;
 	wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 0fe11bde796b..9784763fbb4e 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -108,8 +108,7 @@ __acquires(&sdp->sd_ail_lock)
 
 		if (!buffer_busy(bh)) {
 			if (!buffer_uptodate(bh) &&
-			    !test_and_set_bit(SDF_AIL1_IO_ERROR,
-					      &sdp->sd_flags)) {
+			    !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
 				gfs2_io_error_bh(sdp, bh);
 				set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 			}
@@ -203,10 +202,21 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 					 bd_ail_st_list) {
 		bh = bd->bd_bh;
 		gfs2_assert(sdp, bd->bd_tr == tr);
-		if (buffer_busy(bh))
+		/**
+		 * If another process flagged an io error, e.g. writing to the
+		 * journal, error all other bhs and move them off the ail1 to
+		 * prevent a tight loop when unmount tries to flush ail1,
+		 * regardless of whether they're still busy. If no outside
+		 * errors were found and the buffer is busy, move to the next.
+		 * If the ail buffer is not busy and caught an error, flag it
+		 * for others.
+		 */
+		if (sdp->sd_log_error) {
+			gfs2_io_error_bh(sdp, bh);
+		} else if (buffer_busy(bh)) {
 			continue;
-		if (!buffer_uptodate(bh) &&
-		    !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
+		} else if (!buffer_uptodate(bh) &&
+			   !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
 			gfs2_io_error_bh(sdp, bh);
 			set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 		}
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index a8dfc86fd682..8871fca9102f 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1480,7 +1480,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 		return;
 	if (!gfs2_withdrawn(sdp)) {
 		fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
-		sdp->sd_log_error = error;
+		cmpxchg(&sdp->sd_log_error, 0, error);
 		wake_up(&sdp->sd_logd_waitq);
 	}
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (10 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn Bob Peterson
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, all io errors received by the quota daemon or the
logd daemon would cause a complaint message to be issued, such as:

   gfs2: fsid=dm-13.0: Error 10 writing to journal, jid=0

This patch changes it so that the error message is only issued the
first time the error is encountered.

Also, before this patch function gfs2_end_log_write did not set the
sd_log_error value, so log errors would not cause the file system to
be withdrawn. This patch sets the error code so the file system is
properly withdrawn if an io error is encountered writing to the journal.

WARNING: This change in function breaks check xfstests generic/441
and causes it to fail: io errors writing to the log should cause a
file system to be withdrawn, and no further operations are tolerated.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c  | 5 +++--
 fs/gfs2/quota.c | 4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index fd2e452ad327..edff7538d7f2 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -211,8 +211,9 @@ static void gfs2_end_log_write(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	if (bio->bi_status) {
-		fs_err(sdp, "Error %d writing to journal, jid=%u\n",
-		       bio->bi_status, sdp->sd_jdesc->jd_jid);
+		if (!cmpxchg(&sdp->sd_log_error, 0, (int)bio->bi_status))
+			fs_err(sdp, "Error %d writing to journal, jid=%u\n",
+			       bio->bi_status, sdp->sd_jdesc->jd_jid);
 		wake_up(&sdp->sd_logd_waitq);
 	}
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 8871fca9102f..acb6cbdd1431 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1479,8 +1479,8 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 	if (error == 0 || error == -EROFS)
 		return;
 	if (!gfs2_withdrawn(sdp)) {
-		fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
-		cmpxchg(&sdp->sd_log_error, 0, error);
+		if (!cmpxchg(&sdp->sd_log_error, 0, error))
+			fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
 		wake_up(&sdp->sd_logd_waitq);
 	}
 }
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (11 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_log_flush could get into an infinite
loop trying to clear out its ail1 list. If the file system was
withdrawn (or pending withdraw) due to a problem with writing the ail1
list, it would never clear out the list, and therefore, would loop
infinitely. This patch changes function gfs2_log_flush so that it
does while (!gfs2_withdraw(sdp)) rather than while (;;).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 9784763fbb4e..2fd43146de00 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -854,7 +854,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 
 	if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
 		if (!sdp->sd_log_idle) {
-			for (;;) {
+			while (!gfs2_withdrawn(sdp)) {
 				gfs2_ail1_start(sdp);
 				gfs2_ail1_wait(sdp);
 				if (gfs2_ail1_empty(sdp))
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (12 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-08-27 11:20   ` Andreas Gruenbacher
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use Bob Peterson
                   ` (11 subsequent siblings)
  25 siblings, 1 reply; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a node fails, user space informs dlm of the node failure,
and dlm instructs gfs2 on the surviving nodes to perform journal
recovery. It does this by calling various callback functions in
lock_dlm.c. To mark its progress, it keeps generation numbers
and recover bits in a dlm "control" lock lvb, which is seen by
all nodes to determine which journals need to be replayed.

The gfs2 on all nodes get the same recovery requests from dlm,
so they all try to do the recovery, but only one will be
granted the exclusive lock on the journal. The others fail
with a "Busy" message on their "try lock."

However, when a node is withdrawn, it cannot safely do any
recovery or safely replay any journals. To make matters worse,
gfs2 might withdraw as a result of attempting recovery. For
example, this might happen if the device goes offline, or if
an hba fails. But in today's gfs2 code, it doesn't check for
being withdrawn at any step in the recovery process. What's
worse if that these callbacks from dlm have no return code,
so there is no way to indicate failure back to dlm. We can
send a "Recovery failed" uevent eventually, but that tells
user space what happened, not dlm's kernel code.

Before this patch, lock_dlm would perform its recovery steps but
ignore the result, and eventually it would still update its
generation number in the lvb, despite the fact that it may have
withdrawn or encountered an error. The other nodes would then
see the newer generation number in the lvb and conclude that
they don't need to do recovery because the generation number
is newer than the last one they saw. They think a different
node has already recovered the journal.

This patch adds checks to several of the callbacks used by dlm
in its recovery state machine so that the functions are ignored
and skipped if an io error has occurred or if the file system
is withdrawn. That prevents the lvb bits from being updated, and
therefore dlm and user space still see the need for recovery to
take place.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lock_dlm.c | 18 ++++++++++++++++++
 fs/gfs2/recovery.c |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 31df26ed7854..9329f86ffcbe 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
 	struct gfs2_sbd *sdp = arg;
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (gfs2_withdrawn(sdp)) {
+		fs_err(sdp, "recover_prep ignored due to withdraw.\n");
+		return;
+	}
 	spin_lock(&ls->ls_recover_spin);
 	ls->ls_recover_block = ls->ls_recover_start;
 	set_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags);
@@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct dlm_slot *slot)
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 	int jid = slot->slot - 1;
 
+	if (gfs2_withdrawn(sdp)) {
+		fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
+		       jid);
+		return;
+	}
 	spin_lock(&ls->ls_recover_spin);
 	if (ls->ls_recover_size < jid + 1) {
 		fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
@@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct dlm_slot *slots, int num_slots,
 	struct gfs2_sbd *sdp = arg;
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (gfs2_withdrawn(sdp)) {
+		fs_err(sdp, "recover_done ignored due to withdraw.\n");
+		return;
+	}
 	/* ensure the ls jid arrays are large enough */
 	set_recover_size(sdp, slots, num_slots);
 
@@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, unsigned int jid,
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (gfs2_withdrawn(sdp)) {
+		fs_err(sdp, "recovery_result jid %d ignored due to withdraw.\n",
+		       jid);
+		return;
+	}
 	if (test_bit(DFL_NO_DLM_OPS, &ls->ls_recover_flags))
 		return;
 
diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 4ce2bfdbefdc..9e15b5aa2cfb 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
 	int jlocked = 0;
 
 	t_start = ktime_get();
+	if (gfs2_withdrawn(sdp)) {
+		fs_err(sdp, "jid=%u: Recovery not attempted due to withdraw.\n",
+		       jd->jd_jid);
+		goto fail;
+	}
 	if (sdp->sd_args.ar_spectator)
 		goto fail;
 	if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (13 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw Bob Peterson
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch function check_journal_clean was in ops_fstype.c.
This patch moves it to util.c so we can make use of it elsewhere
in a future patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/ops_fstype.c | 42 -----------------------------------------
 fs/gfs2/util.c       | 45 ++++++++++++++++++++++++++++++++++++++++++++
 fs/gfs2/util.h       |  1 +
 3 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 56616997ab24..fcfd68794bfc 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -591,48 +591,6 @@ static int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
 	return error;
 }
 
-/**
- * check_journal_clean - Make sure a journal is clean for a spectator mount
- * @sdp: The GFS2 superblock
- * @jd: The journal descriptor
- *
- * Returns: 0 if the journal is clean or locked, else an error
- */
-static int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
-{
-	int error;
-	struct gfs2_holder j_gh;
-	struct gfs2_log_header_host head;
-	struct gfs2_inode *ip;
-
-	ip = GFS2_I(jd->jd_inode);
-	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
-				   GL_EXACT | GL_NOCACHE, &j_gh);
-	if (error) {
-		fs_err(sdp, "Error locking journal for spectator mount.\n");
-		return -EPERM;
-	}
-	error = gfs2_jdesc_check(jd);
-	if (error) {
-		fs_err(sdp, "Error checking journal for spectator mount.\n");
-		goto out_unlock;
-	}
-	error = gfs2_find_jhead(jd, &head, false);
-	if (error) {
-		fs_err(sdp, "Error parsing journal for spectator mount.\n");
-		goto out_unlock;
-	}
-	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
-		error = -EPERM;
-		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
-		       "must not be a spectator.\n", jd->jd_jid);
-	}
-
-out_unlock:
-	gfs2_glock_dq_uninit(&j_gh);
-	return error;
-}
-
 static int init_journal(struct gfs2_sbd *sdp, int undo)
 {
 	struct inode *master = d_inode(sdp->sd_master_dir);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index bd3c91fa446b..048236163be7 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -19,7 +19,10 @@
 #include "gfs2.h"
 #include "incore.h"
 #include "glock.h"
+#include "lops.h"
+#include "recovery.h"
 #include "rgrp.h"
+#include "super.h"
 #include "util.h"
 
 struct kmem_cache *gfs2_glock_cachep __read_mostly;
@@ -36,6 +39,48 @@ void gfs2_assert_i(struct gfs2_sbd *sdp)
 	fs_emerg(sdp, "fatal assertion failed\n");
 }
 
+/**
+ * check_journal_clean - Make sure a journal is clean for a spectator mount
+ * @sdp: The GFS2 superblock
+ * @jd: The journal descriptor
+ *
+ * Returns: 0 if the journal is clean or locked, else an error
+ */
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
+{
+	int error;
+	struct gfs2_holder j_gh;
+	struct gfs2_log_header_host head;
+	struct gfs2_inode *ip;
+
+	ip = GFS2_I(jd->jd_inode);
+	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
+				   GL_EXACT | GL_NOCACHE, &j_gh);
+	if (error) {
+		fs_err(sdp, "Error locking journal for spectator mount.\n");
+		return -EPERM;
+	}
+	error = gfs2_jdesc_check(jd);
+	if (error) {
+		fs_err(sdp, "Error checking journal for spectator mount.\n");
+		goto out_unlock;
+	}
+	error = gfs2_find_jhead(jd, &head, false);
+	if (error) {
+		fs_err(sdp, "Error parsing journal for spectator mount.\n");
+		goto out_unlock;
+	}
+	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
+		error = -EPERM;
+		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
+		       "must not be a spectator.\n", jd->jd_jid);
+	}
+
+out_unlock:
+	gfs2_glock_dq_uninit(&j_gh);
+	return error;
+}
+
 int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index 8d9085c0ecde..e3539ceda1ca 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -131,6 +131,7 @@ static inline void gfs2_metatype_set(struct buffer_head *bh, u16 type,
 
 int gfs2_io_error_i(struct gfs2_sbd *sdp, const char *function,
 		    char *file, unsigned int line);
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
 
 #define gfs2_io_error(sdp) \
 gfs2_io_error_i((sdp), __func__, __FILE__, __LINE__);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (14 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn Bob Peterson
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, when a file system was withdrawn, all further
attempts to enqueue or promote glocks were rejected and returned
-EIO. This is only important for media-backed glocks like inode
and rgrp glocks. All other glocks may be safely used because there
is no potential for metadata corruption. This patch allows some
glocks to be used even after the file system is withdrawn. This
is accomplished with a new glops flag, GLOF_JOURNALED, which tells
us which inodes cannot be safely manipulated after withdraw. This
facilitates future patches that enhance fs withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c      |  7 +++++--
 fs/gfs2/glops.c      | 16 +++++++++++++---
 fs/gfs2/glops.h      |  3 ++-
 fs/gfs2/incore.h     |  8 ++++++++
 fs/gfs2/inode.c      | 14 ++++++++++----
 fs/gfs2/ops_fstype.c | 14 +++++++++++---
 fs/gfs2/sys.c        | 12 ++++++++++--
 7 files changed, 59 insertions(+), 15 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index b03f784c982f..cf1d180621c1 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -548,7 +548,7 @@ __acquires(&gl->gl_lockref.lock)
 	int ret;
 
 	if (unlikely(gfs2_withdrawn(sdp)) &&
-	    target != LM_ST_UNLOCKED)
+	    (glops->go_flags & GLOF_JOURNALED) && target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
 		      LM_FLAG_PRIORITY);
@@ -1096,7 +1096,8 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(gfs2_withdrawn(sdp)))
+	if (unlikely(gfs2_withdrawn(sdp)) &&
+	    (gl->gl_ops->go_flags & GLOF_JOURNALED))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
@@ -1762,6 +1763,8 @@ static const char *gflags2str(char *buf, const struct gfs2_glock *gl)
 		*p++ = 'o';
 	if (test_bit(GLF_BLOCKING, gflags))
 		*p++ = 'b';
+	if (gl->gl_ops->go_flags & GLOF_JOURNALED)
+		*p++ = 'j';
 	*p = 0;
 	return buf;
 }
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 4825a00b7dfa..f738a7f64285 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -589,7 +589,7 @@ const struct gfs2_glock_operations gfs2_meta_glops = {
 	.go_type = LM_TYPE_META,
 };
 
-const struct gfs2_glock_operations gfs2_inode_glops = {
+const struct gfs2_glock_operations gfs2_sys_inode_glops = {
 	.go_sync = inode_go_sync,
 	.go_inval = inode_go_inval,
 	.go_demote_ok = inode_go_demote_ok,
@@ -599,6 +599,16 @@ const struct gfs2_glock_operations gfs2_inode_glops = {
 	.go_flags = GLOF_ASPACE | GLOF_LRU,
 };
 
+const struct gfs2_glock_operations gfs2_user_inode_glops = {
+	.go_sync = inode_go_sync,
+	.go_inval = inode_go_inval,
+	.go_demote_ok = inode_go_demote_ok,
+	.go_lock = inode_go_lock,
+	.go_dump = inode_go_dump,
+	.go_type = LM_TYPE_INODE,
+	.go_flags = GLOF_ASPACE | GLOF_LRU | GLOF_JOURNALED,
+};
+
 const struct gfs2_glock_operations gfs2_rgrp_glops = {
 	.go_sync = rgrp_go_sync,
 	.go_inval = rgrp_go_inval,
@@ -606,7 +616,7 @@ const struct gfs2_glock_operations gfs2_rgrp_glops = {
 	.go_unlock = gfs2_rgrp_go_unlock,
 	.go_dump = gfs2_rgrp_dump,
 	.go_type = LM_TYPE_RGRP,
-	.go_flags = GLOF_LVB,
+	.go_flags = GLOF_LVB | GLOF_JOURNALED,
 };
 
 const struct gfs2_glock_operations gfs2_freeze_glops = {
@@ -642,7 +652,7 @@ const struct gfs2_glock_operations gfs2_journal_glops = {
 
 const struct gfs2_glock_operations *gfs2_glops_list[] = {
 	[LM_TYPE_META] = &gfs2_meta_glops,
-	[LM_TYPE_INODE] = &gfs2_inode_glops,
+	[LM_TYPE_INODE] = &gfs2_user_inode_glops,
 	[LM_TYPE_RGRP] = &gfs2_rgrp_glops,
 	[LM_TYPE_IOPEN] = &gfs2_iopen_glops,
 	[LM_TYPE_FLOCK] = &gfs2_flock_glops,
diff --git a/fs/gfs2/glops.h b/fs/gfs2/glops.h
index 8ed1857c1a8d..63130a5959e1 100644
--- a/fs/gfs2/glops.h
+++ b/fs/gfs2/glops.h
@@ -15,7 +15,8 @@
 extern struct workqueue_struct *gfs2_freeze_wq;
 
 extern const struct gfs2_glock_operations gfs2_meta_glops;
-extern const struct gfs2_glock_operations gfs2_inode_glops;
+extern const struct gfs2_glock_operations gfs2_user_inode_glops;
+extern const struct gfs2_glock_operations gfs2_sys_inode_glops;
 extern const struct gfs2_glock_operations gfs2_rgrp_glops;
 extern const struct gfs2_glock_operations gfs2_freeze_glops;
 extern const struct gfs2_glock_operations gfs2_iopen_glops;
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 39cec5361ba5..3f9935730ffb 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -251,6 +251,14 @@ struct gfs2_glock_operations {
 #define GLOF_ASPACE 1
 #define GLOF_LVB    2
 #define GLOF_LRU    4
+/**
+ * The GLOF_JOURNALED flag marks objects that are journaled, which means we
+ * need to treat them specially when a file system withdraw occurs. Inodes
+ * like the journal inode itself are okay to manipulate after withdraw, but
+ * user files need to be protected from journal replay after their glock has
+ * been granted to another node.
+ */
+#define GLOF_JOURNALED 8
 };
 
 enum {
diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c
index 998051c4aea7..2fefd8581965 100644
--- a/fs/gfs2/inode.c
+++ b/fs/gfs2/inode.c
@@ -141,7 +141,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned int type,
 		struct gfs2_sbd *sdp = GFS2_SB(inode);
 		ip->i_no_formal_ino = no_formal_ino;
 
-		error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
+		error = gfs2_glock_get(sdp, no_addr, &gfs2_user_inode_glops,
+				       CREATE, &ip->i_gl);
 		if (unlikely(error))
 			goto fail;
 		flush_delayed_work(&ip->i_gl->gl_work);
@@ -248,6 +249,8 @@ struct inode *gfs2_lookup_simple(struct inode *dip, const char *name)
 {
 	struct qstr qstr;
 	struct inode *inode;
+	struct gfs2_inode *ip;
+
 	gfs2_str2qstr(&qstr, name);
 	inode = gfs2_lookupi(dip, &qstr, 1);
 	/* gfs2_lookupi has inconsistent callers: vfs
@@ -257,8 +260,10 @@ struct inode *gfs2_lookup_simple(struct inode *dip, const char *name)
 	 */
 	if (inode == NULL)
 		return ERR_PTR(-ENOENT);
-	else
-		return inode;
+
+	ip = GFS2_I(inode);
+	ip->i_gl->gl_ops = &gfs2_sys_inode_glops;
+	return inode;
 }
 
 
@@ -703,7 +708,8 @@ static int gfs2_create_inode(struct inode *dir, struct dentry *dentry,
 
 	gfs2_set_inode_blocks(inode, blocks);
 
-	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl);
+	error = gfs2_glock_get(sdp, ip->i_no_addr, &gfs2_user_inode_glops,
+			       CREATE, &ip->i_gl);
 	if (error)
 		goto fail_free_inode;
 	flush_delayed_work(&ip->i_gl->gl_work);
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index fcfd68794bfc..6e5e0f291a1a 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -432,7 +432,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh,
 }
 
 static int gfs2_lookup_root(struct super_block *sb, struct dentry **dptr,
-			    u64 no_addr, const char *name)
+			    u64 no_addr, const char *name, int journaled)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	struct dentry *dentry;
@@ -444,6 +444,11 @@ static int gfs2_lookup_root(struct super_block *sb, struct dentry **dptr,
 		fs_err(sdp, "can't read in %s inode: %ld\n", name, PTR_ERR(inode));
 		return PTR_ERR(inode);
 	}
+	if (!journaled) {
+		struct gfs2_inode *ip = GFS2_I(inode);
+
+		ip->i_gl->gl_ops = &gfs2_sys_inode_glops;
+	}
 	dentry = d_make_root(inode);
 	if (!dentry) {
 		fs_err(sdp, "can't alloc %s dentry\n", name);
@@ -492,13 +497,13 @@ static int init_sb(struct gfs2_sbd *sdp, int silent)
 
 	/* Get the root inode */
 	no_addr = sdp->sd_sb.sb_root_dir.no_addr;
-	ret = gfs2_lookup_root(sb, &sdp->sd_root_dir, no_addr, "root");
+	ret = gfs2_lookup_root(sb, &sdp->sd_root_dir, no_addr, "root", 1);
 	if (ret)
 		goto out;
 
 	/* Get the master inode */
 	no_addr = sdp->sd_sb.sb_master_dir.no_addr;
-	ret = gfs2_lookup_root(sb, &sdp->sd_master_dir, no_addr, "master");
+	ret = gfs2_lookup_root(sb, &sdp->sd_master_dir, no_addr, "master", 0);
 	if (ret) {
 		dput(sdp->sd_root_dir);
 		goto out;
@@ -533,6 +538,7 @@ static void gfs2_others_may_mount(struct gfs2_sbd *sdp)
 static int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
 {
 	struct gfs2_inode *dip = GFS2_I(sdp->sd_jindex);
+	struct gfs2_inode *ip;
 	struct qstr name;
 	char buf[20];
 	struct gfs2_jdesc *jd;
@@ -580,6 +586,8 @@ static int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
 			break;
 		}
 
+		ip = GFS2_I(jd->jd_inode);
+		ip->i_gl->gl_ops = &gfs2_sys_inode_glops;
 		spin_lock(&sdp->sd_jindex_spin);
 		jd->jd_jid = sdp->sd_journals++;
 		list_add_tail(&jd->jd_list, &sdp->sd_jindex_list);
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 8f50c2921e6f..9ca85023dde4 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -264,8 +264,16 @@ static ssize_t demote_rq_store(struct gfs2_sbd *sdp, const char *buf, size_t len
 	if (!test_and_set_bit(SDF_DEMOTE, &sdp->sd_flags))
 		fs_info(sdp, "demote interface used\n");
 	rv = gfs2_glock_get(sdp, glnum, glops, 0, &gl);
-	if (rv)
-		return rv;
+	if (rv) {
+		/* If the glock is not found and it's supposed to be an inode,
+		 * check if it's one of the system inodes. */
+		if (gltype == LM_TYPE_INODE) {
+			glops = &gfs2_sys_inode_glops;
+			rv = gfs2_glock_get(sdp, glnum, glops, 0, &gl);
+			if (rv)
+				return rv;
+		}
+	}
 	gfs2_glock_cb(gl, glmode);
 	gfs2_glock_put(gl);
 	return len;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (15 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_freeze would loop forever if the
file system trying to be frozen is withdrawn. That's because function
gfs2_lock_fs_check_clean tries to enqueue the glock of the journal
and the gfs2_glock returns -EIO because you can't enqueue a journaled
glock after a withdraw.

This patch moves the check for file system withdraw inside the loop
so that the loop can end when withdraw occurs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5ea8d45e989d..110f00024950 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -1016,12 +1016,12 @@ static int gfs2_freeze(struct super_block *sb)
 	if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN)
 		goto out;
 
-	if (gfs2_withdrawn(sdp)) {
-		error = -EINVAL;
-		goto out;
-	}
-
 	for (;;) {
+		if (gfs2_withdrawn(sdp)) {
+			error = -EINVAL;
+			goto out;
+		}
+
 		error = gfs2_lock_fs_check_clean(sdp, &sdp->sd_freeze_gh);
 		if (!error)
 			break;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (16 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw Bob Peterson
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, if a process encountered an error and decided to
withdraw, if another process was already in the process of withdrawing,
the secondary withdraw would be silently ignored, which set it free
to proceed with its processing, unlock any locks, etc. That's correct
behavior if the original withdrawer encounters further errors down
the road. However, second withdrawers need to wait for the first
withdrawer to finish its withdraw before proceeding. If we don't wait
we could end up assuming everything is alright, unlock glocks and
telling other nodes they can have the glock, despite the fact that
a withdraw is still ongoing and may require a journal replay before
any locks are released. For example, if an rgrp glock is freed
by a process that didn't wait for the withdraw, a journal replay
could introduce file system corruption by replaying a rgrp block
that has already been granted to another node.

This patch makes secondary withdrawers wait until the primary
withdrawer is finished with its processing before proceeding.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h |  3 +++
 fs/gfs2/util.c   | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 3f9935730ffb..b8f6e2fd86ae 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -630,6 +630,7 @@ enum {
 	SDF_FORCE_AIL_FLUSH     = 9,
 	SDF_FS_FROZEN           = 10,
 	SDF_WITHDRAWING		= 11, /* Will withdraw eventually */
+	SDF_WITHDRAW_IN_PROG	= 12, /* Withdraw is in progress */
 };
 
 enum gfs2_freeze_state {
@@ -839,6 +840,8 @@ struct gfs2_sbd {
 	struct bio *sd_log_bio;
 	wait_queue_head_t sd_log_flush_wait;
 	int sd_log_error; /* First log error */
+	atomic_t sd_withdrawer;
+	wait_queue_head_t sd_withdraw_wait;
 
 	atomic_t sd_reserving_log;
 	wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 048236163be7..b678412d968a 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -89,9 +89,23 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 	struct va_format vaf;
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW &&
-	    test_and_set_bit(SDF_WITHDRAWN, &sdp->sd_flags))
-		return 0;
+	    test_and_set_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
+		if (!test_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags))
+			return -1;
+
+		fs_warn(sdp, "Pid %d waiting for process %d to withdraw.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG,
+			    TASK_UNINTERRUPTIBLE);
+		fs_warn(sdp, "Pid %d done waiting for process %d.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		return -1;
+	}
 
+	set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+	atomic_set(&sdp->sd_withdrawer, pid_nr(task_pid(current)));
 	if (fmt) {
 		va_start(args, fmt);
 
@@ -119,6 +133,9 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 		set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
 		fs_err(sdp, "withdrawn\n");
 		dump_stack();
+		clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+		smp_mb__after_atomic();
+		wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG);
 	}
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_PANIC)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (17 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, when a node withdrew a gfs2 file system, it
wrote a (clean) unmount log header. That's wrong. You don't want
to write anything to the journal once you're withdrawn because
that's acknowledging that the transaction is complete and the
journal is in good shape, neither of which may be a valid
assumption when the file system is withdrawn. This is especially
true if the withdraw was caused due to io errors writing to the
journal in the first place. The best course of action is to leave
the journal "as is" until it may be safely replayed during
journal recovery, regardless of whether it's done by this node or
another.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 2fd43146de00..4472f5fc471d 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -693,12 +693,16 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
 {
 	struct gfs2_log_header *lh;
 	u32 hash, crc;
-	struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
+	struct page *page;
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 	struct timespec64 tv;
 	struct super_block *sb = sdp->sd_vfs;
 	u64 dblock;
 
+	if (gfs2_withdrawn(sdp))
+		goto out;
+
+	page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
 	lh = page_address(page);
 	clear_page(lh);
 
@@ -751,6 +755,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
 
 	gfs2_log_write(sdp, page, sb->s_blocksize, 0, dblock);
 	gfs2_log_submit_bio(&sdp->sd_log_bio, REQ_OP_WRITE | op_flags);
+out:
 	log_flush_wait(sdp);
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (18 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held. The
glocks not affected by a withdraw are permitted to be passed
around as normal during a withdraw. A new glops flag, called
GLOF_OK_AT_WITHDRAW, indicates glocks that may be passed around
freely while a withdraw is taking place.

One such glock is the "live" glock which is now used to signal when
a withdraw occurs. When a withdraw occurs, the node signals its
withdraw by dequeueing the "live" glock and trying to enqueue it
in EX mode, thus forcing the other nodes to all see a demote
request, by way of a "1CB" (one callback) try lock. The "live"
glock is not granted in EX; the callback is only just used to
indicate a withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Also note that the withdraw function may be called from a wide
variety of situations, and therefore, we need to take extra
precautions to make sure pointers are valid before using them in
many circumstances.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c      |  32 +++++++++-
 fs/gfs2/glock.h      |   1 +
 fs/gfs2/glops.c      |  77 ++++++++++++++++++++++-
 fs/gfs2/incore.h     |   7 +++
 fs/gfs2/lock_dlm.c   |  34 ++++++++++
 fs/gfs2/log.c        |   6 ++
 fs/gfs2/meta_io.c    |   2 +-
 fs/gfs2/ops_fstype.c |   4 +-
 fs/gfs2/quota.c      |   4 ++
 fs/gfs2/super.c      |  60 ++++++++++++++++--
 fs/gfs2/super.h      |   1 +
 fs/gfs2/util.c       | 145 ++++++++++++++++++++++++++++++++++++++++++-
 12 files changed, 362 insertions(+), 11 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index cf1d180621c1..7202793056a8 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -247,7 +247,7 @@ static void __gfs2_glock_put(struct gfs2_glock *gl)
 	gfs2_glock_remove_from_lru(gl);
 	spin_unlock(&gl->gl_lockref.lock);
 	GLOCK_BUG_ON(gl, !list_empty(&gl->gl_holders));
-	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
+	GLOCK_BUG_ON(gl, mapping && mapping->nrpages && !gfs2_withdrawn(sdp));
 	trace_gfs2_glock_put(gl);
 	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
 }
@@ -548,7 +548,9 @@ __acquires(&gl->gl_lockref.lock)
 	int ret;
 
 	if (unlikely(gfs2_withdrawn(sdp)) &&
-	    (glops->go_flags & GLOF_JOURNALED) && target != LM_ST_UNLOCKED)
+	    (glops->go_flags & GLOF_JOURNALED) &&
+	    (gh && !(LM_FLAG_NOEXP & gh->gh_flags)) &&
+	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
 		      LM_FLAG_PRIORITY);
@@ -1096,7 +1098,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(gfs2_withdrawn(sdp)) &&
+	if (unlikely(gfs2_withdrawn(sdp)) && !(LM_FLAG_NOEXP & gh->gh_flags) &&
 	    (gl->gl_ops->go_flags & GLOF_JOURNALED))
 		return -EIO;
 
@@ -1141,11 +1143,30 @@ int gfs2_glock_poll(struct gfs2_holder *gh)
 void gfs2_glock_dq(struct gfs2_holder *gh)
 {
 	struct gfs2_glock *gl = gh->gh_gl;
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	unsigned delay = 0;
 	int fast_path = 0;
 
 	spin_lock(&gl->gl_lockref.lock);
+	/**
+	 * If we're in the process of file system withdraw, we cannot just
+	 * dequeue any glocks until our journal is recovered, lest we
+	 * introduce file system corruption. We need two exceptions to this
+	 * rule: We need to allow unlocking of nondisk glocks and the glock
+	 * for our own journal that needs recovery. Note that evict may also
+	 * call this for the journal inode, but without using sdp->sd_jinode_gh
+	 * so we need to check the glock, not just the holder of the glock.
+	 */
+	if (test_bit(SDF_WITHDRAWN, &sdp->sd_flags) &&
+	    test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags) &&
+	    (gl->gl_ops->go_flags & GLOF_JOURNALED) &&
+	    gh->gh_gl != sdp->sd_jinode_gl) {
+		sdp->sd_glock_dqs_held++;
+		might_sleep();
+		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY,
+			    TASK_UNINTERRUPTIBLE);
+	}
 	if (gh->gh_flags & GL_NOCACHE)
 		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
 
@@ -1625,6 +1646,11 @@ static void dump_glock_func(struct gfs2_glock *gl)
 	dump_glock(NULL, gl, true);
 }
 
+void gfs2_gl_flushwork(struct gfs2_sbd *sdp)
+{
+	flush_workqueue(glock_workqueue);
+}
+
 /**
  * gfs2_gl_hash_clear - Empty out the glock hash table
  * @sdp: the filesystem
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index ea1568db0452..f8fe62d8f723 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -202,6 +202,7 @@ extern int gfs2_glock_nq_num(struct gfs2_sbd *sdp, u64 number,
 			     struct gfs2_holder *gh);
 extern int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 extern void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
+extern void gfs2_gl_flushwork(struct gfs2_sbd *sdp);
 extern void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl,
 			    bool fsid);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) {		\
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index f738a7f64285..db4d499ecf4f 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -32,6 +32,8 @@
 
 struct workqueue_struct *gfs2_freeze_wq;
 
+extern struct workqueue_struct *gfs2_control_wq;
+
 static void gfs2_ail_error(struct gfs2_glock *gl, const struct buffer_head *bh)
 {
 	fs_err(gl->gl_name.ln_sbd,
@@ -504,13 +506,17 @@ static void freeze_go_sync(struct gfs2_glock *gl)
 	int error = 0;
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
-	if (gl->gl_state == LM_ST_SHARED &&
+	if (gl->gl_state == LM_ST_SHARED && !gfs2_withdrawn(sdp) &&
 	    test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags)) {
 		atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
 		error = freeze_super(sdp->sd_vfs);
 		if (error) {
 			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
 				error);
+			if (gfs2_withdrawn(sdp)) {
+				atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
+				return;
+			}
 			gfs2_assert_withdraw(sdp, 0);
 		}
 		queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
@@ -585,6 +591,73 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 	}
 }
 
+/**
+ * nondisk_go_callback - used to signal when a node did a withdraw
+ * @gl: the nondisk glock
+ * @remote: true if this came from a different cluster node
+ *
+ */
+static void nondisk_go_callback(struct gfs2_glock *gl, bool remote)
+{
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+
+	/* Ignore the callback unless it's from another node, and it's the
+	   live lock. */
+	if (!remote || gl->gl_name.ln_number != GFS2_LIVE_LOCK)
+		return;
+
+	/* First order of business is to cancel the demote request. We don't
+	 * really want to demote a nondisk glock. At best it's just to inform
+	 * us of a another node's withdraw. We'll keep it in SH mode. */
+	clear_bit(GLF_DEMOTE, &gl->gl_flags);
+	clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
+
+	/* Ignore the unlock if we're withdrawn, unmounting, or in recovery. */
+	if (test_bit(SDF_NORECOVERY, &sdp->sd_flags) ||
+	    test_bit(SDF_WITHDRAWN, &sdp->sd_flags) ||
+	    test_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags))
+		return;
+
+	/* We only care when a node wants us to unlock, because that means
+	 * they want a journal recovered. */
+	if (gl->gl_demote_state != LM_ST_UNLOCKED)
+		return;
+
+	if (sdp->sd_args.ar_spectator) {
+		fs_warn(sdp, "Spectator node cannot recover journals.\n");
+		return;
+	}
+
+	fs_warn(sdp, "Some node has withdrawn; checking for recovery.\n");
+	set_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags);
+	/**
+	 * We can't call remote_withdraw directly here or gfs2_recover_journal
+	 * because this is called from the glock unlock function and the
+	 * remote_withdraw needs to enqueue and dequeue the same "live" glock
+	 * we were called from. So we queue it to the control work queue in
+	 * lock_dlm.
+	 */
+	queue_delayed_work(gfs2_control_wq, &sdp->sd_control_work, 0);
+}
+
+/**
+ * inode_go_free - wake up anyone waiting for dlm's unlock ast to free it
+ * @gl: glock being freed
+ *
+ * For now, this is only used for the journal inode glock. In withdraw
+ * situations, we need to wait for the glock to be freed so that we know
+ * other nodes may proceed with recovery / journal replay.
+ */
+static void inode_go_free(struct gfs2_glock *gl)
+{
+	/* Note that we cannot reference gl_object because it's already set
+	 * to NULL by this point in its lifecycle. */
+	if (!test_bit(GLF_FREEING, &gl->gl_flags))
+		return;
+	clear_bit_unlock(GLF_FREEING, &gl->gl_flags);
+	wake_up_bit(&gl->gl_flags, GLF_FREEING);
+}
+
 const struct gfs2_glock_operations gfs2_meta_glops = {
 	.go_type = LM_TYPE_META,
 };
@@ -597,6 +670,7 @@ const struct gfs2_glock_operations gfs2_sys_inode_glops = {
 	.go_dump = inode_go_dump,
 	.go_type = LM_TYPE_INODE,
 	.go_flags = GLOF_ASPACE | GLOF_LRU,
+	.go_free = inode_go_free,
 };
 
 const struct gfs2_glock_operations gfs2_user_inode_glops = {
@@ -639,6 +713,7 @@ const struct gfs2_glock_operations gfs2_flock_glops = {
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
 	.go_type = LM_TYPE_NONDISK,
+	.go_callback = nondisk_go_callback,
 };
 
 const struct gfs2_glock_operations gfs2_quota_glops = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index b8f6e2fd86ae..a1910b842866 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -246,6 +246,7 @@ struct gfs2_glock_operations {
 	void (*go_dump)(struct seq_file *seq, struct gfs2_glock *gl,
 			const char *fs_id_buf);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
+	void (*go_free)(struct gfs2_glock *gl);
 	const int go_type;
 	const unsigned long go_flags;
 #define GLOF_ASPACE 1
@@ -355,6 +356,7 @@ enum {
 	GLF_BLOCKING			= 15,
 	GLF_INODE_CREATING		= 16, /* Inode creation occurring */
 	GLF_REVOKES			= 17, /* Glock has revokes in queue */
+	GLF_FREEING			= 18, /* Wait for glock to be freed */
 };
 
 struct gfs2_glock {
@@ -631,6 +633,9 @@ enum {
 	SDF_FS_FROZEN           = 10,
 	SDF_WITHDRAWING		= 11, /* Will withdraw eventually */
 	SDF_WITHDRAW_IN_PROG	= 12, /* Withdraw is in progress */
+	SDF_REMOTE_WITHDRAW	= 13, /* Performing remote recovery */
+	SDF_WITHDRAW_RECOVERY	= 14, /* Wait for journal recovery when we are
+					 withdrawing */
 };
 
 enum gfs2_freeze_state {
@@ -779,6 +784,7 @@ struct gfs2_sbd {
 	struct gfs2_jdesc *sd_jdesc;
 	struct gfs2_holder sd_journal_gh;
 	struct gfs2_holder sd_jinode_gh;
+	struct gfs2_glock *sd_jinode_gl;
 
 	struct gfs2_holder sd_sc_gh;
 	struct gfs2_holder sd_qc_gh;
@@ -865,6 +871,7 @@ struct gfs2_sbd {
 
 	unsigned long sd_last_warning;
 	struct dentry *debugfs_dir;    /* debugfs directory */
+	unsigned long sd_glock_dqs_held;
 };
 
 static inline void gfs2_glstats_inc(struct gfs2_glock *gl, int which)
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 9329f86ffcbe..f9ac4c1a83a5 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -19,6 +19,8 @@
 
 #include "incore.h"
 #include "glock.h"
+#include "glops.h"
+#include "recovery.h"
 #include "util.h"
 #include "sys.h"
 #include "trace_gfs2.h"
@@ -126,6 +128,8 @@ static void gdlm_ast(void *arg)
 
 	switch (gl->gl_lksb.sb_status) {
 	case -DLM_EUNLOCK: /* Unlocked, so glock can be freed */
+		if (gl->gl_ops->go_free)
+			gl->gl_ops->go_free(gl);
 		gfs2_glock_free(gl);
 		return;
 	case -DLM_ECANCEL: /* Cancel while getting lock */
@@ -325,6 +329,7 @@ static void gdlm_cancel(struct gfs2_glock *gl)
 /*
  * dlm/gfs2 recovery coordination using dlm_recover callbacks
  *
+ *  0. gfs2 checks for another cluster node withdraw, needing journal replay
  *  1. dlm_controld sees lockspace members change
  *  2. dlm_controld blocks dlm-kernel locking activity
  *  3. dlm_controld within dlm-kernel notifies gfs2 (recover_prep)
@@ -573,6 +578,28 @@ static int control_lock(struct gfs2_sbd *sdp, int mode, uint32_t flags)
 			 &ls->ls_control_lksb, "control_lock");
 }
 
+/**
+ * remote_withdraw - react to a node withdrawing from the file system
+ * @sdp: The superblock
+ */
+static void remote_withdraw(struct gfs2_sbd *sdp)
+{
+	struct gfs2_jdesc *jd;
+	int ret = 0, count = 0;
+
+	list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
+		if (jd->jd_jid == sdp->sd_lockstruct.ls_jid)
+			continue;
+		ret = gfs2_recover_journal(jd, true);
+		if (ret)
+			break;
+		count++;
+	}
+
+	/* Now drop the additional reference we acquired */
+	fs_err(sdp, "Journals checked: %d, ret = %d.\n", count, ret);
+}
+
 static void gfs2_control_func(struct work_struct *work)
 {
 	struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_control_work.work);
@@ -583,6 +610,13 @@ static void gfs2_control_func(struct work_struct *work)
 	int recover_size;
 	int i, error;
 
+	/* First check for other nodes that may have done a withdraw. */
+	if (test_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags)) {
+		remote_withdraw(sdp);
+		clear_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags);
+		return;
+	}
+
 	spin_lock(&ls->ls_recover_spin);
 	/*
 	 * No MOUNT_DONE means we're still mounting; control_mount()
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 4472f5fc471d..3e4640744c02 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -1013,6 +1013,7 @@ int gfs2_logd(void *data)
 	unsigned long t = 1;
 	DEFINE_WAIT(wait);
 	bool did_flush;
+	bool withdrawn = false;
 
 	while (!kthread_should_stop()) {
 
@@ -1023,9 +1024,13 @@ int gfs2_logd(void *data)
 					 "withdrawing the file system to "
 					 "prevent further damage.\n",
 					 sdp->sd_fsname, sdp->sd_log_error);
+			withdrawn = true;
 		}
 
 		did_flush = false;
+		if (withdrawn)
+			goto ignore_work;
+
 		if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
 			gfs2_ail1_empty(sdp);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
@@ -1042,6 +1047,7 @@ int gfs2_logd(void *data)
 			did_flush = true;
 		}
 
+	ignore_work:
 		if (!gfs2_ail_flush_reqd(sdp) || did_flush)
 			wake_up(&sdp->sd_log_waitq);
 
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 6f2ef841a4bb..34486c01b8f7 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -254,7 +254,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	struct buffer_head *bh, *bhs[2];
 	int num = 0;
 
-	if (unlikely(gfs2_withdrawn(sdp))) {
+	if (unlikely(gfs2_withdrawn(sdp)) && gl != sdp->sd_jinode_gl) {
 		*bhp = NULL;
 		return -EIO;
 	}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 6e5e0f291a1a..464d365bd3f5 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -651,7 +651,8 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 
 		error = gfs2_glock_nq_num(sdp, sdp->sd_lockstruct.ls_jid,
 					  &gfs2_journal_glops,
-					  LM_ST_EXCLUSIVE, LM_FLAG_NOEXP,
+					  LM_ST_EXCLUSIVE,
+					  LM_FLAG_NOEXP | GL_NOCACHE,
 					  &sdp->sd_journal_gh);
 		if (error) {
 			fs_err(sdp, "can't acquire journal glock: %d\n", error);
@@ -659,6 +660,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 		}
 
 		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
+		sdp->sd_jinode_gl = ip->i_gl;
 		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
 					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 					   &sdp->sd_jinode_gh);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index acb6cbdd1431..6075d26c9d78 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1575,6 +1575,10 @@ int gfs2_quotad(void *data)
 		else
 			t = 0;
 		finish_wait(&sdp->sd_quota_wait, &wait);
+
+		if (atomic_read(&sdp->sd_withdrawer) ==
+		    pid_nr(task_pid(current)))
+			break;
 	}
 
 	return 0;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 110f00024950..5db342835de0 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -426,9 +426,13 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 		goto fail_threads;
 
 	j_gl->gl_ops->go_inval(j_gl, DIO_METADATA);
+	if (gfs2_withdrawn(sdp)) {
+		error = -EIO;
+		goto fail;
+	}
 
 	error = gfs2_find_jhead(sdp->sd_jdesc, &head, false);
-	if (error)
+	if (error || gfs2_withdrawn(sdp))
 		goto fail;
 
 	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
@@ -442,7 +446,7 @@ int gfs2_make_fs_rw(struct gfs2_sbd *sdp)
 	gfs2_log_pointers_init(sdp, head.lh_blkno);
 
 	error = gfs2_quota_init(sdp);
-	if (error)
+	if (error || gfs2_withdrawn(sdp))
 		goto fail;
 
 	set_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
@@ -843,6 +847,52 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 		gfs2_glock_dq_uninit(&gh);
 }
 
+/**
+ * gfs2_withdraw_quiesce - Quiesce a withdrawn file system
+ * @sdp: the filesystem
+ *
+ * This is very much like function gfs2_make_fs_ro, but it needs to handle
+ * special situations we can get in when withdrawing.
+ *
+ * Returns: errno
+ */
+
+int gfs2_withdraw_quiesce(struct gfs2_sbd *sdp)
+{
+	struct gfs2_holder freeze_gh;
+	int error = 0;
+
+	gfs2_holder_mark_uninitialized(&freeze_gh);
+	if (sdp->sd_freeze_gl &&
+	    !gfs2_glock_is_locked_by_me(sdp->sd_freeze_gl)) {
+		error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED,
+				   GL_NOCACHE | LM_FLAG_TRY, &freeze_gh);
+		if (error == GLR_TRYFAILED)
+			error = 0;
+	}
+	flush_workqueue(gfs2_delete_workqueue);
+	if (current == sdp->sd_quotad_process)
+		fs_warn(sdp, "The quotad daemon is withdrawing.\n");
+	else if (sdp->sd_quotad_process)
+		kthread_stop(sdp->sd_quotad_process);
+	sdp->sd_quotad_process = NULL;
+	if (current == sdp->sd_logd_process)
+		fs_warn(sdp, "The logd daemon is withdrawing.\n");
+	else if (sdp->sd_logd_process)
+		kthread_stop(sdp->sd_logd_process);
+	sdp->sd_logd_process = NULL;
+
+	wait_event_timeout(sdp->sd_reserving_log_wait,
+			   atomic_read(&sdp->sd_reserving_log) == 0, HZ * 5);
+
+	if (gfs2_holder_initialized(&freeze_gh))
+		gfs2_glock_dq_uninit(&freeze_gh);
+
+	gfs2_quota_cleanup(sdp);
+	sdp->sd_vfs->s_flags |= SB_RDONLY;
+	return error;
+}
+
 /**
  * gfs2_make_fs_ro - Turn a Read-Write FS into a Read-Only one
  * @sdp: the filesystem
@@ -931,8 +981,10 @@ static void gfs2_put_super(struct super_block *sb)
 	gfs2_glock_put(sdp->sd_freeze_gl);
 
 	if (!sdp->sd_args.ar_spectator) {
-		gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
-		gfs2_glock_dq_uninit(&sdp->sd_jinode_gh);
+		if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+			gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
+		if (gfs2_holder_initialized(&sdp->sd_jinode_gh))
+			gfs2_glock_dq_uninit(&sdp->sd_jinode_gh);
 		gfs2_glock_dq_uninit(&sdp->sd_sc_gh);
 		gfs2_glock_dq_uninit(&sdp->sd_qc_gh);
 		iput(sdp->sd_sc_inode);
diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h
index 73c97dccae21..e7c0ab951223 100644
--- a/fs/gfs2/super.h
+++ b/fs/gfs2/super.h
@@ -45,6 +45,7 @@ extern void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc,
 extern void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 			  struct buffer_head *l_bh);
 extern int gfs2_statfs_sync(struct super_block *sb, int type);
+extern int gfs2_withdraw_quiesce(struct gfs2_sbd *sdp);
 extern void gfs2_freeze_func(struct work_struct *work);
 
 extern struct file_system_type gfs2_fs_type;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index b678412d968a..c0e37541b843 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -14,11 +14,14 @@
 #include <linux/buffer_head.h>
 #include <linux/crc32.h>
 #include <linux/gfs2_ondisk.h>
+#include <linux/delay.h>
 #include <linux/uaccess.h>
 
 #include "gfs2.h"
 #include "incore.h"
 #include "glock.h"
+#include "glops.h"
+#include "log.h"
 #include "lops.h"
 #include "recovery.h"
 #include "rgrp.h"
@@ -81,6 +84,144 @@ int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 	return error;
 }
 
+static void signal_our_withdraw(struct gfs2_sbd *sdp)
+{
+	struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
+	int ret = 0;
+	int tries;
+
+	if (test_bit(SDF_NORECOVERY, &sdp->sd_flags))
+		return;
+
+	/* Prevent any glock dq until withdraw recovery is complete */
+	set_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
+	/**
+	 * Don't tell dlm we're bailing until we have no more buffers in the
+	 * wind. If journal had an IO error, the log code should just purge
+	 * the outstanding buffers rather than submitting new IO. Making the
+	 * file system read-only will flush the journal, etc.
+	 *
+	 * During a normal unmount, gfs2_make_fs_ro calls gfs2_log_shutdown
+	 * which clears SDF_JOURNAL_LIVE. In a withdraw, we must not write
+	 * any UNMOUNT log header, so we can't call gfs2_log_shutdown, and
+	 * therefore we need to clear SDF_JOURNAL_LIVE manually.
+	 */
+	clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
+	if (sdp->sd_vfs && !sb_rdonly(sdp->sd_vfs))
+		ret = gfs2_withdraw_quiesce(sdp);
+
+	/**
+	 * Drop the glock for our journal so another node can recover it.
+	 */
+	if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
+		gfs2_glock_dq_wait(&sdp->sd_journal_gh);
+		gfs2_holder_uninit(&sdp->sd_journal_gh);
+	}
+	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
+	gfs2_glock_dq(&sdp->sd_jinode_gh);
+	if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
+		/* Make sure gfs2_unfreeze works if partially-frozen */
+		flush_workqueue(gfs2_freeze_wq);
+		atomic_set(&sdp->sd_freeze_state, SFS_FROZEN);
+		thaw_super(sdp->sd_vfs);
+	} else {
+		wait_on_bit(&gl->gl_flags, GLF_DEMOTE, TASK_UNINTERRUPTIBLE);
+	}
+
+	/**
+	 * holder_uninit to force glock_put, to force dlm to let go
+	 */
+	gfs2_holder_uninit(&sdp->sd_jinode_gh);
+
+	/**
+	 * Note: We need to be careful here:
+	 * Func gfs2_jindex_free calls iput on jd_inode, which will evict it.
+	 * The evict, in turn, will dequeue its glock, but the glock dq will
+	 * wait for the withdraw unless we have exception code in glock_dq.
+	 */
+	gfs2_jindex_free(sdp);
+	/**
+	 * Wait until the journal inode's glock is freed. This allows try locks
+	 * on other nodes to be successful, otherwise we remain the owner of
+	 * the glock as far as dlm is concerned.
+	 */
+	if (gl->gl_ops->go_free) {
+		set_bit(GLF_FREEING, &gl->gl_flags);
+		wait_on_bit(&gl->gl_flags, GLF_FREEING, TASK_UNINTERRUPTIBLE);
+	}
+
+	if (sdp->sd_lockstruct.ls_ops->lm_lock == NULL) { /* lock_nolock */
+		clear_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
+		goto skip_recovery;
+	}
+	/**
+	 * Dequeue the "live" glock, but keep a reference so it's never freed.
+	 */
+	gfs2_glock_hold(gl);
+	gfs2_glock_dq_wait(&sdp->sd_live_gh);
+	/**
+	 * We enqueue the "live" glock in EX so that all other nodes
+	 * get a demote request and act on it. We don't really want the
+	 * lock in EX, so we send a "try" lock with 1CB to produce a callback.
+	 */
+	fs_warn(sdp, "Requesting recovery of jid %d.\n",
+		sdp->sd_lockstruct.ls_jid);
+	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+			   &sdp->sd_live_gh);
+	msleep(GL_GLOCK_MAX_HOLD);
+	/**
+	 * This will likely fail in a cluster, but succeed standalone:
+	 */
+	ret = gfs2_glock_nq(&sdp->sd_live_gh);
+
+	/**
+	 * If we actually got the "live" lock in EX mode, there are no other
+	 * nodes available to replay our journal. So we try to replay it
+	 * ourselves. We hold the "live" glock to prevent other mounters
+	 * during recovery, then just dequeue it and reacquire it in our
+	 * normal SH mode. Just in case the problem that caused us to
+	 * withdraw prevents us from recovering our journal (e.g. io errors
+	 * and such) we still check if the journal is clean before proceeding
+	 * but we may wait forever until another mounter does the recovery.
+	 */
+	if (ret == 0) {
+		fs_warn(sdp, "No other mounters found. Trying to recover our "
+			"own journal jid %d.\n", sdp->sd_lockstruct.ls_jid);
+		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
+			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
+				sdp->sd_lockstruct.ls_jid);
+		gfs2_glock_dq_wait(&sdp->sd_live_gh);
+		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+				   &sdp->sd_live_gh);
+		gfs2_glock_nq(&sdp->sd_live_gh);
+	}
+
+	gfs2_glock_queue_put(gl); /* drop the extra reference we acquired */
+	clear_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
+
+	/**
+	 * Now wait until recovery complete.
+	 */
+	for (tries = 0; tries < 10; tries++) {
+		ret = check_journal_clean(sdp, sdp->sd_jdesc);
+		if (!ret)
+			break;
+		msleep(HZ);
+		fs_warn(sdp, "Waiting for journal recovery jid %d.\n",
+			sdp->sd_lockstruct.ls_jid);
+	}
+skip_recovery:
+	if (!ret)
+		fs_warn(sdp, "Journal recovery complete for jid %d.\n",
+			sdp->sd_lockstruct.ls_jid);
+	else
+		fs_warn(sdp, "Journal recovery skipped for %d until next "
+			"mount.\n", sdp->sd_lockstruct.ls_jid);
+	fs_warn(sdp, "Glock dequeues delayed: %lu\n", sdp->sd_glock_dqs_held);
+	sdp->sd_glock_dqs_held = 0;
+	wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY);
+}
+
 int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
@@ -121,6 +262,8 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 		fs_err(sdp, "about to withdraw this file system\n");
 		BUG_ON(sdp->sd_args.ar_debug);
 
+		signal_our_withdraw(sdp);
+
 		kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
 
 		if (!strcmp(sdp->sd_lockstruct.ls_ops->lm_proto_name, "lock_dlm"))
@@ -131,7 +274,7 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 			lm->lm_unmount(sdp);
 		}
 		set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
-		fs_err(sdp, "withdrawn\n");
+		fs_err(sdp, "File system withdrawn\n");
 		dump_stack();
 		clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
 		smp_mb__after_atomic();
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (19 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean Bob Peterson
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, the rgrp_go_inval and inode_go_inval functions each
checked if there were any items left on the ail count (by way of a
count), and if so, did a withdraw. But the withdraw code now uses
glocks when changing the file system to read-only status. So we can
not have glock functions withdrawing or a hang will likely result:
The glocks can't be serviced by the work_func if the work_func is
busy doing its own withdraw.

This patch removes the checks from the go_inval functions and adds
a centralized check in do_xmote to warn about the problem and not
withdraw, but flag the error so it's eventually caught when the logd
daemon eventually runs.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 17 +++++++++++++++--
 fs/gfs2/glops.c |  3 ---
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 7202793056a8..cb378df8217b 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -570,9 +570,22 @@ __acquires(&gl->gl_lockref.lock)
 	spin_unlock(&gl->gl_lockref.lock);
 	if (glops->go_sync)
 		glops->go_sync(gl);
-	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
+	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags)) {
+		/**
+		 * The call to go_sync should have cleared out the ail list.
+		 * If there are still items, we have a problem. We ought to
+		 * withdraw, but we can't because the withdraw code also uses
+		 * glocks. Warn about the error, dump the glock, then fall
+		 * through and wait for logd to do the withdraw for us.
+		 */
+		if ((atomic_read(&gl->gl_ail_count) != 0) &&
+		    (!cmpxchg(&sdp->sd_log_error, 0, -EIO))) {
+			gfs2_assert_warn(sdp, !atomic_read(&gl->gl_ail_count));
+			gfs2_dump_glock(NULL, gl, true);
+		}
 		glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
-	clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
+		clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
+	}
 
 	gfs2_glock_hold(gl);
 	if (sdp->sd_lockstruct.ls_ops->lm_lock)	{
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index db4d499ecf4f..5d4d2b6873e5 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -196,7 +196,6 @@ static void rgrp_go_inval(struct gfs2_glock *gl, int flags)
 		gfs2_rgrp_brelse(rgd);
 
 	WARN_ON_ONCE(!(flags & DIO_METADATA));
-	gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	truncate_inode_pages_range(mapping, gl->gl_vm.start, gl->gl_vm.end);
 
 	if (rgd)
@@ -296,8 +295,6 @@ static void inode_go_inval(struct gfs2_glock *gl, int flags)
 {
 	struct gfs2_inode *ip = gfs2_glock2inode(gl);
 
-	gfs2_assert_withdraw(gl->gl_name.ln_sbd, !atomic_read(&gl->gl_ail_count));
-
 	if (flags & DIO_METADATA) {
 		struct address_space *mapping = gfs2_glock2aspace(gl);
 		truncate_inode_pages(mapping, 0);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (20 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen Bob Peterson
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function check_journal_clean would give messages
related to journal recovery. That's fine for mount time, but when a
node withdraws and forces replay that way, we don't want all those
distracting and misleading messages. This patch adds a new parameter
to make those messages optional.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/ops_fstype.c |  2 +-
 fs/gfs2/util.c       | 23 ++++++++++++++++-------
 fs/gfs2/util.h       |  4 +++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 464d365bd3f5..c4031719fbaa 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -691,7 +691,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 			struct gfs2_jdesc *jd = gfs2_jdesc_find(sdp, x);
 
 			if (sdp->sd_args.ar_spectator) {
-				error = check_journal_clean(sdp, jd);
+				error = check_journal_clean(sdp, jd, true);
 				if (error)
 					goto fail_jinode_gh;
 				continue;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index c0e37541b843..c5425bf9a1fa 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -49,7 +49,8 @@ void gfs2_assert_i(struct gfs2_sbd *sdp)
  *
  * Returns: 0 if the journal is clean or locked, else an error
  */
-int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
+			bool verbose)
 {
 	int error;
 	struct gfs2_holder j_gh;
@@ -60,23 +61,31 @@ int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
 				   GL_EXACT | GL_NOCACHE, &j_gh);
 	if (error) {
-		fs_err(sdp, "Error locking journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error %d locking journal for spectator "
+			       "mount.\n", error);
 		return -EPERM;
 	}
 	error = gfs2_jdesc_check(jd);
 	if (error) {
-		fs_err(sdp, "Error checking journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error checking journal for spectator "
+			       "mount.\n");
 		goto out_unlock;
 	}
 	error = gfs2_find_jhead(jd, &head, false);
 	if (error) {
-		fs_err(sdp, "Error parsing journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error parsing journal for spectator "
+			       "mount.\n");
 		goto out_unlock;
 	}
 	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
 		error = -EPERM;
-		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
-		       "must not be a spectator.\n", jd->jd_jid);
+		if (verbose)
+			fs_err(sdp, "jid=%u: Journal is dirty, so the first "
+			       "mounter must not be a spectator.\n",
+			       jd->jd_jid);
 	}
 
 out_unlock:
@@ -203,7 +212,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 * Now wait until recovery complete.
 	 */
 	for (tries = 0; tries < 10; tries++) {
-		ret = check_journal_clean(sdp, sdp->sd_jdesc);
+		ret = check_journal_clean(sdp, sdp->sd_jdesc, false);
 		if (!ret)
 			break;
 		msleep(HZ);
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index e3539ceda1ca..6f333325d4ac 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -131,7 +131,9 @@ static inline void gfs2_metatype_set(struct buffer_head *bh, u16 type,
 
 int gfs2_io_error_i(struct gfs2_sbd *sdp, const char *function,
 		    char *file, unsigned int line);
-int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
+
+extern int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
+			       bool verbose);
 
 #define gfs2_io_error(sdp) \
 gfs2_io_error_i((sdp), __func__, __FILE__, __LINE__);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (21 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently Bob Peterson
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, an io error, such as -EIO writing to the journal
would cause function gfs2_freeze to go into an infinite loop,
continuously retrying the freeze operation. But nothing ever clears
the -EIO except unmount after withdraw, which is impossible if the
freeze operation never ends (fails). Instead you get:

[ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6499.773058] gfs2: fsid=dm-32.0: retrying...
[ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6500.797015] gfs2: fsid=dm-32.0: retrying...

This patch adds a check for -EIO in gfs2_freeze, and if seen, it
dequeues the freeze glock, aborts the loop and returns the error.
Also, there's no need to pass the freeze holder to function
gfs2_lock_fs_check_clean since it's only called in one place and
it's a well-known superblock pointer, so this simplifies that.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5db342835de0..7ac40f0e87cd 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -658,8 +658,7 @@ struct lfcc {
  * Returns: errno
  */
 
-static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
-				    struct gfs2_holder *freeze_gh)
+static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp)
 {
 	struct gfs2_inode *ip;
 	struct gfs2_jdesc *jd;
@@ -684,7 +683,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
 	}
 
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_EXCLUSIVE,
-				   GL_NOCACHE, freeze_gh);
+				   GL_NOCACHE, &sdp->sd_freeze_gh);
 
 	list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
 		error = gfs2_jdesc_check(jd);
@@ -700,7 +699,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
 	}
 
 	if (error)
-		gfs2_glock_dq_uninit(freeze_gh);
+		gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
 
 out:
 	while (!list_empty(&list)) {
@@ -1074,15 +1073,20 @@ static int gfs2_freeze(struct super_block *sb)
 			goto out;
 		}
 
-		error = gfs2_lock_fs_check_clean(sdp, &sdp->sd_freeze_gh);
+		error = gfs2_lock_fs_check_clean(sdp);
 		if (!error)
 			break;
 
 		if (error == -EBUSY)
 			fs_err(sdp, "waiting for recovery before freeze\n");
-		else
+		else if (error == -EIO) {
+			fs_err(sdp, "Fatel IO error: cannot freeze gfs2 at "
+			       "this time.\n");
+			gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
+			goto out;
+		} else {
 			fs_err(sdp, "error freezing FS: %d\n", error);
-
+		}
 		fs_err(sdp, "retrying...\n");
 		msleep(1000);
 	}
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (22 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_write_revokes would call
gfs2_ail1_empty, then traverse the sd_ail1_list looking for
transactions that had bds which were no longer queued to a glock.
And if it found some, it would try to issue revokes for them, up to
a predetermined maximum. There were two problems with how it did
this. First was the fact that gfs2_ail1_empty moves transactions
which have nothing remaining on the ail1 list from the sd_ail1_list
to the sd_ail2_list, thus making its traversal of sd_ail1_list
miss them completely, and therefore, never issue revokes for them.
Second was the fact that there were three traversals (or partial
traversals) of the sd_ail1_list, each of which took and then
released the sd_ail_lock lock. (First inside gfs2_ail1_empty,
second to determine if there are any revokes to be issued, and
third to actually issue them. All this taking and releasing of the
sd_ail_lock meant other processes could modify the lists and the
conditions in which we're working.

This patch simplies the whole process by adding a new parameter
to function gfs2_ail1_empty, max_revokes. For normal calls, this
is passed in as 0, meaning we don't want to issue any revokes.
For function gfs2_write_revokes, we pass in the maximum number
of revokes we can, thus allowing gfs2_ail1_empty to add the
revokes where needed. This simplies the code, allows for a single
holding of the sd_ail_lock, and allows gfs2_ail1_empty to add
revokes for all the necessary bd items without missing any.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 61 +++++++++++++++++++--------------------------------
 1 file changed, 22 insertions(+), 39 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 3e4640744c02..a8be92d10234 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -189,11 +189,13 @@ static void gfs2_ail1_start(struct gfs2_sbd *sdp)
 /**
  * gfs2_ail1_empty_one - Check whether or not a trans in the AIL has been synced
  * @sdp: the filesystem
- * @ai: the AIL entry
+ * @tr: the transaction
+ * @max_revokes: If nonzero, issue revokes for the bd items for written buffers
  *
  */
 
-static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
+static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
+				int *max_revokes)
 {
 	struct gfs2_bufdata *bd, *s;
 	struct buffer_head *bh;
@@ -220,18 +222,28 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 			gfs2_io_error_bh(sdp, bh);
 			set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 		}
-		list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
+		/* If we have space for revokes and the bd is no longer on any
+		   buf list, we can just add a revoke for it here and avoid
+		   having to put it on the ail2 list, where it would need to
+		   be revoked later. */
+		if (*max_revokes && list_empty(&bd->bd_list)) {
+			gfs2_add_revoke(sdp, bd);
+			(*max_revokes)--;
+		} else {
+			list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
+		}
 	}
 }
 
 /**
  * gfs2_ail1_empty - Try to empty the ail1 lists
  * @sdp: The superblock
+ * @max_revokes: If non-zero, add revokes where appropriate
  *
  * Tries to empty the ail1 lists, starting with the oldest first
  */
 
-static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
+static int gfs2_ail1_empty(struct gfs2_sbd *sdp, int max_revokes)
 {
 	struct gfs2_trans *tr, *s;
 	int oldest_tr = 1;
@@ -239,7 +251,7 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 
 	spin_lock(&sdp->sd_ail_lock);
 	list_for_each_entry_safe_reverse(tr, s, &sdp->sd_ail1_list, tr_list) {
-		gfs2_ail1_empty_one(sdp, tr);
+		gfs2_ail1_empty_one(sdp, tr, &max_revokes);
 		if (list_empty(&tr->tr_ail1_list) && oldest_tr)
 			list_move(&tr->tr_list, &sdp->sd_ail2_list);
 		else
@@ -622,25 +634,9 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 
 void gfs2_write_revokes(struct gfs2_sbd *sdp)
 {
-	struct gfs2_trans *tr;
-	struct gfs2_bufdata *bd, *tmp;
-	int have_revokes = 0;
 	int max_revokes = (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_log_descriptor)) / sizeof(u64);
 
-	gfs2_ail1_empty(sdp);
-	spin_lock(&sdp->sd_ail_lock);
-	list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
-		list_for_each_entry(bd, &tr->tr_ail2_list, bd_ail_st_list) {
-			if (list_empty(&bd->bd_list)) {
-				have_revokes = 1;
-				goto done;
-			}
-		}
-	}
-done:
-	spin_unlock(&sdp->sd_ail_lock);
-	if (have_revokes == 0)
-		return;
+	gfs2_log_lock(sdp);
 	while (sdp->sd_log_num_revoke > max_revokes)
 		max_revokes += (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_meta_header)) / sizeof(u64);
 	max_revokes -= sdp->sd_log_num_revoke;
@@ -651,20 +647,7 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp)
 		if (!sdp->sd_log_blks_reserved)
 			atomic_dec(&sdp->sd_log_blks_free);
 	}
-	gfs2_log_lock(sdp);
-	spin_lock(&sdp->sd_ail_lock);
-	list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
-		list_for_each_entry_safe(bd, tmp, &tr->tr_ail2_list, bd_ail_st_list) {
-			if (max_revokes == 0)
-				goto out_of_blocks;
-			if (!list_empty(&bd->bd_list))
-				continue;
-			gfs2_add_revoke(sdp, bd);
-			max_revokes--;
-		}
-	}
-out_of_blocks:
-	spin_unlock(&sdp->sd_ail_lock);
+	gfs2_ail1_empty(sdp, max_revokes);
 	gfs2_log_unlock(sdp);
 
 	if (!sdp->sd_log_num_revoke) {
@@ -862,7 +845,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 			while (!gfs2_withdrawn(sdp)) {
 				gfs2_ail1_start(sdp);
 				gfs2_ail1_wait(sdp);
-				if (gfs2_ail1_empty(sdp))
+				if (gfs2_ail1_empty(sdp, 0))
 					break;
 			}
 			atomic_dec(&sdp->sd_log_blks_free); /* Adjust for unreserved buffer */
@@ -1032,7 +1015,7 @@ int gfs2_logd(void *data)
 			goto ignore_work;
 
 		if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
-			gfs2_ail1_empty(sdp);
+			gfs2_ail1_empty(sdp, 0);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 				       GFS2_LFC_LOGD_JFLUSH_REQD);
 			did_flush = true;
@@ -1041,7 +1024,7 @@ int gfs2_logd(void *data)
 		if (gfs2_ail_flush_reqd(sdp)) {
 			gfs2_ail1_start(sdp);
 			gfs2_ail1_wait(sdp);
-			gfs2_ail1_empty(sdp);
+			gfs2_ail1_empty(sdp, 0);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 				       GFS2_LFC_LOGD_AIL_FLUSH_REQD);
 			did_flush = true;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (23 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_end_log_write would detect any IO
errors writing to the journal and put out an appropriate message,
but it never set a withdrawing condition. Eventually, the log daemon
would see the error and determine it was time to withdraw, but in
the meantime, other processes could continue running as if nothing
bad ever happened. The biggest consequence is that __gfs2_glock_put
would BUG() when it saw that there were still unwritten items.

This patch sets the WITHDRAWING status as soon as an IO error is
detected, and that way, the BUG will be avoided so the file system
can be properly withdrawn and unmounted.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index edff7538d7f2..bfe296be621c 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -214,6 +214,7 @@ static void gfs2_end_log_write(struct bio *bio)
 		if (!cmpxchg(&sdp->sd_log_error, 0, (int)bio->bi_status))
 			fs_err(sdp, "Error %d writing to journal, jid=%u\n",
 			       bio->bi_status, sdp->sd_jdesc->jd_jid);
+		set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 		wake_up(&sdp->sd_logd_waitq);
 	}

-- 
2.21.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock
  2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
                   ` (24 preceding siblings ...)
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
@ 2019-05-23 13:04 ` Bob Peterson
  25 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-05-23 13:04 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 32 ++++++++++++++++++++++++++++----
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index cb378df8217b..d1805285ed73 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,7 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
-	BUG_ON(test_bit(GLF_REVOKES, &gl->gl_flags));
+	GLOCK_BUG_ON(gl, test_bit(GLF_REVOKES, &gl->gl_flags));
 	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
 	smp_mb();
 	wake_up_glock(gl);
@@ -588,6 +588,31 @@ __acquires(&gl->gl_lockref.lock)
 	}
 
 	gfs2_glock_hold(gl);
+	/**
+	 * Check for an error encountered since we called go_sync and go_inval.
+	 * If so, we can't withdraw from the glock code because the withdraw
+	 * code itself uses glocks (see function signal_our_withdraw) to
+	 * change the mount to read-only. Most importantly, we must not call
+	 * dlm to unlock the glock until the journal is in a known good state
+	 * (after journal replay) otherwise other nodes may use the object
+	 * (rgrp or dinode) and then later, journal replay will corrupt the
+	 * file system. The best we can do here is wait for the logd daemon
+	 * to see sd_log_error and withdraw, and in the meantime, requeue the
+	 * work for later.
+	 *
+	 * However, if we're just unlocking the lock (say, for unmount, when
+	 * gfs2_gl_hash_clear calls clear_glock) and recovery is complete
+	 * then it's okay to tell dlm to unlock it.
+	 */
+	if (unlikely(sdp->sd_log_error &&
+		     (glops->go_flags & GLOF_JOURNALED))) {
+		if (target != LM_ST_UNLOCKED ||
+		    test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags)) {
+			gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
+			goto out;
+		}
+	}
+
 	if (sdp->sd_lockstruct.ls_ops->lm_lock)	{
 		/* lock_dlm */
 		ret = sdp->sd_lockstruct.ls_ops->lm_lock(gl, target, lck_flags);
@@ -596,8 +621,7 @@ __acquires(&gl->gl_lockref.lock)
 		    test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) {
 			finish_xmote(gl, target);
 			gfs2_glock_queue_work(gl, 0);
-		}
-		else if (ret) {
+		} else if (ret) {
 			fs_err(sdp, "lm_lock ret %d\n", ret);
 			GLOCK_BUG_ON(gl, !gfs2_withdrawn(sdp));
 		}
@@ -605,7 +629,7 @@ __acquires(&gl->gl_lockref.lock)
 		finish_xmote(gl, target);
 		gfs2_glock_queue_work(gl, 0);
 	}
-
+out:
 	spin_lock(&gl->gl_lockref.lock);
 }
 
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
@ 2019-05-29 16:20   ` Andreas Gruenbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2019-05-29 16:20 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, 23 May 2019 at 15:05, Bob Peterson <rpeterso@redhat.com> wrote:
>
> This patch replaces a few leftover printk errors with calls to
> fs_info and similar, so that the file system having the error is
> properly logged.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/bmap.c  |  2 +-
>  fs/gfs2/glops.c |  3 ++-
>  fs/gfs2/rgrp.c  | 27 ++++++++++++++-------------
>  fs/gfs2/super.c |  6 +++---
>  4 files changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c
> index f42718dd292f..a809aa887521 100644
> --- a/fs/gfs2/bmap.c
> +++ b/fs/gfs2/bmap.c
> @@ -1862,7 +1862,7 @@ static int punch_hole(struct gfs2_inode *ip, u64 offset, u64 length)
>                         gfs2_assert_withdraw(sdp, bh);
>                         if (gfs2_assert_withdraw(sdp,
>                                                  prev_bnr != bh->b_blocknr)) {
> -                               printk(KERN_EMERG "GFS2: fsid=%s:inode %llu, "
> +                               fs_emerg(sdp, "GFS2: fsid=%s:inode %llu, "
>                                        "block:%llu, i_h:%u, s_h:%u, mp_h:%u\n",
>                                        sdp->sd_fsname,
>                                        (unsigned long long)ip->i_no_addr,

The fsid is printed twice here; that won't help.

> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 84a403ed6e77..1e5720e50e9c 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -509,7 +509,8 @@ static void freeze_go_sync(struct gfs2_glock *gl)
>                 atomic_set(&sdp->sd_freeze_state, SFS_STARTING_FREEZE);
>                 error = freeze_super(sdp->sd_vfs);
>                 if (error) {
> -                       printk(KERN_INFO "GFS2: couldn't freeze filesystem: %d\n", error);
> +                       fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
> +                               error);
>                         gfs2_assert_withdraw(sdp, 0);
>                 }
>                 queue_work(gfs2_freeze_wq, &sdp->sd_freeze_work);
> diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
> index ed3a6d3973a9..985d968b042c 100644
> --- a/fs/gfs2/rgrp.c
> +++ b/fs/gfs2/rgrp.c
> @@ -1115,32 +1115,33 @@ static int gfs2_rgrp_lvb_valid(struct gfs2_rgrpd *rgd)
>  {
>         struct gfs2_rgrp_lvb *rgl = rgd->rd_rgl;
>         struct gfs2_rgrp *str = (struct gfs2_rgrp *)rgd->rd_bits[0].bi_bh->b_data;
> +       struct gfs2_sbd *sdp = rgd->rd_sbd;
>         int valid = 1;
>
>         if (rgl->rl_flags != str->rg_flags) {
> -               printk(KERN_WARNING "GFS2: rgd: %llu lvb flag mismatch %u/%u",
> -                      (unsigned long long)rgd->rd_addr,
> +               fs_warn(sdp, "GFS2: rgd: %llu lvb flag mismatch %u/%u",
> +                       (unsigned long long)rgd->rd_addr,
>                        be32_to_cpu(rgl->rl_flags), be32_to_cpu(str->rg_flags));
>                 valid = 0;
>         }
>         if (rgl->rl_free != str->rg_free) {
> -               printk(KERN_WARNING "GFS2: rgd: %llu lvb free mismatch %u/%u",
> -                      (unsigned long long)rgd->rd_addr,
> -                      be32_to_cpu(rgl->rl_free), be32_to_cpu(str->rg_free));
> +               fs_warn(sdp, "GFS2: rgd: %llu lvb free mismatch %u/%u",
> +                       (unsigned long long)rgd->rd_addr,
> +                       be32_to_cpu(rgl->rl_free), be32_to_cpu(str->rg_free));
>                 valid = 0;
>         }
>         if (rgl->rl_dinodes != str->rg_dinodes) {
> -               printk(KERN_WARNING "GFS2: rgd: %llu lvb dinode mismatch %u/%u",
> -                      (unsigned long long)rgd->rd_addr,
> -                      be32_to_cpu(rgl->rl_dinodes),
> -                      be32_to_cpu(str->rg_dinodes));
> +               fs_warn(sdp, "GFS2: rgd: %llu lvb dinode mismatch %u/%u",
> +                       (unsigned long long)rgd->rd_addr,
> +                       be32_to_cpu(rgl->rl_dinodes),
> +                       be32_to_cpu(str->rg_dinodes));
>                 valid = 0;
>         }
>         if (rgl->rl_igeneration != str->rg_igeneration) {
> -               printk(KERN_WARNING "GFS2: rgd: %llu lvb igen mismatch "
> -                      "%llu/%llu", (unsigned long long)rgd->rd_addr,
> -                      (unsigned long long)be64_to_cpu(rgl->rl_igeneration),
> -                      (unsigned long long)be64_to_cpu(str->rg_igeneration));
> +               fs_warn(sdp, "GFS2: rgd: %llu lvb igen mismatch %llu/%llu",
> +                       (unsigned long long)rgd->rd_addr,
> +                       (unsigned long long)be64_to_cpu(rgl->rl_igeneration),
> +                       (unsigned long long)be64_to_cpu(str->rg_igeneration));
>                 valid = 0;
>         }
>         return valid;
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 6e3a318edccd..62cc451f30d5 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -981,14 +981,14 @@ void gfs2_freeze_func(struct work_struct *work)
>         error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, 0,
>                                    &freeze_gh);
>         if (error) {
> -               printk(KERN_INFO "GFS2: couldn't get freeze lock : %d\n", error);
> +               fs_info(sdp, "GFS2: couldn't get freeze lock : %d\n", error);
>                 gfs2_assert_withdraw(sdp, 0);
>         } else {
>                 atomic_set(&sdp->sd_freeze_state, SFS_UNFROZEN);
>                 error = thaw_super(sb);
>                 if (error) {
> -                       printk(KERN_INFO "GFS2: couldn't thaw filesystem: %d\n",
> -                              error);
> +                       fs_info(sdp, "GFS2: couldn't thaw filesystem: %d\n",
> +                               error);
>                         gfs2_assert_withdraw(sdp, 0);
>                 }
>                 if (!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
> --
> 2.21.0
>

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
@ 2019-08-20 14:09   ` Andreas Gruenbacher
  2019-09-04 16:59     ` Bob Peterson
  0 siblings, 1 reply; 31+ messages in thread
From: Andreas Gruenbacher @ 2019-08-20 14:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

On Thu, May 23, 2019 at 3:05 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch, gfs2 kept track of journal io errors in two
> places sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
> This patch consolidates the two into sd_log_error so that it
> reflects the first error encountered writing to the journal.
> In future patches, we will take advantage of this by checking
> this value rather than having to check both when reacting to
> io errors.
>
> In addition, this fixes a tight loop in unmount: If buffers
> get on the ail1 list and an io error occurs elsewhere, the
> ail1 list would never be cleared because they were always busy.
> So unmount would hang, waiting for the ail1 list to empty.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/incore.h |  7 +++----
>  fs/gfs2/log.c    | 20 +++++++++++++++-----
>  fs/gfs2/quota.c  |  2 +-
>  3 files changed, 19 insertions(+), 10 deletions(-)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index b261168be298..39cec5361ba5 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -620,9 +620,8 @@ enum {
>         SDF_RORECOVERY          = 7, /* read only recovery */
>         SDF_SKIP_DLM_UNLOCK     = 8,
>         SDF_FORCE_AIL_FLUSH     = 9,
> -       SDF_AIL1_IO_ERROR       = 10,
> -       SDF_FS_FROZEN           = 11,
> -       SDF_WITHDRAWING         = 12, /* Will withdraw eventually */
> +       SDF_FS_FROZEN           = 10,
> +       SDF_WITHDRAWING         = 11, /* Will withdraw eventually */
>  };
>
>  enum gfs2_freeze_state {
> @@ -831,7 +830,7 @@ struct gfs2_sbd {
>         atomic_t sd_log_in_flight;
>         struct bio *sd_log_bio;
>         wait_queue_head_t sd_log_flush_wait;
> -       int sd_log_error;
> +       int sd_log_error; /* First log error */
>
>         atomic_t sd_reserving_log;
>         wait_queue_head_t sd_reserving_log_wait;
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 0fe11bde796b..9784763fbb4e 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -108,8 +108,7 @@ __acquires(&sdp->sd_ail_lock)
>
>                 if (!buffer_busy(bh)) {
>                         if (!buffer_uptodate(bh) &&
> -                           !test_and_set_bit(SDF_AIL1_IO_ERROR,
> -                                             &sdp->sd_flags)) {
> +                           !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
>                                 gfs2_io_error_bh(sdp, bh);
>                                 set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
>                         }
> @@ -203,10 +202,21 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
>                                          bd_ail_st_list) {
>                 bh = bd->bd_bh;
>                 gfs2_assert(sdp, bd->bd_tr == tr);
> -               if (buffer_busy(bh))
> +               /**
> +                * If another process flagged an io error, e.g. writing to the
> +                * journal, error all other bhs and move them off the ail1 to
> +                * prevent a tight loop when unmount tries to flush ail1,
> +                * regardless of whether they're still busy. If no outside
> +                * errors were found and the buffer is busy, move to the next.
> +                * If the ail buffer is not busy and caught an error, flag it
> +                * for others.
> +                */
> +               if (sdp->sd_log_error) {
> +                       gfs2_io_error_bh(sdp, bh);

some of the error handling here is still sketchy: the only place where
sd_log_error is set without withdrawing the filesystem is
quotad_error. If the filesystem has already been marked
SDF_WITHDRAWING or SDF_WITHDRAWN, gfs2_io_error_bh will be a no-op. It
seems that we want to set SDF_WITHDRAWING here for the quotad_error
case instead of calling gfs2_io_error_bh?

> +               } else if (buffer_busy(bh)) {
>                         continue;
> -               if (!buffer_uptodate(bh) &&
> -                   !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
> +               } else if (!buffer_uptodate(bh) &&
> +                          !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
>                         gfs2_io_error_bh(sdp, bh);
>                         set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
>                 }
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index a8dfc86fd682..8871fca9102f 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1480,7 +1480,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
>                 return;
>         if (!gfs2_withdrawn(sdp)) {
>                 fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> -               sdp->sd_log_error = error;
> +               cmpxchg(&sdp->sd_log_error, 0, error);
>                 wake_up(&sdp->sd_logd_waitq);
>         }
>  }
> --
> 2.21.0
>

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn
  2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
@ 2019-08-27 11:20   ` Andreas Gruenbacher
  0 siblings, 0 replies; 31+ messages in thread
From: Andreas Gruenbacher @ 2019-08-27 11:20 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

On Thu, May 23, 2019 at 3:05 PM Bob Peterson <rpeterso@redhat.com> wrote:
> When a node fails, user space informs dlm of the node failure,
> and dlm instructs gfs2 on the surviving nodes to perform journal
> recovery. It does this by calling various callback functions in
> lock_dlm.c. To mark its progress, it keeps generation numbers
> and recover bits in a dlm "control" lock lvb, which is seen by
> all nodes to determine which journals need to be replayed.
>
> The gfs2 on all nodes get the same recovery requests from dlm,
> so they all try to do the recovery, but only one will be
> granted the exclusive lock on the journal. The others fail
> with a "Busy" message on their "try lock."
>
> However, when a node is withdrawn, it cannot safely do any
> recovery or safely replay any journals. To make matters worse,
> gfs2 might withdraw as a result of attempting recovery. For
> example, this might happen if the device goes offline, or if
> an hba fails. But in today's gfs2 code, it doesn't check for
> being withdrawn at any step in the recovery process. What's
> worse if that these callbacks from dlm have no return code,
> so there is no way to indicate failure back to dlm. We can
> send a "Recovery failed" uevent eventually, but that tells
> user space what happened, not dlm's kernel code.
>
> Before this patch, lock_dlm would perform its recovery steps but
> ignore the result, and eventually it would still update its
> generation number in the lvb, despite the fact that it may have
> withdrawn or encountered an error. The other nodes would then
> see the newer generation number in the lvb and conclude that
> they don't need to do recovery because the generation number
> is newer than the last one they saw. They think a different
> node has already recovered the journal.
>
> This patch adds checks to several of the callbacks used by dlm
> in its recovery state machine so that the functions are ignored
> and skipped if an io error has occurred or if the file system
> is withdrawn. That prevents the lvb bits from being updated, and
> therefore dlm and user space still see the need for recovery to
> take place.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/lock_dlm.c | 18 ++++++++++++++++++
>  fs/gfs2/recovery.c |  5 +++++
>  2 files changed, 23 insertions(+)
>
> diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
> index 31df26ed7854..9329f86ffcbe 100644
> --- a/fs/gfs2/lock_dlm.c
> +++ b/fs/gfs2/lock_dlm.c
> @@ -1081,6 +1081,10 @@ static void gdlm_recover_prep(void *arg)
>         struct gfs2_sbd *sdp = arg;
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_prep ignored due to withdraw.\n");
> +               return;
> +       }
>         spin_lock(&ls->ls_recover_spin);
>         ls->ls_recover_block = ls->ls_recover_start;
>         set_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags);
> @@ -1103,6 +1107,11 @@ static void gdlm_recover_slot(void *arg, struct dlm_slot *slot)
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>         int jid = slot->slot - 1;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
> +                      jid);
> +               return;
> +       }
>         spin_lock(&ls->ls_recover_spin);
>         if (ls->ls_recover_size < jid + 1) {
>                 fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
> @@ -1127,6 +1136,10 @@ static void gdlm_recover_done(void *arg, struct dlm_slot *slots, int num_slots,
>         struct gfs2_sbd *sdp = arg;
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recover_done ignored due to withdraw.\n");
> +               return;
> +       }

In gdlm_recover_prep, we're setting the DFL_DLM_RECOVERY flag to
indicate that we're in recovery. Assume that a withdraw occurs after
that. We'll then skip clearing the DFL_DLM_RECOVERY flag because of
the above check. Won't that leave waiters on DFL_DLM_RECOVERY stuck?

>         /* ensure the ls jid arrays are large enough */
>         set_recover_size(sdp, slots, num_slots);
>
> @@ -1154,6 +1167,11 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, unsigned int jid,
>  {
>         struct lm_lockstruct *ls = &sdp->sd_lockstruct;
>
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "recovery_result jid %d ignored due to withdraw.\n",
> +                      jid);
> +               return;
> +       }
>         if (test_bit(DFL_NO_DLM_OPS, &ls->ls_recover_flags))
>                 return;
>
> diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
> index 4ce2bfdbefdc..9e15b5aa2cfb 100644
> --- a/fs/gfs2/recovery.c
> +++ b/fs/gfs2/recovery.c
> @@ -307,6 +307,11 @@ void gfs2_recover_func(struct work_struct *work)
>         int jlocked = 0;
>
>         t_start = ktime_get();
> +       if (gfs2_withdrawn(sdp)) {
> +               fs_err(sdp, "jid=%u: Recovery not attempted due to withdraw.\n",
> +                      jd->jd_jid);
> +               goto fail;
> +       }
>         if (sdp->sd_args.ar_spectator)
>                 goto fail;
>         if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
> --
> 2.21.0
>

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform
  2019-08-20 14:09   ` Andreas Gruenbacher
@ 2019-09-04 16:59     ` Bob Peterson
  0 siblings, 0 replies; 31+ messages in thread
From: Bob Peterson @ 2019-09-04 16:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Andreas,

----- Original Message -----
> > +                */
> > +               if (sdp->sd_log_error) {
> > +                       gfs2_io_error_bh(sdp, bh);
> 
> some of the error handling here is still sketchy: the only place where
> sd_log_error is set without withdrawing the filesystem is
> quotad_error. If the filesystem has already been marked
> SDF_WITHDRAWING or SDF_WITHDRAWN, gfs2_io_error_bh will be a no-op. It
> seems that we want to set SDF_WITHDRAWING here for the quotad_error
> case instead of calling gfs2_io_error_bh?
> 
> > +               } else if (buffer_busy(bh)) {
> >                         continue;
> > -               if (!buffer_uptodate(bh) &&
> > -                   !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
> > +               } else if (!buffer_uptodate(bh) &&
> > +                          !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
> >                         gfs2_io_error_bh(sdp, bh);
> >                         set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
> >                 }

The main idea was to move busy buffers to tr_ail2_list after
an errors have been flagged (before the test for buffer_busy()).
Would something like this be more acceptable?

@@ -200,10 +199,19 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 					 bd_ail_st_list) {
 		bh = bd->bd_bh;
 		gfs2_assert(sdp, bd->bd_tr == tr);
-		if (buffer_busy(bh))
+		/*
+		 * If another process flagged an io error, e.g. writing to the
+		 * journal, error all other bhs and move them off the ail1 to
+		 * prevent a tight loop when unmount tries to flush ail1,
+		 * regardless of whether they're still busy. If no outside
+		 * errors were found and the buffer is busy, move to the next.
+		 * If the ail buffer is not busy and caught an error, flag it
+		 * for others.
+		 */
+		if (!sdp->sd_log_error && buffer_busy(bh))
 			continue;
 		if (!buffer_uptodate(bh) &&
-		    !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
+		    !cmpxchg(&sdp->sd_log_error, 0, -EIO)) {
 			gfs2_io_error_bh(sdp, bh);
 			set_bit(SDF_WITHDRAWING, &sdp->sd_flags);
 		}



Regards,

Bob Peterson



^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2019-09-04 16:59 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
2019-05-29 16:20   ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
2019-08-20 14:09   ` Andreas Gruenbacher
2019-09-04 16:59     ` Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
2019-08-27 11:20   ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.