All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection
@ 2019-03-27 12:35 Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 01/19] gfs2: log error reform Bob Peterson
                   ` (18 more replies)
  0 siblings, 19 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This is a collection of patches I've been using to address the myriad
of recovery problems I've found. I'm still finding them, so the battle
is not won yet. I'm not convinced we need all of these but I thought
I'd send them anyway and get feedback. Previously I sent out a version
of the patch "gfs2: Force withdraw to replay journals and wait for it
to finish" that was too big and complex. So I broke it up into four
patches, starting with "move check_journal_clean to util.c for future
use". So those four need to be a set. There aren't many other dependencies
between patches, so the others could probably be taken or rejected
individually.

Bob Peterson (19):
  gfs2: log error reform
  gfs2: Introduce concept of a pending withdraw
  gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn
  gfs2: move check_journal_clean to util.c for future use
  gfs2: Allow some glocks to be used during withdraw
  gfs2: Make secondary withdrawers wait for first withdrawer
  gfs2: Don't write log headers after file system withdraw
  gfs2: Force withdraw to replay journals and wait for it to finish
  gfs2: Add verbose option to check_journal_clean
  gfs2: Check for log write errors before telling dlm to unlock
  gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
  gfs2: If the journal isn't live ignore log flushes
  gfs2: Issue revokes more intelligently
  gfs2: Warn when a journal replay overwrites a rgrp with buffers
  gfs2: log which portion of the journal is replayed
  gfs2: Only remove revokes that we've submitted
  gfs2: eliminate tr_num_revoke_rm
  gfs2: don't call go_unlock unless demote is close at hand
  gfs2: clean_journal was setting sd_log_flush_head replaying other journals

 fs/gfs2/aops.c       |   4 +-
 fs/gfs2/file.c       |   2 +-
 fs/gfs2/glock.c      |  48 ++++++++++--
 fs/gfs2/glock.h      |   1 +
 fs/gfs2/glops.c      |  88 ++++++++++++++++++++-
 fs/gfs2/incore.h     |  18 ++++-
 fs/gfs2/lock_dlm.c   |  68 ++++++++++++++++
 fs/gfs2/log.c        | 101 +++++++++++-------------
 fs/gfs2/log.h        |   1 +
 fs/gfs2/lops.c       |  42 ++++++++--
 fs/gfs2/meta_io.c    |   6 +-
 fs/gfs2/ops_fstype.c |  52 ++----------
 fs/gfs2/quota.c      |   8 +-
 fs/gfs2/recovery.c   |   9 ++-
 fs/gfs2/rgrp.c       |   8 +-
 fs/gfs2/rgrp.h       |   2 +-
 fs/gfs2/super.c      |  30 ++++---
 fs/gfs2/super.h      |   1 +
 fs/gfs2/sys.c        |   2 +-
 fs/gfs2/trans.c      |   6 +-
 fs/gfs2/util.c       | 183 ++++++++++++++++++++++++++++++++++++++++++-
 fs/gfs2/util.h       |  11 +++
 22 files changed, 528 insertions(+), 163 deletions(-)

-- 
2.20.1



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 01/19] gfs2: log error reform
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-04-09 13:46   ` Andreas Gruenbacher
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw Bob Peterson
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, gfs2 kept track of journal io errors in two
places (sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
This patch consolidates the two by eliminating the SDF_AIL1_IO_ERROR
flag in favor of an atomic count of journal errors, sd_log_errors.
When the first io error occurs and its value is incremented to 1,
sd_log_error is set. Thus, sd_log_error reflects the first error
we encountered writing to the journal. In future patches, we will
take advantage of this by checking a single value (sd_log_errors)
rather than having to check both the flag and the sd_log_error
when reacting to io errors.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h     | 4 ++--
 fs/gfs2/log.c        | 7 ++++---
 fs/gfs2/lops.c       | 7 +++++--
 fs/gfs2/ops_fstype.c | 1 +
 fs/gfs2/quota.c      | 6 ++++--
 5 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index cdf07b408f54..76336b592030 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -620,7 +620,6 @@ enum {
 	SDF_RORECOVERY		= 7, /* read only recovery */
 	SDF_SKIP_DLM_UNLOCK	= 8,
 	SDF_FORCE_AIL_FLUSH     = 9,
-	SDF_AIL1_IO_ERROR	= 10,
 };
 
 enum gfs2_freeze_state {
@@ -829,7 +828,8 @@ struct gfs2_sbd {
 	atomic_t sd_log_in_flight;
 	struct bio *sd_log_bio;
 	wait_queue_head_t sd_log_flush_wait;
-	int sd_log_error;
+	int sd_log_error; /* First log error */
+	atomic_t sd_log_errors; /* Count of log errors */
 
 	atomic_t sd_reserving_log;
 	wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index b8830fda51e8..0717b4d4828b 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -109,8 +109,8 @@ __acquires(&sdp->sd_ail_lock)
 
 		if (!buffer_busy(bh)) {
 			if (!buffer_uptodate(bh) &&
-			    !test_and_set_bit(SDF_AIL1_IO_ERROR,
-					      &sdp->sd_flags)) {
+			    atomic_add_return(1, &sdp->sd_log_errors) == 1) {
+				sdp->sd_log_error = -EIO;
 				gfs2_io_error_bh(sdp, bh);
 				*withdraw = true;
 			}
@@ -209,7 +209,8 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
 		if (buffer_busy(bh))
 			continue;
 		if (!buffer_uptodate(bh) &&
-		    !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
+		    atomic_add_return(1, &sdp->sd_log_errors) == 1) {
+			sdp->sd_log_error = -EIO;
 			gfs2_io_error_bh(sdp, bh);
 			*withdraw = true;
 		}
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 8722c60b11fe..627738cfe6bb 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -211,8 +211,11 @@ static void gfs2_end_log_write(struct bio *bio)
 	struct bvec_iter_all iter_all;
 
 	if (bio->bi_status) {
-		fs_err(sdp, "Error %d writing to journal, jid=%u\n",
-		       bio->bi_status, sdp->sd_jdesc->jd_jid);
+		if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
+			sdp->sd_log_error = bio->bi_status;
+			fs_err(sdp, "Error %d writing to journal, jid=%u\n",
+			       bio->bi_status, sdp->sd_jdesc->jd_jid);
+		}
 		wake_up(&sdp->sd_logd_waitq);
 	}
 
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index b041cb8ae383..77610be6c918 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -129,6 +129,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
 
 	init_rwsem(&sdp->sd_log_flush_lock);
 	atomic_set(&sdp->sd_log_in_flight, 0);
+	atomic_set(&sdp->sd_log_errors, 0);
 	atomic_set(&sdp->sd_reserving_log, 0);
 	init_waitqueue_head(&sdp->sd_reserving_log_wait);
 	init_waitqueue_head(&sdp->sd_log_flush_wait);
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 2ae5a109eea7..009172ef4dfe 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1479,8 +1479,10 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 	if (error == 0 || error == -EROFS)
 		return;
 	if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
-		fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
-		sdp->sd_log_error = error;
+		if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
+			sdp->sd_log_error = error;
+			fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
+		}
 		wake_up(&sdp->sd_logd_waitq);
 	}
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 01/19] gfs2: log error reform Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-04-09 14:00   ` Andreas Gruenbacher
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 03/19] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn Bob Peterson
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

File system withdraws can be delayed when inconsistencies are
discovered when we cannot withdraw immediately, for example, when
critical spin_locks are held. But delaying the withdraw can cause
gfs2 to ignore the error and keep running for a short period of time.
For example, an rgrp glock may be dequeued and demoted while there
are still buffers that haven't been properly revoked, due to io
errors writing to the journal.

This patch introduces a new concept of a delayed withdraw, which
means an inconsistency has been discovered and we need to withdraw
at the earliest possible opportunity. In these cases, we aren't
quite withdrawn yet, but we still need to not dequeue glocks and
other critical things. If we dequeue the glocks and the withdraw
results in our journal being replayed, the replay could overwrite
data that's been modified by a different node that acquired the
glock in the meantime.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/aops.c       |  4 ++--
 fs/gfs2/file.c       |  2 +-
 fs/gfs2/glock.c      |  7 +++----
 fs/gfs2/glops.c      |  2 +-
 fs/gfs2/incore.h     |  1 +
 fs/gfs2/log.c        | 20 ++++++++------------
 fs/gfs2/meta_io.c    |  6 +++---
 fs/gfs2/ops_fstype.c |  3 +--
 fs/gfs2/quota.c      |  2 +-
 fs/gfs2/super.c      |  6 +++---
 fs/gfs2/sys.c        |  2 +-
 fs/gfs2/util.c       |  1 +
 fs/gfs2/util.h       |  8 ++++++++
 13 files changed, 34 insertions(+), 30 deletions(-)

diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 05dd78f4b2b3..0d3cde8a61cd 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page)
 		error = mpage_readpage(page, gfs2_block_map);
 	}
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		return -EIO;
 
 	return error;
@@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
 	gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		ret = -EIO;
 	return ret;
 }
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 58a768e59712..2a3ac9747d0d 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
 		cmd = F_SETLK;
 		fl->fl_type = F_UNLCK;
 	}
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
+	if (unlikely(withdrawn(sdp))) {
 		if (fl->fl_type == F_UNLCK)
 			locks_lock_file_wait(file, fl);
 		return -EIO;
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d32964cd1117..4330164de8bd 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -542,7 +542,7 @@ __acquires(&gl->gl_lockref.lock)
 	unsigned int lck_flags = (unsigned int)(gh ? gh->gh_flags : 0);
 	int ret;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
+	if (unlikely(withdrawn(sdp)) &&
 	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
@@ -579,8 +579,7 @@ __acquires(&gl->gl_lockref.lock)
 		}
 		else if (ret) {
 			fs_err(sdp, "lm_lock ret %d\n", ret);
-			GLOCK_BUG_ON(gl, !test_bit(SDF_SHUTDOWN,
-						   &sdp->sd_flags));
+			GLOCK_BUG_ON(gl, !withdrawn(sdp));
 		}
 	} else { /* lock_nolock */
 		finish_xmote(gl, target);
@@ -1092,7 +1091,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 78510ab91835..719961b5a511 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -538,7 +538,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh)
 			gfs2_consist(sdp);
 
 		/*  Initialize some head of the log stuff  */
-		if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+		if (!withdrawn(sdp)) {
 			sdp->sd_log_sequence = head.lh_sequence + 1;
 			gfs2_log_pointers_init(sdp, head.lh_blkno);
 		}
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 76336b592030..c6984265807f 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -620,6 +620,7 @@ enum {
 	SDF_RORECOVERY		= 7, /* read only recovery */
 	SDF_SKIP_DLM_UNLOCK	= 8,
 	SDF_FORCE_AIL_FLUSH     = 9,
+	SDF_PENDING_WITHDRAW	= 10, /* Will withdraw eventually */
 };
 
 enum gfs2_freeze_state {
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 0717b4d4828b..c79279ef03b8 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -92,8 +92,7 @@ static void gfs2_remove_from_ail(struct gfs2_bufdata *bd)
 
 static int gfs2_ail1_start_one(struct gfs2_sbd *sdp,
 			       struct writeback_control *wbc,
-			       struct gfs2_trans *tr,
-			       bool *withdraw)
+			       struct gfs2_trans *tr)
 __releases(&sdp->sd_ail_lock)
 __acquires(&sdp->sd_ail_lock)
 {
@@ -112,7 +111,7 @@ __acquires(&sdp->sd_ail_lock)
 			    atomic_add_return(1, &sdp->sd_log_errors) == 1) {
 				sdp->sd_log_error = -EIO;
 				gfs2_io_error_bh(sdp, bh);
-				*withdraw = true;
+				set_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
 			}
 			list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
 			continue;
@@ -153,7 +152,6 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 	struct list_head *head = &sdp->sd_ail1_list;
 	struct gfs2_trans *tr;
 	struct blk_plug plug;
-	bool withdraw = false;
 
 	trace_gfs2_ail_flush(sdp, wbc, 1);
 	blk_start_plug(&plug);
@@ -162,12 +160,12 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 	list_for_each_entry_reverse(tr, head, tr_list) {
 		if (wbc->nr_to_write <= 0)
 			break;
-		if (gfs2_ail1_start_one(sdp, wbc, tr, &withdraw))
+		if (gfs2_ail1_start_one(sdp, wbc, tr))
 			goto restart;
 	}
 	spin_unlock(&sdp->sd_ail_lock);
 	blk_finish_plug(&plug);
-	if (withdraw)
+	if (test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
 		gfs2_lm_withdraw(sdp, NULL);
 	trace_gfs2_ail_flush(sdp, wbc, 0);
 }
@@ -196,8 +194,7 @@ static void gfs2_ail1_start(struct gfs2_sbd *sdp)
  *
  */
 
-static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
-				bool *withdraw)
+static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 {
 	struct gfs2_bufdata *bd, *s;
 	struct buffer_head *bh;
@@ -212,7 +209,7 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
 		    atomic_add_return(1, &sdp->sd_log_errors) == 1) {
 			sdp->sd_log_error = -EIO;
 			gfs2_io_error_bh(sdp, bh);
-			*withdraw = true;
+			set_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
 		}
 		list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
 	}
@@ -230,11 +227,10 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 	struct gfs2_trans *tr, *s;
 	int oldest_tr = 1;
 	int ret;
-	bool withdraw = false;
 
 	spin_lock(&sdp->sd_ail_lock);
 	list_for_each_entry_safe_reverse(tr, s, &sdp->sd_ail1_list, tr_list) {
-		gfs2_ail1_empty_one(sdp, tr, &withdraw);
+		gfs2_ail1_empty_one(sdp, tr);
 		if (list_empty(&tr->tr_ail1_list) && oldest_tr)
 			list_move(&tr->tr_list, &sdp->sd_ail2_list);
 		else
@@ -243,7 +239,7 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 	ret = list_empty(&sdp->sd_ail1_list);
 	spin_unlock(&sdp->sd_ail_lock);
 
-	if (withdraw)
+	if (test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
 		gfs2_lm_withdraw(sdp, "fatal: I/O error(s)\n");
 
 	return ret;
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index 3201342404a7..b02c7eb2ded3 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -255,7 +255,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	struct buffer_head *bh, *bhs[2];
 	int num = 0;
 
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
+	if (unlikely(withdrawn(sdp))) {
 		*bhp = NULL;
 		return -EIO;
 	}
@@ -313,7 +313,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 
 int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 {
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		return -EIO;
 
 	wait_on_buffer(bh);
@@ -324,7 +324,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
 			gfs2_io_error_bh_wd(sdp, bh);
 		return -EIO;
 	}
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		return -EIO;
 
 	return 0;
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 77610be6c918..de328b9a5648 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -999,8 +999,7 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
 void gfs2_lm_unmount(struct gfs2_sbd *sdp)
 {
 	const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
-	if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
-	    lm->lm_unmount)
+	if (likely(!withdrawn(sdp)) && lm->lm_unmount)
 		lm->lm_unmount(sdp);
 }
 
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 009172ef4dfe..5a1a95fb57ca 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1478,7 +1478,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
 {
 	if (error == 0 || error == -EROFS)
 		return;
-	if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+	if (!withdrawn(sdp)) {
 		if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
 			sdp->sd_log_error = error;
 			fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index ca71163ff7cf..5db590d57564 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -802,7 +802,7 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 
 	if (!(flags & I_DIRTY_INODE))
 		return;
-	if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
+	if (unlikely(withdrawn(sdp)))
 		return;
 	if (!gfs2_glock_is_locked_by_me(ip->i_gl)) {
 		ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
@@ -851,7 +851,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, GL_NOCACHE,
 				   &freeze_gh);
-	if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+	if (error && !withdrawn(sdp))
 		return error;
 
 	flush_workqueue(gfs2_delete_workqueue);
@@ -1005,7 +1005,7 @@ static int gfs2_freeze(struct super_block *sb)
 	if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN)
 		goto out;
 
-	if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+	if (withdrawn(sdp)) {
 		error = -EINVAL;
 		goto out;
 	}
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index 1787d295834e..a2419e92a64e 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -121,7 +121,7 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 
 static ssize_t withdraw_show(struct gfs2_sbd *sdp, char *buf)
 {
-	unsigned int b = test_bit(SDF_SHUTDOWN, &sdp->sd_flags);
+	unsigned int b = withdrawn(sdp);
 	return snprintf(buf, PAGE_SIZE, "%u\n", b);
 }
 
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 0a814ccac41d..717aef772c60 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -47,6 +47,7 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 	    test_and_set_bit(SDF_SHUTDOWN, &sdp->sd_flags))
 		return 0;
 
+	clear_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
 	if (fmt) {
 		va_start(args, fmt);
 
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index 9278fecba632..16e087da3bd3 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -167,6 +167,14 @@ static inline unsigned int gfs2_tune_get_i(struct gfs2_tune *gt,
 	return x;
 }
 
+static inline bool withdrawn(struct gfs2_sbd *sdp)
+{
+	if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags) ||
+	    test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
+		return true;
+	return false;
+}
+
 #define gfs2_tune_get(sdp, field) \
 gfs2_tune_get_i(&(sdp)->sd_tune, &(sdp)->sd_tune.field)
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 03/19] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 01/19] gfs2: log error reform Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 04/19] gfs2: move check_journal_clean to util.c for future use Bob Peterson
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch addresses various problems with gfs2/dlm recovery.

For example, suppose a node with a bunch of gfs2 mounts suddenly
reboots due to kernel panic, and dlm determines it should perform
recovery. DLM does so from a pseudo-state machine calling various
callbacks into lock_dlm to perform a sequence of steps. It uses
generation numbers and recover bits in dlm "control" lock lvbs.

Now suppose another node tries to recover the failed node's
journal, but in so doing, encounters an IO error or withdraws
due to unforeseen circumstances, such as an hba driver failure.
In these cases, the recovery would eventually bail out, but it
would still update its generation number in the lvb. The other
nodes would all see the newer generation number and think they
don't need to do recovery because the generation number is newer
than the last one they saw, and therefore someone else has already
taken care of it.

If the file system has an io error or is withdrawn, it cannot
safely replay any journals (its own or others) but someone else
still needs to do it. Therefore we don't want it messing with
the journal recovery generation numbers: the local generation
numbers eventually get put into the lvb generation numbers to be
seen by all nodes.

This patch adds checks to many of the callbacks used by dlm
in its recovery state machine so that the functions are ignored
and skipped if an io error has occurred or if the file system
was withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lock_dlm.c | 36 ++++++++++++++++++++++++++++++++++++
 fs/gfs2/util.c     |  2 +-
 2 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 31df26ed7854..8b94f34c5c0f 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -1081,6 +1081,14 @@ static void gdlm_recover_prep(void *arg)
 	struct gfs2_sbd *sdp = arg;
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (atomic_read(&sdp->sd_log_errors)) {
+		fs_err(sdp, "recover_prep ignored due to io error.\n");
+		return;
+	}
+	if (withdrawn(sdp)) {
+		fs_err(sdp, "recover_prep ignored due to withdraw.\n");
+		return;
+	}
 	spin_lock(&ls->ls_recover_spin);
 	ls->ls_recover_block = ls->ls_recover_start;
 	set_bit(DFL_DLM_RECOVERY, &ls->ls_recover_flags);
@@ -1103,6 +1111,16 @@ static void gdlm_recover_slot(void *arg, struct dlm_slot *slot)
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 	int jid = slot->slot - 1;
 
+	if (atomic_read(&sdp->sd_log_errors)) {
+		fs_err(sdp, "recover_slot jid %d ignored due to io error.\n",
+		       jid);
+		return;
+	}
+	if (withdrawn(sdp)) {
+		fs_err(sdp, "recover_slot jid %d ignored due to withdraw.\n",
+		       jid);
+		return;
+	}
 	spin_lock(&ls->ls_recover_spin);
 	if (ls->ls_recover_size < jid + 1) {
 		fs_err(sdp, "recover_slot jid %d gen %u short size %d\n",
@@ -1127,6 +1145,14 @@ static void gdlm_recover_done(void *arg, struct dlm_slot *slots, int num_slots,
 	struct gfs2_sbd *sdp = arg;
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (atomic_read(&sdp->sd_log_errors)) {
+		fs_err(sdp, "recover_done ignored due to io error.\n");
+		return;
+	}
+	if (withdrawn(sdp)) {
+		fs_err(sdp, "recover_done ignored due to withdraw.\n");
+		return;
+	}
 	/* ensure the ls jid arrays are large enough */
 	set_recover_size(sdp, slots, num_slots);
 
@@ -1154,6 +1180,16 @@ static void gdlm_recovery_result(struct gfs2_sbd *sdp, unsigned int jid,
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
 
+	if (atomic_read(&sdp->sd_log_errors)) {
+		fs_err(sdp, "recovery_result jid %d ignored due to io error.\n",
+		       jid);
+		return;
+	}
+	if (withdrawn(sdp)) {
+		fs_err(sdp, "recovery_result jid %d ignored due to withdraw.\n",
+		       jid);
+		return;
+	}
 	if (test_bit(DFL_NO_DLM_OPS, &ls->ls_recover_flags))
 		return;
 
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 717aef772c60..ca6de80b5e8b 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -260,7 +260,7 @@ void gfs2_io_error_bh_i(struct gfs2_sbd *sdp, struct buffer_head *bh,
 			const char *function, char *file, unsigned int line,
 			bool withdraw)
 {
-	if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+	if (!withdrawn(sdp))
 		fs_err(sdp,
 		       "fatal: I/O error\n"
 		       "  block = %llu\n"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 04/19] gfs2: move check_journal_clean to util.c for future use
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (2 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 03/19] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 05/19] gfs2: Allow some glocks to be used during withdraw Bob Peterson
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch function check_journal_clean was in ops_fstype.c.
This patch moves it to util.c so we can make use of it elsewhere
in a future patch.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/ops_fstype.c | 42 -----------------------------------------
 fs/gfs2/util.c       | 45 ++++++++++++++++++++++++++++++++++++++++++++
 fs/gfs2/util.h       |  1 +
 3 files changed, 46 insertions(+), 42 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index de328b9a5648..69763f605a8b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -591,48 +591,6 @@ static int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh)
 	return error;
 }
 
-/**
- * check_journal_clean - Make sure a journal is clean for a spectator mount
- * @sdp: The GFS2 superblock
- * @jd: The journal descriptor
- *
- * Returns: 0 if the journal is clean or locked, else an error
- */
-static int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
-{
-	int error;
-	struct gfs2_holder j_gh;
-	struct gfs2_log_header_host head;
-	struct gfs2_inode *ip;
-
-	ip = GFS2_I(jd->jd_inode);
-	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
-				   GL_EXACT | GL_NOCACHE, &j_gh);
-	if (error) {
-		fs_err(sdp, "Error locking journal for spectator mount.\n");
-		return -EPERM;
-	}
-	error = gfs2_jdesc_check(jd);
-	if (error) {
-		fs_err(sdp, "Error checking journal for spectator mount.\n");
-		goto out_unlock;
-	}
-	error = gfs2_find_jhead(jd, &head);
-	if (error) {
-		fs_err(sdp, "Error parsing journal for spectator mount.\n");
-		goto out_unlock;
-	}
-	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
-		error = -EPERM;
-		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
-		       "must not be a spectator.\n", jd->jd_jid);
-	}
-
-out_unlock:
-	gfs2_glock_dq_uninit(&j_gh);
-	return error;
-}
-
 static int init_journal(struct gfs2_sbd *sdp, int undo)
 {
 	struct inode *master = d_inode(sdp->sd_master_dir);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index ca6de80b5e8b..dc00747d35f3 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -19,7 +19,10 @@
 #include "gfs2.h"
 #include "incore.h"
 #include "glock.h"
+#include "lops.h"
+#include "recovery.h"
 #include "rgrp.h"
+#include "super.h"
 #include "util.h"
 
 struct kmem_cache *gfs2_glock_cachep __read_mostly;
@@ -36,6 +39,48 @@ void gfs2_assert_i(struct gfs2_sbd *sdp)
 	fs_emerg(sdp, "fatal assertion failed\n");
 }
 
+/**
+ * check_journal_clean - Make sure a journal is clean for a spectator mount
+ * @sdp: The GFS2 superblock
+ * @jd: The journal descriptor
+ *
+ * Returns: 0 if the journal is clean or locked, else an error
+ */
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
+{
+	int error;
+	struct gfs2_holder j_gh;
+	struct gfs2_log_header_host head;
+	struct gfs2_inode *ip;
+
+	ip = GFS2_I(jd->jd_inode);
+	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
+				   GL_EXACT | GL_NOCACHE, &j_gh);
+	if (error) {
+		fs_err(sdp, "Error locking journal for spectator mount.\n");
+		return -EPERM;
+	}
+	error = gfs2_jdesc_check(jd);
+	if (error) {
+		fs_err(sdp, "Error checking journal for spectator mount.\n");
+		goto out_unlock;
+	}
+	error = gfs2_find_jhead(jd, &head);
+	if (error) {
+		fs_err(sdp, "Error parsing journal for spectator mount.\n");
+		goto out_unlock;
+	}
+	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
+		error = -EPERM;
+		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
+		       "must not be a spectator.\n", jd->jd_jid);
+	}
+
+out_unlock:
+	gfs2_glock_dq_uninit(&j_gh);
+	return error;
+}
+
 int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index 16e087da3bd3..fd72dfd592ab 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -131,6 +131,7 @@ static inline void gfs2_metatype_set(struct buffer_head *bh, u16 type,
 
 int gfs2_io_error_i(struct gfs2_sbd *sdp, const char *function,
 		    char *file, unsigned int line);
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
 
 #define gfs2_io_error(sdp) \
 gfs2_io_error_i((sdp), __func__, __FILE__, __LINE__);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 05/19] gfs2: Allow some glocks to be used during withdraw
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (3 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 04/19] gfs2: move check_journal_clean to util.c for future use Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 06/19] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, when a file system was withdrawn, all further
attempts to enqueue or promote glocks were rejected and returned
-EIO. This is only important for media-backed glocks like inode
and rgrp glocks. All other glocks may be safely used because there
is no potential for metadata corruption. This patch allows some
glocks to be used even after the file system is withdrawn. This
is accomplished with a new glops flag, GLOF_OK_AT_WITHDRAW. This
facilitates future patches that enhance fs withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c  | 4 +++-
 fs/gfs2/glops.c  | 8 ++++++--
 fs/gfs2/incore.h | 1 +
 3 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4330164de8bd..ff395dd44366 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -543,6 +543,7 @@ __acquires(&gl->gl_lockref.lock)
 	int ret;
 
 	if (unlikely(withdrawn(sdp)) &&
+	    !(glops->go_flags & GLOF_OK_AT_WITHDRAW) &&
 	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
@@ -1091,7 +1092,8 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(withdrawn(sdp)))
+	if (unlikely(withdrawn(sdp)) &&
+	    !(gl->gl_ops->go_flags & GLOF_OK_AT_WITHDRAW))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 719961b5a511..fa761ae33348 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -582,6 +582,7 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 
 const struct gfs2_glock_operations gfs2_meta_glops = {
 	.go_type = LM_TYPE_META,
+	.go_flags = GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations gfs2_inode_glops = {
@@ -609,6 +610,7 @@ const struct gfs2_glock_operations gfs2_freeze_glops = {
 	.go_xmote_bh = freeze_go_xmote_bh,
 	.go_demote_ok = freeze_go_demote_ok,
 	.go_type = LM_TYPE_NONDISK,
+	.go_flags = GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations gfs2_iopen_glops = {
@@ -619,20 +621,22 @@ const struct gfs2_glock_operations gfs2_iopen_glops = {
 
 const struct gfs2_glock_operations gfs2_flock_glops = {
 	.go_type = LM_TYPE_FLOCK,
-	.go_flags = GLOF_LRU,
+	.go_flags = GLOF_LRU | GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
 	.go_type = LM_TYPE_NONDISK,
+	.go_flags = GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations gfs2_quota_glops = {
 	.go_type = LM_TYPE_QUOTA,
-	.go_flags = GLOF_LVB | GLOF_LRU,
+	.go_flags = GLOF_LVB | GLOF_LRU | GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations gfs2_journal_glops = {
 	.go_type = LM_TYPE_JOURNAL,
+	.go_flags = GLOF_OK_AT_WITHDRAW,
 };
 
 const struct gfs2_glock_operations *gfs2_glops_list[] = {
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index c6984265807f..d84430eceb6c 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -250,6 +250,7 @@ struct gfs2_glock_operations {
 #define GLOF_ASPACE 1
 #define GLOF_LVB    2
 #define GLOF_LRU    4
+#define GLOF_OK_AT_WITHDRAW 8
 };
 
 enum {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 06/19] gfs2: Make secondary withdrawers wait for first withdrawer
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (4 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 05/19] gfs2: Allow some glocks to be used during withdraw Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 07/19] gfs2: Don't write log headers after file system withdraw Bob Peterson
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, if a process encountered an error and decided to
withdraw, if another process was already in the process of withdrawing,
the secondary withdraw would be silently ignored, which set it free
to proceed with its processing, unlock any locks, etc. That's correct
behavior if the original withdrawer encounters further errors down
the road. However, second withdrawers need to wait for the first
withdrawer to finish its withdraw before proceeding. If we don't wait
we could end up assuming everything is alright, unlock glocks and
telling other nodes they can have the glock, despite the fact that
a withdraw is still ongoing and may require a journal replay before
any locks are released. For example, if an rgrp glock is freed
by a process that didn't wait for the withdraw, a journal replay
could introduce file system corruption by replaying a rgrp block
that has already been granted to another node.

This patch makes secondary withdrawers wait until the primary
withdrawer is finished with its processing before proceeding.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h |  3 +++
 fs/gfs2/util.c   | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index d84430eceb6c..2a9b79be8e07 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -622,6 +622,7 @@ enum {
 	SDF_SKIP_DLM_UNLOCK	= 8,
 	SDF_FORCE_AIL_FLUSH     = 9,
 	SDF_PENDING_WITHDRAW	= 10, /* Will withdraw eventually */
+	SDF_WITHDRAW_IN_PROG	= 11, /* Withdraw is in progress */
 };
 
 enum gfs2_freeze_state {
@@ -832,6 +833,8 @@ struct gfs2_sbd {
 	wait_queue_head_t sd_log_flush_wait;
 	int sd_log_error; /* First log error */
 	atomic_t sd_log_errors; /* Count of log errors */
+	atomic_t sd_withdrawer;
+	wait_queue_head_t sd_withdraw_wait;
 
 	atomic_t sd_reserving_log;
 	wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index dc00747d35f3..bfe9ec4efeb0 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -89,9 +89,23 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 	struct va_format vaf;
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW &&
-	    test_and_set_bit(SDF_SHUTDOWN, &sdp->sd_flags))
-		return 0;
+	    test_and_set_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
+		if (!test_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags))
+			return -1;
+
+		fs_warn(sdp, "Pid %d waiting for process %d to withdraw.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG,
+			    TASK_UNINTERRUPTIBLE);
+		fs_warn(sdp, "Pid %d done waiting for process %d.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		return -1;
+	}
 
+	set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+	atomic_set(&sdp->sd_withdrawer, pid_nr(task_pid(current)));
 	clear_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
 	if (fmt) {
 		va_start(args, fmt);
@@ -120,6 +134,9 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 		set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
 		fs_err(sdp, "withdrawn\n");
 		dump_stack();
+		clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+		smp_mb__after_atomic();
+		wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG);
 	}
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_PANIC)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 07/19] gfs2: Don't write log headers after file system withdraw
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (5 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 06/19] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 08/19] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, when a node withdrew a gfs2 file system, it
wrote a (clean) unmount log header. That's wrong. You don't want
to write anything to the journal once you're withdrawn because
that's acknowledging that the transaction is complete and the
journal is in good shape, neither of which may be a valid
assumption when the file system is withdrawn. This is especially
true if the withdraw was caused due to io errors writing to the
journal in the first place. The best course of action is to leave
the journal "as is" until it may be safely replayed during
journal recovery, regardless of whether it's done by this node or
another.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index c79279ef03b8..62106decba29 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -679,12 +679,16 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
 {
 	struct gfs2_log_header *lh;
 	u32 hash, crc;
-	struct page *page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
+	struct page *page;
 	struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local;
 	struct timespec64 tv;
 	struct super_block *sb = sdp->sd_vfs;
 	u64 addr;
 
+	if (withdrawn(sdp))
+		goto out;
+
+	page = mempool_alloc(gfs2_page_pool, GFP_NOIO);
 	lh = page_address(page);
 	clear_page(lh);
 
@@ -731,6 +735,7 @@ void gfs2_write_log_header(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
 
 	gfs2_log_write(sdp, page, sb->s_blocksize, 0, addr);
 	gfs2_log_submit_bio(&sdp->sd_log_bio, REQ_OP_WRITE, op_flags);
+out:
 	log_flush_wait(sdp);
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 08/19] gfs2: Force withdraw to replay journals and wait for it to finish
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (6 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 07/19] gfs2: Don't write log headers after file system withdraw Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 09/19] gfs2: Add verbose option to check_journal_clean Bob Peterson
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a node withdraws from a file system, it often leaves its journal
in an incomplete state. This is especially true when the withdraw is
caused by io errors writing to the journal. Before this patch, a
withdraw would try to write a "shutdown" record to the journal, tell
dlm it's done with the file system, and none of the other nodes
know about the problem. Later, when the problem is fixed and the
withdrawn node is rebooted, it would then discover that its own
journal was incomplete, and replay it. However, replaying it at this
point is almost guaranteed to introduce corruption because the other
nodes are likely to have used affected resource groups that appeared
in the journal since the time of the withdraw. Replaying the journal
later will overwrite any changes made, and not through any fault of
dlm, which was instructed during the withdraw to release those
resources.

This patch makes file system withdraws seen by the entire cluster.
Withdrawing nodes dequeue their journal glock to allow recovery.

The remaining nodes check all the journals to see if they are
clean or in need of replay. They try to replay dirty journals, but
only the journals of withdrawn nodes will be "not busy" and
therefore available for replay.

Until the journal replay is complete, no i/o related glocks may be
given out, to ensure that the replay does not cause the
aforementioned corruption: We cannot allow any journal replay to
overwrite blocks associated with a glock once it is held. The
glocks not affected by a withdraw are permitted to be passed
around as normal during a withdraw. A new glops flag, called
GLOF_OK_AT_WITHDRAW, indicates glocks that may be passed around
freely while a withdraw is taking place.

One such glock is the "live" glock which is now used to signal when
a withdraw occurs. When a withdraw occurs, the node signals its
withdraw by dequeueing the "live" glock and trying to enqueue it
in EX mode, thus forcing the other nodes to all see a demote
request, by way of a "1CB" (one callback) try lock. The "live"
glock is not granted in EX; the callback is only just used to
indicate a withdraw has occurred.

Note that all nodes in the cluster must wait for the recovering
node to finish replaying the withdrawing node's journal before
continuing. To this end, it checks that the journals are clean
multiple times in a retry loop.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c      |  30 +++++++++++--
 fs/gfs2/glock.h      |   1 +
 fs/gfs2/glops.c      |  52 +++++++++++++++++++++
 fs/gfs2/incore.h     |   5 +++
 fs/gfs2/lock_dlm.c   |  32 +++++++++++++
 fs/gfs2/meta_io.c    |   2 +-
 fs/gfs2/ops_fstype.c |   4 +-
 fs/gfs2/super.c      |  24 +++++-----
 fs/gfs2/super.h      |   1 +
 fs/gfs2/util.c       | 105 ++++++++++++++++++++++++++++++++++++++++++-
 10 files changed, 240 insertions(+), 16 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index ff395dd44366..4996ab06e721 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -242,7 +242,8 @@ static void __gfs2_glock_put(struct gfs2_glock *gl)
 	gfs2_glock_remove_from_lru(gl);
 	spin_unlock(&gl->gl_lockref.lock);
 	GLOCK_BUG_ON(gl, !list_empty(&gl->gl_holders));
-	GLOCK_BUG_ON(gl, mapping && mapping->nrpages);
+	GLOCK_BUG_ON(gl, mapping && mapping->nrpages &&
+		     !test_bit(SDF_SHUTDOWN, &sdp->sd_flags));
 	trace_gfs2_glock_put(gl);
 	sdp->sd_lockstruct.ls_ops->lm_put_lock(gl);
 }
@@ -544,6 +545,7 @@ __acquires(&gl->gl_lockref.lock)
 
 	if (unlikely(withdrawn(sdp)) &&
 	    !(glops->go_flags & GLOF_OK_AT_WITHDRAW) &&
+	    (gh && !(LM_FLAG_NOEXP & gh->gh_flags)) &&
 	    target != LM_ST_UNLOCKED)
 		return;
 	lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
@@ -1092,8 +1094,8 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	int error = 0;
 
-	if (unlikely(withdrawn(sdp)) &&
-	    !(gl->gl_ops->go_flags & GLOF_OK_AT_WITHDRAW))
+	if (unlikely(withdrawn(sdp) && !(LM_FLAG_NOEXP & gh->gh_flags) &&
+		     !(gl->gl_ops->go_flags & GLOF_OK_AT_WITHDRAW)))
 		return -EIO;
 
 	if (test_bit(GLF_LRU, &gl->gl_flags))
@@ -1137,11 +1139,28 @@ int gfs2_glock_poll(struct gfs2_holder *gh)
 void gfs2_glock_dq(struct gfs2_holder *gh)
 {
 	struct gfs2_glock *gl = gh->gh_gl;
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	const struct gfs2_glock_operations *glops = gl->gl_ops;
 	unsigned delay = 0;
 	int fast_path = 0;
 
 	spin_lock(&gl->gl_lockref.lock);
+	/**
+	 * If we're in the process of file system withdraw, we cannot just
+	 * dequeue any glocks until our journal is recovered, lest we
+	 * introduce file system corruption. We need two exceptions to this
+	 * rule: We need to allow unlocking of nondisk glocks and the glock
+	 * for our own journal that needs recovery.
+	 */
+	if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags) &&
+	    test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags) &&
+	    !(gl->gl_ops->go_flags & GLOF_OK_AT_WITHDRAW) &&
+	    gh != &sdp->sd_jinode_gh) {
+		sdp->sd_glock_dqs_held++;
+		might_sleep();
+		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY,
+			    TASK_UNINTERRUPTIBLE);
+	}
 	if (gh->gh_flags & GL_NOCACHE)
 		handle_callback(gl, LM_ST_UNLOCKED, 0, false);
 
@@ -1621,6 +1640,11 @@ static void dump_glock_func(struct gfs2_glock *gl)
 	dump_glock(NULL, gl);
 }
 
+void gfs2_gl_flushwork(struct gfs2_sbd *sdp)
+{
+	flush_workqueue(glock_workqueue);
+}
+
 /**
  * gfs2_gl_hash_clear - Empty out the glock hash table
  * @sdp: the filesystem
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index 936b3295839c..c1c40e2dbd96 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -202,6 +202,7 @@ extern int gfs2_glock_nq_num(struct gfs2_sbd *sdp, u64 number,
 			     struct gfs2_holder *gh);
 extern int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs);
 extern void gfs2_glock_dq_m(unsigned int num_gh, struct gfs2_holder *ghs);
+extern void gfs2_gl_flushwork(struct gfs2_sbd *sdp);
 extern void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl);
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) { gfs2_dump_glock(NULL, gl); BUG(); } } while(0)
 extern __printf(2, 3)
diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index fa761ae33348..fb88e1f92eff 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -31,6 +31,8 @@
 
 struct workqueue_struct *gfs2_freeze_wq;
 
+extern struct workqueue_struct *gfs2_control_wq;
+
 static void gfs2_ail_error(struct gfs2_glock *gl, const struct buffer_head *bh)
 {
 	fs_err(gl->gl_name.ln_sbd,
@@ -580,6 +582,55 @@ static void iopen_go_callback(struct gfs2_glock *gl, bool remote)
 	}
 }
 
+/**
+ * nondisk_go_callback - used to signal when a node did a withdraw
+ * @gl: the nondisk glock
+ * @remote: true if this came from a different cluster node
+ *
+ */
+static void nondisk_go_callback(struct gfs2_glock *gl, bool remote)
+{
+	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
+
+	/* Ignore the callback unless it's from another node, and it's the
+	   live lock. */
+	if (!remote || gl->gl_name.ln_number != GFS2_LIVE_LOCK)
+		return;
+
+	/* First order of business is to cancel the demote request. We don't
+	 * really want to demote a nondisk glock. At best it's just to inform
+	 * us of a another node's withdraw. We'll keep it in SH mode. */
+	clear_bit(GLF_DEMOTE, &gl->gl_flags);
+	clear_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
+
+	/* Ignore the unlock if we're withdrawn, unmounting, or in recovery. */
+	if (test_bit(SDF_NORECOVERY, &sdp->sd_flags) ||
+	    test_bit(SDF_SHUTDOWN, &sdp->sd_flags) ||
+	    test_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags))
+		return;
+
+	/* We only care when a node wants us to unlock, because that means
+	 * they want a journal recovered. */
+	if (gl->gl_demote_state != LM_ST_UNLOCKED)
+		return;
+
+	if (sdp->sd_args.ar_spectator) {
+		fs_warn(sdp, "Spectator node cannot recover journals.\n");
+		return;
+	}
+
+	fs_warn(sdp, "Some node has withdrawn; checking for recovery.\n");
+	set_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags);
+	/**
+	 * We can't call remote_withdraw directly here or gfs2_recover_journal
+	 * because this is called from the glock unlock function and the
+	 * remote_withdraw needs to enqueue and dequeue the same "live" glock
+	 * we were called from. So we queue it to the control work queue in
+	 * lock_dlm.
+	 */
+	queue_delayed_work(gfs2_control_wq, &sdp->sd_control_work, 0);
+}
+
 const struct gfs2_glock_operations gfs2_meta_glops = {
 	.go_type = LM_TYPE_META,
 	.go_flags = GLOF_OK_AT_WITHDRAW,
@@ -626,6 +677,7 @@ const struct gfs2_glock_operations gfs2_flock_glops = {
 
 const struct gfs2_glock_operations gfs2_nondisk_glops = {
 	.go_type = LM_TYPE_NONDISK,
+	.go_callback = nondisk_go_callback,
 	.go_flags = GLOF_OK_AT_WITHDRAW,
 };
 
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 2a9b79be8e07..bcda69880ec1 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -623,6 +623,9 @@ enum {
 	SDF_FORCE_AIL_FLUSH     = 9,
 	SDF_PENDING_WITHDRAW	= 10, /* Will withdraw eventually */
 	SDF_WITHDRAW_IN_PROG	= 11, /* Withdraw is in progress */
+	SDF_REMOTE_WITHDRAW	= 12, /* Performing remote recovery */
+	SDF_WITHDRAW_RECOVERY	= 13, /* Wait for journal recovery when we are
+					 withdrawing */
 };
 
 enum gfs2_freeze_state {
@@ -771,6 +774,7 @@ struct gfs2_sbd {
 	struct gfs2_jdesc *sd_jdesc;
 	struct gfs2_holder sd_journal_gh;
 	struct gfs2_holder sd_jinode_gh;
+	struct gfs2_glock *sd_jinode_gl;
 
 	struct gfs2_holder sd_sc_gh;
 	struct gfs2_holder sd_qc_gh;
@@ -858,6 +862,7 @@ struct gfs2_sbd {
 
 	unsigned long sd_last_warning;
 	struct dentry *debugfs_dir;    /* debugfs directory */
+	unsigned long sd_glock_dqs_held;
 };
 
 static inline void gfs2_glstats_inc(struct gfs2_glock *gl, int which)
diff --git a/fs/gfs2/lock_dlm.c b/fs/gfs2/lock_dlm.c
index 8b94f34c5c0f..95e895adc1be 100644
--- a/fs/gfs2/lock_dlm.c
+++ b/fs/gfs2/lock_dlm.c
@@ -19,6 +19,8 @@
 
 #include "incore.h"
 #include "glock.h"
+#include "glops.h"
+#include "recovery.h"
 #include "util.h"
 #include "sys.h"
 #include "trace_gfs2.h"
@@ -325,6 +327,7 @@ static void gdlm_cancel(struct gfs2_glock *gl)
 /*
  * dlm/gfs2 recovery coordination using dlm_recover callbacks
  *
+ *  0. gfs2 checks for another cluster node withdraw, needing journal replay
  *  1. dlm_controld sees lockspace members change
  *  2. dlm_controld blocks dlm-kernel locking activity
  *  3. dlm_controld within dlm-kernel notifies gfs2 (recover_prep)
@@ -573,6 +576,28 @@ static int control_lock(struct gfs2_sbd *sdp, int mode, uint32_t flags)
 			 &ls->ls_control_lksb, "control_lock");
 }
 
+/**
+ * remote_withdraw - react to a node withdrawing from the file system
+ * @sdp: The superblock
+ */
+static void remote_withdraw(struct gfs2_sbd *sdp)
+{
+	struct gfs2_jdesc *jd;
+	int ret = 0, count = 0;
+
+	list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
+		if (jd->jd_jid == sdp->sd_lockstruct.ls_jid)
+			continue;
+		ret = gfs2_recover_journal(jd, true);
+		if (ret)
+			break;
+		count++;
+	}
+
+	/* Now drop the additional reference we acquired */
+	fs_err(sdp, "Journals checked: %d, ret = %d.\n", count, ret);
+}
+
 static void gfs2_control_func(struct work_struct *work)
 {
 	struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd, sd_control_work.work);
@@ -583,6 +608,13 @@ static void gfs2_control_func(struct work_struct *work)
 	int recover_size;
 	int i, error;
 
+	/* First check for other nodes that may have done a withdraw. */
+	if (test_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags)) {
+		remote_withdraw(sdp);
+		clear_bit(SDF_REMOTE_WITHDRAW, &sdp->sd_flags);
+		return;
+	}
+
 	spin_lock(&ls->ls_recover_spin);
 	/*
 	 * No MOUNT_DONE means we're still mounting; control_mount()
diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
index b02c7eb2ded3..03e758c4b3f9 100644
--- a/fs/gfs2/meta_io.c
+++ b/fs/gfs2/meta_io.c
@@ -255,7 +255,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
 	struct buffer_head *bh, *bhs[2];
 	int num = 0;
 
-	if (unlikely(withdrawn(sdp))) {
+	if (unlikely(withdrawn(sdp)) && gl != sdp->sd_jinode_gl) {
 		*bhp = NULL;
 		return -EIO;
 	}
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 69763f605a8b..2e9061eeec9c 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -643,7 +643,8 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 
 		error = gfs2_glock_nq_num(sdp, sdp->sd_lockstruct.ls_jid,
 					  &gfs2_journal_glops,
-					  LM_ST_EXCLUSIVE, LM_FLAG_NOEXP,
+					  LM_ST_EXCLUSIVE,
+					  LM_FLAG_NOEXP | GL_NOCACHE,
 					  &sdp->sd_journal_gh);
 		if (error) {
 			fs_err(sdp, "can't acquire journal glock: %d\n", error);
@@ -651,6 +652,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 		}
 
 		ip = GFS2_I(sdp->sd_jdesc->jd_inode);
+		sdp->sd_jinode_gl = ip->i_gl;
 		error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
 					   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 					   &sdp->sd_jinode_gh);
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 5db590d57564..9e3727d0c4de 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -840,11 +840,12 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
 /**
  * gfs2_make_fs_ro - Turn a Read-Write FS into a Read-Only one
  * @sdp: the filesystem
+ * @withdrawing: if 1, we're withdrawing so only do what's necessary
  *
  * Returns: errno
  */
 
-static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
+int gfs2_make_fs_ro(struct gfs2_sbd *sdp, int withdrawing)
 {
 	struct gfs2_holder freeze_gh;
 	int error;
@@ -858,11 +859,12 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	kthread_stop(sdp->sd_quotad_process);
 	kthread_stop(sdp->sd_logd_process);
 
-	gfs2_quota_sync(sdp->sd_vfs, 0);
-	gfs2_statfs_sync(sdp->sd_vfs, 0);
-
-	gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
-		       GFS2_LFC_MAKE_FS_RO);
+	if (!withdrawing) {
+		gfs2_quota_sync(sdp->sd_vfs, 0);
+		gfs2_statfs_sync(sdp->sd_vfs, 0);
+		gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
+			       GFS2_LFC_MAKE_FS_RO);
+	}
 	wait_event(sdp->sd_reserving_log_wait, atomic_read(&sdp->sd_reserving_log) == 0);
 	gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) == sdp->sd_jdesc->jd_blocks);
 
@@ -904,7 +906,7 @@ static void gfs2_put_super(struct super_block *sb)
 	spin_unlock(&sdp->sd_jindex_spin);
 
 	if (!sb_rdonly(sb)) {
-		error = gfs2_make_fs_ro(sdp);
+		error = gfs2_make_fs_ro(sdp, 0);
 		if (error)
 			gfs2_io_error(sdp);
 	}
@@ -921,8 +923,10 @@ static void gfs2_put_super(struct super_block *sb)
 	gfs2_glock_put(sdp->sd_freeze_gl);
 
 	if (!sdp->sd_args.ar_spectator) {
-		gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
-		gfs2_glock_dq_uninit(&sdp->sd_jinode_gh);
+		if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+			gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
+		if (gfs2_holder_initialized(&sdp->sd_jinode_gh))
+			gfs2_glock_dq_uninit(&sdp->sd_jinode_gh);
 		gfs2_glock_dq_uninit(&sdp->sd_sc_gh);
 		gfs2_glock_dq_uninit(&sdp->sd_qc_gh);
 		iput(sdp->sd_sc_inode);
@@ -1270,7 +1274,7 @@ static int gfs2_remount_fs(struct super_block *sb, int *flags, char *data)
 
 	if ((sb->s_flags ^ *flags) & SB_RDONLY) {
 		if (*flags & SB_RDONLY)
-			error = gfs2_make_fs_ro(sdp);
+			error = gfs2_make_fs_ro(sdp, 0);
 		else
 			error = gfs2_make_fs_rw(sdp);
 		if (error)
diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h
index 73c97dccae21..e859c6d5bb3e 100644
--- a/fs/gfs2/super.h
+++ b/fs/gfs2/super.h
@@ -45,6 +45,7 @@ extern void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc,
 extern void update_statfs(struct gfs2_sbd *sdp, struct buffer_head *m_bh,
 			  struct buffer_head *l_bh);
 extern int gfs2_statfs_sync(struct super_block *sb, int type);
+extern int gfs2_make_fs_ro(struct gfs2_sbd *sdp, int withdrawing);
 extern void gfs2_freeze_func(struct work_struct *work);
 
 extern struct file_system_type gfs2_fs_type;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index bfe9ec4efeb0..13e10b769f10 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -14,11 +14,13 @@
 #include <linux/buffer_head.h>
 #include <linux/crc32.h>
 #include <linux/gfs2_ondisk.h>
+#include <linux/delay.h>
 #include <linux/uaccess.h>
 
 #include "gfs2.h"
 #include "incore.h"
 #include "glock.h"
+#include "log.h"
 #include "lops.h"
 #include "recovery.h"
 #include "rgrp.h"
@@ -81,6 +83,105 @@ int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 	return error;
 }
 
+static void signal_our_withdraw(struct gfs2_sbd *sdp)
+{
+	struct gfs2_glock *gl = sdp->sd_live_gh.gh_gl;
+	int ret = 0;
+	int tries;
+
+	/* Prevent any glock dq until withdraw recovery is complete */
+	set_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
+	/**
+	 * Don't tell dlm we're bailing until we have no more buffers in the
+	 * wind. If journal had an IO error, the log code should just purge
+	 * the outstanding buffers rather than submitting new IO. Making the
+	 * file system read-only will flush the journal, etc.
+	 *
+	 * During a normal unmount, gfs2_make_fs_ro calls gfs2_log_shutdown
+	 * which clears SDF_JOURNAL_LIVE. In a withdraw, we cannot write
+	 * any UNMOUNT log header, so we can't call gfs2_log_shutdown, and
+	 * therefore we need to clear SDF_JOURNAL_LIVE manually.
+	 */
+	clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
+	ret = gfs2_make_fs_ro(sdp, 1);
+	sdp->sd_vfs->s_flags |= SB_RDONLY;
+
+	/* Drop the glock for our journal so another node can recover it. */
+	gfs2_glock_dq_wait(&sdp->sd_journal_gh);
+	gfs2_holder_uninit(&sdp->sd_journal_gh);
+	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
+	gfs2_glock_dq_wait(&sdp->sd_jinode_gh);
+	/* holder_uninit to force glock_put, to force dlm to let go */
+	gfs2_holder_uninit(&sdp->sd_jinode_gh);
+	gfs2_jindex_free(sdp);
+	/* Flush the glock work so the glock is freed. This allows try locks
+	 * on other nodes to be successful, otherwise we remain the owner of
+	 * the glock until the workqueue gets around to running. */
+	gfs2_gl_flushwork(sdp);
+
+	if (sdp->sd_lockstruct.ls_ops->lm_lock == NULL) /* lock_nolock */
+		goto skip_recovery;
+	/**
+	 * Dequeue the "live" glock, but keep a reference so it's never freed.
+	 */
+	gfs2_glock_hold(gl);
+	gfs2_glock_dq_wait(&sdp->sd_live_gh);
+	/**
+	 * We enqueue the "live" glock in EX so that all other nodes
+	 * get a demote request and act on it. We don't really want the
+	 * lock in EX, so we send a "try" lock with 1CB to produce a callback.
+	 */
+	fs_warn(sdp, "Requesting recovery of jid %d.\n",
+		sdp->sd_lockstruct.ls_jid);
+	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+			   &sdp->sd_live_gh);
+	msleep(GL_GLOCK_MAX_HOLD);
+	/* This will likely fail in a cluster, but succeed standalone: */
+	ret = gfs2_glock_nq(&sdp->sd_live_gh);
+	/* If we actually got the "live" lock in EX mode, there are no other
+	 * nodes available to replay our journal. So we try to replay it
+	 * ourselves. We hold the "live" glock to prevent other mounters
+	 * during recovery, then just dequeue it and reacquire it in our
+	 * normal SH mode. Just in case the problem that caused us to
+	 * withdraw prevents us from recovering our journal (e.g. io errors
+	 * and such) we still check if the journal is clean before proceeding
+	 * but we may wait forever until another mounter does the recovery. */
+	if (ret == 0) {
+		fs_warn(sdp, "No other mounters found. Trying to recover our "
+			"own journal jid %d.\n", sdp->sd_lockstruct.ls_jid);
+		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
+			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
+				sdp->sd_lockstruct.ls_jid);
+		gfs2_glock_dq_wait(&sdp->sd_live_gh);
+		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+				   &sdp->sd_live_gh);
+		gfs2_glock_nq(&sdp->sd_live_gh);
+	}
+	/* Now drop the additional reference we acquired */
+	gfs2_glock_queue_put(gl);
+	clear_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags);
+
+	/* Now wait until recovery complete. */
+	for (tries = 0; tries < 10; tries++) {
+		ret = check_journal_clean(sdp, sdp->sd_jdesc);
+		if (!ret)
+			break;
+		msleep(HZ);
+		fs_warn(sdp, "Waiting for journal recovery jid %d.\n",
+			sdp->sd_lockstruct.ls_jid);
+	}
+skip_recovery:
+	if (!ret)
+		fs_warn(sdp, "Journal recovery complete for jid %d.\n",
+			sdp->sd_lockstruct.ls_jid);
+	else
+		fs_warn(sdp, "Journal recovery skipped for %d until next "
+			"mount.\n", sdp->sd_lockstruct.ls_jid);
+	fs_warn(sdp, "Glock dequeues delayed: %lu\n", sdp->sd_glock_dqs_held);
+	sdp->sd_glock_dqs_held = 0;
+	wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_RECOVERY);
+}
+
 int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 {
 	struct lm_lockstruct *ls = &sdp->sd_lockstruct;
@@ -122,6 +223,8 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 		fs_err(sdp, "about to withdraw this file system\n");
 		BUG_ON(sdp->sd_args.ar_debug);
 
+		signal_our_withdraw(sdp);
+
 		kobject_uevent(&sdp->sd_kobj, KOBJ_OFFLINE);
 
 		if (!strcmp(sdp->sd_lockstruct.ls_ops->lm_proto_name, "lock_dlm"))
@@ -132,7 +235,7 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 			lm->lm_unmount(sdp);
 		}
 		set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
-		fs_err(sdp, "withdrawn\n");
+		fs_err(sdp, "File system withdrawn\n");
 		dump_stack();
 		clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
 		smp_mb__after_atomic();
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 09/19] gfs2: Add verbose option to check_journal_clean
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (7 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 08/19] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 10/19] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function check_journal_clean would give messages
related to journal recovery. That's fine for mount time, but when a
node withdraws and forces replay that way, we don't want all those
distracting and misleading messages. This patch adds a new parameter
to make those messages optional.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/ops_fstype.c |  2 +-
 fs/gfs2/util.c       | 23 ++++++++++++++++-------
 fs/gfs2/util.h       |  4 +++-
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index 2e9061eeec9c..a556760e798f 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -683,7 +683,7 @@ static int init_journal(struct gfs2_sbd *sdp, int undo)
 			struct gfs2_jdesc *jd = gfs2_jdesc_find(sdp, x);
 
 			if (sdp->sd_args.ar_spectator) {
-				error = check_journal_clean(sdp, jd);
+				error = check_journal_clean(sdp, jd, true);
 				if (error)
 					goto fail_jinode_gh;
 				continue;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 13e10b769f10..ba5cf394a095 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -48,7 +48,8 @@ void gfs2_assert_i(struct gfs2_sbd *sdp)
  *
  * Returns: 0 if the journal is clean or locked, else an error
  */
-int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
+int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
+			bool verbose)
 {
 	int error;
 	struct gfs2_holder j_gh;
@@ -59,23 +60,31 @@ int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd)
 	error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED, LM_FLAG_NOEXP |
 				   GL_EXACT | GL_NOCACHE, &j_gh);
 	if (error) {
-		fs_err(sdp, "Error locking journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error %d locking journal for spectator "
+			       "mount.\n", error);
 		return -EPERM;
 	}
 	error = gfs2_jdesc_check(jd);
 	if (error) {
-		fs_err(sdp, "Error checking journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error checking journal for spectator "
+			       "mount.\n");
 		goto out_unlock;
 	}
 	error = gfs2_find_jhead(jd, &head);
 	if (error) {
-		fs_err(sdp, "Error parsing journal for spectator mount.\n");
+		if (verbose)
+			fs_err(sdp, "Error parsing journal for spectator "
+			       "mount.\n");
 		goto out_unlock;
 	}
 	if (!(head.lh_flags & GFS2_LOG_HEAD_UNMOUNT)) {
 		error = -EPERM;
-		fs_err(sdp, "jid=%u: Journal is dirty, so the first mounter "
-		       "must not be a spectator.\n", jd->jd_jid);
+		if (verbose)
+			fs_err(sdp, "jid=%u: Journal is dirty, so the first "
+			       "mounter must not be a spectator.\n",
+			       jd->jd_jid);
 	}
 
 out_unlock:
@@ -163,7 +172,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 
 	/* Now wait until recovery complete. */
 	for (tries = 0; tries < 10; tries++) {
-		ret = check_journal_clean(sdp, sdp->sd_jdesc);
+		ret = check_journal_clean(sdp, sdp->sd_jdesc, false);
 		if (!ret)
 			break;
 		msleep(HZ);
diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
index fd72dfd592ab..036c7cfd856d 100644
--- a/fs/gfs2/util.h
+++ b/fs/gfs2/util.h
@@ -131,7 +131,9 @@ static inline void gfs2_metatype_set(struct buffer_head *bh, u16 type,
 
 int gfs2_io_error_i(struct gfs2_sbd *sdp, const char *function,
 		    char *file, unsigned int line);
-int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd);
+
+extern int check_journal_clean(struct gfs2_sbd *sdp, struct gfs2_jdesc *jd,
+			       bool verbose);
 
 #define gfs2_io_error(sdp) \
 gfs2_io_error_i((sdp), __func__, __FILE__, __LINE__);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 10/19] gfs2: Check for log write errors before telling dlm to unlock
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (8 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 09/19] gfs2: Add verbose option to check_journal_clean Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 11/19] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Bob Peterson
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple assert_withdraws
in do_xmote after the calls to go_sync and go_inval. The asserts
should cause another node to replay the journal before continuing,
thus protecting rgrp and dinode glocks and maintaining the
integrity of the metadata.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4996ab06e721..72a7b19c3aef 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -566,8 +566,12 @@ __acquires(&gl->gl_lockref.lock)
 	spin_unlock(&gl->gl_lockref.lock);
 	if (glops->go_sync)
 		glops->go_sync(gl);
+	gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0);
 	if (test_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags))
 		glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
+
+	if (!gfs2_assert_withdraw(sdp, atomic_read(&sdp->sd_log_errors) == 0))
+		gfs2_assert_withdraw(sdp, !atomic_read(&gl->gl_ail_count));
 	clear_bit(GLF_INVALIDATE_IN_PROGRESS, &gl->gl_flags);
 
 	gfs2_glock_hold(gl);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 11/19] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (9 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 10/19] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 12/19] gfs2: If the journal isn't live ignore log flushes Bob Peterson
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, if gfs2_ail_empty_gl saw there was nothing on
the ail list, it would return and not flush the log. The problem
is that there could still be a revoke for the rgrp sitting on the
sd_log_le_revoke list that's been recently taken off the ail list.
But that revoke still needs to be written, and the rgrp_go_inval
still needs to call log_flush_wait to ensure the revokes are all
properly written to the journal before we relinquish control of
the glock to another node. If we give the glock to another node
before we have this knowledge, the node might crash and its journal
replayed, in which case the missing revoke would allow the journal
replay to replay the rgrp over top of the rgrp we already gave to
another node, thus overwriting its changes and corrupting the
file system.

This patch makes gfs2_ail_empty_gl still call gfs2_log_flush rather
than returning.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c | 26 +++++++++++++++++++++++++-
 fs/gfs2/log.c   |  2 +-
 fs/gfs2/log.h   |  1 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index fb88e1f92eff..9520ec62bcef 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -93,8 +93,31 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	INIT_LIST_HEAD(&tr.tr_databuf);
 	tr.tr_revokes = atomic_read(&gl->gl_ail_count);
 
-	if (!tr.tr_revokes)
+	if (!tr.tr_revokes) {
+		/**
+		 * We have nothing on the ail, but there could be revokes on
+		 * the sdp revoke queue, in which case, we still want to flush
+		 * the log and wait for it to finish.
+		 *
+		 * If the sdp revoke list is empty too, we might still have an
+		 * io outstanding for writing revokes, so we should wait for
+		 * it before returning.
+		 *
+		 * If none of these conditions are true, our revokes are all
+		 * flushed and we can return.
+		 */
+		gfs2_log_lock(sdp);
+		if (atomic_read(&gl->gl_revokes)) {
+			gfs2_log_unlock(sdp);
+			goto flush;
+		} else if (atomic_read(&sdp->sd_log_in_flight)) {
+			gfs2_log_unlock(sdp);
+			log_flush_wait(sdp);
+		} else {
+			gfs2_log_unlock(sdp);
+		}
 		return;
+	}
 
 	/* A shortened, inline version of gfs2_trans_begin()
          * tr->alloced is not set since the transaction structure is
@@ -109,6 +132,7 @@ static void gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	__gfs2_ail_flush(gl, 0, tr.tr_revokes);
 
 	gfs2_trans_end(sdp);
+flush:
 	gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 		       GFS2_LFC_AIL_EMPTY_GL);
 }
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 62106decba29..896165811063 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -513,7 +513,7 @@ static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail)
 }
 
 
-static void log_flush_wait(struct gfs2_sbd *sdp)
+void log_flush_wait(struct gfs2_sbd *sdp)
 {
 	DEFINE_WAIT(wait);
 
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 1bc9bd444b28..bd2d08d0f21c 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -75,6 +75,7 @@ extern void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
 			   u32 type);
 extern void gfs2_log_commit(struct gfs2_sbd *sdp, struct gfs2_trans *trans);
 extern void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc);
+extern void log_flush_wait(struct gfs2_sbd *sdp);
 
 extern void gfs2_log_shutdown(struct gfs2_sbd *sdp);
 extern int gfs2_logd(void *data);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 12/19] gfs2: If the journal isn't live ignore log flushes
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (10 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 11/19] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 13/19] gfs2: Issue revokes more intelligently Bob Peterson
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a check to function gfs2_log_flush: if the journal
is no longer live, the flush is ignored.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 896165811063..cd2f54db7f8a 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -782,6 +782,9 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 	struct gfs2_trans *tr;
 	enum gfs2_freeze_state state = atomic_read(&sdp->sd_freeze_state);
 
+	if (!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))
+		return;
+
 	down_write(&sdp->sd_log_flush_lock);
 
 	/* Log might have been flushed while we waited for the flush lock */
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 13/19] gfs2: Issue revokes more intelligently
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (11 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 12/19] gfs2: If the journal isn't live ignore log flushes Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 14/19] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_write_revokes would call
gfs2_ail1_empty, then traverse the sd_ail1_list looking for
transactions that had bds which were no longer queued to a glock.
And if it found some, it would try to issue revokes for them, up to
a predetermined maximum. There were two problems with how it did
this. First was the fact that gfs2_ail1_empty moves transactions
which have nothing remaining on the ail1 list from the sd_ail1_list
to the sd_ail2_list, thus making its traversal of sd_ail1_list
miss them completely, and therefore, never issue revokes for them.
Second was the fact that there were three traversals (or partial
traversals) of the sd_ail1_list, each of which took and then
released the sd_ail_lock lock. (First inside gfs2_ail1_empty,
second to determine if there are any revokes to be issued, and
third to actually issue them. All this taking and releasing of the
sd_ail_lock meant other processes could modify the lists and the
conditions in which we're working.

This patch simplies the whole process by adding a new parameter
to function gfs2_ail1_empty, max_revokes. For normal calls, this
is passed in as 0, meaning we don't want to issue any revokes.
For function gfs2_write_revokes, we pass in the maximum number
of revokes we can, thus allowing gfs2_ail1_empty to add the
revokes where needed. This simplies the code, allows for a single
holding of the sd_ail_lock, and allows gfs2_ail1_empty to add
revokes for all the necessary bd items without missing any.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 61 +++++++++++++++++++--------------------------------
 1 file changed, 22 insertions(+), 39 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index cd2f54db7f8a..e573bdbb9634 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -190,11 +190,13 @@ static void gfs2_ail1_start(struct gfs2_sbd *sdp)
 /**
  * gfs2_ail1_empty_one - Check whether or not a trans in the AIL has been synced
  * @sdp: the filesystem
- * @ai: the AIL entry
+ * @tr: the transaction
+ * @max_revokes: If nonzero, issue revokes for the bd items for written buffers
  *
  */
 
-static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
+static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
+				int *max_revokes)
 {
 	struct gfs2_bufdata *bd, *s;
 	struct buffer_head *bh;
@@ -211,18 +213,28 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 			gfs2_io_error_bh(sdp, bh);
 			set_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
 		}
-		list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
+		/* If we have space for revokes and the bd is no longer on any
+		   buf list, we can just add a revoke for it here and avoid
+		   having to put it on the ail2 list, where it would need to
+		   be revoked later. */
+		if (*max_revokes && list_empty(&bd->bd_list)) {
+			gfs2_add_revoke(sdp, bd);
+			(*max_revokes)--;
+		} else {
+			list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
+		}
 	}
 }
 
 /**
  * gfs2_ail1_empty - Try to empty the ail1 lists
  * @sdp: The superblock
+ * @max_revokes: If non-zero, add revokes where appropriate
  *
  * Tries to empty the ail1 lists, starting with the oldest first
  */
 
-static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
+static int gfs2_ail1_empty(struct gfs2_sbd *sdp, int max_revokes)
 {
 	struct gfs2_trans *tr, *s;
 	int oldest_tr = 1;
@@ -230,7 +242,7 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
 
 	spin_lock(&sdp->sd_ail_lock);
 	list_for_each_entry_safe_reverse(tr, s, &sdp->sd_ail1_list, tr_list) {
-		gfs2_ail1_empty_one(sdp, tr);
+		gfs2_ail1_empty_one(sdp, tr, &max_revokes);
 		if (list_empty(&tr->tr_ail1_list) && oldest_tr)
 			list_move(&tr->tr_list, &sdp->sd_ail2_list);
 		else
@@ -610,25 +622,9 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 
 void gfs2_write_revokes(struct gfs2_sbd *sdp)
 {
-	struct gfs2_trans *tr;
-	struct gfs2_bufdata *bd, *tmp;
-	int have_revokes = 0;
 	int max_revokes = (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_log_descriptor)) / sizeof(u64);
 
-	gfs2_ail1_empty(sdp);
-	spin_lock(&sdp->sd_ail_lock);
-	list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
-		list_for_each_entry(bd, &tr->tr_ail2_list, bd_ail_st_list) {
-			if (list_empty(&bd->bd_list)) {
-				have_revokes = 1;
-				goto done;
-			}
-		}
-	}
-done:
-	spin_unlock(&sdp->sd_ail_lock);
-	if (have_revokes == 0)
-		return;
+	gfs2_log_lock(sdp);
 	while (sdp->sd_log_num_revoke > max_revokes)
 		max_revokes += (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_meta_header)) / sizeof(u64);
 	max_revokes -= sdp->sd_log_num_revoke;
@@ -639,20 +635,7 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp)
 		if (!sdp->sd_log_blks_reserved)
 			atomic_dec(&sdp->sd_log_blks_free);
 	}
-	gfs2_log_lock(sdp);
-	spin_lock(&sdp->sd_ail_lock);
-	list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
-		list_for_each_entry_safe(bd, tmp, &tr->tr_ail2_list, bd_ail_st_list) {
-			if (max_revokes == 0)
-				goto out_of_blocks;
-			if (!list_empty(&bd->bd_list))
-				continue;
-			gfs2_add_revoke(sdp, bd);
-			max_revokes--;
-		}
-	}
-out_of_blocks:
-	spin_unlock(&sdp->sd_ail_lock);
+	gfs2_ail1_empty(sdp, max_revokes);
 	gfs2_log_unlock(sdp);
 
 	if (!sdp->sd_log_num_revoke) {
@@ -845,7 +828,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 			for (;;) {
 				gfs2_ail1_start(sdp);
 				gfs2_ail1_wait(sdp);
-				if (gfs2_ail1_empty(sdp))
+				if (gfs2_ail1_empty(sdp, 0))
 					break;
 			}
 			atomic_dec(&sdp->sd_log_blks_free); /* Adjust for unreserved buffer */
@@ -1011,7 +994,7 @@ int gfs2_logd(void *data)
 
 		did_flush = false;
 		if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
-			gfs2_ail1_empty(sdp);
+			gfs2_ail1_empty(sdp, 0);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 				       GFS2_LFC_LOGD_JFLUSH_REQD);
 			did_flush = true;
@@ -1020,7 +1003,7 @@ int gfs2_logd(void *data)
 		if (gfs2_ail_flush_reqd(sdp)) {
 			gfs2_ail1_start(sdp);
 			gfs2_ail1_wait(sdp);
-			gfs2_ail1_empty(sdp);
+			gfs2_ail1_empty(sdp, 0);
 			gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 				       GFS2_LFC_LOGD_AIL_FLUSH_REQD);
 			did_flush = true;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 14/19] gfs2: Warn when a journal replay overwrites a rgrp with buffers
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (12 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 13/19] gfs2: Issue revokes more intelligently Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 15/19] gfs2: log which portion of the journal is replayed Bob Peterson
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 627738cfe6bb..11d2fdeed8ac 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -567,9 +567,27 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, unsigned int start,
 
 		if (gfs2_meta_check(sdp, bh_ip))
 			error = -EIO;
-		else
+		else {
+			struct gfs2_meta_header *mh =
+				(struct gfs2_meta_header *)bh_ip->b_data;
+
+			if (mh->mh_type == cpu_to_be32(GFS2_METATYPE_RG)) {
+				struct gfs2_rgrpd *rgd;
+
+				rgd = gfs2_blk2rgrpd(sdp, blkno, false);
+				if (rgd && rgd->rd_addr == blkno &&
+				    rgd->rd_bits && rgd->rd_bits->bi_bh) {
+					fs_info(sdp, "Replaying 0x%llx but we "
+						"already have a bh!\n",
+						(unsigned long long)blkno);
+					fs_info(sdp, "busy:%d, pinned:%d\n",
+						buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0,
+						buffer_pinned(rgd->rd_bits->bi_bh));
+					gfs2_dump_glock(NULL, rgd->rd_gl);
+				}
+			}
 			mark_buffer_dirty(bh_ip);
-
+		}
 		brelse(bh_log);
 		brelse(bh_ip);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 15/19] gfs2: log which portion of the journal is replayed
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (13 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 14/19] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 16/19] gfs2: Only remove revokes that we've submitted Bob Peterson
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

When a journal is replayed, gfs2 logs a message similar to:

jid=X: Replaying journal...

This patch adds the tail and block number so that the range of the
replayed block is also printed. These values will match the values
shown if the journal is dumped with gfs2_edit -p journalX. The
resulting output looks something like this:

jid=1: Replaying journal...0x28b7 to 0x2beb

This will allow us to better debug file system corruption problems.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/recovery.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 2dac43065382..020cac0e1002 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -512,7 +512,8 @@ void gfs2_recover_func(struct work_struct *work)
 		}
 
 		t_tlck = ktime_get();
-		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);
+		fs_info(sdp, "jid=%u: Replaying journal...0x%x to 0x%x\n", 
+			jd->jd_jid, head.lh_tail, head.lh_blkno);
 
 		for (pass = 0; pass < 2; pass++) {
 			lops_before_scan(jd, &head, pass);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 16/19] gfs2: Only remove revokes that we've submitted
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (14 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 15/19] gfs2: log which portion of the journal is replayed Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 17/19] gfs2: eliminate tr_num_revoke_rm Bob Peterson
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch function revoke_lo_after_commit would run the
list of revokes (sd_log_num_revoke) and free them all. That assumes
they've all been submitted. When revoke_lo_before_commit is called
by the log flush, that's true. But then all the io is submitted,
which could take some time. In the meantime, another process might
be able to add new revokes to the list, and those might be wiped
out. Also, since both functions run the list run head to tail,
function gfs2_add_revoke should add to the tail, not the head.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h |  1 +
 fs/gfs2/log.c    |  4 +++-
 fs/gfs2/lops.c   | 13 +++++++++++--
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index bcda69880ec1..7e1af6bc5990 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -814,6 +814,7 @@ struct gfs2_sbd {
 
 	atomic_t sd_log_pinned;
 	unsigned int sd_log_num_revoke;
+	unsigned int sd_log_num_revoked; /* number of revokes written */
 
 	struct list_head sd_log_le_revoke;
 	struct list_head sd_log_le_ordered;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index e573bdbb9634..2dc0a6b2e0c1 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -617,7 +617,7 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 	sdp->sd_log_num_revoke++;
 	atomic_inc(&gl->gl_revokes);
 	set_bit(GLF_LFLUSH, &gl->gl_flags);
-	list_add(&bd->bd_list, &sdp->sd_log_le_revoke);
+	list_add_tail(&bd->bd_list, &sdp->sd_log_le_revoke);
 }
 
 void gfs2_write_revokes(struct gfs2_sbd *sdp)
@@ -627,6 +627,8 @@ void gfs2_write_revokes(struct gfs2_sbd *sdp)
 	gfs2_log_lock(sdp);
 	while (sdp->sd_log_num_revoke > max_revokes)
 		max_revokes += (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_meta_header)) / sizeof(u64);
+	/* Subtract out the revokes we already have queued, to get the number
+	 * of additional revokes we can add at this time. */
 	max_revokes -= sdp->sd_log_num_revoke;
 	if (!sdp->sd_log_num_revoke) {
 		atomic_dec(&sdp->sd_log_blks_free);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 11d2fdeed8ac..f442ef60431d 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -656,9 +656,10 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 	length = gfs2_struct2blk(sdp, sdp->sd_log_num_revoke, sizeof(u64));
 	page = gfs2_get_log_desc(sdp, GFS2_LOG_DESC_REVOKE, length, sdp->sd_log_num_revoke);
 	offset = sizeof(struct gfs2_log_descriptor);
+	sdp->sd_log_num_revoked = 0;
 
 	list_for_each_entry(bd, head, bd_list) {
-		sdp->sd_log_num_revoke--;
+		sdp->sd_log_num_revoked++;
 
 		if (offset + sizeof(u64) > sdp->sd_sb.sb_bsize) {
 
@@ -675,7 +676,8 @@ static void revoke_lo_before_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 		*(__be64 *)(page_address(page) + offset) = cpu_to_be64(bd->bd_blkno);
 		offset += sizeof(u64);
 	}
-	gfs2_assert_withdraw(sdp, !sdp->sd_log_num_revoke);
+	gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke ==
+			     sdp->sd_log_num_revoked);
 
 	gfs2_log_write_page(sdp, page);
 }
@@ -686,6 +688,8 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 	struct gfs2_bufdata *bd;
 	struct gfs2_glock *gl;
 
+	BUG_ON(sdp->sd_log_num_revoked > sdp->sd_log_num_revoke);
+
 	while (!list_empty(head)) {
 		bd = list_entry(head->next, struct gfs2_bufdata, bd_list);
 		list_del_init(&bd->bd_list);
@@ -693,7 +697,12 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 		atomic_dec(&gl->gl_revokes);
 		clear_bit(GLF_LFLUSH, &gl->gl_flags);
 		kmem_cache_free(gfs2_bufdata_cachep, bd);
+		sdp->sd_log_num_revoke--;
+		sdp->sd_log_num_revoked--;
+		if (sdp->sd_log_num_revoked == 0)
+			break;
 	}
+	WARN_ON(sdp->sd_log_num_revoke);
 }
 
 static void revoke_lo_before_scan(struct gfs2_jdesc *jd,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 17/19] gfs2: eliminate tr_num_revoke_rm
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (15 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 16/19] gfs2: Only remove revokes that we've submitted Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 18/19] gfs2: don't call go_unlock unless demote is close at hand Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 19/19] gfs2: clean_journal was setting sd_log_flush_head replaying other journals Bob Peterson
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

For its journal processing, gfs2 kept track of the number of buffers
added and removed on a per-transaction basis. These values are used
to calculate space needed in the journal. But while these calculations
make sense for the number of buffers, they make no sense for revokes.
Revokes are managed in their own list, linked from the superblock.
So it's entirely unnecessary to keep separate per-transaction counts
for revokes added and removed. A single count will do the same job.
Therefore, this patch combines the transaction revokes into a single
count.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h | 1 -
 fs/gfs2/log.c    | 3 +--
 fs/gfs2/trans.c  | 6 +++---
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 7e1af6bc5990..341590b348a3 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -508,7 +508,6 @@ struct gfs2_trans {
 	unsigned int tr_num_buf_rm;
 	unsigned int tr_num_databuf_rm;
 	unsigned int tr_num_revoke;
-	unsigned int tr_num_revoke_rm;
 
 	struct list_head tr_list;
 	struct list_head tr_databuf;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 2dc0a6b2e0c1..64e435cedb32 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -866,7 +866,6 @@ static void gfs2_merge_trans(struct gfs2_trans *old, struct gfs2_trans *new)
 	old->tr_num_buf_rm	+= new->tr_num_buf_rm;
 	old->tr_num_databuf_rm	+= new->tr_num_databuf_rm;
 	old->tr_num_revoke	+= new->tr_num_revoke;
-	old->tr_num_revoke_rm	+= new->tr_num_revoke_rm;
 
 	list_splice_tail_init(&new->tr_databuf, &old->tr_databuf);
 	list_splice_tail_init(&new->tr_buf, &old->tr_buf);
@@ -888,7 +887,7 @@ static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 		set_bit(TR_ATTACHED, &tr->tr_flags);
 	}
 
-	sdp->sd_log_commited_revoke += tr->tr_num_revoke - tr->tr_num_revoke_rm;
+	sdp->sd_log_commited_revoke += tr->tr_num_revoke;
 	reserved = calc_reserved(sdp);
 	maxres = sdp->sd_log_blks_reserved + tr->tr_reserved;
 	gfs2_assert_withdraw(sdp, maxres >= reserved);
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index cd9a94a6b5bb..ecf8392473e1 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -80,10 +80,10 @@ static void gfs2_print_trans(struct gfs2_sbd *sdp, const struct gfs2_trans *tr)
 	fs_warn(sdp, "blocks=%u revokes=%u reserved=%u touched=%u\n",
 		tr->tr_blocks, tr->tr_revokes, tr->tr_reserved,
 		test_bit(TR_TOUCHED, &tr->tr_flags));
-	fs_warn(sdp, "Buf %u/%u Databuf %u/%u Revoke %u/%u\n",
+	fs_warn(sdp, "Buf %u/%u Databuf %u/%u Revoke %u\n",
 		tr->tr_num_buf_new, tr->tr_num_buf_rm,
 		tr->tr_num_databuf_new, tr->tr_num_databuf_rm,
-		tr->tr_num_revoke, tr->tr_num_revoke_rm);
+		tr->tr_num_revoke);
 }
 
 void gfs2_trans_end(struct gfs2_sbd *sdp)
@@ -266,7 +266,7 @@ void gfs2_trans_add_unrevoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
-			tr->tr_num_revoke_rm++;
+			tr->tr_num_revoke--;
 			if (--n == 0)
 				break;
 		}
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 18/19] gfs2: don't call go_unlock unless demote is close at hand
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (16 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 17/19] gfs2: eliminate tr_num_revoke_rm Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 19/19] gfs2: clean_journal was setting sd_log_flush_head replaying other journals Bob Peterson
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

The go_unlock glops function is only used when unlocking rgrps.
Before this patch, the glock code did:
 1. spin_unlock(&gl->gl_lockref.lock);
 2. go_unlock
 3. spin_lock(&gl->gl_lockref.lock);
The go_unlock function checked for the GLF_DEMOTE or DEMOTE_PENDING
bits, and if set, called rgrp_go_unlock. But doing it that way
leaves a race: by the time rgrp_go_unlock returns, another process
may have set GLF_DEMOTE, causing the go_unlock to have been skipped.

This patch changes function gfs2_glock_dq so that it does not
release the gl_lockref lock until after the test for DEMOTE is
complete, and we know for sure whether it needs demoting (for this go).
That way, nobody can sneak in and change the flag during the function.

For simplicity's sake, I also changed the go_unlock parameter to
accept the glock rather than the holder.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c  | 9 ++++++---
 fs/gfs2/incore.h | 2 +-
 fs/gfs2/rgrp.c   | 8 +++-----
 fs/gfs2/rgrp.h   | 2 +-
 4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 72a7b19c3aef..465de5ef8a5d 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1173,9 +1173,12 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
 	if (find_first_holder(gl) == NULL) {
 		if (glops->go_unlock) {
 			GLOCK_BUG_ON(gl, test_and_set_bit(GLF_LOCK, &gl->gl_flags));
-			spin_unlock(&gl->gl_lockref.lock);
-			glops->go_unlock(gh);
-			spin_lock(&gl->gl_lockref.lock);
+			if (test_bit(GLF_DEMOTE, &gl->gl_flags) ||
+			    test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags)) {
+				spin_unlock(&gl->gl_lockref.lock);
+				glops->go_unlock(gl);
+				spin_lock(&gl->gl_lockref.lock);
+			}
 			clear_bit(GLF_LOCK, &gl->gl_flags);
 		}
 		if (list_empty(&gl->gl_holders) &&
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index 341590b348a3..94afe91deb73 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -242,7 +242,7 @@ struct gfs2_glock_operations {
 	void (*go_inval) (struct gfs2_glock *gl, int flags);
 	int (*go_demote_ok) (const struct gfs2_glock *gl);
 	int (*go_lock) (struct gfs2_holder *gh);
-	void (*go_unlock) (struct gfs2_holder *gh);
+	void (*go_unlock) (struct gfs2_glock *gl);
 	void (*go_dump)(struct seq_file *seq, struct gfs2_glock *gl);
 	void (*go_callback)(struct gfs2_glock *gl, bool remote);
 	const int go_type;
diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c
index 52a4f340a867..2a7f47923f88 100644
--- a/fs/gfs2/rgrp.c
+++ b/fs/gfs2/rgrp.c
@@ -1306,13 +1306,11 @@ void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd)
  *
  */
 
-void gfs2_rgrp_go_unlock(struct gfs2_holder *gh)
+void gfs2_rgrp_go_unlock(struct gfs2_glock *gl)
 {
-	struct gfs2_rgrpd *rgd = gh->gh_gl->gl_object;
-	int demote_requested = test_bit(GLF_DEMOTE, &gh->gh_gl->gl_flags) |
-		test_bit(GLF_PENDING_DEMOTE, &gh->gh_gl->gl_flags);
+	struct gfs2_rgrpd *rgd = gl->gl_object;
 
-	if (rgd && demote_requested)
+	if (rgd)
 		gfs2_rgrp_brelse(rgd);
 }
 
diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h
index 499079a9dbbe..f2b6665857e1 100644
--- a/fs/gfs2/rgrp.h
+++ b/fs/gfs2/rgrp.h
@@ -36,7 +36,7 @@ extern int gfs2_rindex_update(struct gfs2_sbd *sdp);
 extern void gfs2_free_clones(struct gfs2_rgrpd *rgd);
 extern int gfs2_rgrp_go_lock(struct gfs2_holder *gh);
 extern void gfs2_rgrp_brelse(struct gfs2_rgrpd *rgd);
-extern void gfs2_rgrp_go_unlock(struct gfs2_holder *gh);
+extern void gfs2_rgrp_go_unlock(struct gfs2_glock *gl);
 
 extern struct gfs2_alloc *gfs2_alloc_get(struct gfs2_inode *ip);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 19/19] gfs2: clean_journal was setting sd_log_flush_head replaying other journals
  2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
                   ` (17 preceding siblings ...)
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 18/19] gfs2: don't call go_unlock unless demote is close at hand Bob Peterson
@ 2019-03-27 12:35 ` Bob Peterson
  18 siblings, 0 replies; 22+ messages in thread
From: Bob Peterson @ 2019-03-27 12:35 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function clean_journal was setting the value of sd_log_flush_head,
but that's only a valid thing to do if it is replaying its own journal.
If it's replaying another node's journal, that's completely wrong and
will lead to multiple problems.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/recovery.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 020cac0e1002..bee5758fd027 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -387,8 +387,10 @@ static void clean_journal(struct gfs2_jdesc *jd,
 {
 	struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
 
-	sdp->sd_log_flush_head = head->lh_blkno;
-	gfs2_replay_incr_blk(jd, &sdp->sd_log_flush_head);
+	if (jd->jd_jid == sdp->sd_lockstruct.ls_jid) {
+		sdp->sd_log_flush_head = head->lh_blkno;
+		gfs2_replay_incr_blk(jd, &sdp->sd_log_flush_head);
+	}
 	gfs2_write_log_header(sdp, jd, head->lh_sequence + 1, 0,
 			      GFS2_LOG_HEAD_UNMOUNT | GFS2_LOG_HEAD_RECOVERY,
 			      REQ_PREFLUSH | REQ_FUA | REQ_META | REQ_SYNC);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 01/19] gfs2: log error reform
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 01/19] gfs2: log error reform Bob Peterson
@ 2019-04-09 13:46   ` Andreas Gruenbacher
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Gruenbacher @ 2019-04-09 13:46 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

On Wed, 27 Mar 2019 at 13:36, Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch, gfs2 kept track of journal io errors in two
> places (sd_log_error and the SDF_AIL1_IO_ERROR flag in sd_flags.
> This patch consolidates the two by eliminating the SDF_AIL1_IO_ERROR
> flag in favor of an atomic count of journal errors, sd_log_errors.
> When the first io error occurs and its value is incremented to 1,
> sd_log_error is set. Thus, sd_log_error reflects the first error
> we encountered writing to the journal. In future patches, we will
> take advantage of this by checking a single value (sd_log_errors)
> rather than having to check both the flag and the sd_log_error
> when reacting to io errors.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/incore.h     | 4 ++--
>  fs/gfs2/log.c        | 7 ++++---
>  fs/gfs2/lops.c       | 7 +++++--
>  fs/gfs2/ops_fstype.c | 1 +
>  fs/gfs2/quota.c      | 6 ++++--
>  5 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index cdf07b408f54..76336b592030 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -620,7 +620,6 @@ enum {
>         SDF_RORECOVERY          = 7, /* read only recovery */
>         SDF_SKIP_DLM_UNLOCK     = 8,
>         SDF_FORCE_AIL_FLUSH     = 9,
> -       SDF_AIL1_IO_ERROR       = 10,
>  };
>
>  enum gfs2_freeze_state {
> @@ -829,7 +828,8 @@ struct gfs2_sbd {
>         atomic_t sd_log_in_flight;
>         struct bio *sd_log_bio;
>         wait_queue_head_t sd_log_flush_wait;
> -       int sd_log_error;
> +       int sd_log_error; /* First log error */
> +       atomic_t sd_log_errors; /* Count of log errors */
>
>         atomic_t sd_reserving_log;
>         wait_queue_head_t sd_reserving_log_wait;
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index b8830fda51e8..0717b4d4828b 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -109,8 +109,8 @@ __acquires(&sdp->sd_ail_lock)
>
>                 if (!buffer_busy(bh)) {
>                         if (!buffer_uptodate(bh) &&
> -                           !test_and_set_bit(SDF_AIL1_IO_ERROR,
> -                                             &sdp->sd_flags)) {
> +                           atomic_add_return(1, &sdp->sd_log_errors) == 1) {
> +                               sdp->sd_log_error = -EIO;
>                                 gfs2_io_error_bh(sdp, bh);
>                                 *withdraw = true;
>                         }
> @@ -209,7 +209,8 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
>                 if (buffer_busy(bh))
>                         continue;
>                 if (!buffer_uptodate(bh) &&
> -                   !test_and_set_bit(SDF_AIL1_IO_ERROR, &sdp->sd_flags)) {
> +                   atomic_add_return(1, &sdp->sd_log_errors) == 1) {
> +                       sdp->sd_log_error = -EIO;
>                         gfs2_io_error_bh(sdp, bh);
>                         *withdraw = true;
>                 }
> diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
> index 8722c60b11fe..627738cfe6bb 100644
> --- a/fs/gfs2/lops.c
> +++ b/fs/gfs2/lops.c
> @@ -211,8 +211,11 @@ static void gfs2_end_log_write(struct bio *bio)
>         struct bvec_iter_all iter_all;
>
>         if (bio->bi_status) {
> -               fs_err(sdp, "Error %d writing to journal, jid=%u\n",
> -                      bio->bi_status, sdp->sd_jdesc->jd_jid);
> +               if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
> +                       sdp->sd_log_error = bio->bi_status;
> +                       fs_err(sdp, "Error %d writing to journal, jid=%u\n",
> +                              bio->bi_status, sdp->sd_jdesc->jd_jid);
> +               }
>                 wake_up(&sdp->sd_logd_waitq);
>         }
>
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index b041cb8ae383..77610be6c918 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -129,6 +129,7 @@ static struct gfs2_sbd *init_sbd(struct super_block *sb)
>
>         init_rwsem(&sdp->sd_log_flush_lock);
>         atomic_set(&sdp->sd_log_in_flight, 0);
> +       atomic_set(&sdp->sd_log_errors, 0);
>         atomic_set(&sdp->sd_reserving_log, 0);
>         init_waitqueue_head(&sdp->sd_reserving_log_wait);
>         init_waitqueue_head(&sdp->sd_log_flush_wait);
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index 2ae5a109eea7..009172ef4dfe 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1479,8 +1479,10 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
>         if (error == 0 || error == -EROFS)
>                 return;
>         if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
> -               fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> -               sdp->sd_log_error = error;
> +               if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
> +                       sdp->sd_log_error = error;
> +                       fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> +               }
>                 wake_up(&sdp->sd_logd_waitq);
>         }
>  }
> --
> 2.20.1
>

this looks like the typical use case for cmpxchg, which won't require
an additional counter.

Andreas



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw
  2019-03-27 12:35 ` [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw Bob Peterson
@ 2019-04-09 14:00   ` Andreas Gruenbacher
  0 siblings, 0 replies; 22+ messages in thread
From: Andreas Gruenbacher @ 2019-04-09 14:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

On Wed, 27 Mar 2019 at 13:35, Bob Peterson <rpeterso@redhat.com> wrote:
> File system withdraws can be delayed when inconsistencies are
> discovered when we cannot withdraw immediately, for example, when
> critical spin_locks are held. But delaying the withdraw can cause
> gfs2 to ignore the error and keep running for a short period of time.
> For example, an rgrp glock may be dequeued and demoted while there
> are still buffers that haven't been properly revoked, due to io
> errors writing to the journal.
>
> This patch introduces a new concept of a delayed withdraw, which
> means an inconsistency has been discovered and we need to withdraw
> at the earliest possible opportunity. In these cases, we aren't
> quite withdrawn yet, but we still need to not dequeue glocks and
> other critical things. If we dequeue the glocks and the withdraw
> results in our journal being replayed, the replay could overwrite
> data that's been modified by a different node that acquired the
> glock in the meantime.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/aops.c       |  4 ++--
>  fs/gfs2/file.c       |  2 +-
>  fs/gfs2/glock.c      |  7 +++----
>  fs/gfs2/glops.c      |  2 +-
>  fs/gfs2/incore.h     |  1 +
>  fs/gfs2/log.c        | 20 ++++++++------------
>  fs/gfs2/meta_io.c    |  6 +++---
>  fs/gfs2/ops_fstype.c |  3 +--
>  fs/gfs2/quota.c      |  2 +-
>  fs/gfs2/super.c      |  6 +++---
>  fs/gfs2/sys.c        |  2 +-
>  fs/gfs2/util.c       |  1 +
>  fs/gfs2/util.h       |  8 ++++++++
>  13 files changed, 34 insertions(+), 30 deletions(-)
>
> diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
> index 05dd78f4b2b3..0d3cde8a61cd 100644
> --- a/fs/gfs2/aops.c
> +++ b/fs/gfs2/aops.c
> @@ -521,7 +521,7 @@ static int __gfs2_readpage(void *file, struct page *page)
>                 error = mpage_readpage(page, gfs2_block_map);
>         }
>
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 return -EIO;
>
>         return error;
> @@ -638,7 +638,7 @@ static int gfs2_readpages(struct file *file, struct address_space *mapping,
>         gfs2_glock_dq(&gh);
>  out_uninit:
>         gfs2_holder_uninit(&gh);
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 ret = -EIO;
>         return ret;
>  }
> diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
> index 58a768e59712..2a3ac9747d0d 100644
> --- a/fs/gfs2/file.c
> +++ b/fs/gfs2/file.c
> @@ -1169,7 +1169,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl)
>                 cmd = F_SETLK;
>                 fl->fl_type = F_UNLCK;
>         }
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
> +       if (unlikely(withdrawn(sdp))) {
>                 if (fl->fl_type == F_UNLCK)
>                         locks_lock_file_wait(file, fl);
>                 return -EIO;
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index d32964cd1117..4330164de8bd 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -542,7 +542,7 @@ __acquires(&gl->gl_lockref.lock)
>         unsigned int lck_flags = (unsigned int)(gh ? gh->gh_flags : 0);
>         int ret;
>
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
> +       if (unlikely(withdrawn(sdp)) &&
>             target != LM_ST_UNLOCKED)
>                 return;
>         lck_flags &= (LM_FLAG_TRY | LM_FLAG_TRY_1CB | LM_FLAG_NOEXP |
> @@ -579,8 +579,7 @@ __acquires(&gl->gl_lockref.lock)
>                 }
>                 else if (ret) {
>                         fs_err(sdp, "lm_lock ret %d\n", ret);
> -                       GLOCK_BUG_ON(gl, !test_bit(SDF_SHUTDOWN,
> -                                                  &sdp->sd_flags));
> +                       GLOCK_BUG_ON(gl, !withdrawn(sdp));
>                 }
>         } else { /* lock_nolock */
>                 finish_xmote(gl, target);
> @@ -1092,7 +1091,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh)
>         struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
>         int error = 0;
>
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 return -EIO;
>
>         if (test_bit(GLF_LRU, &gl->gl_flags))
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 78510ab91835..719961b5a511 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -538,7 +538,7 @@ static int freeze_go_xmote_bh(struct gfs2_glock *gl, struct gfs2_holder *gh)
>                         gfs2_consist(sdp);
>
>                 /*  Initialize some head of the log stuff  */
> -               if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
> +               if (!withdrawn(sdp)) {
>                         sdp->sd_log_sequence = head.lh_sequence + 1;
>                         gfs2_log_pointers_init(sdp, head.lh_blkno);
>                 }
> diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
> index 76336b592030..c6984265807f 100644
> --- a/fs/gfs2/incore.h
> +++ b/fs/gfs2/incore.h
> @@ -620,6 +620,7 @@ enum {
>         SDF_RORECOVERY          = 7, /* read only recovery */
>         SDF_SKIP_DLM_UNLOCK     = 8,
>         SDF_FORCE_AIL_FLUSH     = 9,
> +       SDF_PENDING_WITHDRAW    = 10, /* Will withdraw eventually */
>  };
>
>  enum gfs2_freeze_state {
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 0717b4d4828b..c79279ef03b8 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -92,8 +92,7 @@ static void gfs2_remove_from_ail(struct gfs2_bufdata *bd)
>
>  static int gfs2_ail1_start_one(struct gfs2_sbd *sdp,
>                                struct writeback_control *wbc,
> -                              struct gfs2_trans *tr,
> -                              bool *withdraw)
> +                              struct gfs2_trans *tr)
>  __releases(&sdp->sd_ail_lock)
>  __acquires(&sdp->sd_ail_lock)
>  {
> @@ -112,7 +111,7 @@ __acquires(&sdp->sd_ail_lock)
>                             atomic_add_return(1, &sdp->sd_log_errors) == 1) {
>                                 sdp->sd_log_error = -EIO;
>                                 gfs2_io_error_bh(sdp, bh);
> -                               *withdraw = true;
> +                               set_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
>                         }
>                         list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
>                         continue;
> @@ -153,7 +152,6 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
>         struct list_head *head = &sdp->sd_ail1_list;
>         struct gfs2_trans *tr;
>         struct blk_plug plug;
> -       bool withdraw = false;
>
>         trace_gfs2_ail_flush(sdp, wbc, 1);
>         blk_start_plug(&plug);
> @@ -162,12 +160,12 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
>         list_for_each_entry_reverse(tr, head, tr_list) {
>                 if (wbc->nr_to_write <= 0)
>                         break;
> -               if (gfs2_ail1_start_one(sdp, wbc, tr, &withdraw))
> +               if (gfs2_ail1_start_one(sdp, wbc, tr))
>                         goto restart;
>         }
>         spin_unlock(&sdp->sd_ail_lock);
>         blk_finish_plug(&plug);
> -       if (withdraw)
> +       if (test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
>                 gfs2_lm_withdraw(sdp, NULL);
>         trace_gfs2_ail_flush(sdp, wbc, 0);
>  }
> @@ -196,8 +194,7 @@ static void gfs2_ail1_start(struct gfs2_sbd *sdp)
>   *
>   */
>
> -static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
> -                               bool *withdraw)
> +static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
>  {
>         struct gfs2_bufdata *bd, *s;
>         struct buffer_head *bh;
> @@ -212,7 +209,7 @@ static void gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_trans *tr,
>                     atomic_add_return(1, &sdp->sd_log_errors) == 1) {
>                         sdp->sd_log_error = -EIO;
>                         gfs2_io_error_bh(sdp, bh);
> -                       *withdraw = true;
> +                       set_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
>                 }
>                 list_move(&bd->bd_ail_st_list, &tr->tr_ail2_list);
>         }
> @@ -230,11 +227,10 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
>         struct gfs2_trans *tr, *s;
>         int oldest_tr = 1;
>         int ret;
> -       bool withdraw = false;
>
>         spin_lock(&sdp->sd_ail_lock);
>         list_for_each_entry_safe_reverse(tr, s, &sdp->sd_ail1_list, tr_list) {
> -               gfs2_ail1_empty_one(sdp, tr, &withdraw);
> +               gfs2_ail1_empty_one(sdp, tr);
>                 if (list_empty(&tr->tr_ail1_list) && oldest_tr)
>                         list_move(&tr->tr_list, &sdp->sd_ail2_list);
>                 else
> @@ -243,7 +239,7 @@ static int gfs2_ail1_empty(struct gfs2_sbd *sdp)
>         ret = list_empty(&sdp->sd_ail1_list);
>         spin_unlock(&sdp->sd_ail_lock);
>
> -       if (withdraw)
> +       if (test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
>                 gfs2_lm_withdraw(sdp, "fatal: I/O error(s)\n");
>
>         return ret;
> diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c
> index 3201342404a7..b02c7eb2ded3 100644
> --- a/fs/gfs2/meta_io.c
> +++ b/fs/gfs2/meta_io.c
> @@ -255,7 +255,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
>         struct buffer_head *bh, *bhs[2];
>         int num = 0;
>
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags))) {
> +       if (unlikely(withdrawn(sdp))) {
>                 *bhp = NULL;
>                 return -EIO;
>         }
> @@ -313,7 +313,7 @@ int gfs2_meta_read(struct gfs2_glock *gl, u64 blkno, int flags,
>
>  int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
>  {
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 return -EIO;
>
>         wait_on_buffer(bh);
> @@ -324,7 +324,7 @@ int gfs2_meta_wait(struct gfs2_sbd *sdp, struct buffer_head *bh)
>                         gfs2_io_error_bh_wd(sdp, bh);
>                 return -EIO;
>         }
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 return -EIO;
>
>         return 0;
> diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
> index 77610be6c918..de328b9a5648 100644
> --- a/fs/gfs2/ops_fstype.c
> +++ b/fs/gfs2/ops_fstype.c
> @@ -999,8 +999,7 @@ static int gfs2_lm_mount(struct gfs2_sbd *sdp, int silent)
>  void gfs2_lm_unmount(struct gfs2_sbd *sdp)
>  {
>         const struct lm_lockops *lm = sdp->sd_lockstruct.ls_ops;
> -       if (likely(!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) &&
> -           lm->lm_unmount)
> +       if (likely(!withdrawn(sdp)) && lm->lm_unmount)
>                 lm->lm_unmount(sdp);
>  }
>
> diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
> index 009172ef4dfe..5a1a95fb57ca 100644
> --- a/fs/gfs2/quota.c
> +++ b/fs/gfs2/quota.c
> @@ -1478,7 +1478,7 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
>  {
>         if (error == 0 || error == -EROFS)
>                 return;
> -       if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
> +       if (!withdrawn(sdp)) {
>                 if (atomic_add_return(1, &sdp->sd_log_errors) == 1) {
>                         sdp->sd_log_error = error;
>                         fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index ca71163ff7cf..5db590d57564 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -802,7 +802,7 @@ static void gfs2_dirty_inode(struct inode *inode, int flags)
>
>         if (!(flags & I_DIRTY_INODE))
>                 return;
> -       if (unlikely(test_bit(SDF_SHUTDOWN, &sdp->sd_flags)))
> +       if (unlikely(withdrawn(sdp)))
>                 return;
>         if (!gfs2_glock_is_locked_by_me(ip->i_gl)) {
>                 ret = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
> @@ -851,7 +851,7 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
>
>         error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED, GL_NOCACHE,
>                                    &freeze_gh);
> -       if (error && !test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
> +       if (error && !withdrawn(sdp))
>                 return error;
>
>         flush_workqueue(gfs2_delete_workqueue);
> @@ -1005,7 +1005,7 @@ static int gfs2_freeze(struct super_block *sb)
>         if (atomic_read(&sdp->sd_freeze_state) != SFS_UNFROZEN)
>                 goto out;
>
> -       if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
> +       if (withdrawn(sdp)) {
>                 error = -EINVAL;
>                 goto out;
>         }
> diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
> index 1787d295834e..a2419e92a64e 100644
> --- a/fs/gfs2/sys.c
> +++ b/fs/gfs2/sys.c
> @@ -121,7 +121,7 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
>
>  static ssize_t withdraw_show(struct gfs2_sbd *sdp, char *buf)
>  {
> -       unsigned int b = test_bit(SDF_SHUTDOWN, &sdp->sd_flags);
> +       unsigned int b = withdrawn(sdp);
>         return snprintf(buf, PAGE_SIZE, "%u\n", b);
>  }
>
> diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
> index 0a814ccac41d..717aef772c60 100644
> --- a/fs/gfs2/util.c
> +++ b/fs/gfs2/util.c
> @@ -47,6 +47,7 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
>             test_and_set_bit(SDF_SHUTDOWN, &sdp->sd_flags))
>                 return 0;
>
> +       clear_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags);
>         if (fmt) {
>                 va_start(args, fmt);
>
> diff --git a/fs/gfs2/util.h b/fs/gfs2/util.h
> index 9278fecba632..16e087da3bd3 100644
> --- a/fs/gfs2/util.h
> +++ b/fs/gfs2/util.h
> @@ -167,6 +167,14 @@ static inline unsigned int gfs2_tune_get_i(struct gfs2_tune *gt,
>         return x;
>  }
>
> +static inline bool withdrawn(struct gfs2_sbd *sdp)
> +{
> +       if (test_bit(SDF_SHUTDOWN, &sdp->sd_flags) ||
> +           test_bit(SDF_PENDING_WITHDRAW, &sdp->sd_flags))
> +               return true;
> +       return false;
> +}
> +
>  #define gfs2_tune_get(sdp, field) \
>  gfs2_tune_get_i(&(sdp)->sd_tune, &(sdp)->sd_tune.field)
>
> --
> 2.20.1
>

this patch looks good, I'm not sure why this isn't the way things are
done already.

I'm not fully happy about how things are called though. What do you
think of renaming SDF_PENDING_WITHDRAW -> SDF_WITHDRAW, withdrawn ->
gfs2_withdraw?

Is there actually a need to clear the SDF_PENDING_WITHDRAW flag in
gfs2_lm_withdraw? It's not that a mounted filesystem can ever become
"unwithdrawn".

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-04-09 14:00 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-27 12:35 [Cluster-devel] [PATCH 00/19] gfs2: misc recovery patch collection Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 01/19] gfs2: log error reform Bob Peterson
2019-04-09 13:46   ` Andreas Gruenbacher
2019-03-27 12:35 ` [Cluster-devel] [PATCH 02/19] gfs2: Introduce concept of a pending withdraw Bob Peterson
2019-04-09 14:00   ` Andreas Gruenbacher
2019-03-27 12:35 ` [Cluster-devel] [PATCH 03/19] gfs2: Ignore recovery attempts if gfs2 has io error or is withdrawn Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 04/19] gfs2: move check_journal_clean to util.c for future use Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 05/19] gfs2: Allow some glocks to be used during withdraw Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 06/19] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 07/19] gfs2: Don't write log headers after file system withdraw Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 08/19] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 09/19] gfs2: Add verbose option to check_journal_clean Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 10/19] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 11/19] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 12/19] gfs2: If the journal isn't live ignore log flushes Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 13/19] gfs2: Issue revokes more intelligently Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 14/19] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 15/19] gfs2: log which portion of the journal is replayed Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 16/19] gfs2: Only remove revokes that we've submitted Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 17/19] gfs2: eliminate tr_num_revoke_rm Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 18/19] gfs2: don't call go_unlock unless demote is close at hand Bob Peterson
2019-03-27 12:35 ` [Cluster-devel] [PATCH 19/19] gfs2: clean_journal was setting sd_log_flush_head replaying other journals Bob Peterson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.