All of lore.kernel.org
 help / color / mirror / Atom feed
* [Cluster-devel] [PATCH 0/8] Misc Patch Collection
@ 2020-05-26 13:05 Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 1/8] gfs2: Don't ignore inode write errors during inode_go_sync Bob Peterson
                   ` (8 more replies)
  0 siblings, 9 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

I've been working on a problem whereby glock.c was causing a BUG panic
because gl_revokes was non-zero when the glock is freed. To debug and fix
this, I developed a series of patches, including some new instrumentation
and trace points. Some of these patches are independent, so could be pushed
(or rejected) independently of the others.

So this is my current patch collection. The patch that fixes the problem is
"gfs2: add memory barriers to gfs2_glock_remove_revoke".

Andreas expressed some concerns about some of the others. For example, he
didn't like that the new "status" sysfs file was taking "try" locks, but
if the lock is held, I don't know of a better way to do this. He also expressed
a concern that the new file should be in debugfs rather than sysfs.
I'm open to opinions. Regardless of where it is, the new debug file is a
perfect candidate to include in sos reports.

Bob Peterson

Bob Peterson (8):
  gfs2: Don't ignore inode write errors during inode_go_sync
  gfs2: Allow lock_nolock mount to specify jid=X
  gfs2: add memory barriers to gfs2_glock_remove_revoke
  gfs2: Add new sysfs file for gfs2 status
  gfs2: print mapping->nrpages in glock dump for address space glocks
  gfs2: instrumentation wrt log_flush stuck
  gfs2: Add new trace point for glock ail management
  gfs2: introduce new gfs2_glock_assert_withdraw

 fs/gfs2/glock.c      |  32 +++++++-----
 fs/gfs2/glock.h      |   9 ++++
 fs/gfs2/glops.c      |   8 +--
 fs/gfs2/log.c        |  32 +++++++++---
 fs/gfs2/log.h        |   2 +-
 fs/gfs2/lops.c       |   6 +--
 fs/gfs2/ops_fstype.c |   2 +-
 fs/gfs2/sys.c        | 113 +++++++++++++++++++++++++++++++++++++++++++
 fs/gfs2/trace_gfs2.h |  61 +++++++++++++++++++++++
 fs/gfs2/trans.c      |   4 +-
 10 files changed, 241 insertions(+), 28 deletions(-)

-- 
2.26.2



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 1/8] gfs2: Don't ignore inode write errors during inode_go_sync
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 2/8] gfs2: Allow lock_nolock mount to specify jid=X Bob Peterson
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before for this patch, function inode_go_sync ignored io errors
during inode_go_sync, overwriting them with metadata write errors:

		error = filemap_fdatawait(mapping);
		mapping_set_error(mapping, error);
	}
	error = filemap_fdatawait(metamapping);
	...
	return error;

So any errors returned by the inode write would be forgotten if the
metadata write succeeded. This patch still does both writes, but
only sets error if it's still zero. That way, any errors will be
reported by to the caller, do_xmote, which will take appropriate
action and report the error.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 9e9c7a4b8c66..4862dae868a2 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -268,7 +268,7 @@ static int inode_go_sync(struct gfs2_glock *gl)
 	struct gfs2_inode *ip = gfs2_glock2inode(gl);
 	int isreg = ip && S_ISREG(ip->i_inode.i_mode);
 	struct address_space *metamapping = gfs2_glock2aspace(gl);
-	int error = 0;
+	int error = 0, ret;
 
 	if (isreg) {
 		if (test_and_clear_bit(GIF_SW_PAGED, &ip->i_flags))
@@ -289,8 +289,10 @@ static int inode_go_sync(struct gfs2_glock *gl)
 		error = filemap_fdatawait(mapping);
 		mapping_set_error(mapping, error);
 	}
-	error = filemap_fdatawait(metamapping);
-	mapping_set_error(metamapping, error);
+	ret = filemap_fdatawait(metamapping);
+	mapping_set_error(metamapping, ret);
+	if (!error)
+		error = ret;
 	gfs2_ail_empty_gl(gl);
 	/*
 	 * Writeback of the data mapping may cause the dirty flag to be set
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 2/8] gfs2: Allow lock_nolock mount to specify jid=X
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 1/8] gfs2: Don't ignore inode write errors during inode_go_sync Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 3/8] gfs2: add memory barriers to gfs2_glock_remove_revoke Bob Peterson
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, a simple typo accidentally added \n to the jid=
string for lock_nolock mounts. This made it impossible to mount a
gfs2 file system with a journal other than journal0. Thus:

mount -tgfs2 -o hostdata="jid=1" <device> <mount pt>

Resulted in:
mount: wrong fs type, bad option, bad superblock on <device>

In most cases this is not a problem. However, for debugging and
testing purposes we sometimes want to test the integrity of other
journals. This patch removes the unnecessary \n and thus allows
lock_nolock users to specify an alternate journal.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/ops_fstype.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e2b69ffcc6a8..094f5fe7c009 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -880,7 +880,7 @@ static int init_per_node(struct gfs2_sbd *sdp, int undo)
 }
 
 static const match_table_t nolock_tokens = {
-	{ Opt_jid, "jid=%d\n", },
+	{ Opt_jid, "jid=%d", },
 	{ Opt_err, NULL },
 };
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 3/8] gfs2: add memory barriers to gfs2_glock_remove_revoke
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 1/8] gfs2: Don't ignore inode write errors during inode_go_sync Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 2/8] gfs2: Allow lock_nolock mount to specify jid=X Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 4/8] gfs2: Add new sysfs file for gfs2 status Bob Peterson
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Function gfs2_glock_remove_revoke did atomic_dec_return for gl_revokes
without memory barriers. But according to Documentation/core-api/atomic_ops:

"Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself."

This patch adds the barriers.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 0644e58c6191..1d51b4781bdd 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -682,7 +682,12 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 
 void gfs2_glock_remove_revoke(struct gfs2_glock *gl)
 {
-	if (atomic_dec_return(&gl->gl_revokes) == 0) {
+	int revokes;
+
+	smp_mb__before_atomic();
+	revokes = atomic_dec_return(&gl->gl_revokes);
+	smp_mb__after_atomic();
+	if (revokes == 0) {
 		clear_bit(GLF_LFLUSH, &gl->gl_flags);
 		gfs2_glock_queue_put(gl);
 	}
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 4/8] gfs2: Add new sysfs file for gfs2 status
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (2 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 3/8] gfs2: add memory barriers to gfs2_glock_remove_revoke Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 5/8] gfs2: print mapping->nrpages in glock dump for address space glocks Bob Peterson
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This patch adds a new file: /sys/fs/gfs2/*/status which will report
the status of the file system. Catting this file dumps the current
status of the file system according to various superblock variables.
For example:

Journal Checked:   1
Journal Live:      1
Withdrawn:         0
No barriers:       0
No recovery:       0
Demote:            0
No Journal ID:     1
RO Recovery:       0
Skip DLM Unlock:   0
Force AIL Flush:   0
FS Frozen:         0
Withdrawing:       0
Withdraw In Prog:  0
Remote Withdraw:   0
Withdraw Recovery: 0
sd_log_lock held:  0
statfs_spin held:  0
sd_rindex_spin:    0
sd_jindex_spin:    0
sd_trunc_lock:     0
sd_bitmap_lock:    0
sd_ordered_lock:   0
sd_ail_lock:       0
sd_log_error:      0
sd_log_flush_lock: 0
sd_log_flush_lock: 1
sd_ail1_list:      281 transactions, 11 ail1, 1 ail2
sd_ail2_list:      0 transactions, 0 ail1, 0 ail2
sd_log_revokes:    0 / 0
sd_log_in_flight:  1

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/sys.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index d28c41bd69b0..ea86ba5e4afb 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -63,6 +63,117 @@ static ssize_t id_show(struct gfs2_sbd *sdp, char *buf)
 			MAJOR(sdp->sd_vfs->s_dev), MINOR(sdp->sd_vfs->s_dev));
 }
 
+static int count_trans_list(struct gfs2_sbd *sdp, struct list_head *head,
+			    int *ail1, int *ail2)
+{
+	struct gfs2_trans *tr;
+	struct gfs2_bufdata *bd;
+	int unlock_spin, trans = 0;
+
+	*ail1 = *ail2 = 0;
+	unlock_spin = spin_trylock(&sdp->sd_ail_lock);
+
+	list_for_each_entry(tr, head, tr_list) {
+		trans++;
+		list_for_each_entry(bd, &tr->tr_ail1_list, bd_ail_st_list)
+			(*ail1)++;
+		list_for_each_entry(bd, &tr->tr_ail2_list, bd_ail_st_list)
+			(*ail2)++;
+	}
+
+	if (unlock_spin)
+		spin_unlock(&sdp->sd_ail_lock);
+
+	return trans;
+}
+
+static int count_list(struct gfs2_sbd *sdp, struct list_head *head)
+{
+	int unlock_spin, count = 0;
+	struct gfs2_bufdata *bd;
+
+	unlock_spin = spin_trylock(&sdp->sd_log_lock);
+
+	list_for_each_entry(bd, head, bd_list)
+		count++;
+	if (unlock_spin)
+		spin_unlock(&sdp->sd_log_lock);
+
+	return count;
+}
+
+static ssize_t status_show(struct gfs2_sbd *sdp, char *buf)
+{
+	unsigned long f = sdp->sd_flags;
+	int trans1, ail11, ail12, trans2, ail21, ail22;
+	unsigned int revokes;
+	ssize_t s;
+
+	trans1 = count_trans_list(sdp, &sdp->sd_ail1_list, &ail11, &ail12);
+	trans2 = count_trans_list(sdp, &sdp->sd_ail2_list, &ail21, &ail22);
+	revokes = count_list(sdp, &sdp->sd_log_revokes);
+	s= snprintf(buf, PAGE_SIZE,
+		    "Journal Checked:   %d\n"
+		    "Journal Live:      %d\n"
+		    "Withdrawn:         %d\n"
+		    "No barriers:       %d\n"
+		    "No recovery:       %d\n"
+		    "Demote:            %d\n"
+		    "No Journal ID:     %d\n"
+		    "RO Recovery:       %d\n"
+		    "Skip DLM Unlock:   %d\n"
+		    "Force AIL Flush:   %d\n"
+		    "FS Frozen:         %d\n"
+		    "Withdrawing:       %d\n"
+		    "Withdraw In Prog:  %d\n"
+		    "Remote Withdraw:   %d\n"
+		    "Withdraw Recovery: %d\n"
+		    "sd_log_lock held:  %d\n"
+		    "statfs_spin held:  %d\n"
+		    "sd_rindex_spin:    %d\n"
+		    "sd_jindex_spin:    %d\n"
+		    "sd_trunc_lock:     %d\n"
+		    "sd_bitmap_lock:    %d\n"
+		    "sd_ordered_lock:   %d\n"
+		    "sd_ail_lock:       %d\n"
+		    "sd_log_error:      %d\n"
+		    "sd_log_flush_lock: %d\n"
+		    "sd_ail1_list:      %d transactions, %d ail1, %d ail2\n"
+		    "sd_ail2_list:      %d transactions, %d ail1, %d ail2\n"
+		    "sd_log_revokes:    %d / %u\n"
+		    "sd_log_in_flight:  %d\n",
+		    test_bit(SDF_JOURNAL_CHECKED, &f),
+		    test_bit(SDF_JOURNAL_LIVE, &f),
+		    test_bit(SDF_WITHDRAWN, &f),
+		    test_bit(SDF_NOBARRIERS, &f),
+		    test_bit(SDF_NORECOVERY, &f),
+		    test_bit(SDF_DEMOTE, &f),
+		    test_bit(SDF_NOJOURNALID, &f),
+		    test_bit(SDF_RORECOVERY, &f),
+		    test_bit(SDF_SKIP_DLM_UNLOCK, &f),
+		    test_bit(SDF_FORCE_AIL_FLUSH, &f),
+		    test_bit(SDF_FS_FROZEN, &f),
+		    test_bit(SDF_WITHDRAWING, &f),
+		    test_bit(SDF_WITHDRAW_IN_PROG, &f),
+		    test_bit(SDF_REMOTE_WITHDRAW, &f),
+		    test_bit(SDF_WITHDRAW_RECOVERY, &f),
+		    spin_is_locked(&sdp->sd_log_lock),
+		    spin_is_locked(&sdp->sd_statfs_spin),
+		    spin_is_locked(&sdp->sd_rindex_spin),
+		    spin_is_locked(&sdp->sd_jindex_spin),
+		    spin_is_locked(&sdp->sd_trunc_lock),
+		    spin_is_locked(&sdp->sd_bitmap_lock),
+		    spin_is_locked(&sdp->sd_ordered_lock),
+		    spin_is_locked(&sdp->sd_ail_lock),
+		    sdp->sd_log_error,
+		    rwsem_is_locked(&sdp->sd_log_flush_lock),
+		    trans1, ail11, ail12,
+		    trans2, ail21, ail22,
+		    revokes, sdp->sd_log_num_revoke,
+		    atomic_read(&sdp->sd_log_in_flight));
+	return s;
+}
+
 static ssize_t fsname_show(struct gfs2_sbd *sdp, char *buf)
 {
 	return snprintf(buf, PAGE_SIZE, "%s\n", sdp->sd_fsname);
@@ -283,6 +394,7 @@ GFS2_ATTR(quota_sync,          0200, NULL,          quota_sync_store);
 GFS2_ATTR(quota_refresh_user,  0200, NULL,          quota_refresh_user_store);
 GFS2_ATTR(quota_refresh_group, 0200, NULL,          quota_refresh_group_store);
 GFS2_ATTR(demote_rq,           0200, NULL,	    demote_rq_store);
+GFS2_ATTR(status,              0444, status_show,   NULL);
 
 static struct attribute *gfs2_attrs[] = {
 	&gfs2_attr_id.attr,
@@ -295,6 +407,7 @@ static struct attribute *gfs2_attrs[] = {
 	&gfs2_attr_quota_refresh_user.attr,
 	&gfs2_attr_quota_refresh_group.attr,
 	&gfs2_attr_demote_rq.attr,
+	&gfs2_attr_status.attr,
 	NULL,
 };
 ATTRIBUTE_GROUPS(gfs2);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 5/8] gfs2: print mapping->nrpages in glock dump for address space glocks
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (3 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 4/8] gfs2: Add new sysfs file for gfs2 status Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck Bob Peterson
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index bf70e3b14938..9a5dadc93cfc 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1978,7 +1978,13 @@ void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl, bool fsid)
 	char gflags_buf[32];
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	char fs_id_buf[sizeof(sdp->sd_fsname) + 7];
+	unsigned long nrpages = 0;
 
+	if (gl->gl_ops->go_flags & GLOF_ASPACE) {
+		struct address_space *mapping = gfs2_glock2aspace(gl);
+
+		nrpages = mapping->nrpages;
+	}
 	memset(fs_id_buf, 0, sizeof(fs_id_buf));
 	if (fsid && sdp) /* safety precaution */
 		sprintf(fs_id_buf, "fsid=%s: ", sdp->sd_fsname);
@@ -1987,15 +1993,16 @@ void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl, bool fsid)
 	if (!test_bit(GLF_DEMOTE, &gl->gl_flags))
 		dtime = 0;
 	gfs2_print_dbg(seq, "%sG:  s:%s n:%u/%llx f:%s t:%s d:%s/%llu a:%d "
-		       "v:%d r:%d m:%ld\n", fs_id_buf, state2str(gl->gl_state),
-		  gl->gl_name.ln_type,
-		  (unsigned long long)gl->gl_name.ln_number,
-		  gflags2str(gflags_buf, gl),
-		  state2str(gl->gl_target),
-		  state2str(gl->gl_demote_state), dtime,
-		  atomic_read(&gl->gl_ail_count),
-		  atomic_read(&gl->gl_revokes),
-		  (int)gl->gl_lockref.count, gl->gl_hold_time);
+		       "v:%d r:%d m:%ld p:%lu\n",
+		       fs_id_buf, state2str(gl->gl_state),
+		       gl->gl_name.ln_type,
+		       (unsigned long long)gl->gl_name.ln_number,
+		       gflags2str(gflags_buf, gl),
+		       state2str(gl->gl_target),
+		       state2str(gl->gl_demote_state), dtime,
+		       atomic_read(&gl->gl_ail_count),
+		       atomic_read(&gl->gl_revokes),
+		       (int)gl->gl_lockref.count, gl->gl_hold_time, nrpages);
 
 	list_for_each_entry(gh, &gl->gl_holders, gh_list)
 		dump_holder(seq, gh, fs_id_buf);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (4 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 5/8] gfs2: print mapping->nrpages in glock dump for address space glocks Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-06-05 13:24   ` Andreas Gruenbacher
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management Bob Peterson
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This adds checks for gfs2_log_flush being stuck, similarly to the check
in gfs2_ail1_flush.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 1d51b4781bdd..636c82dda68b 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -145,9 +145,6 @@ static void dump_ail_list(struct gfs2_sbd *sdp)
 	struct gfs2_bufdata *bd;
 	struct buffer_head *bh;
 
-	fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! t=%d\n",
-	       current->journal_info ? 1 : 0);
-
 	list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
 		list_for_each_entry_reverse(bd, &tr->tr_ail1_list,
 					    bd_ail_st_list) {
@@ -197,6 +194,8 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
 restart:
 	ret = 0;
 	if (time_after(jiffies, flush_start + (HZ * 600))) {
+		fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! "
+		       "t=%d\n", current->journal_info ? 1 : 0);
 		dump_ail_list(sdp);
 		goto out;
 	}
@@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
 
 	if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
 		if (!sdp->sd_log_idle) {
+			unsigned long start = jiffies;
+
 			for (;;) {
+				if (time_after(jiffies, start + (HZ * 600))) {
+					fs_err(sdp, "Error: In gfs2_log_flush "
+					       "for 10 minutes! t=%d\n",
+					       current->journal_info ? 1 : 0);
+					dump_ail_list(sdp);
+					break;
+				}
 				gfs2_ail1_start(sdp);
 				gfs2_ail1_wait(sdp);
 				if (gfs2_ail1_empty(sdp, 0))
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (5 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-06-05 14:43   ` Andreas Gruenbacher
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 8/8] gfs2: introduce new gfs2_glock_assert_withdraw Bob Peterson
  2020-06-05 14:38 ` [Cluster-devel] [PATCH 0/8] Misc Patch Collection Andreas Gruenbacher
  8 siblings, 1 reply; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/log.c        | 13 +++++++---
 fs/gfs2/log.h        |  2 +-
 fs/gfs2/lops.c       |  6 ++---
 fs/gfs2/trace_gfs2.h | 61 ++++++++++++++++++++++++++++++++++++++++++++
 fs/gfs2/trans.c      |  4 ++-
 5 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 636c82dda68b..28699a1e5fbd 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -75,6 +75,7 @@ static void gfs2_remove_from_ail(struct gfs2_bufdata *bd)
 	list_del_init(&bd->bd_ail_st_list);
 	list_del_init(&bd->bd_ail_gl_list);
 	atomic_dec(&bd->bd_gl->gl_ail_count);
+	trace_gfs2_glock_ail(bd, 3);
 	brelse(bd->bd_bh);
 }
 
@@ -675,21 +676,25 @@ void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd)
 	bd->bd_blkno = bh->b_blocknr;
 	gfs2_remove_from_ail(bd); /* drops ref on bh */
 	bd->bd_bh = NULL;
+	trace_gfs2_glock_ail(bd, 5);
 	set_bit(GLF_LFLUSH, &gl->gl_flags);
+	trace_gfs2_glock_ail(bd, 6);
 	list_add(&bd->bd_list, &sdp->sd_log_revokes);
 }
 
-void gfs2_glock_remove_revoke(struct gfs2_glock *gl)
+void gfs2_glock_remove_revoke(struct gfs2_bufdata *bd)
 {
 	int revokes;
 
 	smp_mb__before_atomic();
-	revokes = atomic_dec_return(&gl->gl_revokes);
+	revokes = atomic_dec_return(&bd->bd_gl->gl_revokes);
 	smp_mb__after_atomic();
 	if (revokes == 0) {
-		clear_bit(GLF_LFLUSH, &gl->gl_flags);
-		gfs2_glock_queue_put(gl);
+		clear_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
+		trace_gfs2_glock_ail(bd, 7);
+		gfs2_glock_queue_put(bd->bd_gl);
 	}
+	trace_gfs2_glock_ail(bd, 8);
 }
 
 /**
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index c1cd6ae17659..5fbfc0dc4b7a 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -77,7 +77,7 @@ extern void log_flush_wait(struct gfs2_sbd *sdp);
 
 extern int gfs2_logd(void *data);
 extern void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
-extern void gfs2_glock_remove_revoke(struct gfs2_glock *gl);
+extern void gfs2_glock_remove_revoke(struct gfs2_bufdata *bd);
 extern void gfs2_write_revokes(struct gfs2_sbd *sdp);
 
 #endif /* __LOG_DOT_H__ */
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 48b54ec1c793..c9b430295506 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -118,12 +118,14 @@ static void gfs2_unpin(struct gfs2_sbd *sdp, struct buffer_head *bh,
 		struct gfs2_glock *gl = bd->bd_gl;
 		list_add(&bd->bd_ail_gl_list, &gl->gl_ail_list);
 		atomic_inc(&gl->gl_ail_count);
+		trace_gfs2_glock_ail(bd, 4);
 	}
 	bd->bd_tr = tr;
 	list_add(&bd->bd_ail_st_list, &tr->tr_ail1_list);
 	spin_unlock(&sdp->sd_ail_lock);
 
 	clear_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
+	trace_gfs2_glock_ail(bd, 2);
 	trace_gfs2_pin(bd, 0);
 	unlock_buffer(bh);
 	atomic_dec(&sdp->sd_log_pinned);
@@ -906,13 +908,11 @@ static void revoke_lo_after_commit(struct gfs2_sbd *sdp, struct gfs2_trans *tr)
 {
 	struct list_head *head = &sdp->sd_log_revokes;
 	struct gfs2_bufdata *bd;
-	struct gfs2_glock *gl;
 
 	while (!list_empty(head)) {
 		bd = list_first_entry(head, struct gfs2_bufdata, bd_list);
 		list_del_init(&bd->bd_list);
-		gl = bd->bd_gl;
-		gfs2_glock_remove_revoke(gl);
+		gfs2_glock_remove_revoke(bd);
 		kmem_cache_free(gfs2_bufdata_cachep, bd);
 	}
 }
diff --git a/fs/gfs2/trace_gfs2.h b/fs/gfs2/trace_gfs2.h
index e0025258107a..7cc090eb4fad 100644
--- a/fs/gfs2/trace_gfs2.h
+++ b/fs/gfs2/trace_gfs2.h
@@ -34,6 +34,16 @@
 			    { GFS2_BLKST_DINODE, "dinode" },	\
 			    { GFS2_BLKST_UNLINKED, "unlinked" })
 
+#define ail_caller(x) __print_symbolic(x,				\
+				  {1, "gfs2_trans_add_meta  lflush++"},  \
+				  {2, "gfs2_unpin           lflush--"},	\
+				  {3, "gfs2_remove_from_ail gl_ail_count--"}, \
+				  {4, "gfs2_unpin           gl_ail_count++"}, \
+			          {5, "gfs2_add_revoke      gl_revokes++"}, \
+				  {6, "gfs2_add_revoke      lflush++"},	\
+				  {7, "glock_remove_revoke  lflush--"}, \
+				  {8, "glock_remove_revoke  gl_revokes--"})
+
 #define TRACE_RS_DELETE  0
 #define TRACE_RS_TREEDEL 1
 #define TRACE_RS_INSERT  2
@@ -158,6 +168,57 @@ TRACE_EVENT(gfs2_glock_put,
 
 );
 
+/* ail list management for a glock */
+TRACE_EVENT(gfs2_glock_ail,
+
+	TP_PROTO(const struct gfs2_bufdata *bd, int caller),
+
+	TP_ARGS(bd, caller),
+
+	TP_STRUCT__entry(
+		__field(        dev_t,  dev                     )
+		__field(	u64,	glnum			)
+		__field(	u32,	gltype			)
+		__field(	u64,	blk			)
+		__field(	int,	caller			)
+		__field(	int,	lflush			)
+		__field(	int,	ail_count		)
+		__field(	int,	revokes			)
+		__field(	int,	bd_list_q		)
+		__field(	int,	bd_ail_st_list_q	)
+		__field(	int,	bd_ail_gl_list_q	)
+		__field(	int,	gl_refs			)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= bd->bd_gl->gl_name.ln_sbd->sd_vfs->s_dev;
+		__entry->gltype		= bd->bd_gl->gl_name.ln_type;
+		__entry->glnum		= bd->bd_gl->gl_name.ln_number;
+		__entry->blk		= bd->bd_bh ? bd->bd_bh->b_blocknr :
+					  bd->bd_blkno;
+		__entry->caller		= caller;
+		__entry->lflush		= test_bit(GLF_LFLUSH,
+						   &bd->bd_gl->gl_flags);
+		__entry->ail_count	= atomic_read(&bd->bd_gl->gl_ail_count);
+		__entry->revokes	= atomic_read(&bd->bd_gl->gl_revokes);
+		__entry->bd_list_q	= !list_empty(&bd->bd_list);
+		__entry->bd_ail_st_list_q = !list_empty(&bd->bd_ail_st_list);
+		__entry->bd_ail_gl_list_q = !list_empty(&bd->bd_ail_gl_list);
+		__entry->gl_refs	= bd->bd_gl->gl_lockref.count;
+	),
+
+	TP_printk("%u,%u glock %d:%lld blk %lld c:%d lflush:%d ail:%d "
+		  "revokes:%d q:%d/%d/%d ref:%d %s",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+                  __entry->gltype, (unsigned long long)__entry->glnum,
+		  (unsigned long long)__entry->blk, __entry->caller,
+		  __entry->lflush, __entry->ail_count, __entry->revokes,
+		  __entry->bd_list_q, __entry->bd_ail_st_list_q,
+		  __entry->bd_ail_gl_list_q, __entry->gl_refs,
+		  ail_caller(__entry->caller))
+
+);
+
 /* Callback (local or remote) requesting lock demotion */
 TRACE_EVENT(gfs2_demote_rq,
 
diff --git a/fs/gfs2/trans.c b/fs/gfs2/trans.c
index ffe840505082..9aed58f7368d 100644
--- a/fs/gfs2/trans.c
+++ b/fs/gfs2/trans.c
@@ -173,6 +173,7 @@ void gfs2_trans_add_data(struct gfs2_glock *gl, struct buffer_head *bh)
 	set_bit(TR_TOUCHED, &tr->tr_flags);
 	if (list_empty(&bd->bd_list)) {
 		set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
+		trace_gfs2_glock_ail(bd, 1);
 		set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
 		gfs2_pin(sdp, bd->bd_bh);
 		tr->tr_num_databuf_new++;
@@ -216,6 +217,7 @@ void gfs2_trans_add_meta(struct gfs2_glock *gl, struct buffer_head *bh)
 	if (!list_empty(&bd->bd_list))
 		goto out_unlock;
 	set_bit(GLF_LFLUSH, &bd->bd_gl->gl_flags);
+	trace_gfs2_glock_ail(bd, 1);
 	set_bit(GLF_DIRTY, &bd->bd_gl->gl_flags);
 	mh = (struct gfs2_meta_header *)bd->bd_bh->b_data;
 	if (unlikely(mh->mh_magic != cpu_to_be32(GFS2_MAGIC))) {
@@ -266,7 +268,7 @@ void gfs2_trans_remove_revoke(struct gfs2_sbd *sdp, u64 blkno, unsigned int len)
 			gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke);
 			sdp->sd_log_num_revoke--;
 			if (bd->bd_gl)
-				gfs2_glock_remove_revoke(bd->bd_gl);
+				gfs2_glock_remove_revoke(bd);
 			kmem_cache_free(gfs2_bufdata_cachep, bd);
 			tr->tr_num_revoke_rm++;
 			if (--n == 0)
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 8/8] gfs2: introduce new gfs2_glock_assert_withdraw
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (6 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management Bob Peterson
@ 2020-05-26 13:05 ` Bob Peterson
  2020-06-05 14:38 ` [Cluster-devel] [PATCH 0/8] Misc Patch Collection Andreas Gruenbacher
  8 siblings, 0 replies; 16+ messages in thread
From: Bob Peterson @ 2020-05-26 13:05 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, asserts based on glocks did not print the glock with
the error. This patch introduces a new macro, gfs2_glock_assert_withdraw
which first prints the glock, then takes the assert.

This also changes a few glock asserts to the new macro.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glock.c | 7 ++++---
 fs/gfs2/glock.h | 9 +++++++++
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 9a5dadc93cfc..64541d8bf9ad 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -164,7 +164,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
 {
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 
-	BUG_ON(atomic_read(&gl->gl_revokes));
+	gfs2_glock_assert_withdraw(gl, atomic_read(&gl->gl_revokes) == 0);
 	rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
 	smp_mb();
 	wake_up_glock(gl);
@@ -626,7 +626,8 @@ __acquires(&gl->gl_lockref.lock)
 		 */
 		if ((atomic_read(&gl->gl_ail_count) != 0) &&
 		    (!cmpxchg(&sdp->sd_log_error, 0, -EIO))) {
-			gfs2_assert_warn(sdp, !atomic_read(&gl->gl_ail_count));
+			gfs2_glock_assert_warn(gl,
+					       !atomic_read(&gl->gl_ail_count));
 			gfs2_dump_glock(NULL, gl, true);
 		}
 		glops->go_inval(gl, target == LM_ST_DEFERRED ? 0 : DIO_METADATA);
@@ -1836,7 +1837,7 @@ void gfs2_glock_finish_truncate(struct gfs2_inode *ip)
 	int ret;
 
 	ret = gfs2_truncatei_resume(ip);
-	gfs2_assert_withdraw(gl->gl_name.ln_sbd, ret == 0);
+	gfs2_glock_assert_withdraw(gl, ret == 0);
 
 	spin_lock(&gl->gl_lockref.lock);
 	clear_bit(GLF_LOCK, &gl->gl_flags);
diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h
index b8adaf80e4c5..8472d5ba0142 100644
--- a/fs/gfs2/glock.h
+++ b/fs/gfs2/glock.h
@@ -205,6 +205,15 @@ extern void gfs2_dump_glock(struct seq_file *seq, struct gfs2_glock *gl,
 #define GLOCK_BUG_ON(gl,x) do { if (unlikely(x)) {		\
 			gfs2_dump_glock(NULL, gl, true);	\
 			BUG(); } } while(0)
+#define gfs2_glock_assert_warn(gl, x) do { if (unlikely(!(x))) {	\
+			gfs2_dump_glock(NULL, gl, true);		\
+			gfs2_assert_warn((gl)->gl_name.ln_sbd, (x)); } } \
+	while(0)
+#define gfs2_glock_assert_withdraw(gl, x) do { if (unlikely(!(x))) {	\
+			gfs2_dump_glock(NULL, gl, true);		\
+			gfs2_assert_withdraw((gl)->gl_name.ln_sbd, (x)); } } \
+	while(0)
+
 extern __printf(2, 3)
 void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck Bob Peterson
@ 2020-06-05 13:24   ` Andreas Gruenbacher
  2020-06-05 14:48     ` Bob Peterson
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Gruenbacher @ 2020-06-05 13:24 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

On Tue, May 26, 2020 at 3:05 PM Bob Peterson <rpeterso@redhat.com> wrote:
> This adds checks for gfs2_log_flush being stuck, similarly to the check
> in gfs2_ail1_flush.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/log.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 1d51b4781bdd..636c82dda68b 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -145,9 +145,6 @@ static void dump_ail_list(struct gfs2_sbd *sdp)
>         struct gfs2_bufdata *bd;
>         struct buffer_head *bh;
>
> -       fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! t=%d\n",
> -              current->journal_info ? 1 : 0);
> -
>         list_for_each_entry_reverse(tr, &sdp->sd_ail1_list, tr_list) {
>                 list_for_each_entry_reverse(bd, &tr->tr_ail1_list,
>                                             bd_ail_st_list) {
> @@ -197,6 +194,8 @@ void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)
>  restart:
>         ret = 0;
>         if (time_after(jiffies, flush_start + (HZ * 600))) {
> +               fs_err(sdp, "Error: In gfs2_ail1_flush for ten minutes! "
> +                      "t=%d\n", current->journal_info ? 1 : 0);
>                 dump_ail_list(sdp);
>                 goto out;
>         }
> @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
>
>         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
>                 if (!sdp->sd_log_idle) {
> +                       unsigned long start = jiffies;
> +
>                         for (;;) {
> +                               if (time_after(jiffies, start + (HZ * 600))) {

This should probably have some rate limiting as well, for example:

                                        start = jiffies;

I'm not sure what provides similar rate limiting in gfs2_ail1_flush.

> +                                       fs_err(sdp, "Error: In gfs2_log_flush "
> +                                              "for 10 minutes! t=%d\n",
> +                                              current->journal_info ? 1 : 0);

Please don't break the format string up like that; this makes grepping harder.

> +                                       dump_ail_list(sdp);
> +                                       break;
> +                               }
>                                 gfs2_ail1_start(sdp);
>                                 gfs2_ail1_wait(sdp);
>                                 if (gfs2_ail1_empty(sdp, 0))
> --
> 2.26.2
>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 0/8] Misc Patch Collection
  2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
                   ` (7 preceding siblings ...)
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 8/8] gfs2: introduce new gfs2_glock_assert_withdraw Bob Peterson
@ 2020-06-05 14:38 ` Andreas Gruenbacher
  8 siblings, 0 replies; 16+ messages in thread
From: Andreas Gruenbacher @ 2020-06-05 14:38 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bob,

On Tue, May 26, 2020 at 3:07 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Andreas expressed some concerns about some of the others. For example, he
> didn't like that the new "status" sysfs file was taking "try" locks, but
> if the lock is held, I don't know of a better way to do this.

walking a linked list that's protected by a spin lock without holding
that spin lock is so fundamentally broken that I don't see why we even
need to discuss it. There's nothing that guarantees that we'll be able
to walk the list.

The spin_is_locked checks this patch adds for reporting the states of
spin locks are practically useless as well.

> He also expressed
> a concern that the new file should be in debugfs rather than sysfs.
> I'm open to opinions. Regardless of where it is, the new debug file is a
> perfect candidate to include in sos reports.

It may be better to include more relevant information in trace points.
That way, we automatically get that information correlated with other
trace events to see what was happening when.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management
  2020-05-26 13:05 ` [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management Bob Peterson
@ 2020-06-05 14:43   ` Andreas Gruenbacher
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Gruenbacher @ 2020-06-05 14:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Bob,

this very much looks like a debugging trace point that has no business
in production code.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-06-05 13:24   ` Andreas Gruenbacher
@ 2020-06-05 14:48     ` Bob Peterson
  2020-06-05 16:09       ` Andreas Gruenbacher
  0 siblings, 1 reply; 16+ messages in thread
From: Bob Peterson @ 2020-06-05 14:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Andreas,

----- Original Message -----
(snip)
> > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > gfs2_glock *gl, u32 flags)
> >
> >         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> >                 if (!sdp->sd_log_idle) {
> > +                       unsigned long start = jiffies;
> > +
> >                         for (;;) {
> > +                               if (time_after(jiffies, start + (HZ *
> > 600))) {
> 
> This should probably have some rate limiting as well, for example:

Seems unnecessary. If the log flush gets stuck, the message will be printed
once, and at most every 10 minutes.
 
>                                         start = jiffies;
> 
> I'm not sure what provides similar rate limiting in gfs2_ail1_flush.
> 
> > +                                       fs_err(sdp, "Error: In
> > gfs2_log_flush "
> > +                                              "for 10 minutes! t=%d\n",
> > +                                              current->journal_info ? 1 :
> > 0);
> 
> Please don't break the format string up like that; this makes grepping
> harder.

How do you propose I break present it without going over 80 character?
I could #define it as a constant, or put it into a separate function
that has less indentation, I suppose.
 
Bob



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-06-05 14:48     ` Bob Peterson
@ 2020-06-05 16:09       ` Andreas Gruenbacher
  2020-06-05 16:14         ` Bob Peterson
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Gruenbacher @ 2020-06-05 16:09 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson <rpeterso@redhat.com> wrote:
> Hi Andreas,
>
> ----- Original Message -----
> (snip)
> > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > gfs2_glock *gl, u32 flags)
> > >
> > >         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > >                 if (!sdp->sd_log_idle) {
> > > +                       unsigned long start = jiffies;
> > > +
> > >                         for (;;) {
> > > +                               if (time_after(jiffies, start + (HZ *
> > > 600))) {
> >
> > This should probably have some rate limiting as well, for example:
>
> Seems unnecessary. If the log flush gets stuck, the message will be printed
> once, and at most every 10 minutes.

No, after ten minutes, the message will actually be printed for each
iteration of the loop. That's exactly why I was suggesting the rate
limiting.

> >                                         start = jiffies;
> >
> > I'm not sure what provides similar rate limiting in gfs2_ail1_flush.
> >
> > > +                                       fs_err(sdp, "Error: In
> > > gfs2_log_flush "
> > > +                                              "for 10 minutes! t=%d\n",
> > > +                                              current->journal_info ? 1 :
> > > 0);
> >
> > Please don't break the format string up like that; this makes grepping
> > harder.
>
> How do you propose I break present it without going over 80 character?
> I could #define it as a constant, or put it into a separate function
> that has less indentation, I suppose.

There *is* no hard 80 character limit. The checkpatch warnings
shouldn't push us into making our code worse. Also note that since
commit bdc48fa11e, checkpatch will only warn about lines longer than
100 characters.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-06-05 16:09       ` Andreas Gruenbacher
@ 2020-06-05 16:14         ` Bob Peterson
  2020-06-05 16:18           ` Andreas Gruenbacher
  0 siblings, 1 reply; 16+ messages in thread
From: Bob Peterson @ 2020-06-05 16:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

----- Original Message -----
> On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson <rpeterso@redhat.com> wrote:
> > Hi Andreas,
> >
> > ----- Original Message -----
> > (snip)
> > > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > > gfs2_glock *gl, u32 flags)
> > > >
> > > >         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > >                 if (!sdp->sd_log_idle) {
> > > > +                       unsigned long start = jiffies;
> > > > +
> > > >                         for (;;) {
> > > > +                               if (time_after(jiffies, start + (HZ *
> > > > 600))) {
> > >
> > > This should probably have some rate limiting as well, for example:
> >
> > Seems unnecessary. If the log flush gets stuck, the message will be printed
> > once, and at most every 10 minutes.
> 
> No, after ten minutes, the message will actually be printed for each
> iteration of the loop. That's exactly why I was suggesting the rate
> limiting.

No, after ten minutes it dumps the ail list so you can see the problem
and exits the loop with "break;".

The next time it enters the loop, it starts with a new value of start
which doesn't expire for another ten minutes.

Bob



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck
  2020-06-05 16:14         ` Bob Peterson
@ 2020-06-05 16:18           ` Andreas Gruenbacher
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Gruenbacher @ 2020-06-05 16:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, Jun 5, 2020 at 6:15 PM Bob Peterson <rpeterso@redhat.com> wrote:
> ----- Original Message -----
> > On Fri, Jun 5, 2020 at 4:49 PM Bob Peterson <rpeterso@redhat.com> wrote:
> > > Hi Andreas,
> > >
> > > ----- Original Message -----
> > > (snip)
> > > > > @@ -970,7 +969,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> > > > > gfs2_glock *gl, u32 flags)
> > > > >
> > > > >         if (!(flags & GFS2_LOG_HEAD_FLUSH_NORMAL)) {
> > > > >                 if (!sdp->sd_log_idle) {
> > > > > +                       unsigned long start = jiffies;
> > > > > +
> > > > >                         for (;;) {
> > > > > +                               if (time_after(jiffies, start + (HZ *
> > > > > 600))) {
> > > >
> > > > This should probably have some rate limiting as well, for example:
> > >
> > > Seems unnecessary. If the log flush gets stuck, the message will be printed
> > > once, and at most every 10 minutes.
> >
> > No, after ten minutes, the message will actually be printed for each
> > iteration of the loop. That's exactly why I was suggesting the rate
> > limiting.
>
> No, after ten minutes it dumps the ail list so you can see the problem
> and exits the loop with "break;".
>
> The next time it enters the loop, it starts with a new value of start
> which doesn't expire for another ten minutes.

Ok, I misread.

Thanks,
Andreas



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-06-05 16:18 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-26 13:05 [Cluster-devel] [PATCH 0/8] Misc Patch Collection Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 1/8] gfs2: Don't ignore inode write errors during inode_go_sync Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 2/8] gfs2: Allow lock_nolock mount to specify jid=X Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 3/8] gfs2: add memory barriers to gfs2_glock_remove_revoke Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 4/8] gfs2: Add new sysfs file for gfs2 status Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 5/8] gfs2: print mapping->nrpages in glock dump for address space glocks Bob Peterson
2020-05-26 13:05 ` [Cluster-devel] [PATCH 6/8] gfs2: instrumentation wrt log_flush stuck Bob Peterson
2020-06-05 13:24   ` Andreas Gruenbacher
2020-06-05 14:48     ` Bob Peterson
2020-06-05 16:09       ` Andreas Gruenbacher
2020-06-05 16:14         ` Bob Peterson
2020-06-05 16:18           ` Andreas Gruenbacher
2020-05-26 13:05 ` [Cluster-devel] [PATCH 7/8] gfs2: Add new trace point for glock ail management Bob Peterson
2020-06-05 14:43   ` Andreas Gruenbacher
2020-05-26 13:05 ` [Cluster-devel] [PATCH 8/8] gfs2: introduce new gfs2_glock_assert_withdraw Bob Peterson
2020-06-05 14:38 ` [Cluster-devel] [PATCH 0/8] Misc Patch Collection Andreas Gruenbacher

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.