* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
@ 2019-03-19 18:17 Ross Lagerwall
  2019-03-19 18:48 ` Bob Peterson
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Lagerwall @ 2019-03-19 18:17 UTC
  To: cluster-devel.redhat.com

Hi,

Occasionally during testing, we see the following assertion failure in 
log_pull_tail():

[ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal: assertion "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
[ 1104.061245]    function = log_pull_tail, file = fs/gfs2/log.c, line = 510

It always seems to happen shortly after journal recovery. I added some 
debug logging at the point of the assertion failure and got the following:

[  862.335390] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Trying to acquire journal lock...
[  862.364578] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Looking at journal...
[  862.475108] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Acquiring the transaction lock...
[  862.475110] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Replaying journal...
[  862.475538] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Replayed 1 of 4 blocks
[  862.475539] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Found 3 revoke tags
[  862.484394] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Journal replayed in 149ms [jlck:29ms, jhead:110ms, tlck:0ms, replay:9ms]
[  862.484431] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: recover jid 0 result success
[  862.484434] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: jid=0: Done
[  862.484545] gfs2: fsid=xapi-clusterd:9f0b3587-c0f3-c8.1: recover generation 4 done
[  863.362577] DBG: log_blks_free = 14838 jd_blocks = 8192
[  863.362578] DBG: dist = 6663 new_tail = 1025 sd_log_tail = 2554
[  863.362579] DBG: sd_log_head = 1025 sd_log_head_flush = 1027
[  863.362579] DBG: sd_log_idle = 0
[  863.362580] DBG: sd_ail1_list empty? = yes
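
(For reference, dist above is just the circular distance from sd_log_tail to 
new_tail modulo the journal size. A quick userspace check of the numbers, 
paraphrasing the distance calculation rather than quoting fs/gfs2/log.c:)

#include <stdio.h>

int main(void)
{
	/* Values taken from the DBG lines above. */
	unsigned int jd_blocks   = 8192;
	unsigned int sd_log_tail = 2554;
	unsigned int new_tail    = 1025;

	/* Circular distance from the old tail to the new one. */
	int dist = (int)new_tail - (int)sd_log_tail;
	if (dist < 0)
		dist += jd_blocks;

	/* Prints 6663, matching the DBG output.  Adding that to
	 * log_blks_free implies it was 14838 - 6663 = 8175 before the
	 * pull, i.e. the log was already almost entirely free, and the
	 * addition pushes it past jd_blocks: the same blocks appear to
	 * be freed twice. */
	printf("dist = %d\n", dist);
	return 0;
}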

It looks to me that something messes with sd_log_tail and 
sd_log_head/sd_log_head_flush during recovery which causes a subsequent 
call to gfs2_log_flush() to hit the assertion.

Any ideas about this?

Thanks,
-- 
Ross Lagerwall




* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
  2019-03-19 18:17 [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks Ross Lagerwall
@ 2019-03-19 18:48 ` Bob Peterson
  2019-03-22 18:15   ` Ross Lagerwall
  0 siblings, 1 reply; 6+ messages in thread
From: Bob Peterson @ 2019-03-19 18:48 UTC
  To: cluster-devel.redhat.com

----- Original Message -----
> Hi,
> 
> Occasionally during testing, we see the following assertion failure in
> log_pull_tail():
> 
> [ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
> assertion "atomic_read(&sdp->sd_log_blks_free) <=
> sdp->sd_jdesc->jd_blocks" failed
> [ 1104.061245]    function = log_pull_tail, file = fs/gfs2/log.c, line = 510
> 
> It always seems to happen shortly after journal recovery. I added some
> debug logging at the point of the assertion failure and got the following:
(snip) 
> Any ideas about this?
> 
> Thanks,
> --
> Ross Lagerwall

Hi Ross,

I've been fighting with/debugging multiple recovery problems for a long time now.
I've done countless (well, okay, thousands of) recovery tests and I can tell
you that gfs2 recovery has some major problems. These problems usually don't
occur when you have a few gfs2 mounts: they're much more likely when you have
lots of gfs2 mounts. I'm using 32 mounts.

The problem you mentioned sounds vaguely familiar, but I can't find anything
directly related. Make sure all your journals are the same size, and see if fsck.gfs2
complains about the journal. Otherwise, it could be a side effect of one of the
recovery issues I'm working on. Do you have other symptoms? Also, make sure
multiple nodes aren't trying to use the same journal because of lock_nolock or
something...I've made that mistake in the past.

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
  2019-03-19 18:48 ` Bob Peterson
@ 2019-03-22 18:15   ` Ross Lagerwall
  2019-03-25 15:29     ` Bob Peterson
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Lagerwall @ 2019-03-22 18:15 UTC
  To: cluster-devel.redhat.com

On 3/19/19 6:48 PM, Bob Peterson wrote:
> ----- Original Message -----
>> Hi,
>>
>> Occasionally during testing, we see the following assertion failure in
>> log_pull_tail():
>>
>> [ 1104.061245] gfs2: fsid=xapi-clusterd:2d2cc24c-c48a-ca.0: fatal:
>> assertion "atomic_read(&sdp->sd_log_blks_free) <=
>> sdp->sd_jdesc->jd_blocks" failed
>> [ 1104.061245]    function = log_pull_tail, file = fs/gfs2/log.c, line = 510
>>
>> It always seems to happen shortly after journal recovery. I added some
>> debug logging at the point of the assertion failure and got the following:
> (snip)
>> Any ideas about this?
>>
>> Thanks,
>> --
>> Ross Lagerwall
> 
> Hi Ross,
> 
> I've been fighting with/debugging multiple recovery problems for a long time now.
> I've done countless (well, okay, thousands of) recovery tests and I can tell
> you that gfs2 recovery has some major problems. These problems usually don't
> occur when you have a few gfs2 mounts: they're much more likely when you have
> lots of gfs2 mounts. I'm using 32 mounts.
> 
> The problem you mentioned sounds vaguely familiar, but I can't find anything
> directly related. Make sure all your journals are the same size, and see if fsck.gfs2
> complains about the journal. Otherwise, it could be a side effect of one of the
> recovery issues I'm working on. Do you have other symptoms? Also, make sure
> multiple nodes aren't trying to use the same journal because of lock_nolock or
> something...I've made that mistake in the past.
> 

I think I've found the cause of the assertion I was hitting. Recovery 
sets sd_log_flush_head but does not take locks which means a concurrent 
call to gfs2_log_flush() can result in sd_log_head being set to 
sd_log_flush_head. A later call to gfs2_log_flush() will then hit an 
assertion failure in log_pull_tail() because the mismatch between 
sd_log_head and sd_log_tail means too many blocks are freed.
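
Roughly the interleaving I think we're seeing, modelled as a standalone toy 
program (the globals stand in for the sdp fields, the starting values come 
from the debug output in my first mail, and the functions only paraphrase 
what the real code paths do; none of this is kernel code):

#include <assert.h>

static const unsigned int jd_blocks = 8192;
static unsigned int flush_head, head, tail;
static unsigned int blks_free = 8175;		/* log nearly empty */

static void recovery_repositions_flush_head(void)
{
	/* gfs2_recover_func() -> clean_journal(): moves the flush head
	 * without holding sd_log_flush_lock. */
	flush_head = 1025;
}

static void racing_gfs2_log_flush(void)
{
	/* A concurrent log flush copies the flush head into the log head. */
	head = flush_head;
}

static void later_gfs2_log_flush(void)
{
	/* log_pull_tail(): credit the head-to-tail distance back as free
	 * blocks, but head no longer corresponds to tail, so far too many
	 * blocks are freed. */
	int dist = (int)head - (int)tail;
	if (dist < 0)
		dist += jd_blocks;
	blks_free += dist;			/* 8175 + 6663 = 14838 */
	tail = head;
	assert(blks_free <= jd_blocks);		/* the failing assertion */
}

int main(void)
{
	flush_head = head = tail = 2554;
	recovery_repositions_flush_head();
	racing_gfs2_log_flush();
	later_gfs2_log_flush();
	return 0;
}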

I've worked around it by taking the log_flush lock in the patch below 
and it seems to avoid the problem. However, tracing the recovery process 
I see that it sets sd_log_flush_head and then calls clean_journal() -> 
gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This 
has:

	BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
	       (sdp->sd_log_flush_head != sdp->sd_log_head));

... but sd_log_tail and sd_log_head have not been set by 
gfs2_recover_func() so it might still BUG_ON() during recovery if you're 
particularly unlucky.

I had a look at your "GFS2: Withdraw corruption patches [V2]" series but 
I didn't see anything that might fix this.

If you think this patch is useful then I can send it as a proper patch 
to the list.

Thanks,
Ross

--------------

gfs2: Take log_flush lock during recovery

Recovery sets sd_log_flush_head but does not take any locks which means
a concurrent call to gfs2_log_flush can result in sd_log_head being set
to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
assertion failure in log_pull_tail because the mismatch between
sd_log_head and sd_log_tail means too many blocks are freed.

gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
function = log_pull_tail, file = fs/gfs2/log.c, line = 510

diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c
index 0f501f938d1c..5f7c3271e909 100644
--- a/fs/gfs2/recovery.c
+++ b/fs/gfs2/recovery.c
@@ -498,6 +498,8 @@ void gfs2_recover_func(struct work_struct *work)
  			goto fail_gunlock_thaw;
  		}

+		down_write(&sdp->sd_log_flush_lock);
+
  		t_tlck = ktime_get();
  		fs_info(sdp, "jid=%u: Replaying journal...\n", jd->jd_jid);

@@ -512,6 +514,8 @@ void gfs2_recover_func(struct work_struct *work)

  		clean_journal(jd, &head);

+		up_write(&sdp->sd_log_flush_lock);
+
  		gfs2_glock_dq_uninit(&thaw_gh);
  		t_rep = ktime_get();
  		fs_info(sdp, "jid=%u: Journal replayed in %lldms [jlck:%lldms, "




* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
  2019-03-22 18:15   ` Ross Lagerwall
@ 2019-03-25 15:29     ` Bob Peterson
  2019-03-27 14:40       ` Ross Lagerwall
  0 siblings, 1 reply; 6+ messages in thread
From: Bob Peterson @ 2019-03-25 15:29 UTC
  To: cluster-devel.redhat.com

----- Original Message -----
> I think I've found the cause of the assertion I was hitting. Recovery
> sets sd_log_flush_head but does not take locks which means a concurrent
> call to gfs2_log_flush() can result in sd_log_head being set to
> sd_log_flush_head. A later call to gfs2_log_flush() will then hit an
> assertion failure in log_pull_tail() because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
> 
> I've worked around it by taking the log_flush lock in the patch below
> and it seems to avoid the problem. However, tracing the recovery process
> I see that it sets sd_log_flush_head and then calls clean_journal() ->
> gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This
> has:
> 
> 	BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
> 	       (sdp->sd_log_flush_head != sdp->sd_log_head));
> 
> ... but sd_log_tail and sd_log_head have not been set by
> gfs2_recover_func() so it might still BUG_ON() during recovery if you're
> particularly unlucky.
> 
> I had a look at your "GFS2: Withdraw corruption patches [V2]" series but
> I didn't see anything that might fix this.
> 
> If you think this patch is useful then I can send it as a proper patch
> to the list.
> 
> Thanks,
> Ross
> 
> --------------
> 
> gfs2: Take log_flush lock during recovery
> 
> Recovery sets sd_log_flush_head but does not take any locks which means
> a concurrent call to gfs2_log_flush can result in sd_log_head being set
> to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
> assertion failure in log_pull_tail because the mismatch between
> sd_log_head and sd_log_tail means too many blocks are freed.
> 
> gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion
> "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
> function = log_pull_tail, file = fs/gfs2/log.c, line = 510

Hi Ross,

I think you found a valid bug, but that's not the proper solution.
The reason is: I think journal replay and journal flushing should both be
protected by the exclusive (EX) glock taken on the journal itself.

I think the problem may actually be a regression with patch 588bff95c94ef.
Because of that patch, function clean_journal now sets sdp->sd_log_flush_head
but its caller, gfs2_recover_func, is used to recover any node's journal, not
just its own. The bug is that clean_journal should only set sd_log_flush_head
if (and only if) it's replaying its own journal, not someone else's.
If it sets sd_log_flush_head while replaying another node's journal, that
will only lead to a problem like this.
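
Something along these lines, purely as a sketch (the jd == sdp->sd_jdesc 
comparison is just shorthand for "replaying our own journal", and 
replay_head_block stands in for wherever recovery writes the clean log 
header; the real patch may well end up looking different):

	/* Sketch only, not the actual fix. */
	static void clean_journal_sketch(struct gfs2_jdesc *jd,
					 unsigned int replay_head_block)
	{
		struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);

		/* Only replaying this node's own journal may move the local
		 * log flush head; recovering another node's journal must
		 * leave our log position alone. */
		if (jd == sdp->sd_jdesc)
			sdp->sd_log_flush_head = replay_head_block;
	}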

I'll try and whip up another patch and perhaps you can test it for me.

FWIW, I've never seen this problem manifest on my recovery tests, but it
still might be causing some of the weird problems I'm seeing.

Regards,

Bob Peterson
Red Hat File Systems




* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
  2019-03-25 15:29     ` Bob Peterson
@ 2019-03-27 14:40       ` Ross Lagerwall
  2019-03-27 17:04         ` Bob Peterson
  0 siblings, 1 reply; 6+ messages in thread
From: Ross Lagerwall @ 2019-03-27 14:40 UTC
  To: cluster-devel.redhat.com

On 3/25/19 3:29 PM, Bob Peterson wrote:
> ----- Original Message -----
>> I think I've found the cause of the assertion I was hitting. Recovery
>> sets sd_log_flush_head but does not take locks which means a concurrent
>> call to gfs2_log_flush() can result in sd_log_head being set to
>> sd_log_flush_head. A later call to gfs2_log_flush() will then hit an
>> assertion failure in log_pull_tail() because the mismatch between
>> sd_log_head and sd_log_tail means too many blocks are freed.
>>
>> I've worked around it by taking the log_flush lock in the patch below
>> and it seems to avoid the problem. However, tracing the recovery process
>> I see that it sets sd_log_flush_head and then calls clean_journal() ->
>> gfs2_write_log_header() -> gfs2_log_bmap() -> gfs2_log_incr_head(). This
>> has:
>>
>> 	BUG_ON((sdp->sd_log_flush_head == sdp->sd_log_tail) &&
>> 	       (sdp->sd_log_flush_head != sdp->sd_log_head));
>>
>> ... but sd_log_tail and sd_log_head have not been set by
>> gfs2_recover_func() so it might still BUG_ON() during recovery if you're
>> particularly unlucky.
>>
>> I had a look at your "GFS2: Withdraw corruption patches [V2]" series but
>> I didn't see anything that might fix this.
>>
>> If you think this patch is useful then I can send it as a proper patch
>> to the list.
>>
>> Thanks,
>> Ross
>>
>> --------------
>>
>> gfs2: Take log_flush lock during recovery
>>
>> Recovery sets sd_log_flush_head but does not take any locks which means
>> a concurrent call to gfs2_log_flush can result in sd_log_head being set
>> to sd_log_flush_head. A later call to gfs2_log_flush will then hit an
>> assertion failure in log_pull_tail because the mismatch between
>> sd_log_head and sd_log_tail means too many blocks are freed.
>>
>> gfs2: fsid=xapi-clusterd:88a31b8e-4072-b0.1: fatal: assertion
>> "atomic_read(&sdp->sd_log_blks_free) <= sdp->sd_jdesc->jd_blocks" failed
>> function = log_pull_tail, file = fs/gfs2/log.c, line = 510
> 
> Hi Ross,
> 
> I think you found a valid bug, but that's not the proper solution.
> The reason is: I think journal replay and journal flushing should both be
> protected by the exclusive (EX) glock taken on the journal itself.
> 
> I think the problem may actually be a regression with patch 588bff95c94ef.
> Because of that patch, function clean_journal now sets sdp->sd_log_flush_head
> but its caller, gfs2_recover_func, is used to recover any node's journal, not
> just its own. The bug is that clean_journal should only set sd_log_flush_head
> if (and only if) it's replaying its own journal, not someone else's.
> If it sets sd_log_flush_head while replaying another node's journal, that
> will only lead to a problem like this.
> 
> I'll try and whip up another patch and perhaps you can test it for me.
> 
> FWIW, I've never seen this problem manifest on my recovery tests, but it
> still might be causing some of the weird problems I'm seeing.
> 

(Replying on this thread since I didn't get CCed on the patch.)

I still hit the assertion with this patch. gfs2_write_log_header() is 
unconditionally called, and it calls gfs2_log_bmap(), which changes 
sd_log_flush_head in the success path.

FWIW we hit this assertion periodically during testing but I found that 
I could get it to trigger almost every time during recovery with some 
well-placed sleeps and a kthread that continuously calls gfs2_log_flush().

-----8<----------

diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index ee20ea42e7b5..8bd87082a110 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -820,6 +820,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
  	}
  	lops_after_commit(sdp, tr);

+	if (flags & GFS2_LFC_LOGD_TEST) {
+		msleep(1000);
+	}
+
  	gfs2_log_lock(sdp);
  	sdp->sd_log_head = sdp->sd_log_flush_head;
  	sdp->sd_log_blks_reserved = 0;
diff --git a/fs/gfs2/log.h b/fs/gfs2/log.h
index 20241436126d..9d658efdb57a 100644
--- a/fs/gfs2/log.h
+++ b/fs/gfs2/log.h
@@ -79,6 +79,7 @@ extern void gfs2_ail1_flush(struct gfs2_sbd *sdp, struct writeback_control *wbc)

  extern void gfs2_log_shutdown(struct gfs2_sbd *sdp);
  extern int gfs2_logd(void *data);
+extern int gfs2_logd2(void *data);
  extern void gfs2_add_revoke(struct gfs2_sbd *sdp, struct gfs2_bufdata *bd);
  extern void gfs2_write_revokes(struct gfs2_sbd *sdp);

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a971862b186e..1d7f32c1960a 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -378,6 +378,7 @@ static int init_threads(struct gfs2_sbd *sdp)
  	int error = 0;

  	p = kthread_run(gfs2_logd, sdp, "gfs2_logd");
+	p = kthread_run(gfs2_logd2, sdp, "gfs2_logd2");
  	if (IS_ERR(p)) {
  		error = PTR_ERR(p);
  		fs_err(sdp, "can't start logd thread: %d\n", error);
@@ -1763,3 +1764,17 @@ const struct super_operations gfs2_super_ops = {
  	.show_options		= gfs2_show_options,
  };

+int gfs2_logd2(void *data)
+{
+	struct gfs2_sbd *sdp = data;
+
+	msleep(30000);
+
+	for (;;) {
+		msleep(100);
+		gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
+			       GFS2_LFC_LOGD_TEST);
+	}
+
+	return 0;
+}
diff --git a/include/uapi/linux/gfs2_ondisk.h b/include/uapi/linux/gfs2_ondisk.h
index 2dc10a034de1..ebae3a21c98e 100644
--- a/include/uapi/linux/gfs2_ondisk.h
+++ b/include/uapi/linux/gfs2_ondisk.h
@@ -431,6 +431,7 @@ struct gfs2_ea_header {
  #define GFS2_LFC_TRANS_END		0x01000000
  #define GFS2_LFC_LOGD_JFLUSH_REQD	0x02000000
  #define GFS2_LFC_LOGD_AIL_FLUSH_REQD	0x04000000
+#define GFS2_LFC_LOGD_TEST     0x08000000

  #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))


Thanks,
-- 
Ross Lagerwall




* [Cluster-devel] Assertion failure: sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks
  2019-03-27 14:40       ` Ross Lagerwall
@ 2019-03-27 17:04         ` Bob Peterson
  0 siblings, 0 replies; 6+ messages in thread
From: Bob Peterson @ 2019-03-27 17:04 UTC
  To: cluster-devel.redhat.com

Hi Ross,

----- Original Message -----
> > I think the problem may actually be a regression with patch 588bff95c94ef.
> 
> I still hit the assertion with this patch. gfs2_write_log_header() is
> unconditionally called, and it calls gfs2_log_bmap(), which changes
> sd_log_flush_head in the success path.

Okay, it looks like I really screwed up patch 588bff95c94ef. The patch
I posted yesterday is grossly inadequate and that code path needs some major
fixing. I'm in the process of getting it fixed properly. Sorry I didn't
cc you on that last patch. I'll cc you on the new one when it's ready, but
it may take a couple hours to get it ready.

Regards,

Bob Peterson
Red Hat File Systems




