* [PATCH 0/2] xfs_repair: two fixes
@ 2018-10-23 4:03 Eric Sandeen
2018-10-23 4:04 ` [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check Eric Sandeen
2018-10-23 4:08 ` [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance Eric Sandeen
0 siblings, 2 replies; 7+ messages in thread
From: Eric Sandeen @ 2018-10-23 4:03 UTC (permalink / raw)
To: linux-xfs
Two fixes for bugs that came up in the past 2 days ...
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check
2018-10-23 4:03 [PATCH 0/2] xfs_repair: two fixes Eric Sandeen
@ 2018-10-23 4:04 ` Eric Sandeen
2018-10-23 4:23 ` Darrick J. Wong
2018-10-23 4:08 ` [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance Eric Sandeen
1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2018-10-23 4:04 UTC (permalink / raw)
To: linux-xfs
If we need to realloc the bplist[] array holding buffers for a given
directory, we don't initialize the new slots. This causes a problem
if the directory has holes, because those slots never get filled in.
At the end of the function we call libxfs_putbuf for every non-null
slot, and any uninitialized slots are segfault landmines.
Make sure we initialize all new slots to NULL for this reason.
Reported-by: Oleg Davydov <burunduk3@gmail.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
diff --git a/repair/phase6.c b/repair/phase6.c
index b87c751..9d24a4f 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -2348,6 +2348,8 @@ longform_dir2_entry_check(xfs_mount_t *mp,
db = xfs_dir2_da_to_db(mp->m_dir_geo, da_bno);
if (db >= num_bps) {
+ int last_size = num_bps;
+
/* more data blocks than expected */
num_bps = db + 1;
bplist = realloc(bplist, num_bps * sizeof(struct xfs_buf*));
@@ -2355,6 +2357,9 @@ longform_dir2_entry_check(xfs_mount_t *mp,
do_error(_("realloc failed in %s (%zu bytes)\n"),
__func__,
num_bps * sizeof(struct xfs_buf*));
+ /* Initialize the new elements */
+ for (i = last_size; i < num_bps; i++)
+ bplist[i] = NULL;
}
if (isblock)
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance
2018-10-23 4:03 [PATCH 0/2] xfs_repair: two fixes Eric Sandeen
2018-10-23 4:04 ` [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check Eric Sandeen
@ 2018-10-23 4:08 ` Eric Sandeen
2018-10-23 4:22 ` Darrick J. Wong
2018-10-23 13:57 ` [PATCH 2/2 V2] " Eric Sandeen
1 sibling, 2 replies; 7+ messages in thread
From: Eric Sandeen @ 2018-10-23 4:08 UTC (permalink / raw)
To: linux-xfs
After commit:
15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi
xfs_bunmapi can legitimately return before all work is done.
Sadly nobody told xfs_repair, so it fires an assert:
phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
Fix this by calling back in until all work is done, as we do
in the kernel.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..b87c751 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
xfs_fileoff_t lastblock;
xfs_inode_t pip;
dir_hash_ent_t *p;
- int done;
+ int done = 0;
/*
* trash directory completely and rebuild from scratch using the
@@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
error);
/* free all data, leaf, node and freespace blocks */
- error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
- &done);
- if (error) {
- do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
- goto out_bmap_cancel;
- }
+ while (!done) {
+ error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+ 0, &done);
+ if (error) {
+ do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+ goto out_bmap_cancel;
+ }
+ error = xfs_defer_finish(&tp);
+ if (error) {
+ do_warn(("defer_finish failed -- error - %d\n"), error);
+ goto out_bmap_cancel;
+ }
+ /*
+ * Close out trans and start the next one in the chain.
+ */
+ error = xfs_trans_roll_inode(&tp, ip);
+ if (error)
+ goto out_bmap_cancel;
+ }
ASSERT(done);
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance
2018-10-23 4:08 ` [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance Eric Sandeen
@ 2018-10-23 4:22 ` Darrick J. Wong
2018-10-23 13:57 ` [PATCH 2/2 V2] " Eric Sandeen
1 sibling, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-10-23 4:22 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
On Mon, Oct 22, 2018 at 11:08:49PM -0500, Eric Sandeen wrote:
> After commit:
>
> 15a8bcc xfs: fix multi-AG deadlock in xfs_bunmapi
>
> xfs_bunmapi can legitimately return before all work is done.
> Sadly nobody told xfs_repair, so it fires an assert:
>
> phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
>
> Fix this by calling back in until all work is done, as we do
> in the kernel.
Looking at the rest of xfsprogs, I think the other directory-related
xfs_bunmapi callers probably need to be able to roll-and-continue, but
that seems like a topic for (a) the kernel and (b) separate patches.
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
> Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> diff --git a/repair/phase6.c b/repair/phase6.c
> index e017326..b87c751 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
> xfs_fileoff_t lastblock;
> xfs_inode_t pip;
> dir_hash_ent_t *p;
> - int done;
> + int done = 0;
>
> /*
> * trash directory completely and rebuild from scratch using the
> @@ -1352,12 +1352,25 @@ longform_dir2_rebuild(
> error);
>
> /* free all data, leaf, node and freespace blocks */
> - error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
> - &done);
> - if (error) {
> - do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> - goto out_bmap_cancel;
> - }
> + while (!done) {
> + error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
> + 0, &done);
> + if (error) {
> + do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> + goto out_bmap_cancel;
> + }
> + error = xfs_defer_finish(&tp);
error = -libxfs_defer_finish(...);
> + if (error) {
> + do_warn(("defer_finish failed -- error - %d\n"), error);
> + goto out_bmap_cancel;
> + }
> + /*
> + * Close out trans and start the next one in the chain.
> + */
> + error = xfs_trans_roll_inode(&tp, ip);
error = -libxfs_trans_roll_inode(...);
> + if (error)
> + goto out_bmap_cancel;
> + }
>
> ASSERT(done);
This assert can go away since !done is the loop test condition.
--D
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check
2018-10-23 4:04 ` [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check Eric Sandeen
@ 2018-10-23 4:23 ` Darrick J. Wong
0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-10-23 4:23 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs
On Mon, Oct 22, 2018 at 11:04:31PM -0500, Eric Sandeen wrote:
> If we need to realloc the bplist[] array holding buffers for a given
> directory, we don't initialize the new slots. This causes a problem
> if the directory has holes, because those slots never get filled in.
>
> At the end of the function we call libxfs_putbuf for every non-null
> slot, and any uninitialized slots are segfault landmines.
>
> Make sure we initialize all new slots to NULL for this reason.
>
> Reported-by: Oleg Davydov <burunduk3@gmail.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Yay realloc :P
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
>
> diff --git a/repair/phase6.c b/repair/phase6.c
> index b87c751..9d24a4f 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -2348,6 +2348,8 @@ longform_dir2_entry_check(xfs_mount_t *mp,
>
> db = xfs_dir2_da_to_db(mp->m_dir_geo, da_bno);
> if (db >= num_bps) {
> + int last_size = num_bps;
> +
> /* more data blocks than expected */
> num_bps = db + 1;
> bplist = realloc(bplist, num_bps * sizeof(struct xfs_buf*));
> @@ -2355,6 +2357,9 @@ longform_dir2_entry_check(xfs_mount_t *mp,
> do_error(_("realloc failed in %s (%zu bytes)\n"),
> __func__,
> num_bps * sizeof(struct xfs_buf*));
> + /* Initialize the new elements */
> + for (i = last_size; i < num_bps; i++)
> + bplist[i] = NULL;
> }
>
> if (isblock)
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2 V2] xfs_repair: continue after xfs_bunmapi deadlock avoidance
2018-10-23 4:08 ` [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance Eric Sandeen
2018-10-23 4:22 ` Darrick J. Wong
@ 2018-10-23 13:57 ` Eric Sandeen
2018-10-23 15:30 ` Darrick J. Wong
1 sibling, 1 reply; 7+ messages in thread
From: Eric Sandeen @ 2018-10-23 13:57 UTC (permalink / raw)
To: linux-xfs; +Cc: Tomasz Torcz
xfs_bunmapi can legitimately return before all work is done, to
avoid deadlocks across AGs.
Sadly nobody told xfs_repair, so it fires an assert if this happens:
phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
Fix this by calling back in until all work is done, as we do
in the kernel.
Fixes: 5a8bcc ("xfs: fix multi-AG deadlock in xfs_bunmapi")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---
V2: libxfs-ify the code, remove now-pointless assert, thanks Darrick!
diff --git a/repair/phase6.c b/repair/phase6.c
index e017326..bdbbbaa 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
xfs_fileoff_t lastblock;
xfs_inode_t pip;
dir_hash_ent_t *p;
- int done;
+ int done = 0;
/*
* trash directory completely and rebuild from scratch using the
@@ -1352,14 +1352,25 @@ longform_dir2_rebuild(
error);
/* free all data, leaf, node and freespace blocks */
- error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
- &done);
- if (error) {
- do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
- goto out_bmap_cancel;
- }
-
- ASSERT(done);
+ while (!done) {
+ error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
+ 0, &done);
+ if (error) {
+ do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+ goto out_bmap_cancel;
+ }
+ error = -libxfs_defer_finish(&tp);
+ if (error) {
+ do_warn(("defer_finish failed -- error - %d\n"), error);
+ goto out_bmap_cancel;
+ }
+ /*
+ * Close out trans and start the next one in the chain.
+ */
+ error = -libxfs_trans_roll_inode(&tp, ip);
+ if (error)
+ goto out_bmap_cancel;
+ }
error = -libxfs_dir_init(tp, ip, &pip);
if (error) {
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2 V2] xfs_repair: continue after xfs_bunmapi deadlock avoidance
2018-10-23 13:57 ` [PATCH 2/2 V2] " Eric Sandeen
@ 2018-10-23 15:30 ` Darrick J. Wong
0 siblings, 0 replies; 7+ messages in thread
From: Darrick J. Wong @ 2018-10-23 15:30 UTC (permalink / raw)
To: Eric Sandeen; +Cc: linux-xfs, Tomasz Torcz
On Tue, Oct 23, 2018 at 08:57:03AM -0500, Eric Sandeen wrote:
> xfs_bunmapi can legitimately return before all work is done, to
> avoid deadlocks across AGs.
>
> Sadly nobody told xfs_repair, so it fires an assert if this happens:
>
> phase6.c:1410: longform_dir2_rebuild: Assertion `done' failed.
>
> Fix this by calling back in until all work is done, as we do
> in the kernel.
>
> Fixes: 5a8bcc ("xfs: fix multi-AG deadlock in xfs_bunmapi")
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1641116
> Reported-by: Tomasz Torcz <tomek@pipebreaker.pl>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
>
> V2: libxfs-ify the code, remove now-pointless assert, thanks Darrick!
>
>
> diff --git a/repair/phase6.c b/repair/phase6.c
> index e017326..bdbbbaa 100644
> --- a/repair/phase6.c
> +++ b/repair/phase6.c
> @@ -1317,7 +1317,7 @@ longform_dir2_rebuild(
> xfs_fileoff_t lastblock;
> xfs_inode_t pip;
> dir_hash_ent_t *p;
> - int done;
> + int done = 0;
>
> /*
> * trash directory completely and rebuild from scratch using the
> @@ -1352,14 +1352,25 @@ longform_dir2_rebuild(
> error);
>
> /* free all data, leaf, node and freespace blocks */
> - error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA, 0,
> - &done);
> - if (error) {
> - do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> - goto out_bmap_cancel;
> - }
> -
> - ASSERT(done);
> + while (!done) {
> + error = -libxfs_bunmapi(tp, ip, 0, lastblock, XFS_BMAPI_METADATA,
> + 0, &done);
> + if (error) {
> + do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
> + goto out_bmap_cancel;
> + }
> + error = -libxfs_defer_finish(&tp);
> + if (error) {
> + do_warn(("defer_finish failed -- error - %d\n"), error);
> + goto out_bmap_cancel;
> + }
> + /*
> + * Close out trans and start the next one in the chain.
> + */
> + error = -libxfs_trans_roll_inode(&tp, ip);
> + if (error)
No do_warn() here? :)
With that,
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> + goto out_bmap_cancel;
> + }
>
> error = -libxfs_dir_init(tp, ip, &pip);
> if (error) {
>
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-10-23 23:54 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-23 4:03 [PATCH 0/2] xfs_repair: two fixes Eric Sandeen
2018-10-23 4:04 ` [PATCH 1/2] xfs_repair: initialize realloced bplist in longform_dir2_entry_check Eric Sandeen
2018-10-23 4:23 ` Darrick J. Wong
2018-10-23 4:08 ` [PATCH 2/2] xfs_repair: continue after xfs_bunmapi deadlock avoidance Eric Sandeen
2018-10-23 4:22 ` Darrick J. Wong
2018-10-23 13:57 ` [PATCH 2/2 V2] " Eric Sandeen
2018-10-23 15:30 ` Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).