* extfree log recovery and owner (rmapbt) updates @ 2017-12-05 18:55 Brian Foster 2017-12-05 23:32 ` Darrick J. Wong ` (2 more replies) 0 siblings, 3 replies; 9+ messages in thread From: Brian Foster @ 2017-12-05 18:55 UTC (permalink / raw) To: linux-xfs Hi, I've been playing around with deferring AGFL block frees and noticed something funky going on with xfs_bmap_add_free() and owner info. We pass the extent owner info to xfs_bmap_add_free() in various places to transfer this info to the deferred free processing context. Subsequently, xfs_trans_free_extent() -> ... -> xfs_free_ag_extent() uses the owner info to remove the rmapbt entry associated with the extent. The rmapbt update occurs at the time the extent is freed. If the system crashes immediately after the initial transaction commits, however, EFI recovery calls xfs_trans_free_extent() with an "unknown" owner. I see a reference to using a "wildcard" in this context in some of the old rmapbt commits, but it looks like this codepath skips the rmapbt update altogether if oi_owner == XFS_RMAP_OWN_UNKNOWN. A quick test that shuts down the fs immediately after a transaction commits that logs an EFI for a freed inode chunk leaves the fs inconsistent after log recovery, with a bit of a cryptic message from repair: unknown block state, ag 0, block 24 It does look like the associated rmapbt entry still exists in the tree even though the extent has been freed (from xfs_db -c fsmap): 5: 0/24 len 8 owner -7 offset 0 bmbt 0 attrfork 0 extflag 0 So what is supposed to be going on here? Should the rmapbt entry be removed unconditionally during recovery, or should a separate rmapbt update item be deferred (as in the bunmapi case, for example) rather than passing oinfo along with the TYPE_FREE deferred item? Or something else entirely..? Brian ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: extfree log recovery and owner (rmapbt) updates 2017-12-05 18:55 extfree log recovery and owner (rmapbt) updates Brian Foster @ 2017-12-05 23:32 ` Darrick J. Wong 2017-12-05 23:34 ` [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests Darrick J. Wong 2017-12-05 23:49 ` extfree log recovery and owner (rmapbt) updates Dave Chinner 2 siblings, 0 replies; 9+ messages in thread From: Darrick J. Wong @ 2017-12-05 23:32 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Tue, Dec 05, 2017 at 01:55:24PM -0500, Brian Foster wrote: > Hi, > > I've been playing around with deferring AGFL block frees and noticed > something funky going on with xfs_bmap_add_free() and owner info. We > pass the extent owner info to xfs_bmap_add_free() in various places to > transfer this info to the deferred free processing context. > Subsequently, xfs_trans_free_extent() -> ... -> xfs_free_ag_extent() > uses the owner info to remove the rmapbt entry associated with the > extent. The rmapbt update occurs at the time the extent is freed. > > If the system crashes immediately after the initial transaction commits, > however, EFI recovery calls xfs_trans_free_extent() with an "unknown" > owner. I see a reference to using a "wildcard" in this context in some > of the old rmapbt commits, but it looks like this codepath skips the > rmapbt update altogether if oi_owner == XFS_RMAP_OWN_UNKNOWN. A quick > test that shuts down the fs immediately after a transaction commits that > logs an EFI for a freed inode chunk leaves the fs inconsistent after log > recovery, with a bit of a cryptic message from repair: > > unknown block state, ag 0, block 24 > > It does look like the associated rmapbt entry still exists in the tree > even though the extent has been freed (from xfs_db -c fsmap): > > 5: 0/24 len 8 owner -7 offset 0 bmbt 0 attrfork 0 extflag 0 > > So what is supposed to be going on here? Should the rmapbt entry be > removed unconditionally during recovery, or should a separate rmapbt > update item be deferred (as in the bunmapi case, for example) rather > than passing oinfo along with the TYPE_FREE deferred item? Or something > else entirely..? Ooops, this whole thing is kind of a mess... so right now we have two mechanisms to modify an rmap -- xfs_rmap_{map,unmap,alloc,free}_extent, which uses the deferred ops mechanism; or xfs_rmap_{alloc,free}, which makes the change directly in the current transaction. The first is used in the bmapping code for file blocks, and the second are used everywhere else, particularly if we aren't set up for a transaction roll, like when we're doing non-bmapi allocations. So the simple fix to this is that we fix xfs_free_extent and xfs_rmap_unmap so that OWN_UNKNOWN truly is a wildcard "remove rmap, don't care who owns it" operation. So long as the wildcard operation comes after any other attempt to remove the rmap, this will be fine. Note that xfs_bmap_add_free passes owner info into the in-core EFI, which means that freeing an extent can also remove the rmap. For operations where we explicitly free the rmap via deferred ops (bmapi and cow) we don't want this, so we need to be able to pass a NULL owner info to xfs_bmap_add_free. The EFI can record this as an OWN_NULL owner and skip freeing the rmap, though this is potentially fraught because log recovery won't know this and will try the wildcard rmap deletion anyway, exposing another bug. When we're cleaning up CoW extents, we create a CUI to remove the CoW extent from the refcountbt and then create an EFI to free the extent. The CUI finish routine queues an RUI to delete the rmap, which is then placed after the EFI. CUI -> EFI -> RUI, which is the wrong order. This causes a double-free of the rmap, first with the wildcard and again with OWN_COW, and the OWN_COW blows asserts. So, that piece has to be rearranged to log CUI -> RUI -> EFI. I'll send an RFC patch as a reply to this one... --- The other solution, of course, is to separate rmap and allocation completely -- every allocation requires a dfops structure, and the deferred rmaps must be queued directly by whomever calls the allocator. This has the disadvantage that now everything in the allocation path is a deferred op, whereas right now we can pack some of those into a single transaction. --D > Brian > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-05 18:55 extfree log recovery and owner (rmapbt) updates Brian Foster 2017-12-05 23:32 ` Darrick J. Wong @ 2017-12-05 23:34 ` Darrick J. Wong 2017-12-06 14:14 ` Brian Foster 2017-12-05 23:49 ` extfree log recovery and owner (rmapbt) updates Dave Chinner 2 siblings, 1 reply; 9+ messages in thread From: Darrick J. Wong @ 2017-12-05 23:34 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs From: Darrick J. Wong <darrick.wong@oracle.com> Calling xfs_rmap_free with an unknown owner is supposed to remove any rmaps covering that range regardless of owner. This is used by the EFI recovery code to say "we're freeing this, it mustn't be owned by anything anymore", but for whatever reason xfs_free_ag_extent filters them out. Therefore, remove the filter and make xfs_rmap_unmap actually treat it as a wildcard owner -- free anything that's already there, and if there's no owner at all then that's fine too. There are two existing callers of bmap_add_free that take care the rmap deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap cleanup; convert these to use OWN_NULL, and ensure that the RUI gets added to the defer ops ahead of any EFI. Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, growfs will have to consult directly with the rmap to ensure that there aren't any rmaps in the grown region. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> --- fs/xfs/libxfs/xfs_alloc.c | 2 +- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_refcount.c | 52 +++++++++++++++--------------------------- fs/xfs/libxfs/xfs_rmap.c | 15 +++++++++--- fs/xfs/xfs_fsops.c | 5 ++++ 5 files changed, 37 insertions(+), 39 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index a840028..0f260eeb 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -1696,7 +1696,7 @@ xfs_free_ag_extent( bno_cur = cnt_cur = NULL; mp = tp->t_mountp; - if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN) { + if (oinfo->oi_owner != XFS_RMAP_OWN_NULL) { error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo); if (error) goto error0; diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 16df627..89bb3d9 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -573,7 +573,7 @@ xfs_bmap_add_free( if (oinfo) new->xefi_oinfo = *oinfo; else - xfs_rmap_skip_owner_update(&new->xefi_oinfo); + xfs_rmap_ag_owner(&new->xefi_oinfo, XFS_RMAP_OWN_NULL); trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0, XFS_FSB_TO_AGBNO(mp, bno), len); xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 73f8058..9103be0 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1505,27 +1505,12 @@ __xfs_refcount_cow_alloc( xfs_extlen_t aglen, struct xfs_defer_ops *dfops) { - int error; - trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno, agbno, aglen); /* Add refcount btree reservation */ - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, + return xfs_refcount_adjust_cow(rcur, agbno, aglen, XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops); - if (error) - return error; - - /* Add rmap entry */ - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { - error = xfs_rmap_alloc_extent(rcur->bc_mp, dfops, - rcur->bc_private.a.agno, - agbno, aglen, XFS_RMAP_OWN_COW); - if (error) - return error; - } - - return error; } /* @@ -1538,27 +1523,12 @@ __xfs_refcount_cow_free( xfs_extlen_t aglen, struct xfs_defer_ops *dfops) { - int error; - trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno, agbno, aglen); /* Remove refcount btree reservation */ - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, + return xfs_refcount_adjust_cow(rcur, agbno, aglen, XFS_REFCOUNT_ADJUST_COW_FREE, dfops); - if (error) - return error; - - /* Remove rmap entry */ - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { - error = xfs_rmap_free_extent(rcur->bc_mp, dfops, - rcur->bc_private.a.agno, - agbno, aglen, XFS_RMAP_OWN_COW); - if (error) - return error; - } - - return error; } /* Record a CoW staging extent in the refcount btree. */ @@ -1569,11 +1539,19 @@ xfs_refcount_alloc_cow_extent( xfs_fsblock_t fsb, xfs_extlen_t len) { + int error; + if (!xfs_sb_version_hasreflink(&mp->m_sb)) return 0; - return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, + error = __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, fsb, len); + if (error) + return error; + + /* Add rmap entry */ + return xfs_rmap_alloc_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. */ @@ -1584,9 +1562,17 @@ xfs_refcount_free_cow_extent( xfs_fsblock_t fsb, xfs_extlen_t len) { + int error; + if (!xfs_sb_version_hasreflink(&mp->m_sb)) return 0; + /* Remove rmap entry */ + error = xfs_rmap_free_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); + if (error) + return error; + return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 5f3a3d9..fd0e630 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -484,10 +484,17 @@ xfs_rmap_unmap( XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); - /* Make sure the extent we found covers the entire freeing range. */ - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && - ltrec.rm_startblock + ltrec.rm_blockcount >= - bno + len, out_error); + /* + * Make sure the extent we found covers the entire freeing range. + * If this is a wildcard free, we're already done, otherwise there's + * something wrong with the rmapbt. + */ + if (ltrec.rm_startblock > bno || + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { + if (owner == XFS_RMAP_OWN_UNKNOWN) + goto out_done; + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); + } /* Make sure the owner matches what we expect to find in the tree. */ XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 8f22fc5..60a2e12 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -571,6 +571,11 @@ xfs_growfs_data_private( * this doesn't actually exist in the rmap btree. */ xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); + error = xfs_rmap_free(tp, bp, agno, + be32_to_cpu(agf->agf_length) - new, + new, &oinfo); + if (error) + goto error0; error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno, be32_to_cpu(agf->agf_length) - new), ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-05 23:34 ` [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests Darrick J. Wong @ 2017-12-06 14:14 ` Brian Foster 2017-12-06 17:53 ` Darrick J. Wong 0 siblings, 1 reply; 9+ messages in thread From: Brian Foster @ 2017-12-06 14:14 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@oracle.com> > > Calling xfs_rmap_free with an unknown owner is supposed to remove any > rmaps covering that range regardless of owner. This is used by the EFI > recovery code to say "we're freeing this, it mustn't be owned by > anything anymore", but for whatever reason xfs_free_ag_extent filters > them out. > > Therefore, remove the filter and make xfs_rmap_unmap actually treat it > as a wildcard owner -- free anything that's already there, and if > there's no owner at all then that's fine too. > > There are two existing callers of bmap_add_free that take care the rmap > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets > added to the defer ops ahead of any EFI. > > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, > growfs will have to consult directly with the rmap to ensure that there > aren't any rmaps in the grown region. > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > --- Thanks... this resolves the log recovery problem on a quick test. > fs/xfs/libxfs/xfs_alloc.c | 2 +- > fs/xfs/libxfs/xfs_bmap.c | 2 +- > fs/xfs/libxfs/xfs_refcount.c | 52 +++++++++++++++--------------------------- > fs/xfs/libxfs/xfs_rmap.c | 15 +++++++++--- > fs/xfs/xfs_fsops.c | 5 ++++ > 5 files changed, 37 insertions(+), 39 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index a840028..0f260eeb 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -1696,7 +1696,7 @@ xfs_free_ag_extent( > bno_cur = cnt_cur = NULL; > mp = tp->t_mountp; > > - if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN) { > + if (oinfo->oi_owner != XFS_RMAP_OWN_NULL) { > error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo); > if (error) > goto error0; > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > index 16df627..89bb3d9 100644 > --- a/fs/xfs/libxfs/xfs_bmap.c > +++ b/fs/xfs/libxfs/xfs_bmap.c > @@ -573,7 +573,7 @@ xfs_bmap_add_free( > if (oinfo) > new->xefi_oinfo = *oinfo; > else > - xfs_rmap_skip_owner_update(&new->xefi_oinfo); > + xfs_rmap_ag_owner(&new->xefi_oinfo, XFS_RMAP_OWN_NULL); So what is the difference now between xfs_rmap_skip_owner_update(), which sets OWN_UNKNOWN, and OWN_NULL, which skips owner updates in certain cases? Should we be using OWN_NULL consistently to skip owner updates (not that UNKNOWN makes much sense in some of the other cases, like allocation). > trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0, > XFS_FSB_TO_AGBNO(mp, bno), len); > xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c > index 73f8058..9103be0 100644 > --- a/fs/xfs/libxfs/xfs_refcount.c > +++ b/fs/xfs/libxfs/xfs_refcount.c > @@ -1505,27 +1505,12 @@ __xfs_refcount_cow_alloc( > xfs_extlen_t aglen, > struct xfs_defer_ops *dfops) > { > - int error; > - > trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno, > agbno, aglen); > > /* Add refcount btree reservation */ > - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, > + return xfs_refcount_adjust_cow(rcur, agbno, aglen, > XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops); > - if (error) > - return error; > - > - /* Add rmap entry */ > - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { > - error = xfs_rmap_alloc_extent(rcur->bc_mp, dfops, > - rcur->bc_private.a.agno, > - agbno, aglen, XFS_RMAP_OWN_COW); > - if (error) > - return error; > - } > - > - return error; > } I think the refcount fixup probably warrants an independent patch with a more detailed commit log around the ordering requirement and how this changes behavior. > > /* > @@ -1538,27 +1523,12 @@ __xfs_refcount_cow_free( > xfs_extlen_t aglen, > struct xfs_defer_ops *dfops) > { > - int error; > - > trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno, > agbno, aglen); > > /* Remove refcount btree reservation */ > - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, > + return xfs_refcount_adjust_cow(rcur, agbno, aglen, > XFS_REFCOUNT_ADJUST_COW_FREE, dfops); xfs_refcount_finish_one() -> xfs_refcount_cow_[alloc|free]() -> xfs_refcount_adjust_cow() -> ... Hmm, seems like there's opportunity for more cleanup here. Do we really need separate xfs_refcount_cow_*() functions just for tracepoints? Seems like we could just fold these into xfs_refcount_finish_one(). > - if (error) > - return error; > - > - /* Remove rmap entry */ > - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { > - error = xfs_rmap_free_extent(rcur->bc_mp, dfops, > - rcur->bc_private.a.agno, > - agbno, aglen, XFS_RMAP_OWN_COW); > - if (error) > - return error; > - } > - > - return error; > } > > /* Record a CoW staging extent in the refcount btree. */ > @@ -1569,11 +1539,19 @@ xfs_refcount_alloc_cow_extent( > xfs_fsblock_t fsb, > xfs_extlen_t len) > { > + int error; > + > if (!xfs_sb_version_hasreflink(&mp->m_sb)) > return 0; > > - return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, > + error = __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, > fsb, len); > + if (error) > + return error; > + > + /* Add rmap entry */ > + return xfs_rmap_alloc_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), > + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); > } > > /* Forget a CoW staging event in the refcount btree. */ > @@ -1584,9 +1562,17 @@ xfs_refcount_free_cow_extent( > xfs_fsblock_t fsb, > xfs_extlen_t len) > { > + int error; > + > if (!xfs_sb_version_hasreflink(&mp->m_sb)) > return 0; > > + /* Remove rmap entry */ > + error = xfs_rmap_free_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), > + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); > + if (error) > + return error; > + > return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_FREE_COW, > fsb, len); > } > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > index 5f3a3d9..fd0e630 100644 > --- a/fs/xfs/libxfs/xfs_rmap.c > +++ b/fs/xfs/libxfs/xfs_rmap.c > @@ -484,10 +484,17 @@ xfs_rmap_unmap( > XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == > (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); > > - /* Make sure the extent we found covers the entire freeing range. */ > - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > - ltrec.rm_startblock + ltrec.rm_blockcount >= > - bno + len, out_error); > + /* > + * Make sure the extent we found covers the entire freeing range. > + * If this is a wildcard free, we're already done, otherwise there's > + * something wrong with the rmapbt. > + */ What does this mean by "we're already done?" This logic appears to mean that we don't do anything (as opposed to throwing an error). I think the comment would be more clear if it pointed out that/why we have nothing to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and we essentially didn't find a match..? > + if (ltrec.rm_startblock > bno || > + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { > + if (owner == XFS_RMAP_OWN_UNKNOWN) > + goto out_done; > + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); > + } > Also... unrelated, but is this check immediately below really intending to ignore owner inconsistencies for all !inode owners? > /* Make sure the owner matches what we expect to find in the tree. */ > XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > index 8f22fc5..60a2e12 100644 > --- a/fs/xfs/xfs_fsops.c > +++ b/fs/xfs/xfs_fsops.c > @@ -571,6 +571,11 @@ xfs_growfs_data_private( > * this doesn't actually exist in the rmap btree. > */ > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > + error = xfs_rmap_free(tp, bp, agno, > + be32_to_cpu(agf->agf_length) - new, > + new, &oinfo); > + if (error) > + goto error0; OWN_NULL makes sense from the perspective of needing to avoid some error down in the free code where we need to free some space without needing to remove an owner, but what is the purpose of the above? It doesn't look like this really does anything beyond checking that the associated space is beyond the end of the rmapbt. If that's the intent, then it probably makes sense to update this comment as well. Brian > error = xfs_free_extent(tp, > XFS_AGB_TO_FSB(mp, agno, > be32_to_cpu(agf->agf_length) - new), > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-06 14:14 ` Brian Foster @ 2017-12-06 17:53 ` Darrick J. Wong 2017-12-06 20:49 ` Brian Foster 0 siblings, 1 reply; 9+ messages in thread From: Darrick J. Wong @ 2017-12-06 17:53 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Wed, Dec 06, 2017 at 09:14:07AM -0500, Brian Foster wrote: > On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > Calling xfs_rmap_free with an unknown owner is supposed to remove any > > rmaps covering that range regardless of owner. This is used by the EFI > > recovery code to say "we're freeing this, it mustn't be owned by > > anything anymore", but for whatever reason xfs_free_ag_extent filters > > them out. > > > > Therefore, remove the filter and make xfs_rmap_unmap actually treat it > > as a wildcard owner -- free anything that's already there, and if > > there's no owner at all then that's fine too. > > > > There are two existing callers of bmap_add_free that take care the rmap > > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap > > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets > > added to the defer ops ahead of any EFI. > > > > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, > > growfs will have to consult directly with the rmap to ensure that there > > aren't any rmaps in the grown region. > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > --- > > Thanks... this resolves the log recovery problem on a quick test. > > > fs/xfs/libxfs/xfs_alloc.c | 2 +- > > fs/xfs/libxfs/xfs_bmap.c | 2 +- > > fs/xfs/libxfs/xfs_refcount.c | 52 +++++++++++++++--------------------------- > > fs/xfs/libxfs/xfs_rmap.c | 15 +++++++++--- > > fs/xfs/xfs_fsops.c | 5 ++++ > > 5 files changed, 37 insertions(+), 39 deletions(-) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index a840028..0f260eeb 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -1696,7 +1696,7 @@ xfs_free_ag_extent( > > bno_cur = cnt_cur = NULL; > > mp = tp->t_mountp; > > > > - if (oinfo->oi_owner != XFS_RMAP_OWN_UNKNOWN) { > > + if (oinfo->oi_owner != XFS_RMAP_OWN_NULL) { > > error = xfs_rmap_free(tp, agbp, agno, bno, len, oinfo); > > if (error) > > goto error0; > > diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c > > index 16df627..89bb3d9 100644 > > --- a/fs/xfs/libxfs/xfs_bmap.c > > +++ b/fs/xfs/libxfs/xfs_bmap.c > > @@ -573,7 +573,7 @@ xfs_bmap_add_free( > > if (oinfo) > > new->xefi_oinfo = *oinfo; > > else > > - xfs_rmap_skip_owner_update(&new->xefi_oinfo); > > + xfs_rmap_ag_owner(&new->xefi_oinfo, XFS_RMAP_OWN_NULL); > > So what is the difference now between xfs_rmap_skip_owner_update(), > which sets OWN_UNKNOWN, and OWN_NULL, which skips owner updates in > certain cases? Should we be using OWN_NULL consistently to skip owner > updates (not that UNKNOWN makes much sense in some of the other cases, > like allocation). Yeah, there's a bunch of cleanups that I was intending to do (most of which you've caught below) prior to making a non-RFC submission. > > trace_xfs_bmap_free_defer(mp, XFS_FSB_TO_AGNO(mp, bno), 0, > > XFS_FSB_TO_AGBNO(mp, bno), len); > > xfs_defer_add(dfops, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); > > diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c > > index 73f8058..9103be0 100644 > > --- a/fs/xfs/libxfs/xfs_refcount.c > > +++ b/fs/xfs/libxfs/xfs_refcount.c > > @@ -1505,27 +1505,12 @@ __xfs_refcount_cow_alloc( > > xfs_extlen_t aglen, > > struct xfs_defer_ops *dfops) > > { > > - int error; > > - > > trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_private.a.agno, > > agbno, aglen); > > > > /* Add refcount btree reservation */ > > - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, > > + return xfs_refcount_adjust_cow(rcur, agbno, aglen, > > XFS_REFCOUNT_ADJUST_COW_ALLOC, dfops); > > - if (error) > > - return error; > > - > > - /* Add rmap entry */ > > - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { > > - error = xfs_rmap_alloc_extent(rcur->bc_mp, dfops, > > - rcur->bc_private.a.agno, > > - agbno, aglen, XFS_RMAP_OWN_COW); > > - if (error) > > - return error; > > - } > > - > > - return error; > > } > > I think the refcount fixup probably warrants an independent patch with a > more detailed commit log around the ordering requirement and how this > changes behavior. Yep. > > > > /* > > @@ -1538,27 +1523,12 @@ __xfs_refcount_cow_free( > > xfs_extlen_t aglen, > > struct xfs_defer_ops *dfops) > > { > > - int error; > > - > > trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_private.a.agno, > > agbno, aglen); > > > > /* Remove refcount btree reservation */ > > - error = xfs_refcount_adjust_cow(rcur, agbno, aglen, > > + return xfs_refcount_adjust_cow(rcur, agbno, aglen, > > XFS_REFCOUNT_ADJUST_COW_FREE, dfops); > > xfs_refcount_finish_one() -> xfs_refcount_cow_[alloc|free]() -> > xfs_refcount_adjust_cow() -> ... > > Hmm, seems like there's opportunity for more cleanup here. Do we really > need separate xfs_refcount_cow_*() functions just for tracepoints? Seems > like we could just fold these into xfs_refcount_finish_one(). Yep. > > - if (error) > > - return error; > > - > > - /* Remove rmap entry */ > > - if (xfs_sb_version_hasrmapbt(&rcur->bc_mp->m_sb)) { > > - error = xfs_rmap_free_extent(rcur->bc_mp, dfops, > > - rcur->bc_private.a.agno, > > - agbno, aglen, XFS_RMAP_OWN_COW); > > - if (error) > > - return error; > > - } > > - > > - return error; > > } > > > > /* Record a CoW staging extent in the refcount btree. */ > > @@ -1569,11 +1539,19 @@ xfs_refcount_alloc_cow_extent( > > xfs_fsblock_t fsb, > > xfs_extlen_t len) > > { > > + int error; > > + > > if (!xfs_sb_version_hasreflink(&mp->m_sb)) > > return 0; > > > > - return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, > > + error = __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_ALLOC_COW, > > fsb, len); > > + if (error) > > + return error; > > + > > + /* Add rmap entry */ > > + return xfs_rmap_alloc_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), > > + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); > > } > > > > /* Forget a CoW staging event in the refcount btree. */ > > @@ -1584,9 +1562,17 @@ xfs_refcount_free_cow_extent( > > xfs_fsblock_t fsb, > > xfs_extlen_t len) > > { > > + int error; > > + > > if (!xfs_sb_version_hasreflink(&mp->m_sb)) > > return 0; > > > > + /* Remove rmap entry */ > > + error = xfs_rmap_free_extent(mp, dfops, XFS_FSB_TO_AGNO(mp, fsb), > > + XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); > > + if (error) > > + return error; > > + > > return __xfs_refcount_add(mp, dfops, XFS_REFCOUNT_FREE_COW, > > fsb, len); > > } > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > > index 5f3a3d9..fd0e630 100644 > > --- a/fs/xfs/libxfs/xfs_rmap.c > > +++ b/fs/xfs/libxfs/xfs_rmap.c > > @@ -484,10 +484,17 @@ xfs_rmap_unmap( > > XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == > > (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); > > > > - /* Make sure the extent we found covers the entire freeing range. */ > > - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > > - ltrec.rm_startblock + ltrec.rm_blockcount >= > > - bno + len, out_error); > > + /* > > + * Make sure the extent we found covers the entire freeing range. > > + * If this is a wildcard free, we're already done, otherwise there's > > + * something wrong with the rmapbt. > > + */ > > What does this mean by "we're already done?" This logic appears to mean > that we don't do anything (as opposed to throwing an error). I think the > comment would be more clear if it pointed out that/why we have nothing > to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and > we essentially didn't find a match..? "Make sure the extent we found covers the entire freeing range. Passing in an owner of OWN_UNKNOWN means that the caller wants to remove any reverse mapping that may exist for this range of blocks regardless of owner; if there are no mappings at all, we're done." > > + if (ltrec.rm_startblock > bno || > > + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { > > + if (owner == XFS_RMAP_OWN_UNKNOWN) > > + goto out_done; > > + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); > > + } > > > > Also... unrelated, but is this check immediately below really intending > to ignore owner inconsistencies for all !inode owners? I had my eye on that one too, though I think that could be a freestanding cleanup. > > /* Make sure the owner matches what we expect to find in the tree. */ > > XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > index 8f22fc5..60a2e12 100644 > > --- a/fs/xfs/xfs_fsops.c > > +++ b/fs/xfs/xfs_fsops.c > > @@ -571,6 +571,11 @@ xfs_growfs_data_private( > > * this doesn't actually exist in the rmap btree. > > */ > > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > > + error = xfs_rmap_free(tp, bp, agno, > > + be32_to_cpu(agf->agf_length) - new, > > + new, &oinfo); > > + if (error) > > + goto error0; > > OWN_NULL makes sense from the perspective of needing to avoid some error > down in the free code where we need to free some space without needing > to remove an owner, but what is the purpose of the above? It doesn't > look like this really does anything beyond checking that the associated > space is beyond the end of the rmapbt. If that's the intent, then it > probably makes sense to update this comment as well. Yes, that's exactly the intent. Hmm, come to think of it, the rmap xref patch adds a xfs_rmap_has_record helper that does exactly what we want here (decides if there are any records covering this range). --D > Brian > > > error = xfs_free_extent(tp, > > XFS_AGB_TO_FSB(mp, agno, > > be32_to_cpu(agf->agf_length) - new), > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-06 17:53 ` Darrick J. Wong @ 2017-12-06 20:49 ` Brian Foster 2017-12-06 22:06 ` Darrick J. Wong 0 siblings, 1 reply; 9+ messages in thread From: Brian Foster @ 2017-12-06 20:49 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Wed, Dec 06, 2017 at 09:53:00AM -0800, Darrick J. Wong wrote: > On Wed, Dec 06, 2017 at 09:14:07AM -0500, Brian Foster wrote: > > On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > Calling xfs_rmap_free with an unknown owner is supposed to remove any > > > rmaps covering that range regardless of owner. This is used by the EFI > > > recovery code to say "we're freeing this, it mustn't be owned by > > > anything anymore", but for whatever reason xfs_free_ag_extent filters > > > them out. > > > > > > Therefore, remove the filter and make xfs_rmap_unmap actually treat it > > > as a wildcard owner -- free anything that's already there, and if > > > there's no owner at all then that's fine too. > > > > > > There are two existing callers of bmap_add_free that take care the rmap > > > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap > > > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets > > > added to the defer ops ahead of any EFI. > > > > > > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, > > > growfs will have to consult directly with the rmap to ensure that there > > > aren't any rmaps in the grown region. > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > --- > > ... > > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > > > index 5f3a3d9..fd0e630 100644 > > > --- a/fs/xfs/libxfs/xfs_rmap.c > > > +++ b/fs/xfs/libxfs/xfs_rmap.c > > > @@ -484,10 +484,17 @@ xfs_rmap_unmap( > > > XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == > > > (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); > > > > > > - /* Make sure the extent we found covers the entire freeing range. */ > > > - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > > > - ltrec.rm_startblock + ltrec.rm_blockcount >= > > > - bno + len, out_error); > > > + /* > > > + * Make sure the extent we found covers the entire freeing range. > > > + * If this is a wildcard free, we're already done, otherwise there's > > > + * something wrong with the rmapbt. > > > + */ > > > > What does this mean by "we're already done?" This logic appears to mean > > that we don't do anything (as opposed to throwing an error). I think the > > comment would be more clear if it pointed out that/why we have nothing > > to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and > > we essentially didn't find a match..? > > "Make sure the extent we found covers the entire freeing range. Passing > in an owner of OWN_UNKNOWN means that the caller wants to remove any > reverse mapping that may exist for this range of blocks regardless of > owner; if there are no mappings at all, we're done." > Looking at this again, I find it a bit confusing how this check seems to double as a "nothing to do" in the unknown case and a corruption error otherwise. For example, is something still technically wrong if we get an UNKNOWN unmap request (aka an extent free) to unmap a range that overlaps with but starts before or extends past the range in the rmapbt? I could easily be missing something here, but otherwise I wonder if this would be better as separate checks so we don't lose some of the error checking coverage. Brian > > > + if (ltrec.rm_startblock > bno || > > > + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { > > > + if (owner == XFS_RMAP_OWN_UNKNOWN) > > > + goto out_done; > > > + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); > > > + } > > > > > > > Also... unrelated, but is this check immediately below really intending > > to ignore owner inconsistencies for all !inode owners? > > I had my eye on that one too, though I think that could be a > freestanding cleanup. > > > > /* Make sure the owner matches what we expect to find in the tree. */ > > > XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > index 8f22fc5..60a2e12 100644 > > > --- a/fs/xfs/xfs_fsops.c > > > +++ b/fs/xfs/xfs_fsops.c > > > @@ -571,6 +571,11 @@ xfs_growfs_data_private( > > > * this doesn't actually exist in the rmap btree. > > > */ > > > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > > > + error = xfs_rmap_free(tp, bp, agno, > > > + be32_to_cpu(agf->agf_length) - new, > > > + new, &oinfo); > > > + if (error) > > > + goto error0; > > > > OWN_NULL makes sense from the perspective of needing to avoid some error > > down in the free code where we need to free some space without needing > > to remove an owner, but what is the purpose of the above? It doesn't > > look like this really does anything beyond checking that the associated > > space is beyond the end of the rmapbt. If that's the intent, then it > > probably makes sense to update this comment as well. > > Yes, that's exactly the intent. > > Hmm, come to think of it, the rmap xref patch adds a > xfs_rmap_has_record helper that does exactly what we want here (decides > if there are any records covering this range). > > --D > > > Brian > > > > > error = xfs_free_extent(tp, > > > XFS_AGB_TO_FSB(mp, agno, > > > be32_to_cpu(agf->agf_length) - new), > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-06 20:49 ` Brian Foster @ 2017-12-06 22:06 ` Darrick J. Wong 2017-12-07 13:00 ` Brian Foster 0 siblings, 1 reply; 9+ messages in thread From: Darrick J. Wong @ 2017-12-06 22:06 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Wed, Dec 06, 2017 at 03:49:46PM -0500, Brian Foster wrote: > On Wed, Dec 06, 2017 at 09:53:00AM -0800, Darrick J. Wong wrote: > > On Wed, Dec 06, 2017 at 09:14:07AM -0500, Brian Foster wrote: > > > On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > > > Calling xfs_rmap_free with an unknown owner is supposed to remove any > > > > rmaps covering that range regardless of owner. This is used by the EFI > > > > recovery code to say "we're freeing this, it mustn't be owned by > > > > anything anymore", but for whatever reason xfs_free_ag_extent filters > > > > them out. > > > > > > > > Therefore, remove the filter and make xfs_rmap_unmap actually treat it > > > > as a wildcard owner -- free anything that's already there, and if > > > > there's no owner at all then that's fine too. > > > > > > > > There are two existing callers of bmap_add_free that take care the rmap > > > > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap > > > > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets > > > > added to the defer ops ahead of any EFI. > > > > > > > > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, > > > > growfs will have to consult directly with the rmap to ensure that there > > > > aren't any rmaps in the grown region. > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > --- > > > > ... > > > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > > > > index 5f3a3d9..fd0e630 100644 > > > > --- a/fs/xfs/libxfs/xfs_rmap.c > > > > +++ b/fs/xfs/libxfs/xfs_rmap.c > > > > @@ -484,10 +484,17 @@ xfs_rmap_unmap( > > > > XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == > > > > (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); > > > > > > > > - /* Make sure the extent we found covers the entire freeing range. */ > > > > - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > > > > - ltrec.rm_startblock + ltrec.rm_blockcount >= > > > > - bno + len, out_error); > > > > + /* > > > > + * Make sure the extent we found covers the entire freeing range. > > > > + * If this is a wildcard free, we're already done, otherwise there's > > > > + * something wrong with the rmapbt. > > > > + */ > > > > > > What does this mean by "we're already done?" This logic appears to mean > > > that we don't do anything (as opposed to throwing an error). I think the > > > comment would be more clear if it pointed out that/why we have nothing > > > to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and > > > we essentially didn't find a match..? > > > > "Make sure the extent we found covers the entire freeing range. Passing > > in an owner of OWN_UNKNOWN means that the caller wants to remove any > > reverse mapping that may exist for this range of blocks regardless of > > owner; if there are no mappings at all, we're done." > > > > Looking at this again, I find it a bit confusing how this check seems to > double as a "nothing to do" in the unknown case and a corruption error > otherwise. For example, is something still technically wrong if we get > an UNKNOWN unmap request (aka an extent free) to unmap a range that > overlaps with but starts before or extends past the range in the rmapbt? I .... hmm, bear with my stream of consciousness: Start with the assumption that for each rmap free request there is exactly one rmap that overlaps with the request (X and Y are existing rmaps, R is the range we want to remove): XXXXX....YYYY RR XXXXX....YYYY RR XXXXX....YYYY RR XXXXX....YYYY RRRRR (Note that we also assume that the rmapbt never has two mergeable records. rm_blockcount is 32-bit which is large enough to cover an entire AG if need be.) On the regular logging path, we have two choices: (1) pass the owner to xfs_free_extent and let the EFI-finish remove the rmap, or (2) queue our own deferred rmap unmap and pass OWN_NULL to bmap_add_free in which case the EFI-finish will not try to remove an rmap. If we have to log-recover (1), then neither the rmap nor the free space btrees got updated, so there should be a single rmap covering the whole EFI extent. The EFI recovery will do an OWN_UNKNOWN rmap free, which will find that full rmap and remove as necessary. If we have to log-recover (2) then (assuming we queued the RUI before the EFI like we're supposed to) the RUI recovery will remove the rmapping and the EFI recovery's OWN_UNKNOWN rmap free will find nothing. So far so good? Let's see what happens if our assumption is false: XXXXX....YYYY RRRRR XXXXX....YYYY RRRRR XXXXX....YYYY RRRRRRRR Then we started with a filesystem that only had a partial rmapping for a mapped region. For a known-owner removal we'll only ever see rmap X and trigger this corruption report. This holds true even for an unknown-owner removal, so I think you're definitely onto something here. The rework of this hunk was incorrect; I should have left it alone. XXXXX....YYYY RR So this is this case where EFI recovery needs to delete an rmap and there's already nothing there, ideally because a previous RUI already did the work for us. I /think/ the proper fix is that if ltrec precedes the range being removed then we increment to check the next extent, and if there's no overlap with the right extent (or no right extent), then the rmappings were already cleared out and therefore we can just exit: /* * If we're doing an unknown-owner removal and there are no rmaps * overlapping the other end of range, we're done. */ if (owner == UNKNOWN && ltrec.rm_startblock + ltrec.rm_blockcount <= bno) { struct xfs_rmap_irec rtrec; xfs_btree_increment(rmap_cur, &has); if (!has) return 0; xfs_btree_getrec(rmap_cur, &rtrec); if (rtrec.rm_startblock > bno + len) return 0; } /* Make sure the extent we found covers the entire freeing range. */ XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && ltrec.rm_startblock + ltrec.rm_blockcount >= bno + len, out_error); Good catch! This whole thing has turned my brain into a lump of spaghetti. Well that and the three simultaneous internal bug hunts... --D > I could easily be missing something here, but otherwise I wonder if this > would be better as separate checks so we don't lose some of the error > checking coverage. > > Brian > > > > > + if (ltrec.rm_startblock > bno || > > > > + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { > > > > + if (owner == XFS_RMAP_OWN_UNKNOWN) > > > > + goto out_done; > > > > + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); > > > > + } > > > > > > > > > > Also... unrelated, but is this check immediately below really intending > > > to ignore owner inconsistencies for all !inode owners? > > > > I had my eye on that one too, though I think that could be a > > freestanding cleanup. > > > > > > /* Make sure the owner matches what we expect to find in the tree. */ > > > > XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || > > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > > index 8f22fc5..60a2e12 100644 > > > > --- a/fs/xfs/xfs_fsops.c > > > > +++ b/fs/xfs/xfs_fsops.c > > > > @@ -571,6 +571,11 @@ xfs_growfs_data_private( > > > > * this doesn't actually exist in the rmap btree. > > > > */ > > > > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > > > > + error = xfs_rmap_free(tp, bp, agno, > > > > + be32_to_cpu(agf->agf_length) - new, > > > > + new, &oinfo); > > > > + if (error) > > > > + goto error0; > > > > > > OWN_NULL makes sense from the perspective of needing to avoid some error > > > down in the free code where we need to free some space without needing > > > to remove an owner, but what is the purpose of the above? It doesn't > > > look like this really does anything beyond checking that the associated > > > space is beyond the end of the rmapbt. If that's the intent, then it > > > probably makes sense to update this comment as well. > > > > Yes, that's exactly the intent. > > > > Hmm, come to think of it, the rmap xref patch adds a > > xfs_rmap_has_record helper that does exactly what we want here (decides > > if there are any records covering this range). > > > > --D > > > > > Brian > > > > > > > error = xfs_free_extent(tp, > > > > XFS_AGB_TO_FSB(mp, agno, > > > > be32_to_cpu(agf->agf_length) - new), > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests 2017-12-06 22:06 ` Darrick J. Wong @ 2017-12-07 13:00 ` Brian Foster 0 siblings, 0 replies; 9+ messages in thread From: Brian Foster @ 2017-12-07 13:00 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs On Wed, Dec 06, 2017 at 02:06:05PM -0800, Darrick J. Wong wrote: > On Wed, Dec 06, 2017 at 03:49:46PM -0500, Brian Foster wrote: > > On Wed, Dec 06, 2017 at 09:53:00AM -0800, Darrick J. Wong wrote: > > > On Wed, Dec 06, 2017 at 09:14:07AM -0500, Brian Foster wrote: > > > > On Tue, Dec 05, 2017 at 03:34:20PM -0800, Darrick J. Wong wrote: > > > > > From: Darrick J. Wong <darrick.wong@oracle.com> > > > > > > > > > > Calling xfs_rmap_free with an unknown owner is supposed to remove any > > > > > rmaps covering that range regardless of owner. This is used by the EFI > > > > > recovery code to say "we're freeing this, it mustn't be owned by > > > > > anything anymore", but for whatever reason xfs_free_ag_extent filters > > > > > them out. > > > > > > > > > > Therefore, remove the filter and make xfs_rmap_unmap actually treat it > > > > > as a wildcard owner -- free anything that's already there, and if > > > > > there's no owner at all then that's fine too. > > > > > > > > > > There are two existing callers of bmap_add_free that take care the rmap > > > > > deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap > > > > > cleanup; convert these to use OWN_NULL, and ensure that the RUI gets > > > > > added to the defer ops ahead of any EFI. > > > > > > > > > > Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, > > > > > growfs will have to consult directly with the rmap to ensure that there > > > > > aren't any rmaps in the grown region. > > > > > > > > > > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> > > > > > --- > > > > > > ... > > > > > diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c > > > > > index 5f3a3d9..fd0e630 100644 > > > > > --- a/fs/xfs/libxfs/xfs_rmap.c > > > > > +++ b/fs/xfs/libxfs/xfs_rmap.c > > > > > @@ -484,10 +484,17 @@ xfs_rmap_unmap( > > > > > XFS_WANT_CORRUPTED_GOTO(mp, (flags & XFS_RMAP_UNWRITTEN) == > > > > > (ltrec.rm_flags & XFS_RMAP_UNWRITTEN), out_error); > > > > > > > > > > - /* Make sure the extent we found covers the entire freeing range. */ > > > > > - XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > > > > > - ltrec.rm_startblock + ltrec.rm_blockcount >= > > > > > - bno + len, out_error); > > > > > + /* > > > > > + * Make sure the extent we found covers the entire freeing range. > > > > > + * If this is a wildcard free, we're already done, otherwise there's > > > > > + * something wrong with the rmapbt. > > > > > + */ > > > > > > > > What does this mean by "we're already done?" This logic appears to mean > > > > that we don't do anything (as opposed to throwing an error). I think the > > > > comment would be more clear if it pointed out that/why we have nothing > > > > to do here (due to OWN_UNKNOWN). I.e., caller passed in a wildcard and > > > > we essentially didn't find a match..? > > > > > > "Make sure the extent we found covers the entire freeing range. Passing > > > in an owner of OWN_UNKNOWN means that the caller wants to remove any > > > reverse mapping that may exist for this range of blocks regardless of > > > owner; if there are no mappings at all, we're done." > > > > > > > Looking at this again, I find it a bit confusing how this check seems to > > double as a "nothing to do" in the unknown case and a corruption error > > otherwise. For example, is something still technically wrong if we get > > an UNKNOWN unmap request (aka an extent free) to unmap a range that > > overlaps with but starts before or extends past the range in the rmapbt? > > I .... hmm, bear with my stream of consciousness: > > Start with the assumption that for each rmap free request there is > exactly one rmap that overlaps with the request (X and Y are existing > rmaps, R is the range we want to remove): > > XXXXX....YYYY > RR > > XXXXX....YYYY > RR > > XXXXX....YYYY > RR > > XXXXX....YYYY > RRRRR > > (Note that we also assume that the rmapbt never has two mergeable > records. rm_blockcount is 32-bit which is large enough to cover an > entire AG if need be.) > > On the regular logging path, we have two choices: (1) pass the owner to > xfs_free_extent and let the EFI-finish remove the rmap, or (2) queue our > own deferred rmap unmap and pass OWN_NULL to bmap_add_free in which case > the EFI-finish will not try to remove an rmap. > > If we have to log-recover (1), then neither the rmap nor the free space > btrees got updated, so there should be a single rmap covering the whole > EFI extent. The EFI recovery will do an OWN_UNKNOWN rmap free, which > will find that full rmap and remove as necessary. > > If we have to log-recover (2) then (assuming we queued the RUI before > the EFI like we're supposed to) the RUI recovery will remove the > rmapping and the EFI recovery's OWN_UNKNOWN rmap free will find nothing. > > So far so good? Let's see what happens if our assumption is false: > > XXXXX....YYYY > RRRRR > > XXXXX....YYYY > RRRRR > > XXXXX....YYYY > RRRRRRRR > > Then we started with a filesystem that only had a partial rmapping for a > mapped region. For a known-owner removal we'll only ever see rmap X and > trigger this corruption report. This holds true even for an > unknown-owner removal, so I think you're definitely onto something here. > The rework of this hunk was incorrect; I should have left it alone. > > XXXXX....YYYY > RR > > So this is this case where EFI recovery needs to delete an rmap and > there's already nothing there, ideally because a previous RUI already > did the work for us. > > I /think/ the proper fix is that if ltrec precedes the range being > removed then we increment to check the next extent, and if there's no > overlap with the right extent (or no right extent), then the rmappings > were already cleared out and therefore we can just exit: > ACK, that is much more clear. A nit.. > /* > * If we're doing an unknown-owner removal and there are no rmaps > * overlapping the other end of range, we're done. > */ To me, this implies that the overlap case is a valid one. I'd prefer something a bit more explicit, e.g.: "On unknown-owner removal, we expect to find the full range in the rmapbt or not at all. If no rmaps overlap the range, we're done. Otherwise, all sanity checks apply except for the rmap owner." Thanks for the breakdown! Brian > if (owner == UNKNOWN && ltrec.rm_startblock + ltrec.rm_blockcount <= bno) { > struct xfs_rmap_irec rtrec; > > xfs_btree_increment(rmap_cur, &has); > if (!has) > return 0; > xfs_btree_getrec(rmap_cur, &rtrec); > if (rtrec.rm_startblock > bno + len) > return 0; > } > > /* Make sure the extent we found covers the entire freeing range. */ > XFS_WANT_CORRUPTED_GOTO(mp, ltrec.rm_startblock <= bno && > ltrec.rm_startblock + ltrec.rm_blockcount >= > bno + len, out_error); > > Good catch! This whole thing has turned my brain into a lump of > spaghetti. Well that and the three simultaneous internal bug hunts... > > --D > > > I could easily be missing something here, but otherwise I wonder if this > > would be better as separate checks so we don't lose some of the error > > checking coverage. > > > > Brian > > > > > > > + if (ltrec.rm_startblock > bno || > > > > > + ltrec.rm_startblock + ltrec.rm_blockcount < bno + len) { > > > > > + if (owner == XFS_RMAP_OWN_UNKNOWN) > > > > > + goto out_done; > > > > > + XFS_WANT_CORRUPTED_GOTO(mp, false, out_error); > > > > > + } > > > > > > > > > > > > > Also... unrelated, but is this check immediately below really intending > > > > to ignore owner inconsistencies for all !inode owners? > > > > > > I had my eye on that one too, though I think that could be a > > > freestanding cleanup. > > > > > > > > /* Make sure the owner matches what we expect to find in the tree. */ > > > > > XFS_WANT_CORRUPTED_GOTO(mp, owner == ltrec.rm_owner || > > > > > diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c > > > > > index 8f22fc5..60a2e12 100644 > > > > > --- a/fs/xfs/xfs_fsops.c > > > > > +++ b/fs/xfs/xfs_fsops.c > > > > > @@ -571,6 +571,11 @@ xfs_growfs_data_private( > > > > > * this doesn't actually exist in the rmap btree. > > > > > */ > > > > > xfs_rmap_ag_owner(&oinfo, XFS_RMAP_OWN_NULL); > > > > > + error = xfs_rmap_free(tp, bp, agno, > > > > > + be32_to_cpu(agf->agf_length) - new, > > > > > + new, &oinfo); > > > > > + if (error) > > > > > + goto error0; > > > > > > > > OWN_NULL makes sense from the perspective of needing to avoid some error > > > > down in the free code where we need to free some space without needing > > > > to remove an owner, but what is the purpose of the above? It doesn't > > > > look like this really does anything beyond checking that the associated > > > > space is beyond the end of the rmapbt. If that's the intent, then it > > > > probably makes sense to update this comment as well. > > > > > > Yes, that's exactly the intent. > > > > > > Hmm, come to think of it, the rmap xref patch adds a > > > xfs_rmap_has_record helper that does exactly what we want here (decides > > > if there are any records covering this range). > > > > > > --D > > > > > > > Brian > > > > > > > > > error = xfs_free_extent(tp, > > > > > XFS_AGB_TO_FSB(mp, agno, > > > > > be32_to_cpu(agf->agf_length) - new), > > > > > -- > > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > > > the body of a message to majordomo@vger.kernel.org > > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > > the body of a message to majordomo@vger.kernel.org > > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: extfree log recovery and owner (rmapbt) updates 2017-12-05 18:55 extfree log recovery and owner (rmapbt) updates Brian Foster 2017-12-05 23:32 ` Darrick J. Wong 2017-12-05 23:34 ` [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests Darrick J. Wong @ 2017-12-05 23:49 ` Dave Chinner 2 siblings, 0 replies; 9+ messages in thread From: Dave Chinner @ 2017-12-05 23:49 UTC (permalink / raw) To: Brian Foster; +Cc: linux-xfs On Tue, Dec 05, 2017 at 01:55:24PM -0500, Brian Foster wrote: > Hi, > > I've been playing around with deferring AGFL block frees and noticed > something funky going on with xfs_bmap_add_free() and owner info. We > pass the extent owner info to xfs_bmap_add_free() in various places to > transfer this info to the deferred free processing context. > Subsequently, xfs_trans_free_extent() -> ... -> xfs_free_ag_extent() > uses the owner info to remove the rmapbt entry associated with the > extent. The rmapbt update occurs at the time the extent is freed. > > If the system crashes immediately after the initial transaction commits, > however, EFI recovery calls xfs_trans_free_extent() with an "unknown" > owner. I see a reference to using a "wildcard" in this context in some > of the old rmapbt commits, Yup, that was my original intent - to deal with log recovery not knowing who the owner of the extent being freed is as it's not recorded in the EFI. IOWs, XFS_RMAP_OWN_UNKNOWN was supposed to match any owner in the rmapbt record and allow it to be removed instead of throwing a corruption error because a the owner field didn't match the rmap extent record we found on lookup. It seems that this morphed during the refcount updates to the rmapbt record structure and I didn't catch that during review. i.e. once cow and unsharing are added into the picture, the rmapbt updates are no longer a 1:1 mapping with extent freeing.... > but it looks like this codepath skips the > rmapbt update altogether if oi_owner == XFS_RMAP_OWN_UNKNOWN. Yup, that looks like a bug at first glance, but I think what we really need to determine here is whether there is a deferred rmap update for that extent being freed or not. That will determine what needs to happen when processing the EFI - if there's a deferred rmap update, then we don't want to free the rmap record here, but if there isn't an rmap update, then we need to free it from EFI context... > A quick > test that shuts down the fs immediately after a transaction commits that > logs an EFI for a freed inode chunk leaves the fs inconsistent after log > recovery, with a bit of a cryptic message from repair: > > unknown block state, ag 0, block 24 That would explain the failures I occasionally see from an fstest whose number I can't recall right now. > It does look like the associated rmapbt entry still exists in the tree > even though the extent has been freed (from xfs_db -c fsmap): > > 5: 0/24 len 8 owner -7 offset 0 bmbt 0 attrfork 0 extflag 0 > > So what is supposed to be going on here? Should the rmapbt entry be > removed unconditionally during recovery, or should a separate rmapbt > update item be deferred (as in the bunmapi case, for example) rather > than passing oinfo along with the TYPE_FREE deferred item? Or something > else entirely..? I'm thinking it depends on whether we've got a deferred RUI for the extent being freed pending in the recovery list or not... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-12-07 13:00 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-12-05 18:55 extfree log recovery and owner (rmapbt) updates Brian Foster 2017-12-05 23:32 ` Darrick J. Wong 2017-12-05 23:34 ` [RFC PATCH] xfs: always honor OWN_UNKNOWN rmap removal requests Darrick J. Wong 2017-12-06 14:14 ` Brian Foster 2017-12-06 17:53 ` Darrick J. Wong 2017-12-06 20:49 ` Brian Foster 2017-12-06 22:06 ` Darrick J. Wong 2017-12-07 13:00 ` Brian Foster 2017-12-05 23:49 ` extfree log recovery and owner (rmapbt) updates Dave Chinner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.