* [PATCH 0/2] xfs: misc. attribute and log recovery fixes @ 2015-06-18 12:48 Brian Foster 2015-06-18 12:49 ` [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist Brian Foster 2015-06-18 12:49 ` [PATCH 2/2] xfs: validate transaction header length on log recovery Brian Foster 0 siblings, 2 replies; 11+ messages in thread From: Brian Foster @ 2015-06-18 12:48 UTC (permalink / raw) To: xfs Hi all, This is just a couple miscellaneous fixes. Patch 1 fixes a regression in the attribute fork destruction path in xfs_inactive(). Considering that the cause of said regression was stable fodder, this might be as well. Patch 2 fixes a crash reproduced on an s390x box due to a memcpy() overflow in log recovery due to a corrupted log operation header (caused by fsfuzzer). Thoughts, reviews, flames appreciated. Brian Brian Foster (2): xfs: don't truncate attribute extents if no extents exist xfs: validate transaction header length on log recovery fs/xfs/xfs_attr_inactive.c | 3 ++- fs/xfs/xfs_log_recover.c | 10 ++++++++-- 2 files changed, 10 insertions(+), 3 deletions(-) -- 1.9.3 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist 2015-06-18 12:48 [PATCH 0/2] xfs: misc. attribute and log recovery fixes Brian Foster @ 2015-06-18 12:49 ` Brian Foster 2015-06-19 15:14 ` Christoph Hellwig 2015-06-22 13:38 ` [PATCH v2] " Brian Foster 2015-06-18 12:49 ` [PATCH 2/2] xfs: validate transaction header length on log recovery Brian Foster 1 sibling, 2 replies; 11+ messages in thread From: Brian Foster @ 2015-06-18 12:49 UTC (permalink / raw) To: xfs The xfs_attr3_root_inactive() call from xfs_attr_inactive() assumes that attribute blocks exist to invalidate. It is possible to have an attribute fork without extents, however. Consider the case where the attribute fork is created towards the beginning of xfs_attr_set() but some part of the subsequent attribute set fails. If an inode in such a state hits xfs_attr_inactive(), it eventually calls xfs_dabuf_map() and possibly xfs_bmapi_read(). The former emits a filesystem corruption warning, returns an error that bubbles back up to xfs_attr_inactive(), and leads to destruction of the in-core attribute fork without an on-disk reset. If the inode happens to make it back through xfs_inactive() in this state (e.g., via a concurrent bulkstat that cycles the inode from the reclaim state and releases it), i_afp might not exist when xfs_bmapi_read() is called and causes a NULL dereference panic. A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB) reproduces these problems. The behavior is a regression caused by: 6dfe5a0 xfs: xfs_attr_inactive leaves inconsistent attr fork state behind ... which removed logic that avoided the attribute extent truncate when no extents exist. Restore this logic to ensure the attribute fork is destroyed and reset correctly if it exists without any allocated extents. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_attr_inactive.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 69a154c..4381b94 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -434,7 +434,8 @@ xfs_attr_inactive( xfs_trans_ijoin(trans, dp, 0); /* invalidate and truncate the attribute fork extents */ - if (dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { + if (xfs_inode_hasattr(dp) && + dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { error = xfs_attr3_root_inactive(&trans, dp); if (error) goto out_cancel; -- 1.9.3 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist 2015-06-18 12:49 ` [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist Brian Foster @ 2015-06-19 15:14 ` Christoph Hellwig 2015-06-19 15:45 ` Brian Foster 2015-06-22 13:38 ` [PATCH v2] " Brian Foster 1 sibling, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2015-06-19 15:14 UTC (permalink / raw) To: Brian Foster; +Cc: xfs > A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB) > reproduces these problems. Any chance to add this test case to xfstests? > /* invalidate and truncate the attribute fork extents */ > - if (dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { > + if (xfs_inode_hasattr(dp) && > + dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { And please add a comment describing the condition here, especially as xfs_inode_hasattr doesn't have a very descriptive name. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist 2015-06-19 15:14 ` Christoph Hellwig @ 2015-06-19 15:45 ` Brian Foster 2015-06-21 9:22 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Brian Foster @ 2015-06-19 15:45 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Fri, Jun 19, 2015 at 08:14:21AM -0700, Christoph Hellwig wrote: > > A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB) > > reproduces these problems. > > Any chance to add this test case to xfstests? > I think so. I'll look into it. > > /* invalidate and truncate the attribute fork extents */ > > - if (dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { > > + if (xfs_inode_hasattr(dp) && > > + dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { > > And please add a comment describing the condition here, especially > as xfs_inode_hasattr doesn't have a very descriptive name. > How about this? /* * Invalidate and truncate the attribute fork extents. Make sure the * fork actually has attributes as otherwise the invalidation has no * blocks to read and returns an error. In this case, just do the fork * removal below. */ Brian _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist 2015-06-19 15:45 ` Brian Foster @ 2015-06-21 9:22 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2015-06-21 9:22 UTC (permalink / raw) To: Brian Foster; +Cc: Christoph Hellwig, xfs On Fri, Jun 19, 2015 at 11:45:23AM -0400, Brian Foster wrote: > How about this? > > /* > * Invalidate and truncate the attribute fork extents. Make sure the > * fork actually has attributes as otherwise the invalidation has no > * blocks to read and returns an error. In this case, just do the fork > * removal below. > */ Yes, that looks good. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] xfs: don't truncate attribute extents if no extents exist 2015-06-18 12:49 ` [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist Brian Foster 2015-06-19 15:14 ` Christoph Hellwig @ 2015-06-22 13:38 ` Brian Foster 1 sibling, 0 replies; 11+ messages in thread From: Brian Foster @ 2015-06-22 13:38 UTC (permalink / raw) To: xfs The xfs_attr3_root_inactive() call from xfs_attr_inactive() assumes that attribute blocks exist to invalidate. It is possible to have an attribute fork without extents, however. Consider the case where the attribute fork is created towards the beginning of xfs_attr_set() but some part of the subsequent attribute set fails. If an inode in such a state hits xfs_attr_inactive(), it eventually calls xfs_dabuf_map() and possibly xfs_bmapi_read(). The former emits a filesystem corruption warning, returns an error that bubbles back up to xfs_attr_inactive(), and leads to destruction of the in-core attribute fork without an on-disk reset. If the inode happens to make it back through xfs_inactive() in this state (e.g., via a concurrent bulkstat that cycles the inode from the reclaim state and releases it), i_afp might not exist when xfs_bmapi_read() is called and causes a NULL dereference panic. A '-p 2' fsstress run to ENOSPC on a relatively small fs (1GB) reproduces these problems. The behavior is a regression caused by: 6dfe5a0 xfs: xfs_attr_inactive leaves inconsistent attr fork state behind ... which removed logic that avoided the attribute extent truncate when no extents exist. Restore this logic to ensure the attribute fork is destroyed and reset correctly if it exists without any allocated extents. Signed-off-by: Brian Foster <bfoster@redhat.com> --- Here's v2 of the attr inactive fix with the additional comment requested by Christoph. After beating my head against reproducing this with xfstests a bit last week, I determined that selinux was a contributing factor in my original fsstress reproducer. I found a non-selinux/fsstress reproducer sequence, so I'll cook up an xfstest based on that soon... Brian v2: - Added comment to explain logic behind skipping the inval/truncate in the no attribute case. v1: http://oss.sgi.com/archives/xfs/2015-06/msg00276.html fs/xfs/xfs_attr_inactive.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_attr_inactive.c b/fs/xfs/xfs_attr_inactive.c index 69a154c..2bb959a 100644 --- a/fs/xfs/xfs_attr_inactive.c +++ b/fs/xfs/xfs_attr_inactive.c @@ -433,8 +433,14 @@ xfs_attr_inactive( */ xfs_trans_ijoin(trans, dp, 0); - /* invalidate and truncate the attribute fork extents */ - if (dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { + /* + * Invalidate and truncate the attribute fork extents. Make sure the + * fork actually has attributes as otherwise the invalidation has no + * blocks to read and returns an error. In this case, just do the fork + * removal below. + */ + if (xfs_inode_hasattr(dp) && + dp->i_d.di_aformat != XFS_DINODE_FMT_LOCAL) { error = xfs_attr3_root_inactive(&trans, dp); if (error) goto out_cancel; -- 1.9.3 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/2] xfs: validate transaction header length on log recovery 2015-06-18 12:48 [PATCH 0/2] xfs: misc. attribute and log recovery fixes Brian Foster 2015-06-18 12:49 ` [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist Brian Foster @ 2015-06-18 12:49 ` Brian Foster 2015-06-21 9:27 ` Christoph Hellwig 1 sibling, 1 reply; 11+ messages in thread From: Brian Foster @ 2015-06-18 12:49 UTC (permalink / raw) To: xfs When log recovery hits a new transaction, it copies the transaction header from the expected location in the log to the in-core structure using the length from the op record header. This length is validated to ensure it doesn't exceed the length of the record, but not against the expected size of a transaction header (and thus the size of the in-core structure). If the on-disk length is corrupted, the associated memcpy() can overflow, write to unrelated memory and lead to crashes. This has been reproduced via filesystem fuzzing. The code already checks that the length matches the transaction header in order to add a recovery item to the transaction. Convert this check to an explicit validation of the length to prevent memcpy() overflow. In the event of the latter, warn the user and fail the log recovery. Signed-off-by: Brian Foster <bfoster@redhat.com> --- fs/xfs/xfs_log_recover.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 299fbaf..3c6ad4c 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -3437,8 +3437,14 @@ xlog_recover_add_to_trans( ASSERT(0); return -EIO; } - if (len == sizeof(xfs_trans_header_t)) - xlog_recover_add_item(&trans->r_itemq); + if (len != sizeof(struct xfs_trans_header)) { + xfs_warn(log->l_mp, "%s: bad header size (%d)", + __func__, len); + ASSERT(0); + return -EIO; + } + + xlog_recover_add_item(&trans->r_itemq); memcpy(&trans->r_theader, dp, len); return 0; } -- 1.9.3 _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: validate transaction header length on log recovery 2015-06-18 12:49 ` [PATCH 2/2] xfs: validate transaction header length on log recovery Brian Foster @ 2015-06-21 9:27 ` Christoph Hellwig 2015-06-21 20:25 ` Brian Foster 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2015-06-21 9:27 UTC (permalink / raw) To: Brian Foster; +Cc: xfs This looks sensible to me, but I still can't make sense of the old code which just conditionally copied it even after taking a brief look at the pre-git history of this code. Does anyone understand why the code was like this? Fixing code that seems to have had an intention I can't make sense of always feel dangerous. _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: validate transaction header length on log recovery 2015-06-21 9:27 ` Christoph Hellwig @ 2015-06-21 20:25 ` Brian Foster 2015-06-21 23:05 ` Dave Chinner 0 siblings, 1 reply; 11+ messages in thread From: Brian Foster @ 2015-06-21 20:25 UTC (permalink / raw) To: Christoph Hellwig; +Cc: xfs On Sun, Jun 21, 2015 at 02:27:22AM -0700, Christoph Hellwig wrote: > This looks sensible to me, but I still can't make sense of the old > code which just conditionally copied it even after taking a brief > look at the pre-git history of this code. Does anyone understand why > the code was like this? Fixing code that seems to have had an > intention I can't make sense of always feel dangerous. > Yeah, I couldn't really make much sense of the original code either. My reasoning at the time was that the memcpy() seemed superfluous in this case, as we wouldn't add anything to trans->r_itemq and end up doing the memcpy() again the next time around. Taking another look at other xlog_recover_add_item() callsites, there is this in the xlog_recover_add_to_cont_trans() case: if (list_empty(&trans->r_itemq)) { /* finish copying rest of trans header */ xlog_recover_add_item(&trans->r_itemq); ptr = (xfs_caddr_t) &trans->r_theader + sizeof(xfs_trans_header_t) - len; memcpy(ptr, dp, len); return 0; } ... which according to the code/comment, seems to imply that the transaction header could be split across op records..? I'm not terribly familiar with the log on-disk format. Does this sound sane? If so, perhaps the patch is wrong and we should only bail if the length in either of these two cases is obviously invalid (e.g., it exceeds the size of the full in-core structure). Perhaps there's also more validation that can occur here: can we assert that this should mean we're at the end of the operation record in the first short copy instance? I'll probably have to stare at this some more with regard to that. Brian > _______________________________________________ > xfs mailing list > xfs@oss.sgi.com > http://oss.sgi.com/mailman/listinfo/xfs _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: validate transaction header length on log recovery 2015-06-21 20:25 ` Brian Foster @ 2015-06-21 23:05 ` Dave Chinner 2015-06-22 13:59 ` Brian Foster 0 siblings, 1 reply; 11+ messages in thread From: Dave Chinner @ 2015-06-21 23:05 UTC (permalink / raw) To: Brian Foster; +Cc: Christoph Hellwig, xfs On Sun, Jun 21, 2015 at 04:25:29PM -0400, Brian Foster wrote: > On Sun, Jun 21, 2015 at 02:27:22AM -0700, Christoph Hellwig wrote: > > This looks sensible to me, but I still can't make sense of the old > > code which just conditionally copied it even after taking a brief > > look at the pre-git history of this code. Does anyone understand why > > the code was like this? Fixing code that seems to have had an > > intention I can't make sense of always feel dangerous. > > > > Yeah, I couldn't really make much sense of the original code either. My > reasoning at the time was that the memcpy() seemed superfluous in this > case, as we wouldn't add anything to trans->r_itemq and end up doing the > memcpy() again the next time around. > > Taking another look at other xlog_recover_add_item() callsites, there is > this in the xlog_recover_add_to_cont_trans() case: > > if (list_empty(&trans->r_itemq)) { > /* finish copying rest of trans header */ > xlog_recover_add_item(&trans->r_itemq); > ptr = (xfs_caddr_t) &trans->r_theader + > sizeof(xfs_trans_header_t) - len; > memcpy(ptr, dp, len); > return 0; > } > > ... which according to the code/comment, seems to imply that the > transaction header could be split across op records..? I'm not terribly > familiar with the log on-disk format. Does this sound sane? Yes. Look at xfs_cil_push, where it formats the transaction header. It is just another 16 byte vector that is passed to xlog_write so it gets encapsulated in log opheaders. That means it can be split across multiple iclogs and log opheaders (i.e. can trigger the last_was_partial_copy case in xlog_write_setup_copy()). > If so, perhaps the patch is wrong and we should only bail if the length > in either of these two cases is obviously invalid (e.g., it exceeds the > size of the full in-core structure). The code currently assumes that if the magic number matches, then the length of the opheader will be in range. It's probably a fair assumption, as only a software bug in setting the opheader size in the log write code will cause problems. Other on-disk corruption (e.g. bit flips) will be caught by the log buffer CRCs.... However, we should really trigger a corruption if it does exceed the size of the transaction header. Similarly, we should do the same check on the other side of the partial trans header copy... > Perhaps there's also more > validation that can occur here: can we assert that this should mean > we're at the end of the operation record in the first short copy > instance? Not sure what you mean here. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 2/2] xfs: validate transaction header length on log recovery 2015-06-21 23:05 ` Dave Chinner @ 2015-06-22 13:59 ` Brian Foster 0 siblings, 0 replies; 11+ messages in thread From: Brian Foster @ 2015-06-22 13:59 UTC (permalink / raw) To: Dave Chinner; +Cc: Christoph Hellwig, xfs On Mon, Jun 22, 2015 at 09:05:32AM +1000, Dave Chinner wrote: > On Sun, Jun 21, 2015 at 04:25:29PM -0400, Brian Foster wrote: > > On Sun, Jun 21, 2015 at 02:27:22AM -0700, Christoph Hellwig wrote: > > > This looks sensible to me, but I still can't make sense of the old > > > code which just conditionally copied it even after taking a brief > > > look at the pre-git history of this code. Does anyone understand why > > > the code was like this? Fixing code that seems to have had an > > > intention I can't make sense of always feel dangerous. > > > > > > > Yeah, I couldn't really make much sense of the original code either. My > > reasoning at the time was that the memcpy() seemed superfluous in this > > case, as we wouldn't add anything to trans->r_itemq and end up doing the > > memcpy() again the next time around. > > > > Taking another look at other xlog_recover_add_item() callsites, there is > > this in the xlog_recover_add_to_cont_trans() case: > > > > if (list_empty(&trans->r_itemq)) { > > /* finish copying rest of trans header */ > > xlog_recover_add_item(&trans->r_itemq); > > ptr = (xfs_caddr_t) &trans->r_theader + > > sizeof(xfs_trans_header_t) - len; > > memcpy(ptr, dp, len); > > return 0; > > } > > > > ... which according to the code/comment, seems to imply that the > > transaction header could be split across op records..? I'm not terribly > > familiar with the log on-disk format. Does this sound sane? > > Yes. Look at xfs_cil_push, where it formats the transaction header. > It is just another 16 byte vector that is passed to xlog_write so it > gets encapsulated in log opheaders. That means it can be split > across multiple iclogs and log opheaders (i.e. can trigger the > last_was_partial_copy case in xlog_write_setup_copy()). > Ok.. > > If so, perhaps the patch is wrong and we should only bail if the length > > in either of these two cases is obviously invalid (e.g., it exceeds the > > size of the full in-core structure). > > The code currently assumes that if the magic number matches, then > the length of the opheader will be in range. It's probably a fair > assumption, as only a software bug in setting the opheader size in > the log write code will cause problems. Other on-disk corruption > (e.g. bit flips) will be caught by the log buffer CRCs.... > > However, we should really trigger a corruption if it does exceed > the size of the transaction header. Similarly, we should do the same > check on the other side of the partial trans header copy... > Indeed, thanks for the breakdown. So we can drop this one and I'll send a new patch independently since the two patches in this series are not related (and afaics the first is Ok). > > Perhaps there's also more > > validation that can occur here: can we assert that this should mean > > we're at the end of the operation record in the first short copy > > instance? > > Not sure what you mean here. > I was just thinking out loud and wondering whether that if this transaction header split does occur, we can infer that it's due to hitting the end of the op record. In other words, a short transaction header here where the op record actually has more in it could also be considered a corruption. Taking a look at the code, it looks like the length we're working with down in this context is actually the op record length, so this is kind of irrelevant and likely just an artifact of my lack of understanding the on-disk log format. Brian > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2015-06-22 13:59 UTC | newest] Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-06-18 12:48 [PATCH 0/2] xfs: misc. attribute and log recovery fixes Brian Foster 2015-06-18 12:49 ` [PATCH 1/2] xfs: don't truncate attribute extents if no extents exist Brian Foster 2015-06-19 15:14 ` Christoph Hellwig 2015-06-19 15:45 ` Brian Foster 2015-06-21 9:22 ` Christoph Hellwig 2015-06-22 13:38 ` [PATCH v2] " Brian Foster 2015-06-18 12:49 ` [PATCH 2/2] xfs: validate transaction header length on log recovery Brian Foster 2015-06-21 9:27 ` Christoph Hellwig 2015-06-21 20:25 ` Brian Foster 2015-06-21 23:05 ` Dave Chinner 2015-06-22 13:59 ` Brian Foster
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.