All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: Brian Foster <bfoster@redhat.com>
Cc: Eric Sandeen <sandeen@redhat.com>, linux-xfs <linux-xfs@vger.kernel.org>
Subject: Re: [PATCH 1/2] xfs: fix up xfs_swap_extent_forks inline extent handling
Date: Mon, 7 Nov 2016 10:38:02 -0600	[thread overview]
Message-ID: <ca97f3c1-f9c6-1c43-8ac1-bc50fceff1fa@sandeen.net> (raw)
In-Reply-To: <20161107145737.GA51345@bfoster.bfoster>

On 11/7/16 8:57 AM, Brian Foster wrote:
> On Fri, Nov 04, 2016 at 09:24:16PM -0500, Eric Sandeen wrote:
>> There have been several reports over the years of NULL pointer
>> dereferences in xfs_trans_log_inode during xfs_fsr processes,
>> when the process is doing an fput and tearing down extents
>> on the temporary inode, something like:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
>>     [exception RIP: xfs_trans_log_inode+0x10]
>>  #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
>> #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
>> #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
>> #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
>> #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
>> #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
>> #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
>> #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
>> #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
>> #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
>> #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
>> #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
>> #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
>> #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
>>
>> As it turns out, this is because the i_itemp pointer, along
>> with the d_ops pointer, has been overwritten with zeros
>> when we tear down the extents during truncate.  When the in-core
>> inode fork on the temporary inode used by xfs_fsr was originally
>> set up during the extent swap, we mistakenly looked at di_nextents
>> to determine whether all extents fit inline, but this misses extents
>> generated by speculative preallocation; we should be using if_bytes
>> instead.
>>
> 
> Neat bug. :P The fix looks fine, but technically di_nextents doesn't
> include any delalloc extents, right? From taking a quick skim through
> the test case, it looks like the problematic situation is when the file
> flush doesn't end up converting post-eof delalloc (created via
> speculative prealloc) due to free space fragmentation.
> 
> So in other words, in most cases a file flush will convert all delalloc,
> including post-eof preallocation, by virtue of being part of an extent
> that is partly within file eof. In the fragmented free space situation,
> however, the file flush is not necessarily guaranteed to convert all
> delalloc blocks past eof, thus di_nextents is not consistent with the
> in-core inode state, and badness ensues...

To be honest I hadn't thought it through that far.  Could modify the
testcase to free up more contiguous space prior to the last flush, perhaps,
and see what happens.

-Eric

> If I'm following that correctly it would be nice to just clarify that a
> bit in the commit log. Otherwise looks good to me:
> 
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> 
>> This mistake corrupts the in-memory inode, and code in
>> xfs_iext_remove_inline eventually gets bad inputs, causing
>> it to memmove and memset incorrect ranges; this became apparent
>> because the two values in ifp->if_u2.if_inline_ext[1] contained
>> what should have been in d_ops and i_itemp; they were memmoved due
>> to incorrect array indexing and then the original locations
>> were zeroed with memset, again due to an array overrun.
>>
>> Fix this by properly using i_df.if_bytes to determine the number
>> of extents, not di_nextents.
>>
>> Thanks to dchinner for looking at this with me and spotting the
>> root cause.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>> ---
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index 552465e..47074e0 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1792,6 +1792,7 @@
>>  	struct xfs_ifork	tempifp, *ifp, *tifp;
>>  	int			aforkblks = 0;
>>  	int			taforkblks = 0;
>> +	xfs_extnum_t		nextents;
>>  	__uint64_t		tmp;
>>  	int			error;
>>  
>> @@ -1881,7 +1882,8 @@
>>  		 * pointer.  Otherwise it's already NULL or
>>  		 * pointing to the extent.
>>  		 */
>> -		if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) {
>> +		nextents = ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>> +		if (nextents <= XFS_INLINE_EXTS) {
>>  			ifp->if_u1.if_extents =
>>  				ifp->if_u2.if_inline_ext;
>>  		}
>> @@ -1900,7 +1902,8 @@
>>  		 * pointer.  Otherwise it's already NULL or
>>  		 * pointing to the extent.
>>  		 */
>> -		if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) {
>> +		nextents = tip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>> +		if (nextents <= XFS_INLINE_EXTS) {
>>  			tifp->if_u1.if_extents =
>>  				tifp->if_u2.if_inline_ext;
>>  		}
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

  reply	other threads:[~2016-11-07 16:38 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-05  2:09 [PATCH 0/2] xfs: fix up xfs_swap_extent_forks inline extent handling Eric Sandeen
2016-11-05  2:24 ` [PATCH 1/2] " Eric Sandeen
2016-11-07 14:57   ` Brian Foster
2016-11-07 16:38     ` Eric Sandeen [this message]
2016-11-05  2:27 ` [PATCH 2/2] xfs: provide helper for counting extents from if_bytes Eric Sandeen
2016-11-06 22:02   ` Dave Chinner
2016-11-07 16:45     ` Eric Sandeen
2016-11-07 14:58   ` Brian Foster
2016-11-07 17:35   ` [PATCH 2/2 V2] " Eric Sandeen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ca97f3c1-f9c6-1c43-8ac1-bc50fceff1fa@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=bfoster@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sandeen@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.