All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: dsterba@suse.com, linux-btrfs@vger.kernel.org, wqu@suse.com
Subject: Re: [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents
Date: Tue, 20 Nov 2018 23:09:03 +0200	[thread overview]
Message-ID: <1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com> (raw)
In-Reply-To: <20181120190050.u4pyaa6m5p5tck27@macbook-pro-91.dhcp.thefacebook.com>



On 20.11.18 г. 21:00 ч., Josef Bacik wrote:
> On Fri, Oct 26, 2018 at 02:41:55PM +0300, Nikolay Borisov wrote:
>> Running btrfs/124 in a loop hung up on me sporadically with the
>> following call trace:
>> 	btrfs           D    0  5760   5324 0x00000000
>> 	Call Trace:
>> 	 ? __schedule+0x243/0x800
>> 	 schedule+0x33/0x90
>> 	 btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>> 	 ? wait_woken+0xa0/0xa0
>> 	 btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>> 	 btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>> 	 btrfs_relocate_chunk+0x49/0x100 [btrfs]
>> 	 btrfs_balance+0xbeb/0x1740 [btrfs]
>> 	 btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>> 	 btrfs_ioctl+0x1691/0x3110 [btrfs]
>> 	 ? lockdep_hardirqs_on+0xed/0x180
>> 	 ? __handle_mm_fault+0x8e7/0xfb0
>> 	 ? _raw_spin_unlock+0x24/0x30
>> 	 ? __handle_mm_fault+0x8e7/0xfb0
>> 	 ? do_vfs_ioctl+0xa5/0x6e0
>> 	 ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>> 	 do_vfs_ioctl+0xa5/0x6e0
>> 	 ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>> 	 ksys_ioctl+0x3a/0x70
>> 	 __x64_sys_ioctl+0x16/0x20
>> 	 do_syscall_64+0x60/0x1b0
>> 	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> Turns out during page writeback it's possible that the code in
>> writepage_delalloc can instantiate a delalloc range which doesn't
>> belong to the page currently being written back. This happens since
>> find_lock_delalloc_range returns up to BTRFS_MAX_EXTENT_SIZE delalloc
>> range when asked and doens't really consider the range of the passed
>> page. When such a foregin range is found the code proceeds to
>> run_delalloc_range and calls the appropriate function to fill the
>> delalloc and create ordered extents. If, however, a failure occurs
>> while this operation is in effect then the clean up code in
>> btrfs_cleanup_ordered_extents will be called. This function has the
>> wrong assumption that caller of run_delalloc_range always properly
>> cleans the first page of the range hence when it calls
>> __endio_write_update_ordered it explicitly ommits the first page of
>> the delalloc range. This is wrong because this function could be
>> cleaning a delalloc range that doesn't belong to the current page. This
>> in turn means that the page cleanup code in __extent_writepage will
>> not really free the initial page for the range, leaving a hanging
>> ordered extent with bytes_left set to 4k. This bug has been present
>> ever since the original introduction of the cleanup code in 524272607e88.
>>
>> Fix this by correctly checking whether the current page belongs to the
>> range being instantiated and if so correctly adjust the range parameters
>> passed for cleaning up. If it doesn't, then just clean the whole OE
>> range directly.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> Fixes: 524272607e88 ("btrfs: Handle delalloc error correctly to avoid ordered extent hang")
> 
> Can we just remove the endio cleanup in __extent_writepage() and make this do
> the proper cleanup?  I'm not sure if that is feasible or not, but seems like it
> would make the cleanup stuff less confusing and more straightforward.  If not
> you can add

Quickly skimming the code I think the cleanup in __extent_writepage
could be moved into __extent_writepage_io where we have 2 branches that
set PageError. So I guess it could be done, but I will have to
experiment with it.

> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> Thanks,
> 
> Josef
> 

      reply	other threads:[~2018-11-20 21:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-26 11:41 [PATCH v2] btrfs: Fix error handling in btrfs_cleanup_ordered_extents Nikolay Borisov
2018-10-26 11:53 ` Qu Wenruo
2018-10-26 12:02   ` Nikolay Borisov
2018-10-29  5:53     ` Qu Wenruo
2018-10-29  7:51       ` Nikolay Borisov
2018-10-29 12:21         ` Nikolay Borisov
2018-10-29 14:39           ` Nikolay Borisov
2018-11-20 19:00 ` Josef Bacik
2018-11-20 21:09   ` Nikolay Borisov [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1af84f15-4ef4-0aaf-60a4-f4c3275e08ba@suse.com \
    --to=nborisov@suse.com \
    --cc=dsterba@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.