* [PULL] Btrfs, updates for 4.12
@ 2017-04-19 11:35 David Sterba
2017-04-26 15:06 ` Filipe Manana
0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2017-04-19 11:35 UTC (permalink / raw)
To: clm; +Cc: David Sterba, linux-btrfs
Hi,
this is the main part of my 4.12 pull, condensed changelog below. I might send
another pull with low-risk patches, mostly cleanups, but so far I'm done with
base testing now. We had a high-churn cycle last time, so this could be small
one and we can concentrate on testing & fixing the raid56 updates.
The qgroup patches have been in for-next but I haven't seen any new review for
the core part.
Updates:
* raid56:
* fix mirror name in warning message after repair
* scrub fixes: calculate parity correctly
* scrub recheck and dev replace race fix
* enabled auto-repair during read
* fix warnings during recovery, due to races, bogus reports can appear
* switch to refcount_t where atomic_t was used for plain refcounting
* new and updated tracepoints
* split __btrfs_map_block, clean up
* minor qgroup fixes
* usual cleanups
----------------------------------------------------------------
The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12
for you to fetch changes up to c2a9c7ab475bc3aaf06521a39ac65bc48c8cad4f:
btrfs: check if the device is flush capable (2017-04-18 16:13:27 +0200)
----------------------------------------------------------------
Adam Borowski (1):
btrfs: fix a bogus warning when converting only data or metadata
Anand Jain (3):
btrfs: use q which is already obtained from bdev_get_queue
btrfs: delete unused member nobarriers
btrfs: check if the device is flush capable
Dan Carpenter (1):
Btrfs: handle only applicable errors returned by btrfs_get_extent
David Sterba (12):
btrfs: preallocate radix tree node for readahead
btrfs: preallocate radix tree node for global readahead tree
btrfs: remove redundant parameter from btree_readahead_hook
btrfs: remove redundant parameter from reada_find_zone
btrfs: remove redundant parameter from reada_start_machine_dev
btrfs: remove local blocksize variable in reada_find_extent
btrfs: remove unused qgroup members from btrfs_trans_handle
btrfs: track exclusive filesystem operation in flags
btrfs: sink GFP flags parameter to tree_mod_log_insert_move
btrfs: sink GFP flags parameter to tree_mod_log_insert_root
btrfs: drop redundant parameters from btrfs_map_sblock
btrfs: use clear_page where appropriate
Deepa Dinamani (1):
btrfs: Use ktime_get_real_ts for root ctime
Dmitry V. Levin (1):
MAINTAINERS: add btrfs file entries for include directories
Edmund Nadolski (2):
btrfs: provide enumeration for __merge_refs mode argument
btrfs: replace hardcoded value with SEQ_LAST macro
Elena Reshetova (16):
btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
btrfs: convert extent_map.refs from atomic_t to refcount_t
btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
btrfs: convert btrfs_root.refs from atomic_t to refcount_t
btrfs: convert extent_state.refs from atomic_t to refcount_t
btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
btrfs: convert scrub_recover.refs from atomic_t to refcount_t
btrfs: convert scrub_block.refs from atomic_t to refcount_t
btrfs: convert scrub_parity.refs from atomic_t to refcount_t
btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t
Goldwyn Rodrigues (2):
btrfs: No need to check !(flags & MS_RDONLY) twice
btrfs: qgroups: Retry after commit on getting EDQUOT
Hans van Kranenburg (1):
Btrfs: consistent usage of types in balance_args
Liu Bo (15):
Btrfs: remove ASSERT in btrfs_truncate_inode_items
Btrfs: add file item tracepoints
Btrfs: create a helper for getting chunk map
Btrfs: separate DISCARD from __btrfs_map_block
Btrfs: introduce a function to get extra mirror from replace
Btrfs: handle operations for device replace separately
Btrfs: do not add extra mirror when dev_replace target dev is not available
Btrfs: helper for ops that requires full stripe
Btrfs: convert BUG_ON to WARN_ON
Btrfs: update comments in cache_save_setup
Btrfs: set scrub page's io_error if failing to submit io
Btrfs: fix wrong failed mirror_num of read-repair on raid56
Btrfs: enable repair during read for raid56 profile
Btrfs: update scrub_parity to use u64 stripe_len
Btrfs: switch to div64_u64 if with a u64 divisor
Qu Wenruo (8):
btrfs: qgroup: Add trace point for qgroup reserved space
btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
btrfs: scrub: Don't append on-disk pages for raid56 scrub
btrfs: Wait for in-flight bios before freeing target device for raid56
btrfs: Prevent scrub recheck from racing with dev replace
btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
btrfs: scrub: Introduce full stripe lock for RAID56
btrfs: scrub: Fix RAID56 recovery race condition
MAINTAINERS | 2 +
fs/btrfs/backref.c | 41 ++-
fs/btrfs/compression.c | 18 +-
fs/btrfs/ctree.c | 20 +-
fs/btrfs/ctree.h | 34 +-
fs/btrfs/delayed-inode.c | 46 +--
fs/btrfs/delayed-inode.h | 6 +-
fs/btrfs/delayed-ref.c | 8 +-
fs/btrfs/delayed-ref.h | 8 +-
fs/btrfs/dev-replace.c | 9 +-
fs/btrfs/disk-io.c | 15 +-
fs/btrfs/disk-io.h | 4 +-
fs/btrfs/extent-tree.c | 35 +-
fs/btrfs/extent_io.c | 59 +--
fs/btrfs/extent_io.h | 3 +-
fs/btrfs/extent_map.c | 10 +-
fs/btrfs/extent_map.h | 3 +-
fs/btrfs/file.c | 16 +-
fs/btrfs/free-space-cache.c | 2 +-
fs/btrfs/inode.c | 47 +--
fs/btrfs/ioctl.c | 33 +-
fs/btrfs/ordered-data.c | 20 +-
fs/btrfs/ordered-data.h | 2 +-
fs/btrfs/qgroup.c | 102 ++----
fs/btrfs/qgroup.h | 51 ++-
fs/btrfs/raid56.c | 38 +-
fs/btrfs/reada.c | 37 +-
fs/btrfs/root-tree.c | 3 +-
fs/btrfs/scrub.c | 331 +++++++++++++++--
fs/btrfs/super.c | 3 +-
fs/btrfs/tests/btrfs-tests.c | 1 -
fs/btrfs/transaction.c | 48 ++-
fs/btrfs/transaction.h | 6 +-
fs/btrfs/tree-log.c | 2 +-
fs/btrfs/volumes.c | 856 +++++++++++++++++++++++--------------------
fs/btrfs/volumes.h | 8 +-
include/trace/events/btrfs.h | 187 +++++++++-
include/uapi/linux/btrfs.h | 10 +-
38 files changed, 1354 insertions(+), 770 deletions(-)
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PULL] Btrfs, updates for 4.12
2017-04-19 11:35 [PULL] Btrfs, updates for 4.12 David Sterba
@ 2017-04-26 15:06 ` Filipe Manana
2017-04-26 15:12 ` Chris Mason
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Filipe Manana @ 2017-04-26 15:06 UTC (permalink / raw)
To: David Sterba; +Cc: Chris Mason, linux-btrfs
On Wed, Apr 19, 2017 at 12:35 PM, David Sterba <dsterba@suse.com> wrote:
> Hi,
>
> this is the main part of my 4.12 pull, condensed changelog below. I might send
> another pull with low-risk patches, mostly cleanups, but so far I'm done with
> base testing now. We had a high-churn cycle last time, so this could be small
> one and we can concentrate on testing & fixing the raid56 updates.
>
> The qgroup patches have been in for-next but I haven't seen any new review for
> the core part.
>
> Updates:
> * raid56:
> * fix mirror name in warning message after repair
> * scrub fixes: calculate parity correctly
> * scrub recheck and dev replace race fix
> * enabled auto-repair during read
> * fix warnings during recovery, due to races, bogus reports can appear
> * switch to refcount_t where atomic_t was used for plain refcounting
> * new and updated tracepoints
> * split __btrfs_map_block, clean up
> * minor qgroup fixes
> * usual cleanups
>
> ----------------------------------------------------------------
> The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
>
> Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12
>
> for you to fetch changes up to c2a9c7ab475bc3aaf06521a39ac65bc48c8cad4f:
>
> btrfs: check if the device is flush capable (2017-04-18 16:13:27 +0200)
>
> ----------------------------------------------------------------
> Adam Borowski (1):
> btrfs: fix a bogus warning when converting only data or metadata
>
> Anand Jain (3):
> btrfs: use q which is already obtained from bdev_get_queue
> btrfs: delete unused member nobarriers
> btrfs: check if the device is flush capable
>
> Dan Carpenter (1):
> Btrfs: handle only applicable errors returned by btrfs_get_extent
>
> David Sterba (12):
> btrfs: preallocate radix tree node for readahead
> btrfs: preallocate radix tree node for global readahead tree
> btrfs: remove redundant parameter from btree_readahead_hook
> btrfs: remove redundant parameter from reada_find_zone
> btrfs: remove redundant parameter from reada_start_machine_dev
> btrfs: remove local blocksize variable in reada_find_extent
> btrfs: remove unused qgroup members from btrfs_trans_handle
> btrfs: track exclusive filesystem operation in flags
> btrfs: sink GFP flags parameter to tree_mod_log_insert_move
> btrfs: sink GFP flags parameter to tree_mod_log_insert_root
> btrfs: drop redundant parameters from btrfs_map_sblock
> btrfs: use clear_page where appropriate
Hi,
Did you actually ran xfstests with those readahead patches to
preallocate radix tree nodes?
With those 2 patches applied (Chris' for-linus.4,12 branch) this
breaks things and many btrfs specific tests (at least, since I can't
get pass them) result in tons of traces like the following in a debug
kernel:
[ 8180.696804] BUG: sleeping function called from invalid context at
mm/slab.h:432
[ 8180.703584] in_atomic(): 1, irqs_disabled(): 0, pid: 28583, name: btrfs
[ 8180.724146] 2 locks held by btrfs/28583:
[ 8180.726427] #0: (sb_writers#12){.+.+.+}, at: [<ffffffff811c1e33>]
mnt_want_write_file+0x25/0x4d
[ 8180.736742] #1: (&(&fs_info->reada_lock)->rlock){+.+.+.}, at:
[<ffffffffa02306eb>] reada_add_block+0x2fe/0x6cd [btrfs]
[ 8180.766321] Preemption disabled at:
[ 8180.766326] [<ffffffff8107ac54>] preempt_count_add+0x65/0x68
[ 8180.794837] CPU: 5 PID: 28583 Comm: btrfs Tainted: G W
4.11.0-rc8-btrfs-next-39+ #1
[ 8180.798818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 8180.798818] Call Trace:
[ 8180.798818] dump_stack+0x68/0x92
[ 8180.798818] ? preempt_count_add+0x65/0x68
[ 8180.798818] ___might_sleep+0x20f/0x226
[ 8180.798818] __might_sleep+0x77/0x7e
[ 8180.798818] slab_pre_alloc_hook+0x32/0x4f
[ 8180.798818] kmem_cache_alloc+0x39/0x233
[ 8180.798818] ? radix_tree_node_alloc.constprop.12+0x9d/0xdf
[ 8180.798818] radix_tree_node_alloc.constprop.12+0x9d/0xdf
[ 8180.798818] __radix_tree_create+0xc3/0x143
[ 8180.798818] __radix_tree_insert+0x32/0xc0
[ 8180.798818] reada_add_block+0x318/0x6cd [btrfs]
[ 8180.798818] btrfs_reada_add+0xf5/0x122 [btrfs]
[ 8180.798818] scrub_stripe+0x34b/0xdf0 [btrfs]
[ 8180.798818] ? __lock_acquire+0x69b/0xf38
[ 8180.798818] ? scrub_chunk+0x48/0x13b [btrfs]
[ 8180.798818] scrub_chunk+0x10b/0x13b [btrfs]
[ 8180.798818] ? scrub_chunk+0x10b/0x13b [btrfs]
[ 8180.798818] scrub_enumerate_chunks+0x31e/0x59b [btrfs]
[ 8180.798818] ? add_wait_queue+0x44/0x44
[ 8180.798818] btrfs_scrub_dev+0x2e3/0x494 [btrfs]
[ 8180.798818] ? __mnt_want_write+0x65/0x7c
[ 8180.798818] btrfs_ioctl+0x1498/0x1fb9 [btrfs]
[ 8180.798818] vfs_ioctl+0x21/0x38
[ 8180.945015] ? vfs_ioctl+0x21/0x38
[ 8180.945015] do_vfs_ioctl+0x611/0x645
[ 8180.945015] ? rcu_read_unlock+0x5b/0x5d
[ 8180.945015] ? __fget+0x6d/0x79
[ 8180.945015] SyS_ioctl+0x57/0x7b
[ 8180.945015] entry_SYSCALL_64_fastpath+0x18/0xad
[ 8180.945015] RIP: 0033:0x7f2b83eefc47
[ 8180.945015] RSP: 002b:00007f2b82e0bd68 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 8180.945015] RAX: ffffffffffffffda RBX: ffffffff8109612f RCX: 00007f2b83eefc47
[ 8180.945015] RDX: 00000000008a4df0 RSI: 00000000c400941b RDI: 0000000000000003
[ 8181.012957] RBP: ffffc90003613f98 R08: 00007f2b82e0c700 R09: 0000000000000000
[ 8181.012957] R10: 00007f2b82e0c700 R11: 0000000000000246 R12: 0000000000000046
[ 8181.012957] R13: ffffc90003613f78 R14: 0000000000000000 R15: 00007f2b84e75040
thanks
>
> Deepa Dinamani (1):
> btrfs: Use ktime_get_real_ts for root ctime
>
> Dmitry V. Levin (1):
> MAINTAINERS: add btrfs file entries for include directories
>
> Edmund Nadolski (2):
> btrfs: provide enumeration for __merge_refs mode argument
> btrfs: replace hardcoded value with SEQ_LAST macro
>
> Elena Reshetova (16):
> btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
> btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
> btrfs: convert extent_map.refs from atomic_t to refcount_t
> btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
> btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
> btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
> btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
> btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
> btrfs: convert btrfs_root.refs from atomic_t to refcount_t
> btrfs: convert extent_state.refs from atomic_t to refcount_t
> btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
> btrfs: convert scrub_recover.refs from atomic_t to refcount_t
> btrfs: convert scrub_block.refs from atomic_t to refcount_t
> btrfs: convert scrub_parity.refs from atomic_t to refcount_t
> btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
> btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t
>
> Goldwyn Rodrigues (2):
> btrfs: No need to check !(flags & MS_RDONLY) twice
> btrfs: qgroups: Retry after commit on getting EDQUOT
>
> Hans van Kranenburg (1):
> Btrfs: consistent usage of types in balance_args
>
> Liu Bo (15):
> Btrfs: remove ASSERT in btrfs_truncate_inode_items
> Btrfs: add file item tracepoints
> Btrfs: create a helper for getting chunk map
> Btrfs: separate DISCARD from __btrfs_map_block
> Btrfs: introduce a function to get extra mirror from replace
> Btrfs: handle operations for device replace separately
> Btrfs: do not add extra mirror when dev_replace target dev is not available
> Btrfs: helper for ops that requires full stripe
> Btrfs: convert BUG_ON to WARN_ON
> Btrfs: update comments in cache_save_setup
> Btrfs: set scrub page's io_error if failing to submit io
> Btrfs: fix wrong failed mirror_num of read-repair on raid56
> Btrfs: enable repair during read for raid56 profile
> Btrfs: update scrub_parity to use u64 stripe_len
> Btrfs: switch to div64_u64 if with a u64 divisor
>
> Qu Wenruo (8):
> btrfs: qgroup: Add trace point for qgroup reserved space
> btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
> btrfs: scrub: Don't append on-disk pages for raid56 scrub
> btrfs: Wait for in-flight bios before freeing target device for raid56
> btrfs: Prevent scrub recheck from racing with dev replace
> btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
> btrfs: scrub: Introduce full stripe lock for RAID56
> btrfs: scrub: Fix RAID56 recovery race condition
>
> MAINTAINERS | 2 +
> fs/btrfs/backref.c | 41 ++-
> fs/btrfs/compression.c | 18 +-
> fs/btrfs/ctree.c | 20 +-
> fs/btrfs/ctree.h | 34 +-
> fs/btrfs/delayed-inode.c | 46 +--
> fs/btrfs/delayed-inode.h | 6 +-
> fs/btrfs/delayed-ref.c | 8 +-
> fs/btrfs/delayed-ref.h | 8 +-
> fs/btrfs/dev-replace.c | 9 +-
> fs/btrfs/disk-io.c | 15 +-
> fs/btrfs/disk-io.h | 4 +-
> fs/btrfs/extent-tree.c | 35 +-
> fs/btrfs/extent_io.c | 59 +--
> fs/btrfs/extent_io.h | 3 +-
> fs/btrfs/extent_map.c | 10 +-
> fs/btrfs/extent_map.h | 3 +-
> fs/btrfs/file.c | 16 +-
> fs/btrfs/free-space-cache.c | 2 +-
> fs/btrfs/inode.c | 47 +--
> fs/btrfs/ioctl.c | 33 +-
> fs/btrfs/ordered-data.c | 20 +-
> fs/btrfs/ordered-data.h | 2 +-
> fs/btrfs/qgroup.c | 102 ++----
> fs/btrfs/qgroup.h | 51 ++-
> fs/btrfs/raid56.c | 38 +-
> fs/btrfs/reada.c | 37 +-
> fs/btrfs/root-tree.c | 3 +-
> fs/btrfs/scrub.c | 331 +++++++++++++++--
> fs/btrfs/super.c | 3 +-
> fs/btrfs/tests/btrfs-tests.c | 1 -
> fs/btrfs/transaction.c | 48 ++-
> fs/btrfs/transaction.h | 6 +-
> fs/btrfs/tree-log.c | 2 +-
> fs/btrfs/volumes.c | 856 +++++++++++++++++++++++--------------------
> fs/btrfs/volumes.h | 8 +-
> include/trace/events/btrfs.h | 187 +++++++++-
> include/uapi/linux/btrfs.h | 10 +-
> 38 files changed, 1354 insertions(+), 770 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PULL] Btrfs, updates for 4.12
2017-04-26 15:06 ` Filipe Manana
@ 2017-04-26 15:12 ` Chris Mason
2017-04-26 16:08 ` David Sterba
2017-04-26 17:26 ` Chris Mason
2 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2017-04-26 15:12 UTC (permalink / raw)
To: fdmanana, David Sterba; +Cc: linux-btrfs
On 04/26/2017 11:06 AM, Filipe Manana wrote:
> Hi,
>
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
>
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:
Huh, I did and these didn't come up. I'll double check I have
preemption enabled.
-chris
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PULL] Btrfs, updates for 4.12
2017-04-26 15:06 ` Filipe Manana
2017-04-26 15:12 ` Chris Mason
@ 2017-04-26 16:08 ` David Sterba
2017-04-26 17:26 ` Chris Mason
2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2017-04-26 16:08 UTC (permalink / raw)
To: Filipe Manana; +Cc: Chris Mason, linux-btrfs
On Wed, Apr 26, 2017 at 04:06:29PM +0100, Filipe Manana wrote:
> On Wed, Apr 19, 2017 at 12:35 PM, David Sterba <dsterba@suse.com> wrote:
> > Adam Borowski (1):
> > btrfs: fix a bogus warning when converting only data or metadata
> >
> > Anand Jain (3):
> > btrfs: use q which is already obtained from bdev_get_queue
> > btrfs: delete unused member nobarriers
> > btrfs: check if the device is flush capable
> >
> > Dan Carpenter (1):
> > Btrfs: handle only applicable errors returned by btrfs_get_extent
> >
> > David Sterba (12):
> > btrfs: preallocate radix tree node for readahead
> > btrfs: preallocate radix tree node for global readahead tree
> > btrfs: remove redundant parameter from btree_readahead_hook
> > btrfs: remove redundant parameter from reada_find_zone
> > btrfs: remove redundant parameter from reada_start_machine_dev
> > btrfs: remove local blocksize variable in reada_find_extent
> > btrfs: remove unused qgroup members from btrfs_trans_handle
> > btrfs: track exclusive filesystem operation in flags
> > btrfs: sink GFP flags parameter to tree_mod_log_insert_move
> > btrfs: sink GFP flags parameter to tree_mod_log_insert_root
> > btrfs: drop redundant parameters from btrfs_map_sblock
> > btrfs: use clear_page where appropriate
>
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
>
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:
I did, no such reports appeared in my setup. There are several debugging
options enabled in the config but I don't see the one to catch sleep in
atomic. I'll fix my setup scripts to enable it before each build. There
might be other surprises that I missed though. Thanks for cathing it.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PULL] Btrfs, updates for 4.12
2017-04-26 15:06 ` Filipe Manana
2017-04-26 15:12 ` Chris Mason
2017-04-26 16:08 ` David Sterba
@ 2017-04-26 17:26 ` Chris Mason
2 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2017-04-26 17:26 UTC (permalink / raw)
To: fdmanana, David Sterba; +Cc: linux-btrfs
On 04/26/2017 11:06 AM, Filipe Manana wrote:
> Hi,
>
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
>
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:
>
> [ 8180.696804] BUG: sleeping function called from invalid context at
> mm/slab.h:432
> [ 8180.703584] in_atomic(): 1, irqs_disabled(): 0, pid: 28583, name: btrfs
> [ 8180.724146] 2 locks held by btrfs/28583:
> [ 8180.726427] #0: (sb_writers#12){.+.+.+}, at: [<ffffffff811c1e33>]
> mnt_want_write_file+0x25/0x4d
> [ 8180.736742] #1: (&(&fs_info->reada_lock)->rlock){+.+.+.}, at:
> [<ffffffffa02306eb>] reada_add_block+0x2fe/0x6cd [btrfs]
> [ 8180.766321] Preemption disabled at:
> [ 8180.766326] [<ffffffff8107ac54>] preempt_count_add+0x65/0x68
> [ 8180.794837] CPU: 5 PID: 28583 Comm: btrfs Tainted: G W
> 4.11.0-rc8-btrfs-next-39+ #1
> [ 8180.798818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 8180.798818] Call Trace:
> [ 8180.798818] dump_stack+0x68/0x92
> [ 8180.798818] ? preempt_count_add+0x65/0x68
> [ 8180.798818] ___might_sleep+0x20f/0x226
> [ 8180.798818] __might_sleep+0x77/0x7e
> [ 8180.798818] slab_pre_alloc_hook+0x32/0x4f
> [ 8180.798818] kmem_cache_alloc+0x39/0x233
> [ 8180.798818] ? radix_tree_node_alloc.constprop.12+0x9d/0xdf
> [ 8180.798818] radix_tree_node_alloc.constprop.12+0x9d/0xdf
> [ 8180.798818] __radix_tree_create+0xc3/0x143
> [ 8180.798818] __radix_tree_insert+0x32/0xc0
> [ 8180.798818] reada_add_block+0x318/0x6cd [btrfs]
So radix_tree_preload doesn't work the way I thought it did. It populates a
per-cpu pool of radix tree nodes so the allocation is sure not to fail.
But, when we go to actually allocate the node during radix_tree_insert:
static struct radix_tree_node *
radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,
struct radix_tree_root *root,
unsigned int shift, unsigned int offset,
unsigned int count, unsigned int exceptional)
{
struct radix_tree_node *ret = NULL;
/*
* Preload code isn't irq safe and it doesn't make sense to use
* preloading during an interrupt anyway as all the allocations have
* to be atomic. So just do normal allocation when in interrupt.
*/
if (!gfpflags_allow_blocking(gfp_mask) && !in_interrupt()) {
struct radix_tree_preload *rtp;
/*
* Even if the caller has preloaded, try to allocate from the
* cache first for the new node to get accounted to the memory
* cgroup.
*/
ret = kmem_cache_alloc(radix_tree_node_cachep,
gfp_mask | __GFP_NOWARN);
if (ret)
goto out;
/*
* Provided the caller has preloaded here, we will always
* succeed in getting a node here (and never reach
* kmem_cache_alloc)
*/
rtp = this_cpu_ptr(&radix_tree_preloads);
if (rtp->nr) {
ret = rtp->nodes;
rtp->nodes = ret->parent;
rtp->nr--;
}
/*
* Update the allocation stack trace as this is more useful
* for debugging.
*/
kmemleak_update_trace(ret);
goto out;
}
ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);
We only jump into the preload pool if our gfp_mask for the root doesn't
allow blocking. And even if we don't allow blocking we'll still hit the
pool as a last resort.
So I think the right answer is to keep the sleeping flag off the root and
also keep the preload GFP_KERNEL.
-chris
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2017-04-26 17:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19 11:35 [PULL] Btrfs, updates for 4.12 David Sterba
2017-04-26 15:06 ` Filipe Manana
2017-04-26 15:12 ` Chris Mason
2017-04-26 16:08 ` David Sterba
2017-04-26 17:26 ` Chris Mason
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.