All of lore.kernel.org
 help / color / mirror / Atom feed
* [PULL] Btrfs, updates for 4.12
@ 2017-04-19 11:35 David Sterba
  2017-04-26 15:06 ` Filipe Manana
  0 siblings, 1 reply; 5+ messages in thread
From: David Sterba @ 2017-04-19 11:35 UTC (permalink / raw)
  To: clm; +Cc: David Sterba, linux-btrfs

Hi,

this is the main part of my 4.12 pull, condensed changelog below. I might send
another pull with low-risk patches, mostly cleanups, but so far I'm done with
base testing now. We had a high-churn cycle last time, so this could be small
one and we can concentrate on testing & fixing the raid56 updates.

The qgroup patches have been in for-next but I haven't seen any new review for
the core part.

Updates:
* raid56:
  * fix mirror name in warning message after repair
  * scrub fixes: calculate parity correctly
  * scrub recheck and dev replace race fix
  * enabled auto-repair during read
  * fix warnings during recovery, due to races, bogus reports can appear
* switch to refcount_t where atomic_t was used for plain refcounting
* new and updated tracepoints
* split __btrfs_map_block, clean up
* minor qgroup fixes
* usual cleanups

----------------------------------------------------------------
The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:

  Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12

for you to fetch changes up to c2a9c7ab475bc3aaf06521a39ac65bc48c8cad4f:

  btrfs: check if the device is flush capable (2017-04-18 16:13:27 +0200)

----------------------------------------------------------------
Adam Borowski (1):
      btrfs: fix a bogus warning when converting only data or metadata

Anand Jain (3):
      btrfs: use q which is already obtained from bdev_get_queue
      btrfs: delete unused member nobarriers
      btrfs: check if the device is flush capable

Dan Carpenter (1):
      Btrfs: handle only applicable errors returned by btrfs_get_extent

David Sterba (12):
      btrfs: preallocate radix tree node for readahead
      btrfs: preallocate radix tree node for global readahead tree
      btrfs: remove redundant parameter from btree_readahead_hook
      btrfs: remove redundant parameter from reada_find_zone
      btrfs: remove redundant parameter from reada_start_machine_dev
      btrfs: remove local blocksize variable in reada_find_extent
      btrfs: remove unused qgroup members from btrfs_trans_handle
      btrfs: track exclusive filesystem operation in flags
      btrfs: sink GFP flags parameter to tree_mod_log_insert_move
      btrfs: sink GFP flags parameter to tree_mod_log_insert_root
      btrfs: drop redundant parameters from btrfs_map_sblock
      btrfs: use clear_page where appropriate

Deepa Dinamani (1):
      btrfs: Use ktime_get_real_ts for root ctime

Dmitry V. Levin (1):
      MAINTAINERS: add btrfs file entries for include directories

Edmund Nadolski (2):
      btrfs: provide enumeration for __merge_refs mode argument
      btrfs: replace hardcoded value with SEQ_LAST macro

Elena Reshetova (16):
      btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
      btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
      btrfs: convert extent_map.refs from atomic_t to refcount_t
      btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
      btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
      btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
      btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
      btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
      btrfs: convert btrfs_root.refs from atomic_t to refcount_t
      btrfs: convert extent_state.refs from atomic_t to refcount_t
      btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
      btrfs: convert scrub_recover.refs from atomic_t to refcount_t
      btrfs: convert scrub_block.refs from atomic_t to refcount_t
      btrfs: convert scrub_parity.refs from atomic_t to refcount_t
      btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
      btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t

Goldwyn Rodrigues (2):
      btrfs: No need to check !(flags & MS_RDONLY) twice
      btrfs: qgroups: Retry after commit on getting EDQUOT

Hans van Kranenburg (1):
      Btrfs: consistent usage of types in balance_args

Liu Bo (15):
      Btrfs: remove ASSERT in btrfs_truncate_inode_items
      Btrfs: add file item tracepoints
      Btrfs: create a helper for getting chunk map
      Btrfs: separate DISCARD from __btrfs_map_block
      Btrfs: introduce a function to get extra mirror from replace
      Btrfs: handle operations for device replace separately
      Btrfs: do not add extra mirror when dev_replace target dev is not available
      Btrfs: helper for ops that requires full stripe
      Btrfs: convert BUG_ON to WARN_ON
      Btrfs: update comments in cache_save_setup
      Btrfs: set scrub page's io_error if failing to submit io
      Btrfs: fix wrong failed mirror_num of read-repair on raid56
      Btrfs: enable repair during read for raid56 profile
      Btrfs: update scrub_parity to use u64 stripe_len
      Btrfs: switch to div64_u64 if with a u64 divisor

Qu Wenruo (8):
      btrfs: qgroup: Add trace point for qgroup reserved space
      btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
      btrfs: scrub: Don't append on-disk pages for raid56 scrub
      btrfs: Wait for in-flight bios before freeing target device for raid56
      btrfs: Prevent scrub recheck from racing with dev replace
      btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
      btrfs: scrub: Introduce full stripe lock for RAID56
      btrfs: scrub: Fix RAID56 recovery race condition

 MAINTAINERS                  |   2 +
 fs/btrfs/backref.c           |  41 ++-
 fs/btrfs/compression.c       |  18 +-
 fs/btrfs/ctree.c             |  20 +-
 fs/btrfs/ctree.h             |  34 +-
 fs/btrfs/delayed-inode.c     |  46 +--
 fs/btrfs/delayed-inode.h     |   6 +-
 fs/btrfs/delayed-ref.c       |   8 +-
 fs/btrfs/delayed-ref.h       |   8 +-
 fs/btrfs/dev-replace.c       |   9 +-
 fs/btrfs/disk-io.c           |  15 +-
 fs/btrfs/disk-io.h           |   4 +-
 fs/btrfs/extent-tree.c       |  35 +-
 fs/btrfs/extent_io.c         |  59 +--
 fs/btrfs/extent_io.h         |   3 +-
 fs/btrfs/extent_map.c        |  10 +-
 fs/btrfs/extent_map.h        |   3 +-
 fs/btrfs/file.c              |  16 +-
 fs/btrfs/free-space-cache.c  |   2 +-
 fs/btrfs/inode.c             |  47 +--
 fs/btrfs/ioctl.c             |  33 +-
 fs/btrfs/ordered-data.c      |  20 +-
 fs/btrfs/ordered-data.h      |   2 +-
 fs/btrfs/qgroup.c            | 102 ++----
 fs/btrfs/qgroup.h            |  51 ++-
 fs/btrfs/raid56.c            |  38 +-
 fs/btrfs/reada.c             |  37 +-
 fs/btrfs/root-tree.c         |   3 +-
 fs/btrfs/scrub.c             | 331 +++++++++++++++--
 fs/btrfs/super.c             |   3 +-
 fs/btrfs/tests/btrfs-tests.c |   1 -
 fs/btrfs/transaction.c       |  48 ++-
 fs/btrfs/transaction.h       |   6 +-
 fs/btrfs/tree-log.c          |   2 +-
 fs/btrfs/volumes.c           | 856 +++++++++++++++++++++++--------------------
 fs/btrfs/volumes.h           |   8 +-
 include/trace/events/btrfs.h | 187 +++++++++-
 include/uapi/linux/btrfs.h   |  10 +-
 38 files changed, 1354 insertions(+), 770 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PULL] Btrfs, updates for 4.12
  2017-04-19 11:35 [PULL] Btrfs, updates for 4.12 David Sterba
@ 2017-04-26 15:06 ` Filipe Manana
  2017-04-26 15:12   ` Chris Mason
                     ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Filipe Manana @ 2017-04-26 15:06 UTC (permalink / raw)
  To: David Sterba; +Cc: Chris Mason, linux-btrfs

On Wed, Apr 19, 2017 at 12:35 PM, David Sterba <dsterba@suse.com> wrote:
> Hi,
>
> this is the main part of my 4.12 pull, condensed changelog below. I might send
> another pull with low-risk patches, mostly cleanups, but so far I'm done with
> base testing now. We had a high-churn cycle last time, so this could be small
> one and we can concentrate on testing & fixing the raid56 updates.
>
> The qgroup patches have been in for-next but I haven't seen any new review for
> the core part.
>
> Updates:
> * raid56:
>   * fix mirror name in warning message after repair
>   * scrub fixes: calculate parity correctly
>   * scrub recheck and dev replace race fix
>   * enabled auto-repair during read
>   * fix warnings during recovery, due to races, bogus reports can appear
> * switch to refcount_t where atomic_t was used for plain refcounting
> * new and updated tracepoints
> * split __btrfs_map_block, clean up
> * minor qgroup fixes
> * usual cleanups
>
> ----------------------------------------------------------------
> The following changes since commit 4f7d029b9bf009fbee76bb10c0c4351a1870d2f3:
>
>   Linux 4.11-rc7 (2017-04-16 13:00:18 -0700)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git for-chris-4.12
>
> for you to fetch changes up to c2a9c7ab475bc3aaf06521a39ac65bc48c8cad4f:
>
>   btrfs: check if the device is flush capable (2017-04-18 16:13:27 +0200)
>
> ----------------------------------------------------------------
> Adam Borowski (1):
>       btrfs: fix a bogus warning when converting only data or metadata
>
> Anand Jain (3):
>       btrfs: use q which is already obtained from bdev_get_queue
>       btrfs: delete unused member nobarriers
>       btrfs: check if the device is flush capable
>
> Dan Carpenter (1):
>       Btrfs: handle only applicable errors returned by btrfs_get_extent
>
> David Sterba (12):
>       btrfs: preallocate radix tree node for readahead
>       btrfs: preallocate radix tree node for global readahead tree
>       btrfs: remove redundant parameter from btree_readahead_hook
>       btrfs: remove redundant parameter from reada_find_zone
>       btrfs: remove redundant parameter from reada_start_machine_dev
>       btrfs: remove local blocksize variable in reada_find_extent
>       btrfs: remove unused qgroup members from btrfs_trans_handle
>       btrfs: track exclusive filesystem operation in flags
>       btrfs: sink GFP flags parameter to tree_mod_log_insert_move
>       btrfs: sink GFP flags parameter to tree_mod_log_insert_root
>       btrfs: drop redundant parameters from btrfs_map_sblock
>       btrfs: use clear_page where appropriate

Hi,

Did you actually ran xfstests with those readahead patches to
preallocate radix tree nodes?

With those 2 patches applied (Chris' for-linus.4,12 branch) this
breaks things and many btrfs specific tests (at least, since I can't
get pass them) result in tons of traces like the following in a debug
kernel:

[ 8180.696804] BUG: sleeping function called from invalid context at
mm/slab.h:432
[ 8180.703584] in_atomic(): 1, irqs_disabled(): 0, pid: 28583, name: btrfs
[ 8180.724146] 2 locks held by btrfs/28583:
[ 8180.726427]  #0:  (sb_writers#12){.+.+.+}, at: [<ffffffff811c1e33>]
mnt_want_write_file+0x25/0x4d
[ 8180.736742]  #1:  (&(&fs_info->reada_lock)->rlock){+.+.+.}, at:
[<ffffffffa02306eb>] reada_add_block+0x2fe/0x6cd [btrfs]
[ 8180.766321] Preemption disabled at:
[ 8180.766326] [<ffffffff8107ac54>] preempt_count_add+0x65/0x68
[ 8180.794837] CPU: 5 PID: 28583 Comm: btrfs Tainted: G        W
4.11.0-rc8-btrfs-next-39+ #1
[ 8180.798818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[ 8180.798818] Call Trace:
[ 8180.798818]  dump_stack+0x68/0x92
[ 8180.798818]  ? preempt_count_add+0x65/0x68
[ 8180.798818]  ___might_sleep+0x20f/0x226
[ 8180.798818]  __might_sleep+0x77/0x7e
[ 8180.798818]  slab_pre_alloc_hook+0x32/0x4f
[ 8180.798818]  kmem_cache_alloc+0x39/0x233
[ 8180.798818]  ? radix_tree_node_alloc.constprop.12+0x9d/0xdf
[ 8180.798818]  radix_tree_node_alloc.constprop.12+0x9d/0xdf
[ 8180.798818]  __radix_tree_create+0xc3/0x143
[ 8180.798818]  __radix_tree_insert+0x32/0xc0
[ 8180.798818]  reada_add_block+0x318/0x6cd [btrfs]
[ 8180.798818]  btrfs_reada_add+0xf5/0x122 [btrfs]
[ 8180.798818]  scrub_stripe+0x34b/0xdf0 [btrfs]
[ 8180.798818]  ? __lock_acquire+0x69b/0xf38
[ 8180.798818]  ? scrub_chunk+0x48/0x13b [btrfs]
[ 8180.798818]  scrub_chunk+0x10b/0x13b [btrfs]
[ 8180.798818]  ? scrub_chunk+0x10b/0x13b [btrfs]
[ 8180.798818]  scrub_enumerate_chunks+0x31e/0x59b [btrfs]
[ 8180.798818]  ? add_wait_queue+0x44/0x44
[ 8180.798818]  btrfs_scrub_dev+0x2e3/0x494 [btrfs]
[ 8180.798818]  ? __mnt_want_write+0x65/0x7c
[ 8180.798818]  btrfs_ioctl+0x1498/0x1fb9 [btrfs]
[ 8180.798818]  vfs_ioctl+0x21/0x38
[ 8180.945015]  ? vfs_ioctl+0x21/0x38
[ 8180.945015]  do_vfs_ioctl+0x611/0x645
[ 8180.945015]  ? rcu_read_unlock+0x5b/0x5d
[ 8180.945015]  ? __fget+0x6d/0x79
[ 8180.945015]  SyS_ioctl+0x57/0x7b
[ 8180.945015]  entry_SYSCALL_64_fastpath+0x18/0xad
[ 8180.945015] RIP: 0033:0x7f2b83eefc47
[ 8180.945015] RSP: 002b:00007f2b82e0bd68 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 8180.945015] RAX: ffffffffffffffda RBX: ffffffff8109612f RCX: 00007f2b83eefc47
[ 8180.945015] RDX: 00000000008a4df0 RSI: 00000000c400941b RDI: 0000000000000003
[ 8181.012957] RBP: ffffc90003613f98 R08: 00007f2b82e0c700 R09: 0000000000000000
[ 8181.012957] R10: 00007f2b82e0c700 R11: 0000000000000246 R12: 0000000000000046
[ 8181.012957] R13: ffffc90003613f78 R14: 0000000000000000 R15: 00007f2b84e75040

thanks

>
> Deepa Dinamani (1):
>       btrfs: Use ktime_get_real_ts for root ctime
>
> Dmitry V. Levin (1):
>       MAINTAINERS: add btrfs file entries for include directories
>
> Edmund Nadolski (2):
>       btrfs: provide enumeration for __merge_refs mode argument
>       btrfs: replace hardcoded value with SEQ_LAST macro
>
> Elena Reshetova (16):
>       btrfs: convert btrfs_bio.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_transaction.use_count from atomic_t to refcount_t
>       btrfs: convert extent_map.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_ordered_extent.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_caching_control.count from atomic_t to refcount_t
>       btrfs: convert btrfs_delayed_ref_node.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_delayed_node.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_delayed_item.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_root.refs from atomic_t to refcount_t
>       btrfs: convert extent_state.refs from atomic_t to refcount_t
>       btrfs: convert compressed_bio.pending_bios from atomic_t to refcount_t
>       btrfs: convert scrub_recover.refs from atomic_t to refcount_t
>       btrfs: convert scrub_block.refs from atomic_t to refcount_t
>       btrfs: convert scrub_parity.refs from atomic_t to refcount_t
>       btrfs: convert scrub_ctx.refs from atomic_t to refcount_t
>       btrfs: convert btrfs_raid_bio.refs from atomic_t to refcount_t
>
> Goldwyn Rodrigues (2):
>       btrfs: No need to check !(flags & MS_RDONLY) twice
>       btrfs: qgroups: Retry after commit on getting EDQUOT
>
> Hans van Kranenburg (1):
>       Btrfs: consistent usage of types in balance_args
>
> Liu Bo (15):
>       Btrfs: remove ASSERT in btrfs_truncate_inode_items
>       Btrfs: add file item tracepoints
>       Btrfs: create a helper for getting chunk map
>       Btrfs: separate DISCARD from __btrfs_map_block
>       Btrfs: introduce a function to get extra mirror from replace
>       Btrfs: handle operations for device replace separately
>       Btrfs: do not add extra mirror when dev_replace target dev is not available
>       Btrfs: helper for ops that requires full stripe
>       Btrfs: convert BUG_ON to WARN_ON
>       Btrfs: update comments in cache_save_setup
>       Btrfs: set scrub page's io_error if failing to submit io
>       Btrfs: fix wrong failed mirror_num of read-repair on raid56
>       Btrfs: enable repair during read for raid56 profile
>       Btrfs: update scrub_parity to use u64 stripe_len
>       Btrfs: switch to div64_u64 if with a u64 divisor
>
> Qu Wenruo (8):
>       btrfs: qgroup: Add trace point for qgroup reserved space
>       btrfs: qgroup: Re-arrange tracepoint timing to co-operate with reserved space tracepoint
>       btrfs: scrub: Don't append on-disk pages for raid56 scrub
>       btrfs: Wait for in-flight bios before freeing target device for raid56
>       btrfs: Prevent scrub recheck from racing with dev replace
>       btrfs: qgroup: Fix qgroup corruption caused by inode_cache mount option
>       btrfs: scrub: Introduce full stripe lock for RAID56
>       btrfs: scrub: Fix RAID56 recovery race condition
>
>  MAINTAINERS                  |   2 +
>  fs/btrfs/backref.c           |  41 ++-
>  fs/btrfs/compression.c       |  18 +-
>  fs/btrfs/ctree.c             |  20 +-
>  fs/btrfs/ctree.h             |  34 +-
>  fs/btrfs/delayed-inode.c     |  46 +--
>  fs/btrfs/delayed-inode.h     |   6 +-
>  fs/btrfs/delayed-ref.c       |   8 +-
>  fs/btrfs/delayed-ref.h       |   8 +-
>  fs/btrfs/dev-replace.c       |   9 +-
>  fs/btrfs/disk-io.c           |  15 +-
>  fs/btrfs/disk-io.h           |   4 +-
>  fs/btrfs/extent-tree.c       |  35 +-
>  fs/btrfs/extent_io.c         |  59 +--
>  fs/btrfs/extent_io.h         |   3 +-
>  fs/btrfs/extent_map.c        |  10 +-
>  fs/btrfs/extent_map.h        |   3 +-
>  fs/btrfs/file.c              |  16 +-
>  fs/btrfs/free-space-cache.c  |   2 +-
>  fs/btrfs/inode.c             |  47 +--
>  fs/btrfs/ioctl.c             |  33 +-
>  fs/btrfs/ordered-data.c      |  20 +-
>  fs/btrfs/ordered-data.h      |   2 +-
>  fs/btrfs/qgroup.c            | 102 ++----
>  fs/btrfs/qgroup.h            |  51 ++-
>  fs/btrfs/raid56.c            |  38 +-
>  fs/btrfs/reada.c             |  37 +-
>  fs/btrfs/root-tree.c         |   3 +-
>  fs/btrfs/scrub.c             | 331 +++++++++++++++--
>  fs/btrfs/super.c             |   3 +-
>  fs/btrfs/tests/btrfs-tests.c |   1 -
>  fs/btrfs/transaction.c       |  48 ++-
>  fs/btrfs/transaction.h       |   6 +-
>  fs/btrfs/tree-log.c          |   2 +-
>  fs/btrfs/volumes.c           | 856 +++++++++++++++++++++++--------------------
>  fs/btrfs/volumes.h           |   8 +-
>  include/trace/events/btrfs.h | 187 +++++++++-
>  include/uapi/linux/btrfs.h   |  10 +-
>  38 files changed, 1354 insertions(+), 770 deletions(-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PULL] Btrfs, updates for 4.12
  2017-04-26 15:06 ` Filipe Manana
@ 2017-04-26 15:12   ` Chris Mason
  2017-04-26 16:08   ` David Sterba
  2017-04-26 17:26   ` Chris Mason
  2 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2017-04-26 15:12 UTC (permalink / raw)
  To: fdmanana, David Sterba; +Cc: linux-btrfs



On 04/26/2017 11:06 AM, Filipe Manana wrote:

> Hi,
>
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
>
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:

Huh, I did and these didn't come up.  I'll double check I have 
preemption enabled.

-chris

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PULL] Btrfs, updates for 4.12
  2017-04-26 15:06 ` Filipe Manana
  2017-04-26 15:12   ` Chris Mason
@ 2017-04-26 16:08   ` David Sterba
  2017-04-26 17:26   ` Chris Mason
  2 siblings, 0 replies; 5+ messages in thread
From: David Sterba @ 2017-04-26 16:08 UTC (permalink / raw)
  To: Filipe Manana; +Cc: Chris Mason, linux-btrfs

On Wed, Apr 26, 2017 at 04:06:29PM +0100, Filipe Manana wrote:
> On Wed, Apr 19, 2017 at 12:35 PM, David Sterba <dsterba@suse.com> wrote:
> > Adam Borowski (1):
> >       btrfs: fix a bogus warning when converting only data or metadata
> >
> > Anand Jain (3):
> >       btrfs: use q which is already obtained from bdev_get_queue
> >       btrfs: delete unused member nobarriers
> >       btrfs: check if the device is flush capable
> >
> > Dan Carpenter (1):
> >       Btrfs: handle only applicable errors returned by btrfs_get_extent
> >
> > David Sterba (12):
> >       btrfs: preallocate radix tree node for readahead
> >       btrfs: preallocate radix tree node for global readahead tree
> >       btrfs: remove redundant parameter from btree_readahead_hook
> >       btrfs: remove redundant parameter from reada_find_zone
> >       btrfs: remove redundant parameter from reada_start_machine_dev
> >       btrfs: remove local blocksize variable in reada_find_extent
> >       btrfs: remove unused qgroup members from btrfs_trans_handle
> >       btrfs: track exclusive filesystem operation in flags
> >       btrfs: sink GFP flags parameter to tree_mod_log_insert_move
> >       btrfs: sink GFP flags parameter to tree_mod_log_insert_root
> >       btrfs: drop redundant parameters from btrfs_map_sblock
> >       btrfs: use clear_page where appropriate
> 
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
> 
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:

I did, no such reports appeared in my setup. There are several debugging
options enabled in the config but I don't see the one to catch sleep in
atomic. I'll fix my setup scripts to enable it before each build. There
might be other surprises that I missed though. Thanks for cathing it.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PULL] Btrfs, updates for 4.12
  2017-04-26 15:06 ` Filipe Manana
  2017-04-26 15:12   ` Chris Mason
  2017-04-26 16:08   ` David Sterba
@ 2017-04-26 17:26   ` Chris Mason
  2 siblings, 0 replies; 5+ messages in thread
From: Chris Mason @ 2017-04-26 17:26 UTC (permalink / raw)
  To: fdmanana, David Sterba; +Cc: linux-btrfs

On 04/26/2017 11:06 AM, Filipe Manana wrote:
 
> Hi,
> 
> Did you actually ran xfstests with those readahead patches to
> preallocate radix tree nodes?
> 
> With those 2 patches applied (Chris' for-linus.4,12 branch) this
> breaks things and many btrfs specific tests (at least, since I can't
> get pass them) result in tons of traces like the following in a debug
> kernel:
> 
> [ 8180.696804] BUG: sleeping function called from invalid context at
> mm/slab.h:432
> [ 8180.703584] in_atomic(): 1, irqs_disabled(): 0, pid: 28583, name: btrfs
> [ 8180.724146] 2 locks held by btrfs/28583:
> [ 8180.726427]  #0:  (sb_writers#12){.+.+.+}, at: [<ffffffff811c1e33>]
> mnt_want_write_file+0x25/0x4d
> [ 8180.736742]  #1:  (&(&fs_info->reada_lock)->rlock){+.+.+.}, at:
> [<ffffffffa02306eb>] reada_add_block+0x2fe/0x6cd [btrfs]
> [ 8180.766321] Preemption disabled at:
> [ 8180.766326] [<ffffffff8107ac54>] preempt_count_add+0x65/0x68
> [ 8180.794837] CPU: 5 PID: 28583 Comm: btrfs Tainted: G        W
> 4.11.0-rc8-btrfs-next-39+ #1
> [ 8180.798818] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 8180.798818] Call Trace:
> [ 8180.798818]  dump_stack+0x68/0x92
> [ 8180.798818]  ? preempt_count_add+0x65/0x68
> [ 8180.798818]  ___might_sleep+0x20f/0x226
> [ 8180.798818]  __might_sleep+0x77/0x7e
> [ 8180.798818]  slab_pre_alloc_hook+0x32/0x4f
> [ 8180.798818]  kmem_cache_alloc+0x39/0x233
> [ 8180.798818]  ? radix_tree_node_alloc.constprop.12+0x9d/0xdf
> [ 8180.798818]  radix_tree_node_alloc.constprop.12+0x9d/0xdf
> [ 8180.798818]  __radix_tree_create+0xc3/0x143
> [ 8180.798818]  __radix_tree_insert+0x32/0xc0
> [ 8180.798818]  reada_add_block+0x318/0x6cd [btrfs]

So radix_tree_preload doesn't work the way I thought it did.  It populates a 
per-cpu pool of radix tree nodes so the allocation is sure not to fail.

But, when we go to actually allocate the node during radix_tree_insert:


static struct radix_tree_node *                                                 
radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent,           
                        struct radix_tree_root *root,                           
                        unsigned int shift, unsigned int offset,                
                        unsigned int count, unsigned int exceptional)           
{                                                                               
        struct radix_tree_node *ret = NULL;                                     
                                                                                
        /*                                                                      
         * Preload code isn't irq safe and it doesn't make sense to use         
         * preloading during an interrupt anyway as all the allocations have    
         * to be atomic. So just do normal allocation when in interrupt.        
         */                                                                     
        if (!gfpflags_allow_blocking(gfp_mask) && !in_interrupt()) {            
                struct radix_tree_preload *rtp;                                 
                                                                                
                /*                                                              
                 * Even if the caller has preloaded, try to allocate from the   
                 * cache first for the new node to get accounted to the memory  
                 * cgroup.                                                      
                 */                                                             
                ret = kmem_cache_alloc(radix_tree_node_cachep,                  
                                       gfp_mask | __GFP_NOWARN);                
                if (ret)                                                        
                        goto out;                                               
                                                                                
                /*                                                              
                 * Provided the caller has preloaded here, we will always       
                 * succeed in getting a node here (and never reach              
                 * kmem_cache_alloc)                                            
                 */                                                             
                rtp = this_cpu_ptr(&radix_tree_preloads);                       
                if (rtp->nr) {                                                  
                        ret = rtp->nodes;                                       
                        rtp->nodes = ret->parent;                               
                        rtp->nr--;                                              
                }                                                               
                /*                                                              
                 * Update the allocation stack trace as this is more useful     
                 * for debugging.                                               
                 */                                                             
                kmemleak_update_trace(ret);                                     
                goto out;                                                       
        }                                                                       
        ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask);

We only jump into the preload pool if our gfp_mask for the root doesn't
allow blocking.  And even if we don't allow blocking we'll still hit the
pool as a last resort.

So I think the right answer is to keep the sleeping flag off the root and
also keep the preload GFP_KERNEL.

-chris



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-04-26 17:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-19 11:35 [PULL] Btrfs, updates for 4.12 David Sterba
2017-04-26 15:06 ` Filipe Manana
2017-04-26 15:12   ` Chris Mason
2017-04-26 16:08   ` David Sterba
2017-04-26 17:26   ` Chris Mason

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.