All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for 4.7-rc] btrfs: Only check first key for committed tree blocks
@ 2018-04-12 22:32 Qu Wenruo
  2018-04-13 14:09 ` David Sterba
  0 siblings, 1 reply; 2+ messages in thread
From: Qu Wenruo @ 2018-04-12 22:32 UTC (permalink / raw)
  To: linux-btrfs

When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
kernel warning due to first key verification:

[ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.523830] Modules linked in:
[ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
[ 4239.527101] Call Trace:
[ 4239.527251]  read_tree_block+0x42/0x70
[ 4239.527434]  read_node_slot+0xd2/0x110
[ 4239.527632]  push_leaf_right+0xad/0x1b0
[ 4239.527809]  split_leaf+0x4ea/0x700
[ 4239.527988]  ? leaf_space_used+0xbc/0xe0
[ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
[ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
[ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
[ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
[ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
[ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
[ 4239.529445]  ? start_transaction+0xa8/0x4f0
[ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
[ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
[ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
[ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
[ 4239.532098]  generic_file_direct_write+0xcb/0x170
[ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
[ 4239.532491]  aio_write+0xe2/0x180
[ 4239.532669]  ? lock_acquire+0xac/0x1e0
[ 4239.532839]  ? __might_fault+0x3e/0x90
[ 4239.533032]  do_io_submit+0x594/0x860
[ 4239.533223]  ? do_io_submit+0x594/0x860
[ 4239.533398]  SyS_io_submit+0x10/0x20
[ 4239.533560]  ? SyS_io_submit+0x10/0x20
[ 4239.533729]  do_syscall_64+0x75/0x1d0
[ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 4239.534182] RIP: 0033:0x7f8519741697

The problem here is, at btree_read_extent_buffer_pages() we don't have
acquired read/write lock on that extent buffer, only basic info like
level/bytenr is reliable.

So race condition leads to such false alert.

However in current call site, it's impossible to acquire proper lock
without race window.
To fix the problem, we only verify first key for committed tree blocks
(whose generation is no larger than fs_info->last_trans_committed), so
the content of such tree blocks will not change and there is no need to
get read/write lock.

Reported-by: Nikolay Borisov <nborisov@suse.com>
Fixes: 581c1760415c ("btrfs: Validate child tree block's level and first key")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/disk-io.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07b5e6f7df67..23803102aa0d 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -449,6 +449,14 @@ static int verify_level_key(struct btrfs_fs_info *fs_info,
 	if (!first_key)
 		return 0;
 
+	/*
+	 * For live tree block (new tree blocks in current transaction),
+	 * we need proper lock context to avoid race, which is impossible here.
+	 * So we only checks tree blocks which is read from disk, whose
+	 * generation <= fs_info->last_trans_committed.
+	 */
+	if (btrfs_header_generation(eb) > fs_info->last_trans_committed)
+		return 0;
 	if (found_level)
 		btrfs_node_key_to_cpu(eb, &found_key, 0);
 	else
-- 
2.17.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [PATCH for 4.7-rc] btrfs: Only check first key for committed tree blocks
  2018-04-12 22:32 [PATCH for 4.7-rc] btrfs: Only check first key for committed tree blocks Qu Wenruo
@ 2018-04-13 14:09 ` David Sterba
  0 siblings, 0 replies; 2+ messages in thread
From: David Sterba @ 2018-04-13 14:09 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Fri, Apr 13, 2018 at 06:32:47AM +0800, Qu Wenruo wrote:
> When looping btrfs/074 with many cpus (>= 8), it's possible to trigger
> kernel warning due to first key verification:
> 
> [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210
> [ 4239.523830] Modules linked in:
> [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210
> [ 4239.527101] Call Trace:
> [ 4239.527251]  read_tree_block+0x42/0x70
> [ 4239.527434]  read_node_slot+0xd2/0x110
> [ 4239.527632]  push_leaf_right+0xad/0x1b0
> [ 4239.527809]  split_leaf+0x4ea/0x700
> [ 4239.527988]  ? leaf_space_used+0xbc/0xe0
> [ 4239.528192]  ? btrfs_set_lock_blocking_rw+0x99/0xb0
> [ 4239.528416]  btrfs_search_slot+0x8cc/0xa40
> [ 4239.528605]  btrfs_insert_empty_items+0x71/0xc0
> [ 4239.528798]  __btrfs_run_delayed_refs+0xa98/0x1680
> [ 4239.529013]  btrfs_run_delayed_refs+0x10b/0x1b0
> [ 4239.529205]  btrfs_commit_transaction+0x33/0xaf0
> [ 4239.529445]  ? start_transaction+0xa8/0x4f0
> [ 4239.529630]  btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0
> [ 4239.529833]  btrfs_check_data_free_space+0x54/0xa0
> [ 4239.530045]  btrfs_delalloc_reserve_space+0x25/0x70
> [ 4239.531907]  btrfs_direct_IO+0x233/0x3d0
> [ 4239.532098]  generic_file_direct_write+0xcb/0x170
> [ 4239.532296]  btrfs_file_write_iter+0x2bb/0x5f4
> [ 4239.532491]  aio_write+0xe2/0x180
> [ 4239.532669]  ? lock_acquire+0xac/0x1e0
> [ 4239.532839]  ? __might_fault+0x3e/0x90
> [ 4239.533032]  do_io_submit+0x594/0x860
> [ 4239.533223]  ? do_io_submit+0x594/0x860
> [ 4239.533398]  SyS_io_submit+0x10/0x20
> [ 4239.533560]  ? SyS_io_submit+0x10/0x20
> [ 4239.533729]  do_syscall_64+0x75/0x1d0
> [ 4239.533979]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
> [ 4239.534182] RIP: 0033:0x7f8519741697
> 
> The problem here is, at btree_read_extent_buffer_pages() we don't have
> acquired read/write lock on that extent buffer, only basic info like
> level/bytenr is reliable.
> 
> So race condition leads to such false alert.
> 
> However in current call site, it's impossible to acquire proper lock
> without race window.
> To fix the problem, we only verify first key for committed tree blocks
> (whose generation is no larger than fs_info->last_trans_committed), so
> the content of such tree blocks will not change and there is no need to
> get read/write lock.
> 
> Reported-by: Nikolay Borisov <nborisov@suse.com>
> Fixes: 581c1760415c ("btrfs: Validate child tree block's level and first key")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Thanks, qeueued for 4.17.

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2018-04-13 14:12 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-12 22:32 [PATCH for 4.7-rc] btrfs: Only check first key for committed tree blocks Qu Wenruo
2018-04-13 14:09 ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.