From: Josef Bacik <josef@toxicpanda.com>
To: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: [PATCH 18/42] btrfs: move the dio_sem higher up the callchain
Date: Fri, 12 Oct 2018 15:32:32 -0400
Message-ID: <20181012193256.13735-19-josef@toxicpanda.com>
In-Reply-To: <20181012193256.13735-1-josef@toxicpanda.com>

We're getting a lockdep splat because we take the dio_sem under the
log_mutex.  What we really need is to protect fsync() from logging an
extent map for an extent we never waited on, so take the dio_sem higher
up, in btrfs_sync_file(), and guard the whole thing with it.

======================================================
WARNING: possible circular locking dependency detected
4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411 Not tainted
------------------------------------------------------
aio-dio-invalid/30928 is trying to acquire lock:
0000000092621cfd (&mm->mmap_sem){++++}, at: get_user_pages_unlocked+0x5a/0x1e0

but task is already holding lock:
00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #5 (&ei->dio_sem){++++}:
       lock_acquire+0xbd/0x220
       down_write+0x51/0xb0
       btrfs_log_changed_extents+0x80/0xa40
       btrfs_log_inode+0xbaf/0x1000
       btrfs_log_inode_parent+0x26f/0xa80
       btrfs_log_dentry_safe+0x50/0x70
       btrfs_sync_file+0x357/0x540
       do_fsync+0x38/0x60
       __ia32_sys_fdatasync+0x12/0x20
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #4 (&ei->log_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_record_unlink_dir+0x2a/0xa0
       btrfs_unlink+0x5a/0xc0
       vfs_unlink+0xb1/0x1a0
       do_unlinkat+0x264/0x2b0
       do_fast_syscall_32+0x9a/0x2f0
       entry_SYSENTER_compat+0x84/0x96

-> #3 (sb_internal#2){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       start_transaction+0x3e6/0x590
       btrfs_evict_inode+0x475/0x640
       evict+0xbf/0x1b0
       btrfs_run_delayed_iputs+0x6c/0x90
       cleaner_kthread+0x124/0x1a0
       kthread+0x106/0x140
       ret_from_fork+0x3a/0x50

-> #2 (&fs_info->cleaner_delayed_iput_mutex){+.+.}:
       lock_acquire+0xbd/0x220
       __mutex_lock+0x86/0xa10
       btrfs_alloc_data_chunk_ondemand+0x197/0x530
       btrfs_check_data_free_space+0x4c/0x90
       btrfs_delalloc_reserve_space+0x20/0x60
       btrfs_page_mkwrite+0x87/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #1 (sb_pagefaults){.+.+}:
       lock_acquire+0xbd/0x220
       __sb_start_write+0x14d/0x230
       btrfs_page_mkwrite+0x6a/0x520
       do_page_mkwrite+0x31/0xa0
       __handle_mm_fault+0x799/0xb00
       handle_mm_fault+0x7c/0xe0
       __do_page_fault+0x1d3/0x4a0
       async_page_fault+0x1e/0x30

-> #0 (&mm->mmap_sem){++++}:
       __lock_acquire+0x42e/0x7a0
       lock_acquire+0xbd/0x220
       down_read+0x48/0xb0
       get_user_pages_unlocked+0x5a/0x1e0
       get_user_pages_fast+0xa4/0x150
       iov_iter_get_pages+0xc3/0x340
       do_direct_IO+0xf93/0x1d70
       __blockdev_direct_IO+0x32d/0x1c20
       btrfs_direct_IO+0x227/0x400
       generic_file_direct_write+0xcf/0x180
       btrfs_file_write_iter+0x308/0x58c
       aio_write+0xf8/0x1d0
       io_submit_one+0x3a9/0x620
       __ia32_compat_sys_io_submit+0xb2/0x270
       do_int80_syscall_32+0x5b/0x1a0
       entry_INT80_compat+0x88/0xa0

other info that might help us debug this:

Chain exists of:
  &mm->mmap_sem --> &ei->log_mutex --> &ei->dio_sem

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&ei->dio_sem);
                               lock(&ei->log_mutex);
                               lock(&ei->dio_sem);
  lock(&mm->mmap_sem);

 *** DEADLOCK ***

1 lock held by aio-dio-invalid/30928:
 #0: 00000000cefe6b35 (&ei->dio_sem){++++}, at: btrfs_direct_IO+0x3be/0x400

stack backtrace:
CPU: 0 PID: 30928 Comm: aio-dio-invalid Not tainted 4.18.0-rc4-xfstests-00025-g5de5edbaf1d4 #411
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
Call Trace:
 dump_stack+0x7c/0xbb
 print_circular_bug.isra.37+0x297/0x2a4
 check_prev_add.constprop.45+0x781/0x7a0
 ? __lock_acquire+0x42e/0x7a0
 validate_chain.isra.41+0x7f0/0xb00
 __lock_acquire+0x42e/0x7a0
 lock_acquire+0xbd/0x220
 ? get_user_pages_unlocked+0x5a/0x1e0
 down_read+0x48/0xb0
 ? get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_unlocked+0x5a/0x1e0
 get_user_pages_fast+0xa4/0x150
 iov_iter_get_pages+0xc3/0x340
 do_direct_IO+0xf93/0x1d70
 ? __alloc_workqueue_key+0x358/0x490
 ? __blockdev_direct_IO+0x14b/0x1c20
 __blockdev_direct_IO+0x32d/0x1c20
 ? btrfs_run_delalloc_work+0x40/0x40
 ? can_nocow_extent+0x490/0x490
 ? kvm_clock_read+0x1f/0x30
 ? can_nocow_extent+0x490/0x490
 ? btrfs_run_delalloc_work+0x40/0x40
 btrfs_direct_IO+0x227/0x400
 ? btrfs_run_delalloc_work+0x40/0x40
 generic_file_direct_write+0xcf/0x180
 btrfs_file_write_iter+0x308/0x58c
 aio_write+0xf8/0x1d0
 ? kvm_clock_read+0x1f/0x30
 ? __might_fault+0x3e/0x90
 io_submit_one+0x3a9/0x620
 ? io_submit_one+0xe5/0x620
 __ia32_compat_sys_io_submit+0xb2/0x270
 do_int80_syscall_32+0x5b/0x1a0
 entry_INT80_compat+0x88/0xa0
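To show why moving the down_write() out from under log_mutex breaks the cycle, here is a toy lock-order graph in userspace C. It is a hypothetical sketch, not how lockdep is implemented. It records only the edges from the chain reported above (#1 to #5), with made-up enum names, and asks whether the #0 acquisition (mmap_sem while holding dio_sem) would close a cycle. In the "patched" model the #5 edge log_mutex -> dio_sem is replaced by dio_sem -> log_mutex, the order btrfs_sync_file() establishes once it takes dio_sem before logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical identifiers for the locks named in the splat. */
enum lock_id {
	MMAP_SEM,           /* &mm->mmap_sem */
	SB_PAGEFAULTS,      /* sb_pagefaults */
	CLEANER_IPUT_MUTEX, /* &fs_info->cleaner_delayed_iput_mutex */
	SB_INTERNAL,        /* sb_internal */
	LOG_MUTEX,          /* &ei->log_mutex */
	DIO_SEM,            /* &ei->dio_sem */
	NR_LOCKS
};

/* order[a][b] is true once some path has taken b while holding a. */
static bool order[NR_LOCKS][NR_LOCKS];

void record(enum lock_id held, enum lock_id next)
{
	assert(held < NR_LOCKS && next < NR_LOCKS);
	order[held][next] = true;
}

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(enum lock_id from, enum lock_id to, bool seen[NR_LOCKS])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int n = 0; n < NR_LOCKS; n++)
		if (order[from][n] && !seen[n] &&
		    reachable((enum lock_id)n, to, seen))
			return true;
	return false;
}

/* Taking 'next' while holding 'held' closes a cycle if 'next' already reaches 'held'. */
bool would_deadlock(enum lock_id held, enum lock_id next)
{
	bool seen[NR_LOCKS] = { false };

	return reachable(next, held, seen);
}

/* Rebuild chain entries #1..#5; 'patched' flips the fsync edge. */
void model_chain(bool patched)
{
	memset(order, 0, sizeof(order));
	record(MMAP_SEM, SB_PAGEFAULTS);           /* #1: btrfs_page_mkwrite */
	record(SB_PAGEFAULTS, CLEANER_IPUT_MUTEX); /* #2: btrfs_alloc_data_chunk_ondemand */
	record(CLEANER_IPUT_MUTEX, SB_INTERNAL);   /* #3: btrfs_evict_inode via the cleaner */
	record(SB_INTERNAL, LOG_MUTEX);            /* #4: btrfs_record_unlink_dir */
	if (patched)
		record(DIO_SEM, LOG_MUTEX);  /* after: btrfs_sync_file takes dio_sem first */
	else
		record(LOG_MUTEX, DIO_SEM);  /* #5 before: btrfs_log_changed_extents */
}
```

With model_chain(false), would_deadlock(DIO_SEM, MMAP_SEM) finds the path mmap_sem -> ... -> log_mutex -> dio_sem and reports a cycle; with model_chain(true) nothing leads back to dio_sem, so the #0 acquisition is accepted.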

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/file.c     | 12 ++++++++++++
 fs/btrfs/tree-log.c |  2 --
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 095f0bb86bb7..c07110edb9de 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2079,6 +2079,14 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		goto out;
 
 	inode_lock(inode);
+
+	/*
+	 * We take the dio_sem here because the tree log stuff can race with
+	 * lockless dio writes and get an extent map logged for an extent we
+	 * never waited on.  We need it this high up for lockdep reasons.
+	 */
+	down_write(&BTRFS_I(inode)->dio_sem);
+
 	atomic_inc(&root->log_batch);
 
 	/*
@@ -2087,6 +2095,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 */
 	ret = btrfs_wait_ordered_range(inode, start, len);
 	if (ret) {
+		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
 		goto out;
 	}
@@ -2110,6 +2119,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 		 * checked called fsync.
 		 */
 		ret = filemap_check_wb_err(inode->i_mapping, file->f_wb_err);
+		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
 		goto out;
 	}
@@ -2128,6 +2138,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	trans = btrfs_start_transaction(root, 0);
 	if (IS_ERR(trans)) {
 		ret = PTR_ERR(trans);
+		up_write(&BTRFS_I(inode)->dio_sem);
 		inode_unlock(inode);
 		goto out;
 	}
@@ -2149,6 +2160,7 @@ int btrfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	 * file again, but that will end up using the synchronization
 	 * inside btrfs_sync_log to keep things safe.
 	 */
+	up_write(&BTRFS_I(inode)->dio_sem);
 	inode_unlock(inode);
 
 	/*
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 1650dc44a5e3..66b7e059b765 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4374,7 +4374,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 
 	INIT_LIST_HEAD(&extents);
 
-	down_write(&inode->dio_sem);
 	write_lock(&tree->lock);
 	test_gen = root->fs_info->last_trans_committed;
 	logged_start = start;
@@ -4440,7 +4439,6 @@ static int btrfs_log_changed_extents(struct btrfs_trans_handle *trans,
 	}
 	WARN_ON(!list_empty(&extents));
 	write_unlock(&tree->lock);
-	up_write(&inode->dio_sem);
 
 	btrfs_release_path(path);
 	if (!ret)
-- 
2.14.3

