All of lore.kernel.org
 help / color / mirror / Atom feed
From: Filipe Manana <fdmanana@kernel.org>
To: Liu Bo <bo.li.liu@oracle.com>
Cc: "linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl
Date: Thu, 4 Feb 2016 00:33:18 +0000	[thread overview]
Message-ID: <CAL3q7H7w0h6=GAP=ZLYgVa1sgv9kzt61XkEchbVG81H1Vd2kAA@mail.gmail.com> (raw)
In-Reply-To: <20160204002813.GC13006@localhost.localdomain>

On Thu, Feb 4, 2016 at 12:28 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
> On Thu, Feb 04, 2016 at 12:23:12AM +0000, Filipe Manana wrote:
>> On Thu, Feb 4, 2016 at 12:20 AM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> > On Wed, Feb 03, 2016 at 11:24:33PM +0000, Filipe Manana wrote:
>> >> On Wed, Feb 3, 2016 at 11:19 PM, Liu Bo <bo.li.liu@oracle.com> wrote:
>> >> > On Wed, Feb 03, 2016 at 08:26:50PM +0000, fdmanana@kernel.org wrote:
>> >> >> From: Filipe Manana <fdmanana@suse.com>
>> >> >>
>> >> >> While doing some tests I ran into an hang on an extent buffer's rwlock
>> >> >> that produced the following trace:
>> >> >>
>> >> >> [39389.800012] NMI watchdog: BUG: soft lockup - CPU#15 stuck for 22s! [fdm-stress:32166]
>> >> >> [39389.800016] NMI watchdog: BUG: soft lockup - CPU#14 stuck for 22s! [fdm-stress:32165]
>> >> >> [39389.800016] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
>> >> >> [39389.800016] irq event stamp: 0
>> >> >> [39389.800016] hardirqs last  enabled at (0): [<          (null)>]           (null)
>> >> >> [39389.800016] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
>> >> >> [39389.800016] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
>> >> >> [39389.800016] softirqs last disabled at (0): [<          (null)>]           (null)
>> >> >> [39389.800016] CPU: 14 PID: 32165 Comm: btrfs Not tainted 4.4.0-rc6-btrfs-next-18+ #1
>> >> >> [39389.800016] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
>> >> >> [39389.800016] task: ffff880175b1ca40 ti: ffff8800a185c000 task.ti: ffff8800a185c000
>> >> >> [39389.800016] RIP: 0010:[<ffffffff810902af>]  [<ffffffff810902af>] queued_spin_lock_slowpath+0x57/0x158
>> >> >> [39389.800016] RSP: 0018:ffff8800a185fb80  EFLAGS: 00000202
>> >> >> [39389.800016] RAX: 0000000000000101 RBX: ffff8801710c4e9c RCX: 0000000000000101
>> >> >> [39389.800016] RDX: 0000000000000100 RSI: 0000000000000001 RDI: 0000000000000001
>> >> >> [39389.800016] RBP: ffff8800a185fb98 R08: 0000000000000001 R09: 0000000000000000
>> >> >> [39389.800016] R10: ffff8800a185fb68 R11: 6db6db6db6db6db7 R12: ffff8801710c4e98
>> >> >> [39389.800016] R13: ffff880175b1ca40 R14: ffff8800a185fc10 R15: ffff880175b1ca40
>> >> >> [39389.800016] FS:  00007f6d37fff700(0000) GS:ffff8802be9c0000(0000) knlGS:0000000000000000
>> >> >> [39389.800016] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> >> [39389.800016] CR2: 00007f6d300019b8 CR3: 0000000037c93000 CR4: 00000000001406e0
>> >> >> [39389.800016] Stack:
>> >> >> [39389.800016]  ffff8801710c4e98 ffff8801710c4e98 ffff880175b1ca40 ffff8800a185fbb0
>> >> >> [39389.800016]  ffffffff81091e11 ffff8801710c4e98 ffff8800a185fbc8 ffffffff81091895
>> >> >> [39389.800016]  ffff8801710c4e98 ffff8800a185fbe8 ffffffff81486c5c ffffffffa067288c
>> >> >> [39389.800016] Call Trace:
>> >> >> [39389.800016]  [<ffffffff81091e11>] queued_read_lock_slowpath+0x46/0x60
>> >> >> [39389.800016]  [<ffffffff81091895>] do_raw_read_lock+0x3e/0x41
>> >> >> [39389.800016]  [<ffffffff81486c5c>] _raw_read_lock+0x3d/0x44
>> >> >> [39389.800016]  [<ffffffffa067288c>] ? btrfs_tree_read_lock+0x54/0x125 [btrfs]
>> >> >> [39389.800016]  [<ffffffffa067288c>] btrfs_tree_read_lock+0x54/0x125 [btrfs]
>> >> >> [39389.800016]  [<ffffffffa0622ced>] ? btrfs_find_item+0xa7/0xd2 [btrfs]
>> >> >> [39389.800016]  [<ffffffffa069363f>] btrfs_ref_to_path+0xd6/0x174 [btrfs]
>> >> >> [39389.800016]  [<ffffffffa0693730>] inode_to_path+0x53/0xa2 [btrfs]
>> >> >> [39389.800016]  [<ffffffffa0693e2e>] paths_from_inode+0x117/0x2ec [btrfs]
>> >> >> [39389.800016]  [<ffffffffa0670cff>] btrfs_ioctl+0xd5b/0x2793 [btrfs]
>> >> >> [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
>> >> >> [39389.800016]  [<ffffffff81276727>] ? __this_cpu_preempt_check+0x13/0x15
>> >> >> [39389.800016]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
>> >> >> [39389.800016]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
>> >> >> [39389.800016]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
>> >> >> [39389.800016]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
>> >> >> [39389.800016]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
>> >> >> [39389.800016]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
>> >> >> [39389.800016] Code: b9 01 01 00 00 f7 c6 00 ff ff ff 75 32 83 fe 01 89 ca 89 f0 0f 45 d7 f0 0f b1 13 39 f0 74 04 89 c6 eb e2 ff ca 0f 84 fa 00 00 00 <8b> 03 84 c0 74 04 f3 90 eb f6 66 c7 03 01 00 e9 e6 00 00 00 e8
>> >> >> [39389.800012] Modules linked in: btrfs dm_mod ppdev xor sha256_generic hmac raid6_pq drbg ansi_cprng aesni_intel i2c_piix4 acpi_cpufreq aes_x86_64 ablk_helper tpm_tis parport_pc i2c_core sg cryptd evdev psmouse lrw tpm parport gf128mul serio_raw pcspkr glue_helper processor button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring crc32c_intel scsi_mod e1000 virtio floppy [last unloaded: btrfs]
>> >> >> [39389.800012] irq event stamp: 0
>> >> >> [39389.800012] hardirqs last  enabled at (0): [<          (null)>]           (null)
>> >> >> [39389.800012] hardirqs last disabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
>> >> >> [39389.800012] softirqs last  enabled at (0): [<ffffffff8104e58d>] copy_process+0x638/0x1a35
>> >> >> [39389.800012] softirqs last disabled at (0): [<          (null)>]           (null)
>> >> >> [39389.800012] CPU: 15 PID: 32166 Comm: btrfs Tainted: G             L  4.4.0-rc6-btrfs-next-18+ #1
>> >> >> [39389.800012] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
>> >> >> [39389.800012] task: ffff880179294380 ti: ffff880034a60000 task.ti: ffff880034a60000
>> >> >> [39389.800012] RIP: 0010:[<ffffffff81091e8d>]  [<ffffffff81091e8d>] queued_write_lock_slowpath+0x62/0x72
>> >> >> [39389.800012] RSP: 0018:ffff880034a639f0  EFLAGS: 00000206
>> >> >> [39389.800012] RAX: 0000000000000101 RBX: ffff8801710c4e98 RCX: 0000000000000000
>> >> >> [39389.800012] RDX: 00000000000000ff RSI: 0000000000000000 RDI: ffff8801710c4e9c
>> >> >> [39389.800012] RBP: ffff880034a639f8 R08: 0000000000000001 R09: 0000000000000000
>> >> >> [39389.800012] R10: ffff880034a639b0 R11: 0000000000001000 R12: ffff8801710c4e98
>> >> >> [39389.800012] R13: 0000000000000001 R14: ffff880172cbc000 R15: ffff8801710c4e00
>> >> >> [39389.800012] FS:  00007f6d377fe700(0000) GS:ffff8802be9e0000(0000) knlGS:0000000000000000
>> >> >> [39389.800012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> >> [39389.800012] CR2: 00007f6d3d3c1000 CR3: 0000000037c93000 CR4: 00000000001406e0
>> >> >> [39389.800012] Stack:
>> >> >> [39389.800012]  ffff8801710c4e98 ffff880034a63a10 ffffffff81091963 ffff8801710c4e98
>> >> >> [39389.800012]  ffff880034a63a30 ffffffff81486f1b ffffffffa0672cb3 ffff8801710c4e00
>> >> >> [39389.800012]  ffff880034a63a78 ffffffffa0672cb3 ffff8801710c4e00 ffff880034a63a58
>> >> >> [39389.800012] Call Trace:
>> >> >> [39389.800012]  [<ffffffff81091963>] do_raw_write_lock+0x72/0x8c
>> >> >> [39389.800012]  [<ffffffff81486f1b>] _raw_write_lock+0x3a/0x41
>> >> >> [39389.800012]  [<ffffffffa0672cb3>] ? btrfs_tree_lock+0x119/0x251 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa0672cb3>] btrfs_tree_lock+0x119/0x251 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa061aeba>] ? rcu_read_unlock+0x5b/0x5d [btrfs]
>> >> >> [39389.800012]  [<ffffffffa061ce13>] ? btrfs_root_node+0xda/0xe6 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa061ce83>] btrfs_lock_root_node+0x22/0x42 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa062046b>] btrfs_search_slot+0x1b8/0x758 [btrfs]
>> >> >> [39389.800012]  [<ffffffff810fc6b0>] ? time_hardirqs_on+0x15/0x28
>> >> >> [39389.800012]  [<ffffffffa06365db>] btrfs_lookup_inode+0x31/0x95 [btrfs]
>> >> >> [39389.800012]  [<ffffffff8108d62f>] ? trace_hardirqs_on+0xd/0xf
>> >> >> [39389.800012]  [<ffffffff8148482b>] ? mutex_lock_nested+0x397/0x3bc
>> >> >> [39389.800012]  [<ffffffffa068821b>] __btrfs_update_delayed_inode+0x59/0x1c0 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa068858e>] __btrfs_commit_inode_delayed_items+0x194/0x5aa [btrfs]
>> >> >> [39389.800012]  [<ffffffff81486ab7>] ? _raw_spin_unlock+0x31/0x44
>> >> >> [39389.800012]  [<ffffffffa0688a48>] __btrfs_run_delayed_items+0xa4/0x15c [btrfs]
>> >> >> [39389.800012]  [<ffffffffa0688d62>] btrfs_run_delayed_items+0x11/0x13 [btrfs]
>> >> >> [39389.800012]  [<ffffffffa064048e>] btrfs_commit_transaction+0x234/0x96e [btrfs]
>> >> >> [39389.800012]  [<ffffffffa0618d10>] btrfs_sync_fs+0x145/0x1ad [btrfs]
>> >> >> [39389.800012]  [<ffffffffa0671176>] btrfs_ioctl+0x11d2/0x2793 [btrfs]
>> >> >> [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
>> >> >> [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
>> >> >> [39389.800012]  [<ffffffff81140261>] ? __might_fault+0x4c/0xa7
>> >> >> [39389.800012]  [<ffffffff8108a8b0>] ? arch_local_irq_save+0x9/0xc
>> >> >> [39389.800012]  [<ffffffff8118b3d4>] ? rcu_read_unlock+0x3e/0x5d
>> >> >> [39389.800012]  [<ffffffff811822f8>] do_vfs_ioctl+0x42b/0x4ea
>> >> >> [39389.800012]  [<ffffffff8118b4f3>] ? __fget_light+0x62/0x71
>> >> >> [39389.800012]  [<ffffffff8118240e>] SyS_ioctl+0x57/0x79
>> >> >> [39389.800012]  [<ffffffff814872d7>] entry_SYSCALL_64_fastpath+0x12/0x6f
>> >> >> [39389.800012] Code: f0 0f b1 13 85 c0 75 ef eb 2a f3 90 8a 03 84 c0 75 f8 f0 0f b0 13 84 c0 75 f0 ba ff 00 00 00 eb 0a f0 0f b1 13 ff c8 74 0b f3 90 <8b> 03 83 f8 01 75 f7 eb ed c6 43 04 00 5b 5d c3 0f 1f 44 00 00
>> >> >>
>> >> >> This happens because in the code path executed by the inode_paths ioctl we
>> >> >> end up nesting two calls to read lock a leaf's rwlock when after the first
>> >> >> call to read_lock() and before the second call to read_lock(), another
>> >> >> task (running the delayed items as part of a transaction commit) has
>> >> >> already called write_lock() against the leaf's rwlock. This situation is
>> >> >> illustrated by the following diagram:
>> >> >>
>> >> >>          Task A                       Task B
>> >> >>
>> >> >>   btrfs_ref_to_path()               btrfs_commit_transaction()
>> >> >>     read_lock(&eb->lock);
>> >> >>
>> >> >>                                       btrfs_run_delayed_items()
>> >> >>                                         __btrfs_commit_inode_delayed_items()
>> >> >>                                           __btrfs_update_delayed_inode()
>> >> >>                                             btrfs_lookup_inode()
>> >> >>
>> >> >>                                               write_lock(&eb->lock);
>> >> >>                                                 --> task waits for lock
>> >> >>
>> >> >>     read_lock(&eb->lock);
>> >> >>     --> makes this task hang
>> >> >>         forever (and task B too
>> >> >>       of course)
>> >> >>
>> >> >> So fix this by avoiding doing the nested read lock, which is easily
>> >> >> avoidable. This issue does not happen if task B calls write_lock() after
>> >> >> task A does the second call to read_lock(), however there does not seem
>> >> >> to exist anything in the documentation that mentions what is the expected
>> >> >> behaviour for recursive locking of rwlocks (leaving the idea that doing
>> >> >> so is not a good usage of rwlocks).
>> >> >
>> >> > iterate_inode_refs() does it in anothe way, it's using
>> >> > btrfs_clone_extent_buffer() to keep btrfs from similar deadlock,
>> >>
>> >> Yeah, I know, I made that fix long time ago.
>> >>
>> >> > it creates a clone so we don't have trouble on the original eb's read
>> >> > lock, it's more reasonable to me.
>> >>
>> >> Here it would be overkill to do similar approach, and it's a different
>> >> problem, we're not attempting to lock one leaf while having some other
>> >> leaf locked. We just want to transition a lock from spinning to
>> >> blocking mode - so it was silly to double read lock, we can do
>> >> directly the transition and make sure the release path call below
>> >> doesn't release the leaf. No need to waste memory and time cloning a
>> >> leaf.
>> >
>> > Correct, I missed that fact, so the patch is good.
>> >
>> > Since we're passing a clone eb into btrfs_ref_to_path(), the check 'if
>> > (eb != eb_in) will only be false for the first time when it enters into
>> > while(), maybe we can add 'likely' preference for it.
>>
>> What do you mean by 'likely' preference? The macro likely/unlikely?
>
> Exactly, the macro likely/unlikely.

What's the point? Makes the code more unreadable and most often has
negative effects (and cpus are good doing branch predictions nowadays
I believe). Plus this isn't a performance critical piece of code (if
it was, btrfs would be the fastest fs ever). Also it's not necessarily
always true only for the first iteration, when used from send for
example, eb_in isn't a cloned eb.

Thanks.

>
> Thanks,
>
> -liubo
>
>>
>>
>> >
>> > Thanks,
>> >
>> > -liubo
>> >
>> >>
>> >> thanks
>> >>
>> >> >
>> >> > Thanks,
>> >> >
>> >> > -liubo
>> >> >
>> >> >>
>> >> >> Cc: stable@vger.kernel.org
>> >> >> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> >> >> ---
>> >> >>  fs/btrfs/backref.c | 4 ++--
>> >> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >> >>
>> >> >> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
>> >> >> index b90cd37..198a0f8 100644
>> >> >> --- a/fs/btrfs/backref.c
>> >> >> +++ b/fs/btrfs/backref.c
>> >> >> @@ -1426,9 +1426,9 @@ char *btrfs_ref_to_path(struct btrfs_root *fs_root, struct btrfs_path *path,
>> >> >>               eb = path->nodes[0];
>> >> >>               /* make sure we can use eb after releasing the path */
>> >> >>               if (eb != eb_in) {
>> >> >> -                     atomic_inc(&eb->refs);
>> >> >> -                     btrfs_tree_read_lock(eb);
>> >> >>                       btrfs_set_lock_blocking_rw(eb, BTRFS_READ_LOCK);
>> >> >> +                     path->nodes[0] = NULL;
>> >> >> +                     path->locks[0] = 0;
>> >> >>               }
>> >> >>               btrfs_release_path(path);
>> >> >>               iref = btrfs_item_ptr(eb, slot, struct btrfs_inode_ref);
>> >> >> --
>> >> >> 2.7.0.rc3
>> >> >>
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2016-02-04  0:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-03 20:26 [PATCH] Btrfs: fix hang on extent buffer lock caused by the inode_paths ioctl fdmanana
2016-02-03 23:19 ` Liu Bo
2016-02-03 23:24   ` Filipe Manana
2016-02-03 23:50     ` Chris Mason
2016-02-04  0:20     ` Liu Bo
2016-02-04  0:23       ` Filipe Manana
2016-02-04  0:28         ` Liu Bo
2016-02-04  0:33           ` Filipe Manana [this message]
2016-02-04 23:03 ` Liu Bo
2016-02-05  2:22 ` [PATCH v2] " fdmanana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAL3q7H7w0h6=GAP=ZLYgVa1sgv9kzt61XkEchbVG81H1Vd2kAA@mail.gmail.com' \
    --to=fdmanana@kernel.org \
    --cc=bo.li.liu@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.