From mboxrd@z Thu Jan 1 00:00:00 1970
From: Josef Bacik
Subject: Re: [PATCH] sync: wait_sb_inodes() calls iput() with spinlock held (was Re: [PATCH 0/7] super block scalabilit patches V3)
Date: Mon, 22 Jun 2015 09:21:03 -0700
Message-ID: <558835EF.2000000@fb.com>
References: <1434051673-13838-1-git-send-email-jbacik@fb.com> <20150615213429.GB10224@dastard> <20150622022648.GO10224@dastard>
Mime-Version: 1.0
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Cc: , , , ,
To: Dave Chinner
Return-path:
Received: from mx0a-00082601.pphosted.com ([67.231.145.42]:53939 "EHLO mx0a-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753499AbbFVQVf (ORCPT ); Mon, 22 Jun 2015 12:21:35 -0400
In-Reply-To: <20150622022648.GO10224@dastard>
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

On 06/21/2015 07:26 PM, Dave Chinner wrote:
> On Tue, Jun 16, 2015 at 07:34:29AM +1000, Dave Chinner wrote:
>> On Thu, Jun 11, 2015 at 03:41:05PM -0400, Josef Bacik wrote:
>>> Here are the cleaned up versions of Dave Chinner's super block scalability
>>> patches. I've been testing them locally for a while and they are pretty solid.
>>> They fix a few big issues, such as the global inode list and soft lockups on
>>> boxes on unmount that have lots of inodes in cache. Al if you would consider
>>> pulling these in that would be great, you can pull from here
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/josef/btrfs-next.git superblock-scaling
>>
>> Passes all my smoke tests.
>>
>> Tested-by: Dave Chinner
>
> FWIW, I just updated my trees to whatever is in the above branch and
> v4.1-rc8, and now I'm seeing problems with wb.list_lock recursion
> and "sleeping in atomic" scheduling issues. generic/269 produced
> this:
>
> BUG: spinlock cpu recursion on CPU#1, fsstress/3852
> lock: 0xffff88042a650c28, .magic: dead4ead, .owner: fsstress/3804, .owner_cpu: 1
> CPU: 1 PID: 3852 Comm: fsstress Tainted: G W 4.1.0-rc8-dgc+ #263
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> ffff88042a650c28 ffff88039898b8e8 ffffffff81e18ffd ffff88042f250fb0
> ffff880428f6b8e0 ffff88039898b908 ffffffff81e12f09 ffff88042a650c28
> ffffffff8221337b ffff88039898b928 ffffffff81e12f34 ffff88042a650c28
> Call Trace:
> [] dump_stack+0x4c/0x6e
> [] spin_dump+0x90/0x95
> [] spin_bug+0x26/0x2b
> [] do_raw_spin_lock+0x10d/0x150
> [] _raw_spin_lock+0x15/0x20
> [] __mark_inode_dirty+0x2b0/0x450
> [] __set_page_dirty+0x78/0xd0
> [] mark_buffer_dirty+0x61/0xf0
> [] __block_commit_write.isra.24+0x81/0xb0
> [] block_write_end+0x36/0x70
> [] ? __xfs_get_blocks+0x8a0/0x8a0
> [] generic_write_end+0x34/0xb0
> [] ? wait_for_stable_page+0x1d/0x50
> [] xfs_vm_write_end+0x67/0xc0
> [] pagecache_write_end+0x1f/0x30
> [] xfs_iozero+0x10d/0x190
> [] xfs_zero_last_block+0xdb/0x110
> [] xfs_zero_eof+0x11a/0x290
> [] ? complete_walk+0x60/0x100
> [] ? path_lookupat+0x5f/0x660
> [] xfs_file_aio_write_checks+0x13e/0x160
> [] xfs_file_buffered_aio_write+0x75/0x250
> [] ? user_path_at_empty+0x5f/0xa0
> [] ? __might_sleep+0x4d/0x90
> [] xfs_file_write_iter+0x105/0x120
> [] __vfs_write+0xae/0xf0
> [] vfs_write+0xa1/0x190
> [] SyS_write+0x49/0xb0
> [] ? SyS_lseek+0x91/0xb0
> [] system_call_fastpath+0x12/0x71
>
> And there are a few tests (including generic/269) producing
> in_atomic/"scheduling while atomic" bugs in the evict() path such as:
>
> in_atomic(): 1, irqs_disabled(): 0, pid: 3852, name: fsstress
> CPU: 12 PID: 3852 Comm: fsstress Not tainted 4.1.0-rc8-dgc+ #263
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> 000000000000015d ffff88039898b6d8 ffffffff81e18ffd 0000000000000000
> ffff880398865550 ffff88039898b6f8 ffffffff810c5f89 ffff8803f15c45c0
> ffffffff8227a3bf ffff88039898b728 ffffffff810c601d ffff88039898b758
> Call Trace:
> [] dump_stack+0x4c/0x6e
> [] ___might_sleep+0xf9/0x140
> [] __might_sleep+0x4d/0x90
> [] block_invalidatepage+0xab/0x140
> [] xfs_vm_invalidatepage+0x39/0xb0
> [] truncate_inode_page+0x67/0xa0
> [] truncate_inode_pages_range+0x1a2/0x6f0
> [] ? find_get_pages_tag+0xf1/0x1b0
> [] ? __switch_to+0x1e3/0x5a0
> [] ? pagevec_lookup_tag+0x25/0x40
> [] ? __inode_wait_for_writeback+0x6d/0xc0
> [] truncate_inode_pages_final+0x4c/0x60
> [] xfs_fs_evict_inode+0x4f/0x100
> [] evict+0xc0/0x1a0
> [] iput+0x1bb/0x220
> [] sync_inodes_sb+0x353/0x3d0
> [] xfs_flush_inodes+0x28/0x40
> [] xfs_create+0x638/0x770
> [] ? xfs_dir2_sf_lookup+0x199/0x330
> [] xfs_generic_create+0xd1/0x300
> [] ? security_inode_permission+0x1c/0x30
> [] xfs_vn_create+0x16/0x20
> [] vfs_create+0xd5/0x140
> [] do_last+0xff3/0x1200
> [] ? path_init+0x186/0x450
> [] path_openat+0x80/0x610
> [] ? xfs_iunlock+0xc4/0x210
> [] do_filp_open+0x3a/0x90
> [] ? getname_flags+0x4f/0x200
> [] ? _raw_spin_unlock+0xe/0x30
> [] ? __alloc_fd+0xa7/0x130
> [] do_sys_open+0x128/0x220
> [] SyS_creat+0x1e/0x20
> [] system_call_fastpath+0x12/0x71
>
> It looks to me like iput() is being called with the wb.list_lock
> held in wait_sb_inodes(), and everything is going downhill from
> there. Patch below fixes the problem for me.
>
> Cheers,
>
> Dave.
>

Thanks Dave, I'll add it. I think this is what we were doing at first but
then I changed it, and I didn't notice the wb.list_lock. Thanks,

Josef
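
Editorial illustration (not part of the original message, and not the patch Dave refers to, which is not quoted in this reply): the bug class described above is dropping the last reference to an object while a spinlock is held, when the final put may sleep or retake the same lock. One common way to avoid it is to pin the current object, release the lock, and only then drop the reference on the previous one. The following is a minimal userspace sketch under those assumptions. All names (struct obj, obj_put, walk_list, list_lock) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode: a refcounted list node. */
struct obj {
    struct obj *next;
    int refcount;
    int evicted;            /* set when the last reference is dropped */
};

/* Hypothetical stand-in for wb.list_lock: a tiny C11 spinlock. */
static atomic_flag list_lock = ATOMIC_FLAG_INIT;
static int lock_held;       /* tracked by hand so obj_put() can check it */

static void lock_list(void)
{
    while (atomic_flag_test_and_set(&list_lock))
        ;                   /* spin until the flag is free */
    lock_held = 1;
}

static void unlock_list(void)
{
    lock_held = 0;
    atomic_flag_clear(&list_lock);
}

/*
 * Stand-in for iput(): the final put may do heavy work (eviction),
 * so this sketch asserts that it is never called under the lock.
 */
static void obj_put(struct obj *o)
{
    assert(!lock_held);
    if (--o->refcount == 0)
        o->evicted = 1;
}

/* Walk the list, deferring each put until the lock has been dropped. */
static int walk_list(struct obj *head)
{
    struct obj *o, *old = NULL;
    int visited = 0;

    lock_list();
    for (o = head; o != NULL; o = o->next) {
        o->refcount++;      /* pin the object so it survives the unlock */
        unlock_list();

        if (old)
            obj_put(old);   /* safe: the lock is not held here */
        old = o;
        visited++;
        /* work that may sleep would go here */

        lock_list();
    }
    unlock_list();

    if (old)
        obj_put(old);       /* drop the last pinned reference, unlocked */
    return visited;
}
```

In this sketch, moving the obj_put(old) call above the unlock_list() call trips the assertion, which mirrors the "sleeping in atomic" report in the trace.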