From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> To: Nick Piggin <npiggin@kernel.dk> Cc: Dave Chinner <david@fromorbit.com>, Nick Piggin <npiggin@gmail.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 00/46] rcu-walk and dcache scaling Date: Sun, 12 Dec 2010 06:54:07 -0800 [thread overview] Message-ID: <20101212145407.GA1969@linux.vnet.ibm.com> (raw) In-Reply-To: <20101210203257.GI2125@linux.vnet.ibm.com> On Fri, Dec 10, 2010 at 12:32:57PM -0800, Paul E. McKenney wrote: > On Wed, Dec 08, 2010 at 06:09:09PM +1100, Nick Piggin wrote: > > On Wed, Dec 08, 2010 at 03:28:16PM +1100, Dave Chinner wrote: > > > On Wed, Dec 08, 2010 at 02:32:12PM +1100, Dave Chinner wrote: > > > > On Wed, Dec 08, 2010 at 12:47:42PM +1100, Nick Piggin wrote: > > > > > On Wed, Dec 8, 2010 at 8:56 AM, Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Sat, Nov 27, 2010 at 09:15:58PM +1100, Nick Piggin wrote: > > > > > >> > > > > > >> git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git vfs-scale-working > > > > > >> > > > > > >> Here is an new set of vfs patches for review, not that there was much interest > > > > > >> last time they were posted. It is structured like: > > > > > >> > > > > > >> * preparation patches > > > > > >> * introduce new locks to take over dcache_lock, then remove it > > > > > >> * cleaning up and reworking things for new locks > > > > > >> * rcu-walk path walking > > > > > >> * start on some fine grained locking steps > > > > > > > > > > > > Stress test doing: > > > > > > > > > > > > single thread 50M inode create > > > > > > single thread rm -rf > > > > > > 2-way 50M inode create > > > > > > 2-way rm -rf > > > > > > 4-way 50M inode create > > > > > > 4-way rm -rf > > > > > > 8-way 50M inode create > > > > > > 8-way rm -rf > > > > > > 8-way 250M inode create > > > > > > 8-way rm -rf > > > > > > > > > > > > Failed about 5 minutes into the "4-way rm -rf" (~3 hours into the test) > > > > > > with a CPU stuck spinning on here: > > > > > > > > > > > > [37372.084012] NMI backtrace for cpu 5 > > > > > > [37372.084012] CPU 5 > > > > > > [37372.084012] Modules linked in: > > > > > > [37372.084012] > > > > > > [37372.084012] Pid: 15214, comm: rm Not tainted 2.6.37-rc4-dgc+ #797 /Bochs > > > > > > [37372.084012] RIP: 0010:[<ffffffff810643c4>] [<ffffffff810643c4>] __ticket_spin_lock+0x14/0x20 > > > > > > [37372.084012] RSP: 0018:ffff880114643c98 EFLAGS: 00000213 > > > > > > [37372.084012] RAX: 0000000000008801 RBX: ffff8800687be6c0 RCX: ffff8800c4eb2688 > > > > > > [37372.084012] RDX: ffff880114643d38 RSI: ffff8800dfd4ea80 RDI: ffff880114643d14 > > > > > > [37372.084012] RBP: ffff880114643c98 R08: 0000000000000003 R09: 0000000000000000 > > > > > > [37372.084012] R10: 0000000000000000 R11: dead000000200200 R12: ffff880114643d14 > > > > > > [37372.084012] R13: ffff880114643cb8 R14: ffff880114643d38 R15: ffff8800687be71c > > > > > > [37372.084012] FS: 00007fd6d7c93700(0000) GS:ffff8800dfd40000(0000) knlGS:0000000000000000 > > > > > > [37372.084012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > > > > [37372.084012] CR2: 0000000000bbd108 CR3: 0000000107146000 CR4: 00000000000006e0 > > > > > > [37372.084012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > > [37372.084012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > > > > [37372.084012] Process rm (pid: 15214, threadinfo ffff880114642000, task ffff88011b16f890) > > > > > > [37372.084012] Stack: > > > > > > [37372.084012] ffff880114643ca8 ffffffff81ad044e ffff880114643cf8 ffffffff81167ae7 > > > > > > [37372.084012] 0000000000000000 ffff880114643d38 000000000000000e ffff88011901d800 > > > > > > [37372.084012] ffff8800cdb7cf5c ffff88011901d8e0 0000000000000000 0000000000000000 > > > > > > [37372.084012] Call Trace: > > > > > > [37372.084012] [<ffffffff81ad044e>] _raw_spin_lock+0xe/0x20 > > > > > > [37372.084012] [<ffffffff81167ae7>] shrink_dentry_list+0x47/0x370 > > > > > > [37372.084012] [<ffffffff81167f5e>] __shrink_dcache_sb+0x14e/0x1e0 > > > > > > [37372.084012] [<ffffffff81168456>] shrink_dcache_parent+0x276/0x2d0 > > > > > > [37372.084012] [<ffffffff81ad044e>] ? _raw_spin_lock+0xe/0x20 > > > > > > [37372.084012] [<ffffffff8115daa2>] dentry_unhash+0x42/0x80 > > > > > > [37372.084012] [<ffffffff8115db48>] vfs_rmdir+0x68/0x100 > > > > > > [37372.084012] [<ffffffff8115fd93>] do_rmdir+0x113/0x130 > > > > > > [37372.084012] [<ffffffff8114f5ad>] ? filp_close+0x5d/0x90 > > > > > > [37372.084012] [<ffffffff8115fde5>] sys_unlinkat+0x35/0x40 > > > > > > [37372.084012] [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b > > > > > > > > > > OK good, with any luck, that's the same bug. > > > > > > > > > > Is this XFS? > > > > > > > > Yes. > > > > > > > > > Is there any concurrent activity happening on the same dentries? > > > > > > > > Not from an application perspective. > > > > > > > > > Ie. are the rm -rf threads running on the same directories, > > > > > > > > No, each thread operating on a different directory. > > > > This is probably fixed by the same patch as the lockdep splat trace. > > > > > > > > > or is there any reclaim happening in the background? > > > > > > > > IIRC, kswapd was consuming about 5-10% of a CPU during parallel > > > > unlink tests. Mainly reclaiming XFS inodes, I think, but there may > > > > be dentry cache reclaim going as well. > > > > > > Turns out that the kswapd peaks are upwards of 50% of a CPU for a > > > few seconds, then idle for 10-15s. Typical perf top output of kswapd > > > while it is active during unlinks is: > > > > > > samples pcnt function DSO > > > _______ _____ ___________________________ _________________ > > > > > > 17168.00 10.2% __call_rcu [kernel.kallsyms] > > > 13223.00 7.8% kmem_cache_free [kernel.kallsyms] > > > 12917.00 7.6% down_write [kernel.kallsyms] > > > 12665.00 7.5% xfs_iunlock [kernel.kallsyms] > > > 10493.00 6.2% xfs_reclaim_inode_grab [kernel.kallsyms] > > > 9314.00 5.5% __lookup_tag [kernel.kallsyms] > > > 9040.00 5.4% radix_tree_delete [kernel.kallsyms] > > > 8694.00 5.1% is_bad_inode [kernel.kallsyms] > > > 7639.00 4.5% __ticket_spin_lock [kernel.kallsyms] > > > 6821.00 4.0% _raw_spin_unlock_irqrestore [kernel.kallsyms] > > > 5484.00 3.2% __d_drop [kernel.kallsyms] > > > 5114.00 3.0% xfs_reclaim_inode [kernel.kallsyms] > > > 4626.00 2.7% __rcu_process_callbacks [kernel.kallsyms] > > > 3556.00 2.1% up_write [kernel.kallsyms] > > > 3206.00 1.9% _cond_resched [kernel.kallsyms] > > > 3129.00 1.9% xfs_qm_dqdetach [kernel.kallsyms] > > > 2327.00 1.4% radix_tree_tag_clear [kernel.kallsyms] > > > 2327.00 1.4% call_rcu_sched [kernel.kallsyms] > > > 2262.00 1.3% __ticket_spin_unlock [kernel.kallsyms] > > > 2215.00 1.3% xfs_ilock [kernel.kallsyms] > > > 2200.00 1.3% radix_tree_gang_lookup_tag [kernel.kallsyms] > > > 1982.00 1.2% xfs_reclaim_inodes_ag [kernel.kallsyms] > > > 1736.00 1.0% xfs_trans_unlocked_item [kernel.kallsyms] > > > 1707.00 1.0% __ticket_spin_trylock [kernel.kallsyms] > > > 1688.00 1.0% xfs_perag_get_tag [kernel.kallsyms] > > > 1660.00 1.0% flat_send_IPI_mask [kernel.kallsyms] > > > 1538.00 0.9% xfs_inode_item_destroy [kernel.kallsyms] > > > 1312.00 0.8% __shrink_dcache_sb [kernel.kallsyms] > > > 940.00 0.6% xfs_perag_put [kernel.kallsyms] > > > > > > So there is some dentry cache reclaim going on. > > > > > > FWIW, it appears there is quite a lot of RCU freeing overhead (~15% > > > more CPU time) in the work kswapd is doing during these unlinks, too. > > > I just had a look at kswapd when a 8-way create is running - it's running at > > > 50-60% of a cpu for seconds at a time. I caught this while it was doing pure > > > XFS inode cache reclaim (~10s sample, kswapd reclaimed ~1M inodes): > > > > > > samples pcnt function DSO > > > _______ _____ ___________________________ _________________ > > > > > > 27171.00 9.0% __call_rcu [kernel.kallsyms] > > > 21491.00 7.1% down_write [kernel.kallsyms] > > > 20916.00 6.9% xfs_reclaim_inode [kernel.kallsyms] > > > 20313.00 6.7% radix_tree_delete [kernel.kallsyms] > > > 15828.00 5.3% kmem_cache_free [kernel.kallsyms] > > > 15819.00 5.2% xfs_idestroy_fork [kernel.kallsyms] > > > 14893.00 4.9% is_bad_inode [kernel.kallsyms] > > > 14666.00 4.9% _raw_spin_unlock_irqrestore [kernel.kallsyms] > > > 14191.00 4.7% xfs_reclaim_inode_grab [kernel.kallsyms] > > > 14105.00 4.7% xfs_iunlock [kernel.kallsyms] > > > 10916.00 3.6% __ticket_spin_lock [kernel.kallsyms] > > > 10125.00 3.4% xfs_iflush_cluster [kernel.kallsyms] > > > 8221.00 2.7% xfs_qm_dqdetach [kernel.kallsyms] > > > 7639.00 2.5% xfs_trans_unlocked_item [kernel.kallsyms] > > > 7028.00 2.3% xfs_synchronize_times [kernel.kallsyms] > > > 6974.00 2.3% up_write [kernel.kallsyms] > > > 5870.00 1.9% call_rcu_sched [kernel.kallsyms] > > > 5634.00 1.9% _cond_resched [kernel.kallsyms] > > > > > > Which is showing a similar amount of RCU overhead as the unlink as above. > > > And this while it was doing dentry cache reclaim (~10s sample): > > > > > > 35921.00 15.7% __d_drop [kernel.kallsyms] > > > 30056.00 13.1% __ticket_spin_trylock [kernel.kallsyms] > > > 29066.00 12.7% __ticket_spin_lock [kernel.kallsyms] > > > 19043.00 8.3% __call_rcu [kernel.kallsyms] > > > 10098.00 4.4% iput [kernel.kallsyms] > > > 7013.00 3.1% __shrink_dcache_sb [kernel.kallsyms] > > > 6774.00 3.0% __percpu_counter_add [kernel.kallsyms] > > > 6708.00 2.9% radix_tree_tag_set [kernel.kallsyms] > > > 5362.00 2.3% xfs_inactive [kernel.kallsyms] > > > 5130.00 2.2% __ticket_spin_unlock [kernel.kallsyms] > > > 4884.00 2.1% call_rcu_sched [kernel.kallsyms] > > > 4621.00 2.0% dentry_lru_del [kernel.kallsyms] > > > 3735.00 1.6% bit_waitqueue [kernel.kallsyms] > > > 3727.00 1.6% dentry_iput [kernel.kallsyms] > > > 3473.00 1.5% shrink_icache_memory [kernel.kallsyms] > > > 3279.00 1.4% kfree [kernel.kallsyms] > > > 3101.00 1.4% xfs_perag_get [kernel.kallsyms] > > > 2516.00 1.1% kmem_cache_free [kernel.kallsyms] > > > 2272.00 1.0% shrink_dentry_list [kernel.kallsyms] > > > > > > I've never really seen any signficant dentry cache reclaim overhead > > > in profiles of these workloads before, so this was a bit of a > > > surprise.... > > > > call_rcu shouldn't be doing much, except for disabling irqs and linking > > the object into the list. I have a patch somewhere to reduce the irq > > disable overhead a bit, but it really shouldn't be doing a lot of work. > > Could you please enable CONFIG_RCU_TRACE, mount debugfs somewhere, and > look at rcu/rcudata? There will be a "ql=" number printed for each > CPU, and if that number is too large, __call_rcu() does take what it > considers to be corrective action, which can incur some overhead. > > If this is the problem, then increasing the value of the qhimark module > parameter might help. > > If this is not the problem, I could make a patch that disables some of > __call_rcu()'s grace-period acceleration code if you are willing to try > it out. Another thing that might help is to reduce the value of CONFIG_RCU_FANOUT to something like 16. If this does help, then there is a reasonably straightforward change I can make to RCU. Thanx, Paul > > Sometimes you find that touching the rcu head field needs to get a > > cacheline exclusive, so a bit of work gets transferred there.... > > > > But it may also be something going a bit wrong in RCU. I blew it up > > once already, after the files_lock splitup that enabled all CPUs to > > create and destroy files :) > > I would certainly like the opportunity to fix any bugs that might be > in RCU... ;-) > > Thanx, Paul
WARNING: multiple messages have this Message-ID (diff)
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> To: Nick Piggin <npiggin@kernel.dk> Cc: Dave Chinner <david@fromorbit.com>, Nick Piggin <npiggin@gmail.com>, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 00/46] rcu-walk and dcache scaling Date: Sun, 12 Dec 2010 06:54:07 -0800 [thread overview] Message-ID: <20101212145407.GA1969@linux.vnet.ibm.com> (raw) In-Reply-To: <20101210203257.GI2125@linux.vnet.ibm.com> On Fri, Dec 10, 2010 at 12:32:57PM -0800, Paul E. McKenney wrote: > On Wed, Dec 08, 2010 at 06:09:09PM +1100, Nick Piggin wrote: > > On Wed, Dec 08, 2010 at 03:28:16PM +1100, Dave Chinner wrote: > > > On Wed, Dec 08, 2010 at 02:32:12PM +1100, Dave Chinner wrote: > > > > On Wed, Dec 08, 2010 at 12:47:42PM +1100, Nick Piggin wrote: > > > > > On Wed, Dec 8, 2010 at 8:56 AM, Dave Chinner <david@fromorbit.com> wrote: > > > > > > On Sat, Nov 27, 2010 at 09:15:58PM +1100, Nick Piggin wrote: > > > > > >> > > > > > >> git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin.git vfs-scale-working > > > > > >> > > > > > >> Here is an new set of vfs patches for review, not that there was much interest > > > > > >> last time they were posted. It is structured like: > > > > > >> > > > > > >> * preparation patches > > > > > >> * introduce new locks to take over dcache_lock, then remove it > > > > > >> * cleaning up and reworking things for new locks > > > > > >> * rcu-walk path walking > > > > > >> * start on some fine grained locking steps > > > > > > > > > > > > Stress test doing: > > > > > > > > > > > > single thread 50M inode create > > > > > > single thread rm -rf > > > > > > 2-way 50M inode create > > > > > > 2-way rm -rf > > > > > > 4-way 50M inode create > > > > > > 4-way rm -rf > > > > > > 8-way 50M inode create > > > > > > 8-way rm -rf > > > > > > 8-way 250M inode create > > > > > > 8-way rm -rf > > > > > > > > > > > > Failed about 5 minutes into the "4-way rm -rf" (~3 hours into the test) > > > > > > with a CPU stuck spinning on here: > > > > > > > > > > > > [37372.084012] NMI backtrace for cpu 5 > > > > > > [37372.084012] CPU 5 > > > > > > [37372.084012] Modules linked in: > > > > > > [37372.084012] > > > > > > [37372.084012] Pid: 15214, comm: rm Not tainted 2.6.37-rc4-dgc+ #797 /Bochs > > > > > > [37372.084012] RIP: 0010:[<ffffffff810643c4>] [<ffffffff810643c4>] __ticket_spin_lock+0x14/0x20 > > > > > > [37372.084012] RSP: 0018:ffff880114643c98 EFLAGS: 00000213 > > > > > > [37372.084012] RAX: 0000000000008801 RBX: ffff8800687be6c0 RCX: ffff8800c4eb2688 > > > > > > [37372.084012] RDX: ffff880114643d38 RSI: ffff8800dfd4ea80 RDI: ffff880114643d14 > > > > > > [37372.084012] RBP: ffff880114643c98 R08: 0000000000000003 R09: 0000000000000000 > > > > > > [37372.084012] R10: 0000000000000000 R11: dead000000200200 R12: ffff880114643d14 > > > > > > [37372.084012] R13: ffff880114643cb8 R14: ffff880114643d38 R15: ffff8800687be71c > > > > > > [37372.084012] FS: 00007fd6d7c93700(0000) GS:ffff8800dfd40000(0000) knlGS:0000000000000000 > > > > > > [37372.084012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > > > > > [37372.084012] CR2: 0000000000bbd108 CR3: 0000000107146000 CR4: 00000000000006e0 > > > > > > [37372.084012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > > [37372.084012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > > > > > [37372.084012] Process rm (pid: 15214, threadinfo ffff880114642000, task ffff88011b16f890) > > > > > > [37372.084012] Stack: > > > > > > [37372.084012] ffff880114643ca8 ffffffff81ad044e ffff880114643cf8 ffffffff81167ae7 > > > > > > [37372.084012] 0000000000000000 ffff880114643d38 000000000000000e ffff88011901d800 > > > > > > [37372.084012] ffff8800cdb7cf5c ffff88011901d8e0 0000000000000000 0000000000000000 > > > > > > [37372.084012] Call Trace: > > > > > > [37372.084012] [<ffffffff81ad044e>] _raw_spin_lock+0xe/0x20 > > > > > > [37372.084012] [<ffffffff81167ae7>] shrink_dentry_list+0x47/0x370 > > > > > > [37372.084012] [<ffffffff81167f5e>] __shrink_dcache_sb+0x14e/0x1e0 > > > > > > [37372.084012] [<ffffffff81168456>] shrink_dcache_parent+0x276/0x2d0 > > > > > > [37372.084012] [<ffffffff81ad044e>] ? _raw_spin_lock+0xe/0x20 > > > > > > [37372.084012] [<ffffffff8115daa2>] dentry_unhash+0x42/0x80 > > > > > > [37372.084012] [<ffffffff8115db48>] vfs_rmdir+0x68/0x100 > > > > > > [37372.084012] [<ffffffff8115fd93>] do_rmdir+0x113/0x130 > > > > > > [37372.084012] [<ffffffff8114f5ad>] ? filp_close+0x5d/0x90 > > > > > > [37372.084012] [<ffffffff8115fde5>] sys_unlinkat+0x35/0x40 > > > > > > [37372.084012] [<ffffffff8103a002>] system_call_fastpath+0x16/0x1b > > > > > > > > > > OK good, with any luck, that's the same bug. > > > > > > > > > > Is this XFS? > > > > > > > > Yes. > > > > > > > > > Is there any concurrent activity happening on the same dentries? > > > > > > > > Not from an application perspective. > > > > > > > > > Ie. are the rm -rf threads running on the same directories, > > > > > > > > No, each thread operating on a different directory. > > > > This is probably fixed by the same patch as the lockdep splat trace. > > > > > > > > > or is there any reclaim happening in the background? > > > > > > > > IIRC, kswapd was consuming about 5-10% of a CPU during parallel > > > > unlink tests. Mainly reclaiming XFS inodes, I think, but there may > > > > be dentry cache reclaim going as well. > > > > > > Turns out that the kswapd peaks are upwards of 50% of a CPU for a > > > few seconds, then idle for 10-15s. Typical perf top output of kswapd > > > while it is active during unlinks is: > > > > > > samples pcnt function DSO > > > _______ _____ ___________________________ _________________ > > > > > > 17168.00 10.2% __call_rcu [kernel.kallsyms] > > > 13223.00 7.8% kmem_cache_free [kernel.kallsyms] > > > 12917.00 7.6% down_write [kernel.kallsyms] > > > 12665.00 7.5% xfs_iunlock [kernel.kallsyms] > > > 10493.00 6.2% xfs_reclaim_inode_grab [kernel.kallsyms] > > > 9314.00 5.5% __lookup_tag [kernel.kallsyms] > > > 9040.00 5.4% radix_tree_delete [kernel.kallsyms] > > > 8694.00 5.1% is_bad_inode [kernel.kallsyms] > > > 7639.00 4.5% __ticket_spin_lock [kernel.kallsyms] > > > 6821.00 4.0% _raw_spin_unlock_irqrestore [kernel.kallsyms] > > > 5484.00 3.2% __d_drop [kernel.kallsyms] > > > 5114.00 3.0% xfs_reclaim_inode [kernel.kallsyms] > > > 4626.00 2.7% __rcu_process_callbacks [kernel.kallsyms] > > > 3556.00 2.1% up_write [kernel.kallsyms] > > > 3206.00 1.9% _cond_resched [kernel.kallsyms] > > > 3129.00 1.9% xfs_qm_dqdetach [kernel.kallsyms] > > > 2327.00 1.4% radix_tree_tag_clear [kernel.kallsyms] > > > 2327.00 1.4% call_rcu_sched [kernel.kallsyms] > > > 2262.00 1.3% __ticket_spin_unlock [kernel.kallsyms] > > > 2215.00 1.3% xfs_ilock [kernel.kallsyms] > > > 2200.00 1.3% radix_tree_gang_lookup_tag [kernel.kallsyms] > > > 1982.00 1.2% xfs_reclaim_inodes_ag [kernel.kallsyms] > > > 1736.00 1.0% xfs_trans_unlocked_item [kernel.kallsyms] > > > 1707.00 1.0% __ticket_spin_trylock [kernel.kallsyms] > > > 1688.00 1.0% xfs_perag_get_tag [kernel.kallsyms] > > > 1660.00 1.0% flat_send_IPI_mask [kernel.kallsyms] > > > 1538.00 0.9% xfs_inode_item_destroy [kernel.kallsyms] > > > 1312.00 0.8% __shrink_dcache_sb [kernel.kallsyms] > > > 940.00 0.6% xfs_perag_put [kernel.kallsyms] > > > > > > So there is some dentry cache reclaim going on. > > > > > > FWIW, it appears there is quite a lot of RCU freeing overhead (~15% > > > more CPU time) in the work kswapd is doing during these unlinks, too. > > > I just had a look at kswapd when a 8-way create is running - it's running at > > > 50-60% of a cpu for seconds at a time. I caught this while it was doing pure > > > XFS inode cache reclaim (~10s sample, kswapd reclaimed ~1M inodes): > > > > > > samples pcnt function DSO > > > _______ _____ ___________________________ _________________ > > > > > > 27171.00 9.0% __call_rcu [kernel.kallsyms] > > > 21491.00 7.1% down_write [kernel.kallsyms] > > > 20916.00 6.9% xfs_reclaim_inode [kernel.kallsyms] > > > 20313.00 6.7% radix_tree_delete [kernel.kallsyms] > > > 15828.00 5.3% kmem_cache_free [kernel.kallsyms] > > > 15819.00 5.2% xfs_idestroy_fork [kernel.kallsyms] > > > 14893.00 4.9% is_bad_inode [kernel.kallsyms] > > > 14666.00 4.9% _raw_spin_unlock_irqrestore [kernel.kallsyms] > > > 14191.00 4.7% xfs_reclaim_inode_grab [kernel.kallsyms] > > > 14105.00 4.7% xfs_iunlock [kernel.kallsyms] > > > 10916.00 3.6% __ticket_spin_lock [kernel.kallsyms] > > > 10125.00 3.4% xfs_iflush_cluster [kernel.kallsyms] > > > 8221.00 2.7% xfs_qm_dqdetach [kernel.kallsyms] > > > 7639.00 2.5% xfs_trans_unlocked_item [kernel.kallsyms] > > > 7028.00 2.3% xfs_synchronize_times [kernel.kallsyms] > > > 6974.00 2.3% up_write [kernel.kallsyms] > > > 5870.00 1.9% call_rcu_sched [kernel.kallsyms] > > > 5634.00 1.9% _cond_resched [kernel.kallsyms] > > > > > > Which is showing a similar amount of RCU overhead as the unlink as above. > > > And this while it was doing dentry cache reclaim (~10s sample): > > > > > > 35921.00 15.7% __d_drop [kernel.kallsyms] > > > 30056.00 13.1% __ticket_spin_trylock [kernel.kallsyms] > > > 29066.00 12.7% __ticket_spin_lock [kernel.kallsyms] > > > 19043.00 8.3% __call_rcu [kernel.kallsyms] > > > 10098.00 4.4% iput [kernel.kallsyms] > > > 7013.00 3.1% __shrink_dcache_sb [kernel.kallsyms] > > > 6774.00 3.0% __percpu_counter_add [kernel.kallsyms] > > > 6708.00 2.9% radix_tree_tag_set [kernel.kallsyms] > > > 5362.00 2.3% xfs_inactive [kernel.kallsyms] > > > 5130.00 2.2% __ticket_spin_unlock [kernel.kallsyms] > > > 4884.00 2.1% call_rcu_sched [kernel.kallsyms] > > > 4621.00 2.0% dentry_lru_del [kernel.kallsyms] > > > 3735.00 1.6% bit_waitqueue [kernel.kallsyms] > > > 3727.00 1.6% dentry_iput [kernel.kallsyms] > > > 3473.00 1.5% shrink_icache_memory [kernel.kallsyms] > > > 3279.00 1.4% kfree [kernel.kallsyms] > > > 3101.00 1.4% xfs_perag_get [kernel.kallsyms] > > > 2516.00 1.1% kmem_cache_free [kernel.kallsyms] > > > 2272.00 1.0% shrink_dentry_list [kernel.kallsyms] > > > > > > I've never really seen any signficant dentry cache reclaim overhead > > > in profiles of these workloads before, so this was a bit of a > > > surprise.... > > > > call_rcu shouldn't be doing much, except for disabling irqs and linking > > the object into the list. I have a patch somewhere to reduce the irq > > disable overhead a bit, but it really shouldn't be doing a lot of work. > > Could you please enable CONFIG_RCU_TRACE, mount debugfs somewhere, and > look at rcu/rcudata? There will be a "ql=" number printed for each > CPU, and if that number is too large, __call_rcu() does take what it > considers to be corrective action, which can incur some overhead. > > If this is the problem, then increasing the value of the qhimark module > parameter might help. > > If this is not the problem, I could make a patch that disables some of > __call_rcu()'s grace-period acceleration code if you are willing to try > it out. Another thing that might help is to reduce the value of CONFIG_RCU_FANOUT to something like 16. If this does help, then there is a reasonably straightforward change I can make to RCU. Thanx, Paul > > Sometimes you find that touching the rcu head field needs to get a > > cacheline exclusive, so a bit of work gets transferred there.... > > > > But it may also be something going a bit wrong in RCU. I blew it up > > once already, after the files_lock splitup that enabled all CPUs to > > create and destroy files :) > > I would certainly like the opportunity to fix any bugs that might be > in RCU... ;-) > > Thanx, Paul -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2010-12-12 14:54 UTC|newest] Thread overview: 107+ messages / expand[flat|nested] mbox.gz Atom feed top 2010-11-27 10:15 [PATCH 00/46] rcu-walk and dcache scaling Nick Piggin 2010-11-27 9:44 ` [PATCH 02/46] fs: d_validate fixes Nick Piggin 2010-12-08 1:53 ` Dave Chinner 2010-12-08 6:59 ` Nick Piggin 2010-12-09 0:50 ` Dave Chinner 2010-12-09 0:50 ` Dave Chinner 2010-12-09 4:50 ` Nick Piggin 2010-12-09 4:50 ` Nick Piggin 2010-11-27 9:44 ` [PATCH 03/46] kernel: kmem_ptr_validate considered harmful Nick Piggin 2010-11-27 9:44 ` [PATCH 04/46] fs: dcache documentation cleanup Nick Piggin 2010-11-27 9:44 ` [PATCH 05/46] fs: change d_delete semantics Nick Piggin 2010-11-27 9:44 ` [PATCH 06/46] cifs: dont overwrite dentry name in d_revalidate Nick Piggin 2010-11-27 9:44 ` [PATCH 07/46] jfs: " Nick Piggin 2010-11-27 9:44 ` [PATCH 08/46] fs: change d_compare for rcu-walk Nick Piggin 2010-11-27 9:44 ` [PATCH 09/46] fs: change d_hash " Nick Piggin 2010-11-27 9:44 ` [PATCH 10/46] hostfs: simplify locking Nick Piggin 2010-11-27 9:44 ` [PATCH 11/46] fs: dcache scale hash Nick Piggin 2010-12-09 6:09 ` Dave Chinner 2010-12-09 6:28 ` Nick Piggin 2010-12-09 8:17 ` Dave Chinner 2010-12-09 12:53 ` Nick Piggin 2010-12-09 23:42 ` Dave Chinner 2010-12-10 2:35 ` Nick Piggin 2010-12-10 9:01 ` Dave Chinner 2010-12-13 4:48 ` Nick Piggin 2010-12-13 5:05 ` Nick Piggin 2010-11-27 9:44 ` [PATCH 12/46] fs: dcache scale lru Nick Piggin 2010-12-09 7:22 ` Dave Chinner 2010-12-09 12:34 ` Nick Piggin 2010-11-27 9:44 ` [PATCH 13/46] fs: dcache scale dentry refcount Nick Piggin 2010-11-27 9:44 ` [PATCH 14/46] fs: dcache scale d_unhashed Nick Piggin 2010-11-27 9:44 ` [PATCH 15/46] fs: dcache scale subdirs Nick Piggin 2010-11-27 9:44 ` [PATCH 16/46] fs: scale inode alias list Nick Piggin 2010-11-27 9:44 ` [PATCH 17/46] fs: Use rename lock and RCU for multi-step operations Nick Piggin 2011-01-18 22:32 ` Yehuda Sadeh Weinraub 2011-01-18 22:42 ` Nick Piggin 2011-01-19 22:27 ` Yehuda Sadeh Weinraub 2011-01-19 22:32 ` Nick Piggin 2011-01-25 22:10 ` Yehuda Sadeh Weinraub 2011-01-27 5:18 ` Nick Piggin 2011-02-07 18:52 ` Jim Schutt 2011-02-07 21:04 ` Yehuda Sadeh Weinraub 2011-02-07 21:04 ` Yehuda Sadeh Weinraub 2011-02-07 21:31 ` Jim Schutt 2011-02-07 21:35 ` Gregory Farnum 2011-02-07 22:25 ` Jim Schutt 2011-02-14 17:57 ` Yehuda Sadeh Weinraub 2010-11-27 9:44 ` [PATCH 18/46] fs: increase d_name lock coverage Nick Piggin 2010-11-27 9:44 ` [PATCH 19/46] fs: dcache remove dcache_lock Nick Piggin 2010-11-27 9:44 ` [PATCH 20/46] fs: dcache avoid starvation in dcache multi-step operations Nick Piggin 2010-11-27 9:44 ` [PATCH 21/46] fs: dcache reduce dput locking Nick Piggin 2010-11-27 9:44 ` [PATCH 22/46] fs: dcache reduce locking in d_alloc Nick Piggin 2010-11-27 9:44 ` [PATCH 23/46] fs: dcache reduce dcache_inode_lock Nick Piggin 2010-11-27 9:44 ` [PATCH 24/46] fs: dcache rationalise dget variants Nick Piggin 2010-11-27 9:44 ` [PATCH 25/46] fs: dcache reduce d_parent locking Nick Piggin 2010-11-27 9:44 ` [PATCH 26/46] fs: dcache reduce prune_one_dentry locking Nick Piggin 2010-11-27 9:44 ` [PATCH 27/46] fs: reduce dcache_inode_lock width in lru scanning Nick Piggin 2010-11-27 9:44 ` [PATCH 28/46] fs: use RCU in shrink_dentry_list to reduce lock nesting Nick Piggin 2010-11-27 9:44 ` [PATCH 29/46] fs: consolidate dentry kill sequence Nick Piggin 2010-11-27 9:45 ` [PATCH 30/46] fs: icache RCU free inodes Nick Piggin 2010-11-27 9:45 ` [PATCH 31/46] fs: avoid inode RCU freeing for pseudo fs Nick Piggin 2010-11-27 9:45 ` [PATCH 32/46] kernel: optimise seqlock Nick Piggin 2010-11-27 9:45 ` [PATCH 33/46] fs: rcu-walk for path lookup Nick Piggin 2010-11-27 9:45 ` [PATCH 34/46] fs: fs_struct use seqlock Nick Piggin 2010-11-27 9:45 ` [PATCH 35/46] fs: dcache remove d_mounted Nick Piggin 2010-11-27 9:45 ` [PATCH 36/46] fs: dcache reduce branches in lookup path Nick Piggin 2010-11-27 9:45 ` [PATCH 37/46] fs: cache optimise dentry and inode for rcu-walk Nick Piggin 2010-11-27 9:45 ` [PATCH 38/46] fs: prefetch inode data in dcache lookup Nick Piggin 2010-11-27 9:45 ` [PATCH 39/46] fs: d_revalidate_rcu for rcu-walk Nick Piggin 2010-11-27 9:45 ` [PATCH 40/46] fs: provide rcu-walk aware permission i_ops Nick Piggin 2010-11-27 9:45 ` [PATCH 41/46] fs: provide simple rcu-walk ACL implementation Nick Piggin 2010-11-27 9:45 ` [PATCH 42/46] kernel: add bl_list Nick Piggin 2010-11-27 9:45 ` [PATCH 43/46] bit_spinlock: add required includes Nick Piggin 2010-11-27 9:45 ` [PATCH 44/46] fs: dcache per-bucket dcache hash locking Nick Piggin 2010-11-27 9:45 ` [PATCH 45/46] fs: dcache per-inode inode alias locking Nick Piggin 2010-11-27 9:45 ` [PATCH 46/46] fs: improve scalability of pseudo filesystems Nick Piggin 2010-11-27 9:56 ` [PATCH 01/46] Revert "fs: use RCU read side protection in d_validate" Nick Piggin 2010-12-08 1:16 ` Dave Chinner 2010-12-08 9:38 ` Nick Piggin 2010-12-09 0:44 ` Dave Chinner 2010-12-09 4:38 ` Nick Piggin 2010-12-09 5:16 ` Nick Piggin 2010-11-27 15:04 ` [PATCH 00/46] rcu-walk and dcache scaling Anca Emanuel 2010-11-27 15:04 ` Anca Emanuel 2010-11-28 3:28 ` Nick Piggin 2010-11-28 3:28 ` Nick Piggin 2010-11-28 6:24 ` Sedat Dilek 2010-12-01 18:03 ` David Miller 2010-12-03 16:55 ` Nick Piggin 2010-12-07 11:25 ` Dave Chinner 2010-12-07 15:24 ` Nick Piggin 2010-12-07 15:24 ` Nick Piggin 2010-12-07 15:49 ` Peter Zijlstra 2010-12-07 15:59 ` Nick Piggin 2010-12-07 16:23 ` Peter Zijlstra 2010-12-08 3:28 ` Nick Piggin 2010-12-07 21:56 ` Dave Chinner 2010-12-08 1:47 ` Nick Piggin 2010-12-08 3:32 ` Dave Chinner 2010-12-08 4:28 ` Dave Chinner 2010-12-08 7:09 ` Nick Piggin 2010-12-08 7:09 ` Nick Piggin 2010-12-10 20:32 ` Paul E. McKenney 2010-12-12 14:54 ` Paul E. McKenney [this message] 2010-12-12 14:54 ` Paul E. McKenney 2010-11-27 19:20 Sedat Dilek 2010-11-27 20:53 ` Sedat Dilek
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20101212145407.GA1969@linux.vnet.ibm.com \ --to=paulmck@linux.vnet.ibm.com \ --cc=david@fromorbit.com \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=npiggin@gmail.com \ --cc=npiggin@kernel.dk \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.