From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 4 Sep 2015 17:11:43 +1000
From: Dave Chinner
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Peter Zijlstra, Waiman Long, Ingo Molnar
Subject: Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression
Message-ID: <20150904071143.GZ3902@dastard>
References: <20150904054820.GY3902@dastard>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Sep 03, 2015 at 11:39:21PM -0700, Linus Torvalds wrote:
> On Thu, Sep 3, 2015 at 10:48 PM, Dave Chinner wrote:
> >
> > When I turned spinlock debugging off on 4.2 to get some perf numbers
> > (a request from Linus), I got this:
>
> [ ugly numbers deleted ]
>
> > And then a quick call graph sample to find the lock:
> >
> >    37.19%    37.19%  [kernel]            [k] queued_spin_lock_slowpath
> >    - queued_spin_lock_slowpath
> >       - 99.98% _raw_spin_lock
> >          - 89.16% xfs_log_commit_cil
> [ snip ]
> >
> > This shows that we have catastrophic spinlock contention in the
> > transaction commit path. It must be the cil->xc_cil_lock spinlock,
> > as it's the only spinlock in that path. And while it's the hot lock
> > in the commit path, turning spinlock debugging back on (and no other
> > changes) shows that it shouldn't be contended:
> >
> >    8.92%  [kernel]  [k] _xfs_buf_find
> [ snip ]
>
> So you basically have almost no spinlock overhead at all even when
> debugging is on.

*nod*

> That's unusual, as usually the debug code makes the contention much
> much worse.

Right. The debug behaviour is completely unchanged, which is why I
didn't notice this earlier. And it's not until I scale this workload
to >32p that I tend to see any significant level of contention on the
cil->xc_cil_lock when the basic spinlock debugging is enabled.

> > To confirm that this is indeed caused by the queued spinlocks, I
> > removed the spinlock debugging and did this to arch/x86/Kconfig:
> >
> > -       select ARCH_USE_QUEUED_SPINLOCKS
> >
> > And the results are:
>
> Ok, that's pretty conclusive. It doesn't seem to make much _sense_,
> but numbers talk, BS walks.
>
> If I read things right, the actual spinlock is the "cil->xc_cil_lock"
> that is taken in xlog_cil_insert_items(), and it just shows up in
> xfs_log_commit_cil() in the call graph due to inlining. Correct?

Yup, that's how I read it, too.

> There doesn't seem to be anything even remotely strange going on in
> that area.
>
> Is this a PARAVIRT configuration? There were issues with PV
> interaction at some point. If it is PV, and you don't actually use
> PV, can you test with PV support disabled?

$ grep PARAVIRT .config
CONFIG_PARAVIRT=y
# CONFIG_PARAVIRT_DEBUG is not set
# CONFIG_PARAVIRT_SPINLOCKS is not set
CONFIG_PARAVIRT_TIME_ACCOUNTING=y
CONFIG_PARAVIRT_CLOCK=y
$

I'll retest with CONFIG_PARAVIRT=n....

> Also, if you look at the instruction-level profile for
> queued_spin_lock_slowpath itself, does anything stand out? For
> example, I note that the for-loop with the atomic_cmpxchg() call in
> it doesn't ever do a cpu_relax(). It doesn't look like that should
> normally loop, but obviously that function also shouldn't normally
> use 2/3rds of the cpu, so.. Maybe some part of
> queued_spin_lock_slowpath() stands out as "it's spending 99% of the
> time in _that_ particular part", and it gives some clue what goes
> wrong.

I'll have a look when the current tests on that machine have finished
running.
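In the meantime, just so we're talking about the same shape of loop:
below is a stand-alone userspace sketch of a bare cmpxchg retry loop
with a relaxation hint added in the retry path, which is how I read
your suggestion. It's built on C11 atomics, not the real
queued_spin_lock_slowpath(), and the names (toy_lock(),
cpu_relax_hint(), the single locked/unlocked word) are made up for
illustration only:

	/*
	 * Illustrative only: a bare cmpxchg retry loop with a CPU
	 * relaxation hint in the retry path. Not the kernel's qspinlock
	 * code, just the shape of the loop in question, written against
	 * C11 atomics so it compiles in userspace.
	 */
	#include <stdatomic.h>

	static inline void cpu_relax_hint(void)
	{
	#if defined(__x86_64__) || defined(__i386__)
		__builtin_ia32_pause();	/* tell the core we are busy-waiting */
	#endif
	}

	/* 0 == unlocked, 1 == locked; stands in for the qspinlock value word */
	static void toy_lock(atomic_int *lock)
	{
		int expected = 0;

		for (;;) {
			/* try to move 0 -> 1; on failure 'expected' is refreshed */
			if (atomic_compare_exchange_weak(lock, &expected, 1))
				return;

			/*
			 * Retry path: without a hint like this the loop issues
			 * back-to-back cmpxchg attempts on a contended cache line.
			 */
			cpu_relax_hint();
			expected = 0;
		}
	}

The real slowpath obviously has the queueing logic on top of this,
which the sketch doesn't try to model; it's only meant to pin down
which retry path we're looking at in the annotated profile.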
Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com