From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1759404AbbIDPFW (ORCPT );
	Fri, 4 Sep 2015 11:05:22 -0400
Received: from mail-io0-f173.google.com ([209.85.223.173]:32913 "EHLO
	mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751579AbbIDPFU (ORCPT );
	Fri, 4 Sep 2015 11:05:20 -0400
MIME-Version: 1.0
In-Reply-To: <20150904082954.GB3902@dastard>
References: <20150904054820.GY3902@dastard> <20150904071143.GZ3902@dastard>
	<20150904082954.GB3902@dastard>
Date: Fri, 4 Sep 2015 08:05:16 -0700
X-Google-Sender-Auth: 9NshRvvKq_e6biN8VMEiaFATM9g
Message-ID:
Subject: Re: [4.2, Regression] Queued spinlocks cause major XFS performance regression
From: Linus Torvalds
To: Dave Chinner
Cc: Linux Kernel Mailing List, Peter Zijlstra, Waiman Long, Ingo Molnar
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Sep 4, 2015 at 1:29 AM, Dave Chinner wrote:
> ...
>     0.02     1c:   pause
>     4.45            mov    %ecx,%eax
>     0.00            lock   cmpxchg %edx,(%rdi)
>    95.18            test   %eax,%eax
>                     jne    1c
> ...
>
> It looks like it's spending all its time looping around the cmpxchg.

That code sequence doesn't look sensible. Busy-looping on a cmpxchg is
insane - if you are busy-looping, the inner tight loop should just be
a plain read that waits for the value to clear, and only then should
you retry the cmpxchg.

It seems to come from virt_queued_spin_lock(), and that just looks
like completely bogus crap.

PeterZ, this is your magical hypervisor thing, and I get the feeling
that that explains why Dave sees nasty performance: most people have
tested either on raw hardware or with the actual paravirtualized
spinlocks, but this is the case of "we're running under a hypervisor,
but not paravirtualized".

So virt_queued_spin_lock() for the hypervisor case looks completely
buggered to me, for several reasons:

 - it doesn't actually ever use any queueing, since it always returns
   true. So the "queued spinlocks" in this case aren't actually queued,
   and they aren't even ticket locks - they are just plain 0/1 values,
   if I read things right.

 - the busy-loop to set the queued spinlock does that cmpxchg in a
   tight loop, which kills any memory subsystem. That's unacceptable.

So at the very *minimum*, that second issue should be fixed, and the
loop in virt_queued_spin_lock() should look something like

	do {
		while (READ_ONCE(lock->val) != 0)
			cpu_relax();
	} while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0);

which at least has a chance in hell of behaving well on the bus and
in a HT environment.

But I suspect that it would be even better for Dave to just disable
the whole thing, and see how the queued locks actually work.

Dave, can you turn that virt_queued_spin_lock() into just "return false"?

In fact, I would almost _insist_ we do this when
CONFIG_PARAVIRT_SPINLOCK isn't set. Isn't that what our old ticket
spinlocks did? They didn't screw up and degrade to a test-and-set
lock just because they saw a hypervisor - that only happened when
things were paravirt-aware. No?

Dave, if you have the energy, try it both ways. But the code as-is
for "I'm running in a hypervisor" looks just terminally broken.
People who didn't run in hypervisors just never saw the breakage.

                  Linus
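
PS: just so we're looking at the same thing - and I'm going from
memory here, so treat the exact details as approximate - the 4.2
helper (arch/x86, I think) looks something like this:

	static inline bool virt_queued_spin_lock(struct qspinlock *lock)
	{
		if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
			return false;

		/* plain 0/1 test-and-set, cmpxchg hammering in a tight loop */
		while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
			cpu_relax();

		return true;
	}

and the "just disable it" experiment I'm asking for is literally
nothing more than

	static inline bool virt_queued_spin_lock(struct qspinlock *lock)
	{
		/* never take the hypervisor shortcut, use the real queued lock */
		return false;
	}

Totally untested, of course - it's only meant to show the shape of
the thing, not to be an actual patch.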