Subject: Re: Crash with PREEMPT_RT on aarch64 machine
From: Pierre Gondois
Date: Wed, 9 Nov 2022 14:52:46 +0100
To: Jan Kara, Mark Rutland
Cc: Waiman Long, Sebastian Andrzej Siewior, LKML, Thomas Gleixner,
 Steven Rostedt, Mel Gorman, Peter Zijlstra, Ingo Molnar, Will Deacon,
 Catalin Marinas
Message-ID: <9ca45a07-00ba-9afd-2e25-7bab6cefab0e@arm.com>
In-Reply-To: <20221109110133.txft66ukwfw2ifkj@quack3>
References: <20221103115444.m2rjglbkubydidts@quack3>
 <20221107135636.biouna36osqc4rik@quack3>
 <359cc93a-fce0-5af2-0fd5-81999fad186b@redhat.com>
 <20221108174529.pp4qqi2mhpzww77p@quack3>
 <20221109110133.txft66ukwfw2ifkj@quack3>

On 11/9/22 12:01, Jan Kara wrote:
> On Wed 09-11-22 09:55:07, Mark Rutland wrote:
>> On Tue, Nov 08, 2022 at 06:45:29PM +0100, Jan Kara wrote:
>>> On Tue 08-11-22 10:53:40, Mark Rutland wrote:
>>>> On Mon, Nov 07, 2022 at 11:49:01AM -0500, Waiman Long wrote:
>>>>> On 11/7/22 10:10, Sebastian Andrzej Siewior wrote:
>>>>>> + locking, arm64
>>>>>>
>>>>>> On 2022-11-07 14:56:36 [+0100], Jan Kara wrote:
>>>>>>>> spinlock_t and raw_spinlock_t differ slightly in terms of locking.
>>>>>>>> rt_spin_lock() has the fast path via try_cmpxchg_acquire(). If you
>>>>>>>> enable CONFIG_DEBUG_RT_MUTEXES then you would force the slow path which
>>>>>>>> always acquires the rt_mutex_base::wait_lock (which is a raw_spinlock_t)
>>>>>>>> while the actual lock is modified via cmpxchg.
>>>>>>> So I've tried enabling CONFIG_DEBUG_RT_MUTEXES and indeed the corruption
>>>>>>> stops happening as well. So do you suspect some bug in the CPU itself?
>>>>>> If it is only enabling CONFIG_DEBUG_RT_MUTEXES (and not whole lockdep)
>>>>>> then it looks very suspicious.
>>>>>> CONFIG_DEBUG_RT_MUTEXES enables a few additional checks but the main
>>>>>> part is that rt_mutex_cmpxchg_acquire() + rt_mutex_cmpxchg_release()
>>>>>> always fail (and so the slowpath under a raw_spinlock_t is done).
>>>>>>
>>>>>> So if it is really the fast path (rt_mutex_cmpxchg_acquire()) then it
>>>>>> somehow smells like the CPU is misbehaving.
>>>>>>
>>>>>> Could someone from the locking/arm64 department check if the locking in
>>>>>> RT-mutex (rtlock_lock()) is correct?
>>>>>>
>>>>>> rtmutex locking uses try_cmpxchg_acquire(, ptr, ptr) for the fastpath
>>>>>> (and try_cmpxchg_release(, ptr, ptr) for unlock).
>>>>>> Now looking at it again, I don't see much difference compared to what
>>>>>> queued_spin_trylock() does, except that the latter always operates on a
>>>>>> 32-bit value instead of a pointer.
>>>>>
>>>>> Both the fast path of the queued spinlock and rt_spin_lock use
>>>>> try_cmpxchg_acquire(); the only difference I saw is the size of the data
>>>>> to be cmpxchg'ed. qspinlock uses a 32-bit integer whereas rt_spin_lock
>>>>> uses a 64-bit pointer. So I believe it is more about how arm64 does the
>>>>> cmpxchg. I believe there are two different ways of doing it depending on
>>>>> whether LSE atomics are available on the platform. So exactly what arm64
>>>>> system is being used here, and what hardware capabilities does it have?
>>>>
>>>> From the /proc/cpuinfo output earlier, this is a Neoverse N1 system, with
>>>> the LSE atomics. Assuming the kernel was built with support for atomics
>>>> in-kernel (which is selected by default), it'll be using the LSE version.
>>>
>>> So I was able to reproduce the corruption both with LSE atomics enabled &
>>> disabled in the kernel. It seems the problem takes considerably longer to
>>> reproduce with LSE atomics enabled, but it still does happen.
>>>
>>> BTW, I've tried to reproduce the problem on another aarch64 machine with a
>>> CPU from a different vendor:
>>>
>>> processor       : 0
>>> BogoMIPS        : 200.00
>>> Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
>>> CPU implementer : 0x48
>>> CPU architecture: 8
>>> CPU variant     : 0x1
>>> CPU part        : 0xd01
>>> CPU revision    : 0
>>>
>>> And there the problem does not reproduce. So might it be a genuine bug in
>>> the CPU implementation?
>>
>> Perhaps, though I suspect it's more likely that we have an ordering bug in
>> the kernel code, and it shows up on CPUs with legitimate but more relaxed
>> ordering. We've had a couple of those show up on Apple M1, so it might be
>> worth trying on one of those.
>>
>> How easy is this to reproduce? What's necessary?
>
> As Pierre writes, running the dbench benchmark on an XFS filesystem on an
> Ampere Altra machine triggers this relatively easily (it takes about 10
> minutes to trigger without atomics and about 30 minutes with the atomics
> enabled).
>
> Running the benchmark on XFS somehow seems to be important; we didn't see
> the crash happen on ext4 (which may just mean it is less frequent on ext4
> and didn't trigger in our initial testing, after which we started to
> investigate the crashes with XFS).
>
> 								Honza

It was possible to reproduce on an Ampere eMAG. It takes less than a minute to
reproduce once dbench is launched, and it seems more likely to trigger with the
previous diff applied. It even sometimes triggers without launching dbench on
the Altra.

/proc/cpuinfo for the eMAG:

processor       : 0
BogoMIPS        : 80.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant     : 0x3
CPU part        : 0x000
CPU revision    : 2
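
For anyone following along without the kernel sources at hand, here is a rough
user-space sketch of the two fast paths compared above. It is not the kernel
implementation; the helper names rt_trylock_fastpath() and
qspin_trylock_fastpath() are made up for illustration, and the GCC/Clang
__atomic builtins stand in for the kernel's try_cmpxchg_acquire(). Both fast
paths boil down to a single acquire compare-and-exchange, one on a 64-bit
owner pointer (rtmutex) and one on a 32-bit lock word (qspinlock):

/* Illustrative sketch only, not kernel code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct task { int dummy; };

/* rtmutex-style fast path: cmpxchg-acquire on a 64-bit owner pointer. */
static bool rt_trylock_fastpath(struct task **owner, struct task *self)
{
	struct task *expected = NULL;	/* lock is free when owner == NULL */

	return __atomic_compare_exchange_n(owner, &expected, self, false,
					   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* qspinlock-style fast path: cmpxchg-acquire on a 32-bit lock word. */
static bool qspin_trylock_fastpath(uint32_t *lockword)
{
	uint32_t expected = 0;		/* 0 == unlocked, 1 == locked */

	return __atomic_compare_exchange_n(lockword, &expected, 1, false,
					   __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

int main(void)
{
	struct task me;
	struct task *owner = NULL;
	uint32_t lockword = 0;

	printf("rtmutex-style fastpath acquired: %d\n",
	       rt_trylock_fastpath(&owner, &me));
	printf("qspinlock-style fastpath acquired: %d\n",
	       qspin_trylock_fastpath(&lockword));
	return 0;
}

Roughly speaking, on arm64 both compile either to a CASA instruction (when LSE
atomics are available and used) or to an LDAXR/STXR exclusive loop, with the
access size (32-bit vs 64-bit) being essentially the only difference, which is
why the question about LSE support on the affected machines comes up above.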