From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AA912C4332F for ; Tue, 8 Nov 2022 17:45:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234581AbiKHRpg (ORCPT ); Tue, 8 Nov 2022 12:45:36 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43374 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233945AbiKHRpc (ORCPT ); Tue, 8 Nov 2022 12:45:32 -0500 Received: from smtp-out1.suse.de (smtp-out1.suse.de [IPv6:2001:67c:2178:6::1c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 62B3954B10 for ; Tue, 8 Nov 2022 09:45:31 -0800 (PST) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 081542236D; Tue, 8 Nov 2022 17:45:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1667929530; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=BkRPPsDLD2y/9bXDlwcBueXyAXulj2d4iFZ7gt3vhWo=; b=UMfv/rIpSOrna721v6X1lqEnAz7hv6CJmRKb5CgMAkd04rA5TOf6xOT0VFc/8F6ae0DYlH sy+qwFK82UbZ4ABIdYsabNbnGQ+g16YKimhnUtKbWbC6JJN5h5a3uO3v5x9CYhl7VjOiq/ NT6QssTgdrOlTW1iKQRDWPbZ+31E7G0= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1667929530; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=BkRPPsDLD2y/9bXDlwcBueXyAXulj2d4iFZ7gt3vhWo=; b=ggWB/DME/viTUkArcdBIlmuQSFq4PqETGN/3TGdk1F5dCYTkwN4Hjawlp9gAq5ncNRq+LN UBOkgRh7K9iWdKDg== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id EB6CE139F1; Tue, 8 Nov 2022 17:45:29 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id PDhNObmVamONJQAAMHmgww (envelope-from ); Tue, 08 Nov 2022 17:45:29 +0000 Received: by quack3.suse.cz (Postfix, from userid 1000) id 6CCE3A0704; Tue, 8 Nov 2022 18:45:29 +0100 (CET) Date: Tue, 8 Nov 2022 18:45:29 +0100 From: Jan Kara To: Mark Rutland Cc: Waiman Long , Sebastian Andrzej Siewior , Jan Kara , LKML , Thomas Gleixner , Steven Rostedt , Mel Gorman , Peter Zijlstra , Ingo Molnar , Will Deacon , Catalin Marinas Subject: Re: Crash with PREEMPT_RT on aarch64 machine Message-ID: <20221108174529.pp4qqi2mhpzww77p@quack3> References: <20221103115444.m2rjglbkubydidts@quack3> <20221107135636.biouna36osqc4rik@quack3> <359cc93a-fce0-5af2-0fd5-81999fad186b@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue 08-11-22 10:53:40, Mark Rutland wrote: > On Mon, Nov 07, 2022 at 11:49:01AM -0500, Waiman Long wrote: > > On 11/7/22 10:10, Sebastian Andrzej Siewior wrote: > > > + locking, arm64 > > > > > > On 2022-11-07 14:56:36 [+0100], Jan Kara wrote: > > > > > spinlock_t and raw_spinlock_t differ slightly in terms of locking. > > > > > rt_spin_lock() has the fast path via try_cmpxchg_acquire(). If you > > > > > enable CONFIG_DEBUG_RT_MUTEXES then you would force the slow path which > > > > > always acquires the rt_mutex_base::wait_lock (which is a raw_spinlock_t) > > > > > while the actual lock is modified via cmpxchg. > > > > So I've tried enabling CONFIG_DEBUG_RT_MUTEXES and indeed the corruption > > > > stops happening as well. So do you suspect some bug in the CPU itself? > > > If it is only enabling CONFIG_DEBUG_RT_MUTEXES (and not whole lockdep) > > > then it looks very suspicious. > > > CONFIG_DEBUG_RT_MUTEXES enables a few additional checks but the main > > > part is that rt_mutex_cmpxchg_acquire() + rt_mutex_cmpxchg_release() > > > always fail (and so the slowpath under a raw_spinlock_t is done). > > > > > > So if it is really the fast path (rt_mutex_cmpxchg_acquire()) then it > > > somehow smells like the CPU is misbehaving. > > > > > > Could someone from the locking/arm64 department check if the locking in > > > RT-mutex (rtlock_lock()) is correct? > > > > > > rtmutex locking uses try_cmpxchg_acquire(, ptr, ptr) for the fastpath > > > (and try_cmpxchg_release(, ptr, ptr) for unlock). > > > Now looking at it again, I don't see much difference compared to what > > > queued_spin_trylock() does except the latter always operates on 32bit > > > value instead a pointer. > > > > Both the fast path of queued spinlock and rt_spin_lock are using > > try_cmpxchg_acquire(), the only difference I saw is the size of the data to > > be cmpxchg'ed. qspinlock uses 32-bit integer whereas rt_spin_lock uses > > 64-bit pointer. So I believe it is more on how the arm64 does cmpxchg. I > > believe there are two different ways of doing it depending on whether LSE > > atomics is available in the platform. So exactly what arm64 system is being > > used here and what hardware capability does it have? > > From the /proc/cpuinfo output earlier, this is a Neoverse N1 system, with the > LSE atomics. Assuming the kernel was built with support for atomics in-kernel > (which is selected by default), it'll be using the LSE version. So I was able to reproduce the corruption both with LSE atomics enabled & disabled in the kernel. It seems the problem takes considerably longer to reproduce with LSE atomics enabled but it still does happen. BTW, I've tried to reproduced the problem on another aarch64 machine with CPU from a different vendor: processor : 0 BogoMIPS : 200.00 Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm CPU implementer : 0x48 CPU architecture: 8 CPU variant : 0x1 CPU part : 0xd01 CPU revision : 0 And there the problem does not reproduce. So might it be a genuine bug in the CPU implementation? Honza -- Jan Kara SUSE Labs, CR