From: Guo Ren
Date: Sat, 18 Jun 2022 01:45:23 +0800
Subject: Re: [PATCH] LoongArch: Add qspinlock support
To: Arnd Bergmann
Cc: Huacai Chen, Huacai Chen, loongarch@lists.linux.dev, linux-arch,
    Xuefeng Li, Xuerui Wang, Jiaxun Yang, Peter Zijlstra, Will Deacon,
    Ingo Molnar

On Sat, Jun 18, 2022 at 12:11 AM Arnd Bergmann wrote:
>
> On Fri, Jun 17, 2022 at 4:57 PM Huacai Chen wrote:
> >
> > On NUMA systems, the performance of qspinlock is better than the generic
> > spinlock. Below are the UnixBench test results on an 8-node (4 cores
> > per node, 32 cores in total) machine.
> >
>
> The performance increase is nice, but this is only half the story we need here.
> I think the more important bit is how you can guarantee that the xchg16()
> implementation is correct and always allows forward progress.
>
> > @@ -123,6 +123,10 @@ static inline unsigned long __percpu_xchg(void *ptr, unsigned long val,
> >                                                            int size)
> > {
> >         switch (size) {
> > +       case 1:
> > +       case 2:
> > +               return __xchg_small((volatile void *)ptr, val, size);
> > +
>
> Do you actually need the size 1 as well?
>
> Generally speaking, I would like to rework the xchg()/cmpxchg() logic
> to only cover the 32-bit and word-sized (possibly 64-bit) case, while
> having separate optional 8-bit and 16-bit functions. I had a patch for

Why not forbid 8-bit and 16-bit xchg()/cmpxchg() directly? e.g. move
qspinlock's xchg_tail() into a per-arch arch_xchg_tail(). That way Linux
wouldn't need to provide a mixed-size atomic operation primitive at all.
What do you mean by "separate optional 8-bit and 16-bit functions" here?
A rough sketch of the per-arch idea is below.

> this in the past, and can try to dig that out, this may be the time to
> finally do that.
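Something like the cmpxchg-loop fallback that the generic qspinlock code
already uses when _Q_PENDING_BITS != 8, just exposed per architecture.
A minimal sketch, assuming the qspinlock internals (struct qspinlock and
_Q_LOCKED_PENDING_MASK from kernel/locking/qspinlock.c) are in scope;
arch_xchg_tail() is only a hypothetical name here:

/*
 * Hypothetical arch_xchg_tail(): publish the new 16-bit tail using only
 * a word-sized cmpxchg, so no 16-bit xchg() primitive is needed.  This
 * mirrors the _Q_PENDING_BITS != 8 path in kernel/locking/qspinlock.c.
 */
static __always_inline u32 arch_xchg_tail(struct qspinlock *lock, u32 tail)
{
        u32 old, new, val = atomic_read(&lock->val);

        for (;;) {
                /* Preserve the locked and pending bytes, replace the tail. */
                new = (val & _Q_LOCKED_PENDING_MASK) | tail;
                /*
                 * Relaxed ordering should be enough here: the caller
                 * initializes its MCS node before publishing the new tail.
                 */
                old = atomic_cmpxchg_relaxed(&lock->val, val, new);
                if (old == val)
                        break;
                val = old;
        }
        return old;
}

The loop can spin under contention, but it only ever needs a 32-bit
cmpxchg, which every architecture already has to provide anyway.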
>
> I see that the qspinlock() code actually calls a 'relaxed' version of xchg16(),
> but you only implement the one with the full barrier. Is it possible to
> directly provide a relaxed version that has something less than the
> __WEAK_LLSC_MB?

I am also curious about __WEAK_LLSC_MB; it looks very magic. How does it
prevent preceding accesses from being reordered after the sc for a strong
cmpxchg?

#define __cmpxchg_asm(ld, st, m, old, new)                              \
({                                                                      \
        __typeof(old) __ret;                                            \
                                                                        \
        __asm__ __volatile__(                                           \
        "1:     " ld "  %0, %2          # __cmpxchg_asm \n"             \
        "       bne     %0, %z3, 2f                     \n"             \
        "       or      $t0, %z4, $zero                 \n"             \
        "       " st "  $t0, %1                         \n"             \
        "       beq     $zero, $t0, 1b                  \n"             \
        "2:                                             \n"             \
        __WEAK_LLSC_MB                                                  \

And its __smp_mb__xxx hooks are just defined as a compiler barrier()?

#define __smp_mb__before_atomic()       barrier()
#define __smp_mb__after_atomic()        barrier()

>
>          Arnd

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/