From: Uros Bizjak
Date: Wed, 4 Oct 2023 22:22:12 +0200
Subject: Re: [PATCH v2 4/4] x86/percpu: Use C for percpu read/write accessors
To: Linus Torvalds
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Andy Lutomirski, Ingo Molnar, Nadav Amit, Brian Gerst, Denys Vlasenko, "H. Peter Anvin", Peter Zijlstra, Thomas Gleixner, Borislav Petkov, Josh Poimboeuf

On Wed, Oct 4, 2023 at 10:20 PM Linus Torvalds wrote:
>
> On Wed, 4 Oct 2023 at 13:12, Linus Torvalds wrote:
> >
> > On Wed, 4 Oct 2023 at 13:08, Uros Bizjak wrote:
> > >
> > > You get a store forwarding stall when you write a bigger operand to
> > > memory and then read part of it, if the smaller part doesn't start at
> > > the same address.
> >
> > I don't think that has been true for over a decade now.
> >
> > Afaik, any half-way modern Intel and AMD cores will forward any fully
> > contained load.
>
> https://www.agner.org/optimize/microarchitecture.pdf
>
> See for example pg 136 (Sandy Bridge / Ivy Bridge):
>
> "Store forwarding works in the following cases:
> ...
> • When a write of 64 bits or less is followed by a read of a smaller
> size which is fully contained in the write address range, regardless
> of alignment"
>
> and for AMD Zen cores:
>
> "Store forwarding of a write to a subsequent read works very well in
> all cases, including reads from a part of the written data"
>
> So forget the whole "same address" rule. It's simply not true or
> relevant any more.

No problem then, we will implement the optimization in the compiler.

Thanks,
Uros.