All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vineet Gupta <Vineet.Gupta1@synopsys.com>
Cc: Will Deacon <Will.Deacon@arm.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	arcml <linux-snps-arc@lists.infradead.org>,
	lkml <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>
Subject: Re: single copy atomicity for double load/stores on 32-bit systems
Date: Fri, 31 May 2019 10:21:12 +0200	[thread overview]
Message-ID: <20190531082112.GH2623@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <2fd3a455-6267-5d21-c530-41964a4f6ce9@synopsys.com>

On Thu, May 30, 2019 at 11:22:42AM -0700, Vineet Gupta wrote:
> Hi Peter,
> 
> Had an interesting lunch time discussion with our hardware architects pertinent to
> "minimal guarantees expected of a CPU" section of memory-barriers.txt
> 
> 
> |  (*) These guarantees apply only to properly aligned and sized scalar
> |     variables.  "Properly sized" currently means variables that are
> |     the same size as "char", "short", "int" and "long".  "Properly
> |     aligned" means the natural alignment, thus no constraints for
> |     "char", two-byte alignment for "short", four-byte alignment for
> |     "int", and either four-byte or eight-byte alignment for "long",
> |     on 32-bit and 64-bit systems, respectively.
> 
> 
> I'm not sure how to interpret "natural alignment" for the case of double
> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> alignment (ARCv2 LDD/STD, ARM LDRD/STRD ....)

Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))

For any u64 type, that would give 8 byte alignment. the problem
otherwise being that your data spans two lines/pages etc..

> I presume (and the question) that lkmm doesn't expect such 8 byte load/stores to
> be atomic unless 8-byte aligned
> 
> ARMv7 arch ref manual seems to confirm this. Quoting
> 
> | LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, VLDR,
> | VSTM, and VSTR instructions are executed as a sequence of word-aligned word
> | accesses. Each 32-bit word access is guaranteed to be single-copy atomic. A
> | subsequence of two or more word accesses from the sequence might not exhibit
> | single-copy atomicity
> 
> While it seems reasonable form hardware pov to not implement such atomicity by
> default it seems there's an additional burden on application writers. They could
> be happily using a lockless algorithm with just a shared flag between 2 threads
> w/o need for any explicit synchronization.

If you're that careless with lockless code, you deserve all the pain you
get.

> But upgrade to a new compiler which
> aggressively "packs" struct rendering long long 32-bit aligned (vs. 64-bit before)
> causing the code to suddenly stop working. Is the onus on them to declare such
> memory as c11 atomic or some such.

When a programmer wants guarantees they already need to know wth they're
doing.

And I'll stand by my earlier conviction that any architecture that has a
native u64 (be it a 64bit arch or a 32bit with double-width
instructions) but has an ABI that allows u32 alignment on them is daft.


WARNING: multiple messages have this Message-ID (diff)
From: peterz@infradead.org (Peter Zijlstra)
To: linux-snps-arc@lists.infradead.org
Subject: single copy atomicity for double load/stores on 32-bit systems
Date: Fri, 31 May 2019 10:21:12 +0200	[thread overview]
Message-ID: <20190531082112.GH2623@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <2fd3a455-6267-5d21-c530-41964a4f6ce9@synopsys.com>

On Thu, May 30, 2019@11:22:42AM -0700, Vineet Gupta wrote:
> Hi Peter,
> 
> Had an interesting lunch time discussion with our hardware architects pertinent to
> "minimal guarantees expected of a CPU" section of memory-barriers.txt
> 
> 
> |  (*) These guarantees apply only to properly aligned and sized scalar
> |     variables.  "Properly sized" currently means variables that are
> |     the same size as "char", "short", "int" and "long".  "Properly
> |     aligned" means the natural alignment, thus no constraints for
> |     "char", two-byte alignment for "short", four-byte alignment for
> |     "int", and either four-byte or eight-byte alignment for "long",
> |     on 32-bit and 64-bit systems, respectively.
> 
> 
> I'm not sure how to interpret "natural alignment" for the case of double
> load/stores on 32-bit systems where the hardware and ABI allow for 4 byte
> alignment (ARCv2 LDD/STD, ARM LDRD/STRD ....)

Natural alignment: !((uintptr_t)ptr % sizeof(*ptr))

For any u64 type, that would give 8 byte alignment. the problem
otherwise being that your data spans two lines/pages etc..

> I presume (and the question) that lkmm doesn't expect such 8 byte load/stores to
> be atomic unless 8-byte aligned
> 
> ARMv7 arch ref manual seems to confirm this. Quoting
> 
> | LDM, LDC, LDC2, LDRD, STM, STC, STC2, STRD, PUSH, POP, RFE, SRS, VLDM, VLDR,
> | VSTM, and VSTR instructions are executed as a sequence of word-aligned word
> | accesses. Each 32-bit word access is guaranteed to be single-copy atomic. A
> | subsequence of two or more word accesses from the sequence might not exhibit
> | single-copy atomicity
> 
> While it seems reasonable form hardware pov to not implement such atomicity by
> default it seems there's an additional burden on application writers. They could
> be happily using a lockless algorithm with just a shared flag between 2 threads
> w/o need for any explicit synchronization.

If you're that careless with lockless code, you deserve all the pain you
get.

> But upgrade to a new compiler which
> aggressively "packs" struct rendering long long 32-bit aligned (vs. 64-bit before)
> causing the code to suddenly stop working. Is the onus on them to declare such
> memory as c11 atomic or some such.

When a programmer wants guarantees they already need to know wth they're
doing.

And I'll stand by my earlier conviction that any architecture that has a
native u64 (be it a 64bit arch or a 32bit with double-width
instructions) but has an ABI that allows u32 alignment on them is daft.

  parent reply	other threads:[~2019-05-31  8:21 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-30 18:22 single copy atomicity for double load/stores on 32-bit systems Vineet Gupta
2019-05-30 18:22 ` Vineet Gupta
2019-05-30 18:53 ` Paul E. McKenney
2019-05-30 18:53   ` Paul E. McKenney
2019-05-30 19:16   ` Vineet Gupta
2019-05-30 19:16     ` Vineet Gupta
2019-05-31  8:23     ` Peter Zijlstra
2019-05-31  8:23       ` Peter Zijlstra
2019-05-31  8:25   ` Peter Zijlstra
2019-05-31  8:25     ` Peter Zijlstra
2019-05-31  8:21 ` Peter Zijlstra [this message]
2019-05-31  8:21   ` Peter Zijlstra
2019-06-03 18:08   ` Vineet Gupta
2019-06-03 18:08     ` Vineet Gupta
2019-06-03 20:13     ` Paul E. McKenney
2019-06-03 20:13       ` Paul E. McKenney
2019-06-03 21:59       ` Vineet Gupta
2019-06-03 21:59         ` Vineet Gupta
2019-06-04  7:41       ` Geert Uytterhoeven
2019-06-04  7:41         ` Geert Uytterhoeven
2019-06-04  7:41         ` Geert Uytterhoeven
2019-06-06  9:43         ` Paul E. McKenney
2019-06-06  9:43           ` Paul E. McKenney
2019-06-06  9:53           ` Geert Uytterhoeven
2019-06-06  9:53             ` Geert Uytterhoeven
2019-06-06 16:34           ` David Laight
2019-06-06 16:34             ` David Laight
2019-06-06 21:17             ` Paul E. McKenney
2019-06-06 21:17               ` Paul E. McKenney
2019-06-03 18:43   ` Vineet Gupta
2019-06-03 18:43     ` Vineet Gupta
2019-07-01 20:05   ` Vineet Gupta
2019-07-01 20:05     ` Vineet Gupta
2019-07-02 10:46     ` Will Deacon
2019-07-02 10:46       ` Will Deacon
2019-05-31  9:41 ` David Laight
2019-05-31  9:41   ` David Laight
2019-05-31  9:41   ` David Laight
2019-05-31 11:44   ` Paul E. McKenney
2019-05-31 11:44     ` Paul E. McKenney
2019-06-03 18:44   ` Vineet Gupta
2019-06-03 18:44     ` Vineet Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190531082112.GH2623@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Vineet.Gupta1@synopsys.com \
    --cc=Will.Deacon@arm.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-snps-arc@lists.infradead.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.