* [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
@ 2023-10-13 1:22 Joel Fernandes (Google)
2023-10-14 3:07 ` Akira Yokosawa
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
0 siblings, 2 replies; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-10-13 1:22 UTC (permalink / raw)
To: perfbook; +Cc: Joel Fernandes (Google), paulmck
smp_mb() uses lock;add for x86 in the Linux kernel. Add information
about the same.
Cc: paulmck@kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
Not even build-tested; I focused on the content and on keeping my promise to
send this out (better than never sending it) ;-). I would appreciate the
perfbook maintainers taking this forward ;-). Thanks!
bib/hw.bib | 8 ++++++++
memorder/memorder.tex | 8 ++++++++
2 files changed, 16 insertions(+)
diff --git a/bib/hw.bib b/bib/hw.bib
index b0885e74..b1dfd119 100644
--- a/bib/hw.bib
+++ b/bib/hw.bib
@@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
}
+@unpublished{Tsirkin2017,
+ Author="Michael S. Tsirkin",
+ Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
+ month="November",
+ day="10",
+ year="2017",
+ note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
+}
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 5c978fbe..b28ac4f0 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
stores, and for these CPUs, \co{smp_wmb()} must also be defined to
be \co{lock;addl}.
+A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
+\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
+boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
+the \co{SP} to avoid slowness due to false data-dependencies,
+instead of directly modifying the \co{SP}. \co{clflush} users still
+need to use \co{mfence} for ordering, so they have been converted to use
+\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
+
Although newer x86 implementations accommodate self-modifying code
without any special instructions, to be fully compatible with
past and potential future x86 implementations, a given CPU must
--
2.42.0.655.g421f12c284-goog
* Re: [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
2023-10-13 1:22 [PATCH] memorder: Add info on recent x86 implemenation of smp_mb() Joel Fernandes (Google)
@ 2023-10-14 3:07 ` Akira Yokosawa
2023-10-14 22:26 ` Joel Fernandes
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
1 sibling, 1 reply; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14 3:07 UTC (permalink / raw)
To: Joel Fernandes (Google); +Cc: paulmck, Akira Yokosawa, perfbook
Hi Joel,
On 2023/10/13 10:22, Joel Fernandes (Google) wrote:
> smp_mb() uses lock;add for x86 in the linux kernel. Add information
> about the same.
>
> Cc: paulmck@kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> Not even build tested, just focused on the content and to keep my promise I'd
> send this out (better than never sending it) ;-). I appreciate maintainers of
> perfbook taking this forward ;-). Thanks!
I've just tested this...
And it failed to build.
I think I'll post a v2 which will build, with some wordsmithing
I can think of.
A few quick comments below.
>
> bib/hw.bib | 8 ++++++++
bib/memorymodel.bib looks like a suitable destination.
> memorder/memorder.tex | 8 ++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/bib/hw.bib b/bib/hw.bib
> index b0885e74..b1dfd119 100644
> --- a/bib/hw.bib
> +++ b/bib/hw.bib
> @@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
> note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
> }
>
> +@unpublished{Tsirkin2017,
> + Author="Michael S. Tsirkin",
> + Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
"_" in title needs an escape.
> + month="November",
> + day="10",
> + year="2017",
> + note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
> +}
> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
> index 5c978fbe..b28ac4f0 100644
> --- a/memorder/memorder.tex
> +++ b/memorder/memorder.tex
> @@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
> stores, and for these CPUs, \co{smp_wmb()} must also be defined to
> be \co{lock;addl}.
>
> +A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
> +\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
> +boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
^
perfbook's LaTeX source convention needs a line break at the end of a
sentence.
> +the \co{SP} to avoid slowness due to false data-dependencies,
> +instead of directly modifying the \co{SP}. \co{clflush} users still
> +need to use \co{mfence} for ordering, so they have been converted to use
> +\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
> +
> Although newer x86 implementations accommodate self-modifying code
> without any special instructions, to be fully compatible with
> past and potential future x86 implementations, a given CPU must
Anyway, please wait for my v2.
Thanks, Akira
* [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb()
2023-10-13 1:22 [PATCH] memorder: Add info on recent x86 implemenation of smp_mb() Joel Fernandes (Google)
2023-10-14 3:07 ` Akira Yokosawa
@ 2023-10-14 8:37 ` Akira Yokosawa
2023-10-14 8:39 ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
` (3 more replies)
1 sibling, 4 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14 8:37 UTC (permalink / raw)
To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa
Hi Joel,
So this is v2 based on your patch.
As is often the case in perfbook workflow, I split the bib
update into Patch 1/3.
Patch 2/3 includes many more changes than I expected.
Please see notes under "---".
Patch 3/3 is my own adjustment of the paragraph in front of
your update.
And I dropped "Cc: Paul", as Paul is the one who will be in
the SOB chain.
Thanks, Akira
--
Akira Yokosawa (1):
memorder: Update of ordering SSE non-temporal memory move instructions
Joel Fernandes (Google) (2):
bib/memorymodel: Add Tsirkin2017
memorder: Add info on recent x86 implementation of smp_mb()
bib/memorymodel.bib | 10 ++++++++++
memorder/memorder.tex | 16 ++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
base-commit: a01629a5f734b7617ee0ac4d4ded28b0605cf7cd
--
2.25.1
* [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
@ 2023-10-14 8:39 ` Akira Yokosawa
2023-10-14 8:42 ` [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb() Akira Yokosawa
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14 8:39 UTC (permalink / raw)
To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Add entry of Linux kernel git commit 450cbdd0125c ("locking/x86:
Use LOCK ADD for smp_mb() instead of MFENCE").
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
Changes in v2 (by akiyks):
- Split the bib update into its own patch.
- Add the entry in memorymodel.bib rather than in hw.bib.
memorymodel.bib is more suitable for references related to
memory barrier implementation.
- Escape "_" in the title.
- Use "{}" for upper case.
- Use URL of Git commit rather than that of lore.
Link to lore is available in the changelog.
- Use date of patch submission rather than that of commit.
---
bib/memorymodel.bib | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/bib/memorymodel.bib b/bib/memorymodel.bib
index 1ecff7d65f05..4883dfe30901 100644
--- a/bib/memorymodel.bib
+++ b/bib/memorymodel.bib
@@ -373,6 +373,16 @@ International Workshop on Exploiting Concurrency Efficiently and Correctly },
lastchecked="February 9, 2018",
}
+@unpublished{Tsirkin2017,
+ Author="Michael S. Tsirkin",
+ Title="locking/x86: Use {LOCK ADD} for smp\_mb() instead of {MFENCE}",
+ month="October",
+ day="27",
+ year="2017",
+ note="Git commit:
+\url{https://git.kernel.org/linus/450cbdd0125c}",
+}
+
@article{Pulte:2017:SAC:3177123.3158107,
author = {Pulte, Christopher and Flur, Shaked and Deacon, Will and French, Jon and Sarkar, Susmit and Sewell, Peter},
title = {Simplifying ARM Concurrency: Multicopy-atomic Axiomatic and Operational Models for ARMv8},
--
2.25.1
* [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb()
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
2023-10-14 8:39 ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
@ 2023-10-14 8:42 ` Akira Yokosawa
2023-10-14 8:43 ` [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions Akira Yokosawa
2023-10-16 4:01 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb() Paul E. McKenney
3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14 8:42 UTC (permalink / raw)
To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa
From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
smp_mb() uses lock;add for x86 in the Linux kernel. Add information
about the same.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
Changes in v2 (by akiyks):
- Apply punctuation conventions of perfbook LaTeX source.
- Break lines at sentence-ending punctuation marks.
- Overall wordsmith.
- Fix typo in Subject. (implementation)
- Drop confusing "the"s.
- Use "lock;addl" for consistency in the section.
- Reworded "instead of directly modifying SP" which surprised
me a bit.
- Reorder the final sentence to make it obvious that mb() is the
one that uses mfence.
---
memorder/memorder.tex | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 5c978fbef172..6b9c3268e589 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6081,6 +6081,16 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
stores, and for these CPUs, \co{smp_wmb()} must also be defined to
be \co{lock;addl}.
+A 2017 kernel commit by Michael S.~Tsirkin replaced \co{mfence} with
+\co{lock;addl} in \co{smp_mb()}, achieving a 60 percent performance
+boost~\cite{Tsirkin2017}.
+The change used a 4-byte negative offset from \co{SP} to avoid
+slowness due to false data dependencies, instead of directly
+accessing memory pointed to by \co{SP}.
+\co{clflush} users still need to use \co{mfence} for ordering.
+Therefore, they were converted to use \co{mb()}, which uses \co{mfence}
+as before, instead of \co{smp_mb()}.
+
Although newer x86 implementations accommodate self-modifying code
without any special instructions, to be fully compatible with
past and potential future x86 implementations, a given CPU must
--
2.25.1
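The barrier described in the paragraph added above can be sketched in
user space. This is only an illustration, assuming x86-64 and GCC or
Clang extended inline assembly. The names smp_mb_sketch() and
mb_sketch() are hypothetical, not the kernel's macros, and the non-x86
fallback is an assumption added so that the file still compiles
elsewhere.

```c
/* Hypothetical user-space sketch; these are not the kernel's macros. */
#if defined(__x86_64__)
/*
 * Full barrier via a locked add of zero to the word 4 bytes below the
 * stack pointer. Adding zero leaves that word unchanged.
 */
static inline void smp_mb_sketch(void)
{
	__asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}

/* Full barrier via mfence, which clflush users still need. */
static inline void mb_sketch(void)
{
	__asm__ __volatile__("mfence" ::: "memory");
}
#else
/* Assumption: elsewhere, fall back to the compiler's seq_cst fence. */
static inline void smp_mb_sketch(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}

static inline void mb_sketch(void)
{
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
}
#endif
```

Both functions also act as compiler barriers through the "memory"
clobber. The "cc" clobber on the first one is there because addl
writes the flags register.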
* [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
2023-10-14 8:39 ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
2023-10-14 8:42 ` [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb() Akira Yokosawa
@ 2023-10-14 8:43 ` Akira Yokosawa
2023-10-16 4:01 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb() Paul E. McKenney
3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14 8:43 UTC (permalink / raw)
To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa
Since Linux v4.15, smp_mb(), smp_wmb(), and smp_rmb() don't suffice
for ordering SSE non-temporal memory move instructions.
Update the text accordingly and add a footnote.
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
memorder/memorder.tex | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 6b9c3268e589..da56a6999c3f 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6075,8 +6075,10 @@ that same location, you are on your own.
Some SSE instructions are weakly ordered (\co{clflush}
and non-temporal move instructions~\cite{IntelXeonV2b-96a}).
Code that uses these non-temporal move instructions
-can also use \co{mfence} for \co{smp_mb()},
-\co{lfence} for \co{smp_rmb()}, and \co{sfence} for \co{smp_wmb()}.
+can use \co{mfence} for \co{mb()},
+\co{lfence} for \co{rmb()}, and \co{sfence} for \co{wmb()}.\footnote{
+ \co{smp_mb()}, \co{smp_rmb()}, and \co{smp_wmb()} don't suffice
+ for ordering non-temporal move instructions since Linux v4.15.}
A few older variants of the x86 CPU have a mode bit that enables out-of-order
stores, and for these CPUs, \co{smp_wmb()} must also be defined to
be \co{lock;addl}.
--
2.25.1
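To illustrate why non-temporal stores need an explicit fence, here is a
minimal user-space sketch, assuming x86-64 with SSE2 and the standard
intrinsics from <emmintrin.h>. The function publish_nt() and its
parameters are hypothetical, and the non-x86 fallback is an assumption
so that the file compiles elsewhere. The sfence stands in for what the
updated text calls wmb().

```c
/* Hypothetical sketch: publish a value written by a non-temporal store. */
#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si32(); also provides _mm_sfence() */

static inline void publish_nt(int *buf, int value, volatile int *ready)
{
	_mm_stream_si32(buf, value);	/* weakly ordered non-temporal store */
	_mm_sfence();			/* order it before the flag store */
	*ready = 1;			/* ordinary store seen by consumers */
}
#else
/* Assumption: no non-temporal stores here, so use a plain store. */
static inline void publish_nt(int *buf, int value, volatile int *ready)
{
	*buf = value;
	__atomic_thread_fence(__ATOMIC_SEQ_CST);
	*ready = 1;
}
#endif
```

Without the sfence, another CPU that observes the flag set could still
read a stale value from the buffer, because a non-temporal store may
become visible after later ordinary stores.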
* Re: [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
2023-10-14 3:07 ` Akira Yokosawa
@ 2023-10-14 22:26 ` Joel Fernandes
0 siblings, 0 replies; 8+ messages in thread
From: Joel Fernandes @ 2023-10-14 22:26 UTC (permalink / raw)
To: Akira Yokosawa; +Cc: paulmck, perfbook
Hi Akira,
> On Oct 13, 2023, at 11:07 PM, Akira Yokosawa <akiyks@gmail.com> wrote:
>
> Hi Joel,
>
>> On 2023/10/13 10:22, Joel Fernandes (Google) wrote:
>> smp_mb() uses lock;add for x86 in the linux kernel. Add information
>> about the same.
>>
>> Cc: paulmck@kernel.org
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>> Not even build tested, just focused on the content and to keep my promise I'd
>> send this out (better than never sending it) ;-). I appreciate maintainers of
>> perfbook taking this forward ;-). Thanks!
>
> I've just tested this...
> And it failed to build.
>
> I think I'll post a v2 which will build, with some wordsmithing
> I can think of.
>
> A few quick comments below.
Thank you very much for your help! I looked through the v2 and everything LGTM.
- Joel
>
>>
>> bib/hw.bib | 8 ++++++++
>
> bib/memorymodel.bib looks like a suitable destination.
>
>> memorder/memorder.tex | 8 ++++++++
>> 2 files changed, 16 insertions(+)
>>
>> diff --git a/bib/hw.bib b/bib/hw.bib
>> index b0885e74..b1dfd119 100644
>> --- a/bib/hw.bib
>> +++ b/bib/hw.bib
>> @@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
>> note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
>> }
>>
>> +@unpublished{Tsirkin2017,
>> + Author="Michael S. Tsirkin",
>> + Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
> "_" in title needs an escape.
>
>> + month="November",
>> + day="10",
>> + year="2017",
>> + note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
>> +}
>> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
>> index 5c978fbe..b28ac4f0 100644
>> --- a/memorder/memorder.tex
>> +++ b/memorder/memorder.tex
>> @@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
>> stores, and for these CPUs, \co{smp_wmb()} must also be defined to
>> be \co{lock;addl}.
>>
>> +A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
>> +\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
>> +boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
> ^
> perfbook's LaTeX source convention needs a line break at the end of a
> sentence.
>
>> +the \co{SP} to avoid slowness due to false data-dependencies,
>> +instead of directly modifying the \co{SP}. \co{clflush} users still
>> +need to use \co{mfence} for ordering, so they have been converted to use
>> +\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
>> +
>> Although newer x86 implementations accommodate self-modifying code
>> without any special instructions, to be fully compatible with
>> past and potential future x86 implementations, a given CPU must
>
> Anyway, please wait for my v2.
>
> Thanks, Akira
* Re: [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb()
2023-10-14 8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
` (2 preceding siblings ...)
2023-10-14 8:43 ` [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions Akira Yokosawa
@ 2023-10-16 4:01 ` Paul E. McKenney
3 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2023-10-16 4:01 UTC (permalink / raw)
To: Akira Yokosawa; +Cc: Joel Fernandes (Google), perfbook
On Sat, Oct 14, 2023 at 05:37:04PM +0900, Akira Yokosawa wrote:
> Hi Joel,
>
> So this is v2 based on your patch.
>
> As is often the case in perfbook workflow, I split the bib
> update into Patch 1/3.
>
> Patch 2/3 includes much more changes than I thought.
> Please see notes under "---".
>
> Patch 3/3 is my own adjustment of the paragraph in front of
> your update.
>
> And I dropped "Cc: Paul", as Paul is the one who will be in
> the SOB chain.
Queued and pushed, thank you both!
Thanx, Paul
> Thanks, Akira
> --
> Akira Yokosawa (1):
> memorder: Update of ordering SSE non-temporal memory move instructions
>
> Joel Fernandes (Google) (2):
> bib/memorymodel: Add Tsirkin2017
> memorder: Add info on recent x86 implementation of smp_mb()
>
> bib/memorymodel.bib | 10 ++++++++++
> memorder/memorder.tex | 16 ++++++++++++++--
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
>
> base-commit: a01629a5f734b7617ee0ac4d4ded28b0605cf7cd
> --
> 2.25.1
>
>