perfbook.vger.kernel.org archive mirror
* [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
@ 2023-10-13  1:22 Joel Fernandes (Google)
  2023-10-14  3:07 ` Akira Yokosawa
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
  0 siblings, 2 replies; 8+ messages in thread
From: Joel Fernandes (Google) @ 2023-10-13  1:22 UTC
  To: perfbook; +Cc: Joel Fernandes (Google), paulmck

On x86, the Linux kernel implements smp_mb() with lock;addl.
Add information about this.

Cc: paulmck@kernel.org
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
---
Not even build tested, just focused on the content and to keep my promise I'd
send this out (better than never sending it) ;-). I appreciate maintainers of
perfbook taking this forward ;-). Thanks!

 bib/hw.bib            | 8 ++++++++
 memorder/memorder.tex | 8 ++++++++
 2 files changed, 16 insertions(+)

diff --git a/bib/hw.bib b/bib/hw.bib
index b0885e74..b1dfd119 100644
--- a/bib/hw.bib
+++ b/bib/hw.bib
@@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
  note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
 }
 
+@unpublished{Tsirkin2017,
+ Author="Michael S. Tsirkin",
+ Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
+ month="November",
+ day="10",
+ year="2017",
+ note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
+}
diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 5c978fbe..b28ac4f0 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
 stores, and for these CPUs, \co{smp_wmb()} must also be defined to
 be \co{lock;addl}.
 
+A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
+\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
+boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
+the \co{SP} to avoid slowness due to false data-dependencies,
+instead of directly modifying the \co{SP}. \co{clflush} users still
+need to use \co{mfence} for ordering, so they have been converted to use
+\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
+
 Although newer x86 implementations accommodate self-modifying code
 without any special instructions, to be fully compatible with
 past and potential future x86 implementations, a given CPU must
-- 
2.42.0.655.g421f12c284-goog



* Re: [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
  2023-10-13  1:22 [PATCH] memorder: Add info on recent x86 implemenation of smp_mb() Joel Fernandes (Google)
@ 2023-10-14  3:07 ` Akira Yokosawa
  2023-10-14 22:26   ` Joel Fernandes
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
  1 sibling, 1 reply; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14  3:07 UTC
  To: Joel Fernandes (Google); +Cc: paulmck, Akira Yokosawa, perfbook

Hi Joel,

On 2023/10/13 10:22, Joel Fernandes (Google) wrote:
> On x86, the Linux kernel implements smp_mb() with lock;addl.
> Add information about this.
> 
> Cc: paulmck@kernel.org
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> Not even build tested, just focused on the content and to keep my promise I'd
> send this out (better than never sending it) ;-). I appreciate maintainers of
> perfbook taking this forward ;-). Thanks!

I've just tested this...
And it failed to build.

I think I'll post a v2 which will build, with some wordsmithing
I can think of.

A few quick comments below.

> 
>  bib/hw.bib            | 8 ++++++++

bib/memorymodel.bib looks like a suitable destination.

>  memorder/memorder.tex | 8 ++++++++
>  2 files changed, 16 insertions(+)
> 
> diff --git a/bib/hw.bib b/bib/hw.bib
> index b0885e74..b1dfd119 100644
> --- a/bib/hw.bib
> +++ b/bib/hw.bib
> @@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
>   note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
>  }
>  
> +@unpublished{Tsirkin2017,
> + Author="Michael S. Tsirkin",
> + Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
"_" in title needs an escape.

> + month="November",
> + day="10",
> + year="2017",
> + note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
> +}
> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
> index 5c978fbe..b28ac4f0 100644
> --- a/memorder/memorder.tex
> +++ b/memorder/memorder.tex
> @@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
>  stores, and for these CPUs, \co{smp_wmb()} must also be defined to
>  be \co{lock;addl}.
>  
> +A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
> +\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
> +boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
                           ^
perfbook's LaTeX source convention needs a line break at the end of a
sentence.

> +the \co{SP} to avoid slowness due to false data-dependencies,
> +instead of directly modifying the \co{SP}. \co{clflush} users still
> +need to use \co{mfence} for ordering, so they have been converted to use
> +\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
> +
>  Although newer x86 implementations accommodate self-modifying code
>  without any special instructions, to be fully compatible with
>  past and potential future x86 implementations, a given CPU must

Anyway, please wait for my v2.

        Thanks, Akira


* [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb()
  2023-10-13  1:22 [PATCH] memorder: Add info on recent x86 implemenation of smp_mb() Joel Fernandes (Google)
  2023-10-14  3:07 ` Akira Yokosawa
@ 2023-10-14  8:37 ` Akira Yokosawa
  2023-10-14  8:39   ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
                     ` (3 more replies)
  1 sibling, 4 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14  8:37 UTC
  To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa

Hi Joel,

So this is v2 based on your patch.

As is often the case in the perfbook workflow, I split the bib
update into Patch 1/3.

Patch 2/3 includes many more changes than I expected.
Please see the notes under "---".

Patch 3/3 is my own adjustment of the paragraph in front of
your update.

And I dropped "Cc: Paul", as Paul is the one who will be in
the SOB chain.

        Thanks, Akira
--
Akira Yokosawa (1):
  memorder: Update of ordering SSE non-temporal memory move instructions

Joel Fernandes (Google) (2):
  bib/memorymodel: Add Tsirkin2017
  memorder: Add info on recent x86 implementation of smp_mb()

 bib/memorymodel.bib   | 10 ++++++++++
 memorder/memorder.tex | 16 ++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)


base-commit: a01629a5f734b7617ee0ac4d4ded28b0605cf7cd
-- 
2.25.1




* [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
@ 2023-10-14  8:39   ` Akira Yokosawa
  2023-10-14  8:42   ` [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb() Akira Yokosawa
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14  8:39 UTC
  To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

Add an entry for Linux kernel Git commit 450cbdd0125c ("locking/x86:
Use LOCK ADD for smp_mb() instead of MFENCE").

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
Changes in v2 (by akiyks):
  - Split the bib update into its own patch.
  - Add the entry in memorymodel.bib rather than in hw.bib.
    memorymodel.bib is more suitable for references related to
    memory barrier implementation.
  - Escape "_" in the title.
  - Use "{}" for upper case.
  - Use URL of Git commit rather than that of lore.
    Link to lore is available in the changelog.
  - Use date of patch submission rather than that of commit.
---
 bib/memorymodel.bib | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/bib/memorymodel.bib b/bib/memorymodel.bib
index 1ecff7d65f05..4883dfe30901 100644
--- a/bib/memorymodel.bib
+++ b/bib/memorymodel.bib
@@ -373,6 +373,16 @@ International Workshop on Exploiting Concurrency Efficiently and Correctly },
  lastchecked="February 9, 2018",
 }
 
+@unpublished{Tsirkin2017,
+ Author="Michael S. Tsirkin",
+ Title="locking/x86: Use {LOCK ADD} for smp\_mb() instead of {MFENCE}",
+ month="October",
+ day="27",
+ year="2017",
+ note="Git commit:
+\url{https://git.kernel.org/linus/450cbdd0125c}",
+}
+
 @article{Pulte:2017:SAC:3177123.3158107,
  author = {Pulte, Christopher and Flur, Shaked and Deacon, Will and French, Jon and Sarkar, Susmit and Sewell, Peter},
  title = {Simplifying ARM Concurrency: Multicopy-atomic Axiomatic and Operational Models for ARMv8},
-- 
2.25.1




* [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb()
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
  2023-10-14  8:39   ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
@ 2023-10-14  8:42   ` Akira Yokosawa
  2023-10-14  8:43   ` [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions Akira Yokosawa
  2023-10-16  4:01   ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb() Paul E. McKenney
  3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14  8:42 UTC
  To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa

From: "Joel Fernandes (Google)" <joel@joelfernandes.org>

On x86, the Linux kernel implements smp_mb() with lock;addl.
Add information about this.

Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Co-developed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
Changes in v2 (by akiyks):
  - Apply punctuation conventions of perfbook LaTeX source.
      - Break lines at sentence-ending punctuation marks.
  - Overall wordsmithing.
      - Fix typo in Subject. (implementation)
      - Drop confusing "the"s.
      - Use "lock;addl" for consistency in the section.
      - Reworded "instead of directly modifying SP" which surprised
        me a bit.
      - Reorder the final sentence to make it obvious that mb() is the
        one that uses mfence.
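
For readers without the kernel tree at hand, the difference between the
two barriers described in the new paragraph can be sketched in
user-space C roughly as follows.
This is an illustration only and not the kernel source: the sketch_
names are made up, GCC/Clang inline asm on x86-64 is assumed, and any
other target simply falls back to __sync_synchronize().

```c
/* Hypothetical user-space stand-ins for smp_mb() and mb(). */
#if defined(__x86_64__)
static inline void sketch_smp_mb(void)
{
	/*
	 * Locked no-op add to the 4 bytes just below the stack pointer.
	 * Adding zero leaves the memory contents unchanged; the offset
	 * keeps the access away from the word at the top of the stack.
	 */
	__asm__ __volatile__("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}

static inline void sketch_mb(void)
{
	/* mfence, which clflush users still need for ordering. */
	__asm__ __volatile__("mfence" ::: "memory");
}
#else
/* Assumption: on other targets, use the compiler's full barrier. */
static inline void sketch_smp_mb(void) { __sync_synchronize(); }
static inline void sketch_mb(void) { __sync_synchronize(); }
#endif

/* Store to *x, full barrier, then load from *y. */
static inline int sketch_store_then_load(volatile int *x, volatile int *y)
{
	*x = 1;
	sketch_smp_mb();
	return *y;
}
```

Both helpers act as full barriers for ordinary loads and stores, so
sketch_store_then_load() would behave the same with either one.
The split only matters for weakly ordered operations such as clflush.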
---
 memorder/memorder.tex | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 5c978fbef172..6b9c3268e589 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6081,6 +6081,16 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
 stores, and for these CPUs, \co{smp_wmb()} must also be defined to
 be \co{lock;addl}.
 
+A 2017 kernel commit by Michael S.~Tsirkin replaced \co{mfence} with
+\co{lock;addl} in \co{smp_mb()}, achieving a 60 percent performance
+boost~\cite{Tsirkin2017}.
+The change used a 4-byte negative offset from \co{SP} to avoid
+slowness due to false data dependencies, instead of directly
+accessing memory pointed to by \co{SP}.
+\co{clflush} users still need to use \co{mfence} for ordering.
+Therefore, they were converted to use \co{mb()}, which uses \co{mfence}
+as before, instead of \co{smp_mb()}.
+
 Although newer x86 implementations accommodate self-modifying code
 without any special instructions, to be fully compatible with
 past and potential future x86 implementations, a given CPU must
-- 
2.25.1




* [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
  2023-10-14  8:39   ` [PATCH -perfbook v2 1/3] bib/memorymodel: Add Tsirkin2017 Akira Yokosawa
  2023-10-14  8:42   ` [PATCH -perfbook v2 2/3] memorder: Add info on recent x86 implementation of smp_mb() Akira Yokosawa
@ 2023-10-14  8:43   ` Akira Yokosawa
  2023-10-16  4:01   ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb() Paul E. McKenney
  3 siblings, 0 replies; 8+ messages in thread
From: Akira Yokosawa @ 2023-10-14  8:43 UTC
  To: Joel Fernandes (Google); +Cc: paulmck, perfbook, Akira Yokosawa

Since Linux v4.15, smp_mb(), smp_wmb(), and smp_rmb() no longer suffice
for ordering SSE non-temporal memory move instructions.

Update the text accordingly and add a footnote.

Signed-off-by: Akira Yokosawa <akiyks@gmail.com>
---
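As a rough illustration of why a real fence instruction is needed here,
the following user-space sketch publishes a value with a non-temporal
store and then sets a flag.
It is not taken from the kernel or from perfbook: sketch_nt_publish()
is a made-up name, the SSE2 intrinsics _mm_stream_si32() and
_mm_sfence() stand in for the kernel's wmb(), and builds without SSE2
fall back to a plain store plus __sync_synchronize().

```c
#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si32(); also provides _mm_sfence() */
#endif

/* Hypothetical helper: write *payload non-temporally, then set *ready. */
void sketch_nt_publish(int *payload, volatile int *ready, int value)
{
#if defined(__SSE2__)
	_mm_stream_si32(payload, value);	/* weakly ordered store */
	_mm_sfence();		/* order it before the flag store below */
#else
	/* Assumption: no SSE2, so use an ordinary store and full barrier. */
	*payload = value;
	__sync_synchronize();
#endif
	*ready = 1;
}
```

Without the sfence, the flag store could become visible to another CPU
before the non-temporal store does.
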
 memorder/memorder.tex | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/memorder/memorder.tex b/memorder/memorder.tex
index 6b9c3268e589..da56a6999c3f 100644
--- a/memorder/memorder.tex
+++ b/memorder/memorder.tex
@@ -6075,8 +6075,10 @@ that same location, you are on your own.
 Some SSE instructions are weakly ordered (\co{clflush}
 and non-temporal move instructions~\cite{IntelXeonV2b-96a}).
 Code that uses these non-temporal move instructions
-can also use \co{mfence} for \co{smp_mb()},
-\co{lfence} for \co{smp_rmb()}, and \co{sfence} for \co{smp_wmb()}.
+can use \co{mfence} for \co{mb()},
+\co{lfence} for \co{rmb()}, and \co{sfence} for \co{wmb()}.\footnote{
+	\co{smp_mb()}, \co{smp_rmb()}, and \co{smp_wmb()} don't suffice
+	for ordering non-temporal move instructions since Linux v4.15.}
 A few older variants of the x86 CPU have a mode bit that enables out-of-order
 stores, and for these CPUs, \co{smp_wmb()} must also be defined to
 be \co{lock;addl}.
-- 
2.25.1




* Re: [PATCH] memorder: Add info on recent x86 implemenation of smp_mb()
  2023-10-14  3:07 ` Akira Yokosawa
@ 2023-10-14 22:26   ` Joel Fernandes
  0 siblings, 0 replies; 8+ messages in thread
From: Joel Fernandes @ 2023-10-14 22:26 UTC
  To: Akira Yokosawa; +Cc: paulmck, perfbook

Hi Akira,

> On Oct 13, 2023, at 11:07 PM, Akira Yokosawa <akiyks@gmail.com> wrote:
> 
> Hi Joel,
> 
>> On 2023/10/13 10:22, Joel Fernandes (Google) wrote:
>> On x86, the Linux kernel implements smp_mb() with lock;addl.
>> Add information about this.
>> 
>> Cc: paulmck@kernel.org
>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>> ---
>> Not even build tested, just focused on the content and to keep my promise I'd
>> send this out (better than never sending it) ;-). I appreciate maintainers of
>> perfbook taking this forward ;-). Thanks!
> 
> I've just tested this...
> And it failed to build.
> 
> I think I'll post a v2 which will build, with some wordsmithing
> I can think of.
> 
> A few quick comments below.

Thank you very much for your help! I looked through the v2 and everything LGTM.

 - Joel


> 
>> 
>> bib/hw.bib            | 8 ++++++++
> 
> bib/memorymodel.bib looks like a suitable destination.
> 
>> memorder/memorder.tex | 8 ++++++++
>> 2 files changed, 16 insertions(+)
>> 
>> diff --git a/bib/hw.bib b/bib/hw.bib
>> index b0885e74..b1dfd119 100644
>> --- a/bib/hw.bib
>> +++ b/bib/hw.bib
>> @@ -1159,3 +1159,11 @@ Luis Stevens and Anoop Gupta and John Hennessy",
>>  note="\url{https://github.com/google/fuzzing/blob/master/docs/silifuzz.pdf}",
>> }
>> 
>> +@unpublished{Tsirkin2017,
>> + Author="Michael S. Tsirkin",
>> + Title="locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE",
> "_" in title needs an escape.
> 
>> + month="November",
>> + day="10",
>> + year="2017",
>> + note="\url{https://lore.kernel.org/all/tip-450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730@git.kernel.org/}",
>> +}
>> diff --git a/memorder/memorder.tex b/memorder/memorder.tex
>> index 5c978fbe..b28ac4f0 100644
>> --- a/memorder/memorder.tex
>> +++ b/memorder/memorder.tex
>> @@ -6081,6 +6081,14 @@ A few older variants of the x86 CPU have a mode bit that enables out-of-order
>> stores, and for these CPUs, \co{smp_wmb()} must also be defined to
>> be \co{lock;addl}.
>> 
>> +A 2017 kernel commit by Michael S. Tsirkin replaced \co{mfence} with
>> +\co{lock add} in \co{smp_mb()}, achieving a 60 percent performance
>> +boost~\cite{Tsirkin2017}. The change used a 4-byte negative offset from
>                           ^
> perfbook's LaTeX source convention needs a line break at the end of a
> sentence.
> 
>> +the \co{SP} to avoid slowness due to false data-dependencies,
>> +instead of directly modifying the \co{SP}. \co{clflush} users still
>> +need to use \co{mfence} for ordering, so they have been converted to use
>> +\co{mb} instead of \co{smp_mb}, which uses an \co{mfence} as before.
>> +
>> Although newer x86 implementations accommodate self-modifying code
>> without any special instructions, to be fully compatible with
>> past and potential future x86 implementations, a given CPU must
> 
> Anyway, please wait for my v2.
> 
>        Thanks, Akira


* Re: [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation of smp_mb()
  2023-10-14  8:37 ` [PATCH -perfbook v2 0/3] memorder: Add info on recent x86 implementation " Akira Yokosawa
                     ` (2 preceding siblings ...)
  2023-10-14  8:43   ` [PATCH -perfbook v2 3/3] memorder: Update of ordering SSE non-temporal memory move instructions Akira Yokosawa
@ 2023-10-16  4:01   ` Paul E. McKenney
  3 siblings, 0 replies; 8+ messages in thread
From: Paul E. McKenney @ 2023-10-16  4:01 UTC
  To: Akira Yokosawa; +Cc: Joel Fernandes (Google), perfbook

On Sat, Oct 14, 2023 at 05:37:04PM +0900, Akira Yokosawa wrote:
> Hi Joel,
> 
> So this is v2 based on your patch.
> 
> As is often the case in the perfbook workflow, I split the bib
> update into Patch 1/3.
> 
> Patch 2/3 includes many more changes than I expected.
> Please see the notes under "---".
> 
> Patch 3/3 is my own adjustment of the paragraph in front of
> your update.
> 
> And I dropped "Cc: Paul", as Paul is the one who will be in
> the SOB chain.

Queued and pushed, thank you both!

							Thanx, Paul

>         Thanks, Akira
> --
> Akira Yokosawa (1):
>   memorder: Update of ordering SSE non-temporal memory move instructions
> 
> Joel Fernandes (Google) (2):
>   bib/memorymodel: Add Tsirkin2017
>   memorder: Add info on recent x86 implementation of smp_mb()
> 
>  bib/memorymodel.bib   | 10 ++++++++++
>  memorder/memorder.tex | 16 ++++++++++++++--
>  2 files changed, 24 insertions(+), 2 deletions(-)
> 
> 
> base-commit: a01629a5f734b7617ee0ac4d4ded28b0605cf7cd
> -- 
> 2.25.1
> 
> 
