All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] count: Switch from GCC to C11 thread-local storage
@ 2022-08-13 11:45 Elad Lahav
  2022-08-13 11:49 ` Elad Lahav
  2022-08-13 23:08 ` Paul E. McKenney
  0 siblings, 2 replies; 13+ messages in thread
From: Elad Lahav @ 2022-08-13 11:45 UTC (permalink / raw)
  To: perfbook; +Cc: Elad Lahav

Signed-off-by: Elad Lahav <e2lahav@gmail.com>
---
 CodeSamples/count/count_end.c | 12 ++++++------
 count/count.tex               |  4 ++--
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/CodeSamples/count/count_end.c b/CodeSamples/count/count_end.c
index 722ad2f7..5e7b9ee1 100644
--- a/CodeSamples/count/count_end.c
+++ b/CodeSamples/count/count_end.c
@@ -1,6 +1,6 @@
 /*
  * count_end.c: Per-thread statistical counters that provide sum at end.
- *	Uses __thread for each thread's counter.
+ *	Uses _Thread_local for each thread's counter.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -23,17 +23,17 @@
 #include "../api.h"
 
 //\begin{snippet}[labelbase=ln:count:count_end:whole,commandchars=\\\@\$]
-unsigned long __thread counter = 0;		//\lnlbl{var:b}
+unsigned long _Thread_local counter = 0;		//\lnlbl{var:b}
 unsigned long *counterp[NR_THREADS] = { NULL };
 unsigned long finalcount = 0;
 DEFINE_SPINLOCK(final_mutex);			//\lnlbl{var:e}
 
-static __inline__ void inc_count(void)		//\lnlbl{inc:b}
+static inline void inc_count(void)		//\lnlbl{inc:b}
 {
 	WRITE_ONCE(counter, counter + 1);
 }						//\lnlbl{inc:e}
 
-static __inline__ unsigned long read_count(void)
+static inline unsigned long read_count(void)
 {
 	int t;
 	unsigned long sum;
@@ -48,7 +48,7 @@ static __inline__ unsigned long read_count(void)
 }
 
 #ifndef FCV_SNIPPET
-__inline__ void count_init(void)
+inline void count_init(void)
 {
 }
 #endif /* FCV_SNIPPET */
@@ -73,7 +73,7 @@ void count_unregister_thread(int nthreadsexpected)	//\lnlbl{unreg:b}
 }							//\lnlbl{unreg:e}
 //\end{snippet}
 
-__inline__ void count_cleanup(void)
+inline void count_cleanup(void)
 {
 }
 
diff --git a/count/count.tex b/count/count.tex
index 523789e2..173b964c 100644
--- a/count/count.tex
+++ b/count/count.tex
@@ -721,8 +721,8 @@ This is the topic of the next section.
 \subsection{Per-Thread-Variable-Based Implementation}
 \label{sec:count:Per-Thread-Variable-Based Implementation}
 
-\GCC\ provides an \apig{__thread} storage class that provides
-per-thread storage.
+The C language, since C11, features a \apig{_Thread_local} storage class that
+provides per-thread storage.
 This can be used as shown in
 \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c})
 to implement
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 11:45 [PATCH] count: Switch from GCC to C11 thread-local storage Elad Lahav
@ 2022-08-13 11:49 ` Elad Lahav
  2022-08-13 23:12   ` Paul E. McKenney
  2022-08-13 23:08 ` Paul E. McKenney
  1 sibling, 1 reply; 13+ messages in thread
From: Elad Lahav @ 2022-08-13 11:49 UTC (permalink / raw)
  To: perfbook

On 2022-08-13 07:45, Elad Lahav wrote:
> Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> ---

This one will probably require some back-and-forth. If we change it 
here, do other places need to be updated? Have I now removed the 
explanation of GCC's __thread storage class, which will impact the 
readability of other sections? Or are you ready to take the plunge and 
convert all code snippets?

For the record, I did build and test count_end.c.

--Elad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 11:45 [PATCH] count: Switch from GCC to C11 thread-local storage Elad Lahav
  2022-08-13 11:49 ` Elad Lahav
@ 2022-08-13 23:08 ` Paul E. McKenney
  2022-08-13 23:12   ` Elad Lahav
  2022-08-17 11:02   ` Elad Lahav
  1 sibling, 2 replies; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-13 23:08 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Sat, Aug 13, 2022 at 07:45:59AM -0400, Elad Lahav wrote:
> Signed-off-by: Elad Lahav <e2lahav@gmail.com>

Nice, thank you!

But did you verify that none of the text describing count_end.c needed
changing?  (It seems unlikely that any would, but could you please check?)

> ---
>  CodeSamples/count/count_end.c | 12 ++++++------
>  count/count.tex               |  4 ++--
>  2 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/CodeSamples/count/count_end.c b/CodeSamples/count/count_end.c
> index 722ad2f7..5e7b9ee1 100644
> --- a/CodeSamples/count/count_end.c
> +++ b/CodeSamples/count/count_end.c
> @@ -1,6 +1,6 @@
>  /*
>   * count_end.c: Per-thread statistical counters that provide sum at end.
> - *	Uses __thread for each thread's counter.
> + *	Uses _Thread_local for each thread's counter.
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License as published by
> @@ -23,17 +23,17 @@
>  #include "../api.h"
>  
>  //\begin{snippet}[labelbase=ln:count:count_end:whole,commandchars=\\\@\$]
> -unsigned long __thread counter = 0;		//\lnlbl{var:b}
> +unsigned long _Thread_local counter = 0;		//\lnlbl{var:b}
>  unsigned long *counterp[NR_THREADS] = { NULL };
>  unsigned long finalcount = 0;
>  DEFINE_SPINLOCK(final_mutex);			//\lnlbl{var:e}
>  
> -static __inline__ void inc_count(void)		//\lnlbl{inc:b}
> +static inline void inc_count(void)		//\lnlbl{inc:b}
>  {
>  	WRITE_ONCE(counter, counter + 1);
>  }						//\lnlbl{inc:e}
>  
> -static __inline__ unsigned long read_count(void)
> +static inline unsigned long read_count(void)
>  {
>  	int t;
>  	unsigned long sum;
> @@ -48,7 +48,7 @@ static __inline__ unsigned long read_count(void)
>  }
>  
>  #ifndef FCV_SNIPPET
> -__inline__ void count_init(void)
> +inline void count_init(void)
>  {
>  }
>  #endif /* FCV_SNIPPET */
> @@ -73,7 +73,7 @@ void count_unregister_thread(int nthreadsexpected)	//\lnlbl{unreg:b}
>  }							//\lnlbl{unreg:e}
>  //\end{snippet}
>  
> -__inline__ void count_cleanup(void)
> +inline void count_cleanup(void)
>  {
>  }
>  
> diff --git a/count/count.tex b/count/count.tex
> index 523789e2..173b964c 100644
> --- a/count/count.tex
> +++ b/count/count.tex
> @@ -721,8 +721,8 @@ This is the topic of the next section.
>  \subsection{Per-Thread-Variable-Based Implementation}
>  \label{sec:count:Per-Thread-Variable-Based Implementation}
>  
> -\GCC\ provides an \apig{__thread} storage class that provides
> -per-thread storage.
> +The C language, since C11, features a \apig{_Thread_local} storage class that
> +provides per-thread storage.

Could you please mention __thread as the old way?  Either in the text
directly, in parentheses, as a footnote, or as a quick quiz, your choice.

							Thanx, Paul

>  This can be used as shown in
>  \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c})
>  to implement
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 11:49 ` Elad Lahav
@ 2022-08-13 23:12   ` Paul E. McKenney
  2022-08-13 23:23     ` Elad Lahav
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-13 23:12 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Sat, Aug 13, 2022 at 07:49:27AM -0400, Elad Lahav wrote:
> On 2022-08-13 07:45, Elad Lahav wrote:
> > Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> > ---
> 
> This one will probably require some back-and-forth. If we change it here, do
> other places need to be updated? Have I now removed the explanation of GCC's
> __thread storage class, which will impact the readability of other sections?
> Or are you ready to take the plunge and convert all code snippets?
> 
> For the record, I did build and test count_end.c.

As long as you continue building and testing, checking the descriptions in
text of the code snippets (in case there is a mention of __thread there),
and verifying that the PDF still builds, as fast as you are willing to go!

Are you seeing the same performance with the new as with the old on
your hardware?  (No statistically significant difference on my laptop,
but figured I should ask.)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:08 ` Paul E. McKenney
@ 2022-08-13 23:12   ` Elad Lahav
  2022-08-13 23:17     ` Paul E. McKenney
  2022-08-17 11:02   ` Elad Lahav
  1 sibling, 1 reply; 13+ messages in thread
From: Elad Lahav @ 2022-08-13 23:12 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook

Yes, I thought about a footnote along the lines of "previous versions
of this book used GCC extensions..." but wasn't sure if you'd approve.
I can certainly add that.

--Elad

On Sat, 13 Aug 2022 at 19:08, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sat, Aug 13, 2022 at 07:45:59AM -0400, Elad Lahav wrote:
> > Signed-off-by: Elad Lahav <e2lahav@gmail.com>
>
> Nice, thank you!
>
> But did you verify that none of the text describing count_end.c needed
> changing?  (It seems unlikely that any would, but could you please check?)
>
> > ---
> >  CodeSamples/count/count_end.c | 12 ++++++------
> >  count/count.tex               |  4 ++--
> >  2 files changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/CodeSamples/count/count_end.c b/CodeSamples/count/count_end.c
> > index 722ad2f7..5e7b9ee1 100644
> > --- a/CodeSamples/count/count_end.c
> > +++ b/CodeSamples/count/count_end.c
> > @@ -1,6 +1,6 @@
> >  /*
> >   * count_end.c: Per-thread statistical counters that provide sum at end.
> > - *   Uses __thread for each thread's counter.
> > + *   Uses _Thread_local for each thread's counter.
> >   *
> >   * This program is free software; you can redistribute it and/or modify
> >   * it under the terms of the GNU General Public License as published by
> > @@ -23,17 +23,17 @@
> >  #include "../api.h"
> >
> >  //\begin{snippet}[labelbase=ln:count:count_end:whole,commandchars=\\\@\$]
> > -unsigned long __thread counter = 0;          //\lnlbl{var:b}
> > +unsigned long _Thread_local counter = 0;             //\lnlbl{var:b}
> >  unsigned long *counterp[NR_THREADS] = { NULL };
> >  unsigned long finalcount = 0;
> >  DEFINE_SPINLOCK(final_mutex);                        //\lnlbl{var:e}
> >
> > -static __inline__ void inc_count(void)               //\lnlbl{inc:b}
> > +static inline void inc_count(void)           //\lnlbl{inc:b}
> >  {
> >       WRITE_ONCE(counter, counter + 1);
> >  }                                            //\lnlbl{inc:e}
> >
> > -static __inline__ unsigned long read_count(void)
> > +static inline unsigned long read_count(void)
> >  {
> >       int t;
> >       unsigned long sum;
> > @@ -48,7 +48,7 @@ static __inline__ unsigned long read_count(void)
> >  }
> >
> >  #ifndef FCV_SNIPPET
> > -__inline__ void count_init(void)
> > +inline void count_init(void)
> >  {
> >  }
> >  #endif /* FCV_SNIPPET */
> > @@ -73,7 +73,7 @@ void count_unregister_thread(int nthreadsexpected)  //\lnlbl{unreg:b}
> >  }                                                    //\lnlbl{unreg:e}
> >  //\end{snippet}
> >
> > -__inline__ void count_cleanup(void)
> > +inline void count_cleanup(void)
> >  {
> >  }
> >
> > diff --git a/count/count.tex b/count/count.tex
> > index 523789e2..173b964c 100644
> > --- a/count/count.tex
> > +++ b/count/count.tex
> > @@ -721,8 +721,8 @@ This is the topic of the next section.
> >  \subsection{Per-Thread-Variable-Based Implementation}
> >  \label{sec:count:Per-Thread-Variable-Based Implementation}
> >
> > -\GCC\ provides an \apig{__thread} storage class that provides
> > -per-thread storage.
> > +The C language, since C11, features a \apig{_Thread_local} storage class that
> > +provides per-thread storage.
>
> Could you please mention __thread as the old way?  Either in the text
> directly, in parentheses, as a footnote, or as a quick quiz, your choice.
>
>                                                         Thanx, Paul
>
> >  This can be used as shown in
> >  \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c})
> >  to implement
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:12   ` Elad Lahav
@ 2022-08-13 23:17     ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-13 23:17 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

That sounds good, thank you!

There are quite a few people on projects for which C11 is not yet a
thing.  Or that need to look at old code.

But yes, some years hence, perhaps all mention of __thread is dropped
from this book.  Not today, though.  ;-)

						Thanx, Paul

On Sat, Aug 13, 2022 at 07:12:48PM -0400, Elad Lahav wrote:
> Yes, I thought about a footnote along the lines of "previous versions
> of this book used GCC extensions..." but wasn't sure if you'd approve.
> I can certainly add that.
> 
> --Elad
> 
> On Sat, 13 Aug 2022 at 19:08, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sat, Aug 13, 2022 at 07:45:59AM -0400, Elad Lahav wrote:
> > > Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> >
> > Nice, thank you!
> >
> > But did you verify that none of the text describing count_end.c needed
> > changing?  (It seems unlikely that any would, but could you please check?)
> >
> > > ---
> > >  CodeSamples/count/count_end.c | 12 ++++++------
> > >  count/count.tex               |  4 ++--
> > >  2 files changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/CodeSamples/count/count_end.c b/CodeSamples/count/count_end.c
> > > index 722ad2f7..5e7b9ee1 100644
> > > --- a/CodeSamples/count/count_end.c
> > > +++ b/CodeSamples/count/count_end.c
> > > @@ -1,6 +1,6 @@
> > >  /*
> > >   * count_end.c: Per-thread statistical counters that provide sum at end.
> > > - *   Uses __thread for each thread's counter.
> > > + *   Uses _Thread_local for each thread's counter.
> > >   *
> > >   * This program is free software; you can redistribute it and/or modify
> > >   * it under the terms of the GNU General Public License as published by
> > > @@ -23,17 +23,17 @@
> > >  #include "../api.h"
> > >
> > >  //\begin{snippet}[labelbase=ln:count:count_end:whole,commandchars=\\\@\$]
> > > -unsigned long __thread counter = 0;          //\lnlbl{var:b}
> > > +unsigned long _Thread_local counter = 0;             //\lnlbl{var:b}
> > >  unsigned long *counterp[NR_THREADS] = { NULL };
> > >  unsigned long finalcount = 0;
> > >  DEFINE_SPINLOCK(final_mutex);                        //\lnlbl{var:e}
> > >
> > > -static __inline__ void inc_count(void)               //\lnlbl{inc:b}
> > > +static inline void inc_count(void)           //\lnlbl{inc:b}
> > >  {
> > >       WRITE_ONCE(counter, counter + 1);
> > >  }                                            //\lnlbl{inc:e}
> > >
> > > -static __inline__ unsigned long read_count(void)
> > > +static inline unsigned long read_count(void)
> > >  {
> > >       int t;
> > >       unsigned long sum;
> > > @@ -48,7 +48,7 @@ static __inline__ unsigned long read_count(void)
> > >  }
> > >
> > >  #ifndef FCV_SNIPPET
> > > -__inline__ void count_init(void)
> > > +inline void count_init(void)
> > >  {
> > >  }
> > >  #endif /* FCV_SNIPPET */
> > > @@ -73,7 +73,7 @@ void count_unregister_thread(int nthreadsexpected)  //\lnlbl{unreg:b}
> > >  }                                                    //\lnlbl{unreg:e}
> > >  //\end{snippet}
> > >
> > > -__inline__ void count_cleanup(void)
> > > +inline void count_cleanup(void)
> > >  {
> > >  }
> > >
> > > diff --git a/count/count.tex b/count/count.tex
> > > index 523789e2..173b964c 100644
> > > --- a/count/count.tex
> > > +++ b/count/count.tex
> > > @@ -721,8 +721,8 @@ This is the topic of the next section.
> > >  \subsection{Per-Thread-Variable-Based Implementation}
> > >  \label{sec:count:Per-Thread-Variable-Based Implementation}
> > >
> > > -\GCC\ provides an \apig{__thread} storage class that provides
> > > -per-thread storage.
> > > +The C language, since C11, features a \apig{_Thread_local} storage class that
> > > +provides per-thread storage.
> >
> > Could you please mention __thread as the old way?  Either in the text
> > directly, in parentheses, as a footnote, or as a quick quiz, your choice.
> >
> >                                                         Thanx, Paul
> >
> > >  This can be used as shown in
> > >  \cref{lst:count:Per-Thread Statistical Counters} (\path{count_end.c})
> > >  to implement
> > > --
> > > 2.25.1
> > >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:12   ` Paul E. McKenney
@ 2022-08-13 23:23     ` Elad Lahav
  2022-08-13 23:29       ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Elad Lahav @ 2022-08-13 23:23 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook

I believe that performance is statistically the same, but I will
double check. I assume both GCC and C11 end up using the same
underlying mechanism for thread-local storage:

https://uclibc.org/docs/tls.pdf

If not implemented, TLS falls back on pthread_[gs]et_specific(), but
again it would be the same for __thread and _Thread_local.

I was about to run all benchmarks on a cute 16-core aarch64 machine I
recently bought for testing the scalability of my latest kernel work,
but the code requires linking against urcu, and now I'm going down the
rabbit hole of building that for QNX. Does the counter code really
need this library?

--Elad

On Sat, 13 Aug 2022 at 19:12, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sat, Aug 13, 2022 at 07:49:27AM -0400, Elad Lahav wrote:
> > On 2022-08-13 07:45, Elad Lahav wrote:
> > > Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> > > ---
> >
> > This one will probably require some back-and-forth. If we change it here, do
> > other places need to be updated? Have I now removed the explanation of GCC's
> > __thread storage class, which will impact the readability of other sections?
> > Or are you ready to take the plunge and convert all code snippets?
> >
> > For the record, I did build and test count_end.c.
>
> As long as you continue building and testing, checking the descriptions in
> text of the code snippets (in case there is a mention of __thread there),
> and verifying that the PDF still builds, as fast as you are willing to go!
>
> Are you seeing the same performance with the new as with the old on
> your hardware?  (No statistically significant difference on my laptop,
> but figured I should ask.)
>
>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:23     ` Elad Lahav
@ 2022-08-13 23:29       ` Paul E. McKenney
  2022-08-14 10:11         ` Elad Lahav
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-13 23:29 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> I believe that performance is statistically the same, but I will
> double check. I assume both GCC and C11 end up using the same
> underlying mechanism for thread-local storage:
> 
> https://uclibc.org/docs/tls.pdf
> 
> If not implemented, TLS falls back on pthread_[gs]et_specific(), but
> again it would be the same for __thread and _Thread_local.

Sounds likely to me, but I have been surprised before.

> I was about to run all benchmarks on a cute 16-core aarch64 machine I
> recently bought for testing the scalability of my latest kernel work,
> but the code requires linking against urcu, and now I'm going down the
> rabbit hole of building that for QNX. Does the counter code really
> need this library?

Only for one of the count programs, count_end_rcu.c.  If you comment that
one out of the Makefile, you should be fine.  Some other adjustments to
other files might be needed, for example, CodeSamples/Makefile.

Probably about the same amount of work to wean CodeSamples/count from
that library as to get it working on QNX.  Murphy would of course argue
that whichever path you choose will be the harder of the two.

							Thanx, Paul

> --Elad
> 
> On Sat, 13 Aug 2022 at 19:12, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sat, Aug 13, 2022 at 07:49:27AM -0400, Elad Lahav wrote:
> > > On 2022-08-13 07:45, Elad Lahav wrote:
> > > > Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> > > > ---
> > >
> > > This one will probably require some back-and-forth. If we change it here, do
> > > other places need to be updated? Have I now removed the explanation of GCC's
> > > __thread storage class, which will impact the readability of other sections?
> > > Or are you ready to take the plunge and convert all code snippets?
> > >
> > > For the record, I did build and test count_end.c.
> >
> > As long as you continue building and testing, checking the descriptions in
> > text of the code snippets (in case there is a mention of __thread there),
> > and verifying that the PDF still builds, as fast as you are willing to go!
> >
> > Are you seeing the same performance with the new as with the old on
> > your hardware?  (No statistically significant difference on my laptop,
> > but figured I should ask.)
> >
> >                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:29       ` Paul E. McKenney
@ 2022-08-14 10:11         ` Elad Lahav
  2022-08-15  3:59           ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Elad Lahav @ 2022-08-14 10:11 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook

On 2022-08-13 19:29, Paul E. McKenney wrote:
> On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
>> I believe that performance is statistically the same, but I will
>> double check. I assume both GCC and C11 end up using the same
>> underlying mechanism for thread-local storage:
>>
>> https://uclibc.org/docs/tls.pdf
>>
>> If not implemented, TLS falls back on pthread_[gs]et_specific(), but
>> again it would be the same for __thread and _Thread_local.
> 
> Sounds likely to me, but I have been surprised before.

Results:

Linux/x86-64, i7-8550U, GCC TLS

elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 408000  n_updates: 689366000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 588.235  ns/update: 0.348146
elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 443000  n_updates: 762876000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 541.761  ns/update: 0.314599
elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 395000  n_updates: 666718000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 607.595  ns/update: 0.359972

Linux/x86-64, i7-8550U, C11 TLS

elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 410000  n_updates: 684880000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 585.366  ns/update: 0.350426
elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 411000  n_updates: 698148000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 583.942  ns/update: 0.343767
elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
n_reads: 409000  n_updates: 704072000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 586.797  ns/update: 0.340874

QNX/aarch64, NXP LX2160A, GCC TLS (emulated)

elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 221000  n_updates: 67876000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1085.97  ns/update: 3.53586
elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 227000  n_updates: 67901000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1057.27  ns/update: 3.53456
elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 211000  n_updates: 65043000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1137.44  ns/update: 3.68987

QNX/aarch64, NXP LX2160A, C11 TLS (emulated)

elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 217000  n_updates: 67814000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1105.99  ns/update: 3.53909
elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 223000  n_updates: 67860000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1076.23  ns/update: 3.53669
elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
n_reads: 218000  n_updates: 68116000  nreaders: 1  nupdaters: 1 
duration: 240
ns/read: 1100.92  ns/update: 3.5234

Looking at the disassembly for the QNX binary I realized that it is 
still using the emulated TLS option (i.e., the compiler generates a shim 
layer that uses pthread_[gs]et_specific()). I will need to rebuild the 
compiler with native TLS support and retest, though I doubt it will have 
a significant impact.

In any case, both the native and the emulated TLS options show the same 
results with the GCC and C11 versions.

--Elad

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-14 10:11         ` Elad Lahav
@ 2022-08-15  3:59           ` Paul E. McKenney
  2022-08-16 23:43             ` Elad Lahav
  0 siblings, 1 reply; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-15  3:59 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

On Sun, Aug 14, 2022 at 06:11:44AM -0400, Elad Lahav wrote:
> On 2022-08-13 19:29, Paul E. McKenney wrote:
> > On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> > > I believe that performance is statistically the same, but I will
> > > double check. I assume both GCC and C11 end up using the same
> > > underlying mechanism for thread-local storage:
> > > 
> > > https://uclibc.org/docs/tls.pdf
> > > 
> > > If not implemented, TLS falls back on pthread_[gs]et_specific(), but
> > > again it would be the same for __thread and _Thread_local.
> > 
> > Sounds likely to me, but I have been surprised before.
> 
> Results:
> 
> Linux/x86-64, i7-8550U, GCC TLS
> 
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 408000  n_updates: 689366000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 588.235  ns/update: 0.348146
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 443000  n_updates: 762876000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 541.761  ns/update: 0.314599
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 395000  n_updates: 666718000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 607.595  ns/update: 0.359972
> 
> Linux/x86-64, i7-8550U, C11 TLS
> 
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 410000  n_updates: 684880000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 585.366  ns/update: 0.350426
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 411000  n_updates: 698148000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 583.942  ns/update: 0.343767
> elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> n_reads: 409000  n_updates: 704072000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 586.797  ns/update: 0.340874
> 
> QNX/aarch64, NXP LX2160A, GCC TLS (emulated)
> 
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 221000  n_updates: 67876000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1085.97  ns/update: 3.53586
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 227000  n_updates: 67901000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1057.27  ns/update: 3.53456
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 211000  n_updates: 65043000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1137.44  ns/update: 3.68987
> 
> QNX/aarch64, NXP LX2160A, C11 TLS (emulated)
> 
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 217000  n_updates: 67814000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1105.99  ns/update: 3.53909
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 223000  n_updates: 67860000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1076.23  ns/update: 3.53669
> elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> n_reads: 218000  n_updates: 68116000  nreaders: 1  nupdaters: 1 duration:
> 240
> ns/read: 1100.92  ns/update: 3.5234
> 
> Looking at the disassembly for the QNX binary I realized that it is still
> using the emulated TLS option (i.e., the compiler generates a shim layer
> that uses pthread_[gs]et_specific()). I will need to rebuild the compiler
> with native TLS support and retest, though I doubt it will have a
> significant impact.

That would explain the lack of statistical significance.  If there is
a change, I would hope that the new style is faster.

> In any case, both the native and the emulated TLS options show the same
> results with the GCC and C11 versions.

Sounds good, and thank you for checking!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-15  3:59           ` Paul E. McKenney
@ 2022-08-16 23:43             ` Elad Lahav
  2022-08-16 23:55               ` Paul E. McKenney
  0 siblings, 1 reply; 13+ messages in thread
From: Elad Lahav @ 2022-08-16 23:43 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: perfbook

I also ran the count_end test on Linux/aarch64 on the same board, and
got the same results as on QNX. No surprise there, but the discrepancy
with the x86_64 results on Linux made me want to double-check that the
QNX build is not doing something stupid. I guess that what it shows is
that, Apple's M1 chip notwithstanding, ARM still hasn't closed the gap
when it comes to raw performance. You can always claim that we are
comparing apples and oranges here, but I noticed the same trend on
many different boards.

--Elad

On Sun, 14 Aug 2022 at 23:59, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Sun, Aug 14, 2022 at 06:11:44AM -0400, Elad Lahav wrote:
> > On 2022-08-13 19:29, Paul E. McKenney wrote:
> > > On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> > > > I believe that performance is statistically the same, but I will
> > > > double check. I assume both GCC and C11 end up using the same
> > > > underlying mechanism for thread-local storage:
> > > >
> > > > https://uclibc.org/docs/tls.pdf
> > > >
> > > > If not implemented, TLS falls back on pthread_[gs]et_specific(), but
> > > > again it would be the same for __thread and _Thread_local.
> > >
> > > Sounds likely to me, but I have been surprised before.
> >
> > Results:
> >
> > Linux/x86-64, i7-8550U, GCC TLS
> >
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 408000  n_updates: 689366000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 588.235  ns/update: 0.348146
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 443000  n_updates: 762876000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 541.761  ns/update: 0.314599
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 395000  n_updates: 666718000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 607.595  ns/update: 0.359972
> >
> > Linux/x86-64, i7-8550U, C11 TLS
> >
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 410000  n_updates: 684880000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 585.366  ns/update: 0.350426
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 411000  n_updates: 698148000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 583.942  ns/update: 0.343767
> > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > n_reads: 409000  n_updates: 704072000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 586.797  ns/update: 0.340874
> >
> > QNX/aarch64, NXP LX2160A, GCC TLS (emulated)
> >
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 221000  n_updates: 67876000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1085.97  ns/update: 3.53586
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 227000  n_updates: 67901000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1057.27  ns/update: 3.53456
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 211000  n_updates: 65043000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1137.44  ns/update: 3.68987
> >
> > QNX/aarch64, NXP LX2160A, C11 TLS (emulated)
> >
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 217000  n_updates: 67814000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1105.99  ns/update: 3.53909
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 223000  n_updates: 67860000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1076.23  ns/update: 3.53669
> > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > n_reads: 218000  n_updates: 68116000  nreaders: 1  nupdaters: 1 duration:
> > 240
> > ns/read: 1100.92  ns/update: 3.5234
> >
> > Looking at the disassembly for the QNX binary I realized that it is still
> > using the emulated TLS option (i.e., the compiler generates a shim layer
> > that uses pthread_[gs]et_specific()). I will need to rebuild the compiler
> > with native TLS support and retest, though I doubt it will have a
> > significant impact.
>
> That would explain the lack of statistical significance.  If there is
> a change, I would hope that the new style is faster.
>
> > In any case, both the native and the emulated TLS options show the same
> > results with the GCC and C11 versions.
>
> Sounds good, and thank you for checking!
>
>                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-16 23:43             ` Elad Lahav
@ 2022-08-16 23:55               ` Paul E. McKenney
  0 siblings, 0 replies; 13+ messages in thread
From: Paul E. McKenney @ 2022-08-16 23:55 UTC (permalink / raw)
  To: Elad Lahav; +Cc: perfbook

I am not all that worried about it, mostly curious.

get_timestamp() isn't used all that heavily, and it seems plausible that
a 40-nanosecond timestamp would work in rcu_ts.[ch].

Fun times, though!

							Thanx, Paul

On Tue, Aug 16, 2022 at 07:43:57PM -0400, Elad Lahav wrote:
> I also ran the count_end test on Linux/aarch64 on the same board, and
> got the same results as on QNX. No surprise there, but the discrepancy
> with the x86_64 results on Linux made me want to double-check that the
> QNX build is not doing something stupid. I guess that what it shows is
> that, Apple's M1 chip notwithstanding, ARM still hasn't closed the gap
> when it comes to raw performance. You can always claim that we are
> comparing apples and oranges here, but I noticed the same trend on
> many different boards.
> 
> --Elad
> 
> On Sun, 14 Aug 2022 at 23:59, Paul E. McKenney <paulmck@kernel.org> wrote:
> >
> > On Sun, Aug 14, 2022 at 06:11:44AM -0400, Elad Lahav wrote:
> > > On 2022-08-13 19:29, Paul E. McKenney wrote:
> > > > On Sat, Aug 13, 2022 at 07:23:40PM -0400, Elad Lahav wrote:
> > > > > I believe that performance is statistically the same, but I will
> > > > > double check. I assume both GCC and C11 end up using the same
> > > > > underlying mechanism for thread-local storage:
> > > > >
> > > > > https://uclibc.org/docs/tls.pdf
> > > > >
> > > > > If not implemented, TLS falls back on pthread_[gs]et_specific(), but
> > > > > again it would be the same for __thread and _Thread_local.
> > > >
> > > > Sounds likely to me, but I have been surprised before.
> > >
> > > Results:
> > >
> > > Linux/x86-64, i7-8550U, GCC TLS
> > >
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 408000  n_updates: 689366000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 588.235  ns/update: 0.348146
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 443000  n_updates: 762876000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 541.761  ns/update: 0.314599
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 395000  n_updates: 666718000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 607.595  ns/update: 0.359972
> > >
> > > Linux/x86-64, i7-8550U, C11 TLS
> > >
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 410000  n_updates: 684880000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 585.366  ns/update: 0.350426
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 411000  n_updates: 698148000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 583.942  ns/update: 0.343767
> > > elahav@lamia:~/src/other/perfbook/CodeSamples/count$ ./count_end
> > > n_reads: 409000  n_updates: 704072000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 586.797  ns/update: 0.340874
> > >
> > > QNX/aarch64, NXP LX2160A, GCC TLS (emulated)
> > >
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 221000  n_updates: 67876000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1085.97  ns/update: 3.53586
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 227000  n_updates: 67901000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1057.27  ns/update: 3.53456
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 211000  n_updates: 65043000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1137.44  ns/update: 3.68987
> > >
> > > QNX/aarch64, NXP LX2160A, C11 TLS (emulated)
> > >
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 217000  n_updates: 67814000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1105.99  ns/update: 3.53909
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 223000  n_updates: 67860000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1076.23  ns/update: 3.53669
> > > elahav@honeycomb:~/src/CodeSamples/count$ ./count_end
> > > n_reads: 218000  n_updates: 68116000  nreaders: 1  nupdaters: 1 duration:
> > > 240
> > > ns/read: 1100.92  ns/update: 3.5234
> > >
> > > Looking at the disassembly for the QNX binary I realized that it is still
> > > using the emulated TLS option (i.e., the compiler generates a shim layer
> > > that uses pthread_[gs]et_specific()). I will need to rebuild the compiler
> > > with native TLS support and retest, though I doubt it will have a
> > > significant impact.
> >
> > That would explain the lack of statistical significance.  If there is
> > a change, I would hope that the new style is faster.
> >
> > > In any case, both the native and the emulated TLS options show the same
> > > results with the GCC and C11 versions.
> >
> > Sounds good, and thank you for checking!
> >
> >                                                         Thanx, Paul

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] count: Switch from GCC to C11 thread-local storage
  2022-08-13 23:08 ` Paul E. McKenney
  2022-08-13 23:12   ` Elad Lahav
@ 2022-08-17 11:02   ` Elad Lahav
  1 sibling, 0 replies; 13+ messages in thread
From: Elad Lahav @ 2022-08-17 11:02 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook

On 2022-08-13 19:08, Paul E. McKenney wrote:
> On Sat, Aug 13, 2022 at 07:45:59AM -0400, Elad Lahav wrote:
>> Signed-off-by: Elad Lahav <e2lahav@gmail.com>
> 
> Nice, thank you!
> 
> But did you verify that none of the text describing count_end.c needed
> changing?  (It seems unlikely that any would, but could you please check?)

Indeed there were such references. Updated.

> Could you please mention __thread as the old way?  Either in the text
> directly, in parentheses, as a footnote, or as a quick quiz, your choice.

Done. See v2.

--Elad

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-08-17 11:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-13 11:45 [PATCH] count: Switch from GCC to C11 thread-local storage Elad Lahav
2022-08-13 11:49 ` Elad Lahav
2022-08-13 23:12   ` Paul E. McKenney
2022-08-13 23:23     ` Elad Lahav
2022-08-13 23:29       ` Paul E. McKenney
2022-08-14 10:11         ` Elad Lahav
2022-08-15  3:59           ` Paul E. McKenney
2022-08-16 23:43             ` Elad Lahav
2022-08-16 23:55               ` Paul E. McKenney
2022-08-13 23:08 ` Paul E. McKenney
2022-08-13 23:12   ` Elad Lahav
2022-08-13 23:17     ` Paul E. McKenney
2022-08-17 11:02   ` Elad Lahav

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.