linux-rt-users.vger.kernel.org archive mirror
* [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
@ 2019-12-08 21:06 Uwe Kleine-König
  2019-12-09  9:40 ` Daniel Wagner
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Uwe Kleine-König @ 2019-12-08 21:06 UTC
  To: Clark Williams, John Kacur; +Cc: linux-rt-users

This fixes a build error on arm64, mips*, ppc and several others
---
 src/queuelat/queuelat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/queuelat/queuelat.c b/src/queuelat/queuelat.c
index cccb50ef0cc4..98346f346f82 100644
--- a/src/queuelat/queuelat.c
+++ b/src/queuelat/queuelat.c
@@ -283,7 +283,7 @@ static inline unsigned long long __rdtscll(void)
 
 #define gettick(val) do { (val) = __rdtscll(); } while (0)
 
-#elif defined __arm__
+#else
 
 static inline unsigned long long __clock_gettime(void)
 {
-- 
2.24.0
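
For illustration, here is a minimal C sketch of the selection after this patch. It is not the actual queuelat.c code. x86 keeps reading the TSC, and every other architecture now falls through to clock_gettime() instead of being left without any gettick() definition, which is presumably what broke the build. The helper name read_ticks() is hypothetical and CLOCK_MONOTONIC is an assumption. This fallback returns the full timestamp in nanoseconds (tv_sec and tv_nsec).

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* expose clock_gettime() in strict ISO modes */
#endif
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)

/* x86: read the time stamp counter (units are TSC ticks, not ns). */
static inline unsigned long long read_ticks(void)
{
	unsigned int lo, hi;

	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((unsigned long long)hi << 32) | lo;
}

#else	/* every other arch: arm, arm64, mips*, ppc, ... */

/* Fallback: monotonic clock in nanoseconds; returns 0 on error. */
static inline unsigned long long read_ticks(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0)
		return 0;
	return (unsigned long long)ts.tv_sec * 1000000000ULL +
	       (unsigned long long)ts.tv_nsec;
}

#endif

/* Same shape as the gettick() macro in queuelat.c. */
#define gettick(val) do { (val) = read_ticks(); } while (0)
```

The two branches count in different units (TSC ticks vs. nanoseconds), so any code that converts ticks to time has to know which branch is active.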


* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-08 21:06 [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs Uwe Kleine-König
@ 2019-12-09  9:40 ` Daniel Wagner
  2019-12-10 11:24   ` John Kacur
  2019-12-12 17:31   ` Sebastian Andrzej Siewior
  2019-12-10 11:20 ` John Kacur
  2019-12-12 17:46 ` Sebastian Andrzej Siewior
  2 siblings, 2 replies; 11+ messages in thread
From: Daniel Wagner @ 2019-12-09  9:40 UTC
  To: Uwe Kleine-König, Clark Williams, John Kacur; +Cc: linux-rt-users

Hi Uwe,

On 2019-12-08 22:06, Uwe Kleine-König wrote:
> This fixes a build error on arm64, mips*, ppc and several others

Just wondering if the tool should print a warning if the fallback is 
used? IIRC, the code wants to use the TSC and clock_gettime is probably 
not so precise.
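
One way to act on this suggestion, sketched under assumptions: the macro names and the helper warn_if_fallback() below are hypothetical and are not part of rt-tests. The preprocessor checks the architecture once, and the program prints a single warning at startup when it runs on the clock_gettime() fallback.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#define GETTICK_SOURCE "rdtsc"
#define GETTICK_IS_FALLBACK 0
#else
#define GETTICK_SOURCE "clock_gettime(CLOCK_MONOTONIC)"
#define GETTICK_IS_FALLBACK 1
#endif

/*
 * Print a warning to 'out' if the clock_gettime() fallback is in use.
 * Returns 1 if a warning was printed, 0 otherwise.
 */
int warn_if_fallback(FILE *out)
{
	if (!GETTICK_IS_FALLBACK)
		return 0;
	fprintf(out, "queuelat: warning: no cycle counter support for this "
		     "architecture, using %s\n", GETTICK_SOURCE);
	return 1;
}
```

Calling warn_if_fallback(stderr) once at the start of main() keeps the warning out of the measurement loop. GCC and Clang also accept a compile-time #warning directive, but that message reaches only the person who builds the tool, not the person who runs it.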

Thanks,
Daniel

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-08 21:06 [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs Uwe Kleine-König
  2019-12-09  9:40 ` Daniel Wagner
@ 2019-12-10 11:20 ` John Kacur
  2019-12-12 17:46 ` Sebastian Andrzej Siewior
  2 siblings, 0 replies; 11+ messages in thread
From: John Kacur @ 2019-12-10 11:20 UTC
  To: Uwe Kleine-König; +Cc: Clark Williams, linux-rt-users

On Sun, 8 Dec 2019, Uwe Kleine-König wrote:

> This fixes a build error on arm64, mips*, ppc and several others
> ---
>  src/queuelat/queuelat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/queuelat/queuelat.c b/src/queuelat/queuelat.c
> index cccb50ef0cc4..98346f346f82 100644
> --- a/src/queuelat/queuelat.c
> +++ b/src/queuelat/queuelat.c
> @@ -283,7 +283,7 @@ static inline unsigned long long __rdtscll(void)
>  
>  #define gettick(val) do { (val) = __rdtscll(); } while (0)
>  
> -#elif defined __arm__
> +#else
>  
>  static inline unsigned long long __clock_gettime(void)
>  {
> -- 
> 2.24.0
> 
> 
Signed-off-by: John Kacur <jkacur@redhat.com>

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-09  9:40 ` Daniel Wagner
@ 2019-12-10 11:24   ` John Kacur
  2019-12-12 17:31   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 11+ messages in thread
From: John Kacur @ 2019-12-10 11:24 UTC
  To: Daniel Wagner; +Cc: Uwe Kleine-König, Clark Williams, linux-rt-users

On Mon, 9 Dec 2019, Daniel Wagner wrote:

> Hi Uwe,
> 
> On 2019-12-08 22:06, Uwe Kleine-König wrote:
> > This fixes a build error on arm64, mips*, ppc and several others
> 
> Just wondering if the tool should print a warning if the fallback is used?
> IIRC, the code wants to use the TSC and clock_gettime is probably not so
> precise.
> 
> Thanks,
> Daniel
> 

I take patches!

John Kacur

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-09  9:40 ` Daniel Wagner
  2019-12-10 11:24   ` John Kacur
@ 2019-12-12 17:31   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-12-12 17:31 UTC
  To: Daniel Wagner
  Cc: Uwe Kleine-König, Clark Williams, John Kacur, linux-rt-users

On 2019-12-09 10:40:29 [+0100], Daniel Wagner wrote:
> Just wondering if the tool should print a warning if the fallback is used?
> IIRC, the code wants to use the TSC and clock_gettime is probably not so
> precise.

clock_gettime() is precise, but it might have more overhead. With vDSO
support, the overhead is quite low.
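
Here is a rough sketch for putting a number on that overhead. It times a tight loop of clock_gettime(CLOCK_MONOTONIC) calls and divides the elapsed time by the iteration count. The function names are hypothetical. The result depends on the machine, the clocksource, and whether the vDSO path is available, so no expected value is given.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Convert a timespec to nanoseconds, seconds included. */
static unsigned long long timespec_to_ns(const struct timespec *ts)
{
	return (unsigned long long)ts->tv_sec * 1000000000ULL +
	       (unsigned long long)ts->tv_nsec;
}

/* Average cost, in ns, of one clock_gettime(CLOCK_MONOTONIC) call over n calls. */
double clock_gettime_cost_ns(unsigned int n)
{
	struct timespec start, end, scratch;

	if (n == 0)
		return 0.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (unsigned int i = 0; i < n; i++)
		clock_gettime(CLOCK_MONOTONIC, &scratch);
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Includes the loop overhead, so this is a slight overestimate. */
	return (double)(timespec_to_ns(&end) - timespec_to_ns(&start)) / n;
}
```

On x86, timing the same loop around the rdtsc instruction gives a direct comparison of the two tick sources.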

> Thanks,
> Daniel

Sebastian

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-08 21:06 [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs Uwe Kleine-König
  2019-12-09  9:40 ` Daniel Wagner
  2019-12-10 11:20 ` John Kacur
@ 2019-12-12 17:46 ` Sebastian Andrzej Siewior
  2019-12-13 12:41   ` Daniel Wagner
  2019-12-13 14:54   ` John Kacur
  2 siblings, 2 replies; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-12-12 17:46 UTC
  To: Uwe Kleine-König; +Cc: Clark Williams, John Kacur, linux-rt-users

On 2019-12-08 22:06:25 [+0100], Uwe Kleine-König wrote:
> This fixes a build error on arm64, mips*, ppc and several others
> ---
>  src/queuelat/queuelat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/queuelat/queuelat.c b/src/queuelat/queuelat.c
> index cccb50ef0cc4..98346f346f82 100644
> --- a/src/queuelat/queuelat.c
> +++ b/src/queuelat/queuelat.c
> @@ -283,7 +283,7 @@ static inline unsigned long long __rdtscll(void)
>  
>  #define gettick(val) do { (val) = __rdtscll(); } while (0)
>  
> -#elif defined __arm__
> +#else

Did anyone actually look at the code? I somehow missed the queuelat
thingy completely. Now that I look at it, I think I need further assistance…

So what do I select as the frequency for the !x86 case? And why?

That freq. script reports here:
|1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809

I guess I pick one…

Could someone please figure out the actual difference between clock_gettime()
and rdtsc() so we know how important it is? Based on its current
implementation, if memmove() takes >1 sec, it goes undetected
because only the ns part of the timestamp is considered.
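
The following sketch illustrates the problem described above. It assumes, as stated here, that the fallback keeps only the tv_nsec field of the timestamp. Under that assumption, two readings exactly one second apart produce a delta of 0, and a reading that crosses a second boundary makes an unsigned subtraction wrap around. Folding tv_sec into the value avoids both problems. Both helper names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Sub-second part only: the behaviour described for the current fallback. */
unsigned long long ts_nsec_only(const struct timespec *ts)
{
	return (unsigned long long)ts->tv_nsec;
}

/* Full timestamp in ns: whole seconds are kept, so long gaps stay visible. */
unsigned long long ts_full_ns(const struct timespec *ts)
{
	return (unsigned long long)ts->tv_sec * 1000000000ULL +
	       (unsigned long long)ts->tv_nsec;
}
```

Take t0 = {10 s, 500000000 ns} and t1 = {11 s, 500000000 ns}. The tv_nsec-only delta is 0, while the full delta is 1000000000 ns.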

>  static inline unsigned long long __clock_gettime(void)
>  {
> -- 
> 2.24.0
> 

Sebastian

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-12 17:46 ` Sebastian Andrzej Siewior
@ 2019-12-13 12:41   ` Daniel Wagner
  2019-12-13 14:54   ` John Kacur
  1 sibling, 0 replies; 11+ messages in thread
From: Daniel Wagner @ 2019-12-13 12:41 UTC
  To: Sebastian Andrzej Siewior, Uwe Kleine-König
  Cc: Clark Williams, John Kacur, linux-rt-users

Hi,

On 2019-12-12 18:46, Sebastian Andrzej Siewior wrote:
> So what do I select as the frequency for the !x86 case? And why?

IMO, a user should be able to run rt-tests without the need to provide
special configuration or tuning. queuelat is a bit hard to use at this 
point.

> That freq. script reports here:
> |1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809
> 
> I guess I pick one…
> 
> Could someone please figure out the actual difference between clock_gettime()
> and rdtsc() so we know how important it is?

I didn't really understand what the test is doing. The initial
clock_gettime() patch was just to shut up the compiler.

Thanks,
Daniel

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-12 17:46 ` Sebastian Andrzej Siewior
  2019-12-13 12:41   ` Daniel Wagner
@ 2019-12-13 14:54   ` John Kacur
  2019-12-13 23:02     ` Marcelo Tosatti
  1 sibling, 1 reply; 11+ messages in thread
From: John Kacur @ 2019-12-13 14:54 UTC
  To: Sebastian Andrzej Siewior
  Cc: Uwe Kleine-König, Clark Williams, Linux RT Users, Marcelo Tosatti

On Thu, 12 Dec 2019, Sebastian Andrzej Siewior wrote:

> On 2019-12-08 22:06:25 [+0100], Uwe Kleine-König wrote:
> > This fixes a build error on arm64, mips*, ppc and several others
> > ---
> >  src/queuelat/queuelat.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/src/queuelat/queuelat.c b/src/queuelat/queuelat.c
> > index cccb50ef0cc4..98346f346f82 100644
> > --- a/src/queuelat/queuelat.c
> > +++ b/src/queuelat/queuelat.c
> > @@ -283,7 +283,7 @@ static inline unsigned long long __rdtscll(void)
> >  
> >  #define gettick(val) do { (val) = __rdtscll(); } while (0)
> >  
> > -#elif defined __arm__
> > +#else
> 
> Did anyone actually look at the code? I somehow missed the queuelat
> thingy completely. Now that I look at it, I think I need further assistance…
> 
> So what do I select as the frequency for the !x86 case? And why?
> 
> That freq. script reports here:
> |1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809
> 
> I guess I pick one…
> 
> Could someone please figure out the actual difference between clock_gettime()
> and rdtsc() so we know how important it is? Based on its current
> implementation, if memmove() takes >1 sec, it goes undetected
> because only the ns part of the timestamp is considered.
> 
> >  static inline unsigned long long __clock_gettime(void)
> >  {
> > -- 
> > 2.24.0
> > 
> 
> Sebastian
> 

Adding Marcelo to the cc list

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-13 14:54   ` John Kacur
@ 2019-12-13 23:02     ` Marcelo Tosatti
  2019-12-16 15:34       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 11+ messages in thread
From: Marcelo Tosatti @ 2019-12-13 23:02 UTC
  To: John Kacur
  Cc: Sebastian Andrzej Siewior, Uwe Kleine-König, Clark Williams,
	Linux RT Users

On Fri, Dec 13, 2019 at 03:54:48PM +0100, John Kacur wrote:
> 
> 
> On Thu, 12 Dec 2019, Sebastian Andrzej Siewior wrote:
> 
> > On 2019-12-08 22:06:25 [+0100], Uwe Kleine-König wrote:
> > > This fixes a build error on arm64, mips*, ppc and several others
> > > ---
> > >  src/queuelat/queuelat.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/src/queuelat/queuelat.c b/src/queuelat/queuelat.c
> > > index cccb50ef0cc4..98346f346f82 100644
> > > --- a/src/queuelat/queuelat.c
> > > +++ b/src/queuelat/queuelat.c
> > > @@ -283,7 +283,7 @@ static inline unsigned long long __rdtscll(void)
> > >  
> > >  #define gettick(val) do { (val) = __rdtscll(); } while (0)
> > >  
> > > -#elif defined __arm__
> > > +#else
> > 
> > Did anyone actually look at the code? I somehow missed the queuelat
> > thingy completely. Now that I look at it, I think I need further assistance…
> > 
> > So what do I select as the frequency for the !x86 case? And why?
> > 
> > That freq. script reports here:
> > |1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809
> > 
> > I guess I pick one…
> > 
> > Could someone please figure out the actual difference between clock_gettime()
> > and rdtsc() so we know how important it is? Based on its current
> > implementation, if memmove() takes >1 sec, it goes undetected
> > because only the ns part of the timestamp is considered.

/* Program parameters:
 * max_queue_len: maximum latency allowed, in nanoseconds (int).
 * cycles_per_packet: number of cycles to process one packet (int).
 * mpps(million-packet-per-sec): million packets per second (float).
 * tsc_freq_mhz: TSC frequency in MHz, as measured by TSC PIT
 * calibration 
 * (search for "Detected XXX MHz processor" in dmesg, and use the
 * integer part).
 */

So you have to pass it the "processor frequency" (you can change the names;
it's TSC, but that's x86 specific). The script grabs the processor
frequency (so you have to adapt the script to ARM).

And that's it. Please replace tsc_mhz -> processor_freq.
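
This sketch shows what the parameters above mean once tsc_freq_mhz is renamed to a processor frequency, as suggested. The struct and function names are hypothetical. The frequency turns cycles_per_packet into a per-packet processing time, and mpps becomes the gap between two arriving packets. If processing takes longer than that gap, the simulated queue can only grow.

```c
/* Hypothetical parameter block; names follow the comment above,
 * with tsc_freq_mhz renamed to processor_freq_mhz. */
struct queuelat_params {
	int max_queue_len;      /* maximum latency allowed, in ns */
	int cycles_per_packet;  /* cycles needed to process one packet */
	float mpps;             /* offered load, million packets per second */
	int processor_freq_mhz; /* rate of the tick source, in MHz */
};

/* ns to process one packet: cycles / (MHz * 1e6) s = cycles * 1000 / MHz ns */
double packet_cost_ns(const struct queuelat_params *p)
{
	return (double)p->cycles_per_packet * 1000.0 / p->processor_freq_mhz;
}

/* ns between two arriving packets: 1e6 * mpps per s = 1000 / mpps ns apart */
double packet_interval_ns(const struct queuelat_params *p)
{
	return 1000.0 / p->mpps;
}

/* 1 if the CPU cannot keep up even without any interruption */
int overloaded(const struct queuelat_params *p)
{
	return packet_cost_ns(p) > packet_interval_ns(p);
}
```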




* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-13 23:02     ` Marcelo Tosatti
@ 2019-12-16 15:34       ` Sebastian Andrzej Siewior
  2019-12-16 21:14         ` Marcelo Tosatti
  0 siblings, 1 reply; 11+ messages in thread
From: Sebastian Andrzej Siewior @ 2019-12-16 15:34 UTC
  To: Marcelo Tosatti
  Cc: John Kacur, Uwe Kleine-König, Clark Williams, Linux RT Users

On 2019-12-13 21:02:56 [-0200], Marcelo Tosatti wrote:
> > > That freq. script reports here:
> > > |1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809
> > > 
> > > I guess I pick one…
> > > 
> > > Could someone please figure out the actual difference between clock_gettime()
> > > and rdtsc() so we know how important it is? Based on its current
> > > implementation, if memmove() takes >1 sec, it goes undetected
> > > because only the ns part of the timestamp is considered.
> 
> /* Program parameters:
>  * max_queue_len: maximum latency allowed, in nanoseconds (int).
>  * cycles_per_packet: number of cycles to process one packet (int).
>  * mpps(million-packet-per-sec): million packets per second (float).
>  * tsc_freq_mhz: TSC frequency in MHz, as measured by TSC PIT
>  * calibration 
>  * (search for "Detected XXX MHz processor" in dmesg, and use the
>  * integer part).
>  */
> 
> So you have to pass it the "processor frequency" (you can change the names;
> it's TSC, but that's x86 specific). The script grabs the processor
> frequency (so you have to adapt the script to ARM).
>
> And that's it. Please replace tsc_mhz -> processor_freq.

So the script reports frequencies from 1555.184 MHz to 2619.402 MHz.
And I doubt the numbers remain steady on today's x86 even with
gov=performance.
However, I asked for the frequency to be used on !x86 given the code we
have. It was fixed in a hurry, and you would have to use 1000 so that the
math keeps working.
Then I asked how much benefit this complicated TSC calculation has over
clock_gettime().
I tried to use it myself, but after 30 seconds I saw no output; it just ate
100% CPU and seemed to do nothing. So I gave up.
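
The "use 1000" remark can be checked with a sketch of the tick-to-ns conversion. The helper name is hypothetical. Ticks that advance at f MHz correspond to ticks * 1000 / f ns. The clock_gettime() fallback already counts nanoseconds, which is 1000 ticks per microsecond, so f = 1000 turns the conversion into the identity. Any frequency taken from the list above would rescale every measured latency instead.

```c
/*
 * Convert a tick delta to nanoseconds, assuming the tick source
 * advances at freq_mhz MHz (freq_mhz ticks per microsecond).
 * ticks * 1000 overflows for deltas above roughly 1.8e16 ticks.
 */
unsigned long long ticks_to_ns(unsigned long long ticks,
			       unsigned long long freq_mhz)
{
	return ticks * 1000ULL / freq_mhz;
}
```

For example, 1000 fallback ticks (1000 ns) would be reported as 643 ns if 1555 MHz were passed.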

Sebastian

* Re: [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs
  2019-12-16 15:34       ` Sebastian Andrzej Siewior
@ 2019-12-16 21:14         ` Marcelo Tosatti
  0 siblings, 0 replies; 11+ messages in thread
From: Marcelo Tosatti @ 2019-12-16 21:14 UTC
  To: Sebastian Andrzej Siewior
  Cc: John Kacur, Uwe Kleine-König, Clark Williams, Linux RT Users

On Mon, Dec 16, 2019 at 04:34:49PM +0100, Sebastian Andrzej Siewior wrote:
> On 2019-12-13 21:02:56 [-0200], Marcelo Tosatti wrote:
> > > > That freq. script reports here:
> > > > |1555.184 1566.269 1566.498 1560.055 1593.149 1568.185 1583.807 1599.096 2574.546 2572.408 2573.849 2583.862 2619.402 1825.680 1847.264 1870.318 2552.102 1570.552 1589.650 1595.813 1590.253 1573.834 1589.438 1599.439 1770.963 1786.370 1814.918 1811.936 1828.277 1850.905 1861.976 1792.809
> > > > 
> > > > I guess I pick one…
> > > > 
> > > > Could someone please figure out the actual difference between clock_gettime()
> > > > and rdtsc() so we know how important it is? Based on its current
> > > > implementation, if memmove() takes >1 sec, it goes undetected
> > > > because only the ns part of the timestamp is considered.
> > 
> > /* Program parameters:
> >  * max_queue_len: maximum latency allowed, in nanoseconds (int).
> >  * cycles_per_packet: number of cycles to process one packet (int).
> >  * mpps(million-packet-per-sec): million packets per second (float).
> >  * tsc_freq_mhz: TSC frequency in MHz, as measured by TSC PIT
> >  * calibration 
> >  * (search for "Detected XXX MHz processor" in dmesg, and use the
> >  * integer part).
> >  */
> > 
> > So you have to pass it the "processor frequency" (you can change the names;
> > it's TSC, but that's x86 specific). The script grabs the processor
> > frequency (so you have to adapt the script to ARM).
> >
> > And that's it. Please replace tsc_mhz -> processor_freq.
> 
> So the script reports frequencies from 1555.184 MHz to 2619.402 MHz.
> And I doubt the numbers remain steady on today's x86 even with
> gov=performance.
> However, I asked for the frequency to be used on !x86 given the code we
> have. It was fixed in a hurry, and you would have to use 1000 so that the
> math keeps working.
> Then I asked how much benefit this complicated TSC calculation has over
> clock_gettime().
> I tried to use it myself, but after 30 seconds I saw no output; it just ate
> 100% CPU and seemed to do nothing. So I gave up.

Then the simulated queue size was not exceeded, according to
the parameters you specified:

* max_queue_len: maximum latency allowed, in nanoseconds (int).
* cycles_per_packet: number of cycles to process one packet (int).
* mpps(million-packet-per-sec): million packets per second (float).

If you increase mpps or increase cycles_per_packet (or both), 
or decrease max_queue_len, it should fail.

Do you see that?
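
Here is a back-of-the-envelope model of why those three knobs matter. It is a hypothetical sketch, not queuelat's actual loop. Assume packets keep arriving at mpps while the CPU is busy for busy_ns (for example, the memmove() plus any preemption) and processes none of them. Expressed as the time needed to drain it, the backlog is the number of queued packets times the per-packet cost. The run fails once that backlog exceeds max_queue_len.

```c
/*
 * Hypothetical model: time (ns) needed to drain the packets that
 * arrived during a busy period of busy_ns in which none were processed.
 */
double backlog_ns(double busy_ns, double mpps, double cycles_per_packet,
		  double freq_mhz)
{
	double queued = busy_ns * mpps / 1000.0;                /* packets that arrived */
	double cost_ns = cycles_per_packet * 1000.0 / freq_mhz; /* ns per packet */

	return queued * cost_ns;
}

/* 1 if the drain time exceeds the allowed latency (max_queue_len, in ns). */
int queue_exceeded(double busy_ns, double mpps, double cycles_per_packet,
		   double freq_mhz, double max_queue_len_ns)
{
	return backlog_ns(busy_ns, mpps, cycles_per_packet, freq_mhz) >
	       max_queue_len_ns;
}
```

Consider a 10 us busy period at 1 Mpps, with 2000 cycles per packet at 2000 MHz. Ten packets queue up and need 10000 ns to drain, which passes a 20000 ns limit. Tripling mpps or cycles_per_packet, or lowering the limit to 5000 ns, makes it fail.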


Thread overview: 11+ messages
2019-12-08 21:06 [PATCH rt-tests] queuelat: use ARM implementation of gettick also for all !x86 archs Uwe Kleine-König
2019-12-09  9:40 ` Daniel Wagner
2019-12-10 11:24   ` John Kacur
2019-12-12 17:31   ` Sebastian Andrzej Siewior
2019-12-10 11:20 ` John Kacur
2019-12-12 17:46 ` Sebastian Andrzej Siewior
2019-12-13 12:41   ` Daniel Wagner
2019-12-13 14:54   ` John Kacur
2019-12-13 23:02     ` Marcelo Tosatti
2019-12-16 15:34       ` Sebastian Andrzej Siewior
2019-12-16 21:14         ` Marcelo Tosatti
