* PREEMPT_RT benchmark
@ 2021-03-10 20:07 Michel Macena Oliveira
  2021-03-11 12:27 ` Daniel Wagner
  2021-03-11 12:58 ` Ahmed S. Darwish
  0 siblings, 2 replies; 11+ messages in thread
From: Michel Macena Oliveira @ 2021-03-10 20:07 UTC (permalink / raw)
  To: linux-rt-users

Hi,
I'm currently developing a timing benchmark application to measure the
latency of a real-time thread. My application is based on cyclictest from
the rt-tests suite.
> https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/rt-tests

I wrote it so that it would be equivalent to cyclictest running with the
following command line:
> cyclictest -m -n -N --threads=1 --interval=10000 --priority=80 --distance=0 --loops=1000 --clock=1

At least that's what I expected, but my latencies are much larger than
cyclictest's.
With cyclictest I get an average on my computer of between 2300 and 2500
nanoseconds. With my application, however, I get between 47000 and 55000
nanoseconds, roughly 20 times higher.

I'm not sure what I'm doing wrong. Could you help or suggest something?

Processor Info:
> Intel(R) Core(TM) i3-9100F CPU @ 3.60GHz

System info:
> 4.4.208-rt198 #1 SMP PREEMPT RT Wed Jul 8 16:23:16 -03 2020 x86_64 x86_64 x86_64 GNU/Linux

My application code:

> #include <sys/mman.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sched.h>
> #include <stdint.h>
> #include <time.h>
> #include <pthread.h>
>
> #define LOOPS 1000
> #define INTERVAL 0.01 // 0.01 s = 10 ms = 10000 us = 10000000 ns
> #define USE_NS 1
> #define NSEC_PER_SEC 1000000000
> #define USEC_PER_SEC 1000000
>
> static inline long int calcdiff_ns(struct timespec t1, struct timespec t2);
> static inline long int calcdiff_us(struct timespec t1, struct timespec t2);
> static inline void time_norm(struct timespec *ts);
>
> void *sim_Thread(void *notUsed);
> pthread_t sim_ThreadID;
> pthread_attr_t sim_AttrThread;
> struct sched_param sim_MySched;
>
> // Compiler flags: gcc -O0 time_eval_4.c -o time_eval_4 -Wall -lm -lpthread
> int main()
> {
>     // Lock current and future pages so the RT thread does not page-fault.
>     if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
>         printf("mlockall failed: %m\n");
>         exit(-2);
>     }
>     // Create the measurement thread as SCHED_FIFO with priority 80.
>     pthread_attr_init(&sim_AttrThread);
>     sim_MySched.sched_priority = 80;
>     pthread_attr_setschedpolicy(&sim_AttrThread, SCHED_FIFO);
>     pthread_attr_setinheritsched(&sim_AttrThread, PTHREAD_EXPLICIT_SCHED);
>     pthread_attr_setschedparam(&sim_AttrThread, &sim_MySched);
>     if (pthread_create(&sim_ThreadID, &sim_AttrThread, sim_Thread, NULL) != 0) {
>         printf("pthread_create failed (SCHED_FIFO needs RT privileges)\n");
>         exit(-1);
>     }
>     pthread_join(sim_ThreadID, NULL);
>     return 0;
> }
>
> void *sim_Thread(void *notUsed)
> {
>     struct timespec now;
>     struct timespec next;
>     struct timespec periodicInterval;
>     long int diff;
>     unsigned long int j;
>     FILE *benchmark_fp;
>
>     benchmark_fp = fopen("Benchmark_4.txt", "w+");
>     if (benchmark_fp == NULL)
>         return NULL;
>     fprintf(benchmark_fp, "loops: %lu \n", (unsigned long int) LOOPS);
>     // Split INTERVAL (seconds, floating point) into a timespec.
>     periodicInterval.tv_sec = (unsigned long int)(INTERVAL);
>     periodicInterval.tv_nsec = (unsigned long int)(INTERVAL * 1000000000) % 1000000000;
> #if USE_NS
>     fprintf(benchmark_fp, "interval: %f ns\n", (double) INTERVAL * NSEC_PER_SEC);
> #else
>     fprintf(benchmark_fp, "interval: %f us\n", (double) INTERVAL * USEC_PER_SEC);
> #endif
>     if (clock_gettime(CLOCK_REALTIME, &now) != 0) {
>         fclose(benchmark_fp);
>         return NULL;
>     }
>     // First absolute wake-up time: now + one interval.
>     next.tv_nsec = now.tv_nsec + periodicInterval.tv_nsec;
>     next.tv_sec = now.tv_sec + periodicInterval.tv_sec;
>     time_norm(&next);
>     for (j = 0; j < (unsigned long int) LOOPS; j++) {
>         // Wait for the next period to wake up...
>         clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);
>         // Get the "now" time moment...
>         clock_gettime(CLOCK_REALTIME, &now);
>         // Latency = actual wake-up time minus programmed wake-up time.
> #if USE_NS
>         diff = calcdiff_ns(now, next);
> #else
>         diff = calcdiff_us(now, next);
> #endif
>         // Set the timer for interrupt in the next moment...
>         next.tv_nsec += periodicInterval.tv_nsec;
>         next.tv_sec += periodicInterval.tv_sec;
>         time_norm(&next);
>         fprintf(benchmark_fp, "%ld\n", diff);
>     }
>     fclose(benchmark_fp);
>     return NULL;
> }
>
> // Return t1 - t2 in nanoseconds.
> static inline long int calcdiff_ns(struct timespec t1, struct timespec t2)
> {
>     long int diff;
>     diff = NSEC_PER_SEC * (long int)(t1.tv_sec - t2.tv_sec);
>     diff += t1.tv_nsec - t2.tv_nsec;
>     return diff;
> }
>
> // Return t1 - t2 in microseconds.
> static inline long int calcdiff_us(struct timespec t1, struct timespec t2)
> {
>     long int diff;
>     diff = USEC_PER_SEC * (long int)(t1.tv_sec - t2.tv_sec);
>     diff += (t1.tv_nsec - t2.tv_nsec) / 1000;
>     return diff;
> }
>
> // Carry any tv_nsec overflow into tv_sec.
> static inline void time_norm(struct timespec *ts)
> {
>     while (ts->tv_nsec >= NSEC_PER_SEC) {
>         ts->tv_nsec -= NSEC_PER_SEC;
>         ts->tv_sec++;
>     }
> }


Thanks!
Michel
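
The program above logs one raw difference per cycle, whereas cyclictest
prints summary columns (Min, Act, Avg, Max). For a like-for-like comparison
the logged values can be reduced to the same kind of summary. This is only a
minimal sketch: `lat_stats` and `lat_stats_compute` are hypothetical names
chosen for illustration and are not part of rt-tests.

```c
#include <stddef.h>

/* Summary of a latency run, in the same unit as the input samples. */
struct lat_stats {
    long min;
    long max;
    double avg;
};

/*
 * Compute minimum, maximum and average over n samples.
 * Returns 0 on success, -1 if there is nothing to summarise.
 * (Hypothetical helper, not taken from cyclictest.)
 */
int lat_stats_compute(const long *samples, size_t n, struct lat_stats *out)
{
    if (samples == NULL || out == NULL || n == 0)
        return -1;

    long min = samples[0];
    long max = samples[0];
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (samples[i] < min)
            min = samples[i];
        if (samples[i] > max)
            max = samples[i];
        sum += (double)samples[i];
    }

    out->min = min;
    out->max = max;
    out->avg = sum / (double)n;
    return 0;
}
```

Such a helper would be called once, after the measurement loop has finished,
so that the summary itself adds nothing to the measured path.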


* Re: PREEMPT_RT benchmark
  2021-03-10 20:07 PREEMPT_RT benchmark Michel Macena Oliveira
@ 2021-03-11 12:27 ` Daniel Wagner
  2021-03-11 13:38   ` John Ogness
  2021-03-11 14:56   ` Michel Macena Oliveira
  2021-03-11 12:58 ` Ahmed S. Darwish
  1 sibling, 2 replies; 11+ messages in thread
From: Daniel Wagner @ 2021-03-11 12:27 UTC (permalink / raw)
  To: Michel Macena Oliveira, linux-rt-users

Hi Michel,

On 10.03.21 21:07, Michel Macena Oliveira wrote:
> At least that's what I expected, but my latencies are much bigger than
> Cyclictest ones.
>  From cyclictest I get an average in my computer of  something between
> 2300 and 2500 nanoseconds.  However, in my application I'm having
> something between 47000 and 55000 nanoseconds. As you can see it is
> much higher!
> 
> I'm not sure of what I'm doing wrong, could you help or suggest something?

Check if your system uses power management. cyclictest disables power
management by using the /dev/cpu_dma_latency API.

HTH,
Daniel
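
For reference, the /dev/cpu_dma_latency interface mentioned above can be
used from an application roughly as follows. This is a hedged sketch, not
cyclictest's actual code: `latency_target_open` is a hypothetical name, and
the `path` parameter exists only so the helper can be tried against an
ordinary file. The assumptions are that the target is written as a 32-bit
integer in microseconds, that the request holds only while the descriptor
stays open, and that the caller has permission to open the device.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Ask the kernel's PM QoS layer to keep CPU wake-up latency at or below
 * target_us microseconds (0 requests that deep idle states be avoided).
 * In real use `path` is "/dev/cpu_dma_latency".
 * The request is dropped as soon as the descriptor is closed, so the
 * caller must keep it open for as long as low latency is needed.
 * Returns the open descriptor, or -1 on failure.
 */
int latency_target_open(const char *path, int32_t target_us)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    if (write(fd, &target_us, sizeof(target_us)) != (ssize_t)sizeof(target_us)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

In the benchmark this would be called once in main() before creating the
real-time thread, with the descriptor closed only after pthread_join()
returns.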


* Re: PREEMPT_RT benchmark
  2021-03-10 20:07 PREEMPT_RT benchmark Michel Macena Oliveira
  2021-03-11 12:27 ` Daniel Wagner
@ 2021-03-11 12:58 ` Ahmed S. Darwish
  2021-03-11 15:03   ` Michel Macena Oliveira
  1 sibling, 1 reply; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-03-11 12:58 UTC (permalink / raw)
  To: Michel Macena Oliveira; +Cc: linux-rt-users

On Wed, Mar 10, 2021 at 05:07:47PM -0300, Michel Macena Oliveira wrote:
>
> I'm currently developing a time benchmark application where I want
> to measure a real time thread latency. My application is based on
> Cyclictest from
> the rt-tests suite.
...
>
> I programmed it in such a way that it would be equivalent to cyclictest
>
...
>
> At least that's what I expected, but my latencies are much bigger than
> Cyclictest ones.
>

I don't know the nature of the "time benchmark" application you are
writing.

Just a small hint that if you're doing any heavy OpenGL graphics within
that application, the latency can shoot up significantly. On some of our
Intel boxes, this can lead to an increase of 100-200 microseconds.

That's because, starting with Gen9 GPUs, Intel shares the Last-Level Cache
(and thus also memory bandwidth) between the GPU and the CPU. This
benefits graphical applications, but hurts predictability significantly.

Good luck,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: PREEMPT_RT benchmark
  2021-03-11 12:27 ` Daniel Wagner
@ 2021-03-11 13:38   ` John Ogness
  2021-03-11 14:56   ` Michel Macena Oliveira
  1 sibling, 0 replies; 11+ messages in thread
From: John Ogness @ 2021-03-11 13:38 UTC (permalink / raw)
  To: Daniel Wagner, Michel Macena Oliveira, linux-rt-users

On 2021-03-11, Daniel Wagner <wagi@monom.org> wrote:
> On 10.03.21 21:07, Michel Macena Oliveira wrote:
>> At least that's what I expected, but my latencies are much bigger than
>> Cyclictest ones.
>>  From cyclictest I get an average in my computer of  something between
>> 2300 and 2500 nanoseconds.  However, in my application I'm having
>> something between 47000 and 55000 nanoseconds. As you can see it is
>> much higher!
>> 
>> I'm not sure of what I'm doing wrong, could you help or suggest something?
>
> Check if your system uses power management. cyclictest disables power
> management by using the /dev/cpu_dma_latency API.

In addition, avoid file I/O in your real-time thread. Store your
results in memory and let another thread read them and write them to
disk. Then it would be more "equivalent to cyclictest".

John Ogness
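
One way to keep file I/O out of the real-time loop is sketched below. Note
that it swaps in a simpler variant of the suggestion above: instead of a
second thread reading a shared variable, each result is stored in a
preallocated buffer and everything is written out after the loop ends. The
names `sample_buf`, `sample_buf_push` and `sample_buf_dump` are hypothetical,
and MAX_SAMPLES is assumed to match LOOPS in the original program.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_SAMPLES 1000 /* assumed to match LOOPS in the original program */

/* Preallocated result buffer; with mlockall() in effect it stays resident. */
struct sample_buf {
    long samples[MAX_SAMPLES];
    size_t count;
};

/*
 * Real-time path: a bounded array store, no system calls, no allocation.
 * Returns 0 on success, -1 if the buffer is full.
 */
int sample_buf_push(struct sample_buf *buf, long diff)
{
    if (buf->count >= MAX_SAMPLES)
        return -1;
    buf->samples[buf->count++] = diff;
    return 0;
}

/*
 * Non-real-time path: write all samples, one per line, once measuring is
 * finished. Returns the number of samples written, or -1 on an I/O error.
 */
int sample_buf_dump(const struct sample_buf *buf, FILE *out)
{
    for (size_t i = 0; i < buf->count; i++) {
        if (fprintf(out, "%ld\n", buf->samples[i]) < 0)
            return -1;
    }
    return (int)buf->count;
}
```

In the original program, the fprintf() inside the for loop would become a
sample_buf_push() call, and sample_buf_dump() would run after the loop or
in main() after pthread_join().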


* Re: PREEMPT_RT benchmark
  2021-03-11 12:27 ` Daniel Wagner
  2021-03-11 13:38   ` John Ogness
@ 2021-03-11 14:56   ` Michel Macena Oliveira
  2021-03-11 16:17     ` Ahmed S. Darwish
  1 sibling, 1 reply; 11+ messages in thread
From: Michel Macena Oliveira @ 2021-03-11 14:56 UTC (permalink / raw)
  To: Daniel Wagner, linux-rt-users

Hi Daniel,
Thanks for your answer! That was the problem!
If I understand correctly, when cyclictest writes to the
cpu_dma_latency file, the processor does not enter sleep states and stays
at full power all the time. Is there any problem with writing an RT
application (not a benchmark) that uses this trick?
Could it cause any harm to the system?
My idea is to develop an RT application in which a scheduled
thread computes and delivers results at a given time or interval.

Michel


* Re: PREEMPT_RT benchmark
  2021-03-11 12:58 ` Ahmed S. Darwish
@ 2021-03-11 15:03   ` Michel Macena Oliveira
  2021-03-11 16:01     ` Ahmed S. Darwish
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Macena Oliveira @ 2021-03-11 15:03 UTC (permalink / raw)
  To: Ahmed S. Darwish, linux-rt-users

Hi Ahmed,
Thanks for the answer!
Do you know if this also applies to applications with a GTK interface?

Michel


* Re: PREEMPT_RT benchmark
  2021-03-11 15:03   ` Michel Macena Oliveira
@ 2021-03-11 16:01     ` Ahmed S. Darwish
  0 siblings, 0 replies; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-03-11 16:01 UTC (permalink / raw)
  To: Michel Macena Oliveira; +Cc: linux-rt-users


[ No top-posting please. See https://daringfireball.net/2007/07/on_top ]

Macena Oliveira wrote:
>
> Do you know if it would be valid for GTK interface applications?
>

Yes, we notice latency increases even when a modern desktop graphics
environment like GNOME is running but idle. Remember that both Windows
and modern Linux GTK environments use the GPU, even for 2D rendering.

If you're interested, I presented a small talk about the topic here:

  https://linutronix.de/PDF/Realtime_and_graphics-acontradiction2021.pdf

Good luck,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: PREEMPT_RT benchmark
  2021-03-11 14:56   ` Michel Macena Oliveira
@ 2021-03-11 16:17     ` Ahmed S. Darwish
  2021-03-11 17:02       ` Michel Macena Oliveira
  0 siblings, 1 reply; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-03-11 16:17 UTC (permalink / raw)
  To: Michel Macena Oliveira; +Cc: Daniel Wagner, linux-rt-users

[ again, no top-posting please... ]

Michel Macena Oliveira wrote:
>
> If I understand correctly  when Cyclictest writes to the
> cpu_dma_latency file, the processor does not sleep and remain
> full power all the time. Is there any problem writing a RT application
> (not a benchmark one) with this trick?
>

Well, if you do it for the benchmark, but not for production, then of
course, the benchmark is meaningless.

>
> Could it cause any harm to the system?
>

Talk to your CPU provider.

Yes, sometimes CPU providers warn about that, especially for x86 (Note:
I don't speak for Intel or any other x86 manufacturer).

Good luck,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: PREEMPT_RT benchmark
  2021-03-11 16:17     ` Ahmed S. Darwish
@ 2021-03-11 17:02       ` Michel Macena Oliveira
  2021-03-11 17:56         ` Ahmed S. Darwish
  0 siblings, 1 reply; 11+ messages in thread
From: Michel Macena Oliveira @ 2021-03-11 17:02 UTC (permalink / raw)
  To: Ahmed S. Darwish, linux-rt-users

> [ again, no top-posting please... ]
Sorry about that, I hadn't noticed. Thanks for the advice.

> Well, if you do it for the benchmark, but not for production, then of
> course, the benchmark is meaningless.

So for a production RT application, should I just "live" with
this problem?

> If you're interested, I presented a small talk about the topic here:
> https://linutronix.de/PDF/Realtime_and_graphics-acontradiction2021.pdf

Thanks, really helpful. If I understand correctly, in your application
you've used Intel CAT to mitigate the latency and plan to use Intel
MBA as well. Is that right?


* Re: PREEMPT_RT benchmark
  2021-03-11 17:02       ` Michel Macena Oliveira
@ 2021-03-11 17:56         ` Ahmed S. Darwish
  2021-03-12 15:45           ` Michel Macena Oliveira
  0 siblings, 1 reply; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-03-11 17:56 UTC (permalink / raw)
  To: Michel Macena Oliveira; +Cc: linux-rt-users

Michel Macena Oliveira wrote:
> Ahmed S. Darwish wrote:
> > [ again, no top-posting please... ]
...
> >
> >
> > Well, if you do it for the benchmark, but not for production, then of
> > course, the benchmark is meaningless.
>
> Then in case of a RT application for production I should "live" with
> this problem?
>

Well, you should have an over-all realtime strategy for your product.

This includes thermal requirements for both CPU (communicated with the
provider, due to the possible removal of power-saving support in SW) and
system enclosure... then you know if you can "live with it" or not.

> > If you're interested, I presented a small talk about the topic here:
> > https://linutronix.de/PDF/Realtime_and_graphics-acontradiction2021.pdf
>
> Thanks, really helpful, but if I understand, in your application
> you've used Intel CAT to mitigate the latency and plan to use Intel
> MBA as well. Is that true?

Sorry for being picky, but this was part of another e-mail sub-thread.
You should've kept it there...

Anyway, the answer to your question is yes. You need to talk to Intel
for MBA support on non-Xeon CPUs though, as the instructions are only
available under an NDA. I cannot comment further on this.

Good luck,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: PREEMPT_RT benchmark
  2021-03-11 17:56         ` Ahmed S. Darwish
@ 2021-03-12 15:45           ` Michel Macena Oliveira
  0 siblings, 0 replies; 11+ messages in thread
From: Michel Macena Oliveira @ 2021-03-12 15:45 UTC (permalink / raw)
  To: Ahmed S. Darwish; +Cc: linux-rt-users

> Sorry for being picky, but this was part of another e-mail sub-thread.
> You should've kept it there...
That is fine! I'm learning!

The rest of the answer was really helpful!
I'll do my research!
Thanks again!


