linux-rt-users.vger.kernel.org archive mirror
* rt-tests: cyclictest: Add option to specify main pid affinity
@ 2021-02-22 15:28 Jonathan Schwender
  2021-02-22 15:28 ` [PATCH 0/2] " Jonathan Schwender
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw)
  To: jkacur, williams; +Cc: linux-rt-users


Hi John,

This patch adds the option --mainaffinity to specify the affinity of
the main pid.
This is mainly useful if you want to bind the main thread to a
different (e.g. housekeeping) CPU than the measurement threads.

Regards 

Jonathan




* [PATCH 0/2] rt-tests: cyclictest: Add option to specify main pid affinity
  2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender
@ 2021-02-22 15:28 ` Jonathan Schwender
  2021-02-22 15:28 ` [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function Jonathan Schwender
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw)
  To: jkacur, williams; +Cc: linux-rt-users

Hi John,

This patch adds the option --mainaffinity to specify the affinity of
the main pid.
This is mainly useful if you want to bind the main thread to a
different (e.g. housekeeping) CPU than the measurement threads.

Regards 

Jonathan

Jonathan Schwender (2):
  cyclictest: Move main pid setaffinity handling into a function
  cyclictest: Add --mainaffinity=[CPUSET] option.

 src/cyclictest/cyclictest.c | 39 ++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

-- 
2.29.2
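
For readers who want to see how the new long option reaches the program, here is a minimal sketch of parsing `--mainaffinity` with `getopt_long`. It is not the cyclictest code: `find_mainaffinity` and `OPT_MAINAFFINITY_DEMO` are hypothetical names, and only the option name and the `required_argument` flag are taken from PATCH 2/2 of this series.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option id; the series uses OPT_MAINAFFINITY from an enum. */
enum { OPT_MAINAFFINITY_DEMO = 1000 };

/*
 * Scan argv for --mainaffinity and return its argument string,
 * or NULL if the option is absent or its argument is missing.
 */
static const char *find_mainaffinity(int argc, char *argv[])
{
	static const struct option long_options[] = {
		{ "mainaffinity", required_argument, NULL, OPT_MAINAFFINITY_DEMO },
		{ NULL, 0, NULL, 0 }
	};
	const char *cpuset = NULL;
	int c;

	optind = 0;	/* glibc: reinitialize the scanner so the function is reusable */
	opterr = 0;	/* keep the sketch quiet on unknown or incomplete options */
	while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
		if (c == OPT_MAINAFFINITY_DEMO)
			cpuset = optarg;	/* set by getopt_long for required_argument */
	}
	return cpuset;
}
```

With this table both `--mainaffinity=1` and `--mainaffinity 1` yield the string "1". A trailing `--mainaffinity` without a value is reported by `getopt_long` as an error ('?') instead of being returned as the option, so `optarg` should always be set when the option's case is reached.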



* [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function
  2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender
  2021-02-22 15:28 ` [PATCH 0/2] " Jonathan Schwender
@ 2021-02-22 15:28 ` Jonathan Schwender
  2021-02-23  5:12   ` John Kacur
  2021-02-22 15:28 ` [PATCH 2/2] cyclictest: Add --mainaffinity=[CPUSET] option Jonathan Schwender
  2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish
  3 siblings, 1 reply; 11+ messages in thread
From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw)
  To: jkacur, williams; +Cc: linux-rt-users

Move error handling for setting the affinity of the main pid
into a separate function.
This prevents duplicating the code in the next commit,
where the main thread pid can be restricted to one of
two bitmasks depending on the passed parameters.

Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
 src/cyclictest/cyclictest.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
index c3d45f3..3cd592d 100644
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -1749,6 +1749,16 @@ static void write_stats(FILE *f, void *data)
 	fprintf(f, "  }\n");
 }
 
+static void set_main_thread_affinity(struct bitmask* cpumask) {
+	int res;
+
+	errno = 0;
+	res = numa_sched_setaffinity(getpid(), cpumask);
+	if (res != 0)
+		warn("Couldn't setaffinity in main thread: %s\n", strerror(errno));
+}
+
+
 int main(int argc, char **argv)
 {
 	sigset_t sigset;
@@ -1778,13 +1788,7 @@ int main(int argc, char **argv)
 
 	/* Restrict the main pid to the affinity specified by the user */
 	if (affinity_mask) {
-		int res;
-
-		errno = 0;
-		res = numa_sched_setaffinity(getpid(), affinity_mask);
-		if (res != 0)
-			warn("Couldn't setaffinity in main thread: %s\n", strerror(errno));
-
+		set_main_thread_affinity(affinity_mask);
 		if (verbose)
 			printf("Using %u cpus.\n",
 				numa_bitmask_weight(affinity_mask));
-- 
2.29.2



* [PATCH 2/2] cyclictest: Add --mainaffinity=[CPUSET] option.
  2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender
  2021-02-22 15:28 ` [PATCH 0/2] " Jonathan Schwender
  2021-02-22 15:28 ` [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function Jonathan Schwender
@ 2021-02-22 15:28 ` Jonathan Schwender
  2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish
  3 siblings, 0 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw)
  To: jkacur, williams; +Cc: linux-rt-users

This allows the user to specify a separate cpuset for the main pid,
e.g. on a housekeeping CPU.
If --mainaffinity is not specified, but --affinity is, then the
current behaviour is preserved and the main pid is bound
to the cpuset specified by --affinity.

Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
---
 src/cyclictest/cyclictest.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
index 3cd592d..803d19a 100644
--- a/src/cyclictest/cyclictest.c
+++ b/src/cyclictest/cyclictest.c
@@ -836,6 +836,8 @@ static void display_help(int error)
 	       "	 --laptop	   Save battery when running cyclictest\n"
 	       "			   This will give you poorer realtime results\n"
 	       "			   but will not drain your battery so quickly\n"
+	       "         --mainaffinity=[CPUSET]  Run the main thread on CPU #N. This only affects\n"
+	       "                           the main thread and not the measurement threads\n"
 	       "-m       --mlockall        lock current and future memory allocations\n"
 	       "-M       --refresh_on_max  delay updating the screen until a new max\n"
 	       "			   latency is hit. Useful for low bandwidth.\n"
@@ -891,6 +893,7 @@ static int quiet;
 static int interval = DEFAULT_INTERVAL;
 static int distance = -1;
 static struct bitmask *affinity_mask = NULL;
+static struct bitmask *main_affinity_mask = NULL;
 static int smp = 0;
 
 static int clocksources[] = {
@@ -943,7 +946,7 @@ enum option_values {
 	OPT_AFFINITY=1, OPT_BREAKTRACE, OPT_CLOCK,
 	OPT_DISTANCE, OPT_DURATION, OPT_LATENCY,
 	OPT_FIFO, OPT_HISTOGRAM, OPT_HISTOFALL, OPT_HISTFILE,
-	OPT_INTERVAL, OPT_LOOPS, OPT_MLOCKALL, OPT_REFRESH,
+	OPT_INTERVAL, OPT_LOOPS, OPT_MAINAFFINITY, OPT_MLOCKALL, OPT_REFRESH,
 	OPT_NANOSLEEP, OPT_NSECS, OPT_OSCOPE, OPT_PRIORITY,
 	OPT_QUIET, OPT_PRIOSPREAD, OPT_RELATIVE, OPT_RESOLUTION,
 	OPT_SYSTEM, OPT_SMP, OPT_THREADS, OPT_TRIGGER,
@@ -980,6 +983,7 @@ static void process_options(int argc, char *argv[], int max_cpus)
 			{"interval",         required_argument, NULL, OPT_INTERVAL },
 			{"laptop",	     no_argument,	NULL, OPT_LAPTOP },
 			{"loops",            required_argument, NULL, OPT_LOOPS },
+			{"mainaffinity",     required_argument, NULL, OPT_MAINAFFINITY},
 			{"mlockall",         no_argument,       NULL, OPT_MLOCKALL },
 			{"refresh_on_max",   no_argument,       NULL, OPT_REFRESH },
 			{"nsecs",            no_argument,       NULL, OPT_NSECS },
@@ -1071,6 +1075,16 @@ static void process_options(int argc, char *argv[], int max_cpus)
 		case 'l':
 		case OPT_LOOPS:
 			max_cycles = atoi(optarg); break;
+		case OPT_MAINAFFINITY:
+			if (optarg) {
+				parse_cpumask(optarg, max_cpus, &main_affinity_mask);
+			} else if (optind < argc &&
+			           (atoi(argv[optind]) ||
+			            argv[optind][0] == '0' ||
+			            argv[optind][0] == '!')) {
+				parse_cpumask(argv[optind], max_cpus, &main_affinity_mask);
+			}
+			break;
 		case 'm':
 		case OPT_MLOCKALL:
 			lockall = 1; break;
@@ -1787,7 +1801,10 @@ int main(int argc, char **argv)
 	}
 
 	/* Restrict the main pid to the affinity specified by the user */
-	if (affinity_mask) {
+	if (main_affinity_mask){
+		set_main_thread_affinity(main_affinity_mask);
+	}
+	else if (affinity_mask) {
 		set_main_thread_affinity(affinity_mask);
 		if (verbose)
 			printf("Using %u cpus.\n",
-- 
2.29.2
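
The precedence described in the commit message can be sketched in isolation as follows. This is an illustration only: `struct demo_mask` is a hypothetical stand-in for libnuma's `struct bitmask`, and `select_main_mask` is not a function in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for libnuma's struct bitmask. */
struct demo_mask {
	unsigned long bits;
};

/*
 * Return the mask the main thread should be bound to:
 * the --mainaffinity mask wins, the --affinity mask is the fallback,
 * and NULL means the main thread is left unbound.
 */
static struct demo_mask *select_main_mask(struct demo_mask *main_affinity_mask,
					  struct demo_mask *affinity_mask)
{
	if (main_affinity_mask)
		return main_affinity_mask;
	return affinity_mask;
}
```

One detail worth noting when reading the hunk above: the verbose "Using %u cpus." message sits in the `else if` branch, so as posted it appears to be skipped whenever --mainaffinity is given.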



* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender
                   ` (2 preceding siblings ...)
  2021-02-22 15:28 ` [PATCH 2/2] cyclictest: Add --mainaffinity=[CPUSET] option Jonathan Schwender
@ 2021-02-22 16:20 ` Ahmed S. Darwish
  2021-02-22 17:05   ` Jonathan Schwender
  2021-03-21 17:11   ` Jonathan Schwender
  3 siblings, 2 replies; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-02-22 16:20 UTC (permalink / raw)
  To: Jonathan Schwender; +Cc: jkacur, williams, linux-rt-users

On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>
> Hi John,
>
> This patch adds the option --mainaffinity to specify the affinity of
> the main pid.
> This is mainly useful if you want to bind the main thread to a
> different (e.g. housekeeping ) CPU than the measurement threads.
>

Pardon my ignorance; can you please explain why this is important?
The measurement threads have an RT priority while the main thread is
SCHED_OTHER. So why would the cyclictest measurements really be affected
by the main thread (unless there's a preempt_rt bug)?

Do you also have any numbers showing different results with/without
"--mainaffinity"?

Thanks,

--
Ahmed S. Darwish


* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish
@ 2021-02-22 17:05   ` Jonathan Schwender
  2021-03-21 17:11   ` Jonathan Schwender
  1 sibling, 0 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-02-22 17:05 UTC (permalink / raw)
  To: Ahmed S. Darwish; +Cc: jkacur, williams, linux-rt-users

On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>> Hi John,
>>
>> This patch adds the option --mainaffinity to specify the affinity of
>> the main pid.
>> This is mainly useful if you want to bind the main thread to a
>> different (e.g. housekeeping ) CPU than the measurement threads.
>>
> Pardon my ignorance; can you please specify why is this important?
> The measurement threads have an RT priority while the main thread is
> SCHED_OTHER. So why would the cyclictest measurements really be affected
> by the main thread (unless there's a preempt_rt bug)?

The option is intended for measuring on isolated CPUs (via isolcpus or 
cpusets).

The RT wiki cyclictest FAQ entry "How can the influence of Cyclictest be
minimized when evaluating latencies on an isolated set of CPUs?" [1]
recommends pinning the main thread to a non-isolated CPU, since
that reduces context switches.

[1] 
https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/faq


> Do you also have any numbers showing different results with/without
> "--mainaffinity"?
  Sorry, I don't have any numbers, but I've put it on my todo-list.
>
> Thanks,
>
> --
> Ahmed S. Darwish
Jonathan


* Re: [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function
  2021-02-22 15:28 ` [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function Jonathan Schwender
@ 2021-02-23  5:12   ` John Kacur
  0 siblings, 0 replies; 11+ messages in thread
From: John Kacur @ 2021-02-23  5:12 UTC (permalink / raw)
  To: Jonathan Schwender; +Cc: williams, linux-rt-users



On Mon, 22 Feb 2021, Jonathan Schwender wrote:

> Move error handling for setting the affinity of the main pid
> into a separate function.
> This prevents duplicating the code in the next commit,
> where the main thread pid can be restricted to one of
> two bitmasks depending on the passed parameters.
> 
> Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com>
> ---
>  src/cyclictest/cyclictest.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c
> index c3d45f3..3cd592d 100644
> --- a/src/cyclictest/cyclictest.c
> +++ b/src/cyclictest/cyclictest.c
> @@ -1749,6 +1749,16 @@ static void write_stats(FILE *f, void *data)
>  	fprintf(f, "  }\n");
>  }
>  
> +static void set_main_thread_affinity(struct bitmask* cpumask) {
> +	int res;
> +
> +	errno = 0;
> +	res = numa_sched_setaffinity(getpid(), cpumask);
> +	if (res != 0)
> +		warn("Couldn't setaffinity in main thread: %s\n", strerror(errno));
> +}
> +
> +

Maybe this would be better in src/lib/rt-numa.c ?
Note your brace style is inconsistent with the rest of the suite.
We try to follow the linux kernel style, where it makes sense.

>  int main(int argc, char **argv)
>  {
>  	sigset_t sigset;
> @@ -1778,13 +1788,7 @@ int main(int argc, char **argv)
>  
>  	/* Restrict the main pid to the affinity specified by the user */
>  	if (affinity_mask) {
> -		int res;
> -
> -		errno = 0;
> -		res = numa_sched_setaffinity(getpid(), affinity_mask);
> -		if (res != 0)
> -			warn("Couldn't setaffinity in main thread: %s\n", strerror(errno));
> -
> +		set_main_thread_affinity(affinity_mask);
>  		if (verbose)
>  			printf("Using %u cpus.\n",
>  				numa_bitmask_weight(affinity_mask));
> -- 
> 2.29.2
> 
> 


* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish
  2021-02-22 17:05   ` Jonathan Schwender
@ 2021-03-21 17:11   ` Jonathan Schwender
  2021-03-23 16:51     ` John Kacur
  2021-03-24  9:32     ` Ahmed S. Darwish
  1 sibling, 2 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-03-21 17:11 UTC (permalink / raw)
  To: Ahmed S. Darwish; +Cc: jkacur, williams, linux-rt-users


On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>> Hi John,
>>
>> This patch adds the option --mainaffinity to specify the affinity of
>> the main pid.
>> This is mainly useful if you want to bind the main thread to a
>> different (e.g. housekeeping ) CPU than the measurement threads.
>>
> Do you also have any numbers showing different results with/without
> "--mainaffinity"?

Sorry for the delay. I do have some numbers now, and there is a benefit 
to using this option if CPU isolation + CAT are used. Otherwise it's not 
really visible.

Rendered Markdown: 
https://gist.github.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e

# Testing proposed cyclictest --mainaffinity option

System information:
- 2-socket Intel Xeon E5-2643 v4 @ 3.40 GHz
- Turbo boost is disabled.
- Fedora 33 with kernel 5.10.1-rt20 with small patch (see cmdline isolcpus)
- Cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,3,5,7,9,11 rcu_nocbs=3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0 systemd.unified_cgroup_hierarchy=0`
     - The additional isolcpus arguments set the HK_FLAG with the corresponding name.
       This cmdline adds all HK_FLAGs usually set by nohz_full, except the actual
       nohz flags `tick` and `timer`. This improves cyclictest latencies on my system.
- Rteval is running on all CPUs from node 0 + CPU 1, but not on the isolated CPUs.
- L3 Cache is reserved for the isolated CPUs via `resctrl` (CPU based allocation)
- Test duration 24 hours, interval 200 µs


## Test 1: 5 cyclictest instances with main pid on same CPU as the measurement thread

This test simply starts 5 cyclictest instances (via numactl) with one measurement thread each and bound to a single
CPU via `--affinity`, so that the main thread is also bound to the same CPU.

![Figure: 5 cyclictest instances with main pid pinned to same CPU as measurement thread](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/e27c3f284cf4bbeecded84865dfee5676b47fe88/2021-03-11.png)

## Test 2: Single cyclictest instance with --mainaffinity=1 for isolated CPUs 3,5,7,9,11
The main thread was placed on CPU 1 via `--mainaffinity` and `--refresh_on_max` was added for good measure to keep the
logfile small.

![Figure: Single cyclictest instance with --mainaffinity=1 for isolated CPUs 3,5,7,9,11](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/afd81b2a70a3e88bdebc46615d0f60e24238b405/2021-03-19.png)
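
The CPU placement used in Test 2 can be sanity-checked with a few bit operations. The sketch below is an illustration with hypothetical helper names (`cpulist_to_mask`, `housekeeping_mask`); it only handles plain comma-separated CPU numbers below 64 and is not the `parse_cpumask` used by cyclictest.

```c
#include <stdlib.h>

/*
 * Parse a comma-separated list of CPU numbers such as "3,5,7,9,11" into a
 * bitmask (bit N set means CPU N). Ranges like "0-3" are deliberately not
 * handled; returns 0 on malformed input or CPU numbers >= 64.
 */
static unsigned long long cpulist_to_mask(const char *list)
{
	unsigned long long mask = 0;
	const char *p = list;

	while (*p) {
		char *end;
		long cpu = strtol(p, &end, 10);

		if (end == p || cpu < 0 || cpu >= 64)
			return 0;
		mask |= 1ULL << cpu;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return 0;
		p = end;
	}
	return mask;
}

/* CPUs 0..ncpus-1 that are not part of the isolated set. */
static unsigned long long housekeeping_mask(unsigned int ncpus,
					    unsigned long long isolated)
{
	unsigned long long all = ncpus >= 64 ? ~0ULL : (1ULL << ncpus) - 1;

	return all & ~isolated;
}
```

With `maxcpus=12` and the isolated list `3,5,7,9,11` from the cmdline above, the housekeeping set works out to CPUs 0, 1, 2, 4, 6, 8 and 10, so CPU 1 chosen for `--mainaffinity` is indeed outside the isolated set.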

>
> Thanks,
>
> --
> Ahmed S. Darwish
Jonathan


* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-03-21 17:11   ` Jonathan Schwender
@ 2021-03-23 16:51     ` John Kacur
  2021-03-24  9:32     ` Ahmed S. Darwish
  1 sibling, 0 replies; 11+ messages in thread
From: John Kacur @ 2021-03-23 16:51 UTC (permalink / raw)
  To: Jonathan Schwender; +Cc: Ahmed S. Darwish, williams, linux-rt-users


On Sun, 21 Mar 2021, Jonathan Schwender wrote:

> 
> On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> > On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
> >> Hi John,
> >>
> >> This patch adds the option --mainaffinity to specify the affinity of
> >> the main pid.
> >> This is mainly useful if you want to bind the main thread to a
> >> different (e.g. housekeeping ) CPU than the measurement threads.
> >>
> > Do you also have any numbers showing different results with/without
> > "--mainaffinity"?
> 
> Sorry for the delay. I do have some numbers now, and there is a benefit to
> using this option if CPU isolation + CAT are used. Otherwise it's not really
> visible.
> 
> Rendered Markdown:
> https://gist.github.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e
> 
> # Testing proposed cyclictest --mainaffinity option
> 
> System information:
> - 2 socket Intel Xeon E5-2643 v4 @ 3.40Ghz
> - Turbo boost is disabled.
> - Fedora 33 with kernel 5.10.1-rt20 with small patch (see cmdline isolcpus)
> - Cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,3,5,7,9,11
> rcu_nocbs=3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog
> tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0
> systemd.unified_cgroup_hierarchy=0`
>     - The additional isolcpus arguments set the HK_FLAG with the corresponding
>     name.
>       This cmdline adds all HK_FLAGs usually set by nohz_full, except the
>       actual
>       nohz flags `tick` and `timer`. This improves cyclictest latencies on my
>       system.
> - Rteval is running on all CPUs from node 0 + CPU 1, but not on the isolated
> CPUs.
> - L3 Cache is reserved for the isolated CPUs via `resctrl` (CPU based
> allocation)
> - Test duration 24 hours, interval 200 µs
> 
> 
> ## Test 1: 5 cyclictest instance with main pid on same cpu as the measurement
> thread
> 
> This test simply starts 5 cyclictest instances (via numactl) with one
> measurement thread each and bound to a single
> CPU via `--affinity`, so that the main thread is also bound to the same CPU.
> 
> ![Figure: 5 cyclictest instances with main pid pinned to same CPU as
> measurement
> thread](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/e27c3f284cf4bbeecded84865dfee5676b47fe88/2021-03-11.png)
> 
> ## Test 2: Single cyclictest instance with --mainaffinity=1 for isolated CPUs
> 3,5,7,9,11
> The main thread was placed on CPU 1 via `--mainaffinity` and
> `--refresh_on_max` was added for good measure to keep the
> logfile small.
> 
> ![Figure: Single cyclictest instance with --mainaffinity=1 for isolated CPUs
> 3,5,7,9,11](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/afd81b2a70a3e88bdebc46615d0f60e24238b405/2021-03-19.png)
> 
> >
> > Thanks,
> >
> > --
> > Ahmed S. Darwish
> Jonathan
> 
> 

Alright, I can imagine how this could be useful. Would you respin the 
patches against the latest upstream and resend to me?

Thanks

John


* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-03-21 17:11   ` Jonathan Schwender
  2021-03-23 16:51     ` John Kacur
@ 2021-03-24  9:32     ` Ahmed S. Darwish
  2021-03-29 14:37       ` Jonathan Schwender
  1 sibling, 1 reply; 11+ messages in thread
From: Ahmed S. Darwish @ 2021-03-24  9:32 UTC (permalink / raw)
  To: Jonathan Schwender; +Cc: jkacur, williams, linux-rt-users

Hi Jonathan,

On Mar 21, 2021, Jonathan Schwender wrote:
>
> On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> >
> > Do you also have any numbers showing different results
> > with/without "--mainaffinity"?
>
> Sorry for the delay. I do have some numbers now, and there is a
> benefit to using this option if CPU isolation + CAT are
> used. Otherwise it's not really visible.
>

Thanks a lot for the results.

Since I'm doing some CAT-related stuff on RT tasks vs. GPU workloads,
I'm curious, how much was the benefit of CAT ON/OFF?

In your benchmarks you show that the combination of --mainaffinity, CPU
isolation, and CAT improves worst-case latency by 2 microseconds. If
you keep everything as-is but disable only CAT, how much do the results
change?

Also, how many classes of service (CLOS) does your CPU have? How was the
cache bitmask divided vis-a-vis the available CLOSes? And did you assign
isolated CPUs to one CLOS, and non-isolated CPUs to a different CLOS? Or
was the division more granular?

Kind regards,

--
Ahmed S. Darwish
Linutronix GmbH


* Re: rt-tests: cyclictest: Add option to specify main pid affinity
  2021-03-24  9:32     ` Ahmed S. Darwish
@ 2021-03-29 14:37       ` Jonathan Schwender
  0 siblings, 0 replies; 11+ messages in thread
From: Jonathan Schwender @ 2021-03-29 14:37 UTC (permalink / raw)
  To: Ahmed S. Darwish; +Cc: linux-rt-users

Hi Ahmed,


On 3/24/21 10:32 AM, Ahmed S. Darwish wrote:
> Hi Jonathan,
>
>
> Since I'm doing some CAT-related stuff on RT tasks vs. GPU workloads,
> I'm curious, how much was the benefit of CAT ON/OFF?

I'm assuming you're testing iGPU workloads rather than a dedicated GPU,
since you mention CAT. Or is there a benefit to using CAT with
a dedicated GPU?


> In your benchmarks you show that the combination of --mainaffinity, CPU
> isolation, and CAT, improves worst case latency by 2 micro seconds. If
> you keep everything as-is, but disable only CAT, how much change happens
> in the results?

First I'd like to mention that my test system had an inclusive
cache architecture. I'd guess that the difference between CAT and no CAT
is smaller for exclusive or non-inclusive caches (assuming cyclictest is
running on an isolated CPU).

So the results will depend on the number of isolated CPUs and on how much
of the shared L3 cache the load on the housekeeping CPUs uses.

Rendered Markdown: 
https://gist.github.com/jschwe/3502dbf1e56c85e9bf1a340041885b33

# Isolation capabilities without CAT

## Test 2021-01-31 - Isolate all CPUs on NUMA node 1

The figure below shows a worst-case latency of 4 microseconds
measured by cyclictest on the isolated CPUs on NUMA node 1.

cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,1,3,5,7,9,11 rcu_nocbs=1,3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0`

Test parameters: `sudo taskset -c 0-11 rteval --duration=24h --loads-cpulist=0,2,4,6,8,10 --measurement-cpulist=0-11`

![Figure: Latency of completely isolated node vs housekeeping node](https://gist.githubusercontent.com/jschwe/3502dbf1e56c85e9bf1a340041885b33/raw/962244e4e5309507feb0b4ec0627efbabe064c85/2021-01-31.png)


## Test 2021-02-01 - Isolate only CPU 11

The figure below shows a worst-case latency of 11 microseconds for the
isolated CPU 11.
Interestingly, the worst-case latencies also increased for the
housekeeping CPUs with respect to the previous test.
It is consistent with other tests I made though, and the worst-case
latency of the housekeeping CPUs is reduced if I isolate all or
all-but-one CPUs on node 1.

cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,11 rcu_nocbs=11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0`

Test parameters: `sudo taskset -c 0-11 rteval --duration=24h --loads-cpulist=0-10 --measurement-cpulist=0-11`

![Figure: CPU 11 latency with load on neighboring CPUs](https://gist.githubusercontent.com/jschwe/3502dbf1e56c85e9bf1a340041885b33/raw/962244e4e5309507feb0b4ec0627efbabe064c85/2021-02-01.png)

Note: The error bars show the unbiased standard error of the mean
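
The message does not spell out the formula behind the error bars. Assuming the note refers to the usual standard error of the mean based on the Bessel-corrected sample standard deviation, then for n samples x_1, ..., x_n with mean \bar{x} it would be:

```latex
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2},
\qquad
\mathrm{SEM} = \frac{s}{\sqrt{n}}
```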

> Also, how many classes of service (CLOS) your CPU has? How was the cache
> bitmask divided vis-a-vis the available CLOSes? And did you assign
> isolated CPUs to one CLOS, and non-isolated CPUs to a different CLOS? Or
> was the division more granular?

I don't have access to the system anymore, but I think it had 8 CLOS 
available (according to resctrl).

I always used exclusive bitmasks. I mostly used one CLOS for the
isolated CPUs, the default CLOS, and sometimes an additional CLOS for
tid-based CAT. Due to the "exclusive" setting in resctrl I had to take
away one way of the node 0 cache, even for CLOSes that were only intended
for node 1, which is a bit unfortunate.

I also tested tid-based vs. CPU-based CAT on isolated CPUs, and the
takeaway was that it doesn't matter too much:

tid-based CAT visibly worsens the best-case latencies (the
1-microsecond bin). However, the differences in the worst-case
latencies were minor.

In one test, I used CDP to reserve 4 ways (4 MiB) each for code and data
(8 ways in total) for 1 cyclictest instance (with 3 measurement threads).
For CPU-based CAT the utilization oscillated between 0.98 MB and 1.11 MB.
For tid-based CAT, the utilization oscillated between 98 kB and 163 kB.

In the next test I only used CAT to reserve 2 ways (2 MiB) shared
between code and data, also for 1 cyclictest instance with 3
measurement threads. In this case the CPU-based approach utilized
between 0.45 MB and 0.85 MB of the reserved L3 cache, but the latencies
measured by cyclictest were basically unchanged. The tid-based approach
actually had a utilization of 0. I'm assuming that's because more L3 was
available to the default CLOS, and the relevant cache lines were never
evicted from that part of the L3 cache, so the reservation didn't even
come into play there.

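To make the way-based reservations above concrete, here is a minimal sketch of how contiguous capacity bitmasks could be computed and checked for overlap. The helper names are hypothetical and not part of resctrl. The 1 MiB per way follows from "4 ways (4 MiB)" above; the total of 20 ways used in the usage note is an assumption, since the way count is not stated in this thread.

```c
/*
 * Build a contiguous capacity bitmask covering `ways` cache ways starting at
 * way `first`. Returns 0 if the request does not fit into `total_ways`
 * (at most 32 in this sketch).
 */
static unsigned int cbm_for_ways(unsigned int first, unsigned int ways,
				 unsigned int total_ways)
{
	if (total_ways > 32 || ways == 0 || ways > total_ways ||
	    first > total_ways - ways)
		return 0;
	if (ways == 32)
		return ~0u;
	return ((1u << ways) - 1) << first;
}

/* Two masks are exclusive in the sense used above if they share no way. */
static int cbm_exclusive(unsigned int a, unsigned int b)
{
	return (a & b) == 0;
}

/* Cache capacity covered by a mask, given the size of one way in KiB. */
static unsigned int cbm_size_kib(unsigned int cbm, unsigned int way_kib)
{
	unsigned int ways = 0;

	for (; cbm; cbm &= cbm - 1)	/* clear the lowest set bit each round */
		ways++;
	return ways * way_kib;
}
```

Under these assumptions, `cbm_for_ways(0, 4, 20)` gives 0xf for a 4-way reservation and `cbm_for_ways(4, 16, 20)` gives 0xffff0 for the remaining ways; the two masks do not overlap, and the first one covers 4096 KiB at 1024 KiB per way.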

> Kind regards,
>
> --
> Ahmed S. Darwish
> Linutronix GmbH

Best regards


Jonathan Schwender

