* rt-tests: cyclictest: Add option to specify main pid affinity
From: Jonathan Schwender @ 2021-02-22 15:28 UTC
To: jkacur, williams; +Cc: linux-rt-users

Hi John,

This patch adds the option --mainaffinity to specify the affinity of
the main pid.
This is mainly useful if you want to bind the main thread to a
different (e.g. housekeeping) CPU than the measurement threads.

Regards
Jonathan
* [PATCH 0/2] rt-tests: cyclictest: Add option to specify main pid affinity
From: Jonathan Schwender @ 2021-02-22 15:28 UTC
To: jkacur, williams; +Cc: linux-rt-users

Hi John,

This patch adds the option --mainaffinity to specify the affinity of
the main pid.
This is mainly useful if you want to bind the main thread to a
different (e.g. housekeeping) CPU than the measurement threads.

Regards
Jonathan

Jonathan Schwender (2):
  cyclictest: Move main pid setaffinity handling into a function
  cyclictest: Add --mainaffinity=[CPUSET] option.

 src/cyclictest/cyclictest.c | 39 ++++++++++++++++++++++++++++---------
 1 file changed, 30 insertions(+), 9 deletions(-)

--
2.29.2
* [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function 2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender 2021-02-22 15:28 ` [PATCH 0/2] " Jonathan Schwender @ 2021-02-22 15:28 ` Jonathan Schwender 2021-02-23 5:12 ` John Kacur 2021-02-22 15:28 ` [PATCH 2/2] cyclictest: Add --mainaffinity=[CPUSET] option Jonathan Schwender 2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish 3 siblings, 1 reply; 11+ messages in thread From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw) To: jkacur, williams; +Cc: linux-rt-users Move error handling for setting the affinity of the main pid into a separate function. This prevents duplicating the code in the next commit, where the main thread pid can be restricted to one of two bitmasks depending on the passed parameters. Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com> --- src/cyclictest/cyclictest.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c index c3d45f3..3cd592d 100644 --- a/src/cyclictest/cyclictest.c +++ b/src/cyclictest/cyclictest.c @@ -1749,6 +1749,16 @@ static void write_stats(FILE *f, void *data) fprintf(f, " }\n"); } +static void set_main_thread_affinity(struct bitmask* cpumask) { + int res; + + errno = 0; + res = numa_sched_setaffinity(getpid(), cpumask); + if (res != 0) + warn("Couldn't setaffinity in main thread: %s\n", strerror(errno)); +} + + int main(int argc, char **argv) { sigset_t sigset; @@ -1778,13 +1788,7 @@ int main(int argc, char **argv) /* Restrict the main pid to the affinity specified by the user */ if (affinity_mask) { - int res; - - errno = 0; - res = numa_sched_setaffinity(getpid(), affinity_mask); - if (res != 0) - warn("Couldn't setaffinity in main thread: %s\n", strerror(errno)); - + set_main_thread_affinity(affinity_mask); if (verbose) printf("Using %u cpus.\n", 
numa_bitmask_weight(affinity_mask));
--
2.29.2
* Re: [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function 2021-02-22 15:28 ` [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function Jonathan Schwender @ 2021-02-23 5:12 ` John Kacur 0 siblings, 0 replies; 11+ messages in thread From: John Kacur @ 2021-02-23 5:12 UTC (permalink / raw) To: Jonathan Schwender; +Cc: williams, linux-rt-users On Mon, 22 Feb 2021, Jonathan Schwender wrote: > Move error handling for setting the affinity of the main pid > into a separate function. > This prevents duplicating the code in the next commit, > where the main thread pid can be restricted to one of > two bitmasks depending on the passed parameters. > > Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com> > --- > src/cyclictest/cyclictest.c | 18 +++++++++++------- > 1 file changed, 11 insertions(+), 7 deletions(-) > > diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c > index c3d45f3..3cd592d 100644 > --- a/src/cyclictest/cyclictest.c > +++ b/src/cyclictest/cyclictest.c > @@ -1749,6 +1749,16 @@ static void write_stats(FILE *f, void *data) > fprintf(f, " }\n"); > } > > +static void set_main_thread_affinity(struct bitmask* cpumask) { > + int res; > + > + errno = 0; > + res = numa_sched_setaffinity(getpid(), cpumask); > + if (res != 0) > + warn("Couldn't setaffinity in main thread: %s\n", strerror(errno)); > +} > + > + Maybe this would be better in src/lib/rt-numa.c ? Note your brace style is inconsistent with the rest of the suite. We try to follow the linux kernel style, where it makes sense. 
> int main(int argc, char **argv) > { > sigset_t sigset; > @@ -1778,13 +1788,7 @@ int main(int argc, char **argv) > > /* Restrict the main pid to the affinity specified by the user */ > if (affinity_mask) { > - int res; > - > - errno = 0; > - res = numa_sched_setaffinity(getpid(), affinity_mask); > - if (res != 0) > - warn("Couldn't setaffinity in main thread: %s\n", strerror(errno)); > - > + set_main_thread_affinity(affinity_mask); > if (verbose) > printf("Using %u cpus.\n", > numa_bitmask_weight(affinity_mask)); > -- > 2.29.2 > > ^ permalink raw reply [flat|nested] 11+ messages in thread
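For illustration, the style point raised in the review could look like the sketch below. It is not the respun patch. It assumes plain `sched_setaffinity()` and a `cpu_set_t` in place of libnuma's `numa_sched_setaffinity()` and `struct bitmask`, so that it builds without `-lnuma`, and it returns the result to the caller, which the original helper does not do. In kernel style the opening brace of a function sits on its own line and the `*` binds to the parameter name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Bind the main thread to the CPUs in cpumask.
 *
 * Assumption: sched_setaffinity() stands in for libnuma's
 * numa_sched_setaffinity() used by the patch, and fprintf() stands in
 * for the rt-tests warn() helper.
 *
 * Returns 0 on success, non-zero on failure (after printing a warning).
 */
static int set_main_thread_affinity(const cpu_set_t *cpumask)
{
	int res;

	errno = 0;
	/* getpid() is also the thread ID of the main thread. */
	res = sched_setaffinity(getpid(), sizeof(*cpumask), cpumask);
	if (res != 0)
		fprintf(stderr, "Couldn't setaffinity in main thread: %s\n",
			strerror(errno));

	return res;
}
```

Whether the helper belongs in src/lib/rt-numa.c, as suggested in the review, is left open here.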
* [PATCH 2/2] cyclictest: Add --mainaffinity=[CPUSET] option. 2021-02-22 15:28 rt-tests: cyclictest: Add option to specify main pid affinity Jonathan Schwender 2021-02-22 15:28 ` [PATCH 0/2] " Jonathan Schwender 2021-02-22 15:28 ` [PATCH 1/2] cyclictest: Move main pid setaffinity handling into a function Jonathan Schwender @ 2021-02-22 15:28 ` Jonathan Schwender 2021-02-22 16:20 ` rt-tests: cyclictest: Add option to specify main pid affinity Ahmed S. Darwish 3 siblings, 0 replies; 11+ messages in thread From: Jonathan Schwender @ 2021-02-22 15:28 UTC (permalink / raw) To: jkacur, williams; +Cc: linux-rt-users This allows the user to specify a separate cpuset for the main pid, e.g. on a housekeeping CPU. If --mainaffinity is not specified, but --affinity is, then the current behaviour is preserved and the main pid is bound to the cpuset specified by --affinity Signed-off-by: Jonathan Schwender <schwenderjonathan@gmail.com> --- src/cyclictest/cyclictest.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/cyclictest/cyclictest.c b/src/cyclictest/cyclictest.c index 3cd592d..803d19a 100644 --- a/src/cyclictest/cyclictest.c +++ b/src/cyclictest/cyclictest.c @@ -836,6 +836,8 @@ static void display_help(int error) " --laptop Save battery when running cyclictest\n" " This will give you poorer realtime results\n" " but will not drain your battery so quickly\n" + " --mainaffinity=[CPUSET] Run the main thread on CPU #N. This only affects\n" + " the main thread and not the measurement threads\n" "-m --mlockall lock current and future memory allocations\n" "-M --refresh_on_max delay updating the screen until a new max\n" " latency is hit. 
Useful for low bandwidth.\n" @@ -891,6 +893,7 @@ static int quiet; static int interval = DEFAULT_INTERVAL; static int distance = -1; static struct bitmask *affinity_mask = NULL; +static struct bitmask *main_affinity_mask = NULL; static int smp = 0; static int clocksources[] = { @@ -943,7 +946,7 @@ enum option_values { OPT_AFFINITY=1, OPT_BREAKTRACE, OPT_CLOCK, OPT_DISTANCE, OPT_DURATION, OPT_LATENCY, OPT_FIFO, OPT_HISTOGRAM, OPT_HISTOFALL, OPT_HISTFILE, - OPT_INTERVAL, OPT_LOOPS, OPT_MLOCKALL, OPT_REFRESH, + OPT_INTERVAL, OPT_LOOPS, OPT_MAINAFFINITY, OPT_MLOCKALL, OPT_REFRESH, OPT_NANOSLEEP, OPT_NSECS, OPT_OSCOPE, OPT_PRIORITY, OPT_QUIET, OPT_PRIOSPREAD, OPT_RELATIVE, OPT_RESOLUTION, OPT_SYSTEM, OPT_SMP, OPT_THREADS, OPT_TRIGGER, @@ -980,6 +983,7 @@ static void process_options(int argc, char *argv[], int max_cpus) {"interval", required_argument, NULL, OPT_INTERVAL }, {"laptop", no_argument, NULL, OPT_LAPTOP }, {"loops", required_argument, NULL, OPT_LOOPS }, + {"mainaffinity", required_argument, NULL, OPT_MAINAFFINITY}, {"mlockall", no_argument, NULL, OPT_MLOCKALL }, {"refresh_on_max", no_argument, NULL, OPT_REFRESH }, {"nsecs", no_argument, NULL, OPT_NSECS }, @@ -1071,6 +1075,16 @@ static void process_options(int argc, char *argv[], int max_cpus) case 'l': case OPT_LOOPS: max_cycles = atoi(optarg); break; + case OPT_MAINAFFINITY: + if (optarg) { + parse_cpumask(optarg, max_cpus, &main_affinity_mask); + } else if (optind < argc && + (atoi(argv[optind]) || + argv[optind][0] == '0' || + argv[optind][0] == '!')) { + parse_cpumask(argv[optind], max_cpus, &main_affinity_mask); + } + break; case 'm': case OPT_MLOCKALL: lockall = 1; break; @@ -1787,7 +1801,10 @@ int main(int argc, char **argv) } /* Restrict the main pid to the affinity specified by the user */ - if (affinity_mask) { + if (main_affinity_mask){ + set_main_thread_affinity(main_affinity_mask); + } + else if (affinity_mask) { set_main_thread_affinity(affinity_mask); if (verbose) printf("Using %u cpus.\n", -- 
2.29.2
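The precedence described in the commit message (--mainaffinity wins, otherwise --affinity is used as before, otherwise the main thread is left alone) can be sketched in isolation. The `struct cpu_mask` type and the `pick_main_mask()` name are hypothetical stand-ins for illustration; the patch itself works directly on libnuma's `struct bitmask` pointers inside `main()`.

```c
#include <stddef.h>

/* Hypothetical stand-in for libnuma's struct bitmask. */
struct cpu_mask {
	unsigned long bits;
};

/*
 * Choose the mask the main thread should be bound to.
 *
 * main_affinity_mask: set by --mainaffinity, or NULL if not given
 * affinity_mask:      set by --affinity, or NULL if not given
 *
 * Returns NULL when neither option was given, meaning the main
 * thread's affinity is left untouched.
 */
static const struct cpu_mask *
pick_main_mask(const struct cpu_mask *main_affinity_mask,
	       const struct cpu_mask *affinity_mask)
{
	if (main_affinity_mask)
		return main_affinity_mask;

	return affinity_mask;
}
```

Note that in the patch the verbose "Using %u cpus." message is only printed on the --affinity fallback path.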
* Re: rt-tests: cyclictest: Add option to specify main pid affinity
From: Ahmed S. Darwish @ 2021-02-22 16:20 UTC
To: Jonathan Schwender; +Cc: jkacur, williams, linux-rt-users

On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>
> Hi John,
>
> This patch adds the option --mainaffinity to specify the affinity of
> the main pid.
> This is mainly useful if you want to bind the main thread to a
> different (e.g. housekeeping) CPU than the measurement threads.
>

Pardon my ignorance; can you please specify why is this important?

The measurement threads have an RT priority while the main thread is
SCHED_OTHER. So why would the cyclictest measurements really be affected
by the main thread (unless there's a preempt_rt bug)?

Do you also have any numbers showing different results with/without
"--mainaffinity"?

Thanks,

--
Ahmed S. Darwish
* Re: rt-tests: cyclictest: Add option to specify main pid affinity
From: Jonathan Schwender @ 2021-02-22 17:05 UTC
To: Ahmed S. Darwish; +Cc: jkacur, williams, linux-rt-users

On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>> Hi John,
>>
>> This patch adds the option --mainaffinity to specify the affinity of
>> the main pid.
>> This is mainly useful if you want to bind the main thread to a
>> different (e.g. housekeeping) CPU than the measurement threads.
>>
> Pardon my ignorance; can you please specify why is this important?
> The measurement threads have an RT priority while the main thread is
> SCHED_OTHER. So why would the cyclictest measurements really be affected
> by the main thread (unless there's a preempt_rt bug)?

The option is intended for measuring on isolated CPUs (via isolcpus or
cpusets). The RT wiki cyclictest FAQ entry "How can the influence of
Cyclictest be minimized when evaluating latencies on an isolated set of
CPUs?" [1] recommends pinning the main thread to a non-isolated CPU,
since that reduces context switches.

[1] https://wiki.linuxfoundation.org/realtime/documentation/howto/tools/cyclictest/faq

> Do you also have any numbers showing different results with/without
> "--mainaffinity"?

Sorry, I don't have any numbers, but I've put it on my todo-list.

>
> Thanks,
>
> --
> Ahmed S. Darwish

Jonathan
* Re: rt-tests: cyclictest: Add option to specify main pid affinity
From: Jonathan Schwender @ 2021-03-21 17:11 UTC
To: Ahmed S. Darwish; +Cc: jkacur, williams, linux-rt-users

On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote:
>> Hi John,
>>
>> This patch adds the option --mainaffinity to specify the affinity of
>> the main pid.
>> This is mainly useful if you want to bind the main thread to a
>> different (e.g. housekeeping) CPU than the measurement threads.
>>
> Do you also have any numbers showing different results with/without
> "--mainaffinity"?

Sorry for the delay. I do have some numbers now, and there is a benefit
to using this option if CPU isolation + CAT are used. Otherwise it's not
really visible.

Rendered Markdown: https://gist.github.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e

# Testing proposed cyclictest --mainaffinity option

System information:
- 2 socket Intel Xeon E5-2643 v4 @ 3.40 GHz
- Turbo boost is disabled.
- Fedora 33 with kernel 5.10.1-rt20 with small patch (see cmdline isolcpus)
- Cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,3,5,7,9,11 rcu_nocbs=3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0 systemd.unified_cgroup_hierarchy=0`
- The additional isolcpus arguments set the HK_FLAG with the corresponding name.
  This cmdline adds all HK_FLAGs usually set by nohz_full, except the actual
  nohz flags `tick` and `timer`. This improves cyclictest latencies on my system.
- Rteval is running on all CPUs from node 0 + CPU 1, but not on the isolated CPUs.
- L3 Cache is reserved for the isolated CPUs via `resctrl` (CPU based allocation)
- Test duration 24 hours, interval 200 µs

## Test 1: 5 cyclictest instances with main pid on same CPU as the measurement thread

This test simply starts 5 cyclictest instances (via numactl) with one
measurement thread each and bound to a single CPU via `--affinity`, so that
the main thread is also bound to the same CPU.

![Figure: 5 cyclictest instances with main pid pinned to same CPU as measurement thread](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/e27c3f284cf4bbeecded84865dfee5676b47fe88/2021-03-11.png)

## Test 2: Single cyclictest instance with --mainaffinity=1 for isolated CPUs 3,5,7,9,11

The main thread was placed on CPU 1 via `--mainaffinity` and
`--refresh_on_max` was added for good measure to keep the logfile small.

![Figure: Single cyclictest instance with --mainaffinity=1 for isolated CPUs 3,5,7,9,11](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/afd81b2a70a3e88bdebc46615d0f60e24238b405/2021-03-19.png)

>
> Thanks,
>
> --
> Ahmed S. Darwish

Jonathan
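As a rough illustration of how a housekeeping CPU such as CPU 1 relates to the isolated list in this setup, the sketch below parses a kernel-style CPU list and subtracts it from the set of online CPUs. Everything in it is an assumption made for illustration: `parse_cpulist()` and `housekeeping_mask()` are hypothetical helpers, not rt-tests functions (cyclictest uses its own `parse_cpumask()`), and a 64-bit mask limits the sketch to 64 CPUs.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_SKETCH_CPUS 64	/* limit imposed by the uint64_t mask */

/*
 * Parse a CPU list such as "3,5,7,9,11" or "0-11" into a bitmask.
 * Returns 0 on success, -1 on malformed input or CPU numbers >= 64.
 * The string must not contain a trailing newline.
 */
static int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s) {
		char *end;
		long lo, hi;

		lo = strtol(s, &end, 10);
		if (end == s || lo < 0)
			return -1;
		hi = lo;
		if (*end == '-') {
			/* Range "lo-hi" */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		if (hi >= MAX_SKETCH_CPUS)
			return -1;
		for (long cpu = lo; cpu <= hi; cpu++)
			*mask |= UINT64_C(1) << cpu;
		if (*end == ',')
			s = end + 1;
		else if (*end == '\0')
			s = end;
		else
			return -1;
	}
	return 0;
}

/* Housekeeping CPUs are the online CPUs that are not isolated. */
static uint64_t housekeeping_mask(uint64_t online, uint64_t isolated)
{
	return online & ~isolated;
}
```

With "0-11" as the online list and "3,5,7,9,11" as the isolated list, the remaining housekeeping CPUs are 0, 1, 2, 4, 6, 8 and 10, which includes the CPU 1 used for `--mainaffinity` in Test 2.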
* Re: rt-tests: cyclictest: Add option to specify main pid affinity 2021-03-21 17:11 ` Jonathan Schwender @ 2021-03-23 16:51 ` John Kacur 2021-03-24 9:32 ` Ahmed S. Darwish 1 sibling, 0 replies; 11+ messages in thread From: John Kacur @ 2021-03-23 16:51 UTC (permalink / raw) To: Jonathan Schwender; +Cc: Ahmed S. Darwish, williams, linux-rt-users [-- Attachment #1: Type: text/plain, Size: 2976 bytes --] On Sun, 21 Mar 2021, Jonathan Schwender wrote: > > On 2/22/21 5:20 PM, Ahmed S. Darwish wrote: > > On Mon, Feb 22, 2021 at 04:28:30PM +0100, Jonathan Schwender wrote: > >> Hi John, > >> > >> This patch adds the option --mainaffinity to specify the affinity of > >> the main pid. > >> This is mainly useful if you want to bind the main thread to a > >> different (e.g. housekeeping ) CPU than the measurement threads. > >> > > Do you also have any numbers showing different results with/without > > "--mainaffinity"? > > Sorry for the delay. I do have some numbers now, and there is a benefit to > using this option if CPU isolation + CAT are used. Otherwise it's not really > visible. > > Rendered Markdown: > https://gist.github.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e > > # Testing proposed cyclictest --mainaffinity option > > System information: > - 2 socket Intel Xeon E5-2643 v4 @ 3.40Ghz > - Turbo boost is disabled. > - Fedora 33 with kernel 5.10.1-rt20 with small patch (see cmdline isolcpus) > - Cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,3,5,7,9,11 > rcu_nocbs=3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog > tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0 > systemd.unified_cgroup_hierarchy=0` > - The additional isolcpus arguments set the HK_FLAG with the corresponding > name. > This cmdline adds all HK_FLAGs usually set by nohz_full, except the > actual > nohz flags `tick` and `timer`. This improves cyclictest latencies on my > system. 
> - Rteval is running on all CPUs from node 0 + CPU 1, but not on the isolated > CPUs. > - L3 Cache is reserved for the isolated CPUs via `resctrl` (CPU based > allocation) > - Test duration 24 hours, interval 200 µs > > > ## Test 1: 5 cyclictest instance with main pid on same cpu as the measurement > thread > > This test simply starts 5 cyclictest instances (via numactl) with one > measurement thread each and bound to a single > CPU via `--affinity`, so that the main thread is also bound to the same CPU. > > ![Figure: 5 cyclictest instances with main pid pinned to same CPU as > measurement > thread](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/e27c3f284cf4bbeecded84865dfee5676b47fe88/2021-03-11.png) > > ## Test 2: Single cyclictest instance with --mainaffinity=1 for isolated CPUs > 3,5,7,9,11 > The main thread was placed on CPU 1 via `--mainaffinity` and > `--refresh_on_max` was added for good measure to keep the > logfile small. > > ![Figure: Single cyclictest instance with --mainaffinity=1 for isolated CPUs > 3,5,7,9,11](https://gist.githubusercontent.com/jschwe/d4c46026aec57b10a2b0e6f72258b96e/raw/afd81b2a70a3e88bdebc46615d0f60e24238b405/2021-03-19.png) > > > > > Thanks, > > > > -- > > Ahmed S. Darwish > Jonathan > > Alright, I can imagine how this could be useful. Would you respin the patches against the latest upstream and resend to me? Thanks John ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rt-tests: cyclictest: Add option to specify main pid affinity
From: Ahmed S. Darwish @ 2021-03-24 9:32 UTC
To: Jonathan Schwender; +Cc: jkacur, williams, linux-rt-users

Hi Jonathan,

On Mar 21, 2021, Jonathan Schwender wrote:
>
> On 2/22/21 5:20 PM, Ahmed S. Darwish wrote:
> >
> > Do you also have any numbers showing different results
> > with/without "--mainaffinity"?
>
> Sorry for the delay. I do have some numbers now, and there is a
> benefit to using this option if CPU isolation + CAT are
> used. Otherwise it's not really visible.
>

Thanks a lot for the results.

Since I'm doing some CAT-related stuff on RT tasks vs. GPU workloads,
I'm curious, how much was the benefit of CAT ON/OFF?

In your benchmarks you show that the combination of --mainaffinity, CPU
isolation, and CAT, improves worst case latency by 2 micro seconds. If
you keep everything as-is, but disable only CAT, how much change happens
in the results?

Also, how many classes of service (CLOS) your CPU has? How was the cache
bitmask divided vis-a-vis the available CLOSes? And did you assign
isolated CPUs to one CLOS, and non-isolated CPUs to a different CLOS? Or
was the division more granular?

Kind regards,

--
Ahmed S. Darwish
Linutronix GmbH
* Re: rt-tests: cyclictest: Add option to specify main pid affinity
From: Jonathan Schwender @ 2021-03-29 14:37 UTC
To: Ahmed S. Darwish; +Cc: linux-rt-users

Hi Ahmed,

On 3/24/21 10:32 AM, Ahmed S. Darwish wrote:
> Hi Jonathan,
>
>
> Since I'm doing some CAT-related stuff on RT tasks vs. GPU workloads,
> I'm curious, how much was the benefit of CAT ON/OFF?

I'm assuming you're testing iGPU workloads and not on a dedicated GPU
since you are mentioning CAT. Or is there any benefit of using CAT with
a dedicated GPU?

> In your benchmarks you show that the combination of --mainaffinity, CPU
> isolation, and CAT, improves worst case latency by 2 micro seconds. If
> you keep everything as-is, but disable only CAT, how much change happens
> in the results?

First I'd like to mention that my test system had an inclusive cache
architecture. I'd guess that the difference between CAT and no CAT is
smaller for exclusive or non-inclusive caches (assuming cyclictest is
running on an isolated CPU).
So the results will depend on the number of isolated CPUs and how much of
the shared L3 cache the load on the housekeeping CPUs uses.

Rendered Markdown: https://gist.github.com/jschwe/3502dbf1e56c85e9bf1a340041885b33

# Isolation capabilities without CAT

## Test 2021-01-31 - Isolate all CPUs on NUMA node 1

The figure below shows a worst-case latency of 4 microseconds measured by
cyclictest on the isolated CPUs on NUMA node 1.

cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,1,3,5,7,9,11 rcu_nocbs=1,3,5,7,9,11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0`

Test parameters: `sudo taskset -c 0-11 rteval --duration=24h --loads-cpulist=0,2,4,6,8,10 --measurement-cpulist=0-11`

![Figure: Latency of completely isolated node vs housekeeping node](https://gist.githubusercontent.com/jschwe/3502dbf1e56c85e9bf1a340041885b33/raw/962244e4e5309507feb0b4ec0627efbabe064c85/2021-01-31.png)

## Test 2021-02-01 - Isolate only CPU 11

The figure below shows a worst-case latency of 11 microseconds for the
isolated CPU 11. Interestingly, the worst-case latencies also increased for
the housekeeping CPUs with respect to the previous test. It is consistent
with other tests I made though, and the worst-case latency of the
housekeeping CPUs is reduced if I isolate all or all-but-one CPUs on node 1.

cmdline: `nosmt isolcpus=domain,managed_irq,wq,rcu,misc,kthread,11 rcu_nocbs=11 irqaffinity=0,2,4 maxcpus=12 rcu_nocb_poll nowatchdog tsc=nowatchdog processor.max_cstate=1 intel_idle.max_cstate=0`

Test parameters: `sudo taskset -c 0-11 rteval --duration=24h --loads-cpulist=0-10 --measurement-cpulist=0-11`

![Figure: CPU 11 latency with load on neighboring CPUs](https://gist.githubusercontent.com/jschwe/3502dbf1e56c85e9bf1a340041885b33/raw/962244e4e5309507feb0b4ec0627efbabe064c85/2021-02-01.png)

Note: The error bars show the unbiased standard error of the mean

> Also, how many classes of service (CLOS) your CPU has? How was the cache
> bitmask divided vis-a-vis the available CLOSes? And did you assign
> isolated CPUs to one CLOS, and non-isolated CPUs to a different CLOS? Or
> was the division more granular?

I don't have access to the system anymore, but I think it had 8 CLOS
available (according to resctrl). I always used exclusive bitmasks.
I mostly used one CLOS for the isolated CPUs, the default CLOS, and
sometimes an additional CLOS for tid-based CAT. Due to the "exclusive"
setting in resctrl I had to take away one way of the node 0 cache, even for
CLOS that were only intended for node 1, which is a bit unfortunate.

I also tested tid-based vs. CPU-based CAT on isolated CPUs, and the
take-away was that it doesn't matter too much: tid-based CAT visibly
(negatively) impacts the best-case latencies (1 microsecond bin). However,
the differences regarding the worst-case latencies were minor.

In one test, I used CDP to reserve 4 ways (4 MiB) each for code and data
(so 8 ways total) for 1 cyclictest instance (with 3 measurement threads).
For CPU-based CAT the utilization oscillated between 0.98MB and 1.11MB.
For tid-based CAT, the utilization oscillated between 98kB and 163kB.

In the next test I only used CAT to reserve 2 ways (2 MiB) shared between
code and data, also for 1 cyclictest instance with 3 measurement threads.
In this case the CPU-based approach utilized between 0.45MB and 0.85MB of
the reserved L3 cache, but the latencies measured by cyclictest were
basically unchanged. The tid-based approach actually had a utilization
of 0. I'm assuming that's because more L3 was available to the default
CLOS, and the relevant cache lines were never evicted from that part of the
L3 cache, so the reservation didn't even come into play there.

> Kind regards,
>
> --
> Ahmed S. Darwish
> Linutronix GmbH

Best regards

Jonathan Schwender