linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] perf tests: Run tests in parallel by default
@ 2024-03-01 17:47 Ian Rogers
  2024-03-20 15:58 ` Ian Rogers
  2024-04-26 15:06 ` James Clark
  0 siblings, 2 replies; 6+ messages in thread
From: Ian Rogers @ 2024-03-01 17:47 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Kan Liang,
	linux-perf-users, linux-kernel

Switch from running tests sequentially to running in parallel by
default. Change the opt-in '-p' or '--parallel' flag to '-S' or
'--sequential'.

On an 8 core tigerlake an address sanitizer run time changes from:
326.54user 622.73system 6:59.91elapsed 226%CPU
to:
973.02user 583.98system 3:01.17elapsed 859%CPU

So over twice as fast, saving 4 minutes.

Signed-off-by: Ian Rogers <irogers@google.com>
---
This change is on top of the test fixes in:
https://lore.kernel.org/lkml/20240301074639.2260708-1-irogers@google.com/
---
 tools/perf/tests/builtin-test.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
index ddb2f4e38ea5..73f53b02f733 100644
--- a/tools/perf/tests/builtin-test.c
+++ b/tools/perf/tests/builtin-test.c
@@ -39,8 +39,8 @@
  * making them easier to debug.
  */
 static bool dont_fork;
-/* Fork the tests in parallel and then wait for their completion. */
-static bool parallel;
+/* Don't fork the tests in parallel and wait for their completion. */
+static bool sequential;
 const char *dso_to_test;
 const char *test_objdump_path = "objdump";
 
@@ -374,7 +374,7 @@ static int start_test(struct test_suite *test, int i, int subi, struct child_tes
 	}
 	(*child)->process.no_exec_cmd = run_test_child;
 	err = start_command(&(*child)->process);
-	if (err || parallel)
+	if (err || !sequential)
 		return  err;
 	return finish_test(*child, width);
 }
@@ -440,7 +440,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
 			int err = start_test(t, curr, -1, &child_tests[child_test_num++], width);
 
 			if (err) {
-				/* TODO: if parallel waitpid the already forked children. */
+				/* TODO: if !sequential waitpid the already forked children. */
 				free(child_tests);
 				return err;
 			}
@@ -460,7 +460,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
 		}
 	}
 	for (i = 0; i < child_test_num; i++) {
-		if (parallel) {
+		if (!sequential) {
 			int ret  = finish_test(child_tests[i], width);
 
 			if (ret)
@@ -536,8 +536,8 @@ int cmd_test(int argc, const char **argv)
 		    "be more verbose (show symbol address, etc)"),
 	OPT_BOOLEAN('F', "dont-fork", &dont_fork,
 		    "Do not fork for testcase"),
-	OPT_BOOLEAN('p', "parallel", &parallel,
-		    "Run the tests altogether in parallel"),
+	OPT_BOOLEAN('S', "sequential", &sequential,
+		    "Run the tests one after another rather than in parallel"),
 	OPT_STRING('w', "workload", &workload, "work", "workload to run for testing"),
 	OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
 	OPT_STRING(0, "objdump", &test_objdump_path, "path",
@@ -564,6 +564,9 @@ int cmd_test(int argc, const char **argv)
 	if (workload)
 		return run_workload(workload, argc, argv);
 
+	if (dont_fork)
+		sequential = true;
+
 	symbol_conf.priv_size = sizeof(int);
 	symbol_conf.try_vmlinux_path = true;
 
-- 
2.44.0.278.ge034bb2e1d-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] perf tests: Run tests in parallel by default
  2024-03-01 17:47 [PATCH v1] perf tests: Run tests in parallel by default Ian Rogers
@ 2024-03-20 15:58 ` Ian Rogers
  2024-03-20 18:20   ` Arnaldo Carvalho de Melo
  2024-04-26 15:06 ` James Clark
  1 sibling, 1 reply; 6+ messages in thread
From: Ian Rogers @ 2024-03-20 15:58 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, James Clark, Kan Liang,
	linux-perf-users, linux-kernel

On Fri, Mar 1, 2024 at 9:47 AM Ian Rogers <irogers@google.com> wrote:
>
> Switch from running tests sequentially to running in parallel by
> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> '--sequential'.
>
> On an 8 core tigerlake an address sanitizer run time changes from:
> 326.54user 622.73system 6:59.91elapsed 226%CPU
> to:
> 973.02user 583.98system 3:01.17elapsed 859%CPU
>
> So over twice as fast, saving 4 minutes.
>
> Signed-off-by: Ian Rogers <irogers@google.com>

Ping.

Thanks,
Ian

> ---
> This change is on top of the test fixes in:
> https://lore.kernel.org/lkml/20240301074639.2260708-1-irogers@google.com/
> ---
>  tools/perf/tests/builtin-test.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> index ddb2f4e38ea5..73f53b02f733 100644
> --- a/tools/perf/tests/builtin-test.c
> +++ b/tools/perf/tests/builtin-test.c
> @@ -39,8 +39,8 @@
>   * making them easier to debug.
>   */
>  static bool dont_fork;
> -/* Fork the tests in parallel and then wait for their completion. */
> -static bool parallel;
> +/* Don't fork the tests in parallel and wait for their completion. */
> +static bool sequential;
>  const char *dso_to_test;
>  const char *test_objdump_path = "objdump";
>
> @@ -374,7 +374,7 @@ static int start_test(struct test_suite *test, int i, int subi, struct child_tes
>         }
>         (*child)->process.no_exec_cmd = run_test_child;
>         err = start_command(&(*child)->process);
> -       if (err || parallel)
> +       if (err || !sequential)
>                 return  err;
>         return finish_test(*child, width);
>  }
> @@ -440,7 +440,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
>                         int err = start_test(t, curr, -1, &child_tests[child_test_num++], width);
>
>                         if (err) {
> -                               /* TODO: if parallel waitpid the already forked children. */
> +                               /* TODO: if !sequential waitpid the already forked children. */
>                                 free(child_tests);
>                                 return err;
>                         }
> @@ -460,7 +460,7 @@ static int __cmd_test(int argc, const char *argv[], struct intlist *skiplist)
>                 }
>         }
>         for (i = 0; i < child_test_num; i++) {
> -               if (parallel) {
> +               if (!sequential) {
>                         int ret  = finish_test(child_tests[i], width);
>
>                         if (ret)
> @@ -536,8 +536,8 @@ int cmd_test(int argc, const char **argv)
>                     "be more verbose (show symbol address, etc)"),
>         OPT_BOOLEAN('F', "dont-fork", &dont_fork,
>                     "Do not fork for testcase"),
> -       OPT_BOOLEAN('p', "parallel", &parallel,
> -                   "Run the tests altogether in parallel"),
> +       OPT_BOOLEAN('S', "sequential", &sequential,
> +                   "Run the tests one after another rather than in parallel"),
>         OPT_STRING('w', "workload", &workload, "work", "workload to run for testing"),
>         OPT_STRING(0, "dso", &dso_to_test, "dso", "dso to test"),
>         OPT_STRING(0, "objdump", &test_objdump_path, "path",
> @@ -564,6 +564,9 @@ int cmd_test(int argc, const char **argv)
>         if (workload)
>                 return run_workload(workload, argc, argv);
>
> +       if (dont_fork)
> +               sequential = true;
> +
>         symbol_conf.priv_size = sizeof(int);
>         symbol_conf.try_vmlinux_path = true;
>
> --
> 2.44.0.278.ge034bb2e1d-goog
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] perf tests: Run tests in parallel by default
  2024-03-20 15:58 ` Ian Rogers
@ 2024-03-20 18:20   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-03-20 18:20 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, James Clark,
	Kan Liang, linux-perf-users, linux-kernel

On Wed, Mar 20, 2024 at 08:58:37AM -0700, Ian Rogers wrote:
> On Fri, Mar 1, 2024 at 9:47 AM Ian Rogers <irogers@google.com> wrote:
> >
> > Switch from running tests sequentially to running in parallel by
> > default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> > '--sequential'.
> >
> > On an 8 core tigerlake an address sanitizer run time changes from:
> > 326.54user 622.73system 6:59.91elapsed 226%CPU
> > to:
> > 973.02user 583.98system 3:01.17elapsed 859%CPU
> >
> > So over twice as fast, saving 4 minutes.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> 
> Ping.

Thanks, applied.

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] perf tests: Run tests in parallel by default
  2024-03-01 17:47 [PATCH v1] perf tests: Run tests in parallel by default Ian Rogers
  2024-03-20 15:58 ` Ian Rogers
@ 2024-04-26 15:06 ` James Clark
  2024-04-26 15:42   ` Adrian Hunter
  1 sibling, 1 reply; 6+ messages in thread
From: James Clark @ 2024-04-26 15:06 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, linux-perf-users, linux-kernel



On 01/03/2024 17:47, Ian Rogers wrote:
> Switch from running tests sequentially to running in parallel by
> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> '--sequential'.
> 
> On an 8 core tigerlake an address sanitizer run time changes from:
> 326.54user 622.73system 6:59.91elapsed 226%CPU
> to:
> 973.02user 583.98system 3:01.17elapsed 859%CPU
> 
> So over twice as fast, saving 4 minutes.
> 

Apologies for not replying earlier before this was applied. But IMO this
isn't a good default. Tests that use things like exclusive PMUs
(Coresight for example) can never pass when run in parallel.

For CI it's arguable whether you'd want to trade stability for speed.
And for interactive sessions there was already the --parallel option
which was easy to add and have it in your bash history.

Now we've changed the default, any CI will need to be updated to add
--sequential if it wants all the tests to pass. Maybe we could do some
hack and gate it on interactive vs non interactive sessions, but that
might be getting too clever. (Or a "don't run in parallel" flag on
certain tests)


Thanks
James


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] perf tests: Run tests in parallel by default
  2024-04-26 15:06 ` James Clark
@ 2024-04-26 15:42   ` Adrian Hunter
  2024-04-26 16:45     ` Ian Rogers
  0 siblings, 1 reply; 6+ messages in thread
From: Adrian Hunter @ 2024-04-26 15:42 UTC (permalink / raw)
  To: James Clark, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Kan Liang, linux-perf-users, linux-kernel

On 26/04/24 18:06, James Clark wrote:
> 
> 
> On 01/03/2024 17:47, Ian Rogers wrote:
>> Switch from running tests sequentially to running in parallel by
>> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
>> '--sequential'.
>>
>> On an 8 core tigerlake an address sanitizer run time changes from:
>> 326.54user 622.73system 6:59.91elapsed 226%CPU
>> to:
>> 973.02user 583.98system 3:01.17elapsed 859%CPU
>>
>> So over twice as fast, saving 4 minutes.
>>
> 
> Apologies for not replying earlier before this was applied. But IMO this
> isn't a good default. Tests that use things like exclusive PMUs
> (Coresight for example) can never pass when run in parallel.

Yes, that is an issue for Intel PT also.

> 
> For CI it's arguable whether you'd want to trade stability for speed.
> And for interactive sessions there was already the --parallel option
> which was easy to add and have it in your bash history.
> 
> Now we've changed the default, any CI will need to be updated to add
> --sequential if it wants all the tests to pass.

Same here.

>                                                 Maybe we could do some
> hack and gate it on interactive vs non interactive sessions, but that
> might be getting too clever. (Or a "don't run in parallel" flag on
> certain tests)

Perhaps more attention is needed for the tests that take so long.
For example, maybe they could do things in parallel.

Also -F option doesn't seem to work.

$ tools/perf/perf test -F
Couldn't find a vmlinux that matches the kernel running on this machine, skipping test
  1: vmlinux symtab matches kallsyms                                 : Skip
  2: Detect openat syscall event                                     : Ok
  3: Detect openat syscall event on all cpus                         : Ok
  4.1: Read samples using the mmap interface                         : Ok
  4.2: User space counter reading of instructions                    : Ok
  4.3: User space counter reading of cycles                          : Ok
  5: Test data source output                                         : Ok
WARNING: event 'numpmu' not valid (bits 16-17,20,22 of config '6530160' not supported by kernel)!
  6.1: Test event parsing                                            : Ok
  6.2: Parsing of all PMU events from sysfs                          : Ok
WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
  6.3: Parsing of given PMU events from sysfs                        : Ok
  6.4: Parsing of aliased events from sysfs                          : Skip (no aliases in sysfs)
  6.5: Parsing of aliased events                                     : Ok
  6.6: Parsing of terms (event modifiers)                            : Ok
  7: Simple expression parser                                        : Ok
  8: PERF_RECORD_* events & perf_sample fields                       : Ok
  9: Parse perf pmu format                                           : Ok
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
failed: recursion detected for M1
failed: recursion detected for M2
failed: recursion detected for M3
Not grouping metric tma_memory_fence's events.
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
    echo 0 > /proc/sys/kernel/nmi_watchdog
    perf stat ...
    echo 1 > /proc/sys/kernel/nmi_watchdog
Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
    echo 0 > /proc/sys/kernel/nmi_watchdog
    perf stat ...
    echo 1 > /proc/sys/kernel/nmi_watchdog
...
^C



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v1] perf tests: Run tests in parallel by default
  2024-04-26 15:42   ` Adrian Hunter
@ 2024-04-26 16:45     ` Ian Rogers
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2024-04-26 16:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: James Clark, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Kan Liang, linux-perf-users,
	linux-kernel

On Fri, Apr 26, 2024 at 8:42 AM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 26/04/24 18:06, James Clark wrote:
> >
> >
> > On 01/03/2024 17:47, Ian Rogers wrote:
> >> Switch from running tests sequentially to running in parallel by
> >> default. Change the opt-in '-p' or '--parallel' flag to '-S' or
> >> '--sequential'.
> >>
> >> On an 8 core tigerlake an address sanitizer run time changes from:
> >> 326.54user 622.73system 6:59.91elapsed 226%CPU
> >> to:
> >> 973.02user 583.98system 3:01.17elapsed 859%CPU
> >>
> >> So over twice as fast, saving 4 minutes.
> >>
> >
> > Apologies for not replying earlier before this was applied. But IMO this
> > isn't a good default. Tests that use things like exclusive PMUs
> > (Coresight for example) can never pass when run in parallel.
>
> Yes, that is an issue for Intel PT also.

Right, Arnaldo and I discussed this and the safe thing to do for now
is to keep the default at serial - ie revert this change.

My longer term concern is that basically means people won't use
parallel testing and we'll bitrot in places like synthesis where it's
easy to make naive assumptions that say /proc won't be changing.

We have these tests set up for continuous testing in my team, and as
part of that set up each test is run independently to maximize
parallelism. This means the exclusive PMU thing does make the tests
flaky so it would be nice to have the tests address this, either by
skipping or busy waiting (only when the error code from the PMU is
busy, up to some maximum time period).

> >
> > For CI it's arguable whether you'd want to trade stability for speed.
> > And for interactive sessions there was already the --parallel option
> > which was easy to add and have it in your bash history.
> >
> > Now we've changed the default, any CI will need to be updated to add
> > --sequential if it wants all the tests to pass.
>
> Same here.
>
> >                                                 Maybe we could do some
> > hack and gate it on interactive vs non interactive sessions, but that
> > might be getting too clever. (Or a "don't run in parallel" flag on
> > certain tests)
>
> Perhaps more attention is needed for the tests that take so long.
> For example, maybe they could do things in parallel.

This is true for the tool in general. The problem is a lack of good
abstractions. Probably the easiest candidate for this are the shell
tests.

> Also -F option doesn't seem to work.
>
> $ tools/perf/perf test -F
> Couldn't find a vmlinux that matches the kernel running on this machine, skipping test
>   1: vmlinux symtab matches kallsyms                                 : Skip
>   2: Detect openat syscall event                                     : Ok
>   3: Detect openat syscall event on all cpus                         : Ok
>   4.1: Read samples using the mmap interface                         : Ok
>   4.2: User space counter reading of instructions                    : Ok
>   4.3: User space counter reading of cycles                          : Ok
>   5: Test data source output                                         : Ok
> WARNING: event 'numpmu' not valid (bits 16-17,20,22 of config '6530160' not supported by kernel)!
>   6.1: Test event parsing                                            : Ok
>   6.2: Parsing of all PMU events from sysfs                          : Ok
> WARNING: event 'N/A' not valid (bits 0-1 of config2 '3' not supported by kernel)!
>   6.3: Parsing of given PMU events from sysfs                        : Ok
>   6.4: Parsing of aliased events from sysfs                          : Skip (no aliases in sysfs)
>   6.5: Parsing of aliased events                                     : Ok
>   6.6: Parsing of terms (event modifiers)                            : Ok
>   7: Simple expression parser                                        : Ok
>   8: PERF_RECORD_* events & perf_sample fields                       : Ok
>   9: Parse perf pmu format                                           : Ok
>  10.1: PMU event table sanity                                        : Ok
>  10.2: PMU event map aliases                                         : Ok
> failed: recursion detected for M1
> failed: recursion detected for M2
> failed: recursion detected for M3
> Not grouping metric tma_memory_fence's events.
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>     echo 0 > /proc/sys/kernel/nmi_watchdog
>     perf stat ...
>     echo 1 > /proc/sys/kernel/nmi_watchdog
> Try disabling the NMI watchdog to comply NO_NMI_WATCHDOG metric constraint:
>     echo 0 > /proc/sys/kernel/nmi_watchdog
>     perf stat ...
>     echo 1 > /proc/sys/kernel/nmi_watchdog
> ...
> ^C

Not working meaning the errors are going to stderr? The issue is that
pr_err will write to stderr regardless of the verbosity setting and
some tests deliberately induce errors. We could use debug_set_file
 to set the output to /dev/null, but that could hide trouble. -F
usually means your debugging so we've not cared in the past.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-26 16:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-01 17:47 [PATCH v1] perf tests: Run tests in parallel by default Ian Rogers
2024-03-20 15:58 ` Ian Rogers
2024-03-20 18:20   ` Arnaldo Carvalho de Melo
2024-04-26 15:06 ` James Clark
2024-04-26 15:42   ` Adrian Hunter
2024-04-26 16:45     ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).