On 24/03/18 10:58AM, Jiri Olsa wrote:
> On Sat, Mar 16, 2024 at 10:22:41AM -0600, Jose Fernandez wrote:
>
> SNIP
>
> > +void test_task_get_cgroup(void)
> > +{
> > +	struct test_task_get_cgroup *skel;
> > +	int err, fd;
> > +	pid_t pid;
> > +	__u64 cgroup_id, expected_cgroup_id;
> > +	const struct timespec req = {
> > +		.tv_sec = 1,
> > +		.tv_nsec = 0,
> > +	};
> > +
> > +	fd = test__join_cgroup(TEST_CGROUP);
> > +	if (!ASSERT_OK(fd < 0, "test_join_cgroup_TEST_CGROUP"))
> > +		return;
> > +
> > +	skel = test_task_get_cgroup__open();
> > +	if (!ASSERT_OK_PTR(skel, "test_task_get_cgroup__open"))
> > +		goto cleanup;
> > +
> > +	err = test_task_get_cgroup__load(skel);
> > +	if (!ASSERT_OK(err, "test_task_get_cgroup__load"))
> > +		goto cleanup;
>
> nit, you could call test_task_get_cgroup__open_and_load

I'll rename.

> > +
> > +	err = test_task_get_cgroup__attach(skel);
> > +	if (!ASSERT_OK(err, "test_task_get_cgroup__attach"))
> > +		goto cleanup;
> > +
> > +	pid = getpid();
> > +	expected_cgroup_id = get_cgroup_id(TEST_CGROUP);
> > +	if (!ASSERT_GT(expected_cgroup_id, 0, "get_cgroup_id"))
> > +		goto cleanup;
> > +
> > +	/* Trigger nanosleep to enter the sched_switch tracepoint */
> > +	/* The previous task should be this process */
> > +	syscall(__NR_nanosleep, &req, NULL);
>
> would smaller sleep do? also we have our own usleep (in test_progs.c)
> that calls nanosleep

Yes, a smaller sleep should be fine. I'll reduce the sleep and use the
usleep helper.

> > +
> > +	err = bpf_map_lookup_elem(bpf_map__fd(skel->maps.pid_to_cgid_map), &pid,
> > +				  &cgroup_id);
> > +
> > +	if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
> > +		goto cleanup;
> > +
> > +	ASSERT_EQ(cgroup_id, expected_cgroup_id, "cgroup_id");
> > +
> > +cleanup:
> > +	test_task_get_cgroup__destroy(skel);
> > +	close(fd);
> > +}
> > diff --git a/tools/testing/selftests/bpf/progs/test_task_get_cgroup.c b/tools/testing/selftests/bpf/progs/test_task_get_cgroup.c
> > new file mode 100644
> > index 000000000000..580f8f0657d5
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/test_task_get_cgroup.c
> > @@ -0,0 +1,37 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +// Copyright 2024 Netflix, Inc.
> > +
> > +#include "vmlinux.h"
> > +#include <bpf/bpf_helpers.h>
> > +#include <bpf/bpf_tracing.h>
> > +
> > +struct cgroup *bpf_task_get_cgroup(struct task_struct *task) __ksym;
> > +void bpf_cgroup_release(struct cgroup *cgrp) __ksym;
> > +
> > +struct {
> > +	__uint(type, BPF_MAP_TYPE_HASH);
> > +	__uint(max_entries, 4096);
> > +	__type(key, __u32);
> > +	__type(value, __u64);
> > +} pid_to_cgid_map SEC(".maps");
> > +
> > +SEC("tp_btf/sched_switch")
> > +int BPF_PROG(sched_switch, bool preempt, struct task_struct *prev,
> > +	     struct task_struct *next)
> > +{
> > +	struct cgroup *cgrp;
> > +	u64 cgroup_id;
> > +	u32 pid;
> > +
>
> could you filter for your pid in here like we do in other places,
> (eg in progs/kprobe_multi.c)
>
> in which case you won't need hash map, but just a single value
> to store the cgroup id to
>
> jirka

I'll apply this suggestion as well and include it in V3. Thanks for the
feedback.

> > +	cgrp = bpf_task_get_cgroup(prev);
> > +	if (cgrp == NULL)
> > +		return 0;
> > +	cgroup_id = cgrp->kn->id;
> > +	pid = prev->pid;
> > +	bpf_map_update_elem(&pid_to_cgid_map, &pid, &cgroup_id, BPF_ANY);
> > +
> > +	bpf_cgroup_release(cgrp);
> > +	return 0;
> > +}
> > +
> > +char _license[] SEC("license") = "GPL";
> > --
> > 2.40.1
> >
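For context, the combination of Jiri's two suggestions (filter on the test's pid, keep a single value instead of a hash map) might look roughly like the sketch below. This is only an illustration of the suggested shape, not the actual V3 patch; the global variable names `pid` and `cgroup_id` are mine, chosen to mirror the convention in progs/kprobe_multi.c where the test sets the global before attaching:

```
// Hypothetical sketch of the V3 direction, not the real patch:
// user space writes its pid into the 'pid' global before attach and
// reads 'cgroup_id' back after triggering a context switch.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

struct cgroup *bpf_task_get_cgroup(struct task_struct *task) __ksym;
void bpf_cgroup_release(struct cgroup *cgrp) __ksym;

pid_t pid;		/* set by the test before attach */
u64 cgroup_id;		/* read back by the test afterwards */

SEC("tp_btf/sched_switch")
int BPF_PROG(sched_switch, bool preempt, struct task_struct *prev,
	     struct task_struct *next)
{
	struct cgroup *cgrp;

	/* Only record the switch away from the test process itself */
	if (prev->pid != pid)
		return 0;

	cgrp = bpf_task_get_cgroup(prev);
	if (cgrp == NULL)
		return 0;
	cgroup_id = cgrp->kn->id;
	bpf_cgroup_release(cgrp);
	return 0;
}

char _license[] SEC("license") = "GPL";
```

With a single global, the user-space side can then use `skel->bss->cgroup_id` directly instead of a `bpf_map_lookup_elem()` call.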
On Fri, 2024-03-15 at 15:29 +0100, Benjamin Tissoires wrote:
> bpf_experimental.h and ../bpf_testmod/bpf_testmod_kfunc.h are both
> including vmlinux.h, which is not compatible with including time.h
> or bpf_tcp_helpers.h.
>
> So prevent vmlinux.h to be included, and override the few missing
> types.
>
> Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>

[...]

> @@ -6,6 +6,14 @@
>  #include <bpf/bpf_helpers.h>
>  #include "bpf_tcp_helpers.h"
>
> +#define __VMLINUX_H__
> +#define u32 __u32
> +#define u64 __u64
> +#include "bpf_experimental.h"
> +struct prog_test_member1;
> +#include "../bpf_testmod/bpf_testmod_kfunc.h"
> +#undef __VMLINUX_H__

Tbh, this looks very ugly.
Would it be possible to create a new tests file sleepable_timer.c
and include bpf_experimental.h there, skipping time.h?
It appears that for the new tests the only necessary definition
from time.h is CLOCK_MONOTONIC.
On Fri, 2024-03-15 at 15:29 +0100, Benjamin Tissoires wrote:

[...]

> @@ -5279,7 +5281,8 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
>
>  static bool in_sleepable(struct bpf_verifier_env *env)
>  {
> -	return env->prog->sleepable;
> +	return env->prog->sleepable ||
> +	       (env->cur_state && env->cur_state->in_sleepable);
>  }

I was curious why the 'env->cur_state &&' check was needed and found that
removing it caused an error in the following fragment:

static int do_misc_fixups(struct bpf_verifier_env *env)
{
	...
	if (is_storage_get_function(insn->imm)) {
		if (!in_sleepable(env) ||
		    env->insn_aux_data[i + delta].storage_get_func_atomic)
			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_ATOMIC);
		else
			insn_buf[0] = BPF_MOV64_IMM(BPF_REG_5, (__force __s32)GFP_KERNEL);
		...
	}
	...
}

When do_misc_fixups() is done env->cur_state is NULL, so the current
implementation would use a GFP_ATOMIC allocation even for sleepable
callbacks, where GFP_KERNEL is sufficient.
Is this something we want to address?

>
>  /* The non-sleepable programs and sleepable programs with explicit bpf_rcu_read_lock()
On Sun, Mar 17, 2024 at 07:49:43PM -0700, David Wei wrote:
> I'm working on a similar proposal for zero copy Rx but to host memory
> and depend on this memory provider API.

Why do you need a different provider for that vs just udmabuf?

> Jakub also designed this API for hugepages too IIRC. Basically there's
> going to be at least three fancy ways of providing pages (one of which
> isn't actually pages, hence the merged netmem_t series) to drivers.

How do hugepages differ from a normal page allocation? They should
just be a different order passed to the page allocator.
On Fri, 2024-03-15 at 15:29 +0100, Benjamin Tissoires wrote:

This patch looks good to me, please see two nitpicks below.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>

[...]

> @@ -1350,6 +1358,11 @@ BPF_CALL_3(bpf_timer_start, struct bpf_timer_kern *, timer, u64, nsecs, u64, fla
>  		goto out;
>  	}
>
> +	if (t->is_sleepable && !(flags & BPF_F_TIMER_SLEEPABLE)) {
> +		ret = -EINVAL;
> +		goto out;
> +	}

Nit: the BPF_F_TIMER_ABS and BPF_F_TIMER_CPU_PIN don't affect sleepable
timers, should this check be changed to
'(t->is_sleepable && flags != BPF_F_TIMER_SLEEPABLE)'?

[...]

> @@ -12151,6 +12175,16 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
>  		}
>  	}
>
> +	if (is_async_callback_calling_kfunc(meta.func_id)) {
> +		err = push_callback_call(env, insn, insn_idx, meta.subprogno,
> +					 set_timer_callback_state);

Nit: still think that this fragment would be better as:

	if (is_bpf_timer_set_sleepable_cb_impl_kfunc(meta.func_id)) {
		err = push_callback_call(env, insn, insn_idx, meta.subprogno,
					 set_timer_callback_state);

Because of the 'set_timer_callback_state' passed to push_callback_call().

> +		if (err) {
> +			verbose(env, "kfunc %s#%d failed callback verification\n",
> +				func_name, meta.func_id);
> +			return err;
> +		}
> +	}
> +
>  	rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta);
>  	rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
>
Hi Tony,

On 3/18/2024 3:00 PM, Luck, Tony wrote:
>> Perhaps ... in this case it may make things easier to understand if
>> those "mon_NODE_*" directories are sub-directories of the appropriate
>> "mon_L3_*" directories.
>
> Reinette,
>
> Like this?
>
> $ tree mon_data/
> mon_data/
> ├── mon_L3_00
> │   ├── llc_occupancy
> │   ├── mbm_local_bytes
> │   ├── mbm_total_bytes
> │   ├── mon_NODE_00
> │   │   ├── llc_occupancy
> │   │   ├── mbm_local_bytes
> │   │   └── mbm_total_bytes
> │   └── mon_NODE_01
> │       ├── llc_occupancy
> │       ├── mbm_local_bytes
> │       └── mbm_total_bytes
> └── mon_L3_01
>     ├── llc_occupancy
>     ├── mbm_local_bytes
>     ├── mbm_total_bytes
>     ├── mon_NODE_02
>     │   ├── llc_occupancy
>     │   ├── mbm_local_bytes
>     │   └── mbm_total_bytes
>     └── mon_NODE_03
>         ├── llc_occupancy
>         ├── mbm_local_bytes
>         └── mbm_total_bytes

Yes.

Pro:
* This is familiar to users when compared to existing CTRL_MON group
  counts that are the sum of the MON groups within it.
* Users continue to see the clear connection between the files listed in
  /sys/fs/resctrl/info/L3_MON/mon_features and those found in "mon_L3_*"
  directories.
* If I understand correctly it also would continue to be useful to Arm [1]
  while not breaking existing user space since the mon_L3_* counts continue
  to reflect the entire L3 resource.
* This addresses your comment about maintaining the detailed information
  from each SNC node.
* I do assume that if there is only one SNC node per L3 cache then those
  mon_NODE_* directories will not be present and, to address the issue that
  triggered this thread, user space can use the presence of multiple
  "mon_NODE_*" directories per cache instance to know if SNC is enabled.

Con:
* Unclear how this may need to change if/when SNC becomes architectural.
* Continues to "muddy" the naming of the directories: mon_<resource>_<id>
  vs mon_<scope>_<id>. This cannot be turned into an agreement with user
  space where the first directory is mon_<resource>_<id> and the second
  directory is mon_<scope>_<id>, because then we would need to have, for
  example, mon_L3_00/mon_L3_00, where the first directory is for the
  resource and the second is for the scope, which appears redundant.
* Things may get confusing if there is ever a "node" resource.

This is starting to look like an interface that "checks" all the
requirements I've seen so far. Looking at the "con" list it is difficult
for me to see how doing something like this now may cause frustrations
later.

Reinette

[1] https://lore.kernel.org/lkml/88430722-67b3-4f7d-8db2-95ee52b6f0b0@arm.com/
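In the nested layout discussed above, each "mon_L3_*" count is the sum of the "mon_NODE_*" counts beneath it, and the presence of multiple mon_NODE_* subdirectories signals SNC. A rough Python sketch of a user-space consumer cross-checking that invariant, assuming the proposed directory layout (the helper names are mine, not an existing tool):

```python
import os
from glob import glob

def read_event(mon_dir: str, event: str) -> int:
    """Read one monitoring event count (e.g. mbm_total_bytes) from a mon_* dir."""
    with open(os.path.join(mon_dir, event)) as f:
        return int(f.read().strip())

def node_counts(l3_dir: str, event: str) -> dict:
    """Per-SNC-node counts under one mon_L3_* directory.

    An empty result means no mon_NODE_* subdirectories exist,
    i.e. SNC is off (one node per L3 cache)."""
    return {os.path.basename(d): read_event(d, event)
            for d in sorted(glob(os.path.join(l3_dir, "mon_NODE_*")))}

def check_l3_is_sum(l3_dir: str, event: str) -> bool:
    """The mon_L3_* count should equal the sum of its mon_NODE_* counts."""
    nodes = node_counts(l3_dir, event)
    if not nodes:
        return True  # SNC off: nothing to cross-check
    return read_event(l3_dir, event) == sum(nodes.values())
```

Note that on a live system the counters move between reads, so a real consumer would treat small deviations as expected rather than asserting exact equality.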
> Could you please help me understand the details by answering my first
> question: What is the use case for needing to expose the individual cluster
> counts?
>
> This is a model specific feature so if this is something needed for just a
> couple of systems I think we should be less inclined to make changes to
> resctrl interface. I am starting to be concerned about something similar
> becoming architectural later and then we need to wrangle this model specific
> resctrl support (which has then become ABI) again to support whatever that
> may look like.
Reinette,
Model specific. But present in multiple consecutive generations (Sapphire Rapids,
Emerald Rapids, Granite Rapids, Sierra Forest).
Adding Peter Newman for a resctrl user perspective on SNC, rather than me
continue to speculate on possible ways this might be used.
Peter: You will need to dig back a few messages on lore.kernel.org to
get context.
-Tony
> Perhaps ... in this case it may make things easier to understand if
> those "mon_NODE_*" directories are sub-directories of the appropriate
> "mon_L3_*" directories.
Reinette,
Like this?
$ tree mon_data/
mon_data/
├── mon_L3_00
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   ├── mbm_total_bytes
│   ├── mon_NODE_00
│   │   ├── llc_occupancy
│   │   ├── mbm_local_bytes
│   │   └── mbm_total_bytes
│   └── mon_NODE_01
│       ├── llc_occupancy
│       ├── mbm_local_bytes
│       └── mbm_total_bytes
└── mon_L3_01
    ├── llc_occupancy
    ├── mbm_local_bytes
    ├── mbm_total_bytes
    ├── mon_NODE_02
    │   ├── llc_occupancy
    │   ├── mbm_local_bytes
    │   └── mbm_total_bytes
    └── mon_NODE_03
        ├── llc_occupancy
        ├── mbm_local_bytes
        └── mbm_total_bytes
-Tony
On Fri, 2024-03-15 at 15:29 +0100, Benjamin Tissoires wrote:
[...]
> @@ -12021,6 +12034,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
>  			if (ret)
>  				return ret;
>  			break;
> +		case KF_ARG_PTR_TO_TIMER:
> +			if (reg->type != PTR_TO_MAP_VALUE) {
> +				verbose(env, "arg#%d doesn't point to a map value\n", i);
> +				return -EINVAL;
> +			}
> +			break;
I think that pointer offset has to be checked as well,
otherwise the following program verifies w/o error:
--- 8< ----------------------------
#include <linux/bpf.h>
#include <time.h>
#include <errno.h>
#include <bpf/bpf_helpers.h>
#include "bpf_tcp_helpers.h"

extern int bpf_timer_set_sleepable_cb_impl(struct bpf_timer *timer,
		int (callback_fn)(void *map, int *key, struct bpf_timer *timer),
		void *aux__ign) __ksym;

#define bpf_timer_set_sleepable_cb(timer, cb) \
	bpf_timer_set_sleepable_cb_impl(timer, cb, NULL)

struct elem {
	struct bpf_timer t;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 2);
	__type(key, int);
	__type(value, struct elem);
} array SEC(".maps");

static int cb_sleepable(void *map, int *key, struct bpf_timer *timer)
{
	return 0;
}

SEC("fentry/bpf_fentry_test5")
int BPF_PROG2(test_sleepable, int, a)
{
	struct bpf_timer *arr_timer;
	int array_key = 1;

	arr_timer = bpf_map_lookup_elem(&array, &array_key);
	if (!arr_timer)
		return 0;
	bpf_timer_init(arr_timer, &array, CLOCK_MONOTONIC);
	bpf_timer_set_sleepable_cb((void *)arr_timer + 1, /* <-- note incorrect offset */
				   cb_sleepable);
	bpf_timer_start(arr_timer, 0, 0);
	return 0;
}

char _license[] SEC("license") = "GPL";
---------------------------- >8 ---
Add ability to parse multiple files. Additionally add the ability to
parse all results in the KUnit debugfs repository.

How to parse multiple files:

./tools/testing/kunit/kunit.py parse results.log results2.log

How to parse all files in directory:

./tools/testing/kunit/kunit.py parse directory_path/*

How to parse KUnit debugfs repository:

./tools/testing/kunit/kunit.py parse debugfs

For each file, the parser outputs the file name, results, and test
summary. At the end of all parsing, the parser outputs a total summary
line.

This feature can be easily tested on the tools/testing/kunit/test_data/
directory.

Signed-off-by: Rae Moar <rmoar@google.com>
---
Changes since v3:
- Changing from input() to stdin
- Add checking for non-regular files
- Spacing fix
- Small printing fix

 tools/testing/kunit/kunit.py | 54 +++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 16 deletions(-)

diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index bc74088c458a..641b8ca83e3e 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -511,19 +511,40 @@ def exec_handler(cli_args: argparse.Namespace) -> None:


 def parse_handler(cli_args: argparse.Namespace) -> None:
-	if cli_args.file is None:
-		sys.stdin.reconfigure(errors='backslashreplace')  # type: ignore
-		kunit_output = sys.stdin  # type: Iterable[str]
-	else:
-		with open(cli_args.file, 'r', errors='backslashreplace') as f:
-			kunit_output = f.read().splitlines()
-	# We know nothing about how the result was created!
-	metadata = kunit_json.Metadata()
-	request = KunitParseRequest(raw_output=cli_args.raw_output,
-				    json=cli_args.json)
-	result, _ = parse_tests(request, metadata, kunit_output)
-	if result.status != KunitStatus.SUCCESS:
-		sys.exit(1)
+	parsed_files = cli_args.files  # type: List[str]
+	total_test = kunit_parser.Test()
+	total_test.status = kunit_parser.TestStatus.SUCCESS
+	if not parsed_files:
+		parsed_files.append('/dev/stdin')
+	elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
+		parsed_files.pop()
+		for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
+			parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
+		if not parsed_files:
+			print("No files found.")
+
+	for file in parsed_files:
+		if os.path.isdir(file):
+			print(f'Ignoring directory "{file}"')
+		elif os.path.exists(file):
+			print(file)
+			with open(file, 'r', errors='backslashreplace') as f:
+				kunit_output = f.read().splitlines()
+			# We know nothing about how the result was created!
+			metadata = kunit_json.Metadata()
+			request = KunitParseRequest(raw_output=cli_args.raw_output,
+						    json=cli_args.json)
+			_, test = parse_tests(request, metadata, kunit_output)
+			total_test.subtests.append(test)
+		else:
+			print(f'Could not find "{file}"')
+
+	if len(parsed_files) > 1:  # if more than one file was parsed output total summary
+		print('All files parsed.')
+		if not request.raw_output:
+			stdout.print_with_timestamp(kunit_parser.DIVIDER)
+		kunit_parser.bubble_up_test_results(total_test)
+		kunit_parser.print_summary_line(total_test)


 subcommand_handlers_map = {
@@ -569,9 +590,10 @@ def main(argv: Sequence[str]) -> None:
 				 help='Parses KUnit results from a file, '
 				      'and parses formatted results.')
 	add_parse_opts(parse_parser)
-	parse_parser.add_argument('file',
-				  help='Specifies the file to read results from.',
-				  type=str, nargs='?', metavar='input_file')
+	parse_parser.add_argument('files',
+				  help='List of file paths to read results from or keyword '
+				       '"debugfs" to read all results from the debugfs directory.',
+				  type=str, nargs='*', metavar='input_files')

 	cli_args = parser.parse_args(massage_argv(argv))

base-commit: 806cb2270237ce2ec672a407d66cee17a07d3aa2
--
2.44.0.291.gc1ea87d7ee-goog
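The "debugfs" keyword handling in the patch boils down to walking /sys/kernel/debug/kunit and collecting every file literally named "results". A standalone sketch of that walk, with the root made a parameter so it can be tried on any directory tree (the function name is mine, not from the patch):

```python
import os

def collect_results_files(root: str = "/sys/kernel/debug/kunit") -> list:
    """Return the paths of all files named 'results' under root,
    mirroring what 'kunit.py parse debugfs' gathers."""
    found = []
    for (dirpath, _, files) in os.walk(root):
        found.extend(os.path.join(dirpath, f) for f in files if f == "results")
    return sorted(found)
```

os.walk() simply yields nothing for a missing or empty root, which is why the patch follows the walk with an explicit "No files found." message.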
On 3/18/2024 2:04 PM, Luck, Tony wrote:
>>> What is the use case for needing to expose the individual cluster counts? What if
>>> resctrl just summed the cluster counts and presented the data as before - per L3
>>> cache instance? I doubt that resctrl would be what applications would use to verify
>>> whether they are "well behaved" wrt NUMA.
>>
>> Reinette,
>>
>> My (perhaps naïve) belief is that in a cloud server environment there are many
>> well behaved NUMA applications. Only presenting the sum would lose the detailed
>> information from each SNC node.
>
> Is the answer to "A" or "B" ... why not provide both:
>
> $ ls -l /sys/fs/resctrl/mon_data
> total 0
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_L3_00
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_L3_01
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_00
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_01
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_02
> dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_03
>
> The "L3" entries provide the sum across all SNC nodes sharing the cache. The NODE ones
> give the broken out counts.
Perhaps ... in this case it may make things easier to understand if
those "mon_NODE_*" directories are sub-directories of the appropriate
"mon_L3_*" directories.
Reinette
Hi Tony,
On 3/18/2024 1:47 PM, Luck, Tony wrote:
>> What is the use case for needing to expose the individual cluster counts? What if
>> resctrl just summed the cluster counts and presented the data as before - per L3
>> cache instance? I doubt that resctrl would be what applications would use to verify
>> whether they are "well behaved" wrt NUMA.
>
> Reinette,
>
> My (perhaps naïve) belief is that in a cloud server environment there are many
> well behaved NUMA applications. Only presenting the sum would lose the detailed
> information from each SNC node.
Yes ... I understand by providing a sum the values that contributed to the sum
are lost.
Could you please help me understand the details by answering my first
question: What is the use case for needing to expose the individual cluster
counts?
This is a model specific feature so if this is something needed for just a
couple of systems I think we should be less inclined to make changes to
resctrl interface. I am starting to be concerned about something similar
becoming architectural later and then we need to wrangle this model specific
resctrl support (which has then become ABI) again to support whatever that
may look like.
Reinette
On Fri, Mar 15, 2024 at 7:16 PM Daniel Latypov <dlatypov@google.com> wrote:
>
> On Thu, Mar 7, 2024 at 2:29 PM Rae Moar <rmoar@google.com> wrote:
> >
> > Add ability to parse multiple files. Additionally add the
> > ability to parse all results in the KUnit debugfs repository.
> >
> > How to parse multiple files:
> >
> > ./tools/testing/kunit/kunit.py parse results.log results2.log
> >
> > How to parse all files in directory:
> >
> > ./tools/testing/kunit/kunit.py parse directory_path/*
> >
> > How to parse KUnit debugfs repository:
> >
> > ./tools/testing/kunit/kunit.py parse debugfs
> >
> > For each file, the parser outputs the file name, results, and test
> > summary. At the end of all parsing, the parser outputs a total summary
> > line.
> >
> > This feature can be easily tested on the tools/testing/kunit/test_data/
> > directory.
> >
> > Signed-off-by: Rae Moar <rmoar@google.com>
> > ---
> > Changes since v2:
> > - Fixed bug with input from command line. I changed this to use
> >   input(). Daniel, let me know if this works for you.
>
> Oops, sorry for the delay.

Hi! No worries at all. Thanks for the review!

> Hmm, it seems to be treating the stdin lines like file names
>
> $ ./tools/testing/kunit/kunit.py parse < ./tools/testing/kunit/test_data/test_config_printk_time.log
> File path: Could not find  [    0.060000] printk: console [mc-1] enabled
>
> Oh, I see, we're prompting the user via
>   input("File path: ")
> ?
>
> I'm not necessarily against such a change, but I would personally
> prefer the old behavior of being able to read ktap from stdin
> directly.
> As a user, I'd also prefer to only type out filenames as arguments
> where I can get autocomplete, so `input()` here wouldn't help me
> personally.
>
> Applying a hackish patch like this [1] on top gets the behavior I'd
> personally expect:
> $ ./tools/testing/kunit/kunit.py parse < ./tools/testing/kunit/test_data/test_config_printk_time.log
> /dev/stdin
> ...
> [16:01:50] Testing complete. Ran 10 tests: passed: 10
>
> I'd mentioned in the previous version that we could have parsed files
> contain a `Union[str, TextIO]` and then read from the `sys.stdin` file
> object directly.
> But having it blindly open `/dev/stdin` seems to work just the same,
> if we want to keep our list simpler and just hold strings.

I definitely see why the change to stdin would be better. My original
change to input() was to keep it simple. But I really like the change
listed below. I will go ahead and implement that.

> [1] this just also re-orders the `os.path.isdir()` check as mentioned
> below, which simplifies things
>
> diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> index 1aa3d736d80c..311d107bd684 100755
> --- a/tools/testing/kunit/kunit.py
> +++ b/tools/testing/kunit/kunit.py
> @@ -515,18 +515,18 @@ def parse_handler(cli_args: argparse.Namespace) -> None:
>         total_test = kunit_parser.Test()
>         total_test.status = kunit_parser.TestStatus.SUCCESS
>         if not parsed_files:
> -               parsed_files.append(input("File path: "))
> -
> -       if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
> +               parsed_files.append('/dev/stdin')
> +       elif len(parsed_files) == 1 and parsed_files[0] == "debugfs":
>                 parsed_files.pop()
>                 for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
>                         parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
> -
> -       if not parsed_files:
> -               print("No files found.")
> +               if not parsed_files:
> +                       print("No files found.")
>
>         for file in parsed_files:
> -               if os.path.isfile(file):
> +               if os.path.isdir(file):
> +                       print("Ignoring directory ", file)
> +               elif os.path.exists(file):
>                         print(file)
>                         with open(file, 'r', errors='backslashreplace') as f:
>                                 kunit_output = f.read().splitlines()
> @@ -536,8 +536,6 @@ def parse_handler(cli_args: argparse.Namespace) -> None:
>                                             json=cli_args.json)
>                         _, test = parse_tests(request, metadata, kunit_output)
>                         total_test.subtests.append(test)
> -               elif os.path.isdir(file):
> -                       print("Ignoring directory ", file)
>                 else:
>                         print("Could not find ", file)
>
> > - Add more specific warning messages
> >
> >  tools/testing/kunit/kunit.py | 56 +++++++++++++++++++++++++-----------
> >  1 file changed, 40 insertions(+), 16 deletions(-)
> >
> > diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
> > index bc74088c458a..1aa3d736d80c 100755
> > --- a/tools/testing/kunit/kunit.py
> > +++ b/tools/testing/kunit/kunit.py
> > @@ -511,19 +511,42 @@ def exec_handler(cli_args: argparse.Namespace) -> None:
> >
> >
> >  def parse_handler(cli_args: argparse.Namespace) -> None:
> > -       if cli_args.file is None:
> > -               sys.stdin.reconfigure(errors='backslashreplace')  # type: ignore
> > -               kunit_output = sys.stdin  # type: Iterable[str]
> > -       else:
> > -               with open(cli_args.file, 'r', errors='backslashreplace') as f:
> > -                       kunit_output = f.read().splitlines()
> > -       # We know nothing about how the result was created!
> > -       metadata = kunit_json.Metadata()
> > -       request = KunitParseRequest(raw_output=cli_args.raw_output,
> > -                                   json=cli_args.json)
> > -       result, _ = parse_tests(request, metadata, kunit_output)
> > -       if result.status != KunitStatus.SUCCESS:
> > -               sys.exit(1)
> > +       parsed_files = cli_args.files  # type: List[str]
> > +       total_test = kunit_parser.Test()
> > +       total_test.status = kunit_parser.TestStatus.SUCCESS
> > +       if not parsed_files:
> > +               parsed_files.append(input("File path: "))
> > +
> > +       if parsed_files[0] == "debugfs" and len(parsed_files) == 1:
> > +               parsed_files.pop()
> > +               for (root, _, files) in os.walk("/sys/kernel/debug/kunit"):
> > +                       parsed_files.extend(os.path.join(root, f) for f in files if f == "results")
> > +
> > +       if not parsed_files:
> > +               print("No files found.")
> > +
> > +       for file in parsed_files:
> > +               if os.path.isfile(file):
>
> Note: perhaps we should reorder this to
>
>   if os.path.isdir(file):
>     ...
>   elif os.path.exists(file):
>     ...
>
> That way this code will then start handling non-regular, yet readable
> files, like links, etc.
> That would also help out if we started passing in the magic
> "/dev/stdin" (since it's a symlink)

Oh I see. Yes I will try to implement this! Thanks!

>
> > +                       print(file)
> > +                       with open(file, 'r', errors='backslashreplace') as f:
> > +                               kunit_output = f.read().splitlines()
> > +                       # We know nothing about how the result was created!
> > +                       metadata = kunit_json.Metadata()
> > +                       request = KunitParseRequest(raw_output=cli_args.raw_output,
> > +                                                   json=cli_args.json)
> > +                       _, test = parse_tests(request, metadata, kunit_output)
> > +                       total_test.subtests.append(test)
> > +               elif os.path.isdir(file):
> > +                       print("Ignoring directory ", file)
>
> minor nit: `print()` will automatically put a space between arguments, e.g.
>   Ignoring directory  .
> is what it'll print if I run `kunit.py parse .`
>
> It might be better to use an f-string to put quotes around it, like so
>   print(f'Ignoring directory "{file}"')
> and below,
>   print(f'Could not find "{file}"')

Yep! Happy to change this.

> > +               else:
> > +                       print("Could not find ", file)
> > +
> > +       if len(parsed_files) > 1:  # if more than one file was parsed output total summary
> > +               print('All files parsed.')
> > +               if not request.raw_output:
> > +                       stdout.print_with_timestamp(kunit_parser.DIVIDER)
> > +               kunit_parser.bubble_up_test_results(total_test)
> > +               kunit_parser.print_summary_line(total_test)
> >
> >
> >  subcommand_handlers_map = {
> > @@ -569,9 +592,10 @@ def main(argv: Sequence[str]) -> None:
> >                                  help='Parses KUnit results from a file, '
> >                                       'and parses formatted results.')
> >         add_parse_opts(parse_parser)
> > -       parse_parser.add_argument('file',
> > -                                 help='Specifies the file to read results from.',
> > -                                 type=str, nargs='?', metavar='input_file')
> > +       parse_parser.add_argument('files',
> > +                                 help='List of file paths to read results from or keyword'
> > +                                      '"debugfs" to read all results from the debugfs directory.',
>
> minor spacing note: there are two ' 's here in the series of tabs, i.e.
> ^I^I^I^I  ^I^I'"debugfs" to read all results from the debugfs directory.',$
> (using vim's :list formatting)
>
> This was copy-pasted from the lines above and below which look like
> ^I^I^I^I  help='List of file paths to read results from or keyword'$
> i.e. they use the 2 spaces to align after the tabs.
>
> We can just drop those 2 spaces since they won't visually affect the
> outcome with a tabwidth of 8 spaces.

Thanks for pointing this out! I will change the spacing here. I am
thinking of just changing it to match the other lines. So something
like this:

^I^I^I^I   '"debugfs" to read all results from the debugfs directory.',$

> Sorry again for the delayed reply,
> Daniel
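The reordering Daniel suggests (isdir first, then exists, instead of isfile) is what lets non-regular but readable files through: os.path.isfile() is False for things like FIFOs and character devices, while os.path.exists() is True, and it also follows symlinks such as /dev/stdin. A small illustration of that dispatch order, extracted from the patch into a standalone function (the function name is mine):

```python
import os

def classify(path: str) -> str:
    """Mirror the dispatch order of the reworked parse_handler():
    directories are skipped, anything else that exists is parseable."""
    if os.path.isdir(path):
        return "ignored-directory"
    if os.path.exists(path):
        return "parseable"
    return "not-found"
```

With the original isfile() check, a FIFO or /dev/stdin would have fallen through to the "could not find" branch even though open() on it would succeed.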
> > What is the use case for needing to expose the individual cluster counts? What if
> > resctrl just summed the cluster counts and presented the data as before - per L3
> > cache instance? I doubt that resctrl would be what applications would use to verify
> > whether they are "well behaved" wrt NUMA.
>
> Reinette,
>
> My (perhaps naïve) belief is that in a cloud server environment there are many
> well behaved NUMA applications. Only presenting the sum would lose the detailed
> information from each SNC node.
Is the answer to "A" or "B" ... why not provide both:
$ ls -l /sys/fs/resctrl/mon_data
total 0
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_L3_00
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_L3_01
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_00
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_01
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_02
dr-xr-xr-x. 2 root root 0 Mar 18 14:01 mon_NODE_03
The "L3" entries provide the sum across all SNC nodes sharing the cache. The NODE ones
give the broken out counts.
-Tony
> What is the use case for needing to expose the individual cluster counts? What if
> resctrl just summed the cluster counts and presented the data as before - per L3
> cache instance? I doubt that resctrl would be what applications would use to verify
> whether they are "well behaved" wrt NUMA.
Reinette,
My (perhaps naïve) belief is that in a cloud server environment there are many
well behaved NUMA applications. Only presenting the sum would lose the detailed
information from each SNC node.
-Tony
Hi Tony,
On 3/18/2024 12:34 PM, Luck, Tony wrote:
>>>> While that is in some ways a more accurate view, it breaks a lot of
>>>> legacy monitoring applications that expect the "L3" names.
>>>
>>> True - but the behaviour is different from a non SNC system, if this software can read the
>>> file - but goes wrong because the contents of the file represent something different, its
>>> still broken.
>>
>> This is a good point. There is also /sys/fs/resctrl/info/L3_MON to consider and trying to think
>> what to do about that makes me go in circles about when user space may expect resctrl to indicate
>> the resource and when user space may expect resctrl to indicate the scope. For example,
>> /sys/fs/resctrl/mon_data/mon_L3_00 contains files with data that monitor the
>> "L3" _resource_, no? If we change that to /sys/fs/resctrl/mon_data/mon_NODE_00 then it
>> switches the meaning of the middle term to be "scope" while it still contains the monitoring
>> data of the "L3" resource. So does that mean user space would need to rely on
>> /sys/fs/resctrl/info/L3_MON to obtain the information about which monitoring files
>> (/sys/fs/resctrl/info/L3_MON/mon_features) are related to the particular resource and then
>> match those filenames with the filenames in /sys/fs/resctrl/mon_data/mon_NODE_00 to know
>> which resource it applies to and learn from the directory name what scope measurement is at?
>
> Reinette,
>
> It's both a wave and a particle, depending on the observer.
>
> In SNC systems resources on each socket are divided into 2, 3, 4 nodes. But the
> division is complicated. Memory and CPU cores are easy. They are each assigned
> to an SNC node. The cache is more complicated. The hash function for memory
> address to cache index is the part that is SNC aware. So memory on SNC node1
> will allocate in the cache indices assigned to SNC node1. But that function has to
> be independent of which CPU is doing the access. That's why I keep mentioning
> "well behaved NUMA applications when talking about SNC.
>
> So the resctrl monitoring operations still work on the L3 cache, but in SNC mode
> they work on a portion of the L3 cache. As long as all accesses are NUMA local you
> can think of the cache as partitioned between the SNC nodes.
>
> But not everything is well behaved from a NUMA perspective. It would be misleading
> to describe the occupancy and bandwidth as belonging to an SNC node.
>
> It's also a bit misleading to describe in terms of an L3 cache instance. But doing
> so doesn't require application changes.
What is the use case for needing to expose the individual cluster counts? What if
resctrl just summed the cluster counts and presented the data as before - per L3
cache instance? I doubt that resctrl would be what applications would use to verify
whether they are "well behaved" wrt NUMA.
Reinette
> >> While that is in some ways a more accurate view, it breaks a lot of
> >> legacy monitoring applications that expect the "L3" names.
> >
> > True - but the behaviour is different from a non SNC system, if this software can read the
> > file - but goes wrong because the contents of the file represent something different, it's
> > still broken.
>
> This is a good point. There is also /sys/fs/resctrl/info/L3_MON to consider and trying to think
> what to do about that makes me go in circles about when user space may expect resctrl to indicate
> the resource and when user space may expect resctrl to indicate the scope. For example,
> /sys/fs/resctrl/mon_data/mon_L3_00 contains files with data that monitor the
> "L3" _resource_, no? If we change that to /sys/fs/resctrl/mon_data/mon_NODE_00 then it
> switches the meaning of the middle term to be "scope" while it still contains the monitoring
> data of the "L3" resource. So does that mean user space would need to rely on
> /sys/fs/resctrl/info/L3_MON to obtain the information about which monitoring files
> (/sys/fs/resctrl/info/L3_MON/mon_features) are related to the particular resource and then
> match those filenames with the filenames in /sys/fs/resctrl/mon_data/mon_NODE_00 to know
> which resource it applies to and learn from the directory name what scope measurement is at?
Reinette,
It's both a wave and a particle, depending on the observer.
In SNC systems resources on each socket are divided into 2, 3, 4 nodes. But the
division is complicated. Memory and CPU cores are easy. They are each assigned
to an SNC node. The cache is more complicated. The hash function for memory
address to cache index is the part that is SNC aware. So memory on SNC node1
will allocate in the cache indices assigned to SNC node1. But that function has to
be independent of which CPU is doing the access. That's why I keep mentioning
"well behaved" NUMA applications when talking about SNC.
So the resctrl monitoring operations still work on the L3 cache, but in SNC mode
they work on a portion of the L3 cache. As long as all accesses are NUMA local you
can think of the cache as partitioned between the SNC nodes.
But not everything is well behaved from a NUMA perspective. It would be misleading
to describe the occupancy and bandwidth as belonging to an SNC node.
It's also a bit misleading to describe in terms of an L3 cache instance. But doing
so doesn't require application changes.
-Tony
On 3/15/2024 11:02 AM, James Morse wrote:
> On 08/03/2024 18:42, Tony Luck wrote:
>> On Fri, Mar 08, 2024 at 06:06:45PM +0000, James Morse wrote:
>>> Hi guys,
>>>
>>> On 07/03/2024 23:16, Tony Luck wrote:
>>>> On Thu, Mar 07, 2024 at 02:39:08PM -0800, Reinette Chatre wrote:
>>>>> Thank you for the example. I find that significantly easier to
>>>>> understand than a single number in a generic "nodes_per_l3_cache".
>>>>> Especially with potential confusion surrounding inconsistent "nodes"
>>>>> between allocation and monitoring.
>>>>>
>>>>> How about domain_cpu_list and domain_cpu_map ?
>>>
>>>> Like this (my test system doesn't have SNC, so all domains are the same):
>>>>
>>>> $ cd /sys/fs/resctrl/info/
>>>> $ grep . */domain*
>>>> L3/domain_cpu_list:0: 0-35,72-107
>>>> L3/domain_cpu_list:1: 36-71,108-143
>>>> L3/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
>>>> L3/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
>>>> L3_MON/domain_cpu_list:0: 0-35,72-107
>>>> L3_MON/domain_cpu_list:1: 36-71,108-143
>>>> L3_MON/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
>>>> L3_MON/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
>>>> MB/domain_cpu_list:0: 0-35,72-107
>>>> MB/domain_cpu_list:1: 36-71,108-143
>>>> MB/domain_cpu_map:0: 0000,00000fff,ffffff00,0000000f,ffffffff
>>>> MB/domain_cpu_map:1: ffff,fffff000,000000ff,fffffff0,00000000
>>>
>>> This duplicates the information in /sys/devices/system/cpu/cpuX/cache/indexY ... is this
>>> really because that information is, er, wrong on SNC systems? Is it possible to fix that?
>>
>> On an SNC system the resctrl domain for L3_MON becomes the SNC node
>> instead of the L3 cache instance. With 2, 3, or 4 SNC nodes per L3.
>>
>> Even without the SNC issue this duplication may be a useful
>> convenience. On Intel, getting from a resctrl domain to the sysfs
>> cache information is a multi-step process: first find which of the
>> indexY directories has level=3, then look for the "id" that matches
>> the domain.
>>
>>> From Tony's earlier description of how SNC changes things, the MB controls remain
>>> per-socket. To me it feels less invasive to fix the definition of L3 on these platforms to
>>> describe how it behaves (assuming that is possible), and define a new 'MB' that is NUMA
>>> scoped.
>>> This direction of redefining L3 means /sys/fs/resctrl and /sys/devices have different
>>> views of 'the' cache hierarchy.
>>
>> I almost went partly in that direction when I started this epic voyage.
>> The "almost" part was to change the names of the monitoring directories
>> under mon_data from (legacy non-SNC system):
>>
>> $ ls -l mon_data
>> total 0
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_L3_00
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_L3_01
>>
>> to (2 socket, SNC=2 system):
>>
>> $ ls -l mon_data
>> total 0
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_NODE_00
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_NODE_01
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_NODE_02
>> dr-xr-xr-x. 2 root root 0 Mar 8 10:31 mon_NODE_03
>
> This would be useful for MPAM. I've seen a couple of MPAM systems that have per-NUMA MPAM
> controls on the 'L3', but describe it as a single global L3. The MPAM driver currently
> hides this by summing the NUMA node counters and reporting it as the global L3's value.
>
>
>> While that is in some ways a more accurate view, it breaks a lot of
>> legacy monitoring applications that expect the "L3" names.
>
> True - but the behaviour is different from a non SNC system, if this software can read the
> file - but goes wrong because the contents of the file represent something different, it's
> still broken.
This is a good point. There is also /sys/fs/resctrl/info/L3_MON to consider and trying to think
what to do about that makes me go in circles about when user space may expect resctrl to indicate
the resource and when user space may expect resctrl to indicate the scope. For example,
/sys/fs/resctrl/mon_data/mon_L3_00 contains files with data that monitor the
"L3" _resource_, no? If we change that to /sys/fs/resctrl/mon_data/mon_NODE_00 then it
switches the meaning of the middle term to be "scope" while it still contains the monitoring
data of the "L3" resource. So does that mean user space would need to rely on
/sys/fs/resctrl/info/L3_MON to obtain the information about which monitoring files
(/sys/fs/resctrl/info/L3_MON/mon_features) are related to the particular resource and then
match those filenames with the filenames in /sys/fs/resctrl/mon_data/mon_NODE_00 to know
which resource it applies to and learn from the directory name what scope measurement is at?
Reinette
As was intended with commit 17107429947b ("selftests/exec: Perform script
checks with /bin/bash"), convert the other instance of /bin/sh to
/bin/bash. It appears that at least Debian Bookworm's /bin/sh (dash)
does not conform to POSIX's "return 127 when script not found"
requirement.

Fixes: 17107429947b ("selftests/exec: Perform script checks with /bin/bash")
Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Closes: https://lore.kernel.org/lkml/02c8bf8e-1934-44ab-a886-e065b37366a7@collabora.com/
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: "kernelci.org bot" <bot@kernelci.org>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: linux-mm@kvack.org
Cc: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/exec/Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/exec/Makefile b/tools/testing/selftests/exec/Makefile
index a0b8688b0836..fb4472ddffd8 100644
--- a/tools/testing/selftests/exec/Makefile
+++ b/tools/testing/selftests/exec/Makefile
@@ -19,8 +19,8 @@ include ../lib.mk
 $(OUTPUT)/subdir:
 	mkdir -p $@
 
-$(OUTPUT)/script:
-	echo '#!/bin/sh' > $@
+$(OUTPUT)/script: Makefile
+	echo '#!/bin/bash' > $@
 	echo 'exit $$*' >> $@
 	chmod +x $@
 $(OUTPUT)/execveat.symlink: $(OUTPUT)/execveat
-- 
2.34.1
On Wed, Mar 13, 2024 at 04:11:41PM +0800, Yi Liu wrote:
> yes. how about your opinion? @Jason. I noticed the set_dev_pasid callback
> and pasid_array update are under the group->lock, so it should be fine
> to adjust the order and update pasid_array after set_dev_pasid returns.
Yes, it makes some sense
But, also I would like it very much if we just have the core pass in
the actual old domain as an additional function argument.
I think we have some small mistakes in multi-device group error
unwinding for remove because the global xarray isn't actually
going to be correct in all scenarios.
Jason
On 03/16, Jose Fernandez wrote:
> This patch enhances the BPF helpers by adding a kfunc to retrieve the
> cgroup v2 of a task, addressing a previous limitation where only
> bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
> particularly useful for scenarios where obtaining the cgroup ID of a
> task other than the "current" one is necessary, which the existing
> bpf_get_current_cgroup_id helper cannot accommodate. A specific use
> case at Netflix involved the sched_switch tracepoint, where we had to
> get the cgroup IDs of both the prev and next tasks.
>
> The bpf_task_get_cgroup kfunc acquires and returns a reference to a
> task's default cgroup, ensuring thread-safe access by correctly
> implementing RCU read locking and unlocking. It leverages the existing
> cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
>
> Signed-off-by: Jose Fernandez <josef@netflix.com>
> Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Stanislav Fomichev <sdf@google.com>
On 3/16/24 9:22 AM, Jose Fernandez wrote:
> This patch enhances the BPF helpers by adding a kfunc to retrieve the
> cgroup v2 of a task, addressing a previous limitation where only
> bpf_task_get_cgroup1 was available for cgroup v1. The new kfunc is
> particularly useful for scenarios where obtaining the cgroup ID of a
> task other than the "current" one is necessary, which the existing
> bpf_get_current_cgroup_id helper cannot accommodate. A specific use
> case at Netflix involved the sched_switch tracepoint, where we had to
> get the cgroup IDs of both the prev and next tasks.
>
> The bpf_task_get_cgroup kfunc acquires and returns a reference to a
> task's default cgroup, ensuring thread-safe access by correctly
> implementing RCU read locking and unlocking. It leverages the existing
> cgroup.h helper, and cgroup_tryget to safely acquire a reference to it.
>
> Signed-off-by: Jose Fernandez <josef@netflix.com>
> Reviewed-by: Tycho Andersen <tycho@tycho.pizza>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
On 3/17/24 8:38 PM, Hangbin Liu wrote:
> Wild guess, the last change to icmp_redirect is my netns update. Maybe
> there are some default sysctl settings in netns that cause the error?
It is most likely sysctl settings. It would be good to chase those down
and make sure we have the script setting them.
Mirsad: What OS are you testing with? That script has a verbose option
(-v) to get more output like the commands run and pause-on-fail (-p) to
manually debug at that point.
On Mon, Mar 11, 2024 at 04:28:01PM +0100, Matthieu Baerts wrote:
>Hi Sasha,
>
>On 11/03/2024 16:11, Sasha Levin wrote:
>> From: Paolo Abeni <pabeni@redhat.com>
>>
>> [ Upstream commit b4b51d36bbaa3ddb93b3e1ca3a1ef0aa629d6521 ]
>
>Thank you for having backported this patch to v6.7 and v6.6 versions.
>But it looks like it depends on commit 9369777c2939 ("selftests: mptcp:
>add mptcp_lib_wait_local_port_listen") which is not in these versions.
>
>Because CIs will soon use the kselftests from the new v6.8, I think it
>is better to drop this patch from v6.7 and v6.6 versions.
>
>Cheers,
>Matt
I'll drop it, thanks.
--
Thanks,
Sasha
On 18.03.24 03:34, Vitaly Chikunov wrote:
> Add missing flags argument to open(2) call with O_CREAT.
>
> Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid
> value) (together with -O), resulting in similar error messages such as:
>
> In file included from /usr/include/fcntl.h:342,
> from gup_test.c:1:
> In function 'open',
> inlined from 'main' at gup_test.c:206:10:
> /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
> 50 | __open_missing_mode ();
> | ^~~~~~~~~~~~~~~~~~~~~~
>
> _FORTIFY_SOURCE is enabled by default in some distributions, so the
> tests are not built by default and are skipped.
>
> open(2) man-page warns about missing flags argument: "if it is not
> supplied, some arbitrary bytes from the stack will be applied as the
> file mode."
>
> Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file")
> Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split")
> Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect")
> Cc: Keith Busch <kbusch@kernel.org>
> Cc: Zi Yan <ziy@nvidia.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Nadav Amit <nadav.amit@gmail.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Vitaly Chikunov <vt@altlinux.org>
> ---
Reviewed-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb