On Thu, Feb 9, 2023 at 10:07 AM Mathieu Desnoyers wrote: > > On 2023-02-08 19:54, John Stultz wrote: > > On Wed, Feb 8, 2023 at 4:11 PM Alexei Starovoitov > > wrote: > >> > >> On Wed, Feb 8, 2023 at 2:01 PM John Stultz wrote: > >>> > >>> On Sat, Nov 20, 2021 at 11:27:38AM +0000, Yafang Shao wrote: > >>>> As the sched:sched_switch tracepoint args are derived from the kernel, > >>>> we'd better make it same with the kernel. So the macro TASK_COMM_LEN is > >>>> converted to type enum, then all the BPF programs can get it through BTF. > >>>> > >>>> The BPF program which wants to use TASK_COMM_LEN should include the header > >>>> vmlinux.h. Regarding the test_stacktrace_map and test_tracepoint, as the > >>>> type defined in linux/bpf.h are also defined in vmlinux.h, so we don't > >>>> need to include linux/bpf.h again. > >>>> > >>>> Signed-off-by: Yafang Shao > >>>> Acked-by: Andrii Nakryiko > >>>> Acked-by: David Hildenbrand > >>>> Cc: Mathieu Desnoyers > >>>> Cc: Arnaldo Carvalho de Melo > >>>> Cc: Andrii Nakryiko > >>>> Cc: Michal Miroslaw > >>>> Cc: Peter Zijlstra > >>>> Cc: Steven Rostedt > >>>> Cc: Matthew Wilcox > >>>> Cc: David Hildenbrand > >>>> Cc: Al Viro > >>>> Cc: Kees Cook > >>>> Cc: Petr Mladek > >>>> --- > >>>> include/linux/sched.h | 9 +++++++-- > >>>> tools/testing/selftests/bpf/progs/test_stacktrace_map.c | 6 +++--- > >>>> tools/testing/selftests/bpf/progs/test_tracepoint.c | 6 +++--- > >>>> 3 files changed, 13 insertions(+), 8 deletions(-) > >>> > >>> Hey all, > >>> I know this is a little late, but I recently got a report that > >>> this change was causiing older versions of perfetto to stop > >>> working. > >>> > >>> Apparently newer versions of perfetto has worked around this > >>> via the following changes: > >>> https://android.googlesource.com/platform/external/perfetto/+/c717c93131b1b6e3705a11092a70ac47c78b731d%5E%21/ > >>> https://android.googlesource.com/platform/external/perfetto/+/160a504ad5c91a227e55f84d3e5d3fe22af7c2bb%5E%21/ > >>> > >>> But for older versions of perfetto, reverting upstream commit > >>> 3087c61ed2c4 ("tools/testing/selftests/bpf: replace open-coded 16 > >>> with TASK_COMM_LEN") is necessary to get it back to working. > >>> > >>> I haven't dug very far into the details, and obviously this doesn't > >>> break with the updated perfetto, but from a high level this does > >>> seem to be a breaking-userland regression. > >>> > >>> So I wanted to reach out to see if there was more context for this > >>> breakage? I don't want to raise a unnecessary stink if this was > >>> an unfortuante but forced situation. > >> > >> Let me understand what you're saying... > >> > >> The commit 3087c61ed2c4 did > >> > >> -/* Task command name length: */ > >> -#define TASK_COMM_LEN 16 > >> +/* > >> + * Define the task command name length as enum, then it can be visible to > >> + * BPF programs. > >> + */ > >> +enum { > >> + TASK_COMM_LEN = 16, > >> +}; > >> > >> > >> and that caused: > >> > >> cat /sys/kernel/debug/tracing/events/task/task_newtask/format > >> > >> to print > >> field:char comm[TASK_COMM_LEN]; offset:12; size:16; signed:0; > >> instead of > >> field:char comm[16]; offset:12; size:16; signed:0; > >> > >> so the ftrace parsing android tracing tool had to do: > >> > >> - if (Match(type_and_name.c_str(), R"(char [a-zA-Z_]+\[[0-9]+\])")) { > >> + if (Match(type_and_name.c_str(), > >> + R"(char [a-zA-Z_][a-zA-Z_0-9]*\[[a-zA-Z_0-9]+\])")) { > >> > >> to workaround this change. > >> Right? > > > > I believe so. > > > >> And what are you proposing? > > > > I'm not proposing anything. I was just wanting to understand more > > context around this, as it outwardly appears to be a user-breaking > > change, and that is usually not done, so I figured it was an issue > > worth raising. > > > > If the debug/tracing/*/format output is in the murky not-really-abi > > space, that's fine, but I wanted to know if this was understood as > > something that may require userland updates or if this was a > > unexpected side-effect. > > If you are looking at the root cause in the kernel code generating this: > > kernel/trace/trace_events.c:f_show() > > /* > * Smartly shows the array type(except dynamic array). > * Normal: > * field:TYPE VAR > * If TYPE := TYPE[LEN], it is shown: > * field:TYPE VAR[LEN] > */ > > where it uses the content of field->type (a string) to format the VAR[LEN] part. > > This in turn is the result of the definition of the > struct trace_event_fields done in: > > include/trace/trace_events.h at stage 4, thus with the context of those macros defined: > > include/trace/stages/stage4_event_fields.h: > > #undef __array > #define __array(_type, _item, _len) { \ > .type = #_type"["__stringify(_len)"]", .name = #_item, \ > .size = sizeof(_type[_len]), .align = ALIGN_STRUCTFIELD(_type), \ > .is_signed = is_signed_type(_type), .filter_type = FILTER_OTHER }, > > I suspect the real culprit here is the use of __stringify(_len), which happens to work > on macros, but not on enum labels. > > One possible solution to make this more robust would be to extend > struct trace_event_fields with one more field that indicates the length > of an array as an actual integer, without storing it in its stringified > form in the type, and do the formatting in f_show where it belongs. > > This way everybody can stay happy and no ABI is broken. > > Thoughts ? Many thanks for the detailed analysis. Seems it can work. Hi John, Could you pls. try the attached fix ? I have verified it in my test env. -- Regards Yafang