bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
@ 2019-10-11 16:21 Stanislav Fomichev
  2019-10-11 16:21 ` [PATCH bpf-next 2/3] tools/bpf: sync bpf.h Stanislav Fomichev
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-11 16:21 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Even though we have the pointer to user_struct and can recover
uid of the user who has created the program, it usually contains
0 (root) which is not very informative. Let's store the comm of the
calling process and export it via bpf_prog_info. This should help
answer the question "which process loaded this particular program".

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 include/linux/bpf.h      | 1 +
 include/uapi/linux/bpf.h | 2 ++
 kernel/bpf/syscall.c     | 4 ++++
 3 files changed, 7 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..b03ea396afe5 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -421,6 +421,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head	rcu;
 	};
+	char created_by_comm[BPF_CREATED_COMM_LEN];
 };
 
 struct bpf_array {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a65c3b0c6935..4e883ecbba1e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -326,6 +326,7 @@ enum bpf_attach_type {
 #define BPF_F_NUMA_NODE		(1U << 2)
 
 #define BPF_OBJ_NAME_LEN 16U
+#define BPF_CREATED_COMM_LEN	16U
 
 /* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
@@ -3252,6 +3253,7 @@ struct bpf_prog_info {
 	__aligned_u64 prog_tags;
 	__u64 run_time_ns;
 	__u64 run_cnt;
+	char created_by_comm[BPF_CREATED_COMM_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 82eabd4e38ad..51c125292eaf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1735,6 +1735,8 @@ static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
 	bpf_prog_kallsyms_add(prog);
 	perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_LOAD, 0);
 
+	get_task_comm(prog->aux->created_by_comm, current);
+
 	err = bpf_prog_new_fd(prog);
 	if (err < 0)
 		bpf_prog_put(prog);
@@ -2337,6 +2339,8 @@ static int bpf_prog_get_info_by_fd(struct bpf_prog *prog,
 
 	memcpy(info.tag, prog->tag, sizeof(prog->tag));
 	memcpy(info.name, prog->aux->name, sizeof(prog->aux->name));
+	memcpy(info.created_by_comm, prog->aux->created_by_comm,
+	       sizeof(prog->aux->created_by_comm));
 
 	ulen = info.nr_map_ids;
 	info.nr_map_ids = prog->aux->used_map_cnt;
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 2/3] tools/bpf: sync bpf.h
  2019-10-11 16:21 [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program Stanislav Fomichev
@ 2019-10-11 16:21 ` Stanislav Fomichev
  2019-10-11 16:21 ` [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program Stanislav Fomichev
  2019-10-12  0:10 ` [PATCH bpf-next 1/3] bpf: preserve command " Alexei Starovoitov
  2 siblings, 0 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-11 16:21 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Add created_by_comm.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/include/uapi/linux/bpf.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a65c3b0c6935..4e883ecbba1e 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -326,6 +326,7 @@ enum bpf_attach_type {
 #define BPF_F_NUMA_NODE		(1U << 2)
 
 #define BPF_OBJ_NAME_LEN 16U
+#define BPF_CREATED_COMM_LEN	16U
 
 /* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
@@ -3252,6 +3253,7 @@ struct bpf_prog_info {
 	__aligned_u64 prog_tags;
 	__u64 run_time_ns;
 	__u64 run_cnt;
+	char created_by_comm[BPF_CREATED_COMM_LEN];
 } __attribute__((aligned(8)));
 
 struct bpf_map_info {
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program
  2019-10-11 16:21 [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program Stanislav Fomichev
  2019-10-11 16:21 ` [PATCH bpf-next 2/3] tools/bpf: sync bpf.h Stanislav Fomichev
@ 2019-10-11 16:21 ` Stanislav Fomichev
  2019-10-11 20:19   ` Martin Lau
  2019-10-12  0:10 ` [PATCH bpf-next 1/3] bpf: preserve command " Alexei Starovoitov
  2 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-11 16:21 UTC (permalink / raw)
  To: netdev, bpf; +Cc: davem, ast, daniel, Stanislav Fomichev

Print recently added created_by_comm along the existing created_by_uid.

Example with loop1.o (loaded via bpftool):
4: raw_tracepoint  name nested_loops  tag b9472b3ff5753ef2  gpl
        loaded_at 2019-10-10T13:38:18-0700  uid 0  comm bpftool
        xlated 264B  jited 152B  memlock 4096B
        btf_id 3

Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
 tools/bpf/bpftool/prog.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 27da96a797ab..400771a942d7 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -296,7 +296,9 @@ static void print_prog_plain(struct bpf_prog_info *info, int fd)
 		print_boot_time(info->load_time, buf, sizeof(buf));
 
 		/* Piggy back on load_time, since 0 uid is a valid one */
-		printf("\tloaded_at %s  uid %u\n", buf, info->created_by_uid);
+		printf("\tloaded_at %s  uid %u  comm %s\n", buf,
+		       info->created_by_uid,
+		       info->created_by_comm);
 	}
 
 	printf("\txlated %uB", info->xlated_prog_len);
-- 
2.23.0.700.g56cf767bdb-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program
  2019-10-11 16:21 ` [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program Stanislav Fomichev
@ 2019-10-11 20:19   ` Martin Lau
  2019-10-11 20:37     ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Martin Lau @ 2019-10-11 20:19 UTC (permalink / raw)
  To: Stanislav Fomichev; +Cc: netdev, bpf, davem, ast, daniel

On Fri, Oct 11, 2019 at 09:21:24AM -0700, Stanislav Fomichev wrote:
> Print recently added created_by_comm along the existing created_by_uid.
> 
> Example with loop1.o (loaded via bpftool):
> 4: raw_tracepoint  name nested_loops  tag b9472b3ff5753ef2  gpl
>         loaded_at 2019-10-10T13:38:18-0700  uid 0  comm bpftool
>         xlated 264B  jited 152B  memlock 4096B
>         btf_id 3
Hopefully CAP_BPF may avoid uid 0 in the future.

What will be in "comm" for the python bcc script?

> 
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  tools/bpf/bpftool/prog.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> index 27da96a797ab..400771a942d7 100644
> --- a/tools/bpf/bpftool/prog.c
> +++ b/tools/bpf/bpftool/prog.c
> @@ -296,7 +296,9 @@ static void print_prog_plain(struct bpf_prog_info *info, int fd)
>  		print_boot_time(info->load_time, buf, sizeof(buf));
>  
>  		/* Piggy back on load_time, since 0 uid is a valid one */
> -		printf("\tloaded_at %s  uid %u\n", buf, info->created_by_uid);
> +		printf("\tloaded_at %s  uid %u  comm %s\n", buf,
> +		       info->created_by_uid,
> +		       info->created_by_comm);
>  	}
>  
>  	printf("\txlated %uB", info->xlated_prog_len);
> -- 
> 2.23.0.700.g56cf767bdb-goog
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program
  2019-10-11 20:19   ` Martin Lau
@ 2019-10-11 20:37     ` Stanislav Fomichev
  2019-10-11 21:11       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-11 20:37 UTC (permalink / raw)
  To: Martin Lau; +Cc: Stanislav Fomichev, netdev, bpf, davem, ast, daniel

On 10/11, Martin Lau wrote:
> On Fri, Oct 11, 2019 at 09:21:24AM -0700, Stanislav Fomichev wrote:
> > Print recently added created_by_comm along the existing created_by_uid.
> > 
> > Example with loop1.o (loaded via bpftool):
> > 4: raw_tracepoint  name nested_loops  tag b9472b3ff5753ef2  gpl
> >         loaded_at 2019-10-10T13:38:18-0700  uid 0  comm bpftool
> >         xlated 264B  jited 152B  memlock 4096B
> >         btf_id 3
> Hopefully CAP_BPF may avoid uid 0 in the future.
Yeah, but this also requires creating a user with CAP_BPF and running
a daemon under this user.

> What will be in "comm" for the python bcc script?
I guess it will be "python". But at least you get a signal that it's
not some other system daemon :-)

> > 
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  tools/bpf/bpftool/prog.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> > index 27da96a797ab..400771a942d7 100644
> > --- a/tools/bpf/bpftool/prog.c
> > +++ b/tools/bpf/bpftool/prog.c
> > @@ -296,7 +296,9 @@ static void print_prog_plain(struct bpf_prog_info *info, int fd)
> >  		print_boot_time(info->load_time, buf, sizeof(buf));
> >  
> >  		/* Piggy back on load_time, since 0 uid is a valid one */
> > -		printf("\tloaded_at %s  uid %u\n", buf, info->created_by_uid);
> > +		printf("\tloaded_at %s  uid %u  comm %s\n", buf,
> > +		       info->created_by_uid,
> > +		       info->created_by_comm);
> >  	}
> >  
> >  	printf("\txlated %uB", info->xlated_prog_len);
> > -- 
> > 2.23.0.700.g56cf767bdb-goog
> > 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program
  2019-10-11 20:37     ` Stanislav Fomichev
@ 2019-10-11 21:11       ` Arnaldo Carvalho de Melo
  2019-10-11 21:30         ` Stanislav Fomichev
  0 siblings, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-10-11 21:11 UTC (permalink / raw)
  To: Stanislav Fomichev, Martin Lau
  Cc: Stanislav Fomichev, netdev, bpf, davem, ast, daniel

On October 11, 2019 5:37:16 PM GMT-03:00, Stanislav Fomichev <sdf@fomichev.me> wrote:
>On 10/11, Martin Lau wrote:
>> On Fri, Oct 11, 2019 at 09:21:24AM -0700, Stanislav Fomichev wrote:
>> > Example with loop1.o (loaded via 
>
>> What will be in "comm" for the python bcc script?
>I guess it will be "python". But at least you get a signal that it's
>not some other system daemon :-)

Perhaps bcc could use prctl to change its comm before calling sys_bpf and set the script name?

- Arnaldo

Sent from smartphone

>
>> > 
>> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
>> > ---
>> >  tools/bpf/bpftool/prog.c | 4 +++-
>> >  1 file changed, 3 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
>> > index 27da96a797ab..400771a942d7 100644
>> > --- a/tools/bpf/bpftool/prog.c
>> > +++ b/tools/bpf/bpftool/prog.c
>> > @@ -296,7 +296,9 @@ static void print_prog_plain(struct
>bpf_prog_info *info, int fd)
>> >  		print_boot_time(info->load_time, buf, sizeof(buf));
>> >  
>> >  		/* Piggy back on load_time, since 0 uid is a valid one */
>> > -		printf("\tloaded_at %s  uid %u\n", buf, info->created_by_uid);
>> > +		printf("\tloaded_at %s  uid %u  comm %s\n", buf,
>> > +		       info->created_by_uid,
>> > +		       info->created_by_comm);
>> >  	}
>> >  
>> >  	printf("\txlated %uB", info->xlated_prog_len);
>> > -- 
>> > 2.23.0.700.g56cf767bdb-goog
>> > 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program
  2019-10-11 21:11       ` Arnaldo Carvalho de Melo
@ 2019-10-11 21:30         ` Stanislav Fomichev
  0 siblings, 0 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-11 21:30 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Martin Lau, Stanislav Fomichev, netdev, bpf, davem, ast, daniel

On 10/11, Arnaldo Carvalho de Melo wrote:
> On October 11, 2019 5:37:16 PM GMT-03:00, Stanislav Fomichev <sdf@fomichev.me> wrote:
> >On 10/11, Martin Lau wrote:
> >> On Fri, Oct 11, 2019 at 09:21:24AM -0700, Stanislav Fomichev wrote:
> >> > Example with loop1.o (loaded via 
> >
> >> What will be in "comm" for the python bcc script?
> >I guess it will be "python". But at least you get a signal that it's
> >not some other system daemon :-)
> 
> Perhaps bcc could use prctl to change its comm before calling sys_bpf and set the script name?
Good idea! prctl does indeed call set_task_comm to change the task comm.

> - Arnaldo
> 
> Sent from smartphone
> 
> >
> >> > 
> >> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> >> > ---
> >> >  tools/bpf/bpftool/prog.c | 4 +++-
> >> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
> >> > index 27da96a797ab..400771a942d7 100644
> >> > --- a/tools/bpf/bpftool/prog.c
> >> > +++ b/tools/bpf/bpftool/prog.c
> >> > @@ -296,7 +296,9 @@ static void print_prog_plain(struct
> >bpf_prog_info *info, int fd)
> >> >  		print_boot_time(info->load_time, buf, sizeof(buf));
> >> >  
> >> >  		/* Piggy back on load_time, since 0 uid is a valid one */
> >> > -		printf("\tloaded_at %s  uid %u\n", buf, info->created_by_uid);
> >> > +		printf("\tloaded_at %s  uid %u  comm %s\n", buf,
> >> > +		       info->created_by_uid,
> >> > +		       info->created_by_comm);
> >> >  	}
> >> >  
> >> >  	printf("\txlated %uB", info->xlated_prog_len);
> >> > -- 
> >> > 2.23.0.700.g56cf767bdb-goog
> >> > 
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-11 16:21 [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program Stanislav Fomichev
  2019-10-11 16:21 ` [PATCH bpf-next 2/3] tools/bpf: sync bpf.h Stanislav Fomichev
  2019-10-11 16:21 ` [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program Stanislav Fomichev
@ 2019-10-12  0:10 ` Alexei Starovoitov
  2019-10-12  0:38   ` Stanislav Fomichev
  2 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2019-10-12  0:10 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, David S. Miller, Alexei Starovoitov,
	Daniel Borkmann

On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> Even though we have the pointer to user_struct and can recover
> uid of the user who has created the program, it usually contains
> 0 (root) which is not very informative. Let's store the comm of the
> calling process and export it via bpf_prog_info. This should help
> answer the question "which process loaded this particular program".
>
> Signed-off-by: Stanislav Fomichev <sdf@google.com>
> ---
>  include/linux/bpf.h      | 1 +
>  include/uapi/linux/bpf.h | 2 ++
>  kernel/bpf/syscall.c     | 4 ++++
>  3 files changed, 7 insertions(+)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 5b9d22338606..b03ea396afe5 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -421,6 +421,7 @@ struct bpf_prog_aux {
>                 struct work_struct work;
>                 struct rcu_head rcu;
>         };
> +       char created_by_comm[BPF_CREATED_COMM_LEN];
>  };
>
>  struct bpf_array {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index a65c3b0c6935..4e883ecbba1e 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -326,6 +326,7 @@ enum bpf_attach_type {
>  #define BPF_F_NUMA_NODE                (1U << 2)
>
>  #define BPF_OBJ_NAME_LEN 16U
> +#define BPF_CREATED_COMM_LEN   16U

Nack.
16 bytes is going to be useless.
We found it the hard way with prog_name.
If you want to embed additional debug information
please use BTF for that.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-12  0:10 ` [PATCH bpf-next 1/3] bpf: preserve command " Alexei Starovoitov
@ 2019-10-12  0:38   ` Stanislav Fomichev
  2019-10-15 21:21     ` debug annotations for bpf progs. Was: " Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-12  0:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Network Development, bpf, David S. Miller,
	Alexei Starovoitov, Daniel Borkmann

On 10/11, Alexei Starovoitov wrote:
> On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > Even though we have the pointer to user_struct and can recover
> > uid of the user who has created the program, it usually contains
> > 0 (root) which is not very informative. Let's store the comm of the
> > calling process and export it via bpf_prog_info. This should help
> > answer the question "which process loaded this particular program".
> >
> > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > ---
> >  include/linux/bpf.h      | 1 +
> >  include/uapi/linux/bpf.h | 2 ++
> >  kernel/bpf/syscall.c     | 4 ++++
> >  3 files changed, 7 insertions(+)
> >
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 5b9d22338606..b03ea396afe5 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> >                 struct work_struct work;
> >                 struct rcu_head rcu;
> >         };
> > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> >  };
> >
> >  struct bpf_array {
> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > index a65c3b0c6935..4e883ecbba1e 100644
> > --- a/include/uapi/linux/bpf.h
> > +++ b/include/uapi/linux/bpf.h
> > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> >  #define BPF_F_NUMA_NODE                (1U << 2)
> >
> >  #define BPF_OBJ_NAME_LEN 16U
> > +#define BPF_CREATED_COMM_LEN   16U
> 
> Nack.
> 16 bytes is going to be useless.
> We found it the hard way with prog_name.
> If you want to embed additional debug information
> please use BTF for that.
BTF was my natural choice initially, but then I saw created_by_uid and
thought created_by_comm might have a chance :-)

To clarify, by BTF you mean creating some unused global variable
and use its name as the debugging info? Or there is some better way?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-12  0:38   ` Stanislav Fomichev
@ 2019-10-15 21:21     ` Alexei Starovoitov
  2019-10-15 22:14       ` Andrii Nakryiko
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2019-10-15 21:21 UTC (permalink / raw)
  To: Stanislav Fomichev, Andrii Nakryiko
  Cc: Stanislav Fomichev, Network Development, bpf, David S. Miller,
	Daniel Borkmann, Yonghong Song

On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
>
> On 10/11, Alexei Starovoitov wrote:
> > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > Even though we have the pointer to user_struct and can recover
> > > uid of the user who has created the program, it usually contains
> > > 0 (root) which is not very informative. Let's store the comm of the
> > > calling process and export it via bpf_prog_info. This should help
> > > answer the question "which process loaded this particular program".
> > >
> > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > ---
> > >  include/linux/bpf.h      | 1 +
> > >  include/uapi/linux/bpf.h | 2 ++
> > >  kernel/bpf/syscall.c     | 4 ++++
> > >  3 files changed, 7 insertions(+)
> > >
> > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > index 5b9d22338606..b03ea396afe5 100644
> > > --- a/include/linux/bpf.h
> > > +++ b/include/linux/bpf.h
> > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > >                 struct work_struct work;
> > >                 struct rcu_head rcu;
> > >         };
> > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > >  };
> > >
> > >  struct bpf_array {
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index a65c3b0c6935..4e883ecbba1e 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > >
> > >  #define BPF_OBJ_NAME_LEN 16U
> > > +#define BPF_CREATED_COMM_LEN   16U
> >
> > Nack.
> > 16 bytes is going to be useless.
> > We found it the hard way with prog_name.
> > If you want to embed additional debug information
> > please use BTF for that.
> BTF was my natural choice initially, but then I saw created_by_uid and
> thought created_by_comm might have a chance :-)
>
> To clarify, by BTF you mean creating some unused global variable
> and use its name as the debugging info? Or there is some better way?

I was thinking about adding new section to .btf.ext with this extra data,
but global variable is a better idea indeed.
We'd need to standardize such variables names, so that
bpftool can parse and print it while doing 'bpftool prog show'.
We see more and more cases where services use more than
one program in single .c file to accomplish their goals.
Tying such debug info (like 'created_by_comm') to each program
individually isn't quite right.
In that sense global variables are better, since they cover the
whole .c file.
Beyond 'created_by_comm' there are others things that people
will likely want to know.
Like which version of llvm was used to compile this .o file.
Which unix user name compiled it.
The name of service/daemon that will be using this .o
and so on.
May be some standard prefix to such global variables will do?
Like "bpftool prog show" can scan global data for
"__annotate_#name" and print both name and string contents ?
For folks who regularly ssh into servers to debug bpf progs
that will help a lot.
May be some annotations llvm can automatically add to .o.
Thoughts?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 21:21     ` debug annotations for bpf progs. Was: " Alexei Starovoitov
@ 2019-10-15 22:14       ` Andrii Nakryiko
  2019-10-15 22:24         ` Alexei Starovoitov
  2019-10-15 22:26       ` Stanislav Fomichev
  2019-10-16 14:01       ` Daniel Borkmann
  2 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2019-10-15 22:14 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Stanislav Fomichev, Network Development, bpf,
	David S. Miller, Daniel Borkmann, Yonghong Song

On Tue, Oct 15, 2019 at 2:22 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 10/11, Alexei Starovoitov wrote:
> > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Even though we have the pointer to user_struct and can recover
> > > > uid of the user who has created the program, it usually contains
> > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > calling process and export it via bpf_prog_info. This should help
> > > > answer the question "which process loaded this particular program".
> > > >
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  include/linux/bpf.h      | 1 +
> > > >  include/uapi/linux/bpf.h | 2 ++
> > > >  kernel/bpf/syscall.c     | 4 ++++
> > > >  3 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 5b9d22338606..b03ea396afe5 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > >                 struct work_struct work;
> > > >                 struct rcu_head rcu;
> > > >         };
> > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > >  };
> > > >
> > > >  struct bpf_array {
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > >
> > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > +#define BPF_CREATED_COMM_LEN   16U
> > >
> > > Nack.
> > > 16 bytes is going to be useless.
> > > We found it the hard way with prog_name.
> > > If you want to embed additional debug information
> > > please use BTF for that.
> > BTF was my natural choice initially, but then I saw created_by_uid and
> > thought created_by_comm might have a chance :-)
> >
> > To clarify, by BTF you mean creating some unused global variable
> > and use its name as the debugging info? Or there is some better way?
>
> I was thinking about adding new section to .btf.ext with this extra data,
> but global variable is a better idea indeed.
> We'd need to standardize such variables names, so that
> bpftool can parse and print it while doing 'bpftool prog show'.
> We see more and more cases where services use more than
> one program in single .c file to accomplish their goals.
> Tying such debug info (like 'created_by_comm') to each program
> individually isn't quite right.
> In that sense global variables are better, since they cover the
> whole .c file.
> Beyond 'created_by_comm' there are others things that people
> will likely want to know.
> Like which version of llvm was used to compile this .o file.
> Which unix user name compiled it.
> The name of service/daemon that will be using this .o
> and so on.
> May be some standard prefix to such global variables will do?
> Like "bpftool prog show" can scan global data for
> "__annotate_#name" and print both name and string contents ?
> For folks who regularly ssh into servers to debug bpf progs
> that will help a lot.
> May be some annotations llvm can automatically add to .o.
> Thoughts?

We can dedicate separate ELF section for such variables, similar to
license and version today, so that libbpf will know that those
variables are not real variables and shouldn't be used from BPF
program itself. But we can have many of them in single section, unlike
version and license. :) With that, we'll have metadata and list of
variables in BTF (DATASEC + VARs). The only downside - you'll need ELF
itself to get the value of that variable, no? Is that acceptable? Do
we always know where original ELF is?

Alternatively we can have extra .BTF.ext section and re-use BTF's
string section for values.

Which one is more acceptable?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 22:14       ` Andrii Nakryiko
@ 2019-10-15 22:24         ` Alexei Starovoitov
  2019-10-15 22:33           ` Andrii Nakryiko
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2019-10-15 22:24 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Stanislav Fomichev, Network Development, bpf,
	David S. Miller, Daniel Borkmann, Yonghong Song

On Tue, Oct 15, 2019 at 3:14 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 2:22 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > >
> > > On 10/11, Alexei Starovoitov wrote:
> > > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > Even though we have the pointer to user_struct and can recover
> > > > > uid of the user who has created the program, it usually contains
> > > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > > calling process and export it via bpf_prog_info. This should help
> > > > > answer the question "which process loaded this particular program".
> > > > >
> > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > ---
> > > > >  include/linux/bpf.h      | 1 +
> > > > >  include/uapi/linux/bpf.h | 2 ++
> > > > >  kernel/bpf/syscall.c     | 4 ++++
> > > > >  3 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index 5b9d22338606..b03ea396afe5 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > > >                 struct work_struct work;
> > > > >                 struct rcu_head rcu;
> > > > >         };
> > > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > > >  };
> > > > >
> > > > >  struct bpf_array {
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > > >
> > > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > > +#define BPF_CREATED_COMM_LEN   16U
> > > >
> > > > Nack.
> > > > 16 bytes is going to be useless.
> > > > We found it the hard way with prog_name.
> > > > If you want to embed additional debug information
> > > > please use BTF for that.
> > > BTF was my natural choice initially, but then I saw created_by_uid and
> > > thought created_by_comm might have a chance :-)
> > >
> > > To clarify, by BTF you mean creating some unused global variable
> > > and use its name as the debugging info? Or there is some better way?
> >
> > I was thinking about adding new section to .btf.ext with this extra data,
> > but global variable is a better idea indeed.
> > We'd need to standardize such variables names, so that
> > bpftool can parse and print it while doing 'bpftool prog show'.
> > We see more and more cases where services use more than
> > one program in single .c file to accomplish their goals.
> > Tying such debug info (like 'created_by_comm') to each program
> > individually isn't quite right.
> > In that sense global variables are better, since they cover the
> > whole .c file.
> > Beyond 'created_by_comm' there are others things that people
> > will likely want to know.
> > Like which version of llvm was used to compile this .o file.
> > Which unix user name compiled it.
> > The name of service/daemon that will be using this .o
> > and so on.
> > May be some standard prefix to such global variables will do?
> > Like "bpftool prog show" can scan global data for
> > "__annotate_#name" and print both name and string contents ?
> > For folks who regularly ssh into servers to debug bpf progs
> > that will help a lot.
> > May be some annotations llvm can automatically add to .o.
> > Thoughts?
>
> We can dedicate separate ELF section for such variables, similar to
> license and version today, so that libbpf will know that those
> variables are not real variables and shouldn't be used from BPF
> program itself. But we can have many of them in single section, unlike
> version and license. :) With that, we'll have metadata and list of
> variables in BTF (DATASEC + VARs). The only downside - you'll need ELF
> itself to get the value of that variable, no? Is that acceptable? Do
> we always know where original ELF is?

Having .o around is not acceptable.
That was already tried and didn't work with bcc.
I was proposing to have these special vars to be loaded into the kernel
as part of normal btf loading.
Not sure what special section gives.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 21:21     ` debug annotations for bpf progs. Was: " Alexei Starovoitov
  2019-10-15 22:14       ` Andrii Nakryiko
@ 2019-10-15 22:26       ` Stanislav Fomichev
  2019-10-16 14:01       ` Daniel Borkmann
  2 siblings, 0 replies; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-15 22:26 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Stanislav Fomichev, Network Development, bpf,
	David S. Miller, Daniel Borkmann, Yonghong Song

On 10/15, Alexei Starovoitov wrote:
> On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> >
> > On 10/11, Alexei Starovoitov wrote:
> > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Even though we have the pointer to user_struct and can recover
> > > > uid of the user who has created the program, it usually contains
> > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > calling process and export it via bpf_prog_info. This should help
> > > > answer the question "which process loaded this particular program".
> > > >
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  include/linux/bpf.h      | 1 +
> > > >  include/uapi/linux/bpf.h | 2 ++
> > > >  kernel/bpf/syscall.c     | 4 ++++
> > > >  3 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 5b9d22338606..b03ea396afe5 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > >                 struct work_struct work;
> > > >                 struct rcu_head rcu;
> > > >         };
> > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > >  };
> > > >
> > > >  struct bpf_array {
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > >
> > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > +#define BPF_CREATED_COMM_LEN   16U
> > >
> > > Nack.
> > > 16 bytes is going to be useless.
> > > We found it the hard way with prog_name.
> > > If you want to embed additional debug information
> > > please use BTF for that.
> > BTF was my natural choice initially, but then I saw created_by_uid and
> > thought created_by_comm might have a chance :-)
> >
> > To clarify, by BTF you mean creating some unused global variable
> > and use its name as the debugging info? Or there is some better way?
> 
> I was thinking about adding new section to .btf.ext with this extra data,
> but global variable is a better idea indeed.
> We'd need to standardize such variables names, so that
> bpftool can parse and print it while doing 'bpftool prog show'.
> We see more and more cases where services use more than
> one program in single .c file to accomplish their goals.
> Tying such debug info (like 'created_by_comm') to each program
> individually isn't quite right.
> In that sense global variables are better, since they cover the
> whole .c file.
> Beyond 'created_by_comm' there are others things that people
> will likely want to know.
> Like which version of llvm was used to compile this .o file.
> Which unix user name compiled it.
> The name of service/daemon that will be using this .o
> and so on.
> May be some standard prefix to such global variables will do?
> Like "bpftool prog show" can scan global data for
> "__annotate_#name" and print both name and string contents ?
> For folks who regularly ssh into servers to debug bpf progs
> that will help a lot.
> May be some annotations llvm can automatically add to .o.
> Thoughts?
We started some proof-of-concept prototyping yesterday; the idea, roughly:
* build system generates build_info.h header which contains:
  char __attribute__((section("aux_timestamp"))) *__aux_<build timestamp> = "";
  char __attribute__((section("aux_version"))) *__aux_<version> = "";
  ...
* clang has -include flag which includes this auto-generated file,
  so we don't rely on users including it
* 'bpftool show btf | grep aux_' can be used for low-level debugging

It's not pretty, but it gets the job done. I agree that having some sort of
convention is nice to make it more usable. If we can agree on a pre-defined
section (aux in my case) so that bpftool can take "variable" from
aux_<variable> section name and print "value" from __aux_<value>,
that would be nice.

One thing I still have no idea how to implement with this scheme
is some alternative to created_by_comm. There is no easy way to
add some BTF at runtime (load) time. Ideas?

> "__annotate_#name" and print both name and string contents ?
As Andrii just pointed out, this requires knowing where to look
for the obj files to print the contents of the vars :-(

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 22:24         ` Alexei Starovoitov
@ 2019-10-15 22:33           ` Andrii Nakryiko
  2019-10-15 22:48             ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Andrii Nakryiko @ 2019-10-15 22:33 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Stanislav Fomichev, Network Development, bpf,
	David S. Miller, Daniel Borkmann, Yonghong Song

On Tue, Oct 15, 2019 at 3:24 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 3:14 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 2:22 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > >
> > > > On 10/11, Alexei Starovoitov wrote:
> > > > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > >
> > > > > > Even though we have the pointer to user_struct and can recover
> > > > > > uid of the user who has created the program, it usually contains
> > > > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > > > calling process and export it via bpf_prog_info. This should help
> > > > > > answer the question "which process loaded this particular program".
> > > > > >
> > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > ---
> > > > > >  include/linux/bpf.h      | 1 +
> > > > > >  include/uapi/linux/bpf.h | 2 ++
> > > > > >  kernel/bpf/syscall.c     | 4 ++++
> > > > > >  3 files changed, 7 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > index 5b9d22338606..b03ea396afe5 100644
> > > > > > --- a/include/linux/bpf.h
> > > > > > +++ b/include/linux/bpf.h
> > > > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > > > >                 struct work_struct work;
> > > > > >                 struct rcu_head rcu;
> > > > > >         };
> > > > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > > > >  };
> > > > > >
> > > > > >  struct bpf_array {
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > > > >
> > > > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > > > +#define BPF_CREATED_COMM_LEN   16U
> > > > >
> > > > > Nack.
> > > > > 16 bytes is going to be useless.
> > > > > We found it the hard way with prog_name.
> > > > > If you want to embed additional debug information
> > > > > please use BTF for that.
> > > > BTF was my natural choice initially, but then I saw created_by_uid and
> > > > thought created_by_comm might have a chance :-)
> > > >
> > > > To clarify, by BTF you mean creating some unused global variable
> > > > and use its name as the debugging info? Or there is some better way?
> > >
> > > I was thinking about adding new section to .btf.ext with this extra data,
> > > but global variable is a better idea indeed.
> > > We'd need to standardize such variables names, so that
> > > bpftool can parse and print it while doing 'bpftool prog show'.
> > > We see more and more cases where services use more than
> > > one program in single .c file to accomplish their goals.
> > > Tying such debug info (like 'created_by_comm') to each program
> > > individually isn't quite right.
> > > In that sense global variables are better, since they cover the
> > > whole .c file.
> > > Beyond 'created_by_comm' there are others things that people
> > > will likely want to know.
> > > Like which version of llvm was used to compile this .o file.
> > > Which unix user name compiled it.
> > > The name of service/daemon that will be using this .o
> > > and so on.
> > > May be some standard prefix to such global variables will do?
> > > Like "bpftool prog show" can scan global data for
> > > "__annotate_#name" and print both name and string contents ?
> > > For folks who regularly ssh into servers to debug bpf progs
> > > that will help a lot.
> > > May be some annotations llvm can automatically add to .o.
> > > Thoughts?
> >
> > We can dedicate separate ELF section for such variables, similar to
> > license and version today, so that libbpf will know that those
> > variables are not real variables and shouldn't be used from BPF
> > program itself. But we can have many of them in single section, unlike
> > version and license. :) With that, we'll have metadata and list of
> > variables in BTF (DATASEC + VARs). The only downside - you'll need ELF
> > itself to get the value of that variable, no? Is that acceptable? Do
> > we always know where original ELF is?
>
> Having .o around is not acceptable.
> That was already tried and didn't work with bcc.
> I was proposing to have these special vars to be loaded into the kernel
> as part of normal btf loading.

BTF is just metadata for variables. We'll know name and type
information about variable, but we need a string contents. That is
stored in ELF, so without .o file we won't be able to extract it.
Unless you have something else in mind?

> Not sure what special section gives.

It's a marker that libbpf doesn't have to allocate memory and create
internal map for that section. We don't want those annotation
variables to be backed by BPF map, do we?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 22:33           ` Andrii Nakryiko
@ 2019-10-15 22:48             ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2019-10-15 22:48 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Stanislav Fomichev, Stanislav Fomichev, Network Development, bpf,
	David S. Miller, Daniel Borkmann, Yonghong Song

On Tue, Oct 15, 2019 at 3:34 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Oct 15, 2019 at 3:24 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Oct 15, 2019 at 3:14 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Tue, Oct 15, 2019 at 2:22 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > > >
> > > > > On 10/11, Alexei Starovoitov wrote:
> > > > > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > > > >
> > > > > > > Even though we have the pointer to user_struct and can recover
> > > > > > > uid of the user who has created the program, it usually contains
> > > > > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > > > > calling process and export it via bpf_prog_info. This should help
> > > > > > > answer the question "which process loaded this particular program".
> > > > > > >
> > > > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > > > ---
> > > > > > >  include/linux/bpf.h      | 1 +
> > > > > > >  include/uapi/linux/bpf.h | 2 ++
> > > > > > >  kernel/bpf/syscall.c     | 4 ++++
> > > > > > >  3 files changed, 7 insertions(+)
> > > > > > >
> > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > > > index 5b9d22338606..b03ea396afe5 100644
> > > > > > > --- a/include/linux/bpf.h
> > > > > > > +++ b/include/linux/bpf.h
> > > > > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > > > > >                 struct work_struct work;
> > > > > > >                 struct rcu_head rcu;
> > > > > > >         };
> > > > > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > > > > >  };
> > > > > > >
> > > > > > >  struct bpf_array {
> > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > > > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > > > > >
> > > > > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > > > > +#define BPF_CREATED_COMM_LEN   16U
> > > > > >
> > > > > > Nack.
> > > > > > 16 bytes is going to be useless.
> > > > > > We found it the hard way with prog_name.
> > > > > > If you want to embed additional debug information
> > > > > > please use BTF for that.
> > > > > BTF was my natural choice initially, but then I saw created_by_uid and
> > > > > thought created_by_comm might have a chance :-)
> > > > >
> > > > > To clarify, by BTF you mean creating some unused global variable
> > > > > and use its name as the debugging info? Or there is some better way?
> > > >
> > > > I was thinking about adding new section to .btf.ext with this extra data,
> > > > but global variable is a better idea indeed.
> > > > We'd need to standardize such variables names, so that
> > > > bpftool can parse and print it while doing 'bpftool prog show'.
> > > > We see more and more cases where services use more than
> > > > one program in single .c file to accomplish their goals.
> > > > Tying such debug info (like 'created_by_comm') to each program
> > > > individually isn't quite right.
> > > > In that sense global variables are better, since they cover the
> > > > whole .c file.
> > > > Beyond 'created_by_comm' there are others things that people
> > > > will likely want to know.
> > > > Like which version of llvm was used to compile this .o file.
> > > > Which unix user name compiled it.
> > > > The name of service/daemon that will be using this .o
> > > > and so on.
> > > > May be some standard prefix to such global variables will do?
> > > > Like "bpftool prog show" can scan global data for
> > > > "__annotate_#name" and print both name and string contents ?
> > > > For folks who regularly ssh into servers to debug bpf progs
> > > > that will help a lot.
> > > > May be some annotations llvm can automatically add to .o.
> > > > Thoughts?
> > >
> > > We can dedicate separate ELF section for such variables, similar to
> > > license and version today, so that libbpf will know that those
> > > variables are not real variables and shouldn't be used from BPF
> > > program itself. But we can have many of them in single section, unlike
> > > version and license. :) With that, we'll have metadata and list of
> > > variables in BTF (DATASEC + VARs). The only downside - you'll need ELF
> > > itself to get the value of that variable, no? Is that acceptable? Do
> > > we always know where original ELF is?
> >
> > Having .o around is not acceptable.
> > That was already tried and didn't work with bcc.
> > I was proposing to have these special vars to be loaded into the kernel
> > as part of normal btf loading.
>
> BTF is just metadata for variables. We'll know name and type
> information about variable, but we need a string contents. That is
> stored in ELF, so without .o file we won't be able to extract it.
> Unless you have something else in mind?
>
> > Not sure what special section gives.
>
> It's a marker that libbpf doesn't have to allocate memory and create
> internal map for that section. We don't want those annotation
> variables to be backed by BPF map, do we?

I'm proposing that these special variables to be part of normal global data.
Which will be loaded into the kernel and .o is not necessary.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-15 21:21     ` debug annotations for bpf progs. Was: " Alexei Starovoitov
  2019-10-15 22:14       ` Andrii Nakryiko
  2019-10-15 22:26       ` Stanislav Fomichev
@ 2019-10-16 14:01       ` Daniel Borkmann
  2019-10-17 16:28         ` Stanislav Fomichev
  2 siblings, 1 reply; 18+ messages in thread
From: Daniel Borkmann @ 2019-10-16 14:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Stanislav Fomichev, Andrii Nakryiko, Stanislav Fomichev,
	Network Development, bpf, David S. Miller, Yonghong Song

On Tue, Oct 15, 2019 at 02:21:50PM -0700, Alexei Starovoitov wrote:
> On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > On 10/11, Alexei Starovoitov wrote:
> > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > Even though we have the pointer to user_struct and can recover
> > > > uid of the user who has created the program, it usually contains
> > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > calling process and export it via bpf_prog_info. This should help
> > > > answer the question "which process loaded this particular program".
> > > >
> > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > ---
> > > >  include/linux/bpf.h      | 1 +
> > > >  include/uapi/linux/bpf.h | 2 ++
> > > >  kernel/bpf/syscall.c     | 4 ++++
> > > >  3 files changed, 7 insertions(+)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 5b9d22338606..b03ea396afe5 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > >                 struct work_struct work;
> > > >                 struct rcu_head rcu;
> > > >         };
> > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > >  };
> > > >
> > > >  struct bpf_array {
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > >
> > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > +#define BPF_CREATED_COMM_LEN   16U
> > >
> > > Nack.
> > > 16 bytes is going to be useless.
> > > We found it the hard way with prog_name.
> > > If you want to embed additional debug information
> > > please use BTF for that.
> > BTF was my natural choice initially, but then I saw created_by_uid and
> > thought created_by_comm might have a chance :-)
> >
> > To clarify, by BTF you mean creating some unused global variable
> > and use its name as the debugging info? Or there is some better way?
> 
> I was thinking about adding new section to .btf.ext with this extra data,
> but global variable is a better idea indeed.
> We'd need to standardize such variables names, so that
> bpftool can parse and print it while doing 'bpftool prog show'.

+1, much better indeed.

> We see more and more cases where services use more than
> one program in single .c file to accomplish their goals.
> Tying such debug info (like 'created_by_comm') to each program
> individually isn't quite right.
> In that sense global variables are better, since they cover the
> whole .c file.
> Beyond 'created_by_comm' there are others things that people
> will likely want to know.
> Like which version of llvm was used to compile this .o file.
> Which unix user name compiled it.
> The name of service/daemon that will be using this .o
> and so on.

Also latest git sha of the source repo, for example.

> May be some standard prefix to such global variables will do?
> Like "bpftool prog show" can scan global data for
> "__annotate_#name" and print both name and string contents ?
> For folks who regularly ssh into servers to debug bpf progs
> that will help a lot.
> May be some annotations llvm can automatically add to .o.
> Thoughts?

One thing that might be less clear is how information such as comm
or comm args would be stuffed into BTF here, but perhaps these two
wouldn't necessarily need to be part of it since these can be retrieved
today (as in: "which program is currently holding a reference via fd
to a certain prog/map"). For that bpftool could simply walk procfs
once and correlate via fdinfo on unique prog/map id, so we could list
comms in the dump which should be trivial to add:

  # ls -la /proc/30651/fd/10
  lrwx------ 1 root root 64 Oct 16 15:53 /proc/30651/fd/10 -> anon_inode:bpf-map
  # cat /proc/30651/fdinfo/10
  pos:	0
  flags:	02000002
  mnt_id:	15
  map_type:	1
  key_size:	24
  value_size:	12
  max_entries:	65536
  map_flags:	0x0
  memlock:	6819840
  map_id:	384          <---
  frozen:	0
  # cat /proc/30651/comm 
  cilium-agent
  # cat /proc/30651/cmdline 
  ./daemon/cilium-agent--ipv4-range10.11.0.0/16[...]--enable-node-port=true

... and similar for progs. Getting the cmdline from kernel side seems
rather annoying from looking into what detour procfs needs to perform.

But aside from these, such annotations via BTF would be really useful.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-16 14:01       ` Daniel Borkmann
@ 2019-10-17 16:28         ` Stanislav Fomichev
  2019-10-18  6:19           ` Alexei Starovoitov
  0 siblings, 1 reply; 18+ messages in thread
From: Stanislav Fomichev @ 2019-10-17 16:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, Stanislav Fomichev,
	Network Development, bpf, David S. Miller, Yonghong Song

On 10/16, Daniel Borkmann wrote:
> On Tue, Oct 15, 2019 at 02:21:50PM -0700, Alexei Starovoitov wrote:
> > On Fri, Oct 11, 2019 at 5:38 PM Stanislav Fomichev <sdf@fomichev.me> wrote:
> > > On 10/11, Alexei Starovoitov wrote:
> > > > On Fri, Oct 11, 2019 at 9:21 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > > >
> > > > > Even though we have the pointer to user_struct and can recover
> > > > > uid of the user who has created the program, it usually contains
> > > > > 0 (root) which is not very informative. Let's store the comm of the
> > > > > calling process and export it via bpf_prog_info. This should help
> > > > > answer the question "which process loaded this particular program".
> > > > >
> > > > > Signed-off-by: Stanislav Fomichev <sdf@google.com>
> > > > > ---
> > > > >  include/linux/bpf.h      | 1 +
> > > > >  include/uapi/linux/bpf.h | 2 ++
> > > > >  kernel/bpf/syscall.c     | 4 ++++
> > > > >  3 files changed, 7 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > > index 5b9d22338606..b03ea396afe5 100644
> > > > > --- a/include/linux/bpf.h
> > > > > +++ b/include/linux/bpf.h
> > > > > @@ -421,6 +421,7 @@ struct bpf_prog_aux {
> > > > >                 struct work_struct work;
> > > > >                 struct rcu_head rcu;
> > > > >         };
> > > > > +       char created_by_comm[BPF_CREATED_COMM_LEN];
> > > > >  };
> > > > >
> > > > >  struct bpf_array {
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index a65c3b0c6935..4e883ecbba1e 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -326,6 +326,7 @@ enum bpf_attach_type {
> > > > >  #define BPF_F_NUMA_NODE                (1U << 2)
> > > > >
> > > > >  #define BPF_OBJ_NAME_LEN 16U
> > > > > +#define BPF_CREATED_COMM_LEN   16U
> > > >
> > > > Nack.
> > > > 16 bytes is going to be useless.
> > > > We found it the hard way with prog_name.
> > > > If you want to embed additional debug information
> > > > please use BTF for that.
> > > BTF was my natural choice initially, but then I saw created_by_uid and
> > > thought created_by_comm might have a chance :-)
> > >
> > > To clarify, by BTF you mean creating some unused global variable
> > > and use its name as the debugging info? Or there is some better way?
> > 
> > I was thinking about adding new section to .btf.ext with this extra data,
> > but global variable is a better idea indeed.
> > We'd need to standardize such variables names, so that
> > bpftool can parse and print it while doing 'bpftool prog show'.
> 
> +1, much better indeed.
> 
> > We see more and more cases where services use more than
> > one program in single .c file to accomplish their goals.
> > Tying such debug info (like 'created_by_comm') to each program
> > individually isn't quite right.
> > In that sense global variables are better, since they cover the
> > whole .c file.
> > Beyond 'created_by_comm' there are others things that people
> > will likely want to know.
> > Like which version of llvm was used to compile this .o file.
> > Which unix user name compiled it.
> > The name of service/daemon that will be using this .o
> > and so on.
> 
> Also latest git sha of the source repo, for example.
> 
> > May be some standard prefix to such global variables will do?
> > Like "bpftool prog show" can scan global data for
> > "__annotate_#name" and print both name and string contents ?
> > For folks who regularly ssh into servers to debug bpf progs
> > that will help a lot.
> > May be some annotations llvm can automatically add to .o.
> > Thoughts?
> 
> One thing that might be less clear is how information such as comm
> or comm args would be stuffed into BTF here, but perhaps these two
> wouldn't necessarily need to be part of it since these can be retrieved
> today (as in: "which program is currently holding a reference via fd
> to a certain prog/map"). For that bpftool could simply walk procfs
> once and correlate via fdinfo on unique prog/map id, so we could list
> comms in the dump which should be trivial to add:
> 
>   # ls -la /proc/30651/fd/10
>   lrwx------ 1 root root 64 Oct 16 15:53 /proc/30651/fd/10 -> anon_inode:bpf-map
>   # cat /proc/30651/fdinfo/10
>   pos:	0
>   flags:	02000002
>   mnt_id:	15
>   map_type:	1
>   key_size:	24
>   value_size:	12
>   max_entries:	65536
>   map_flags:	0x0
>   memlock:	6819840
>   map_id:	384          <---
>   frozen:	0
>   # cat /proc/30651/comm 
>   cilium-agent
>   # cat /proc/30651/cmdline 
>   ./daemon/cilium-agent--ipv4-range10.11.0.0/16[...]--enable-node-port=true
> 
> ... and similar for progs. Getting the cmdline from kernel side seems
> rather annoying from looking into what detour procfs needs to perform.
> 
> But aside from these, such annotations via BTF would be really useful.
Tried to do the following:

1. Add: static volatile const char __annotate_source1[] = __FILE__;
   to test_rdonly_maps.c and I think it got optimized away :-/
   At least I don't see it in the 'bpftool btf dump' output.

2. Add: char __annotate_source2[] SEC(".meta") = __FILE__;
   to test_rdonly_maps.c and do all the required plumbing in libbpf
   to treat .meta like .rodata. I think it works, but the map
   disappears after bpftool exits because this data is not referenced
   in the prog and the refcount drops to zero :-(

Am I missing something?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: debug annotations for bpf progs. Was: [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program
  2019-10-17 16:28         ` Stanislav Fomichev
@ 2019-10-18  6:19           ` Alexei Starovoitov
  0 siblings, 0 replies; 18+ messages in thread
From: Alexei Starovoitov @ 2019-10-18  6:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Daniel Borkmann, Andrii Nakryiko, Stanislav Fomichev,
	Network Development, bpf, David S. Miller, Yonghong Song

On Thu, Oct 17, 2019 at 9:28 AM Stanislav Fomichev <sdf@fomichev.me> wrote:
> Tried to do the following:
>
> 1. Add: static volatile const char __annotate_source1[] = __FILE__;
>    to test_rdonly_maps.c and I think it got optimized away :-/
>    At least I don't see it in the 'bpftool btf dump' output.
>
> 2. Add: char __annotate_source2[] SEC(".meta") = __FILE__;
>    to test_rdonly_maps.c and do all the required plumbing in libbpf
>    to treat .meta like .rodata. I think it works, but the map
>    disappears after bpftool exits because this data is not referenced
>    in the prog and the refcount drops to zero :-(
>
> Am I missing something?

"Some assembly required".

I think first variant should work at the end.
I don't think extra section should be a requirement.
Sounds like with 2 it's pretty close. Just need to make sure
that prog side sees the map as referenced.
And more importantly bpftool can find that map from prog later.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-10-18  6:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-11 16:21 [PATCH bpf-next 1/3] bpf: preserve command of the process that loaded the program Stanislav Fomichev
2019-10-11 16:21 ` [PATCH bpf-next 2/3] tools/bpf: sync bpf.h Stanislav Fomichev
2019-10-11 16:21 ` [PATCH bpf-next 3/3] bpftool: print the comm of the process that loaded the program Stanislav Fomichev
2019-10-11 20:19   ` Martin Lau
2019-10-11 20:37     ` Stanislav Fomichev
2019-10-11 21:11       ` Arnaldo Carvalho de Melo
2019-10-11 21:30         ` Stanislav Fomichev
2019-10-12  0:10 ` [PATCH bpf-next 1/3] bpf: preserve command " Alexei Starovoitov
2019-10-12  0:38   ` Stanislav Fomichev
2019-10-15 21:21     ` debug annotations for bpf progs. Was: " Alexei Starovoitov
2019-10-15 22:14       ` Andrii Nakryiko
2019-10-15 22:24         ` Alexei Starovoitov
2019-10-15 22:33           ` Andrii Nakryiko
2019-10-15 22:48             ` Alexei Starovoitov
2019-10-15 22:26       ` Stanislav Fomichev
2019-10-16 14:01       ` Daniel Borkmann
2019-10-17 16:28         ` Stanislav Fomichev
2019-10-18  6:19           ` Alexei Starovoitov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).