All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging
@ 2022-11-15  9:50 Jiri Olsa
  2022-11-15 12:49 ` Daniel Borkmann
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2022-11-15  9:50 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jakub Kicinski

hi,
perf_event_bpf_event and bpf_audit_prog calls currently report zero
program id for unload path.

It's because of the [1] change moved those audit calls into work queue
and they are executed after the id is zeroed in bpf_prog_free_id.

I originally made a change that added 'id_audit' field to struct
bpf_prog, which would be initialized as id, untouched and used
in audit callbacks.

Then I realized we might actually not need to zero prog->aux->id
in bpf_prog_free_id. It seems to be called just once on release
paths. Tests seems ok with that.

thoughts?

thanks,
jirka


[1] d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
---
 kernel/bpf/syscall.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fdbae52f463f..426529355c29 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1991,7 +1991,6 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 		__acquire(&prog_idr_lock);
 
 	idr_remove(&prog_idr, prog->aux->id);
-	prog->aux->id = 0;
 
 	if (do_idr_lock)
 		spin_unlock_irqrestore(&prog_idr_lock, flags);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging
  2022-11-15  9:50 [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging Jiri Olsa
@ 2022-11-15 12:49 ` Daniel Borkmann
  2022-11-15 23:26   ` Jiri Olsa
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Borkmann @ 2022-11-15 12:49 UTC (permalink / raw)
  To: Jiri Olsa, Alexei Starovoitov, Andrii Nakryiko
  Cc: bpf, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jakub Kicinski

On 11/15/22 10:50 AM, Jiri Olsa wrote:
> hi,
> perf_event_bpf_event and bpf_audit_prog calls currently report zero
> program id for unload path.
> 
> It's because of the [1] change moved those audit calls into work queue
> and they are executed after the id is zeroed in bpf_prog_free_id.
> 
> I originally made a change that added 'id_audit' field to struct
> bpf_prog, which would be initialized as id, untouched and used
> in audit callbacks.
> 
> Then I realized we might actually not need to zero prog->aux->id
> in bpf_prog_free_id. It seems to be called just once on release
> paths. Tests seems ok with that.
> 
> thoughts?
> 
> thanks,
> jirka
> 
> 
> [1] d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
> ---
>   kernel/bpf/syscall.c | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index fdbae52f463f..426529355c29 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1991,7 +1991,6 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
>   		__acquire(&prog_idr_lock);
>   
>   	idr_remove(&prog_idr, prog->aux->id);
> -	prog->aux->id = 0;

This would trigger a race when offloaded progs are used, see also ad8ad79f4f60 ("bpf:
offload: free program id when device disappears"). __bpf_prog_offload_destroy() calls
it, and my read is that later bpf_prog_free_id() then hits the early !prog->aux->id
return path. Is there a reason for irq context to not defer the bpf_prog_free_id()?

>   	if (do_idr_lock)
>   		spin_unlock_irqrestore(&prog_idr_lock, flags);
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging
  2022-11-15 12:49 ` Daniel Borkmann
@ 2022-11-15 23:26   ` Jiri Olsa
  2022-11-15 23:34     ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Olsa @ 2022-11-15 23:26 UTC (permalink / raw)
  To: Daniel Borkmann, Martin KaFai Lau
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jakub Kicinski

On Tue, Nov 15, 2022 at 01:49:54PM +0100, Daniel Borkmann wrote:
> On 11/15/22 10:50 AM, Jiri Olsa wrote:
> > hi,
> > perf_event_bpf_event and bpf_audit_prog calls currently report zero
> > program id for unload path.
> > 
> > It's because of the [1] change moved those audit calls into work queue
> > and they are executed after the id is zeroed in bpf_prog_free_id.
> > 
> > I originally made a change that added 'id_audit' field to struct
> > bpf_prog, which would be initialized as id, untouched and used
> > in audit callbacks.
> > 
> > Then I realized we might actually not need to zero prog->aux->id
> > in bpf_prog_free_id. It seems to be called just once on release
> > paths. Tests seems ok with that.
> > 
> > thoughts?
> > 
> > thanks,
> > jirka
> > 
> > 
> > [1] d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
> > ---
> >   kernel/bpf/syscall.c | 1 -
> >   1 file changed, 1 deletion(-)
> > 
> > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > index fdbae52f463f..426529355c29 100644
> > --- a/kernel/bpf/syscall.c
> > +++ b/kernel/bpf/syscall.c
> > @@ -1991,7 +1991,6 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
> >   		__acquire(&prog_idr_lock);
> >   	idr_remove(&prog_idr, prog->aux->id);
> > -	prog->aux->id = 0;
> 
> This would trigger a race when offloaded progs are used, see also ad8ad79f4f60 ("bpf:
> offload: free program id when device disappears"). __bpf_prog_offload_destroy() calls
> it, and my read is that later bpf_prog_free_id() then hits the early !prog->aux->id
> return path. Is there a reason for irq context to not defer the bpf_prog_free_id()?

there's comment saying:
  /* bpf_prog_free_id() must be called first */

it was added together with the BPF_(PROG|MAP)_GET_NEXT_ID api:
  34ad5580f8f9 bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command

Martin, any idea?

while looking on this I noticed we can remove the do_idr_lock argument
(patch below) because it's always true and I think we need to upgrade
all the prog_idr_lock locks to spin_lock_irqsave because bpf_prog_put
could be called from irq context because of d809e134be7a

thanks,
jirka


---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 49f9d2bec401..50c71ce698d0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1761,7 +1761,7 @@ void bpf_prog_inc(struct bpf_prog *prog);
 struct bpf_prog * __must_check bpf_prog_inc_not_zero(struct bpf_prog *prog);
 void bpf_prog_put(struct bpf_prog *prog);
 
-void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock);
+void bpf_prog_free_id(struct bpf_prog *prog);
 void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock);
 
 struct btf_field *btf_record_find(const struct btf_record *rec,
diff --git a/kernel/bpf/offload.c b/kernel/bpf/offload.c
index 13e4efc971e6..327dab644200 100644
--- a/kernel/bpf/offload.c
+++ b/kernel/bpf/offload.c
@@ -217,7 +217,7 @@ static void __bpf_prog_offload_destroy(struct bpf_prog *prog)
 		offload->offdev->ops->destroy(prog);
 
 	/* Make sure BPF_PROG_GET_NEXT_ID can't find this dead program */
-	bpf_prog_free_id(prog, true);
+	bpf_prog_free_id(prog);
 
 	list_del_init(&offload->offloads);
 	kfree(offload);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index fdbae52f463f..9b929d8ab06d 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1973,7 +1973,7 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
 	return id > 0 ? 0 : id;
 }
 
-void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
+void bpf_prog_free_id(struct bpf_prog *prog)
 {
 	unsigned long flags;
 
@@ -1985,18 +1985,12 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
 	if (!prog->aux->id)
 		return;
 
-	if (do_idr_lock)
-		spin_lock_irqsave(&prog_idr_lock, flags);
-	else
-		__acquire(&prog_idr_lock);
+	spin_lock_irqsave(&prog_idr_lock, flags);
 
 	idr_remove(&prog_idr, prog->aux->id);
 	prog->aux->id = 0;
 
-	if (do_idr_lock)
-		spin_unlock_irqrestore(&prog_idr_lock, flags);
-	else
-		__release(&prog_idr_lock);
+	spin_unlock_irqrestore(&prog_idr_lock, flags);
 }
 
 static void __bpf_prog_put_rcu(struct rcu_head *rcu)
@@ -2048,7 +2042,7 @@ static void __bpf_prog_put(struct bpf_prog *prog, bool do_idr_lock)
 
 	if (atomic64_dec_and_test(&aux->refcnt)) {
 		/* bpf_prog_free_id() must be called first */
-		bpf_prog_free_id(prog, do_idr_lock);
+		bpf_prog_free_id(prog);
 
 		if (in_irq() || irqs_disabled()) {
 			INIT_WORK(&aux->work, bpf_prog_put_deferred);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging
  2022-11-15 23:26   ` Jiri Olsa
@ 2022-11-15 23:34     ` Alexei Starovoitov
  2022-11-16  7:29       ` Jiri Olsa
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2022-11-15 23:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Daniel Borkmann, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jakub Kicinski

On Tue, Nov 15, 2022 at 3:26 PM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Tue, Nov 15, 2022 at 01:49:54PM +0100, Daniel Borkmann wrote:
> > On 11/15/22 10:50 AM, Jiri Olsa wrote:
> > > hi,
> > > perf_event_bpf_event and bpf_audit_prog calls currently report zero
> > > program id for unload path.
> > >
> > > It's because of the [1] change moved those audit calls into work queue
> > > and they are executed after the id is zeroed in bpf_prog_free_id.
> > >
> > > I originally made a change that added 'id_audit' field to struct
> > > bpf_prog, which would be initialized as id, untouched and used
> > > in audit callbacks.
> > >
> > > Then I realized we might actually not need to zero prog->aux->id
> > > in bpf_prog_free_id. It seems to be called just once on release
> > > paths. Tests seems ok with that.
> > >
> > > thoughts?
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > [1] d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
> > > ---
> > >   kernel/bpf/syscall.c | 1 -
> > >   1 file changed, 1 deletion(-)
> > >
> > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > index fdbae52f463f..426529355c29 100644
> > > --- a/kernel/bpf/syscall.c
> > > +++ b/kernel/bpf/syscall.c
> > > @@ -1991,7 +1991,6 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
> > >             __acquire(&prog_idr_lock);
> > >     idr_remove(&prog_idr, prog->aux->id);
> > > -   prog->aux->id = 0;
> >
> > This would trigger a race when offloaded progs are used, see also ad8ad79f4f60 ("bpf:
> > offload: free program id when device disappears"). __bpf_prog_offload_destroy() calls
> > it, and my read is that later bpf_prog_free_id() then hits the early !prog->aux->id
> > return path. Is there a reason for irq context to not defer the bpf_prog_free_id()?
>
> there's comment saying:
>   /* bpf_prog_free_id() must be called first */
>
> it was added together with the BPF_(PROG|MAP)_GET_NEXT_ID api:
>   34ad5580f8f9 bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
>
> Martin, any idea?
>
> while looking on this I noticed we can remove the do_idr_lock argument
> (patch below) because it's always true and I think we need to upgrade
> all the prog_idr_lock locks to spin_lock_irqsave because bpf_prog_put
> could be called from irq context because of d809e134be7a

before we dive into rabbit hole let's understand
the severity of
"perf_event_bpf_event and bpf_audit_prog calls currently report zero
 program id for unload path"

and why we cannot leave it as-is.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging
  2022-11-15 23:34     ` Alexei Starovoitov
@ 2022-11-16  7:29       ` Jiri Olsa
  0 siblings, 0 replies; 5+ messages in thread
From: Jiri Olsa @ 2022-11-16  7:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Jiri Olsa, Daniel Borkmann, Martin KaFai Lau, Alexei Starovoitov,
	Andrii Nakryiko, bpf, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jakub Kicinski

On Tue, Nov 15, 2022 at 03:34:55PM -0800, Alexei Starovoitov wrote:
> On Tue, Nov 15, 2022 at 3:26 PM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Tue, Nov 15, 2022 at 01:49:54PM +0100, Daniel Borkmann wrote:
> > > On 11/15/22 10:50 AM, Jiri Olsa wrote:
> > > > hi,
> > > > perf_event_bpf_event and bpf_audit_prog calls currently report zero
> > > > program id for unload path.
> > > >
> > > > It's because of the [1] change moved those audit calls into work queue
> > > > and they are executed after the id is zeroed in bpf_prog_free_id.
> > > >
> > > > I originally made a change that added 'id_audit' field to struct
> > > > bpf_prog, which would be initialized as id, untouched and used
> > > > in audit callbacks.
> > > >
> > > > Then I realized we might actually not need to zero prog->aux->id
> > > > in bpf_prog_free_id. It seems to be called just once on release
> > > > paths. Tests seems ok with that.
> > > >
> > > > thoughts?
> > > >
> > > > thanks,
> > > > jirka
> > > >
> > > >
> > > > [1] d809e134be7a ("bpf: Prepare bpf_prog_put() to be called from irq context.")
> > > > ---
> > > >   kernel/bpf/syscall.c | 1 -
> > > >   1 file changed, 1 deletion(-)
> > > >
> > > > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> > > > index fdbae52f463f..426529355c29 100644
> > > > --- a/kernel/bpf/syscall.c
> > > > +++ b/kernel/bpf/syscall.c
> > > > @@ -1991,7 +1991,6 @@ void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock)
> > > >             __acquire(&prog_idr_lock);
> > > >     idr_remove(&prog_idr, prog->aux->id);
> > > > -   prog->aux->id = 0;
> > >
> > > This would trigger a race when offloaded progs are used, see also ad8ad79f4f60 ("bpf:
> > > offload: free program id when device disappears"). __bpf_prog_offload_destroy() calls
> > > it, and my read is that later bpf_prog_free_id() then hits the early !prog->aux->id
> > > return path. Is there a reason for irq context to not defer the bpf_prog_free_id()?
> >
> > there's comment saying:
> >   /* bpf_prog_free_id() must be called first */
> >
> > it was added together with the BPF_(PROG|MAP)_GET_NEXT_ID api:
> >   34ad5580f8f9 bpf: Add BPF_(PROG|MAP)_GET_NEXT_ID command
> >
> > Martin, any idea?
> >
> > while looking on this I noticed we can remove the do_idr_lock argument
> > (patch below) because it's always true and I think we need to upgrade
> > all the prog_idr_lock locks to spin_lock_irqsave because bpf_prog_put
> > could be called from irq context because of d809e134be7a
> 
> before we dive into rabbit hole let's understand
> the severity of
> "perf_event_bpf_event and bpf_audit_prog calls currently report zero
>  program id for unload path"
> 
> and why we cannot leave it as-is.

I found this because I wanted use perf bpf-event to monitor bpf
programs loads/unloads.. and I need 'id' on the unload event

perf_event_bpf_event is irq safe so quick fix for me would be
to move it back:

        if (atomic64_dec_and_test(&aux->refcnt)) {
+               perf_event_bpf_event(prog, PERF_BPF_EVENT_PROG_UNLOAD, 0);
                /* bpf_prog_free_id() must be called first */
                bpf_prog_free_id(prog, do_idr_lock);

FWIW I also used fix below for a while for testing

jirka


---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d88e0741b381..6b752d4815e6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1163,6 +1163,7 @@ struct bpf_prog_aux {
 	u32 max_tp_access;
 	u32 stack_depth;
 	u32 id;
+	u32 id_audit;
 	u32 func_cnt; /* used by non-func prog as the number of func progs */
 	u32 func_idx; /* 0 for non-func prog, the index in func array for func prog */
 	u32 attach_btf_id; /* in-kernel BTF type id to attach to */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d6081e8336c6..1f4fdf0aaac6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1930,7 +1930,7 @@ static void bpf_audit_prog(const struct bpf_prog *prog, unsigned int op)
 	if (unlikely(!ab))
 		return;
 	audit_log_format(ab, "prog-id=%u op=%s",
-			 prog->aux->id, bpf_audit_str[op]);
+			 prog->aux->id_audit, bpf_audit_str[op]);
 	audit_log_end(ab);
 }
 
@@ -1942,7 +1942,7 @@ static int bpf_prog_alloc_id(struct bpf_prog *prog)
 	spin_lock_bh(&prog_idr_lock);
 	id = idr_alloc_cyclic(&prog_idr, prog, 1, INT_MAX, GFP_ATOMIC);
 	if (id > 0)
-		prog->aux->id = id;
+		prog->aux->id = prog->aux->id_audit = id;
 	spin_unlock_bh(&prog_idr_lock);
 	idr_preload_end();
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0f0bfcf5adce..32edb3a4360e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9065,7 +9065,7 @@ void perf_event_bpf_event(struct bpf_prog *prog,
 			},
 			.type = type,
 			.flags = flags,
-			.id = prog->aux->id,
+			.id = prog->aux->id_audit,
 		},
 	};
 

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-11-16  7:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-15  9:50 [RFC bpf-next] bpf: Fix perf bpf event and audit prog id logging Jiri Olsa
2022-11-15 12:49 ` Daniel Borkmann
2022-11-15 23:26   ` Jiri Olsa
2022-11-15 23:34     ` Alexei Starovoitov
2022-11-16  7:29       ` Jiri Olsa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.