linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 bpf-next] bpf: fix cgroup bpf release synchronization
@ 2019-06-25 21:38 Roman Gushchin
  2019-06-26 20:04 ` Song Liu
  2019-06-26 20:04 ` Alexei Starovoitov
  0 siblings, 2 replies; 3+ messages in thread
From: Roman Gushchin @ 2019-06-25 21:38 UTC (permalink / raw)
  To: Alexei Starovoitov, Tejun Heo, bpf
  Cc: Daniel Borkmann, kernel-team, linux-kernel, Roman Gushchin

Since commit 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf
from cgroup itself"), the cgroup_bpf release occurs asynchronously
(from a worker context) and before the release of the cgroup itself.
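
For reference, the asynchronous release path introduced by that
commit looks roughly like the sketch below (simplified and
reconstructed from the kernel around v5.2, not part of this patch;
treat names and details as illustrative):

	/* percpu_ref release callback: runs once bpf.refcnt drops to
	 * zero and punts the actual teardown to a workqueue, i.e.
	 * outside of the cgroup release path.
	 */
	static void cgroup_bpf_release_fn(struct percpu_ref *ref)
	{
		struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt);

		INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release);
		queue_work(system_wq, &cgrp->bpf.release_work);
	}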

This asynchronous release introduced a race between the release and
update paths that could not happen before. For example, a leaf
cgroup's cgroup_bpf can be released while a new bpf program is being
attached to one of its ancestor cgroups. The race may result in a
double-free and other memory corruption.
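
Schematically, the problematic interleaving looks like this (an
illustrative reconstruction, not part of the patch):

	release worker                     attach to an ancestor
	--------------                     ---------------------
	cgroup_bpf_release(leaf)
	  frees leaf's effective arrays
	                                   update_effective_progs(ancestor)
	                                     css_for_each_descendant_pre()
	                                       recomputes and swaps the
	                                       effective array of the
	                                       already-released leaf
	=> double-free / use-after-free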

To fix the problem, let's protect the body of cgroup_bpf_release()
with cgroup_mutex, as it effectively was before, when all this code
was called from the cgroup release path with the cgroup mutex held.

Also, on the update path, let's skip cgroups that have no chance of
ever invoking a bpf program. If the cgroup bpf refcnt has reached 0,
the cgroup is offline (it has no attached processes) and no
associated sockets are left, so there is no point in updating its
effective progs array. Worse, doing so after the release can lead to
a memory leak. So let's skip such cgroups.

Big thanks to Tejun Heo for discovering and debugging this problem!

Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roman Gushchin <guro@fb.com>
---
 kernel/bpf/cgroup.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index c225c42e114a..077ed3a19848 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -16,6 +16,8 @@
 #include <linux/bpf-cgroup.h>
 #include <net/sock.h>
 
+#include "../cgroup/cgroup-internal.h"
+
 DEFINE_STATIC_KEY_FALSE(cgroup_bpf_enabled_key);
 EXPORT_SYMBOL(cgroup_bpf_enabled_key);
 
@@ -38,6 +40,8 @@ static void cgroup_bpf_release(struct work_struct *work)
 	struct bpf_prog_array *old_array;
 	unsigned int type;
 
+	mutex_lock(&cgroup_mutex);
+
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
 		struct list_head *progs = &cgrp->bpf.progs[type];
 		struct bpf_prog_list *pl, *tmp;
@@ -54,10 +58,12 @@ static void cgroup_bpf_release(struct work_struct *work)
 		}
 		old_array = rcu_dereference_protected(
 				cgrp->bpf.effective[type],
-				percpu_ref_is_dying(&cgrp->bpf.refcnt));
+				lockdep_is_held(&cgroup_mutex));
 		bpf_prog_array_free(old_array);
 	}
 
+	mutex_unlock(&cgroup_mutex);
+
 	percpu_ref_exit(&cgrp->bpf.refcnt);
 	cgroup_put(cgrp);
 }
@@ -229,6 +235,9 @@ static int update_effective_progs(struct cgroup *cgrp,
 	css_for_each_descendant_pre(css, &cgrp->self) {
 		struct cgroup *desc = container_of(css, struct cgroup, self);
 
+		if (percpu_ref_is_zero(&desc->bpf.refcnt))
+			continue;
+
 		err = compute_effective_progs(desc, type, &desc->bpf.inactive);
 		if (err)
 			goto cleanup;
@@ -238,6 +247,14 @@ static int update_effective_progs(struct cgroup *cgrp,
 	css_for_each_descendant_pre(css, &cgrp->self) {
 		struct cgroup *desc = container_of(css, struct cgroup, self);
 
+		if (percpu_ref_is_zero(&desc->bpf.refcnt)) {
+			if (unlikely(desc->bpf.inactive)) {
+				bpf_prog_array_free(desc->bpf.inactive);
+				desc->bpf.inactive = NULL;
+			}
+			continue;
+		}
+
 		activate_effective_progs(desc, type, desc->bpf.inactive);
 		desc->bpf.inactive = NULL;
 	}
-- 
2.21.0



* Re: [PATCH v2 bpf-next] bpf: fix cgroup bpf release synchronization
  2019-06-25 21:38 [PATCH v2 bpf-next] bpf: fix cgroup bpf release synchronization Roman Gushchin
@ 2019-06-26 20:04 ` Song Liu
  2019-06-26 20:04 ` Alexei Starovoitov
  1 sibling, 0 replies; 3+ messages in thread
From: Song Liu @ 2019-06-26 20:04 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Tejun Heo, bpf, Daniel Borkmann, Kernel Team,
	linux-kernel



> On Jun 25, 2019, at 2:38 PM, Roman Gushchin <guro@fb.com> wrote:
> [...]

LGTM. 

Acked-by: Song Liu <songliubraving@fb.com>



* Re: [PATCH v2 bpf-next] bpf: fix cgroup bpf release synchronization
  2019-06-25 21:38 [PATCH v2 bpf-next] bpf: fix cgroup bpf release synchronization Roman Gushchin
  2019-06-26 20:04 ` Song Liu
@ 2019-06-26 20:04 ` Alexei Starovoitov
  1 sibling, 0 replies; 3+ messages in thread
From: Alexei Starovoitov @ 2019-06-26 20:04 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Tejun Heo, bpf, Daniel Borkmann, Kernel Team, LKML

On Tue, Jun 25, 2019 at 2:39 PM Roman Gushchin <guro@fb.com> wrote:
> [...]

Applied. Thanks
