BPF Archive on lore.kernel.org
* [RFC] cgroup gets release after long time
@ 2019-05-16 10:39 Jiri Olsa
  2019-05-16 15:12 ` Alexei Starovoitov
  2019-05-16 15:22 ` Roman Gushchin
  0 siblings, 2 replies; 13+ messages in thread
From: Jiri Olsa @ 2019-05-16 10:39 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Daniel Mack
  Cc: cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Pavel Hrdina

hi,
Pavel reported an issue with bpf programs (attached to cgroup)
not being released at the time when the cgroup is removed and
are still visible in 'bpftool prog' list afterwards.

It seems like this is not bpf specific, because I was able
to cut the bpf code from his example and still see delayed
release of cgroup.

It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1
kernel command line option), please check the attached program
below and following scenario:

TERM 1
# gcc -o test test.c

			TERM 2
			# cd /sys/kernel/debug/tracing
			# echo 1 > events/cgroup/cgroup_release/enable

TERM 1 -> create and remove cgroup1
# ./test group1
qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)

			TERM 2
			# cat trace_pipe
			<nothing>

TERM 1 -> create and remove cgroup2
# ./test group2
qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)

			TERM 2  - group1 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1

TERM 1 -> create and remove cgroup3
# ./test group3
qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)

			TERM 2 - group2 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
			kworker/22:0-1787  [022] ....  2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2


Looks like the previous cgroup release is triggered by creating
another cgroup.  If I don't do anything the cgroup is released
(tracepoint shows) in about 90 seconds.

The cgroup_release tracepoint is triggered in css_release_work_fn,
the same function where the cgroup_bpf_put is called, hence the
delay in releasing of the bpf programs.
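
For anyone scripting this check, the trace_pipe lines can be parsed
mechanically. A minimal sketch, assuming the line layout shown in the
trace output above; the struct and function names are illustrative only
and are not part of any kernel or tracing API:

```c
#include <stdio.h>
#include <string.h>

/* Fields of interest from one cgroup_release trace line. */
struct release_event {
	double timestamp;   /* seconds, as printed by ftrace */
	int id;             /* cgroup id */
	int level;          /* depth in the hierarchy */
	char path[256];     /* cgroup path relative to the root */
};

/*
 * Parse a line such as
 *   kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
 * Returns 0 on success, -1 if the line is not a cgroup_release event.
 */
int parse_cgroup_release(const char *line, struct release_event *ev)
{
	static const char marker[] = ": cgroup_release: ";
	const char *tag = strstr(line, marker);
	const char *ts;
	int root;

	if (!tag)
		return -1;

	/* The timestamp is the last whitespace-delimited token before the marker. */
	ts = tag;
	while (ts > line && ts[-1] != ' ' && ts[-1] != '\t')
		ts--;
	if (sscanf(ts, "%lf", &ev->timestamp) != 1)
		return -1;

	if (sscanf(tag + strlen(marker), "root=%d id=%d level=%d path=%255s",
		   &root, &ev->id, &ev->level, ev->path) != 4)
		return -1;
	return 0;
}
```

Comparing the parsed timestamp with the time of the rmdir gives the
release delay for each group.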

Is this expected or somehow configurable? It's confusing seeing
all the bpf programs from removed cgroups being around. In Pavel's
setup it's about 100 of them.

Note, I could reproduce this only with qemu-kvm being run in child
process in the example below.

thoughts? thanks,
jirka


---
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CGROUP_PATH "/sys/fs/cgroup"

int
main(int argc, char **argv)
{
	pid_t pid = -1;
	char path[1024];
	int rc;

	pid = fork();

	if (pid == 0) {
		execl("/usr/bin/qemu-kvm",
		      "/usr/bin/qemu-kvm",
		      "-display", "none",
		      NULL);
		fprintf(stderr, "failed to start qemu process\n");
		_exit(-1);
	} else {
		int filefd = -1;
		char proc[1024];

		snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]);

		sleep(1);

		if (mkdir(path, 0755) < 0) {
			fprintf(stderr, "failed to create cgroup '%s'\n", path);
			return -1;
		}

		snprintf(proc, 1024, "%s/cgroup.procs", path);

		filefd = open(proc, O_WRONLY|O_TRUNC);
		if (filefd >= 0) {
			dprintf(filefd, "%d", pid);
			close(filefd);
		}

		sleep(1);
	}

	if (pid > 0)
		kill(pid, SIGTERM);
	do {
		rc = rmdir(path);
	} while (rc != 0);

	return 0;
}
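
The rmdir loop above spins on any error. A bounded variant is sketched
below, on the assumption that the only error worth retrying is EBUSY
(the cgroup still has tasks attached); rmdir_retry and its parameters
are hypothetical names, not taken from the reproducer:

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Try to remove a cgroup directory, retrying only while rmdir() fails
 * with EBUSY. Returns 0 on success, -1 on any other error (errno set by
 * rmdir) or when the attempts run out (errno set to EBUSY).
 */
int rmdir_retry(const char *path, int attempts, long delay_ms)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};

	while (attempts-- > 0) {
		if (rmdir(path) == 0)
			return 0;
		if (errno != EBUSY)
			return -1;
		/* Give the killed child time to exit before trying again. */
		nanosleep(&delay, NULL);
	}
	errno = EBUSY;
	return -1;
}
```

In the reproducer this could stand in for the do/while loop, for
example rmdir_retry(path, 100, 50).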


* Re: [RFC] cgroup gets release after long time
  2019-05-16 10:39 [RFC] cgroup gets release after long time Jiri Olsa
@ 2019-05-16 15:12 ` Alexei Starovoitov
  2019-05-16 15:22 ` Roman Gushchin
  1 sibling, 0 replies; 13+ messages in thread
From: Alexei Starovoitov @ 2019-05-16 15:12 UTC (permalink / raw)
  To: Jiri Olsa, Roman Gushchin
  Cc: Tejun Heo, Li Zefan, Daniel Mack,
	open list:CONTROL GROUP (CGROUP),
	bpf, Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Pavel Hrdina

On Thu, May 16, 2019 at 3:39 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> hi,
> Pavel reported an issue with bpf programs (attached to cgroup)
> not being released at the time when the cgroup is removed and
> are still visible in 'bpftool prog' list afterwards.

right. the workaround systemd and others are using today is
to detach bpf prog before rmdir of cgroup.
Roman has patches to do this automatically.

> It seems like this is not bpf specific, because I was able
> to cut the bpf code from his example and still see delayed
> release of cgroup.
>
> It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1
> kernel command line option), please check the attached program
> below and following scenario:
>
> TERM 1
> # gcc -o test test.c
>
>                         TERM 2
>                         # cd /sys/kernel/debug/tracing
>                         # echo 1 > events/cgroup/cgroup_release/enable
>
> TERM 1 -> create and remove cgroup1
> # ./test group1
> qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)
>
>                         TERM 2
>                         # cat trace_pipe
>                         <nothing>
>
> TERM 1 -> create and remove cgroup2
> # ./test group2
> qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)
>
>                         TERM 2  - group1 being released
>                         # cat trace_pipe
>                         kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
>
> TERM 1 -> create and remove cgroup3
> # ./test group3
> qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)
>
>                         TERM 2 - group2 being released
>                         # cat trace_pipe
>                         kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
>                         kworker/22:0-1787  [022] ....  2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2
>
>
> Looks like the previous cgroup release is triggered by creating
> another cgroup.  If I don't do anything the cgroup is released
> (tracepoint shows) in about 90 seconds.
>
> The cgroup_release tracepoint is triggered in css_release_work_fn,
> the same function where the cgroup_bpf_put is called, hence the
> delay in releasing of the bpf programs.
>
> Is this expected or somehow configurable? It's confusing seeing
> all the bpf programs from removed cgroups being around. In Pavel's
> setup it's about 100 of them.
>
> Note, I could reproduce this only with qemu-kvm being run in child
> process in the example below.
>
> thoughts? thanks,
> jirka
>
>
> ---
> #include <fcntl.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define CGROUP_PATH "/sys/fs/cgroup"
>
> int
> main(int argc, char **argv)
> {
>         pid_t pid = -1;
>         char path[1024];
>         int rc;
>
>         pid = fork();
>
>         if (pid == 0) {
>                 execl("/usr/bin/qemu-kvm",
>                       "/usr/bin/qemu-kvm",
>                       "-display", "none",
>                       NULL);
>                 fprintf(stderr, "failed to start qemu process\n");
>                 _exit(-1);
>         } else {
>                 int filefd = -1;
>                 char proc[1024];
>
>                 snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]);
>
>                 sleep(1);
>
>                 if (mkdir(path, 0755) < 0) {
>                         fprintf(stderr, "failed to create cgroup '%s'\n", path);
>                         return -1;
>                 }
>
>                 snprintf(proc, 1024, "%s/cgroup.procs", path);
>
>                 filefd = open(proc, O_WRONLY|O_TRUNC);
>                 if (filefd > 0) {
>                         dprintf(filefd, "%u", pid);
>                         close(filefd);
>                 }
>
>                 sleep(1);
>         }
>
>         if (pid > 0)
>                 kill(pid, SIGTERM);
>         do {
>                 rc = rmdir(path);
>         } while (rc != 0);
>
>         return 0;
> }


* Re: [RFC] cgroup gets release after long time
  2019-05-16 10:39 [RFC] cgroup gets release after long time Jiri Olsa
  2019-05-16 15:12 ` Alexei Starovoitov
@ 2019-05-16 15:22 ` Roman Gushchin
  2019-05-16 15:26   ` Jiri Olsa
  2019-05-16 15:31   ` Pavel Hrdina
  1 sibling, 2 replies; 13+ messages in thread
From: Roman Gushchin @ 2019-05-16 15:22 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Tejun Heo, Li Zefan, Daniel Mack, cgroups, bpf,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Pavel Hrdina

On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> hi,
> Pavel reported an issue with bpf programs (attached to cgroup)
> not being released at the time when the cgroup is removed and
> are still visible in 'bpftool prog' list afterwards.

Hi Jiri!

Can you, please, try the patch from
https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?

It should solve the problem, and I'm about to post it upstream.

Thanks!

> 
> It seems like this is not bpf specific, because I was able
> to cut the bpf code from his example and still see delayed
> release of cgroup.
> 
> It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1
> kernel command line option), please check the attached program
> below and following scenario:
> 
> TERM 1
> # gcc -o test test.c
> 
> 			TERM 2
> 			# cd /sys/kernel/debug/tracing
> 			# echo 1 > events/cgroup/cgroup_release/enable
> 
> TERM 1 -> create and remove cgroup1
> # ./test group1
> qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)
> 
> 			TERM 2
> 			# cat trace_pipe
> 			<nothing>
> 
> TERM 1 -> create and remove cgroup2
> # ./test group2
> qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)
> 
> 			TERM 2  - group1 being released
> 			# cat trace_pipe
> 			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
> 
> TERM 1 -> create and remove cgroup3
> # ./test group3
> qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)
> 
> 			TERM 2 - group2 being released
> 			# cat trace_pipe
> 			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
> 			kworker/22:0-1787  [022] ....  2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2
> 
> 
> Looks like the previous cgroup release is triggered by creating
> another cgroup.  If I don't do anything the cgroup is released
> (tracepoint shows) in about 90 seconds.
> 
> The cgroup_release tracepoint is triggered in css_release_work_fn,
> the same function where the cgroup_bpf_put is called, hence the
> delay in releasing of the bpf programs.
> 
> Is this expected or somehow configurable? It's confusing seeing
> all the bpf programs from removed cgroups being around. In Pavel's
> setup it's about 100 of them.
> 
> Note, I could reproduce this only with qemu-kvm being run in child
> process in the example below.
> 
> thoughts? thanks,
> jirka
> 
> 
> ---
> #include <fcntl.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
> 
> #define CGROUP_PATH "/sys/fs/cgroup"
> 
> int
> main(int argc, char **argv)
> {
> 	pid_t pid = -1;
> 	char path[1024];
> 	int rc;
> 
> 	pid = fork();
> 
> 	if (pid == 0) {
> 		execl("/usr/bin/qemu-kvm",
> 		      "/usr/bin/qemu-kvm",
> 		      "-display", "none",
> 		      NULL);
> 		fprintf(stderr, "failed to start qemu process\n");
> 		_exit(-1);
> 	} else {
> 		int filefd = -1;
> 		char proc[1024];
> 
> 		snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]);
> 
> 		sleep(1);
> 
> 		if (mkdir(path, 0755) < 0) {
> 			fprintf(stderr, "failed to create cgroup '%s'\n", path);
> 			return -1;
> 		}
> 
> 		snprintf(proc, 1024, "%s/cgroup.procs", path);
> 
> 		filefd = open(proc, O_WRONLY|O_TRUNC);
> 		if (filefd > 0) {
> 			dprintf(filefd, "%u", pid);
> 			close(filefd);
> 		}
> 
> 		sleep(1);
> 	}
> 
> 	if (pid > 0)
> 		kill(pid, SIGTERM);
> 	do {
> 		rc = rmdir(path);
> 	} while (rc != 0);
> 
> 	return 0;
> }


* Re: [RFC] cgroup gets release after long time
  2019-05-16 15:22 ` Roman Gushchin
@ 2019-05-16 15:26   ` Jiri Olsa
  2019-05-16 16:46     ` Roman Gushchin
  2019-05-16 15:31   ` Pavel Hrdina
  1 sibling, 1 reply; 13+ messages in thread
From: Jiri Olsa @ 2019-05-16 15:26 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Tejun Heo, Li Zefan, Daniel Mack, cgroups, bpf,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Pavel Hrdina

On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > hi,
> > Pavel reported an issue with bpf programs (attached to cgroup)
> > not being released at the time when the cgroup is removed and
> > are still visible in 'bpftool prog' list afterwards.
> 
> Hi Jiri!
> 
> Can you, please, try the patch from
> https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> 
> It should solve the problem, and I'm about to post it upstream.

awesome, could you please cc me on the post?

thanks,
jirka


* Re: [RFC] cgroup gets release after long time
  2019-05-16 15:22 ` Roman Gushchin
  2019-05-16 15:26   ` Jiri Olsa
@ 2019-05-16 15:31   ` Pavel Hrdina
  2019-05-16 17:14     ` Roman Gushchin
  1 sibling, 1 reply; 13+ messages in thread
From: Pavel Hrdina @ 2019-05-16 15:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack, cgroups, bpf,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller


On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > hi,
> > Pavel reported an issue with bpf programs (attached to cgroup)
> > not being released at the time when the cgroup is removed and
> > are still visible in 'bpftool prog' list afterwards.
> 
> Hi Jiri!
> 
> Can you, please, try the patch from
> https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> 
> It should solve the problem, and I'm about to post it upstream.

Perfect, I'll give it a try with full libvirt setup as well.

Can we have this somehow detectable from user-space so libvirt can
decide when to use BPF or not?  I would like to avoid using BPF with
libvirt if this issue is not fixed and we cannot simply work around it
as systemd automatically removes cgroups for us.

Thanks!

Pavel



* Re: [RFC] cgroup gets release after long time
  2019-05-16 15:26   ` Jiri Olsa
@ 2019-05-16 16:46     ` Roman Gushchin
  0 siblings, 0 replies; 13+ messages in thread
From: Roman Gushchin @ 2019-05-16 16:46 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Tejun Heo, Li Zefan, Daniel Mack, cgroups, bpf,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller,
	Pavel Hrdina

On Thu, May 16, 2019 at 05:26:22PM +0200, Jiri Olsa wrote:
> On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > hi,
> > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > not being released at the time when the cgroup is removed and
> > > are still visible in 'bpftool prog' list afterwards.
> > 
> > Hi Jiri!
> > 
> > Can you, please, try the patch from
> > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > 
> > It should solve the problem, and I'm about to post it upstream.
> 
> awesome, could you please cc me on the post?

Sure.

Thanks!


* Re: [RFC] cgroup gets release after long time
  2019-05-16 15:31   ` Pavel Hrdina
@ 2019-05-16 17:14     ` Roman Gushchin
  2019-05-16 17:25       ` Alexei Starovoitov
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-16 17:14 UTC (permalink / raw)
  To: Pavel Hrdina
  Cc: Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack, cgroups, bpf,
	Alexei Starovoitov, Daniel Borkmann, David S. Miller

On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > hi,
> > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > not being released at the time when the cgroup is removed and
> > > are still visible in 'bpftool prog' list afterwards.
> > 
> > Hi Jiri!
> > 
> > Can you, please, try the patch from
> > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > 
> > It should solve the problem, and I'm about to post it upstream.
> 
> Perfect, I'll give it a try with full libvirt setup as well.
> 
> Can we have this somehow detectable from user-space so libvirt can
> decide when to use BPF or not?  I would like to avoid using BPF with
> libvirt if this issue is not fixed and we cannot simply workaround it
> as systemd automatically removes cgroups for us.

Hm, I don't think there is a good way to detect it from userspace.
At least I have no good ideas. Alexei? Daniel?

If you're interested in a particular stable version, we can probably
treat it as a "fix", and backport.

Thanks!


* Re: [RFC] cgroup gets release after long time
  2019-05-16 17:14     ` Roman Gushchin
@ 2019-05-16 17:25       ` Alexei Starovoitov
  2019-05-17 10:12         ` Pavel Hrdina
  0 siblings, 1 reply; 13+ messages in thread
From: Alexei Starovoitov @ 2019-05-16 17:25 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Pavel Hrdina, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller

On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
>
> On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > hi,
> > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > not being released at the time when the cgroup is removed and
> > > > are still visible in 'bpftool prog' list afterwards.
> > >
> > > Hi Jiri!
> > >
> > > Can you, please, try the patch from
> > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > >
> > > It should solve the problem, and I'm about to post it upstream.
> >
> > Perfect, I'll give it a try with full libvirt setup as well.
> >
> > Can we have this somehow detectable from user-space so libvirt can
> > decide when to use BPF or not?  I would like to avoid using BPF with
> > libvirt if this issue is not fixed and we cannot simply workaround it
> > as systemd automatically removes cgroups for us.
>
> Hm, I don't think there is a good way to detect it from userspace.
> At least I have no good ideas. Alexei? Daniel?
>
> If you're interested in a particular stable version, we can probably
> treat it as a "fix", and backport.

right.
also user space workaround is trivial.
Just detach before rmdir.


* Re: [RFC] cgroup gets release after long time
  2019-05-16 17:25       ` Alexei Starovoitov
@ 2019-05-17 10:12         ` Pavel Hrdina
  2019-05-18  0:56           ` Roman Gushchin
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Hrdina @ 2019-05-17 10:12 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Roman Gushchin, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller


On Thu, May 16, 2019 at 10:25:50AM -0700, Alexei Starovoitov wrote:
> On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > > hi,
> > > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > > not being released at the time when the cgroup is removed and
> > > > > are still visible in 'bpftool prog' list afterwards.
> > > >
> > > > Hi Jiri!
> > > >
> > > > Can you, please, try the patch from
> > > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > > >
> > > > It should solve the problem, and I'm about to post it upstream.
> > >
> > > Perfect, I'll give it a try with full libvirt setup as well.
> > >
> > > Can we have this somehow detectable from user-space so libvirt can
> > > decide when to use BPF or not?  I would like to avoid using BPF with
> > > libvirt if this issue is not fixed and we cannot simply workaround it
> > > as systemd automatically removes cgroups for us.
> >
> > Hm, I don't think there is a good way to detect it from userspace.
> > At least I have no good ideas. Alexei? Daniel?
> >
> > If you're interested in a particular stable version, we can probably
> > treat it as a "fix", and backport.
> 
> right.
> also user space workaround is trivial.
> Just detach before rmdir.

Well yes, it's trivial but not if you are using machined from systemd.
Once libvirt kills QEMU process systemd automatically removes the
cgroup so we don't have any chance to remove the BPF program.

Would it be too ugly to put something into
'/sys/kernel/cgroup/features'?

Pavel



* Re: [RFC] cgroup gets release after long time
  2019-05-17 10:12         ` Pavel Hrdina
@ 2019-05-18  0:56           ` Roman Gushchin
  2019-05-20  8:41             ` Pavel Hrdina
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-18  0:56 UTC (permalink / raw)
  To: Pavel Hrdina
  Cc: Alexei Starovoitov, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller

On Fri, May 17, 2019 at 12:12:51PM +0200, Pavel Hrdina wrote:
> On Thu, May 16, 2019 at 10:25:50AM -0700, Alexei Starovoitov wrote:
> > On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > > > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > > > hi,
> > > > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > > > not being released at the time when the cgroup is removed and
> > > > > > are still visible in 'bpftool prog' list afterwards.
> > > > >
> > > > > Hi Jiri!
> > > > >
> > > > > Can you, please, try the patch from
> > > > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > > > >
> > > > > It should solve the problem, and I'm about to post it upstream.
> > > >
> > > > Perfect, I'll give it a try with full libvirt setup as well.
> > > >
> > > > Can we have this somehow detectable from user-space so libvirt can
> > > > decide when to use BPF or not?  I would like to avoid using BPF with
> > > > libvirt if this issue is not fixed and we cannot simply workaround it
> > > > as systemd automatically removes cgroups for us.
> > >
> > > Hm, I don't think there is a good way to detect it from userspace.
> > > At least I have no good ideas. Alexei? Daniel?
> > >
> > > If you're interested in a particular stable version, we can probably
> > > treat it as a "fix", and backport.
> > 
> > right.
> > also user space workaround is trivial.
> > Just detach before rmdir.
> 
> Well yes, it's trivial but not if you are using machined from systemd.
> Once libvirt kills QEMU process systemd automatically removes the
> cgroup so we don't have any chance to remove the BPF program.
> 
> Would it be too ugly to put something into
> '/sys/kernel/cgroup/features'?

I thought about it, but it seems that /sys/kernel/cgroup/features is also
relatively new. So if we're not going to backport it (I mean auto-detaching),
then we can simply look at the kernel version, right?
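
A version check along those lines could look roughly like the sketch
below. The function names are illustrative, and the threshold is left
as a parameter because this thread does not say which release would
carry the auto-detach change. A check like this also cannot see a fix
backported to an older kernel:

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Return 1 if a release string such as "4.15.0-20-generic" is >= major.minor. */
int release_at_least(const char *release, int major, int minor)
{
	int maj, min;

	if (sscanf(release, "%d.%d", &maj, &min) != 2)
		return 0;
	return maj > major || (maj == major && min >= minor);
}

/* Same check against the running kernel; returns 0 if uname() fails. */
int running_kernel_at_least(int major, int minor)
{
	struct utsname u;

	if (uname(&u) != 0)
		return 0;
	return release_at_least(u.release, major, minor);
}
```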

If we're going to backport it, the question is which stable version we're
looking at.

In general, I don't see any reasons why cgroup/features can't be used.

Thanks!


* Re: [RFC] cgroup gets release after long time
  2019-05-18  0:56           ` Roman Gushchin
@ 2019-05-20  8:41             ` Pavel Hrdina
  2019-05-20 19:11               ` Roman Gushchin
  0 siblings, 1 reply; 13+ messages in thread
From: Pavel Hrdina @ 2019-05-20  8:41 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller


On Sat, May 18, 2019 at 12:56:12AM +0000, Roman Gushchin wrote:
> On Fri, May 17, 2019 at 12:12:51PM +0200, Pavel Hrdina wrote:
> > On Thu, May 16, 2019 at 10:25:50AM -0700, Alexei Starovoitov wrote:
> > > On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
> > > >
> > > > On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > > > > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > > > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > > > > hi,
> > > > > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > > > > not being released at the time when the cgroup is removed and
> > > > > > > are still visible in 'bpftool prog' list afterwards.
> > > > > >
> > > > > > Hi Jiri!
> > > > > >
> > > > > > Can you, please, try the patch from
> > > > > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > > > > >
> > > > > > It should solve the problem, and I'm about to post it upstream.
> > > > >
> > > > > Perfect, I'll give it a try with full libvirt setup as well.
> > > > >
> > > > > Can we have this somehow detectable from user-space so libvirt can
> > > > > decide when to use BPF or not?  I would like to avoid using BPF with
> > > > > libvirt if this issue is not fixed and we cannot simply workaround it
> > > > > as systemd automatically removes cgroups for us.
> > > >
> > > > Hm, I don't think there is a good way to detect it from userspace.
> > > > At least I have no good ideas. Alexei? Daniel?
> > > >
> > > > If you're interested in a particular stable version, we can probably
> > > > treat it as a "fix", and backport.
> > > 
> > > right.
> > > also user space workaround is trivial.
> > > Just detach before rmdir.
> > 
> > Well yes, it's trivial but not if you are using machined from systemd.
> > Once libvirt kills QEMU process systemd automatically removes the
> > cgroup so we don't have any chance to remove the BPF program.
> > 
> > Would it be too ugly to put something into
> > '/sys/kernel/cgroup/features'?
> 
> I thought about it, but it seems that /sys/kernel/cgroup/features is also
> relatively new. So if we're not going to backport it (I mean auto-detaching),
> than we can simple look at the kernel version, right?

If you think only about upstream then the version check is in most cases
good enough, but usually that's not the case and patches are backported
to downstream distributions as well.

Yes, that file was introduced in kernel 4.15, so the fix would be
introspectable this way only on 4.15 and later.

> If we're going to backport it, the question is which stable version we're
> looking at.
> 
> In general, I don't see any reasons why cgroup/features can't be used.

Perfect, in that case I would prefer if we could export it in
cgroup/features as it will be easier for user-space to figure out
whether it's safe to rely on proper cleanup behavior or not and
it will make downstream distributions' life easier.
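
From the user-space side, such a probe could be as small as the sketch
below. It assumes the file lists one feature name per line; the helper
names are illustrative, and any token for the auto-detach behavior is
purely hypothetical because no such entry is defined anywhere in this
thread:

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if name appears as a whole line in the stream, 0 otherwise. */
int feature_listed(FILE *f, const char *name)
{
	char line[128];

	while (fgets(line, sizeof(line), f)) {
		line[strcspn(line, "\n")] = '\0';
		if (strcmp(line, name) == 0)
			return 1;
	}
	return 0;
}

/* A missing file (kernels before 4.15) counts as "not listed". */
int cgroup_has_feature(const char *name)
{
	FILE *f = fopen("/sys/kernel/cgroup/features", "r");
	int found;

	if (!f)
		return 0;
	found = feature_listed(f, name);
	fclose(f);
	return found;
}
```

Treating a missing file as "not listed" keeps the default conservative:
without positive evidence of the fix, the caller would avoid BPF.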

I'll try the patch today with libvirt setup.

Thanks,

Pavel



* Re: [RFC] cgroup gets release after long time
  2019-05-20  8:41             ` Pavel Hrdina
@ 2019-05-20 19:11               ` Roman Gushchin
  2019-05-21  8:00                 ` Pavel Hrdina
  0 siblings, 1 reply; 13+ messages in thread
From: Roman Gushchin @ 2019-05-20 19:11 UTC (permalink / raw)
  To: Pavel Hrdina
  Cc: Alexei Starovoitov, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller

On Mon, May 20, 2019 at 10:41:26AM +0200, Pavel Hrdina wrote:
> On Sat, May 18, 2019 at 12:56:12AM +0000, Roman Gushchin wrote:
> > On Fri, May 17, 2019 at 12:12:51PM +0200, Pavel Hrdina wrote:
> > > On Thu, May 16, 2019 at 10:25:50AM -0700, Alexei Starovoitov wrote:
> > > > On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
> > > > >
> > > > > On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > > > > > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > > > > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > > > > > hi,
> > > > > > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > > > > > not being released at the time when the cgroup is removed and
> > > > > > > > are still visible in 'bpftool prog' list afterwards.
> > > > > > >
> > > > > > > Hi Jiri!
> > > > > > >
> > > > > > > Can you, please, try the patch from
> > > > > > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > > > > > >
> > > > > > > It should solve the problem, and I'm about to post it upstream.
> > > > > >
> > > > > > Perfect, I'll give it a try with full libvirt setup as well.
> > > > > >
> > > > > > Can we have this somehow detectable from user-space so libvirt can
> > > > > > decide when to use BPF or not?  I would like to avoid using BPF with
> > > > > > libvirt if this issue is not fixed and we cannot simply workaround it
> > > > > > as systemd automatically removes cgroups for us.
> > > > >
> > > > > Hm, I don't think there is a good way to detect it from userspace.
> > > > > At least I have no good ideas. Alexei? Daniel?
> > > > >
> > > > > If you're interested in a particular stable version, we can probably
> > > > > treat it as a "fix", and backport.
> > > > 
> > > > right.
> > > > also user space workaround is trivial.
> > > > Just detach before rmdir.
> > > 
> > > Well yes, it's trivial but not if you are using machined from systemd.
> > > Once libvirt kills QEMU process systemd automatically removes the
> > > cgroup so we don't have any chance to remove the BPF program.
> > > 
> > > Would it be too ugly to put something into
> > > '/sys/kernel/cgroup/features'?
> > 
> > I thought about it, but it seems that /sys/kernel/cgroup/features is also
> > relatively new. So if we're not going to backport it (I mean auto-detaching),
> > than we can simple look at the kernel version, right?
> 
> If you think only about upstream then the version check is in most cases
> good enough, but usually that's not the case and patches are backported
> to downstream distributions as well.
> 
> Yes, that file was introduced in kernel 4.15 so there are some
> limitations where the fix would be introspectable.
> 
> > If we're going to backport it, the question is which stable version we're
> > looking at.
> > 
> > In general, I don't see any reasons why cgroup/features can't be used.
> 
> Perfect, in that case I would prefer if we could export it in
> cgroup/features as it will be easier for user-space to figure out
> whether it's safe to rely on proper cleanup behavior or not and
> it will make downstream distributions' life easier.

Hello, Pavel!

Tejun noticed that cgroup features are supposed to match cgroupfs mount options,
so it can't be used here. And this >= 4.15 limitation is also a significant
constraint.

Thanks!
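
For reference, the userspace workaround quoted above ("Just detach before rmdir") might look roughly like the following. This is only a sketch under assumptions: the helper name detach_then_rmdir() is hypothetical, and it assumes the program was attached without BPF_F_ALLOW_MULTI, so that BPF_PROG_DETACH needs no program fd. As Pavel points out, it does not help when systemd removes the cgroup on the caller's behalf.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: detach the BPF program of the given attach type
 * from the cgroup at cgroup_path, then remove the cgroup directory.
 * Returns 0 on success, -1 on any failure (errno is left set).
 * Assumption: the program was attached without BPF_F_ALLOW_MULTI.
 */
static int detach_then_rmdir(const char *cgroup_path,
                             enum bpf_attach_type type)
{
    union bpf_attr attr;
    long ret;
    int fd;

    /* The cgroup is identified by an fd on its directory. */
    fd = open(cgroup_path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = fd;
    attr.attach_type = type;

    /* Drop the attachment first, so the program is not pinned by the cgroup. */
    ret = syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
    close(fd);
    if (ret < 0)
        return -1;

    /* Only now remove the (empty) cgroup directory. */
    return rmdir(cgroup_path);
}
```

A caller managing device filters would pass BPF_CGROUP_DEVICE as the attach type; the cgroup must already be empty of processes for the final rmdir() to succeed.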

* Re: [RFC] cgroup gets release after long time
  2019-05-20 19:11               ` Roman Gushchin
@ 2019-05-21  8:00                 ` Pavel Hrdina
  0 siblings, 0 replies; 13+ messages in thread
From: Pavel Hrdina @ 2019-05-21  8:00 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Alexei Starovoitov, Jiri Olsa, Tejun Heo, Li Zefan, Daniel Mack,
	cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller

On Mon, May 20, 2019 at 07:11:39PM +0000, Roman Gushchin wrote:
> On Mon, May 20, 2019 at 10:41:26AM +0200, Pavel Hrdina wrote:
> > On Sat, May 18, 2019 at 12:56:12AM +0000, Roman Gushchin wrote:
> > > On Fri, May 17, 2019 at 12:12:51PM +0200, Pavel Hrdina wrote:
> > > > On Thu, May 16, 2019 at 10:25:50AM -0700, Alexei Starovoitov wrote:
> > > > > On Thu, May 16, 2019 at 10:15 AM Roman Gushchin <guro@fb.com> wrote:
> > > > > >
> > > > > > On Thu, May 16, 2019 at 05:31:44PM +0200, Pavel Hrdina wrote:
> > > > > > > On Thu, May 16, 2019 at 03:22:33PM +0000, Roman Gushchin wrote:
> > > > > > > > On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> > > > > > > > > hi,
> > > > > > > > > Pavel reported an issue with bpf programs (attached to cgroup)
> > > > > > > > > not being released at the time when the cgroup is removed and
> > > > > > > > > are still visible in 'bpftool prog' list afterwards.
> > > > > > > >
> > > > > > > > Hi Jiri!
> > > > > > > >
> > > > > > > > Can you, please, try the patch from
> > > > > > > > https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?
> > > > > > > >
> > > > > > > > It should solve the problem, and I'm about to post it upstream.
> > > > > > >
> > > > > > > Perfect, I'll give it a try with full libvirt setup as well.
> > > > > > >
> > > > > > > Can we have this somehow detectable from user-space so libvirt can
> > > > > > > decide when to use BPF or not?  I would like to avoid using BPF with
> > > > > > > libvirt if this issue is not fixed and we cannot simply work around it
> > > > > > > as systemd automatically removes cgroups for us.
> > > > > >
> > > > > > Hm, I don't think there is a good way to detect it from userspace.
> > > > > > At least I have no good ideas. Alexei? Daniel?
> > > > > >
> > > > > > If you're interested in a particular stable version, we can probably
> > > > > > treat it as a "fix", and backport.
> > > > > 
> > > > > right.
> > > > > also user space workaround is trivial.
> > > > > Just detach before rmdir.
> > > > 
> > > > Well yes, it's trivial but not if you are using machined from systemd.
> > > > Once libvirt kills QEMU process systemd automatically removes the
> > > > cgroup so we don't have any chance to remove the BPF program.
> > > > 
> > > > Would it be too ugly to put something into
> > > > '/sys/kernel/cgroup/features'?
> > > 
> > > I thought about it, but it seems that /sys/kernel/cgroup/features is also
> > > relatively new. So if we're not going to backport it (I mean auto-detaching),
> > > then we can simply look at the kernel version, right?
> > 
> > If you think only about upstream then the version check is in most cases
> > good enough, but usually that's not the case and patches are backported
> > to downstream distributions as well.
> > 
> > Yes, that file was introduced in kernel 4.15 so there are some
> > limitations where the fix would be introspectable.
> > 
> > > If we're going to backport it, the question is which stable version we're
> > > looking at.
> > > 
> > > In general, I don't see any reasons why cgroup/features can't be used.
> > 
> > Perfect, in that case I would prefer if we could export it in
> > cgroup/features as it will be easier for user-space to figure out
> > whether it's safe to rely on proper cleanup behavior or not and
> > it will make downstream distributions' life easier.
> 
> Hello, Pavel!
> 
> Tejun noticed that cgroup features are supposed to match cgroupfs mount options,
> so it can't be used here. And this >= 4.15 limitation is also a significant
> constraint.

Hi Roman,

That's unfortunate, I guess I will have to do the version check.

Thanks for the info.

Pavel
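
A kernel version check of the kind Pavel mentions could be sketched in C as below. The function names are hypothetical, and the threshold is left as a parameter because the thread does not say which release will carry the fix. As noted earlier in the thread, a plain version comparison cannot see fixes backported by downstream distributions, so treat the result as a hint only.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Parse the leading "major.minor" of a release string such as
 * "5.3.0-rc1". Returns 0 on success, -1 if two numbers are not found.
 */
static int parse_release(const char *release, int *major, int *minor)
{
    return sscanf(release, "%d.%d", major, minor) == 2 ? 0 : -1;
}

/*
 * Returns 1 if release is at least want_major.want_minor,
 * 0 if it is older, and -1 if the string cannot be parsed.
 */
static int release_at_least(const char *release,
                            int want_major, int want_minor)
{
    int major, minor;

    if (parse_release(release, &major, &minor) != 0)
        return -1;
    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Same comparison, applied to the running kernel via uname(2). */
static int running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return release_at_least(u.release, want_major, want_minor);
}
```

A caller would invoke running_kernel_at_least() with whichever release ends up containing the fix, and fall back to not using BPF when the result is 0 or -1.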

Thread overview: 13+ messages
2019-05-16 10:39 [RFC] cgroup gets release after long time Jiri Olsa
2019-05-16 15:12 ` Alexei Starovoitov
2019-05-16 15:22 ` Roman Gushchin
2019-05-16 15:26   ` Jiri Olsa
2019-05-16 16:46     ` Roman Gushchin
2019-05-16 15:31   ` Pavel Hrdina
2019-05-16 17:14     ` Roman Gushchin
2019-05-16 17:25       ` Alexei Starovoitov
2019-05-17 10:12         ` Pavel Hrdina
2019-05-18  0:56           ` Roman Gushchin
2019-05-20  8:41             ` Pavel Hrdina
2019-05-20 19:11               ` Roman Gushchin
2019-05-21  8:00                 ` Pavel Hrdina
