* [RFC] cgroup gets release after long time
@ 2019-05-16 10:39 Jiri Olsa
  2019-05-16 15:12 ` Alexei Starovoitov
  2019-05-16 15:22 ` Roman Gushchin
  0 siblings, 2 replies; 13+ messages in thread
From: Jiri Olsa @ 2019-05-16 10:39 UTC
  To: Tejun Heo, Li Zefan, Daniel Mack
  Cc: cgroups, bpf, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Pavel Hrdina

hi,
Pavel reported an issue with bpf programs (attached to a cgroup)
not being released when the cgroup is removed; they are still
visible in the 'bpftool prog' list afterwards.

It seems this is not bpf-specific: I was able to strip the bpf
code from his example and still see a delayed release of the
cgroup.

It happens only on the cgroup2 fs (booted with the
systemd.unified_cgroup_hierarchy=1 kernel command-line option).
Please check the program attached below and the following scenario:

TERM 1
# gcc -o test test.c

			TERM 2
			# cd /sys/kernel/debug/tracing
			# echo 1 > events/cgroup/cgroup_release/enable

TERM 1 -> create and remove group1
# ./test group1
qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)

			TERM 2
			# cat trace_pipe
			<nothing>

TERM 1 -> create and remove group2
# ./test group2
qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)

			TERM 2 - group1 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1

TERM 1 -> create and remove group3
# ./test group3
qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)

			TERM 2 - group2 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
			kworker/22:0-1787  [022] ....  2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2


It looks like the release of the previous cgroup is triggered by
creating another cgroup.  If I don't do anything, the cgroup is
released (as the tracepoint shows) after about 90 seconds.

The cgroup_release tracepoint is triggered in css_release_work_fn,
the same function where cgroup_bpf_put is called, hence the
delayed release of the bpf programs.

Is this expected, or is it configurable somehow? It's confusing to
see all the bpf programs from removed cgroups still around; in
Pavel's setup there are about 100 of them.

Note that I could reproduce this only with qemu-kvm running in the
child process, as in the example below.

thoughts? thanks,
jirka


---
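#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CGROUP_PATH "/sys/fs/cgroup"

int
main(int argc, char **argv)
{
	pid_t pid = -1;
	char path[1024];
	int rc;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <cgroup name>\n", argv[0]);
		return 1;
	}

	pid = fork();

	if (pid < 0) {
		perror("fork");
		return 1;
	} else if (pid == 0) {
		/* child: start qemu-kvm without a display */
		execl("/usr/bin/qemu-kvm",
		      "/usr/bin/qemu-kvm",
		      "-display", "none",
		      (char *)NULL);
		fprintf(stderr, "failed to start qemu process\n");
		_exit(1);
	} else {
		int filefd = -1;
		char proc[1024];

		snprintf(path, sizeof(path), "%s/%s", CGROUP_PATH, argv[1]);

		/* give qemu a moment to start */
		sleep(1);

		if (mkdir(path, 0755) < 0) {
			fprintf(stderr, "failed to create cgroup '%s'\n", path);
			kill(pid, SIGTERM);
			return 1;
		}

		snprintf(proc, sizeof(proc), "%s/cgroup.procs", path);

		/* move the qemu child into the new cgroup */
		filefd = open(proc, O_WRONLY|O_TRUNC);
		if (filefd >= 0) {
			dprintf(filefd, "%d", (int)pid);
			close(filefd);
		} else {
			fprintf(stderr, "failed to open '%s'\n", proc);
		}

		sleep(1);
	}

	/* stop qemu, then retry rmdir while the cgroup is still populated (EBUSY) */
	kill(pid, SIGTERM);
	do {
		rc = rmdir(path);
		if (rc != 0 && errno != EBUSY) {
			fprintf(stderr, "failed to remove cgroup '%s': %s\n",
				path, strerror(errno));
			return 1;
		}
	} while (rc != 0);

	return 0;
}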



Thread overview: 13+ messages
2019-05-16 10:39 [RFC] cgroup gets release after long time Jiri Olsa
2019-05-16 15:12 ` Alexei Starovoitov
2019-05-16 15:22 ` Roman Gushchin
2019-05-16 15:26   ` Jiri Olsa
2019-05-16 16:46     ` Roman Gushchin
2019-05-16 15:31   ` Pavel Hrdina
2019-05-16 17:14     ` Roman Gushchin
2019-05-16 17:25       ` Alexei Starovoitov
2019-05-17 10:12         ` Pavel Hrdina
2019-05-18  0:56           ` Roman Gushchin
2019-05-20  8:41             ` Pavel Hrdina
2019-05-20 19:11               ` Roman Gushchin
2019-05-21  8:00                 ` Pavel Hrdina
