From: Roman Gushchin <guro@fb.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
	Daniel Mack <daniel@zonque.org>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Pavel Hrdina <phrdina@redhat.com>
Subject: Re: [RFC] cgroup gets release after long time
Date: Thu, 16 May 2019 15:22:33 +0000
Message-ID: <20190516152224.GA7163@castle.DHCP.thefacebook.com>
In-Reply-To: <20190516103915.GB27421@krava>

On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote:
> hi,
> Pavel reported an issue with bpf programs (attached to cgroup)
> not being released at the time when the cgroup is removed and
> are still visible in 'bpftool prog' list afterwards.

Hi Jiri!

Could you please try the patch from
https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ?

It should solve the problem, and I'm about to post it upstream.

Thanks!

>
> It seems like this is not bpf specific, because I was able
> to cut the bpf code from his example and still see delayed
> release of cgroup.
>
> It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1
> kernel command line option), please check the attached program
> below and following scenario:
>
> TERM 1
> # gcc -o test test.c
>
> TERM 2
> # cd /sys/kernel/debug/tracing
> # echo 1 > events/cgroup/cgroup_release/enable
>
> TERM 1 -> create and remove cgroup1
> # ./test group1
> qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)
>
> TERM 2
> # cat trace_pipe
> <nothing>
>
> TERM 1 -> create and remove cgroup2
> # ./test group2
> qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)
>
> TERM 2 - group1 being released
> # cat trace_pipe
> kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
>
> TERM 1 -> create and remove cgroup3
> # ./test group3
> qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)
>
> TERM 2 - group2 being released
> # cat trace_pipe
> kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
> kworker/22:0-1787 [022] .... 2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2
>
>
> Looks like the previous cgroup release is triggered by creating
> another cgroup. If I don't do anything the cgroup is released
> (tracepoint shows) in about 90 seconds.
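
To put numbers on the delay, the cgroup_release lines from trace_pipe can be
parsed and their timestamps compared with the time rmdir() returned. Below is
a minimal sketch, not part of the original report. The helper name
parse_cgroup_release is hypothetical, and it assumes the line layout shown in
the trace output above; other kernel versions may print it differently.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one cgroup_release line as printed in trace_pipe, e.g.
 *   kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
 * Stores the timestamp (in seconds) in *ts and the cgroup path in path.
 * Returns 0 on success, -1 if the line does not match.
 * Hypothetical helper; assumes the layout shown above.
 */
int parse_cgroup_release(const char *line, double *ts,
                         char *path, size_t path_len)
{
    const char *ev = strstr(line, ": cgroup_release:");
    const char *start, *p;
    size_t n;

    if (!ev)
        return -1;

    /* The timestamp is the token right before ": cgroup_release:". */
    start = ev;
    while (start > line && start[-1] != ' ')
        start--;
    if (sscanf(start, "%lf", ts) != 1)
        return -1;

    /* The path is the value of the "path=" field. */
    p = strstr(ev, " path=");
    if (!p)
        return -1;
    p += strlen(" path=");

    n = strcspn(p, " \n");
    if (n == 0 || n >= path_len)
        return -1;
    memcpy(path, p, n);
    path[n] = '\0';
    return 0;
}
```

Feeding it each line read from trace_pipe and logging the wall-clock time at
which rmdir() succeeded would show how long each release was deferred.
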
>
> The cgroup_release tracepoint is triggered in css_release_work_fn,
> the same function where the cgroup_bpf_put is called, hence the
> delay in releasing of the bpf programs.
>
> Is this expected or somehow configurable? It's confusing seeing
> all the bpf programs from removed cgroups being around. In Pavel's
> setup it's about 100 of them.
>
> Note, I could reproduce this only with qemu-kvm being run in child
> process in the example below.
>
> thoughts? thanks,
> jirka
>
>
> ---
> #include <fcntl.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define CGROUP_PATH "/sys/fs/cgroup"
>
> int
> main(int argc, char **argv)
> {
>     pid_t pid = -1;
>     char path[1024];
>     int rc;
>
>     pid = fork();
>
>     if (pid == 0) {
>         execl("/usr/bin/qemu-kvm",
>               "/usr/bin/qemu-kvm",
>               "-display", "none",
>               NULL);
>         fprintf(stderr, "failed to start qemu process\n");
>         _exit(-1);
>     } else {
>         int filefd = -1;
>         char proc[1024];
>
>         snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]);
>
>         sleep(1);
>
>         if (mkdir(path, 0755) < 0) {
>             fprintf(stderr, "failed to create cgroup '%s'\n", path);
>             return -1;
>         }
>
>         snprintf(proc, 1024, "%s/cgroup.procs", path);
>
>         filefd = open(proc, O_WRONLY|O_TRUNC);
>         if (filefd > 0) {
>             dprintf(filefd, "%u", pid);
>             close(filefd);
>         }
>
>         sleep(1);
>     }
>
>     if (pid > 0)
>         kill(pid, SIGTERM);
>     do {
>         rc = rmdir(path);
>     } while (rc != 0);
>
>     return 0;
> }
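
The reproducer above ignores a failed open() or write on cgroup.procs and
spins on rmdir() without looking at errno. For anyone adapting it, here is a
minimal sketch of two more defensive helpers. The names cgroup_attach_pid and
rmdir_retry are hypothetical and not part of the original program, and the
10 ms retry interval is an arbitrary choice.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Write pid into <cgroup_dir>/cgroup.procs.
 * Returns 0 on success, -1 on error.
 * Hypothetical helper, not part of the original reproducer.
 */
int cgroup_attach_pid(const char *cgroup_dir, pid_t pid)
{
    char procs[1024];
    char buf[32];
    ssize_t written;
    int fd, len;

    len = snprintf(procs, sizeof(procs), "%s/cgroup.procs", cgroup_dir);
    if (len < 0 || (size_t)len >= sizeof(procs)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = open(procs, O_WRONLY);
    if (fd < 0)    /* 0 is a valid descriptor, so test for < 0 */
        return -1;

    /* pid_t is signed, so format it as a signed integer */
    len = snprintf(buf, sizeof(buf), "%d", (int)pid);
    written = write(fd, buf, (size_t)len);
    close(fd);

    return written == (ssize_t)len ? 0 : -1;
}

/*
 * Retry rmdir() while it fails with EBUSY, sleeping 10 ms between
 * attempts instead of spinning. Any other error is returned at once.
 * Gives up with EBUSY after max_tries attempts.
 */
int rmdir_retry(const char *path, int max_tries)
{
    struct timespec delay = { 0, 10 * 1000 * 1000 };
    int i;

    for (i = 0; i < max_tries; i++) {
        if (rmdir(path) == 0)
            return 0;
        if (errno != EBUSY)
            return -1;
        nanosleep(&delay, NULL);
    }
    errno = EBUSY;
    return -1;
}
```

In the reproducer, the parent would call cgroup_attach_pid(path, pid) after
mkdir() and rmdir_retry(path, ...) after kill(), reporting errno on failure
rather than looping forever.
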