From: Eugene Lubarsky <elubarsky.linux@gmail.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-api@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, adobriyan@gmail.com,
avagin@gmail.com, dsahern@gmail.com
Subject: Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes
Date: Tue, 25 Aug 2020 19:59:09 +1000
Message-ID: <20200825195909.1d1dcd72@eug-lubuntu>
In-Reply-To: <20200810154132.GA4171851@kroah.com>
On Mon, 10 Aug 2020 17:41:32 +0200
Greg KH <gregkh@linuxfoundation.org> wrote:
> On Tue, Aug 11, 2020 at 01:27:00AM +1000, Eugene Lubarsky wrote:
> > On Mon, 10 Aug 2020 17:04:53 +0200
> > Greg KH <gregkh@linuxfoundation.org> wrote:
> And have you benchmarked any of this? Try working with the common
> tools that want this information and see if it actually is noticeable
> (hint, I have been doing that with the readfile work and it's
> surprising what the results are in places...)
Apologies for the delay. Here are some benchmarks with atop.
Patch to atop at: https://github.com/eug48/atop/commits/proc-all
Patch to add /proc/all/schedstat & cpuset below.
The modified atop does not collect per-thread stats or cmdline, as /proc/all/ doesn't support them.
10,000 processes, kernel 5.8, nested KVM, 2 cores of i7-6700HQ @ 2.60GHz
# USE_PROC_ALL=0 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1
01:33:05   %usr  %system  %guest  %wait   %CPU  CPU  Command
01:33:06  33.66    33.66    0.00    0.99  67.33    1  atop
01:33:07  33.00    32.00    0.00    2.00  65.00    0  atop
01:33:08  34.00    31.00    0.00    1.00  65.00    0  atop
...
Average:  33.15    32.79    0.00    1.09  65.94    -  atop
# USE_PROC_ALL=1 ./atop -w test 1 &
# pidstat -p $(pidof atop) 1
01:33:33   %usr  %system  %guest  %wait   %CPU  CPU  Command
01:33:34  28.00    14.00    0.00    1.00  42.00    1  atop
01:33:35  28.00    14.00    0.00    0.00  42.00    1  atop
01:33:36  26.00    13.00    0.00    0.00  39.00    1  atop
...
Average:  27.08    12.86    0.00    0.35  39.94    -  atop
So CPU usage goes down from ~65% to ~40%.
Data collection times in milliseconds are:
# xsv cat columns proc.csv procall.csv \
> | xsv stats \
> | xsv select field,min,max,mean,stddev \
> | xsv table
field           min  max  mean    stddev
/proc time      558  625  586.59  18.29
/proc/all time  231  262  243.56  8.02
There is still plenty of room for optimisation: e.g. the modified atop
uses fgets, which reads only 1KB at a time, and seq_file appears to
return at most 4KB (one page) per read. task_diag should be much faster still.
I'd imagine this sort of thing would be useful for daemons monitoring
large numbers of processes. I don't run such systems myself; my initial
motivation was frustration with the Kubernetes kubelet having ~2-4% CPU
usage even with a couple of containers. Basic profiling suggests syscalls
have a lot to do with it - it's actually reading loads of tiny cgroup files
and enumerating many directories every 10 seconds, but /proc has similar
issues and seemed easier to start with.
Anyway, I've read that io_uring could also help here in the near future,
which would be really cool, especially if there were a way to enumerate
directories and read many files regex-style in a single operation,
e.g. /proc/[0-9].*/(stat|statm|io)
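For comparison, the closest a shell can get to that today is one open/read/close round-trip per file (a sketch, using a glob rather than a regex):

```shell
# Today: one openat()/read()/close() per matching file. The io_uring
# wish above is for a single batched submission covering the same set.
cat /proc/[0-9]*/stat > /dev/null
```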
> > Currently I'm trying to re-use the existing code in fs/proc that
> > controls which PIDs are visible, but may well be missing
> > something..
>
> Try it out and see if it works correctly. And pid namespaces are not
> the only thing these days from what I call :)
>
I've tried `unshare --fork --pid --mount-proc cat /proc/all/stat`
which seems to behave correctly. ptrace flags are handled by the
existing code.
Best Wishes,
Eugene
From 2ffc2e388f7ce4e3f182c2442823e5f13bae03dd Mon Sep 17 00:00:00 2001
From: Eugene Lubarsky <elubarsky.linux@gmail.com>
Date: Tue, 25 Aug 2020 12:36:41 +1000
Subject: [RFC PATCH] fs/proc: /proc/all: add schedstat and cpuset
Signed-off-by: Eugene Lubarsky <elubarsky.linux@gmail.com>
---
fs/proc/base.c | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0bba4b3a985e..44d73f1ade4a 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3944,6 +3944,36 @@ static int proc_all_io(struct seq_file *m, void *v)
}
#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+static int proc_all_cpuset(struct seq_file *m, void *v)
+{
+ struct all_iter *iter = (struct all_iter *) v;
+ struct pid_namespace *ns = iter->ns;
+ struct task_struct *task = iter->tgid_iter.task;
+ struct pid *pid = task->thread_pid;
+
+ seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+ seq_puts(m, " ");
+
+ return proc_cpuset_show(m, ns, pid, task);
+}
+#endif
+
+#ifdef CONFIG_SCHED_INFO
+static int proc_all_schedstat(struct seq_file *m, void *v)
+{
+ struct all_iter *iter = (struct all_iter *) v;
+ struct pid_namespace *ns = iter->ns;
+ struct task_struct *task = iter->tgid_iter.task;
+ struct pid *pid = task->thread_pid;
+
+ seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
+ seq_puts(m, " ");
+
+ return proc_pid_schedstat(m, ns, pid, task);
+}
+#endif
+
static int proc_all_statx(struct seq_file *m, void *v)
{
struct all_iter *iter = (struct all_iter *) v;
@@ -3990,6 +4020,12 @@ PROC_ALL_OPS(status);
#ifdef CONFIG_TASK_IO_ACCOUNTING
PROC_ALL_OPS(io);
#endif
+#ifdef CONFIG_SCHED_INFO
+ PROC_ALL_OPS(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+ PROC_ALL_OPS(cpuset);
+#endif
#define PROC_ALL_CREATE(NAME) \
do { \
@@ -4011,4 +4047,10 @@ void __init proc_all_init(void)
#ifdef CONFIG_TASK_IO_ACCOUNTING
PROC_ALL_CREATE(io);
#endif
+#ifdef CONFIG_SCHED_INFO
+ PROC_ALL_CREATE(schedstat);
+#endif
+#ifdef CONFIG_PROC_PID_CPUSET
+ PROC_ALL_CREATE(cpuset);
+#endif
}
--
2.25.1