From: Davidlohr Bueso <dave@stgolabs.net>
To: akpm@linux-foundation.org
Cc: longman@redhat.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, dave@stgolabs.net,
	Davidlohr Bueso <dbueso@suse.de>
Subject: [PATCH] fs/proc: introduce /proc/stat2 file
Date: Mon, 29 Oct 2018 12:25:21 -0700
Message-ID: <20181029192521.23059-1-dave@stgolabs.net>

A recent report from a large database vendor, which I shall not name,
shows concerns about poor performance when consuming /proc/stat info.
In particular, kstat_irq() shows up in the profiles, and most of the
time is spent there. The system is under heavy irq load and has almost
1k cores, so this comes as little surprise.
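To quantify this kind of cost from userspace, one can time complete reads
of the file. The following is a minimal sketch; avg_read_ns() is a
hypothetical helper, not an existing API, and it assumes a POSIX system
with clock_gettime(CLOCK_MONOTONIC). It measures wall-clock time only,
so it is a rough indicator rather than a substitute for a profile.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): open and fully read the
 * file at 'path' 'iterations' times and return the average wall-clock
 * cost of one open+read+close cycle in nanoseconds, or -1.0 on error.
 */
static double avg_read_ns(const char *path, int iterations)
{
	static char buf[1 << 16];
	struct timespec start, end;

	if (iterations <= 0 || clock_gettime(CLOCK_MONOTONIC, &start))
		return -1.0;

	for (int i = 0; i < iterations; i++) {
		int fd = open(path, O_RDONLY);
		ssize_t n;

		if (fd < 0)
			return -1.0;
		/* Drain the file so the kernel generates its full contents. */
		while ((n = read(fd, buf, sizeof(buf))) > 0)
			;
		close(fd);
		if (n < 0)
			return -1.0;
	}

	if (clock_gettime(CLOCK_MONOTONIC, &end))
		return -1.0;

	return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
		(double)(end.tv_nsec - start.tv_nsec)) / iterations;
}
```

Calling avg_read_ns("/proc/stat", 1000) and, on a kernel carrying this
patch, avg_read_ns("/proc/stat2", 1000) on the same machine gives a
side-by-side comparison of the two files.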

Granted, procfs in general is neither known for its performance nor
designed for it. Some users may be able to work around this
limitation, but others cannot. It is therefore worth offering an
alternative for users that do not need any hard irq information and
care enough about the cost of generating it.

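This patch introduces a new /proc/stat2 file that is identical to the
regular 'stat' except that it zeroes all hard irq statistics: the 'irq'
column of every cpu line and all counts on the "intr" line. The new
file can therefore serve as a drop-in replacement for stat when
performance matters more than irq statistics.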
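Because stat2 keeps the exact layout of stat, an existing parser should
need no changes; only the hard irq values read as zero. A minimal sketch
of such a parser, operating on a buffer already read from either file.
The helper names (find_line, parse_cpu_irq, parse_intr_total) are
hypothetical; the field order assumed for the "cpu" line is user, nice,
system, idle, iowait, irq, softirq, steal, guest, guest_nice.

```c
#include <stdio.h>
#include <string.h>

/* Return a pointer to the first line of 'buf' starting with 'prefix', or NULL. */
static const char *find_line(const char *buf, const char *prefix)
{
	size_t len = strlen(prefix);
	const char *line = buf;

	while (line && *line) {
		if (strncmp(line, prefix, len) == 0)
			return line;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return NULL;
}

/*
 * Hypothetical helper: store the aggregate hard irq time (sixth value
 * on the "cpu" line, in USER_HZ ticks) in *irq. Returns 0 on success,
 * -1 if the line is missing or malformed. Reads as 0 on /proc/stat2.
 */
static int parse_cpu_irq(const char *buf, unsigned long long *irq)
{
	const char *line = find_line(buf, "cpu ");
	unsigned long long skip[5];	/* user nice system idle iowait */

	if (!line)
		return -1;
	if (sscanf(line + 4, "%llu %llu %llu %llu %llu %llu",
		   &skip[0], &skip[1], &skip[2], &skip[3], &skip[4], irq) != 6)
		return -1;
	return 0;
}

/* Hypothetical helper: total interrupt count, the first value on "intr". */
static int parse_intr_total(const char *buf, unsigned long long *total)
{
	const char *line = find_line(buf, "intr ");

	if (!line || sscanf(line + 5, "%llu", total) != 1)
		return -1;
	return 0;
}
```

The same code path handles both files; a consumer that only needs
user/system/idle time, for instance, is unaffected by the switch.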

The stat file itself is left untouched, of course; this approach was
also previously suggested by Waiman:
https://lore.kernel.org/lkml/1524166562-5644-1-git-send-email-longman@redhat.com/
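Since /proc/stat is left unchanged, a consumer that can live without
hard irq data could prefer stat2 and fall back to stat on kernels that
lack it. A minimal sketch; open_stat() is a hypothetical helper, and the
paths are parameters so the fallback logic can be exercised without a
patched kernel.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: open 'preferred' (e.g. "/proc/stat2") and, only
 * if it does not exist, fall back to 'fallback' (e.g. "/proc/stat").
 * Other errors, such as EACCES, are returned to the caller as-is.
 */
static FILE *open_stat(const char *preferred, const char *fallback)
{
	FILE *f = fopen(preferred, "r");

	if (f || errno != ENOENT)
		return f;
	return fopen(fallback, "r");
}
```

Typical use would be open_stat("/proc/stat2", "/proc/stat"), followed by
the same parsing code for either file.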

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 Documentation/filesystems/proc.txt | 12 +++++++---
 fs/proc/stat.c                     | 45 ++++++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 10 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 12a5e6e693b6..563b01decb1e 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -27,7 +27,7 @@ Table of Contents
   1.5	SCSI info
   1.6	Parallel port info in /proc/parport
   1.7	TTY info in /proc/tty
-  1.8	Miscellaneous kernel statistics in /proc/stat
+  1.8	Miscellaneous kernel statistics in /proc/stat and /proc/stat2
   1.9	Ext4 file system parameters
 
   2	Modifying System Parameters
@@ -140,6 +140,7 @@ Table 1-1: Process specific entries in /proc
  mem		Memory held by this process
  root		Link to the root directory of this process
  stat		Process status
+ stat2		Process status without irq information
  statm		Process memory status information
  status		Process status in human readable form
  wchan		Present with CONFIG_KALLSYMS=y: it shows the kernel function
@@ -1301,8 +1302,8 @@ To see  which  tty's  are  currently in use, you can simply look into the file
   unknown              /dev/tty        4    1-63 console 
 
 
-1.8 Miscellaneous kernel statistics in /proc/stat
--------------------------------------------------
+1.8 Miscellaneous kernel statistics in /proc/stat and /proc/stat2
+-----------------------------------------------------------------
 
 Various pieces   of  information about  kernel activity  are  available in the
 /proc/stat file.  All  of  the numbers reported  in  this file are  aggregates
@@ -1371,6 +1372,11 @@ of the possible system softirqs. The first column is the total of all
 softirqs serviced; each subsequent column is the total for that particular
 softirq.
 
+The stat2 file is a faster alternative to /proc/stat for workloads and systems
+that run under heavy irq load and do not need hard irq statistics. To remain
+fully compatible, /proc/stat2 has the same format as /proc/stat, except that
+it reports 0 for every hard irq related field: all values on the "intr" line
+and the 'irq' column of the aggregate "cpu" line and of each "cpuN" line.
 
 1.9 Ext4 file system parameters
 -------------------------------
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index 535eda7857cf..349040270003 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,7 +79,7 @@ static u64 get_iowait_time(int cpu)
 
 #endif
 
-static int show_stat(struct seq_file *p, void *v)
+static int __show_stat(struct seq_file *p, void *v, bool irq_stats)
 {
 	int i, j;
 	u64 user, nice, system, idle, iowait, irq, softirq, steal;
@@ -100,13 +100,17 @@ static int show_stat(struct seq_file *p, void *v)
 		system += kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
 		idle += get_idle_time(i);
 		iowait += get_iowait_time(i);
-		irq += kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
 		softirq += kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ];
 		steal += kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
 		guest += kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
 		guest_nice += kcpustat_cpu(i).cpustat[CPUTIME_GUEST_NICE];
-		sum += kstat_cpu_irqs_sum(i);
-		sum += arch_irq_stat_cpu(i);
+
+		if (irq_stats) {
+			irq += kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
+
+			sum += kstat_cpu_irqs_sum(i);
+			sum += arch_irq_stat_cpu(i);
+		}
 
 		for (j = 0; j < NR_SOFTIRQS; j++) {
 			unsigned int softirq_stat = kstat_softirqs_cpu(j, i);
@@ -115,7 +119,9 @@ static int show_stat(struct seq_file *p, void *v)
 			sum_softirq += softirq_stat;
 		}
 	}
-	sum += arch_irq_stat();
+
+	if (irq_stats)
+		sum += arch_irq_stat();
 
 	seq_put_decimal_ull(p, "cpu  ", nsec_to_clock_t(user));
 	seq_put_decimal_ull(p, " ", nsec_to_clock_t(nice));
@@ -136,7 +142,8 @@ static int show_stat(struct seq_file *p, void *v)
 		system = kcpustat_cpu(i).cpustat[CPUTIME_SYSTEM];
 		idle = get_idle_time(i);
 		iowait = get_iowait_time(i);
-		irq = kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
+		if (irq_stats)
+			irq = kcpustat_cpu(i).cpustat[CPUTIME_IRQ];
 		softirq = kcpustat_cpu(i).cpustat[CPUTIME_SOFTIRQ];
 		steal = kcpustat_cpu(i).cpustat[CPUTIME_STEAL];
 		guest = kcpustat_cpu(i).cpustat[CPUTIME_GUEST];
@@ -158,7 +165,7 @@ static int show_stat(struct seq_file *p, void *v)
 
 	/* sum again ? it could be updated? */
 	for_each_irq_nr(j)
-		seq_put_decimal_ull(p, " ", kstat_irqs_usr(j));
+		seq_put_decimal_ull(p, " ", irq_stats ? kstat_irqs_usr(j) : 0);
 
 	seq_printf(p,
 		"\nctxt %llu\n"
@@ -181,6 +188,16 @@ static int show_stat(struct seq_file *p, void *v)
 	return 0;
 }
 
+static int show_stat(struct seq_file *p, void *v)
+{
+	return __show_stat(p, v, true);
+}
+
+static int show_stat2(struct seq_file *p, void *v)
+{
+	return __show_stat(p, v, false);
+}
+
 static int stat_open(struct inode *inode, struct file *file)
 {
 	unsigned int size = 1024 + 128 * num_online_cpus();
@@ -190,6 +207,12 @@ static int stat_open(struct inode *inode, struct file *file)
 	return single_open_size(file, show_stat, NULL, size);
 }
 
+static int stat2_open(struct inode *inode, struct file *file)
+{
+	unsigned int size = 1024 + 128 * num_online_cpus();
+	return single_open_size(file, show_stat2, NULL, size);
+}
+
 static const struct file_operations proc_stat_operations = {
 	.open		= stat_open,
 	.read		= seq_read,
@@ -197,9 +220,17 @@ static const struct file_operations proc_stat_operations = {
 	.release	= single_release,
 };
 
+static const struct file_operations proc_stat2_operations = {
+	.open		= stat2_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static int __init proc_stat_init(void)
 {
 	proc_create("stat", 0, NULL, &proc_stat_operations);
+	proc_create("stat2", 0, NULL, &proc_stat2_operations);
 	return 0;
 }
 fs_initcall(proc_stat_init);
-- 
2.16.4

