linux-fsdevel.vger.kernel.org archive mirror
* [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
@ 2009-06-08 11:38 alexander.shishckin
  2009-07-29 16:17 ` Alexander Shishkin
  0 siblings, 1 reply; 11+ messages in thread
From: alexander.shishckin @ 2009-06-08 11:38 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alexander Shishkin

From: Alexander Shishkin <alexander.shishckin@gmail.com>

When a file descriptor limit is hit, it might be useful to see all the
users to be able to identify those that leak descriptors.

Signed-off-by: Alexander Shishkin <alexander.shishckin@gmail.com>
---
 fs/file_table.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 54018fe..9e53167 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -136,8 +136,35 @@ struct file *get_empty_filp(void)
 over:
 	/* Ran out of filps - report that */
 	if (get_nr_files() > old_max) {
+		struct task_struct *p;
+		struct files_struct *files;
+		struct fdtable *fdt;
+		int i, count = 0;
+
 		printk(KERN_INFO "VFS: file-max limit %d reached\n",
 					get_max_files());
+
+		read_lock(&tasklist_lock);
+		for_each_process(p) {
+			files = get_files_struct(p);
+			if (!files)
+				continue;
+
+			spin_lock(&files->file_lock);
+			fdt = files_fdtable(files);
+
+			/* we have to actually *count* the fds */
+			for (count = i = 0; i < fdt->max_fds; i++)
+				count += !!fcheck_files(files, i);
+
+			printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
+					p->pid, count);
+
+			spin_unlock(&files->file_lock);
+			put_files_struct(files);
+		}
+		read_unlock(&tasklist_lock);
+
 		old_max = get_nr_files();
 	}
 	goto fail;
-- 
1.6.1.3


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
  2009-06-08 11:38 [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max alexander.shishckin
@ 2009-07-29 16:17 ` Alexander Shishkin
  2009-07-30 12:40   ` Valdis.Kletnieks
  0 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2009-07-29 16:17 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: Alexander Shishkin, akpm, viro, Linux Kernel Mailing List

2009/6/8  <alexander.shishckin@gmail.com>:
> From: Alexander Shishkin <alexander.shishckin@gmail.com>
>
> When a file descriptor limit is hit, it might be useful to see all the
> users to be able to identify those that leak descriptors.
Is there anything dramatically wrong with this one, or could someone
please review this?

> Signed-off-by: Alexander Shishkin <alexander.shishckin@gmail.com>
> ---
>  fs/file_table.c |   27 +++++++++++++++++++++++++++
>  1 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 54018fe..9e53167 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -136,8 +136,35 @@ struct file *get_empty_filp(void)
>  over:
>        /* Ran out of filps - report that */
>        if (get_nr_files() > old_max) {
> +               struct task_struct *p;
> +               struct files_struct *files;
> +               struct fdtable *fdt;
> +               int i, count = 0;
> +
>                printk(KERN_INFO "VFS: file-max limit %d reached\n",
>                                        get_max_files());
> +
> +               read_lock(&tasklist_lock);
> +               for_each_process(p) {
> +                       files = get_files_struct(p);
> +                       if (!files)
> +                               continue;
> +
> +                       spin_lock(&files->file_lock);
> +                       fdt = files_fdtable(files);
> +
> +                       /* we have to actually *count* the fds */
> +                       for (count = i = 0; i < fdt->max_fds; i++)
> +                               count += !!fcheck_files(files, i);
> +
> +                       printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
> +                                       p->pid, count);
> +
> +                       spin_unlock(&files->file_lock);
> +                       put_files_struct(files);
> +               }
> +               read_unlock(&tasklist_lock);
> +
>                old_max = get_nr_files();
>        }
>        goto fail;
> --
> 1.6.1.3
>
>

TIA,
--
Alex


* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
  2009-07-29 16:17 ` Alexander Shishkin
@ 2009-07-30 12:40   ` Valdis.Kletnieks
  2009-10-11 12:17     ` Alexander Shishkin
                       ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Valdis.Kletnieks @ 2009-07-30 12:40 UTC (permalink / raw)
  To: Alexander Shishkin; +Cc: linux-fsdevel, akpm, viro, Linux Kernel Mailing List


On Wed, 29 Jul 2009 19:17:00 +0300, Alexander Shishkin said:
>Is there anything dramatically wrong with this one, or could someone please review this?


> +               for_each_process(p) {
> +                       files = get_files_struct(p);
> +                       if (!files)
> +                               continue;
> +
> +                       spin_lock(&files->file_lock);
> +                       fdt = files_fdtable(files);
> +
> +                       /* we have to actually *count* the fds */
> +                       for (count = i = 0; i < fdt->max_fds; i++)
> +                               count += !!fcheck_files(files, i);
> +
> +                       printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
> +                                       p->pid, count);

1) Splatting out 'count' without a hint of what it is isn't very user-friendly.
Consider something like "=> %s[%d]: open=%d\n" instead, or add a second line
to the 'VFS: file-max' printk to provide a header.

2) What context does this run in, and what locks/scheduling considerations
are there? On a large system with many processes running, this could conceivably
wrap the logmsg buffer before syslog has a chance to get scheduled and read
the stuff out.

3) This can be used by a miscreant to spam the logs - consider a program
that does open() until it hits the limit, then goes into a close()/open()
loop to repeatedly bang up against the limit. Every 2 syscalls by the
abuser could get them another 5,000+ lines in the log - an incredible
amplification factor.

Now, if you fixed it to only print out the top 10 offending processes, it would
make it a lot more useful to the sysadmin, and a lot of those considerations go
away, but it also makes the already N**2 behavior even more expensive...

At that point, it would be good to report some CPU numbers by running an abusive
program that repeatedly hits the limit, and be able to say "Even under full
stress, it only used 15% of a CPU on a 2.4GHz Core 2" or similar...




* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
  2009-07-30 12:40   ` Valdis.Kletnieks
@ 2009-10-11 12:17     ` Alexander Shishkin
  2010-01-10 16:34     ` [RFC][PATCHv2] " Alexander Shishkin
  2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
  2 siblings, 0 replies; 11+ messages in thread
From: Alexander Shishkin @ 2009-10-11 12:17 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-fsdevel, akpm, viro, Linux Kernel Mailing List

2009/7/30  <Valdis.Kletnieks@vt.edu>:
> On Wed, 29 Jul 2009 19:17:00 +0300, Alexander Shishkin said:
>>Is there anything dramatically wrong with this one, or could someone please review this?
>
>
>> +               for_each_process(p) {
>> +                       files = get_files_struct(p);
>> +                       if (!files)
>> +                               continue;
>> +
>> +                       spin_lock(&files->file_lock);
>> +                       fdt = files_fdtable(files);
>> +
>> +                       /* we have to actually *count* the fds */
>> +                       for (count = i = 0; i < fdt->max_fds; i++)
>> +                               count += !!fcheck_files(files, i);
>> +
>> +                       printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
>> +                                       p->pid, count);
>
> 1) Splatting out 'count' without a hint of what it is isn't very user friendly.
> Consider something like "=> %s[%d]: open=%d\n" instead, or add a second line
> to the 'VFS: file-max' printk to provide a header.
Fair enough.

> 2) What context does this run in, and what locks/scheduling considerations
> are there? On a large system with many processes running, this could conceivably
> wrap the logmsg buffer before syslog has a chance to get scheduled and read
> the stuff out.
That's a good point.

> 3) This can be used by a miscreant to spam the logs - consider a program
> that does open() until it hits the limit, then goes into a close()/open()
> loop to repeatedly bang up against the limit. Every 2 syscalls by the
> abuser could get them another 5,000+ lines in the log - an incredible
> amplification factor.
>
> Now, if you fixed it to only print out the top 10 offending processes, it would
> make it a lot more useful to the sysadmin, and a lot of those considerations go
> away, but it also makes the already N**2 behavior even more expensive...
That's a good idea. I think some kind of rate-limiting can be applied here too.

> At that point, it would be good to report some CPU numbers by running a abusive
> program that repeatedly hit the limit, and be able to say "Even under full
> stress, it only used 15% of a CPU on a 2.4Ghz Core2" or similar...
I'll see what I can do.
Thanks for your comments and ideas!

Regards,
--
Alex


* [RFC][PATCHv2] List per-process file descriptor consumption when hitting file-max
  2009-07-30 12:40   ` Valdis.Kletnieks
  2009-10-11 12:17     ` Alexander Shishkin
@ 2010-01-10 16:34     ` Alexander Shishkin
  2010-01-13 22:12       ` Andrew Morton
  2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
  2 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2010-01-10 16:34 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-fsdevel, akpm, Alexander Shishkin, viro

When a file descriptor limit is hit, display the top consumers of
descriptors so that it is possible to identify and fix those which
leak them.

Two new sysctl tunables are introduced:
  * file-max-consumers -- number of processes to display (defaults
    to 10);
  * file-max-rate-limit -- time interval between subsequent dumps
    (defaults to 10 seconds).

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
---
 fs/file_table.c    |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h |    5 +++
 kernel/sysctl.c    |   14 ++++++++
 3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 69652c5..d172e89 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/sort.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/fs.h>
@@ -29,7 +30,8 @@
 
 /* sysctl tunables... */
 struct files_stat_struct files_stat = {
-	.max_files = NR_FILE
+	.max_files = NR_FILE,
+	.max_consumers = NR_CONSUMERS,
 };
 
 /* public. Not pretty! */
@@ -90,6 +92,81 @@ int proc_nr_files(ctl_table *table, int write,
 }
 #endif
 
+/*
+ * Number of open file descriptors per task_struct
+ */
+struct fd_consumer
+{
+	struct task_struct *task;
+	int fd_count;
+};
+
+static int cmp_fd_consumers(const void *a, const void *b)
+{
+	const struct fd_consumer *x = a, *y = b;
+
+	return y->fd_count - x->fd_count;
+}
+
+static void dump_fd_consumers(void)
+{
+	struct task_struct *p;
+	struct files_struct *files;
+	struct fdtable *fdt;
+	int proc_limit = files_stat.max_consumers;
+	int i, nproc;
+	struct fd_consumer *procs, *tmp;
+
+	if (!files_stat.max_consumers)
+		return;
+
+	read_lock(&tasklist_lock);
+
+	/* build an array of per-task file descriptor usage */
+	nproc = nr_processes();
+	procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
+	if (!procs)
+		goto out;
+
+	tmp = procs;
+
+	for_each_process(p) {
+		tmp->task = p;
+
+		files = get_files_struct(p);
+		if (!files)
+			continue;
+
+		spin_lock(&files->file_lock);
+		fdt = files_fdtable(files);
+
+		/* we have to actually *count* the fds */
+		for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
+			tmp->fd_count += !!fcheck_files(files, i);
+
+		spin_unlock(&files->file_lock);
+		put_files_struct(files);
+
+		tmp++;
+	}
+
+	/* sort by number of used descriptor in descending order */
+	sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
+
+	if (proc_limit > nproc)
+		proc_limit = nproc;
+
+	/* output the 'proc_limit' first entries */
+	for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
+		printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
+		       tmp->task->pid, tmp->fd_count);
+
+	kfree(procs);
+
+out:
+	read_unlock(&tasklist_lock);
+}
+
 /* Find an unused file structure and return a pointer to it.
  * Returns NULL, if there are no more free file structures or
  * we run out of memory.
@@ -105,6 +182,7 @@ struct file *get_empty_filp(void)
 	const struct cred *cred = current_cred();
 	static int old_max;
 	struct file * f;
+	static unsigned long next_dump = 0;
 
 	/*
 	 * Privileged users can go above max_files
@@ -140,6 +218,14 @@ over:
 	if (get_nr_files() > old_max) {
 		printk(KERN_INFO "VFS: file-max limit %d reached\n",
 					get_max_files());
+
+		/* dump the biggest file descriptor users */
+		if (!next_dump || time_after(jiffies, next_dump)) {
+			next_dump = jiffies + files_stat.rate_limit;
+
+			dump_fd_consumers();
+		}
+
 		old_max = get_nr_files();
 	}
 	goto fail;
@@ -425,6 +511,8 @@ void __init files_init(unsigned long mempages)
 	files_stat.max_files = n; 
 	if (files_stat.max_files < NR_FILE)
 		files_stat.max_files = NR_FILE;
+
+	files_stat.rate_limit = DUMP_RATE_LIMIT;
 	files_defer_init();
 	percpu_counter_init(&nr_files, 0);
 } 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9147ca8..291beb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,8 @@ struct files_stat_struct {
 	int nr_files;		/* read only */
 	int nr_free_files;	/* read only */
 	int max_files;		/* tunable */
+	int max_consumers;	/* tunable */
+	unsigned long rate_limit;	/* tunable */
 };
 
 struct inodes_stat_t {
@@ -46,6 +48,9 @@ struct inodes_stat_t {
 
 
 #define NR_FILE  8192	/* this can well be larger on a larger system */
+#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
+#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
+						   dumps */
 
 #define MAY_EXEC 1
 #define MAY_WRITE 2
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a68b24..dfb08fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "file-max-consumers",
+		.data		= &files_stat.max_consumers,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "file-max-rate-limit",
+		.data		= &files_stat.rate_limit,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_ms_jiffies_minmax,
+	},
+	{
 		.procname	= "nr_open",
 		.data		= &sysctl_nr_open,
 		.maxlen		= sizeof(int),
-- 
1.6.5



* [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
  2009-07-30 12:40   ` Valdis.Kletnieks
  2009-10-11 12:17     ` Alexander Shishkin
  2010-01-10 16:34     ` [RFC][PATCHv2] " Alexander Shishkin
@ 2010-01-11  9:38     ` Alexander Shishkin
  2010-01-11 12:40       ` Andreas Dilger
                         ` (2 more replies)
  2 siblings, 3 replies; 11+ messages in thread
From: Alexander Shishkin @ 2010-01-11  9:38 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: linux-fsdevel, akpm, linux-kernel, Alexander Shishkin, viro

When a file descriptor limit is hit, display the top consumers of
descriptors so that it is possible to identify and fix those which
leak them.

Two new sysctl tunables are introduced:
  * file-max-consumers -- number of processes to display (defaults
    to 10);
  * file-max-rate-limit -- time interval between subsequent dumps
    (defaults to 10 seconds).

Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
---
Changes:
v3 -- fix a couple of silly checkpatch errors
v2 -- add rate-limiting and reduce number of processes to be output
v1 -- initial implementation.

 fs/file_table.c    |   89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/fs.h |    5 +++
 kernel/sysctl.c    |   14 ++++++++
 3 files changed, 107 insertions(+), 1 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 69652c5..26666fd 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -9,6 +9,7 @@
 #include <linux/slab.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/sort.h>
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/fs.h>
@@ -29,7 +30,8 @@
 
 /* sysctl tunables... */
 struct files_stat_struct files_stat = {
-	.max_files = NR_FILE
+	.max_files = NR_FILE,
+	.max_consumers = NR_CONSUMERS,
 };
 
 /* public. Not pretty! */
@@ -90,6 +92,80 @@ int proc_nr_files(ctl_table *table, int write,
 }
 #endif
 
+/*
+ * Number of open file descriptors per task_struct
+ */
+struct fd_consumer {
+	struct task_struct *task;
+	int fd_count;
+};
+
+static int cmp_fd_consumers(const void *a, const void *b)
+{
+	const struct fd_consumer *x = a, *y = b;
+
+	return y->fd_count - x->fd_count;
+}
+
+static void dump_fd_consumers(void)
+{
+	struct task_struct *p;
+	struct files_struct *files;
+	struct fdtable *fdt;
+	int proc_limit = files_stat.max_consumers;
+	int i, nproc;
+	struct fd_consumer *procs, *tmp;
+
+	if (!files_stat.max_consumers)
+		return;
+
+	read_lock(&tasklist_lock);
+
+	/* build an array of per-task file descriptor usage */
+	nproc = nr_processes();
+	procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
+	if (!procs)
+		goto out;
+
+	tmp = procs;
+
+	for_each_process(p) {
+		tmp->task = p;
+
+		files = get_files_struct(p);
+		if (!files)
+			continue;
+
+		spin_lock(&files->file_lock);
+		fdt = files_fdtable(files);
+
+		/* we have to actually *count* the fds */
+		for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
+			tmp->fd_count += !!fcheck_files(files, i);
+
+		spin_unlock(&files->file_lock);
+		put_files_struct(files);
+
+		tmp++;
+	}
+
+	/* sort by number of used descriptor in descending order */
+	sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
+
+	if (proc_limit > nproc)
+		proc_limit = nproc;
+
+	/* output the 'proc_limit' first entries */
+	for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
+		printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
+		       tmp->task->pid, tmp->fd_count);
+
+	kfree(procs);
+
+out:
+	read_unlock(&tasklist_lock);
+}
+
 /* Find an unused file structure and return a pointer to it.
  * Returns NULL, if there are no more free file structures or
  * we run out of memory.
@@ -105,6 +181,7 @@ struct file *get_empty_filp(void)
 	const struct cred *cred = current_cred();
 	static int old_max;
 	struct file * f;
+	static unsigned long next_dump;
 
 	/*
 	 * Privileged users can go above max_files
@@ -140,6 +217,14 @@ over:
 	if (get_nr_files() > old_max) {
 		printk(KERN_INFO "VFS: file-max limit %d reached\n",
 					get_max_files());
+
+		/* dump the biggest file descriptor users */
+		if (!next_dump || time_after(jiffies, next_dump)) {
+			next_dump = jiffies + files_stat.rate_limit;
+
+			dump_fd_consumers();
+		}
+
 		old_max = get_nr_files();
 	}
 	goto fail;
@@ -425,6 +510,8 @@ void __init files_init(unsigned long mempages)
 	files_stat.max_files = n; 
 	if (files_stat.max_files < NR_FILE)
 		files_stat.max_files = NR_FILE;
+
+	files_stat.rate_limit = DUMP_RATE_LIMIT;
 	files_defer_init();
 	percpu_counter_init(&nr_files, 0);
 } 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9147ca8..291beb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,8 @@ struct files_stat_struct {
 	int nr_files;		/* read only */
 	int nr_free_files;	/* read only */
 	int max_files;		/* tunable */
+	int max_consumers;	/* tunable */
+	unsigned long rate_limit;	/* tunable */
 };
 
 struct inodes_stat_t {
@@ -46,6 +48,9 @@ struct inodes_stat_t {
 
 
 #define NR_FILE  8192	/* this can well be larger on a larger system */
+#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
+#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
+						   dumps */
 
 #define MAY_EXEC 1
 #define MAY_WRITE 2
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a68b24..dfb08fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
 		.proc_handler	= proc_dointvec,
 	},
 	{
+		.procname	= "file-max-consumers",
+		.data		= &files_stat.max_consumers,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
+	{
+		.procname	= "file-max-rate-limit",
+		.data		= &files_stat.rate_limit,
+		.maxlen		= sizeof(unsigned long),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_ms_jiffies_minmax,
+	},
+	{
 		.procname	= "nr_open",
 		.data		= &sysctl_nr_open,
 		.maxlen		= sizeof(int),
-- 
1.6.5



* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
  2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
@ 2010-01-11 12:40       ` Andreas Dilger
  2010-01-13 22:06       ` Andi Kleen
  2010-01-13 22:44       ` Al Viro
  2 siblings, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2010-01-11 12:40 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel, viro

On 2010-01-11, at 05:38, Alexander Shishkin wrote:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
>
> Two new sysctl tunables are introduced:
>  * file-max-consumers -- number of processes to display (defaults
>    to 10);
>  * file-max-rate-limit -- time interval between subsequent dumps
>    (defaults to 10 seconds).

This should default to max_consumers=0 to avoid spamming the logs, IMHO.

> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: viro@zeniv.linux.org.uk
> CC: linux-fsdevel@vger.kernel.org
> ---
> Changes:
> v3 -- fix a couple of silly checkpatch errors
> v2 -- add rate-limiting and reduce number of processes to be output
> v1 -- initial implementation.
>
> fs/file_table.c    |   89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/fs.h |    5 +++
> kernel/sysctl.c    |   14 ++++++++
> 3 files changed, 107 insertions(+), 1 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 69652c5..26666fd 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -9,6 +9,7 @@
> #include <linux/slab.h>
> #include <linux/file.h>
> #include <linux/fdtable.h>
> +#include <linux/sort.h>
> #include <linux/init.h>
> #include <linux/module.h>
> #include <linux/fs.h>
> @@ -29,7 +30,8 @@
>
> /* sysctl tunables... */
> struct files_stat_struct files_stat = {
> -	.max_files = NR_FILE
> +	.max_files = NR_FILE,
> +	.max_consumers = NR_CONSUMERS,
> };
>
> /* public. Not pretty! */
> @@ -90,6 +92,80 @@ int proc_nr_files(ctl_table *table, int write,
> }
> #endif
>
> +/*
> + * Number of open file descriptors per task_struct
> + */
> +struct fd_consumer {
> +	struct task_struct *task;
> +	int fd_count;
> +};
> +
> +static int cmp_fd_consumers(const void *a, const void *b)
> +{
> +	const struct fd_consumer *x = a, *y = b;
> +
> +	return y->fd_count - x->fd_count;
> +}
> +
> +static void dump_fd_consumers(void)
> +{
> +	struct task_struct *p;
> +	struct files_struct *files;
> +	struct fdtable *fdt;
> +	int proc_limit = files_stat.max_consumers;
> +	int i, nproc;
> +	struct fd_consumer *procs, *tmp;
> +
> +	if (!files_stat.max_consumers)
> +		return;
> +
> +	read_lock(&tasklist_lock);
> +
> +	/* build an array of per-task file descriptor usage */
> +	nproc = nr_processes();
> +	procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
> +	if (!procs)
> +		goto out;
> +
> +	tmp = procs;
> +
> +	for_each_process(p) {
> +		tmp->task = p;
> +
> +		files = get_files_struct(p);
> +		if (!files)
> +			continue;
> +
> +		spin_lock(&files->file_lock);
> +		fdt = files_fdtable(files);
> +
> +		/* we have to actually *count* the fds */
> +		for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
> +			tmp->fd_count += !!fcheck_files(files, i);
> +
> +		spin_unlock(&files->file_lock);
> +		put_files_struct(files);
> +
> +		tmp++;
> +	}
> +
> +	/* sort by number of used descriptor in descending order */
> +	sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
> +
> +	if (proc_limit > nproc)
> +		proc_limit = nproc;
> +
> +	/* output the 'proc_limit' first entries */
> +	for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
> +		printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
> +		       tmp->task->pid, tmp->fd_count);
> +
> +	kfree(procs);
> +
> +out:
> +	read_unlock(&tasklist_lock);
> +}
> +
> /* Find an unused file structure and return a pointer to it.
>  * Returns NULL, if there are no more free file structures or
>  * we run out of memory.
> @@ -105,6 +181,7 @@ struct file *get_empty_filp(void)
> 	const struct cred *cred = current_cred();
> 	static int old_max;
> 	struct file * f;
> +	static unsigned long next_dump;
>
> 	/*
> 	 * Privileged users can go above max_files
> @@ -140,6 +217,14 @@ over:
> 	if (get_nr_files() > old_max) {
> 		printk(KERN_INFO "VFS: file-max limit %d reached\n",
> 					get_max_files());
> +
> +		/* dump the biggest file descriptor users */
> +		if (!next_dump || time_after(jiffies, next_dump)) {
> +			next_dump = jiffies + files_stat.rate_limit;
> +
> +			dump_fd_consumers();
> +		}
> +
> 		old_max = get_nr_files();
> 	}
> 	goto fail;
> @@ -425,6 +510,8 @@ void __init files_init(unsigned long mempages)
> 	files_stat.max_files = n;
> 	if (files_stat.max_files < NR_FILE)
> 		files_stat.max_files = NR_FILE;
> +
> +	files_stat.rate_limit = DUMP_RATE_LIMIT;
> 	files_defer_init();
> 	percpu_counter_init(&nr_files, 0);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9147ca8..291beb3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -36,6 +36,8 @@ struct files_stat_struct {
> 	int nr_files;		/* read only */
> 	int nr_free_files;	/* read only */
> 	int max_files;		/* tunable */
> +	int max_consumers;	/* tunable */
> +	unsigned long rate_limit;	/* tunable */
> };
>
> struct inodes_stat_t {
> @@ -46,6 +48,9 @@ struct inodes_stat_t {
>
>
> #define NR_FILE  8192	/* this can well be larger on a larger system */
> +#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
> +#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
> +						   dumps */
>
> #define MAY_EXEC 1
> #define MAY_WRITE 2
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a68b24..dfb08fc 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
> 		.proc_handler	= proc_dointvec,
> 	},
> 	{
> +		.procname	= "file-max-consumers",
> +		.data		= &files_stat.max_consumers,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "file-max-rate-limit",
> +		.data		= &files_stat.rate_limit,
> +		.maxlen		= sizeof(unsigned long),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_ms_jiffies_minmax,
> +	},
> +	{
> 		.procname	= "nr_open",
> 		.data		= &sysctl_nr_open,
> 		.maxlen		= sizeof(int),
> -- 
> 1.6.5
>


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
  2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
  2010-01-11 12:40       ` Andreas Dilger
@ 2010-01-13 22:06       ` Andi Kleen
  2010-01-13 22:44       ` Al Viro
  2 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2010-01-13 22:06 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel, viro

Alexander Shishkin <virtuoso@slind.org> writes:

> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.

Can't you get that information from /proc/*/fd/?

Doing it in user space would be much saner.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [RFC][PATCHv2] List per-process file descriptor consumption when hitting file-max
  2010-01-10 16:34     ` [RFC][PATCHv2] " Alexander Shishkin
@ 2010-01-13 22:12       ` Andrew Morton
  0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2010-01-13 22:12 UTC (permalink / raw)
  To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, viro

On Sun, 10 Jan 2010 18:34:02 +0200
Alexander Shishkin <virtuoso@slind.org> wrote:

> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
> 
> Two new sysctl tunables are introduced:
>   * file-max-consumers -- number of processes to display (defaults
>     to 10);
>   * file-max-rate-limit -- time interval between subsequent dumps
>     (defaults to 10 seconds).
> 
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: viro@zeniv.linux.org.uk
> CC: linux-fsdevel@vger.kernel.org
> ---
>  fs/file_table.c    |   90 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/fs.h |    5 +++
>  kernel/sysctl.c    |   14 ++++++++
>  3 files changed, 108 insertions(+), 1 deletions(-)
> 

Seems to be a lot of code for a fairly obscure requirement.  Can we
justify adding it?  Can't the operator use lsof(8)?

> +struct fd_consumer
> +{

Please use scripts/checkpatch.pl.

> +	struct task_struct *task;
> +	int fd_count;
> +};
> +
> +static int cmp_fd_consumers(const void *a, const void *b)
> +{
> +	const struct fd_consumer *x = a, *y = b;
> +
> +	return y->fd_count - x->fd_count;
> +}
> +
> +static void dump_fd_consumers(void)
> +{
> +	struct task_struct *p;
> +	struct files_struct *files;
> +	struct fdtable *fdt;
> +	int proc_limit = files_stat.max_consumers;
> +	int i, nproc;
> +	struct fd_consumer *procs, *tmp;

Please avoid variables called `tmp'.  Think up a meaningful name.

> +	if (!files_stat.max_consumers)
> +		return;
> +
> +	read_lock(&tasklist_lock);
> +
> +	/* build an array of per-task file descriptor usage */
> +	nproc = nr_processes();
> +	procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);

GFP_KERNEL allocation under tasklist_lock is a bug.  A bug which would
have been revealed during runtime testing had you employed the wise
advice in Documentation/SubmitChecklist!

> +	if (!procs)
> +		goto out;
> +
> +	tmp = procs;
> +
> +	for_each_process(p) {
> +		tmp->task = p;
> +
> +		files = get_files_struct(p);

This does task_lock() under tasklist_lock.  Is that correct?

> +		if (!files)
> +			continue;
> +
> +		spin_lock(&files->file_lock);

Similarly, is this the correct ranking?

Also, is it a _new_ ranking?  Is there any place where we
simultaneously take tasklist_lock and file_lock?  If so, that's a bit
undesirable.

> +		fdt = files_fdtable(files);
> +
> +		/* we have to actually *count* the fds */
> +		for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
> +			tmp->fd_count += !!fcheck_files(files, i);
> +
> +		spin_unlock(&files->file_lock);
> +		put_files_struct(files);
> +
> +		tmp++;
> +	}
> +
> +	/* sort by number of used descriptors in descending order */
> +	sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);

This code can hold tasklist lock for quite a long time.

> +	if (proc_limit > nproc)
> +		proc_limit = nproc;
> +
> +	/* output the 'proc_limit' first entries */
> +	for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
> +		printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
> +		       tmp->task->pid, tmp->fd_count);
> +
> +	kfree(procs);
> +
> +out:
> +	read_unlock(&tasklist_lock);
> +}
> +
>  /* Find an unused file structure and return a pointer to it.
>   * Returns NULL, if there are no more free file structures or
>   * we run out of memory.
> @@ -105,6 +182,7 @@ struct file *get_empty_filp(void)
>  	const struct cred *cred = current_cred();
>  	static int old_max;
>  	struct file * f;
> +	static unsigned long next_dump = 0;
>  
>  	/*
>  	 * Privileged users can go above max_files
> @@ -140,6 +218,14 @@ over:
>  	if (get_nr_files() > old_max) {
>  		printk(KERN_INFO "VFS: file-max limit %d reached\n",
>  					get_max_files());
> +
> +		/* dump the biggest file descriptor users */
> +		if (!next_dump || time_after(jiffies, next_dump)) {
> +			next_dump = jiffies + files_stat.rate_limit;
> +
> +			dump_fd_consumers();
> +		}

Use of __ratelimit() would be appropriate here.

>  		old_max = get_nr_files();
>  	}
>  	goto fail;
> @@ -425,6 +511,8 @@ void __init files_init(unsigned long mempages)
>  	files_stat.max_files = n; 
>  	if (files_stat.max_files < NR_FILE)
>  		files_stat.max_files = NR_FILE;
> +
> +	files_stat.rate_limit = DUMP_RATE_LIMIT;
>  	files_defer_init();
>  	percpu_counter_init(&nr_files, 0);
>  } 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9147ca8..291beb3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -36,6 +36,8 @@ struct files_stat_struct {
>  	int nr_files;		/* read only */
>  	int nr_free_files;	/* read only */
>  	int max_files;		/* tunable */
> +	int max_consumers;	/* tunable */
> +	unsigned long rate_limit;	/* tunable */
>  };
>  
>  struct inodes_stat_t {
> @@ -46,6 +48,9 @@ struct inodes_stat_t {
>  
>  
>  #define NR_FILE  8192	/* this can well be larger on a larger system */
> +#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
> +#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
> +						   dumps */
>  
>  #define MAY_EXEC 1
>  #define MAY_WRITE 2
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a68b24..dfb08fc 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
>  		.proc_handler	= proc_dointvec,
>  	},
>  	{
> +		.procname	= "file-max-consumers",
> +		.data		= &files_stat.max_consumers,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +	},
> +	{
> +		.procname	= "file-max-rate-limit",
> +		.data		= &files_stat.rate_limit,
> +		.maxlen		= sizeof(unsigned long),
> +		.mode		= 0644,
> +		.proc_handler	= proc_doulongvec_ms_jiffies_minmax,
> +	},
> +	{
>  		.procname	= "nr_open",
>  		.data		= &sysctl_nr_open,
>  		.maxlen		= sizeof(int),



* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
  2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
  2010-01-11 12:40       ` Andreas Dilger
  2010-01-13 22:06       ` Andi Kleen
@ 2010-01-13 22:44       ` Al Viro
  2010-01-13 22:57         ` Al Viro
  2 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2010-01-13 22:44 UTC (permalink / raw)
  To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel

On Mon, Jan 11, 2010 at 11:38:07AM +0200, Alexander Shishkin wrote:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
> 
> Two new sysctl tunables are introduced:
>   * file-max-consumers -- number of processes to display (defaults
>     to 10);
>   * file-max-rate-limit -- time interval between subsequent dumps
>     (defaults to 10 seconds).

That *still* doesn't answer the most important question: what for?


* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
  2010-01-13 22:44       ` Al Viro
@ 2010-01-13 22:57         ` Al Viro
  0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2010-01-13 22:57 UTC (permalink / raw)
  To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel

On Wed, Jan 13, 2010 at 10:44:59PM +0000, Al Viro wrote:
> On Mon, Jan 11, 2010 at 11:38:07AM +0200, Alexander Shishkin wrote:
> > When a file descriptor limit is hit, display the top consumers of
> > descriptors so that it is possible to identify and fix those which
> > leak them.
> > 
> > Two new sysctl tunables are introduced:
> >   * file-max-consumers -- number of processes to display (defaults
> >     to 10);
> >   * file-max-rate-limit -- time interval between subsequent dumps
> >     (defaults to 10 seconds).
> 
> That *still* doesn't answer the most important question: what for?

BTW, even leaving that (and obvious deadlocks) aside, this stuff is
monumentally bogus.  A process can easily have shitloads of opened
descriptors and very few opened files (see dup() and friends).  Conversely,
you can have shitloads of opened files and not a single opened descriptor
(see mmap()).  And you are calling that when we have too many opened
struct file.


end of thread

Thread overview: 11+ messages
2009-06-08 11:38 [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max alexander.shishckin
2009-07-29 16:17 ` Alexander Shishkin
2009-07-30 12:40   ` Valdis.Kletnieks
2009-10-11 12:17     ` Alexander Shishkin
2010-01-10 16:34     ` [RFC][PATCHv2] " Alexander Shishkin
2010-01-13 22:12       ` Andrew Morton
2010-01-11  9:38     ` [RFC][PATCHv3] " Alexander Shishkin
2010-01-11 12:40       ` Andreas Dilger
2010-01-13 22:06       ` Andi Kleen
2010-01-13 22:44       ` Al Viro
2010-01-13 22:57         ` Al Viro
