* [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
@ 2009-06-08 11:38 alexander.shishckin
2009-07-29 16:17 ` Alexander Shishkin
0 siblings, 1 reply; 11+ messages in thread
From: alexander.shishckin @ 2009-06-08 11:38 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Alexander Shishkin
From: Alexander Shishkin <alexander.shishckin@gmail.com>
When a file descriptor limit is hit, it might be useful to see all the
users to be able to identify those that leak descriptors.
Signed-off-by: Alexander Shishkin <alexander.shishckin@gmail.com>
---
fs/file_table.c | 27 +++++++++++++++++++++++++++
1 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 54018fe..9e53167 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -136,8 +136,35 @@ struct file *get_empty_filp(void)
over:
/* Ran out of filps - report that */
if (get_nr_files() > old_max) {
+ struct task_struct *p;
+ struct files_struct *files;
+ struct fdtable *fdt;
+ int i, count = 0;
+
printk(KERN_INFO "VFS: file-max limit %d reached\n",
get_max_files());
+
+ read_lock(&tasklist_lock);
+ for_each_process(p) {
+ files = get_files_struct(p);
+ if (!files)
+ continue;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+
+ /* we have to actually *count* the fds */
+ for (count = i = 0; i < fdt->max_fds; i++)
+ count += !!fcheck_files(files, i);
+
+ printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
+ p->pid, count);
+
+ spin_unlock(&files->file_lock);
+ put_files_struct(files);
+ }
+ read_unlock(&tasklist_lock);
+
old_max = get_nr_files();
}
goto fail;
--
1.6.1.3
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
2009-06-08 11:38 [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max alexander.shishckin
@ 2009-07-29 16:17 ` Alexander Shishkin
2009-07-30 12:40 ` Valdis.Kletnieks
0 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2009-07-29 16:17 UTC (permalink / raw)
To: linux-fsdevel; +Cc: Alexander Shishkin, akpm, viro, Linux Kernel Mailing List
2009/6/8 <alexander.shishckin@gmail.com>:
> From: Alexander Shishkin <alexander.shishckin@gmail.com>
>
> When a file descriptor limit is hit, it might be useful to see all the
> users to be able to identify those that leak descriptors.
Is there anything dramatically wrong with this one, or could someone
please review this?
> Signed-off-by: Alexander Shishkin <alexander.shishckin@gmail.com>
> ---
> fs/file_table.c | 27 +++++++++++++++++++++++++++
> 1 files changed, 27 insertions(+), 0 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 54018fe..9e53167 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -136,8 +136,35 @@ struct file *get_empty_filp(void)
> over:
> /* Ran out of filps - report that */
> if (get_nr_files() > old_max) {
> + struct task_struct *p;
> + struct files_struct *files;
> + struct fdtable *fdt;
> + int i, count = 0;
> +
> printk(KERN_INFO "VFS: file-max limit %d reached\n",
> get_max_files());
> +
> + read_lock(&tasklist_lock);
> + for_each_process(p) {
> + files = get_files_struct(p);
> + if (!files)
> + continue;
> +
> + spin_lock(&files->file_lock);
> + fdt = files_fdtable(files);
> +
> + /* we have to actually *count* the fds */
> + for (count = i = 0; i < fdt->max_fds; i++)
> + count += !!fcheck_files(files, i);
> +
> + printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
> + p->pid, count);
> +
> + spin_unlock(&files->file_lock);
> + put_files_struct(files);
> + }
> + read_unlock(&tasklist_lock);
> +
> old_max = get_nr_files();
> }
> goto fail;
> --
> 1.6.1.3
>
>
TIA,
--
Alex
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
2009-07-29 16:17 ` Alexander Shishkin
@ 2009-07-30 12:40 ` Valdis.Kletnieks
2009-10-11 12:17 ` Alexander Shishkin
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Valdis.Kletnieks @ 2009-07-30 12:40 UTC (permalink / raw)
To: Alexander Shishkin; +Cc: linux-fsdevel, akpm, viro, Linux Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 2055 bytes --]
On Wed, 29 Jul 2009 19:17:00 +0300, Alexander Shishkin said:
>Is there anything dramatically wrong with this one, or could someone please review this?
> + for_each_process(p) {
> + files = get_files_struct(p);
> + if (!files)
> + continue;
> +
> + spin_lock(&files->file_lock);
> + fdt = files_fdtable(files);
> +
> + /* we have to actually *count* the fds */
> + for (count = i = 0; i < fdt->max_fds; i++)
> + count += !!fcheck_files(files, i);
> +
> + printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
> + p->pid, count);
1) Splatting out 'count' without a hint of what it is isn't very user friendly.
Consider something like "=> %s[%d]: open=%d\n" instead, or add a second line
to the 'VFS: file-max' printk to provide a header.
2) What context does this run in, and what locks/scheduling considerations
are there? On a large system with many processes running, this could conceivably
wrap the logmsg buffer before syslog has a chance to get scheduled and read
the stuff out.
3) This can be used by a miscreant to spam the logs - consider a program
that does open() until it hits the limit, then goes into a close()/open()
loop to repeatedly bang up against the limit. Every 2 syscalls by the
abuser could get them another 5,000+ lines in the log - an incredible
amplification factor.
Now, if you fixed it to only print out the top 10 offending processes, it would
make it a lot more useful to the sysadmin, and a lot of those considerations go
away, but it also makes the already N**2 behavior even more expensive...
At that point, it would be good to report some CPU numbers by running an abusive
program that repeatedly hits the limit, and be able to say "Even under full
stress, it only used 15% of a CPU on a 2.4GHz Core2" or similar...
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max
2009-07-30 12:40 ` Valdis.Kletnieks
@ 2009-10-11 12:17 ` Alexander Shishkin
2010-01-10 16:34 ` [RFC][PATCHv2] " Alexander Shishkin
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
2 siblings, 0 replies; 11+ messages in thread
From: Alexander Shishkin @ 2009-10-11 12:17 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-fsdevel, akpm, viro, Linux Kernel Mailing List
2009/7/30 <Valdis.Kletnieks@vt.edu>:
> On Wed, 29 Jul 2009 19:17:00 +0300, Alexander Shishkin said:
>>Is there anything dramatically wrong with this one, or could someone please review this?
>
>
>> + for_each_process(p) {
>> + files = get_files_struct(p);
>> + if (!files)
>> + continue;
>> +
>> + spin_lock(&files->file_lock);
>> + fdt = files_fdtable(files);
>> +
>> + /* we have to actually *count* the fds */
>> + for (count = i = 0; i < fdt->max_fds; i++)
>> + count += !!fcheck_files(files, i);
>> +
>> + printk(KERN_INFO "=> %s [%d]: %d\n", p->comm,
>> + p->pid, count);
>
> 1) Splatting out 'count' without a hint of what it is isn't very user friendly.
> Consider something like "=> %s[%d]: open=%d\n" instead, or add a second line
> to the 'VFS: file-max' printk to provide a header.
Fair enough.
> 2) What context does this run in, and what locks/scheduling considerations
> are there? On a large system with many processes running, this could conceivably
> wrap the logmsg buffer before syslog has a chance to get scheduled and read
> the stuff out.
That's a good point.
> 3) This can be used by a miscreant to spam the logs - consider a program
> that does open() until it hits the limit, then goes into a close()/open()
> loop to repeatedly bang up against the limit. Every 2 syscalls by the
> abuser could get them another 5,000+ lines in the log - an incredible
> amplification factor.
>
> Now, if you fixed it to only print out the top 10 offending processes, it would
> make it a lot more useful to the sysadmin, and a lot of those considerations go
> away, but it also makes the already N**2 behavior even more expensive...
That's a good idea. I think some kind of rate-limiting can be applied here too.
> At that point, it would be good to report some CPU numbers by running an abusive
> program that repeatedly hits the limit, and be able to say "Even under full
> stress, it only used 15% of a CPU on a 2.4GHz Core2" or similar...
I'll see what I can do.
Thanks for your comments and ideas!
Regards,
--
Alex
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC][PATCHv2] List per-process file descriptor consumption when hitting file-max
2009-07-30 12:40 ` Valdis.Kletnieks
2009-10-11 12:17 ` Alexander Shishkin
@ 2010-01-10 16:34 ` Alexander Shishkin
2010-01-13 22:12 ` Andrew Morton
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
2 siblings, 1 reply; 11+ messages in thread
From: Alexander Shishkin @ 2010-01-10 16:34 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: linux-fsdevel, akpm, Alexander Shishkin, viro
When a file descriptor limit is hit, display the top consumers of
descriptors so that it is possible to identify and fix those which
leak them.
Two new sysctl tunables are introduced:
* file-max-consumers -- number of processes to display (defaults
to 10);
* file-max-rate-limit -- time interval between subsequent dumps
(defaults to 10 seconds).
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
---
fs/file_table.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/fs.h | 5 +++
kernel/sysctl.c | 14 ++++++++
3 files changed, 108 insertions(+), 1 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 69652c5..d172e89 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -9,6 +9,7 @@
#include <linux/slab.h>
#include <linux/file.h>
#include <linux/fdtable.h>
+#include <linux/sort.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
@@ -29,7 +30,8 @@
/* sysctl tunables... */
struct files_stat_struct files_stat = {
- .max_files = NR_FILE
+ .max_files = NR_FILE,
+ .max_consumers = NR_CONSUMERS,
};
/* public. Not pretty! */
@@ -90,6 +92,81 @@ int proc_nr_files(ctl_table *table, int write,
}
#endif
+/*
+ * Number of open file descriptors per task_struct
+ */
+struct fd_consumer
+{
+ struct task_struct *task;
+ int fd_count;
+};
+
+static int cmp_fd_consumers(const void *a, const void *b)
+{
+ const struct fd_consumer *x = a, *y = b;
+
+ return y->fd_count - x->fd_count;
+}
+
+static void dump_fd_consumers(void)
+{
+ struct task_struct *p;
+ struct files_struct *files;
+ struct fdtable *fdt;
+ int proc_limit = files_stat.max_consumers;
+ int i, nproc;
+ struct fd_consumer *procs, *tmp;
+
+ if (!files_stat.max_consumers)
+ return;
+
+ read_lock(&tasklist_lock);
+
+ /* build an array of per-task file descriptor usage */
+ nproc = nr_processes();
+ procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
+ if (!procs)
+ goto out;
+
+ tmp = procs;
+
+ for_each_process(p) {
+ tmp->task = p;
+
+ files = get_files_struct(p);
+ if (!files)
+ continue;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+
+ /* we have to actually *count* the fds */
+ for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
+ tmp->fd_count += !!fcheck_files(files, i);
+
+ spin_unlock(&files->file_lock);
+ put_files_struct(files);
+
+ tmp++;
+ }
+
+ /* sort by number of used descriptors in descending order */
+ sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
+
+ if (proc_limit > nproc)
+ proc_limit = nproc;
+
+ /* output the 'proc_limit' first entries */
+ for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
+ printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
+ tmp->task->pid, tmp->fd_count);
+
+ kfree(procs);
+
+out:
+ read_unlock(&tasklist_lock);
+}
+
/* Find an unused file structure and return a pointer to it.
* Returns NULL, if there are no more free file structures or
* we run out of memory.
@@ -105,6 +182,7 @@ struct file *get_empty_filp(void)
const struct cred *cred = current_cred();
static int old_max;
struct file * f;
+ static unsigned long next_dump = 0;
/*
* Privileged users can go above max_files
@@ -140,6 +218,14 @@ over:
if (get_nr_files() > old_max) {
printk(KERN_INFO "VFS: file-max limit %d reached\n",
get_max_files());
+
+ /* dump the biggest file descriptor users */
+ if (!next_dump || time_after(jiffies, next_dump)) {
+ next_dump = jiffies + files_stat.rate_limit;
+
+ dump_fd_consumers();
+ }
+
old_max = get_nr_files();
}
goto fail;
@@ -425,6 +511,8 @@ void __init files_init(unsigned long mempages)
files_stat.max_files = n;
if (files_stat.max_files < NR_FILE)
files_stat.max_files = NR_FILE;
+
+ files_stat.rate_limit = DUMP_RATE_LIMIT;
files_defer_init();
percpu_counter_init(&nr_files, 0);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9147ca8..291beb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,8 @@ struct files_stat_struct {
int nr_files; /* read only */
int nr_free_files; /* read only */
int max_files; /* tunable */
+ int max_consumers; /* tunable */
+ unsigned long rate_limit; /* tunable */
};
struct inodes_stat_t {
@@ -46,6 +48,9 @@ struct inodes_stat_t {
#define NR_FILE 8192 /* this can well be larger on a larger system */
+#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
+#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
+ dumps */
#define MAY_EXEC 1
#define MAY_WRITE 2
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a68b24..dfb08fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_dointvec,
},
{
+ .procname = "file-max-consumers",
+ .data = &files_stat.max_consumers,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "file-max-rate-limit",
+ .data = &files_stat.rate_limit,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_ms_jiffies_minmax,
+ },
+ {
.procname = "nr_open",
.data = &sysctl_nr_open,
.maxlen = sizeof(int),
--
1.6.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
2009-07-30 12:40 ` Valdis.Kletnieks
2009-10-11 12:17 ` Alexander Shishkin
2010-01-10 16:34 ` [RFC][PATCHv2] " Alexander Shishkin
@ 2010-01-11 9:38 ` Alexander Shishkin
2010-01-11 12:40 ` Andreas Dilger
` (2 more replies)
2 siblings, 3 replies; 11+ messages in thread
From: Alexander Shishkin @ 2010-01-11 9:38 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: linux-fsdevel, akpm, linux-kernel, Alexander Shishkin, viro
When a file descriptor limit is hit, display the top consumers of
descriptors so that it is possible to identify and fix those which
leak them.
Two new sysctl tunables are introduced:
* file-max-consumers -- number of processes to display (defaults
to 10);
* file-max-rate-limit -- time interval between subsequent dumps
(defaults to 10 seconds).
Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
---
Changes:
v3 -- fix a couple of silly checkpatch errors
v2 -- add rate-limiting and reduce number of processes to be output
v1 -- initial implementation.
fs/file_table.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
include/linux/fs.h | 5 +++
kernel/sysctl.c | 14 ++++++++
3 files changed, 107 insertions(+), 1 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 69652c5..26666fd 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -9,6 +9,7 @@
#include <linux/slab.h>
#include <linux/file.h>
#include <linux/fdtable.h>
+#include <linux/sort.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/fs.h>
@@ -29,7 +30,8 @@
/* sysctl tunables... */
struct files_stat_struct files_stat = {
- .max_files = NR_FILE
+ .max_files = NR_FILE,
+ .max_consumers = NR_CONSUMERS,
};
/* public. Not pretty! */
@@ -90,6 +92,80 @@ int proc_nr_files(ctl_table *table, int write,
}
#endif
+/*
+ * Number of open file descriptors per task_struct
+ */
+struct fd_consumer {
+ struct task_struct *task;
+ int fd_count;
+};
+
+static int cmp_fd_consumers(const void *a, const void *b)
+{
+ const struct fd_consumer *x = a, *y = b;
+
+ return y->fd_count - x->fd_count;
+}
+
+static void dump_fd_consumers(void)
+{
+ struct task_struct *p;
+ struct files_struct *files;
+ struct fdtable *fdt;
+ int proc_limit = files_stat.max_consumers;
+ int i, nproc;
+ struct fd_consumer *procs, *tmp;
+
+ if (!files_stat.max_consumers)
+ return;
+
+ read_lock(&tasklist_lock);
+
+ /* build an array of per-task file descriptor usage */
+ nproc = nr_processes();
+ procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
+ if (!procs)
+ goto out;
+
+ tmp = procs;
+
+ for_each_process(p) {
+ tmp->task = p;
+
+ files = get_files_struct(p);
+ if (!files)
+ continue;
+
+ spin_lock(&files->file_lock);
+ fdt = files_fdtable(files);
+
+ /* we have to actually *count* the fds */
+ for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
+ tmp->fd_count += !!fcheck_files(files, i);
+
+ spin_unlock(&files->file_lock);
+ put_files_struct(files);
+
+ tmp++;
+ }
+
+ /* sort by number of used descriptors in descending order */
+ sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
+
+ if (proc_limit > nproc)
+ proc_limit = nproc;
+
+ /* output the 'proc_limit' first entries */
+ for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
+ printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
+ tmp->task->pid, tmp->fd_count);
+
+ kfree(procs);
+
+out:
+ read_unlock(&tasklist_lock);
+}
+
/* Find an unused file structure and return a pointer to it.
* Returns NULL, if there are no more free file structures or
* we run out of memory.
@@ -105,6 +181,7 @@ struct file *get_empty_filp(void)
const struct cred *cred = current_cred();
static int old_max;
struct file * f;
+ static unsigned long next_dump;
/*
* Privileged users can go above max_files
@@ -140,6 +217,14 @@ over:
if (get_nr_files() > old_max) {
printk(KERN_INFO "VFS: file-max limit %d reached\n",
get_max_files());
+
+ /* dump the biggest file descriptor users */
+ if (!next_dump || time_after(jiffies, next_dump)) {
+ next_dump = jiffies + files_stat.rate_limit;
+
+ dump_fd_consumers();
+ }
+
old_max = get_nr_files();
}
goto fail;
@@ -425,6 +510,8 @@ void __init files_init(unsigned long mempages)
files_stat.max_files = n;
if (files_stat.max_files < NR_FILE)
files_stat.max_files = NR_FILE;
+
+ files_stat.rate_limit = DUMP_RATE_LIMIT;
files_defer_init();
percpu_counter_init(&nr_files, 0);
}
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9147ca8..291beb3 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -36,6 +36,8 @@ struct files_stat_struct {
int nr_files; /* read only */
int nr_free_files; /* read only */
int max_files; /* tunable */
+ int max_consumers; /* tunable */
+ unsigned long rate_limit; /* tunable */
};
struct inodes_stat_t {
@@ -46,6 +48,9 @@ struct inodes_stat_t {
#define NR_FILE 8192 /* this can well be larger on a larger system */
+#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
+#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
+ dumps */
#define MAY_EXEC 1
#define MAY_WRITE 2
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 8a68b24..dfb08fc 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
.proc_handler = proc_dointvec,
},
{
+ .procname = "file-max-consumers",
+ .data = &files_stat.max_consumers,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "file-max-rate-limit",
+ .data = &files_stat.rate_limit,
+ .maxlen = sizeof(unsigned long),
+ .mode = 0644,
+ .proc_handler = proc_doulongvec_ms_jiffies_minmax,
+ },
+ {
.procname = "nr_open",
.data = &sysctl_nr_open,
.maxlen = sizeof(int),
--
1.6.5
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
@ 2010-01-11 12:40 ` Andreas Dilger
2010-01-13 22:06 ` Andi Kleen
2010-01-13 22:44 ` Al Viro
2 siblings, 0 replies; 11+ messages in thread
From: Andreas Dilger @ 2010-01-11 12:40 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel, viro
On 2010-01-11, at 05:38, Alexander Shishkin wrote:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
>
> Two new sysctl tunables are introduced:
> * file-max-consumers -- number of processes to display (defaults
> to 10);
> * file-max-rate-limit -- time interval between subsequent dumps
> (defaults to 10 seconds).
This should default to max_consumers=0 to avoid spamming the logs, IMHO.
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: viro@zeniv.linux.org.uk
> CC: linux-fsdevel@vger.kernel.org
> ---
> Changes:
> v3 -- fix a couple of silly checkpatch errors
> v2 -- add rate-limiting and reduce number of processes to be output
> v1 -- initial implementation.
>
> fs/file_table.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/fs.h | 5 +++
> kernel/sysctl.c | 14 ++++++++
> 3 files changed, 107 insertions(+), 1 deletions(-)
>
> diff --git a/fs/file_table.c b/fs/file_table.c
> index 69652c5..26666fd 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -9,6 +9,7 @@
> #include <linux/slab.h>
> #include <linux/file.h>
> #include <linux/fdtable.h>
> +#include <linux/sort.h>
> #include <linux/init.h>
> #include <linux/module.h>
> #include <linux/fs.h>
> @@ -29,7 +30,8 @@
>
> /* sysctl tunables... */
> struct files_stat_struct files_stat = {
> - .max_files = NR_FILE
> + .max_files = NR_FILE,
> + .max_consumers = NR_CONSUMERS,
> };
>
> /* public. Not pretty! */
> @@ -90,6 +92,80 @@ int proc_nr_files(ctl_table *table, int write,
> }
> #endif
>
> +/*
> + * Number of open file descriptors per task_struct
> + */
> +struct fd_consumer {
> + struct task_struct *task;
> + int fd_count;
> +};
> +
> +static int cmp_fd_consumers(const void *a, const void *b)
> +{
> + const struct fd_consumer *x = a, *y = b;
> +
> + return y->fd_count - x->fd_count;
> +}
> +
> +static void dump_fd_consumers(void)
> +{
> + struct task_struct *p;
> + struct files_struct *files;
> + struct fdtable *fdt;
> + int proc_limit = files_stat.max_consumers;
> + int i, nproc;
> + struct fd_consumer *procs, *tmp;
> +
> + if (!files_stat.max_consumers)
> + return;
> +
> + read_lock(&tasklist_lock);
> +
> + /* build an array of per-task file descriptor usage */
> + nproc = nr_processes();
> + procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
> + if (!procs)
> + goto out;
> +
> + tmp = procs;
> +
> + for_each_process(p) {
> + tmp->task = p;
> +
> + files = get_files_struct(p);
> + if (!files)
> + continue;
> +
> + spin_lock(&files->file_lock);
> + fdt = files_fdtable(files);
> +
> + /* we have to actually *count* the fds */
> + for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
> + tmp->fd_count += !!fcheck_files(files, i);
> +
> + spin_unlock(&files->file_lock);
> + put_files_struct(files);
> +
> + tmp++;
> + }
> +
> + /* sort by number of used descriptors in descending order */
> + sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
> +
> + if (proc_limit > nproc)
> + proc_limit = nproc;
> +
> + /* output the 'proc_limit' first entries */
> + for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
> + printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
> + tmp->task->pid, tmp->fd_count);
> +
> + kfree(procs);
> +
> +out:
> + read_unlock(&tasklist_lock);
> +}
> +
> /* Find an unused file structure and return a pointer to it.
> * Returns NULL, if there are no more free file structures or
> * we run out of memory.
> @@ -105,6 +181,7 @@ struct file *get_empty_filp(void)
> const struct cred *cred = current_cred();
> static int old_max;
> struct file * f;
> + static unsigned long next_dump;
>
> /*
> * Privileged users can go above max_files
> @@ -140,6 +217,14 @@ over:
> if (get_nr_files() > old_max) {
> printk(KERN_INFO "VFS: file-max limit %d reached\n",
> get_max_files());
> +
> + /* dump the biggest file descriptor users */
> + if (!next_dump || time_after(jiffies, next_dump)) {
> + next_dump = jiffies + files_stat.rate_limit;
> +
> + dump_fd_consumers();
> + }
> +
> old_max = get_nr_files();
> }
> goto fail;
> @@ -425,6 +510,8 @@ void __init files_init(unsigned long mempages)
> files_stat.max_files = n;
> if (files_stat.max_files < NR_FILE)
> files_stat.max_files = NR_FILE;
> +
> + files_stat.rate_limit = DUMP_RATE_LIMIT;
> files_defer_init();
> percpu_counter_init(&nr_files, 0);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9147ca8..291beb3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -36,6 +36,8 @@ struct files_stat_struct {
> int nr_files; /* read only */
> int nr_free_files; /* read only */
> int max_files; /* tunable */
> + int max_consumers; /* tunable */
> + unsigned long rate_limit; /* tunable */
> };
>
> struct inodes_stat_t {
> @@ -46,6 +48,9 @@ struct inodes_stat_t {
>
>
> #define NR_FILE 8192 /* this can well be larger on a larger system */
> +#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
> +#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
> + dumps */
>
> #define MAY_EXEC 1
> #define MAY_WRITE 2
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a68b24..dfb08fc 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
> .proc_handler = proc_dointvec,
> },
> {
> + .procname = "file-max-consumers",
> + .data = &files_stat.max_consumers,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> + {
> + .procname = "file-max-rate-limit",
> + .data = &files_stat.rate_limit,
> + .maxlen = sizeof(unsigned long),
> + .mode = 0644,
> + .proc_handler = proc_doulongvec_ms_jiffies_minmax,
> + },
> + {
> .procname = "nr_open",
> .data = &sysctl_nr_open,
> .maxlen = sizeof(int),
> --
> 1.6.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-
> fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
2010-01-11 12:40 ` Andreas Dilger
@ 2010-01-13 22:06 ` Andi Kleen
2010-01-13 22:44 ` Al Viro
2 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2010-01-13 22:06 UTC (permalink / raw)
To: Alexander Shishkin
Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel, viro
Alexander Shishkin <virtuoso@slind.org> writes:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
Can't you get that information from /proc/*/fd/ ?
Doing it in user space would be much saner.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCHv2] List per-process file descriptor consumption when hitting file-max
2010-01-10 16:34 ` [RFC][PATCHv2] " Alexander Shishkin
@ 2010-01-13 22:12 ` Andrew Morton
0 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2010-01-13 22:12 UTC (permalink / raw)
To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, viro
On Sun, 10 Jan 2010 18:34:02 +0200
Alexander Shishkin <virtuoso@slind.org> wrote:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
>
> Two new sysctl tunables are introduced:
> * file-max-consumers -- number of processes to display (defaults
> to 10);
> * file-max-rate-limit -- time interval between subsequent dumps
> (defaults to 10 seconds).
>
> Signed-off-by: Alexander Shishkin <virtuoso@slind.org>
> CC: viro@zeniv.linux.org.uk
> CC: linux-fsdevel@vger.kernel.org
> ---
> fs/file_table.c | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> include/linux/fs.h | 5 +++
> kernel/sysctl.c | 14 ++++++++
> 3 files changed, 108 insertions(+), 1 deletions(-)
>
Seems to be a lot of code for a fairly obscure requirement. Can we
justify adding it? Can't the operator use lsof(8)?
> +struct fd_consumer
> +{
Please use scripts/checkpatch.pl.
> + struct task_struct *task;
> + int fd_count;
> +};
> +
> +static int cmp_fd_consumers(const void *a, const void *b)
> +{
> + const struct fd_consumer *x = a, *y = b;
> +
> + return y->fd_count - x->fd_count;
> +}
> +
> +static void dump_fd_consumers(void)
> +{
> + struct task_struct *p;
> + struct files_struct *files;
> + struct fdtable *fdt;
> + int proc_limit = files_stat.max_consumers;
> + int i, nproc;
> + struct fd_consumer *procs, *tmp;
Please avoid variables called `tmp'. Think up a meaningful name.
> + if (!files_stat.max_consumers)
> + return;
> +
> + read_lock(&tasklist_lock);
> +
> + /* build an array of per-task file descriptor usage */
> + nproc = nr_processes();
> + procs = kzalloc(nproc * sizeof(struct fd_consumer), GFP_KERNEL);
GFP_KERNEL allocation under tasklist_lock is a bug. A bug which would
have been revealed during runtime testing had you employed the wise
advice in Documentation/SubmitChecklist!
> + if (!procs)
> + goto out;
> +
> + tmp = procs;
> +
> + for_each_process(p) {
> + tmp->task = p;
> +
> + files = get_files_struct(p);
This does task_lock() under tasklist_lock. Is that correct?
> + if (!files)
> + continue;
> +
> + spin_lock(&files->file_lock);
Similarly, is this the correct ranking?
Also, is it a _new_ ranking? Is there any place where we
simultaneously take tasklist_lock and file_lock? If so, that's a bit
undesirable.
> + fdt = files_fdtable(files);
> +
> + /* we have to actually *count* the fds */
> + for (tmp->fd_count = i = 0; i < fdt->max_fds; i++)
> + tmp->fd_count += !!fcheck_files(files, i);
> +
> + spin_unlock(&files->file_lock);
> + put_files_struct(files);
> +
> + tmp++;
> + }
> +
> + /* sort by number of used descriptors in descending order */
> + sort(procs, nproc, sizeof(struct fd_consumer), cmp_fd_consumers, NULL);
This code can hold tasklist lock for quite a long time.
> + if (proc_limit > nproc)
> + proc_limit = nproc;
> +
> + /* output the 'proc_limit' first entries */
> + for (i = 0, tmp = procs; i < proc_limit; i++, tmp++)
> + printk(KERN_INFO "=> %s [%d]: open=%d\n", tmp->task->comm,
> + tmp->task->pid, tmp->fd_count);
> +
> + kfree(procs);
> +
> +out:
> + read_unlock(&tasklist_lock);
> +}
> +
> /* Find an unused file structure and return a pointer to it.
> * Returns NULL, if there are no more free file structures or
> * we run out of memory.
> @@ -105,6 +182,7 @@ struct file *get_empty_filp(void)
> const struct cred *cred = current_cred();
> static int old_max;
> struct file * f;
> + static unsigned long next_dump = 0;
>
> /*
> * Privileged users can go above max_files
> @@ -140,6 +218,14 @@ over:
> if (get_nr_files() > old_max) {
> printk(KERN_INFO "VFS: file-max limit %d reached\n",
> get_max_files());
> +
> + /* dump the biggest file descriptor users */
> + if (!next_dump || time_after(jiffies, next_dump)) {
> + next_dump = jiffies + files_stat.rate_limit;
> +
> + dump_fd_consumers();
> + }
Use of __ratelimit() would be appropriate here.
> old_max = get_nr_files();
> }
> goto fail;
> @@ -425,6 +511,8 @@ void __init files_init(unsigned long mempages)
> files_stat.max_files = n;
> if (files_stat.max_files < NR_FILE)
> files_stat.max_files = NR_FILE;
> +
> + files_stat.rate_limit = DUMP_RATE_LIMIT;
> files_defer_init();
> percpu_counter_init(&nr_files, 0);
> }
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9147ca8..291beb3 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -36,6 +36,8 @@ struct files_stat_struct {
> int nr_files; /* read only */
> int nr_free_files; /* read only */
> int max_files; /* tunable */
> + int max_consumers; /* tunable */
> + unsigned long rate_limit; /* tunable */
> };
>
> struct inodes_stat_t {
> @@ -46,6 +48,9 @@ struct inodes_stat_t {
>
>
> #define NR_FILE 8192 /* this can well be larger on a larger system */
> +#define NR_CONSUMERS 10 /* dump this many tasks when file-max is hit */
> +#define DUMP_RATE_LIMIT msecs_to_jiffies(10000) /* wait this long between
> + dumps */
>
> #define MAY_EXEC 1
> #define MAY_WRITE 2
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index 8a68b24..dfb08fc 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -1325,6 +1325,20 @@ static struct ctl_table fs_table[] = {
> .proc_handler = proc_dointvec,
> },
> {
> + .procname = "file-max-consumers",
> + .data = &files_stat.max_consumers,
> + .maxlen = sizeof(int),
> + .mode = 0644,
> + .proc_handler = proc_dointvec,
> + },
> + {
> + .procname = "file-max-rate-limit",
> + .data = &files_stat.rate_limit,
> + .maxlen = sizeof(unsigned long),
> + .mode = 0644,
> + .proc_handler = proc_doulongvec_ms_jiffies_minmax,
> + },
> + {
> .procname = "nr_open",
> .data = &sysctl_nr_open,
> .maxlen = sizeof(int),
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
2010-01-11 12:40 ` Andreas Dilger
2010-01-13 22:06 ` Andi Kleen
@ 2010-01-13 22:44 ` Al Viro
2010-01-13 22:57 ` Al Viro
2 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2010-01-13 22:44 UTC (permalink / raw)
To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel
On Mon, Jan 11, 2010 at 11:38:07AM +0200, Alexander Shishkin wrote:
> When a file descriptor limit is hit, display the top consumers of
> descriptors so that it is possible to identify and fix those which
> leak them.
>
> Two new sysctl tunables are introduced:
> * file-max-consumers -- number of processes to display (defaults
> to 10);
> * file-max-rate-limit -- time interval between subsequent dumps
> (defaults to 10 seconds).
That *still* doesn't answer the most important question: what for?

* Re: [RFC][PATCHv3] List per-process file descriptor consumption when hitting file-max
2010-01-13 22:44 ` Al Viro
@ 2010-01-13 22:57 ` Al Viro
0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2010-01-13 22:57 UTC (permalink / raw)
To: Alexander Shishkin; +Cc: Valdis.Kletnieks, linux-fsdevel, akpm, linux-kernel
On Wed, Jan 13, 2010 at 10:44:59PM +0000, Al Viro wrote:
> On Mon, Jan 11, 2010 at 11:38:07AM +0200, Alexander Shishkin wrote:
> > When a file descriptor limit is hit, display the top consumers of
> > descriptors so that it is possible to identify and fix those which
> > leak them.
> >
> > Two new sysctl tunables are introduced:
> > * file-max-consumers -- number of processes to display (defaults
> > to 10);
> > * file-max-rate-limit -- time interval between subsequent dumps
> > (defaults to 10 seconds).
>
> That *still* doesn't answer the most important question: what for?
BTW, even leaving that (and obvious deadlocks) aside, this stuff is
monumentally bogus. A process can easily have shitloads of opened
descriptors and very few opened files (see dup() and friends). Conversely,
you can have shitloads of opened files and not a single opened descriptor
(see mmap()). And you are calling that when we have too many opened
struct file.
end of thread, other threads:[~2010-01-13 22:57 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
2009-06-08 11:38 [PATCH] [RFC] List per-process file descriptor consumption when hitting file-max alexander.shishckin
2009-07-29 16:17 ` Alexander Shishkin
2009-07-30 12:40 ` Valdis.Kletnieks
2009-10-11 12:17 ` Alexander Shishkin
2010-01-10 16:34 ` [RFC][PATCHv2] " Alexander Shishkin
2010-01-13 22:12 ` Andrew Morton
2010-01-11 9:38 ` [RFC][PATCHv3] " Alexander Shishkin
2010-01-11 12:40 ` Andreas Dilger
2010-01-13 22:06 ` Andi Kleen
2010-01-13 22:44 ` Al Viro
2010-01-13 22:57 ` Al Viro