From: Andrew Morton <akpm@osdl.org>
To: dipankar@in.ibm.com
Cc: linux-kernel@vger.kernel.org, paulmck@us.ibm.com, dada1@cosmosbay.com
Subject: Re: [PATCH 2/2] fix file counting
Date: Sat, 18 Feb 2006 01:04:14 -0800 [thread overview]
Message-ID: <20060218010414.1f8d6782.akpm@osdl.org> (raw)
In-Reply-To: <20060217154626.GN29846@in.ibm.com>
Dipankar Sarma <dipankar@in.ibm.com> wrote:
>
> I have benchmarked this on an x86_64 NUMA system and see no
> significant performance difference on kernbench. Tested on
> both x86_64 and powerpc.
>
> The way we do file struct accounting is not very suitable for batched
> freeing. For scalability reasons, file accounting was constructor/destructor
> based. This meant that nr_files was decremented only when
> the object was removed from the slab cache. This is
> susceptible to slab fragmentation. With RCU based file structure,
> consequent batched freeing and a test program like Serge's,
> we just speed this up and end up with a very fragmented slab -
>
> llm22:~ # cat /proc/sys/fs/file-nr
> 587730 0 758844
>
> At the same time, I see only a 2000+ objects in filp cache.
> The following patch I fixes this problem.
>
> This patch changes the file counting by removing the filp_count_lock.
> Instead we use a separate percpu counter, nr_files, for now and all
> accesses to it are through get_nr_files() api. In the sysctl
> handler for nr_files, we populate files_stat.nr_files before returning
> to user.
>
> Counting files as an when they are created and destroyed (as opposed
> to inside slab) allows us to correctly count open files with RCU.
Fair enough.
What do you think of these changes?
From: Andrew Morton <akpm@osdl.org>
- Nuke the blank line between "}" and EXPORT_SYMBOL(). That's never seemed
pointful to me.
- Make the get_max_files export use _GPL - only unix.ko uses it.
- Use `-1' in the arg to percpu_counter_mod() rather than `-1L'. The
compiler will dtrt and we shouldn't be peering inside percpu_counter
internals here anyway.
- Scrub that - use percpu_counter_dec() and percpu_counter_inc().
- percpu_counters can be inaccurate on big SMP. Before we actually fail a
get_empty_filp() attempt, use the (new in -mm) expensive
percpu_counter_sum() to check whether we're really over the limit.
- Make get_nr_files() static. Which is just as well - any callers might
want the percpu_counter_sum() treatment. In which case we'd be better off
exporting some
bool are_files_over_limit(how_many_i_want);
API.
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---
fs/file_table.c | 21 +++++++++++++--------
include/linux/fs.h | 1 -
2 files changed, 13 insertions(+), 9 deletions(-)
diff -puN fs/file_table.c~fix-file-counting-fixes fs/file_table.c
--- devel/fs/file_table.c~fix-file-counting-fixes 2006-02-18 01:02:43.000000000 -0800
+++ devel-akpm/fs/file_table.c 2006-02-18 01:02:43.000000000 -0800
@@ -22,6 +22,7 @@
#include <linux/fsnotify.h>
#include <linux/sysctl.h>
#include <linux/percpu_counter.h>
+
#include <asm/atomic.h>
/* sysctl tunables... */
@@ -42,14 +43,14 @@ static inline void file_free_rcu(struct
static inline void file_free(struct file *f)
{
- percpu_counter_mod(&nr_files, -1L);
+ percpu_counter_dec(&nr_files);
call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
}
/*
* Return the total number of open files in the system
*/
-int get_nr_files(void)
+static int get_nr_files(void)
{
return percpu_counter_read_positive(&nr_files);
}
@@ -61,8 +62,7 @@ int get_max_files(void)
{
return files_stat.max_files;
}
-
-EXPORT_SYMBOL(get_max_files);
+EXPORT_SYMBOL_GPL(get_max_files);
/*
* Handle nr_files sysctl
@@ -95,15 +95,20 @@ struct file *get_empty_filp(void)
/*
* Privileged users can go above max_files
*/
- if (get_nr_files() >= files_stat.max_files &&
- !capable(CAP_SYS_ADMIN))
- goto over;
+ if (get_nr_files() >= files_stat.max_files && !capable(CAP_SYS_ADMIN)) {
+ /*
+ * percpu_counters are inaccurate. Do an expensive check before
+ * we go and fail.
+ */
+ if (percpu_counter_sum(&nr_files) >= files_stat.max_files)
+ goto over;
+ }
f = kmem_cache_alloc(filp_cachep, GFP_KERNEL);
if (f == NULL)
goto fail;
- percpu_counter_mod(&nr_files, 1L);
+ percpu_counter_inc(&nr_files);
memset(f, 0, sizeof(*f));
if (security_file_alloc(f))
goto fail_sec;
diff -puN include/linux/fs.h~fix-file-counting-fixes include/linux/fs.h
--- devel/include/linux/fs.h~fix-file-counting-fixes 2006-02-18 01:02:43.000000000 -0800
+++ devel-akpm/include/linux/fs.h 2006-02-18 01:02:43.000000000 -0800
@@ -35,7 +35,6 @@ struct files_stat_struct {
int max_files; /* tunable */
};
extern struct files_stat_struct files_stat;
-extern int get_nr_files(void);
extern int get_max_files(void);
struct inodes_stat_t {
_
next prev parent reply other threads:[~2006-02-18 9:05 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2006-02-17 15:41 [PATCH 0/2] RCU updates Dipankar Sarma
2006-02-17 15:43 ` [PATCH 1/2] rcu batch tuning Dipankar Sarma
2006-02-17 15:46 ` [PATCH 2/2] fix file counting Dipankar Sarma
2006-02-18 9:04 ` Andrew Morton [this message]
2006-02-18 9:25 ` Dipankar Sarma
2006-02-18 9:45 ` Andrew Morton
2006-02-18 10:06 ` Dipankar Sarma
2006-02-18 10:10 ` Andrew Morton
2006-02-18 10:44 ` Dipankar Sarma
2006-02-18 12:14 ` Christoph Hellwig
2006-02-18 12:31 ` Arjan van de Ven
2006-02-20 22:36 ` [2.6 patch] make UNIX a bool Adrian Bunk
2006-02-17 20:33 ` [PATCH 1/2] rcu batch tuning Dipankar Sarma
2006-02-18 8:45 ` Andrew Morton
2006-02-18 9:15 ` Dipankar Sarma
-- strict thread matches above, loose matches on Subject: below --
2006-01-26 18:40 [patch 0/2] RCU: fix various latency/oom issues Dipankar Sarma
2006-01-26 18:41 ` [patch 1/2] rcu batch tuning Dipankar Sarma
2006-01-26 18:42 ` [patch 2/2] fix file counting Dipankar Sarma
2006-01-26 20:15 ` Eric Dumazet
2006-01-27 22:54 ` Andrew Morton
2006-01-27 23:14 ` Paul E. McKenney
2006-01-27 23:28 ` Andrew Morton
2006-01-28 18:42 ` Dipankar Sarma
2006-01-28 18:51 ` Andrew Morton
2006-01-28 19:10 ` Dipankar Sarma
2006-01-30 17:00 ` Dipankar Sarma
2006-01-31 10:33 ` Eric Dumazet
2006-01-31 20:19 ` Dipankar Sarma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20060218010414.1f8d6782.akpm@osdl.org \
--to=akpm@osdl.org \
--cc=dada1@cosmosbay.com \
--cc=dipankar@in.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=paulmck@us.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).