linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrew Morton <akpm@osdl.org>
To: dipankar@in.ibm.com
Cc: linux-kernel@vger.kernel.org, paulmck@us.ibm.com, dada1@cosmosbay.com
Subject: Re: [PATCH 2/2] fix file counting
Date: Sat, 18 Feb 2006 01:04:14 -0800	[thread overview]
Message-ID: <20060218010414.1f8d6782.akpm@osdl.org> (raw)
In-Reply-To: <20060217154626.GN29846@in.ibm.com>

Dipankar Sarma <dipankar@in.ibm.com> wrote:
>
> I have benchmarked this on an x86_64 NUMA system and see no
>  significant performance difference on kernbench. Tested on
>  both x86_64 and powerpc.
> 
>  The way we do file struct accounting is not very suitable for batched
>  freeing. For scalability reasons, file accounting was constructor/destructor
>  based. This meant that nr_files was decremented only when
>  the object was removed from the slab cache. This is
>  susceptible to slab fragmentation. With RCU based file structure,
>  consequent batched freeing and a test program like Serge's,
>  we just speed this up and end up with a very fragmented slab -
> 
>  llm22:~ # cat /proc/sys/fs/file-nr
>  587730  0       758844
> 
>  At the same time, I see only a 2000+ objects in filp cache.
>  The following patch I fixes this problem. 
> 
>  This patch changes the file counting by removing the filp_count_lock.
>  Instead we use a separate percpu counter, nr_files, for now and all
>  accesses to it are through get_nr_files() api. In the sysctl
>  handler for nr_files, we populate files_stat.nr_files before returning
>  to user.
> 
>  Counting files as an when they are created and destroyed (as opposed
>  to inside slab) allows us to correctly count open files with RCU.

Fair enough.

What do you think of these changes?




From: Andrew Morton <akpm@osdl.org>

- Nuke the blank line between "}" and EXPORT_SYMBOL().  That's never seemed
  pointful to me.

- Make the get_max_files export use _GPL - only unix.ko uses it.

- Use `-1' in the arg to percpu_counter_mod() rather than `-1L'.  The
  compiler will dtrt and we shouldn't be peering inside percpu_counter
  internals here anyway.

- Scrub that - use percpu_counter_dec() and percpu_counter_inc().

- percpu_counters can be inaccurate on big SMP.  Before we actually fail a
  get_empty_filp() attempt, use the (new in -mm) expensive
  percpu_counter_sum() to check whether we're really over the limit.

- Make get_nr_files() static.  Which is just as well - any callers might
  want the percpu_counter_sum() treatment.  In which case we'd be better off
  exporting some

	bool are_files_over_limit(how_many_i_want);

  API.

Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 fs/file_table.c    |   21 +++++++++++++--------
 include/linux/fs.h |    1 -
 2 files changed, 13 insertions(+), 9 deletions(-)

diff -puN fs/file_table.c~fix-file-counting-fixes fs/file_table.c
--- devel/fs/file_table.c~fix-file-counting-fixes	2006-02-18 01:02:43.000000000 -0800
+++ devel-akpm/fs/file_table.c	2006-02-18 01:02:43.000000000 -0800
@@ -22,6 +22,7 @@
 #include <linux/fsnotify.h>
 #include <linux/sysctl.h>
 #include <linux/percpu_counter.h>
+
 #include <asm/atomic.h>
 
 /* sysctl tunables... */
@@ -42,14 +43,14 @@ static inline void file_free_rcu(struct 
 
 static inline void file_free(struct file *f)
 {
-	percpu_counter_mod(&nr_files, -1L);
+	percpu_counter_dec(&nr_files);
 	call_rcu(&f->f_u.fu_rcuhead, file_free_rcu);
 }
 
 /*
  * Return the total number of open files in the system
  */
-int get_nr_files(void)
+static int get_nr_files(void)
 {
 	return percpu_counter_read_positive(&nr_files);
 }
@@ -61,8 +62,7 @@ int get_max_files(void)
 {
 	return files_stat.max_files;
 }
-
-EXPORT_SYMBOL(get_max_files);
+EXPORT_SYMBOL_GPL(get_max_files);
 
 /*
  * Handle nr_files sysctl
@@ -95,15 +95,20 @@ struct file *get_empty_filp(void)
 	/*
 	 * Privileged users can go above max_files
 	 */
-	if (get_nr_files() >= files_stat.max_files &&
-				!capable(CAP_SYS_ADMIN))
-		goto over;
+	if (get_nr_files() >= files_stat.max_files && !capable(CAP_SYS_ADMIN)) {
+		/*
+		 * percpu_counters are inaccurate.  Do an expensive check before
+		 * we go and fail.
+		 */
+		if (percpu_counter_sum(&nr_files) >= files_stat.max_files)
+			goto over;
+	}
 
 	f = kmem_cache_alloc(filp_cachep, GFP_KERNEL);
 	if (f == NULL)
 		goto fail;
 
-	percpu_counter_mod(&nr_files, 1L);
+	percpu_counter_inc(&nr_files);
 	memset(f, 0, sizeof(*f));
 	if (security_file_alloc(f))
 		goto fail_sec;
diff -puN include/linux/fs.h~fix-file-counting-fixes include/linux/fs.h
--- devel/include/linux/fs.h~fix-file-counting-fixes	2006-02-18 01:02:43.000000000 -0800
+++ devel-akpm/include/linux/fs.h	2006-02-18 01:02:43.000000000 -0800
@@ -35,7 +35,6 @@ struct files_stat_struct {
 	int max_files;		/* tunable */
 };
 extern struct files_stat_struct files_stat;
-extern int get_nr_files(void);
 extern int get_max_files(void);
 
 struct inodes_stat_t {
_


  reply	other threads:[~2006-02-18  9:05 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-17 15:41 [PATCH 0/2] RCU updates Dipankar Sarma
2006-02-17 15:43 ` [PATCH 1/2] rcu batch tuning Dipankar Sarma
2006-02-17 15:46   ` [PATCH 2/2] fix file counting Dipankar Sarma
2006-02-18  9:04     ` Andrew Morton [this message]
2006-02-18  9:25       ` Dipankar Sarma
2006-02-18  9:45         ` Andrew Morton
2006-02-18 10:06           ` Dipankar Sarma
2006-02-18 10:10             ` Andrew Morton
2006-02-18 10:44               ` Dipankar Sarma
2006-02-18 12:14         ` Christoph Hellwig
2006-02-18 12:31           ` Arjan van de Ven
2006-02-20 22:36             ` [2.6 patch] make UNIX a bool Adrian Bunk
2006-02-17 20:33   ` [PATCH 1/2] rcu batch tuning Dipankar Sarma
2006-02-18  8:45   ` Andrew Morton
2006-02-18  9:15     ` Dipankar Sarma
  -- strict thread matches above, loose matches on Subject: below --
2006-01-26 18:40 [patch 0/2] RCU: fix various latency/oom issues Dipankar Sarma
2006-01-26 18:41 ` [patch 1/2] rcu batch tuning Dipankar Sarma
2006-01-26 18:42   ` [patch 2/2] fix file counting Dipankar Sarma
2006-01-26 20:15     ` Eric Dumazet
2006-01-27 22:54       ` Andrew Morton
2006-01-27 23:14         ` Paul E. McKenney
2006-01-27 23:28           ` Andrew Morton
2006-01-28 18:42             ` Dipankar Sarma
2006-01-28 18:51               ` Andrew Morton
2006-01-28 19:10                 ` Dipankar Sarma
2006-01-30 17:00             ` Dipankar Sarma
2006-01-31 10:33               ` Eric Dumazet
2006-01-31 20:19                 ` Dipankar Sarma

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20060218010414.1f8d6782.akpm@osdl.org \
    --to=akpm@osdl.org \
    --cc=dada1@cosmosbay.com \
    --cc=dipankar@in.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=paulmck@us.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).