* [PATCH v2 0/4] Per-container dcache size limitation
@ 2011-08-05  0:35 Glauber Costa
  2011-08-05  0:35 ` [PATCH v2 1/4] factor out single-shrinker code Glauber Costa
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-05  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, containers, Pavel Emelyanov, Al Viro,
	Hugh Dickins, Nick Piggin, Andrea Arcangeli, Rik van Riel,
	Dave Hansen, James Bottomley, David Chinner, Glauber Costa

Hi,

Since v1, there is not much new here.
I'm incorporating David's suggestion of calling the sb
shrinker, which will, in effect, prune the icache and
other sb-related objects as well.

I am also keeping the mount-based interface, since I
still believe it is the way to go. But I'm obviously
still open to suggestions. Some small corrections
were also made to it since v1. Specifically, bind
mounts are not allowed to alter the original sb's dcache
size.
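
As a rough illustration only (not part of the series): this is how a
container manager could apply the limit from user space, through the
mount(2) data string that patch 4/4 teaches the VFS to parse. The
device, mount point and limit value below are made up for the example.

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Limit this superblock's dcache to ~100k dentries. */
	if (mount("/dev/sdb1", "/containers/ct1", "ext4", 0,
		  "vfs_dcache_size=100000") < 0) {
		perror("mount");
		return 1;
	}

	/*
	 * A bind mount of the same tree is not allowed to change the
	 * limit; it keeps the original superblock's setting.
	 */
	mount("/containers/ct1", "/containers/ct1-view", NULL,
	      MS_BIND, NULL);
	return 0;
}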

Glauber Costa (4):
  factor out single-shrinker code
  Keep nr_dentry per super block
  limit nr_dentries per superblock
  parse options in the vfs level

 fs/dcache.c              |   44 +++++++++++-
 fs/namespace.c           |  105 ++++++++++++++++++++++++++
 fs/super.c               |   16 ++++-
 include/linux/dcache.h   |    4 +
 include/linux/fs.h       |    3 +
 include/linux/shrinker.h |    6 ++
 mm/vmscan.c              |  185 ++++++++++++++++++++++++----------------------
 7 files changed, 274 insertions(+), 89 deletions(-)

-- 
1.7.6


* [PATCH v2 1/4] factor out single-shrinker code
  2011-08-05  0:35 [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
@ 2011-08-05  0:35 ` Glauber Costa
  2011-08-05  0:35 ` [PATCH v2 3/4] limit nr_dentries per superblock Glauber Costa
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-05  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, containers, Pavel Emelyanov, Al Viro,
	Hugh Dickins, Nick Piggin, Andrea Arcangeli, Rik van Riel,
	Dave Hansen, James Bottomley, David Chinner, Glauber Costa

While shrinking our caches, vmscan.c walks through all
registered shrinkers, trying to free objects as it goes.

We would like to do that individually for some caches,
like the dcache, when certain conditions apply (for
example, when we reach the maximum allowed size introduced
later in this series).

To avoid rewriting the same logic in more than one place,
this patch factors the shrink logic out into shrink_one_shrinker(),
which we can call from other places in the kernel.
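
As an illustration of how the new helper is meant to be used (this is
not part of the patch, and my_shrinker/my_cache_count()/MY_LIMIT are
made-up names): a subsystem that registered its own shrinker could now
reclaim from just that one cache when it runs past a private limit,
without triggering a full shrink_slab() pass.

static int my_cache_make_room(void)
{
	struct shrink_control sc = {
		.gfp_mask = GFP_KERNEL,
	};

	while (my_cache_count() >= MY_LIMIT) {
		/*
		 * With nr_pages_scanned=1 and lru_pages=0 the computed
		 * scan target covers the whole cache, the same trick
		 * patch 3/4 uses for the dcache.
		 */
		if (!shrink_one_shrinker(&my_shrinker, &sc, 1, 0))
			return -ENOMEM;	/* nothing could be freed */
	}
	return 0;
}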

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Dave Chinner <david@fromorbit.com>
---
 include/linux/shrinker.h |    6 ++
 mm/vmscan.c              |  185 ++++++++++++++++++++++++----------------------
 2 files changed, 104 insertions(+), 87 deletions(-)

diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 790651b..c5db650 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -39,4 +39,10 @@ struct shrinker {
 #define DEFAULT_SEEKS 2 /* A good number if you don't know better. */
 extern void register_shrinker(struct shrinker *);
 extern void unregister_shrinker(struct shrinker *);
+
+unsigned long shrink_one_shrinker(struct shrinker *shrinker,
+				  struct shrink_control *shrink,
+				  unsigned long nr_pages_scanned,                      
+				  unsigned long lru_pages);
+
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7ef6912..50dfc61 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -211,6 +211,102 @@ static inline int do_shrinker_shrink(struct shrinker *shrinker,
 }
 
 #define SHRINK_BATCH 128
+unsigned long shrink_one_shrinker(struct shrinker *shrinker,
+				  struct shrink_control *shrink,
+				  unsigned long nr_pages_scanned,
+				  unsigned long lru_pages)
+{
+	unsigned long ret = 0;
+	unsigned long long delta;
+	unsigned long total_scan;
+	unsigned long max_pass;
+	int shrink_ret = 0;
+	long nr;
+	long new_nr;
+	long batch_size = shrinker->batch ? shrinker->batch
+					  : SHRINK_BATCH;
+
+	/*
+	 * copy the current shrinker scan count into a local variable
+	 * and zero it so that other concurrent shrinker invocations
+	 * don't also do this scanning work.
+	 */
+	do {
+		nr = shrinker->nr;
+	} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
+
+	total_scan = nr;
+	max_pass = do_shrinker_shrink(shrinker, shrink, 0);
+	delta = (4 * nr_pages_scanned) / shrinker->seeks;
+	delta *= max_pass;
+	do_div(delta, lru_pages + 1);
+	total_scan += delta;
+	if (total_scan < 0) {
+		printk(KERN_ERR "shrink_slab: %pF negative objects to "
+		       "delete nr=%ld\n",
+		       shrinker->shrink, total_scan);
+		total_scan = max_pass;
+	}
+
+	/*
+	 * We need to avoid excessive windup on filesystem shrinkers
+	 * due to large numbers of GFP_NOFS allocations causing the
+	 * shrinkers to return -1 all the time. This results in a large
+	 * nr being built up so when a shrink that can do some work
+	 * comes along it empties the entire cache due to nr >>>
+	 * max_pass.  This is bad for sustaining a working set in
+	 * memory.
+	 *
+	 * Hence only allow the shrinker to scan the entire cache when
+	 * a large delta change is calculated directly.
+	 */
+	if (delta < max_pass / 4)
+		total_scan = min(total_scan, max_pass / 2);
+
+	/*
+	 * Avoid risking looping forever due to too large nr value:
+	 * never try to free more than twice the estimate number of
+	 * freeable entries.
+	 */
+	if (total_scan > max_pass * 2)
+		total_scan = max_pass * 2;
+
+	trace_mm_shrink_slab_start(shrinker, shrink, nr,
+				nr_pages_scanned, lru_pages,
+				max_pass, delta, total_scan);
+
+	while (total_scan >= batch_size) {
+		int nr_before;
+
+		nr_before = do_shrinker_shrink(shrinker, shrink, 0);
+		shrink_ret = do_shrinker_shrink(shrinker, shrink,
+						batch_size);
+		if (shrink_ret == -1)
+			break;
+		if (shrink_ret < nr_before)
+			ret += nr_before - shrink_ret;
+		count_vm_events(SLABS_SCANNED, batch_size);
+		total_scan -= batch_size;
+
+		cond_resched();
+	}
+
+	/*
+	 * move the unused scan count back into the shrinker in a
+	 * manner that handles concurrent updates. If we exhausted the
+	 * scan, there is no need to do an update.
+	 */
+	do {
+		nr = shrinker->nr;
+		new_nr = total_scan + nr;
+		if (total_scan <= 0)
+			break;
+	} while (cmpxchg(&shrinker->nr, nr, new_nr) != nr);
+
+	trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
+	return ret;
+}
+
 /*
  * Call the shrink functions to age shrinkable caches
  *
@@ -247,93 +343,8 @@ unsigned long shrink_slab(struct shrink_control *shrink,
 	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
-		unsigned long long delta;
-		unsigned long total_scan;
-		unsigned long max_pass;
-		int shrink_ret = 0;
-		long nr;
-		long new_nr;
-		long batch_size = shrinker->batch ? shrinker->batch
-						  : SHRINK_BATCH;
-
-		/*
-		 * copy the current shrinker scan count into a local variable
-		 * and zero it so that other concurrent shrinker invocations
-		 * don't also do this scanning work.
-		 */
-		do {
-			nr = shrinker->nr;
-		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
-
-		total_scan = nr;
-		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
-		delta = (4 * nr_pages_scanned) / shrinker->seeks;
-		delta *= max_pass;
-		do_div(delta, lru_pages + 1);
-		total_scan += delta;
-		if (total_scan < 0) {
-			printk(KERN_ERR "shrink_slab: %pF negative objects to "
-			       "delete nr=%ld\n",
-			       shrinker->shrink, total_scan);
-			total_scan = max_pass;
-		}
-
-		/*
-		 * We need to avoid excessive windup on filesystem shrinkers
-		 * due to large numbers of GFP_NOFS allocations causing the
-		 * shrinkers to return -1 all the time. This results in a large
-		 * nr being built up so when a shrink that can do some work
-		 * comes along it empties the entire cache due to nr >>>
-		 * max_pass.  This is bad for sustaining a working set in
-		 * memory.
-		 *
-		 * Hence only allow the shrinker to scan the entire cache when
-		 * a large delta change is calculated directly.
-		 */
-		if (delta < max_pass / 4)
-			total_scan = min(total_scan, max_pass / 2);
-
-		/*
-		 * Avoid risking looping forever due to too large nr value:
-		 * never try to free more than twice the estimate number of
-		 * freeable entries.
-		 */
-		if (total_scan > max_pass * 2)
-			total_scan = max_pass * 2;
-
-		trace_mm_shrink_slab_start(shrinker, shrink, nr,
-					nr_pages_scanned, lru_pages,
-					max_pass, delta, total_scan);
-
-		while (total_scan >= batch_size) {
-			int nr_before;
-
-			nr_before = do_shrinker_shrink(shrinker, shrink, 0);
-			shrink_ret = do_shrinker_shrink(shrinker, shrink,
-							batch_size);
-			if (shrink_ret == -1)
-				break;
-			if (shrink_ret < nr_before)
-				ret += nr_before - shrink_ret;
-			count_vm_events(SLABS_SCANNED, batch_size);
-			total_scan -= batch_size;
-
-			cond_resched();
-		}
-
-		/*
-		 * move the unused scan count back into the shrinker in a
-		 * manner that handles concurrent updates. If we exhausted the
-		 * scan, there is no need to do an update.
-		 */
-		do {
-			nr = shrinker->nr;
-			new_nr = total_scan + nr;
-			if (total_scan <= 0)
-				break;
-		} while (cmpxchg(&shrinker->nr, nr, new_nr) != nr);
-
-		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
+		ret += shrink_one_shrinker(shrinker, shrink,
+					   nr_pages_scanned, lru_pages);
 	}
 	up_read(&shrinker_rwsem);
 out:
-- 
1.7.6


* [PATCH v2 2/4] Keep nr_dentry per super block
@ 2011-08-05  0:35     ` Glauber Costa
  0 siblings, 0 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-05  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, containers, Pavel Emelyanov, Al Viro,
	Hugh Dickins, Nick Piggin, Andrea Arcangeli, Rik van Riel,
	Dave Hansen, James Bottomley, David Chinner, Glauber Costa

Now that we have per-sb shrinkers, it makes sense to have nr_dentries
stored per sb as well. We turn them into per-cpu counters so we can
keep accessing them without locking.
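
In condensed form, the per-cpu counting pattern the patch applies to
the new sb->s_nr_dentry field looks like the sketch below (this is the
pattern, not the patch itself; reading the total means summing every
CPU's slot, so the result is only approximate while updates are in
flight).

#include <linux/percpu.h>

static int __percpu *counter;

static int counter_setup(void)
{
	counter = alloc_percpu(int);	/* one slot per possible CPU, pre-zeroed */
	return counter ? 0 : -ENOMEM;
}

static void counter_inc(void)		/* no lock taken on the fast path */
{
	this_cpu_inc(*counter);
}

static void counter_dec(void)
{
	this_cpu_dec(*counter);
}

static int counter_read(void)		/* approximate total */
{
	int cpu, sum = 0;

	for_each_possible_cpu(cpu)
		sum += per_cpu(*counter, cpu);
	return sum;
}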

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Dave Chinner <david@fromorbit.com>
---
 fs/dcache.c        |   12 +++++++++++-
 fs/super.c         |   15 ++++++++++++++-
 include/linux/fs.h |    2 ++
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index b05aac3..ac19d24 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -151,7 +151,13 @@ static void __d_free(struct rcu_head *head)
 static void d_free(struct dentry *dentry)
 {
 	BUG_ON(dentry->d_count);
+	/*
+	 * It is cheaper to keep a global counter separate
+	 * then to scan through all superblocks when needed
+	 */
 	this_cpu_dec(nr_dentry);
+	this_cpu_dec(*dentry->d_sb->s_nr_dentry);
+
 	if (dentry->d_op && dentry->d_op->d_release)
 		dentry->d_op->d_release(dentry);
 
@@ -1224,7 +1230,11 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	INIT_LIST_HEAD(&dentry->d_alias);
 	INIT_LIST_HEAD(&dentry->d_u.d_child);
 	d_set_d_op(dentry, dentry->d_sb->s_d_op);
-
+	/*
+	 * It is cheaper to keep a global counter separate
+	 * then to scan through all superblocks when needed
+	 */
+	this_cpu_inc(*dentry->d_sb->s_nr_dentry);
 	this_cpu_inc(nr_dentry);
 
 	return dentry;
diff --git a/fs/super.c b/fs/super.c
index 3f56a26..9345385 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,6 +112,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
 {
 	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
 	static const struct super_operations default_op;
+	int i;
 
 	if (s) {
 		if (security_sb_alloc(s)) {
@@ -119,15 +120,26 @@ static struct super_block *alloc_super(struct file_system_type *type)
 			s = NULL;
 			goto out;
 		}
+
+		s->s_nr_dentry = alloc_percpu(int);
+		if (!s->s_nr_dentry) {
+			security_sb_free(s);
+			kfree(s);
+			s = NULL;
+			goto out;
+		}
+		for_each_possible_cpu(i)
+			*per_cpu_ptr(s->s_nr_dentry, i) = 0;
+
 #ifdef CONFIG_SMP
 		s->s_files = alloc_percpu(struct list_head);
 		if (!s->s_files) {
+			free_percpu(s->s_nr_dentry);
 			security_sb_free(s);
 			kfree(s);
 			s = NULL;
 			goto out;
 		} else {
-			int i;
 
 			for_each_possible_cpu(i)
 				INIT_LIST_HEAD(per_cpu_ptr(s->s_files, i));
@@ -198,6 +210,7 @@ static inline void destroy_super(struct super_block *s)
 {
 #ifdef CONFIG_SMP
 	free_percpu(s->s_files);
+	free_percpu(s->s_nr_dentry);
 #endif
 	security_sb_free(s);
 	kfree(s->s_subtype);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f23bcb7..35113fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1399,6 +1399,8 @@ struct super_block {
 	struct list_head	s_dentry_lru;	/* unused dentry lru */
 	int			s_nr_dentry_unused;	/* # of dentry on lru */
 
+	int __percpu 		*s_nr_dentry;		/* # of dentry on this sb */
+
 	/* s_inode_lru_lock protects s_inode_lru and s_nr_inodes_unused */
 	spinlock_t		s_inode_lru_lock ____cacheline_aligned_in_smp;
 	struct list_head	s_inode_lru;		/* unused inode lru */
-- 
1.7.6


* [PATCH v2 3/4] limit nr_dentries per superblock
  2011-08-05  0:35 [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
  2011-08-05  0:35 ` [PATCH v2 1/4] factor out single-shrinker code Glauber Costa
@ 2011-08-05  0:35 ` Glauber Costa
       [not found]   ` <1312504544-1108-4-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  2011-08-12 13:56   ` Eric Dumazet
  2011-08-05  0:35 ` [PATCH v2 4/4] parse options in the vfs level Glauber Costa
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-05  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, containers, Pavel Emelyanov, Al Viro,
	Hugh Dickins, Nick Piggin, Andrea Arcangeli, Rik van Riel,
	Dave Hansen, James Bottomley, David Chinner, Glauber Costa

This patch lays the foundation for us to limit the dcache size.
Each super block can have only a maximum number of dentries under its
sub-tree. Allocation fails if we're over the limit and the cache
can't be pruned to free up space for the newcomers.
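
For illustration (not part of the patch): assuming the limit was set
with the vfs_dcache_size mount option from patch 4/4, one would expect
the failure to surface in user space as ENOMEM on path lookups once the
superblock is at its limit and nothing can be pruned. A toy test along
these lines (the mount point is hypothetical):

#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char name[64];
	int i, fd;

	for (i = 0; ; i++) {
		snprintf(name, sizeof(name), "/containers/ct1/f%d", i);
		fd = open(name, O_CREAT | O_WRONLY, 0644);
		if (fd < 0) {
			if (errno == ENOMEM)	/* dentry limit presumably hit */
				printf("limit reached after %d files\n", i);
			else
				perror("open");
			return 0;
		}
		close(fd);
	}
}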

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Dave Chinner <david@fromorbit.com>
---
 fs/dcache.c        |   25 +++++++++++++++++++++++++
 fs/super.c         |    1 +
 include/linux/fs.h |    1 +
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index ac19d24..52a0faf 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1180,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent)
 }
 EXPORT_SYMBOL(shrink_dcache_parent);
 
+static int dcache_mem_check(struct super_block *sb)
+{
+	int i;
+	int nr_dentry;
+	struct shrink_control sc = {
+		.gfp_mask = GFP_KERNEL,
+	};
+
+	do {
+		nr_dentry = 0;
+		for_each_possible_cpu(i)
+			nr_dentry += per_cpu(*sb->s_nr_dentry, i);
+
+		if (nr_dentry < sb->s_nr_dentry_max)
+			return 0;
+
+	/* nr_pages = 1, lru_pages = 0 should get delta ~ 2 */
+	} while (shrink_one_shrinker(&sb->s_shrink, &sc, 1, 0));
+
+	return -ENOMEM;
+}
+
 /**
  * __d_alloc	-	allocate a dcache entry
  * @sb: filesystem it will belong to
@@ -1195,6 +1217,9 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
 	struct dentry *dentry;
 	char *dname;
 
+	if (dcache_mem_check(sb))
+		return NULL;
+
 	dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
 	if (!dentry)
 		return NULL;
diff --git a/fs/super.c b/fs/super.c
index 9345385..d9bcae8 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -130,6 +130,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
 		}
 		for_each_possible_cpu(i)
 			*per_cpu_ptr(s->s_nr_dentry, i) = 0;
+		s->s_nr_dentry_max = INT_MAX;
 
 #ifdef CONFIG_SMP
 		s->s_files = alloc_percpu(struct list_head);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 35113fd..e90dcb4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1399,6 +1399,7 @@ struct super_block {
 	struct list_head	s_dentry_lru;	/* unused dentry lru */
 	int			s_nr_dentry_unused;	/* # of dentry on lru */
 
+	int 			s_nr_dentry_max;	/* max # of dentry on this sb*/
 	int __percpu 		*s_nr_dentry;		/* # of dentry on this sb */
 
 	/* s_inode_lru_lock protects s_inode_lru and s_nr_inodes_unused */
-- 
1.7.6


* [PATCH v2 4/4] parse options in the vfs level
  2011-08-05  0:35 [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
  2011-08-05  0:35 ` [PATCH v2 1/4] factor out single-shrinker code Glauber Costa
  2011-08-05  0:35 ` [PATCH v2 3/4] limit nr_dentries per superblock Glauber Costa
@ 2011-08-05  0:35 ` Glauber Costa
  2011-08-12 10:52 ` [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
       [not found] ` <1312504544-1108-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  4 siblings, 0 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-05  0:35 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-fsdevel, containers, Pavel Emelyanov, Al Viro,
	Hugh Dickins, Nick Piggin, Andrea Arcangeli, Rik van Riel,
	Dave Hansen, James Bottomley, David Chinner, Glauber Costa

This patch introduces a simple generic vfs option parser.
Right now, the only option we have is to limit the size of the dcache.

So any user who wants a dcache entry limit can specify:

  mount -o whatever_options,vfs_dcache_size=XXX <dev> <mntpoint>

It is supposed to work well with remounts, allowing the limit to change
multiple times over the course of the filesystem's lifecycle.

I find mount a natural interface for handling filesystem options,
so that's what I've chosen. Feel free to yell at it at will if
you disagree.
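
To make the remount case concrete (illustration only, paths and value
made up): the limit can be raised or lowered later with a plain
MS_REMOUNT, since do_mount() parses the option string again on remount.

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Bump the dcache limit on an already mounted filesystem. */
	if (mount("/dev/sdb1", "/containers/ct1", NULL, MS_REMOUNT,
		  "vfs_dcache_size=200000") < 0) {
		perror("remount");
		return 1;
	}
	return 0;
}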

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Dave Chinner <david@fromorbit.com>
---
 fs/dcache.c            |    7 +++
 fs/namespace.c         |  105 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dcache.h |    4 ++
 3 files changed, 116 insertions(+), 0 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 52a0faf..ace5670 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1202,6 +1202,13 @@ static int dcache_mem_check(struct super_block *sb)
 	return -ENOMEM;
 }
 
+int vfs_set_dcache_size(struct super_block *sb, int size)
+{
+	sb->s_nr_dentry_max = size;
+
+	return dcache_mem_check(sb);
+}
+
 /**
  * __d_alloc	-	allocate a dcache entry
  * @sb: filesystem it will belong to
diff --git a/fs/namespace.c b/fs/namespace.c
index 22bfe82..43b2cdb 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -31,6 +31,7 @@
 #include <linux/idr.h>
 #include <linux/fs_struct.h>
 #include <linux/fsnotify.h>
+#include <linux/parser.h>
 #include <asm/uaccess.h>
 #include <asm/unistd.h>
 #include "pnode.h"
@@ -958,6 +959,9 @@ static int show_sb_opts(struct seq_file *m, struct super_block *sb)
 	};
 	const struct proc_fs_info *fs_infop;
 
+	if (sb->s_nr_dentry_max != INT_MAX)
+		seq_printf(m, ",vfs_dcache_size=%d",sb->s_nr_dentry_max);
+
 	for (fs_infop = fs_info; fs_infop->flag; fs_infop++) {
 		if (sb->s_flags & fs_infop->flag)
 			seq_puts(m, fs_infop->str);
@@ -2271,6 +2275,94 @@ int copy_mount_string(const void __user *data, char **where)
 	return 0;
 }
 
+static const match_table_t tokens = {
+	{1, "vfs_dcache_size=%u"},
+};
+
+struct vfs_options {
+	unsigned long vfs_dcache_size;
+};
+
+/**
+ * Generic option parsing for the VFS.
+ *
+ * Since most of the filesystems already do their own option parsing, and with
+ * very few code shared between them, this function strips out any options that
+ * we succeed in parsing ourselves. Passing them forward would just give the
+ * underlying fs an option it does not expect, leading it to fail.
+ *
+ * We don't yet have a pointer to the super block as well, since this is
+ * pre-mount. We accumulate in struct vfs_options whatever data we collected,
+ * and act on it later.
+ */
+static int vfs_parse_options(char *options, struct vfs_options *ops)
+{
+	substring_t args[MAX_OPT_ARGS];
+	int option;
+	char *p;
+	char *opt;
+	char *start = NULL;
+	int ret;
+	
+	if (!options)
+		return 0;
+
+	opt = kstrdup(options, GFP_KERNEL);
+	if (!opt)
+		return 1;
+	
+	ret = 1;
+
+	start = opt;
+	while ((p = strsep(&opt, ",")) != NULL) {
+		int token;
+		if (!*p)
+			continue;
+
+		/*
+		 * Initialize args struct so we know whether arg was
+		 * found; some options take optional arguments.
+		 */
+		args[0].to = args[0].from = 0;
+		token = match_token(p, tokens, args);
+		switch (token) {
+		case 1:
+			if (!args[0].from)
+				break;
+
+			if (match_int(&args[0], &option))
+				break;
+
+			if (option < DCACHE_MIN_SIZE) {
+				printk(KERN_INFO "dcache size %d smaller than "
+				       "minimum (%d)\n", option, DCACHE_MIN_SIZE);
+				option = DCACHE_MIN_SIZE;
+			}
+
+			ops->vfs_dcache_size = option;
+
+			/*
+			 * The actual filesystems don't expect any option
+			 * they don't understand to be received in the option
+			 * string. So we strip off anything we processed, and
+			 * give them a clean options string.
+			 */
+			ret = 0;
+			if (!opt) /* it is the last option listed */
+				*(options + (p - start)) = '\0';
+			else
+				strcpy(options + (p - start), opt);
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+	}
+
+	kfree(start);
+	return ret;
+}
+
 /*
  * Flags is a 32-bit value that allows up to 31 non-fs dependent flags to
  * be given to the mount() call (ie: read-only, no-dev, no-suid etc).
@@ -2291,6 +2383,7 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 	struct path path;
 	int retval = 0;
 	int mnt_flags = 0;
+	struct vfs_options vfs_options;
 
 	/* Discard magic */
 	if ((flags & MS_MGC_MSK) == MS_MGC_VAL)
@@ -2318,6 +2411,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 	if (!(flags & MS_NOATIME))
 		mnt_flags |= MNT_RELATIME;
 
+
+	vfs_options.vfs_dcache_size = INT_MAX;
+	retval = vfs_parse_options(data_page, &vfs_options);
+	if (retval)
+		goto dput_out;
+
 	/* Separate the per-mountpoint flags */
 	if (flags & MS_NOSUID)
 		mnt_flags |= MNT_NOSUID;
@@ -2350,6 +2449,12 @@ long do_mount(char *dev_name, char *dir_name, char *type_page,
 	else
 		retval = do_new_mount(&path, type_page, flags, mnt_flags,
 				      dev_name, data_page);
+
+	/* bind mounts get to respect their parents decision */
+	if (!retval && !(flags & MS_BIND))
+		vfs_set_dcache_size(path.mnt->mnt_sb,
+				    vfs_options.vfs_dcache_size);
+			
 dput_out:
 	path_put(&path);
 	return retval;
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index d37d2a7..1a309f3 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -251,6 +251,10 @@ extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */
 extern struct dentry * d_alloc_root(struct inode *);
+extern int vfs_set_dcache_size(struct super_block *sb, int size);
+
+#define DCACHE_MIN_SIZE 1024
+extern int vfs_set_dcache_size(struct super_block *sb, int size);
 
 /* <clickety>-<click> the ramfs-type tree */
 extern void d_genocide(struct dentry *);
-- 
1.7.6


* Re: [PATCH v2 0/4] Per-container dcache size limitation
  2011-08-05  0:35 [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
                   ` (2 preceding siblings ...)
  2011-08-05  0:35 ` [PATCH v2 4/4] parse options in the vfs level Glauber Costa
@ 2011-08-12 10:52 ` Glauber Costa
       [not found] ` <1312504544-1108-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
  4 siblings, 0 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-12 10:52 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, linux-fsdevel, containers, Pavel Emelyanov,
	Al Viro, Hugh Dickins, Nick Piggin, Andrea Arcangeli,
	Rik van Riel, Dave Hansen, James Bottomley, David Chinner

On 08/04/2011 09:35 PM, Glauber Costa wrote:
> Hi,
>
> Since v1, there is not too much new here.
> I'm incorporating David's suggestion of calling the sb
> shrinker, which will, in effect, prune the icache and
> other sb related objects as well.
>
> I am also keeping the mount based interface, since I
> still believe it is the way to go. But I'm obviously
> still open for suggestions. Some small corrections
> were also made to it since v1. Specifically, bind
> mounts are not allowed to alter the original sb dcache
> size.
>
> Glauber Costa (4):
>    factor out single-shrinker code
>    Keep nr_dentry per super block
>    limit nr_dentries per superblock
>    parse options in the vfs level
>
>   fs/dcache.c              |   44 +++++++++++-
>   fs/namespace.c           |  105 ++++++++++++++++++++++++++
>   fs/super.c               |   16 ++++-
>   include/linux/dcache.h   |    4 +
>   include/linux/fs.h       |    3 +
>   include/linux/shrinker.h |    6 ++
>   mm/vmscan.c              |  185 ++++++++++++++++++++++++----------------------
>   7 files changed, 274 insertions(+), 89 deletions(-)
>
People,

Any comments on this? I think I addressed all the comments previously
made, and if there are no further requests, I think this is ready to
go in. In case there are issues, please let me know so I can
address them.


* Re: [PATCH v2 2/4] Keep nr_dentry per super block
  2011-08-05  0:35     ` Glauber Costa
@ 2011-08-12 13:51       ` Eric Dumazet
  -1 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2011-08-12 13:51 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, linux-fsdevel, containers, Pavel Emelyanov,
	Al Viro, Hugh Dickins, Nick Piggin, Andrea Arcangeli,
	Rik van Riel, Dave Hansen, James Bottomley, David Chinner

On Friday 05 August 2011 at 04:35 +0400, Glauber Costa wrote:
> Now that we have per-sb shrinkers, it makes sense to have nr_dentries
> stored per sb as well. We turn them into per-cpu counters so we can
> keep acessing them without locking.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Dave Chinner <david@fromorbit.com>
> ---
>  fs/dcache.c        |   12 +++++++++++-
>  fs/super.c         |   15 ++++++++++++++-
>  include/linux/fs.h |    2 ++
>  3 files changed, 27 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index b05aac3..ac19d24 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -151,7 +151,13 @@ static void __d_free(struct rcu_head *head)
>  static void d_free(struct dentry *dentry)
>  {
>  	BUG_ON(dentry->d_count);
> +	/*
> +	 * It is cheaper to keep a global counter separate
> +	 * then to scan through all superblocks when needed

"then to scan" or "than scanning" ?

> +	 */
>  	this_cpu_dec(nr_dentry);
> +	this_cpu_dec(*dentry->d_sb->s_nr_dentry);
> +
>  	if (dentry->d_op && dentry->d_op->d_release)
>  		dentry->d_op->d_release(dentry);
>  
> @@ -1224,7 +1230,11 @@ struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>  	INIT_LIST_HEAD(&dentry->d_alias);
>  	INIT_LIST_HEAD(&dentry->d_u.d_child);
>  	d_set_d_op(dentry, dentry->d_sb->s_d_op);
> -
> +	/*
> +	 * It is cheaper to keep a global counter separate
> +	 * then to scan through all superblocks when needed
> +	 */
> +	this_cpu_inc(*dentry->d_sb->s_nr_dentry);
>  	this_cpu_inc(nr_dentry);
>  
>  	return dentry;
> diff --git a/fs/super.c b/fs/super.c
> index 3f56a26..9345385 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -112,6 +112,7 @@ static struct super_block *alloc_super(struct file_system_type *type)
>  {
>  	struct super_block *s = kzalloc(sizeof(struct super_block),  GFP_USER);
>  	static const struct super_operations default_op;
> +	int i;
>  
>  	if (s) {
>  		if (security_sb_alloc(s)) {
> @@ -119,15 +120,26 @@ static struct super_block *alloc_super(struct file_system_type *type)
>  			s = NULL;
>  			goto out;
>  		}
> +
> +		s->s_nr_dentry = alloc_percpu(int);
> +		if (!s->s_nr_dentry) {
> +			security_sb_free(s);
> +			kfree(s);
> +			s = NULL;
> +			goto out;
> +		}



> +		for_each_possible_cpu(i)
> +			*per_cpu_ptr(s->s_nr_dentry, i) = 0;

This loop is not needed; alloc_percpu() already gives zeroed data.

Why don't you use a percpu_counter?
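
For reference, a rough sketch of what the suggested percpu_counter
variant could look like (not a patch; the function names are made up,
and percpu_counter_init() took just the counter and an initial value in
kernels of this vintage):

#include <linux/percpu_counter.h>

static struct percpu_counter nr_dentry_counter;	/* would live in struct super_block */

static int nr_dentry_counter_init(void)
{
	return percpu_counter_init(&nr_dentry_counter, 0);
}

static void nr_dentry_track_alloc(void)		/* __d_alloc() side */
{
	percpu_counter_inc(&nr_dentry_counter);
}

static void nr_dentry_track_free(void)		/* d_free() side */
{
	percpu_counter_dec(&nr_dentry_counter);
}

static int nr_dentry_over_limit(long max)	/* dcache_mem_check() side */
{
	/* approximate read, no manual for_each_possible_cpu() loop */
	return percpu_counter_read_positive(&nr_dentry_counter) >= max;
}

/* and percpu_counter_destroy(&nr_dentry_counter) when the sb goes away */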



* Re: [PATCH v2 3/4] limit nr_dentries per superblock
       [not found]   ` <1312504544-1108-4-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-08-12 13:56     ` Eric Dumazet
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Dumazet @ 2011-08-12 13:56 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Andrea Arcangeli, Rik van Riel, Nick Piggin, Pavel Emelyanov,
	David Chinner,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Hugh Dickins, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	James Bottomley, Dave Hansen, Al Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA

Le vendredi 05 août 2011 à 04:35 +0400, Glauber Costa a écrit :
> This patch lays the foundation for us to limit the dcache size.
> Each super block can have only a maximum amount of dentries under its
> sub-tree. Allocation fails if we we're over limit and the cache
> can't be pruned to free up space for the newcomers.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Dave Chinner <david@fromorbit.com>
> ---
>  fs/dcache.c        |   25 +++++++++++++++++++++++++
>  fs/super.c         |    1 +
>  include/linux/fs.h |    1 +
>  3 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index ac19d24..52a0faf 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1180,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent)
>  }
>  EXPORT_SYMBOL(shrink_dcache_parent);
>  
> +static int dcache_mem_check(struct super_block *sb)
> +{
> +	int i;
> +	int nr_dentry;
> +	struct shrink_control sc = {
> +		.gfp_mask = GFP_KERNEL,
> +	};
> +
> +	do {
> +		nr_dentry = 0;
> +		for_each_possible_cpu(i)
> +			nr_dentry += per_cpu(*sb->s_nr_dentry, i);

You seriously want to call this for every __d_alloc() invocation,
even if s_nr_dentry_max is the default value (INT_MAX) ?

On a 4096 cpu machine, it will be _very_ slow.

A percpu_counter would be the thing to consider, since you can avoid the
for_each_possible_cpu(i) loop if percpu_counter_read() is smaller than
sb->s_nr_dentry_max.

Check how its done in include/net/tcp.h, tcp_too_many_orphans()



_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linux-foundation.org/mailman/listinfo/containers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/4] limit nr_dentries per superblock
  2011-08-05  0:35 ` [PATCH v2 3/4] limit nr_dentries per superblock Glauber Costa
       [not found]   ` <1312504544-1108-4-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
@ 2011-08-12 13:56   ` Eric Dumazet
  2011-08-12 19:18     ` Glauber Costa
                       ` (3 more replies)
  1 sibling, 4 replies; 21+ messages in thread
From: Eric Dumazet @ 2011-08-12 13:56 UTC (permalink / raw)
  To: Glauber Costa
  Cc: linux-kernel, linux-fsdevel, containers, Pavel Emelyanov,
	Al Viro, Hugh Dickins, Nick Piggin, Andrea Arcangeli,
	Rik van Riel, Dave Hansen, James Bottomley, David Chinner

On Friday, 05 August 2011 at 04:35 +0400, Glauber Costa wrote:
> This patch lays the foundation for us to limit the dcache size.
> Each super block can have only a maximum amount of dentries under its
> sub-tree. Allocation fails if we're over limit and the cache
> can't be pruned to free up space for the newcomers.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> CC: Dave Chinner <david@fromorbit.com>
> ---
>  fs/dcache.c        |   25 +++++++++++++++++++++++++
>  fs/super.c         |    1 +
>  include/linux/fs.h |    1 +
>  3 files changed, 27 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index ac19d24..52a0faf 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1180,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent)
>  }
>  EXPORT_SYMBOL(shrink_dcache_parent);
>  
> +static int dcache_mem_check(struct super_block *sb)
> +{
> +	int i;
> +	int nr_dentry;
> +	struct shrink_control sc = {
> +		.gfp_mask = GFP_KERNEL,
> +	};
> +
> +	do {
> +		nr_dentry = 0;
> +		for_each_possible_cpu(i)
> +			nr_dentry += per_cpu(*sb->s_nr_dentry, i);

Do you seriously want to call this for every __d_alloc() invocation,
even when s_nr_dentry_max is still the default value (INT_MAX)?

On a 4096-CPU machine, this will be _very_ slow.

A percpu_counter would be the thing to consider here, since it lets you skip
the for_each_possible_cpu(i) loop entirely whenever percpu_counter_read() is
smaller than sb->s_nr_dentry_max.

Check how it's done in include/net/tcp.h, tcp_too_many_orphans().
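
To make that concrete, a minimal sketch along the lines of
tcp_too_many_orphans(), assuming s_nr_dentry is converted to a
struct percpu_counter and s_nr_dentry_max holds the per-sb limit
(the helper name and exact shape are illustrative, not from this series):

	static bool sb_too_many_dentries(struct super_block *sb)
	{
		/* cheap, approximate read of the per-cpu counter first */
		s64 nr = percpu_counter_read_positive(&sb->s_nr_dentry);

		if (nr <= sb->s_nr_dentry_max)
			return false;

		/* pay for the exact (and expensive) sum only near the limit */
		return percpu_counter_sum_positive(&sb->s_nr_dentry) >
		       sb->s_nr_dentry_max;
	}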




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2 3/4] limit nr_dentries per superblock
  2011-08-12 13:56   ` Eric Dumazet
@ 2011-08-12 19:18       ` Glauber Costa
  2011-08-12 19:18       ` Glauber Costa
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Glauber Costa @ 2011-08-12 19:18 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, linux-fsdevel, containers, Pavel Emelyanov,
	Al Viro, Hugh Dickins, Nick Piggin, Andrea Arcangeli,
	Rik van Riel, Dave Hansen, James Bottomley, David Chinner

On 08/12/2011 10:56 AM, Eric Dumazet wrote:
> On Friday, 05 August 2011 at 04:35 +0400, Glauber Costa wrote:
>> This patch lays the foundation for us to limit the dcache size.
>> Each super block can have only a maximum amount of dentries under its
>> sub-tree. Allocation fails if we're over limit and the cache
>> can't be pruned to free up space for the newcomers.
>>
>> Signed-off-by: Glauber Costa<glommer@parallels.com>
>> CC: Dave Chinner<david@fromorbit.com>
>> ---
>>   fs/dcache.c        |   25 +++++++++++++++++++++++++
>>   fs/super.c         |    1 +
>>   include/linux/fs.h |    1 +
>>   3 files changed, 27 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index ac19d24..52a0faf 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1180,6 +1180,28 @@ void shrink_dcache_parent(struct dentry * parent)
>>   }
>>   EXPORT_SYMBOL(shrink_dcache_parent);
>>
>> +static int dcache_mem_check(struct super_block *sb)
>> +{
>> +	int i;
>> +	int nr_dentry;
>> +	struct shrink_control sc = {
>> +		.gfp_mask = GFP_KERNEL,
>> +	};
>> +
>> +	do {
>> +		nr_dentry = 0;
>> +		for_each_possible_cpu(i)
>> +			nr_dentry += per_cpu(*sb->s_nr_dentry, i);
>
> You seriously want to call this for every __d_alloc() invocation,
> even if s_nr_dentry_max is the default value (INT_MAX) ?

Well, I guess that special-casing INT_MAX is a good thing.
I can include it in the next submission; I like it. Thanks.

> On a 4096 cpu machine, it will be _very_ slow.
>
> A percpu_counter would be the thing to consider, since you can avoid the
> for_each_possible_cpu(i) loop if percpu_counter_read() is smaller than
> sb->s_nr_dentry_max.
>
> Check how its done in include/net/tcp.h, tcp_too_many_orphans()

Yeah, I guess I could do that. In fact, my first series used
percpu_counters, and then I switched. But looking back, percpu_counters
are indeed more suitable. The goal back then was to avoid
percpu_counter_add(), but as it stands now, we trade it for something
even worse.
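
For reference, with the percpu_counter conversion the fast paths would
shrink to roughly the sketch below (illustrative only; percpu_counter_inc()
only takes the counter's spinlock once a per-cpu batch is exceeded, so the
overhead compared to this_cpu_inc() should stay small):

	/* __d_alloc(): charge the new dentry to its superblock */
	percpu_counter_inc(&dentry->d_sb->s_nr_dentry);
	this_cpu_inc(nr_dentry);

	/* d_free(): drop the per-sb charge again */
	percpu_counter_dec(&dentry->d_sb->s_nr_dentry);
	this_cpu_dec(nr_dentry);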

Thank you for your comments.

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2011-08-12 19:20 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-08-05  0:35 [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
2011-08-05  0:35 ` [PATCH v2 1/4] factor out single-shrinker code Glauber Costa
2011-08-05  0:35 ` [PATCH v2 3/4] limit nr_dentries per superblock Glauber Costa
     [not found]   ` <1312504544-1108-4-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-08-12 13:56     ` Eric Dumazet
2011-08-12 13:56   ` Eric Dumazet
2011-08-12 19:18     ` Glauber Costa
2011-08-12 19:18     ` Glauber Costa
2011-08-12 19:18       ` Glauber Costa
2011-08-12 19:20     ` Glauber Costa
2011-08-12 19:20     ` Glauber Costa
2011-08-05  0:35 ` [PATCH v2 4/4] parse options in the vfs level Glauber Costa
2011-08-12 10:52 ` [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
     [not found] ` <1312504544-1108-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-08-05  0:35   ` [PATCH v2 1/4] factor out single-shrinker code Glauber Costa
2011-08-05  0:35   ` [PATCH v2 2/4] Keep nr_dentry per super block Glauber Costa
2011-08-05  0:35     ` Glauber Costa
2011-08-12 13:51     ` Eric Dumazet
2011-08-12 13:51       ` Eric Dumazet
     [not found]     ` <1312504544-1108-3-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2011-08-12 13:51       ` Eric Dumazet
2011-08-05  0:35   ` [PATCH v2 3/4] limit nr_dentries per superblock Glauber Costa
2011-08-05  0:35   ` [PATCH v2 4/4] parse options in the vfs level Glauber Costa
2011-08-12 10:52   ` [PATCH v2 0/4] Per-container dcache size limitation Glauber Costa
