linux-kernel.vger.kernel.org archive mirror
* [RFC 00/26] Slab defragmentation V5
@ 2007-09-01  1:41 Christoph Lameter
  2007-09-01  1:41 ` [RFC 01/26] SLUB: Extend slabinfo to support -D and -C options Christoph Lameter
                   ` (26 more replies)
  0 siblings, 27 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

Slab defragmentation is mainly an issue if Linux is used as a file server
and large numbers of dentries, inodes and buffer heads accumulate. In some
load situations the slabs become very sparsely populated, so that a lot of
memory is wasted by slabs that contain only one or a few objects. In
extreme cases the machine becomes sluggish because we are continually
running reclaim. Slab defragmentation adds the capability to recover the
wasted memory.

For lumpy reclaim, slab defragmentation can be used to enhance the
ability to recover larger contiguous areas of memory. Lumpy reclaim currently
cannot do anything when it encounters a slab page. With slab defragmentation
that slab page can be vacated and a larger contiguous area freed. It may
also become possible to make slab pages part of ZONE_MOVABLE (Mel's defrag
scheme in 2.6.23) or of the MOVABLE areas (antifrag patches in mm).

The trouble with this patchset is that it is difficult to validate:
its actions are only triggered in special load situations.
Are there any tests that could give meaningful information about
the effectiveness of these measures? I have run various tests here,
creating and deleting files and building kernels under low memory
conditions to trigger these reclaim mechanisms, but how does one measure
their effectiveness?

The patchset is also available via git

git pull git://git.kernel.org/pub/scm/linux/kernel/git/christoph/slab.git defrag


We currently support the following types of reclaim:

1. dentry cache
2. inode cache (with a generic interface to allow easy setup of more
   filesystems than the currently supported ext2/3/4, reiserfs, XFS
   and proc)
3. buffer_head

One typical mechanism that triggers slab defragmentation on my systems
is the daily run of

	updatedb

Updatedb scans all files on the system, which causes heavy inode and dentry
use. After updatedb is complete we go back to the regular use
patterns (typically, on my machine: kernel compiles), which now need the
memory for different purposes. The inodes and dentries used for updatedb
are gradually aged by the dentry/inode reclaim algorithm, which frees
dentries and inodes at random positions throughout the slabs that were
allocated. As a result the slabs become sparsely populated. Slabs that
become completely empty can be freed, but a lot of them remain sparsely
populated. That is where slab defrag comes in: it vacates the slabs that
hold only a few objects, reclaiming the memory for other uses.

V4->V5:
- Support lumpy reclaim for slabs
- Support reclaim via shrink_slab()
- Add constructors to ensure a consistent object state at all times.

V3->V4:
- Optimize scan for slabs that need defragmentation
- Add /sys/slab/*/defrag_ratio to allow setting defrag limits
  per slab.
- Add support for buffer heads.
- Describe how the cleanup after the daily updatedb can be
  improved by slab defragmentation.

V2->V3
- Support directory reclaim
- Add infrastructure to trigger defragmentation after slab shrinking if we
  have slabs with a high degree of fragmentation.

V1->V2
- Clean up control flow using a state variable. Simplify API. Back to 2
  functions that now take arrays of objects.
- Inode defrag support for a set of filesystems
- Fix up dentry defrag support to work on negative dentries by adding
  a new dentry flag that indicates that a dentry is not in the process
  of being freed or allocated.

-- 


* [RFC 01/26] SLUB: Extend slabinfo to support -D and -C options
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 02/26] SLUB: Move count_partial() Christoph Lameter
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0001-slab_defrag_slabinfo_update.patch --]
[-- Type: text/plain, Size: 5877 bytes --]

-D lists caches that support defragmentation

-C lists caches that use a ctor.

Change field names for defrag_ratio and remote_node_defrag_ratio.

Add determination of the allocation ratio for each slab cache. The
allocation ratio is the percentage of the available object slots that are
in use.
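
As a rough illustration (not part of the patch itself), the new %Ra column
amounts to the following computation, using the slabinfo field names:

	/* percentage of available object slots that hold an object */
	unsigned long allocation_ratio(struct slabinfo *s)
	{
		unsigned long slots = s->slabs * s->objs_per_slab;

		return slots ? s->objects * 100 / slots : 100;
	}

A cache with 10 slabs of 32 slots each and 96 live objects would thus show
an allocation ratio of 30%.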

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 Documentation/vm/slabinfo.c |   52 ++++++++++++++++++++++++++++++++++++------
 1 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c
index 1af7bd5..1319756 100644
--- a/Documentation/vm/slabinfo.c
+++ b/Documentation/vm/slabinfo.c
@@ -30,6 +30,8 @@ struct slabinfo {
 	int hwcache_align, object_size, objs_per_slab;
 	int sanity_checks, slab_size, store_user, trace;
 	int order, poison, reclaim_account, red_zone;
+	int defrag, ctor;
+	int defrag_ratio, remote_node_defrag_ratio;
 	unsigned long partial, objects, slabs;
 	int numa[MAX_NODES];
 	int numa_partial[MAX_NODES];
@@ -56,6 +58,8 @@ int show_slab = 0;
 int skip_zero = 1;
 int show_numa = 0;
 int show_track = 0;
+int show_defrag = 0;
+int show_ctor = 0;
 int show_first_alias = 0;
 int validate = 0;
 int shrink = 0;
@@ -90,18 +94,20 @@ void fatal(const char *x, ...)
 void usage(void)
 {
 	printf("slabinfo 5/7/2007. (c) 2007 sgi. clameter@sgi.com\n\n"
-		"slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n"
+		"slabinfo [-aCDefhilnosSrtTvz1] [-d debugopts] [slab-regexp]\n"
 		"-a|--aliases           Show aliases\n"
+		"-C|--ctor              Show slabs with ctors\n"
 		"-d<options>|--debug=<options> Set/Clear Debug options\n"
-		"-e|--empty		Show empty slabs\n"
+		"-D|--defrag            Show defragmentable caches\n"
+		"-e|--empty             Show empty slabs\n"
 		"-f|--first-alias       Show first alias\n"
 		"-h|--help              Show usage information\n"
 		"-i|--inverted          Inverted list\n"
 		"-l|--slabs             Show slabs\n"
 		"-n|--numa              Show NUMA information\n"
-		"-o|--ops		Show kmem_cache_ops\n"
+		"-o|--ops               Show kmem_cache_ops\n"
 		"-s|--shrink            Shrink slabs\n"
-		"-r|--report		Detailed report on single slabs\n"
+		"-r|--report            Detailed report on single slabs\n"
 		"-S|--Size              Sort by size\n"
 		"-t|--tracking          Show alloc/free information\n"
 		"-T|--Totals            Show summary information\n"
@@ -281,7 +287,7 @@ int line = 0;
 void first_line(void)
 {
 	printf("Name                   Objects Objsize    Space "
-		"Slabs/Part/Cpu  O/S O %%Fr %%Ef Flg\n");
+		"Slabs/Part/Cpu  O/S O %%Ra %%Ef Flg\n");
 }
 
 /*
@@ -324,7 +330,7 @@ void slab_numa(struct slabinfo *s, int mode)
 		return;
 
 	if (!line) {
-		printf("\n%-21s:", mode ? "NUMA nodes" : "Slab");
+		printf("\n%-21s: Rto ", mode ? "NUMA nodes" : "Slab");
 		for(node = 0; node <= highest_node; node++)
 			printf(" %4d", node);
 		printf("\n----------------------");
@@ -333,6 +339,7 @@ void slab_numa(struct slabinfo *s, int mode)
 		printf("\n");
 	}
 	printf("%-21s ", mode ? "All slabs" : s->name);
+	printf("%3d ", s->remote_node_defrag_ratio);
 	for(node = 0; node <= highest_node; node++) {
 		char b[20];
 
@@ -406,6 +413,8 @@ void report(struct slabinfo *s)
 		printf("** Slabs are destroyed via RCU\n");
 	if (s->reclaim_account)
 		printf("** Reclaim accounting active\n");
+	if (s->defrag)
+		printf("** Defragmentation at %d%%\n", s->defrag_ratio);
 
 	printf("\nSizes (bytes)     Slabs              Debug                Memory\n");
 	printf("------------------------------------------------------------------------\n");
@@ -452,6 +461,12 @@ void slabcache(struct slabinfo *s)
 	if (show_empty && s->slabs)
 		return;
 
+	if (show_defrag && !s->defrag)
+		return;
+
+	if (show_ctor && !s->ctor)
+		return;
+
 	store_size(size_str, slab_size(s));
 	sprintf(dist_str,"%lu/%lu/%d", s->slabs, s->partial, s->cpu_slabs);
 
@@ -462,6 +477,10 @@ void slabcache(struct slabinfo *s)
 		*p++ = '*';
 	if (s->cache_dma)
 		*p++ = 'd';
+	if (s->defrag)
+		*p++ = 'D';
+	if (s->ctor)
+		*p++ = 'C';
 	if (s->hwcache_align)
 		*p++ = 'A';
 	if (s->poison)
@@ -481,7 +500,7 @@ void slabcache(struct slabinfo *s)
 	printf("%-21s %8ld %7d %8s %14s %4d %1d %3ld %3ld %s\n",
 		s->name, s->objects, s->object_size, size_str, dist_str,
 		s->objs_per_slab, s->order,
-		s->slabs ? (s->partial * 100) / s->slabs : 100,
+		s->slabs ? (s->objects * 100) / (s->slabs * s->objs_per_slab) : 100,
 		s->slabs ? (s->objects * s->object_size * 100) /
 			(s->slabs * (page_size << s->order)) : 100,
 		flags);
@@ -1071,7 +1090,16 @@ void read_slab_dir(void)
 			decode_numa_list(slab->numa, t);
 			slab->store_user = get_obj("store_user");
 			slab->trace = get_obj("trace");
+			slab->defrag_ratio = get_obj("defrag_ratio");
+			slab->remote_node_defrag_ratio =
+					get_obj("remote_node_defrag_ratio");
 			chdir("..");
+			if (read_slab_obj(slab, "ops")) {
+				if (strstr(buffer, "ctor :"))
+					slab->ctor = 1;
+				if (strstr(buffer, "kick :"))
+					slab->defrag = 1;
+			}
 			if (slab->name[0] == ':')
 				alias_targets++;
 			slab++;
@@ -1121,7 +1149,9 @@ void output_slabs(void)
 
 struct option opts[] = {
 	{ "aliases", 0, NULL, 'a' },
+	{ "ctor", 0, NULL, 'C' },
 	{ "debug", 2, NULL, 'd' },
+	{ "defrag", 0, NULL, 'D' },
 	{ "empty", 0, NULL, 'e' },
 	{ "first-alias", 0, NULL, 'f' },
 	{ "help", 0, NULL, 'h' },
@@ -1146,7 +1176,7 @@ int main(int argc, char *argv[])
 
 	page_size = getpagesize();
 
-	while ((c = getopt_long(argc, argv, "ad::efhil1noprstvzTS",
+	while ((c = getopt_long(argc, argv, "ad::efhil1noprstvzCDTS",
 						opts, NULL)) != -1)
 	switch(c) {
 		case '1':
@@ -1196,6 +1226,12 @@ int main(int argc, char *argv[])
 		case 'z':
 			skip_zero = 0;
 			break;
+		case 'C':
+			show_ctor = 1;
+			break;
+		case 'D':
+			show_defrag = 1;
+			break;
 		case 'T':
 			show_totals = 1;
 			break;
-- 
1.5.2.4

-- 


* [RFC 02/26] SLUB: Move count_partial()
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
  2007-09-01  1:41 ` [RFC 01/26] SLUB: Extend slabinfo to support -D and -C options Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 03/26] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio Christoph Lameter
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0002-slab_defrag_move_count_partial.patch --]
[-- Type: text/plain, Size: 1453 bytes --]

Move the counting function for objects in partial slabs so that it is placed
before kmem_cache_shrink. We will need to use it to establish the
fragmentation ratio of per node slab lists.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   26 +++++++++++++-------------
 1 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 45c76fe..aad6f83 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2595,6 +2595,19 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
+static unsigned long count_partial(struct kmem_cache_node *n)
+{
+	unsigned long flags;
+	unsigned long x = 0;
+	struct page *page;
+
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry(page, &n->partial, lru)
+		x += page->inuse;
+	spin_unlock_irqrestore(&n->list_lock, flags);
+	return x;
+}
+
 /*
  * kmem_cache_shrink removes empty slabs from the partial lists and sorts
  * the remaining slabs by the number of items in use. The slabs with the
@@ -3331,19 +3344,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
 	return n;
 }
 
-static unsigned long count_partial(struct kmem_cache_node *n)
-{
-	unsigned long flags;
-	unsigned long x = 0;
-	struct page *page;
-
-	spin_lock_irqsave(&n->list_lock, flags);
-	list_for_each_entry(page, &n->partial, lru)
-		x += page->inuse;
-	spin_unlock_irqrestore(&n->list_lock, flags);
-	return x;
-}
-
 enum slab_stat_type {
 	SL_FULL,
 	SL_PARTIAL,
-- 
1.5.2.4

-- 


* [RFC 03/26] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
  2007-09-01  1:41 ` [RFC 01/26] SLUB: Extend slabinfo to support -D and -C options Christoph Lameter
  2007-09-01  1:41 ` [RFC 02/26] SLUB: Move count_partial() Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 04/26] SLUB: Add defrag_ratio field and sysfs support Christoph Lameter
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0003-slab_defrag_remote_node_defrag_ratio.patch --]
[-- Type: text/plain, Size: 2656 bytes --]

We now need the defrag_ratio name for the non-NUMA case. The NUMA defrag
ratio works by allowing allocation of objects from partial slabs on remote
nodes. Rename it to

	remote_node_defrag_ratio

to make this clear.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slub_def.h |    5 ++++-
 mm/slub.c                |   17 +++++++++--------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 8aad7dc..5912b58 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -59,7 +59,10 @@ struct kmem_cache {
 #endif
 
 #ifdef CONFIG_NUMA
-	int defrag_ratio;
+	/*
+	 * Defragmentation by allocating from a remote node.
+	 */
+	int remote_node_defrag_ratio;
 	struct kmem_cache_node *node[MAX_NUMNODES];
 #endif
 #ifdef CONFIG_SMP
diff --git a/mm/slub.c b/mm/slub.c
index aad6f83..e63aba5 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1267,7 +1267,8 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags)
 	 * expensive if we do it every time we are trying to find a slab
 	 * with available objects.
 	 */
-	if (!s->defrag_ratio || get_cycles() % 1024 > s->defrag_ratio)
+	if (!s->remote_node_defrag_ratio ||
+			get_cycles() % 1024 > s->remote_node_defrag_ratio)
 		return NULL;
 
 	zonelist = &NODE_DATA(slab_node(current->mempolicy))
@@ -2200,7 +2201,7 @@ static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags,
 
 	s->refcount = 1;
 #ifdef CONFIG_NUMA
-	s->defrag_ratio = 100;
+	s->remote_node_defrag_ratio = 100;
 #endif
 
 	if (!init_kmem_cache_nodes(s, gfpflags & ~SLUB_DMA))
@@ -3717,21 +3718,21 @@ static ssize_t free_calls_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(free_calls);
 
 #ifdef CONFIG_NUMA
-static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
+static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%d\n", s->defrag_ratio / 10);
+	return sprintf(buf, "%d\n", s->remote_node_defrag_ratio / 10);
 }
 
-static ssize_t defrag_ratio_store(struct kmem_cache *s,
+static ssize_t remote_node_defrag_ratio_store(struct kmem_cache *s,
 				const char *buf, size_t length)
 {
 	int n = simple_strtoul(buf, NULL, 10);
 
 	if (n < 100)
-		s->defrag_ratio = n * 10;
+		s->remote_node_defrag_ratio = n * 10;
 	return length;
 }
-SLAB_ATTR(defrag_ratio);
+SLAB_ATTR(remote_node_defrag_ratio);
 #endif
 
 static struct attribute * slab_attrs[] = {
@@ -3762,7 +3763,7 @@ static struct attribute * slab_attrs[] = {
 	&cache_dma_attr.attr,
 #endif
 #ifdef CONFIG_NUMA
-	&defrag_ratio_attr.attr,
+	&remote_node_defrag_ratio_attr.attr,
 #endif
 	NULL
 };
-- 
1.5.2.4

-- 


* [RFC 04/26] SLUB: Add defrag_ratio field and sysfs support.
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 03/26] SLUB: Rename NUMA defrag_ratio to remote_node_defrag_ratio Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 05/26] SLUB: Replace ctor field with ops field in /sys/slab/* Christoph Lameter
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0004-slab_defrag_add_defrag_ratio.patch --]
[-- Type: text/plain, Size: 2356 bytes --]

The defrag_ratio sets the threshold at which a slab cache is considered
for defragmentation.

The allocation ratio is measured as the percentage of the available object
slots that are in use. The percentage is lower for caches that are more
fragmented.

Add a defrag_ratio field and set it to 30% by default. A limit of 30% means
that defragmentation is attempted when fewer than 3 out of 10 available
object slots are in use.
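
A simplified sketch of how a later patch in this series applies the ratio
(the helper name below is a hypothetical placeholder):

	unsigned long capacity = nr_slabs * s->objects;
	unsigned long in_use = count_objects_in_use(s);	/* hypothetical helper */

	if (in_use * 100 / capacity > s->defrag_ratio)
		return 0;	/* cache is well packed, skip defragmentation */
	/* otherwise try to vacate sparsely populated partial slabs */

The ratio can then be tuned per cache by writing a percentage to
/sys/slab/<cache>/defrag_ratio.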

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slub_def.h |    7 +++++++
 mm/slub.c                |   18 ++++++++++++++++++
 2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 5912b58..291881d 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -52,6 +52,13 @@ struct kmem_cache {
 	void (*ctor)(void *, struct kmem_cache *, unsigned long);
 	int inuse;		/* Offset to metadata */
 	int align;		/* Alignment */
+	int defrag_ratio;	/*
+				 * objects/possible-objects limit. If we have
+				 * less than the specified percentage of
+				 * objects allocated then defrag passes
+				 * will start to occur during reclaim.
+				 */
+
 	const char *name;	/* Name (only for display!) */
 	struct list_head list;	/* List of slab caches */
 #ifdef CONFIG_SLUB_DEBUG
diff --git a/mm/slub.c b/mm/slub.c
index e63aba5..f95a760 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2200,6 +2200,7 @@ static int kmem_cache_open(struct kmem_cache *s, gfp_t gfpflags,
 		goto error;
 
 	s->refcount = 1;
+	s->defrag_ratio = 30;
 #ifdef CONFIG_NUMA
 	s->remote_node_defrag_ratio = 100;
 #endif
@@ -3717,6 +3718,22 @@ static ssize_t free_calls_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(free_calls);
 
+static ssize_t defrag_ratio_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", s->defrag_ratio);
+}
+
+static ssize_t defrag_ratio_store(struct kmem_cache *s,
+				const char *buf, size_t length)
+{
+	int n = simple_strtoul(buf, NULL, 10);
+
+	if (n < 100)
+		s->defrag_ratio = n;
+	return length;
+}
+SLAB_ATTR(defrag_ratio);
+
 #ifdef CONFIG_NUMA
 static ssize_t remote_node_defrag_ratio_show(struct kmem_cache *s, char *buf)
 {
@@ -3759,6 +3776,7 @@ static struct attribute * slab_attrs[] = {
 	&shrink_attr.attr,
 	&alloc_calls_attr.attr,
 	&free_calls_attr.attr,
+	&defrag_ratio_attr.attr,
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
 #endif
-- 
1.5.2.4

-- 


* [RFC 05/26] SLUB: Replace ctor field with ops field in /sys/slab/*
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 04/26] SLUB: Add defrag_ratio field and sysfs support Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 06/26] SLUB: Add get() and kick() methods Christoph Lameter
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0005-slab_defrag_ops_field.patch --]
[-- Type: text/plain, Size: 1305 bytes --]

Create an ops file (/sys/slab/*/ops) listing all the operations defined
on a slab cache. It will be used to display the additional operations that
we will define soon.
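
For a cache that has a constructor, the ops file would then contain a single
line of the form (the symbol name and offsets here are only illustrative):

	ctor : init_once+0x0/0x10

Later patches in this series append similar get and kick lines to this file.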

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index f95a760..fc2f1e3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3501,16 +3501,18 @@ static ssize_t order_show(struct kmem_cache *s, char *buf)
 }
 SLAB_ATTR_RO(order);
 
-static ssize_t ctor_show(struct kmem_cache *s, char *buf)
+static ssize_t ops_show(struct kmem_cache *s, char *buf)
 {
-	if (s->ctor) {
-		int n = sprint_symbol(buf, (unsigned long)s->ctor);
+	int x = 0;
 
-		return n + sprintf(buf + n, "\n");
+	if (s->ctor) {
+		x += sprintf(buf + x, "ctor : ");
+		x += sprint_symbol(buf + x, (unsigned long)s->ops->ctor);
+		x += sprintf(buf + x, "\n");
 	}
-	return 0;
+	return x;
 }
-SLAB_ATTR_RO(ctor);
+SLAB_ATTR_RO(ops);
 
 static ssize_t aliases_show(struct kmem_cache *s, char *buf)
 {
@@ -3761,7 +3763,7 @@ static struct attribute * slab_attrs[] = {
 	&slabs_attr.attr,
 	&partial_attr.attr,
 	&cpu_slabs_attr.attr,
-	&ctor_attr.attr,
+	&ops_attr.attr,
 	&aliases_attr.attr,
 	&align_attr.attr,
 	&sanity_checks_attr.attr,
-- 
1.5.2.4

-- 


* [RFC 06/26] SLUB: Add get() and kick() methods
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 05/26] SLUB: Replace ctor field with ops field in /sys/slab/* Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 07/26] SLUB: Sort slab cache list and establish maximum objects for defrag slabs Christoph Lameter
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0006-slab_defrag_get_and_kick_method.patch --]
[-- Type: text/plain, Size: 4220 bytes --]

Add the two methods needed for defragmentation and display them via the
/sys/slab/<cache>/ops file.

Add documentation explaining the use of these methods.
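
A minimal usage sketch (the cache and callback names are hypothetical
placeholders, not part of this patch):

	/* during initialization of the subsystem that owns my_cache */
	kmem_cache_setup_defrag(my_cache, my_get, my_kick);

kmem_cache_setup_defrag() requires the cache to have a constructor and
BUGs otherwise, since objects must be in a defined state whenever get()
inspects them.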

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h     |    3 +++
 include/linux/slub_def.h |   32 ++++++++++++++++++++++++++++++++
 mm/slub.c                |   32 ++++++++++++++++++++++++++++++--
 3 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index d859354..848e9a7 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -54,6 +54,9 @@ struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
 			void (*)(void *, struct kmem_cache *, unsigned long));
 void kmem_cache_destroy(struct kmem_cache *);
 int kmem_cache_shrink(struct kmem_cache *);
+void kmem_cache_setup_defrag(struct kmem_cache *s,
+	void *(*get)(struct kmem_cache *, int nr, void **),
+	void (*kick)(struct kmem_cache *, int nr, void **, void *private));
 void kmem_cache_free(struct kmem_cache *, void *);
 unsigned int kmem_cache_size(struct kmem_cache *);
 const char *kmem_cache_name(struct kmem_cache *);
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 291881d..69c32a7 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -50,6 +50,38 @@ struct kmem_cache {
 	int objects;		/* Number of objects in slab */
 	int refcount;		/* Refcount for slab cache destroy */
 	void (*ctor)(void *, struct kmem_cache *, unsigned long);
+
+	/*
+	 * Called with slab lock held and interrupts disabled.
+	 * No slab operation may be performed in get().
+	 *
+	 * Parameters passed are the number of objects to process
+	 * and an array of pointers to objects for which we
+	 * need references.
+	 *
+	 * Returns a pointer that is passed to the kick function.
+	 * If all objects cannot be moved then the pointer may
+	 * indicate that this won't work and then kick can simply
+	 * remove the references that were already obtained.
+	 *
+	 * The array passed to get() is also passed to kick(). The
+	 * function may remove objects by setting array elements to NULL.
+	 */
+	void *(*get)(struct kmem_cache *, int nr, void **);
+
+	/*
+	 * Called with no locks held and interrupts enabled.
+	 * Any operation may be performed in kick().
+	 *
+	 * Parameters passed are the number of objects in the array,
+	 * the array of pointers to the objects and the pointer
+	 * returned by get().
+	 *
+	 * Success is checked by examining the number of remaining
+	 * objects in the slab.
+	 */
+	void (*kick)(struct kmem_cache *, int nr, void **, void *private);
+
 	int inuse;		/* Offset to metadata */
 	int align;		/* Alignment */
 	int defrag_ratio;	/*
diff --git a/mm/slub.c b/mm/slub.c
index fc2f1e3..4a64038 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2597,6 +2597,20 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
+void kmem_cache_setup_defrag(struct kmem_cache *s,
+	void *(*get)(struct kmem_cache *, int nr, void **),
+	void (*kick)(struct kmem_cache *, int nr, void **, void *private))
+{
+	/*
+	 * Defragmentable slabs must have a ctor otherwise objects may be
+	 * in an undetermined state after they are allocated.
+	 */
+	BUG_ON(!s->ctor);
+	s->get = get;
+	s->kick = kick;
+}
+EXPORT_SYMBOL(kmem_cache_setup_defrag);
+
 static unsigned long count_partial(struct kmem_cache_node *n)
 {
 	unsigned long flags;
@@ -2777,7 +2791,7 @@ static int slab_unmergeable(struct kmem_cache *s)
 	if (slub_nomerge || (s->flags & SLUB_NEVER_MERGE))
 		return 1;
 
-	if (s->ctor)
+	if (s->ctor || s->kick || s->get)
 		return 1;
 
 	/*
@@ -3507,7 +3521,21 @@ static ssize_t ops_show(struct kmem_cache *s, char *buf)
 
 	if (s->ctor) {
 		x += sprintf(buf + x, "ctor : ");
-		x += sprint_symbol(buf + x, (unsigned long)s->ops->ctor);
+		x += sprint_symbol(buf + x, (unsigned long)s->ctor);
+		x += sprintf(buf + x, "\n");
+	}
+
+	if (s->get) {
+		x += sprintf(buf + x, "get : ");
+		x += sprint_symbol(buf + x,
+				(unsigned long)s->get);
+		x += sprintf(buf + x, "\n");
+	}
+
+	if (s->kick) {
+		x += sprintf(buf + x, "kick : ");
+		x += sprint_symbol(buf + x,
+				(unsigned long)s->kick);
 		x += sprintf(buf + x, "\n");
 	}
 	return x;
-- 
1.5.2.4

-- 


* [RFC 07/26] SLUB: Sort slab cache list and establish maximum objects for defrag slabs
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 06/26] SLUB: Add get() and kick() methods Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 08/26] SLUB: Consolidate add_partial and add_partial_tail to one function Christoph Lameter
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0007-slab_defrag_determine_maximum_objects.patch --]
[-- Type: text/plain, Size: 2288 bytes --]

When defragmenting slabs it is advantageous to have all
defragmentable slab caches together at the beginning of the list so that we
do not have to scan the complete list. When adding a slab cache, put
defragmentable caches first and all others last.

Also determine the maximum number of objects in defragmentable slabs. This
allows us to size the allocation of the arrays that will later hold
references to these objects.
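
For illustration (the numbers are an assumption, not measured values): with
max_defrag_slab_objects = 512 on a 64-bit machine, the alloc_scratch()
helper added below reserves

	512 * sizeof(void *)                       = 4096 bytes for the pointer array
	BITS_TO_LONGS(512) * sizeof(unsigned long) =   64 bytes for the object bitmap

i.e. a single kmalloc of roughly 4KiB that is reused for every slab vacated
during a defragmentation pass.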

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 4a64038..9006069 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -226,6 +226,9 @@ static enum {
 static DECLARE_RWSEM(slub_lock);
 static LIST_HEAD(slab_caches);
 
+/* Maximum objects in defragmentable slabs */
+static unsigned int max_defrag_slab_objects = 0;
+
 /*
  * Tracking user of a slab.
  */
@@ -2385,7 +2388,7 @@ static struct kmem_cache *create_kmalloc_cache(struct kmem_cache *s,
 			flags, NULL))
 		goto panic;
 
-	list_add(&s->list, &slab_caches);
+	list_add_tail(&s->list, &slab_caches);
 	up_write(&slub_lock);
 	if (sysfs_slab_add(s))
 		goto panic;
@@ -2597,6 +2600,13 @@ void kfree(const void *x)
 }
 EXPORT_SYMBOL(kfree);
 
+static inline void *alloc_scratch(void)
+{
+	return kmalloc(max_defrag_slab_objects * sizeof(void *) +
+	    BITS_TO_LONGS(max_defrag_slab_objects) * sizeof(unsigned long),
+								GFP_KERNEL);
+}
+
 void kmem_cache_setup_defrag(struct kmem_cache *s,
 	void *(*get)(struct kmem_cache *, int nr, void **),
 	void (*kick)(struct kmem_cache *, int nr, void **, void *private))
@@ -2608,6 +2618,11 @@ void kmem_cache_setup_defrag(struct kmem_cache *s,
 	BUG_ON(!s->ctor);
 	s->get = get;
 	s->kick = kick;
+	down_write(&slub_lock);
+	list_move(&s->list, &slab_caches);
+	if (s->objects > max_defrag_slab_objects)
+		max_defrag_slab_objects = s->objects;
+	up_write(&slub_lock);
 }
 EXPORT_SYMBOL(kmem_cache_setup_defrag);
 
@@ -2878,7 +2893,7 @@ struct kmem_cache *kmem_cache_create(const char *name, size_t size,
 	if (s) {
 		if (kmem_cache_open(s, GFP_KERNEL, name,
 				size, align, flags, ctor)) {
-			list_add(&s->list, &slab_caches);
+			list_add_tail(&s->list, &slab_caches);
 			up_write(&slub_lock);
 			if (sysfs_slab_add(s))
 				goto err;
-- 
1.5.2.4

-- 


* [RFC 08/26] SLUB: Consolidate add_partial and add_partial_tail to one function
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 07/26] SLUB: Sort slab cache list and establish maximum objects for defrag slabs Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 09/26] SLUB: Slab defrag core Christoph Lameter
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0008-slab_defrag_add_partial_tail.patch --]
[-- Type: text/plain, Size: 3926 bytes --]

Add a parameter to add_partial() instead of having separate functions.
That allows detailed control from multiple places when putting
slabs back onto the partial list. If we put slabs back at the front
then they are likely to be used immediately for allocations. If they are
put at the end then we maximize the time that the partial slabs
spend without allocations.

When deactivating a slab we can put slabs that had objects freed to them
remotely at the end of the list so that the cachelines can cool down.
Slabs that had objects freed to them from the local cpu are put at the
front of the list to be reused ASAP.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-28 20:03:16.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-28 20:21:55.000000000 -0700
@@ -1173,19 +1173,15 @@ static __always_inline int slab_trylock(
 /*
  * Management of partially allocated slabs
  */
-static void add_partial_tail(struct kmem_cache_node *n, struct page *page)
+static void add_partial(struct kmem_cache_node *n,
+				struct page *page, int tail)
 {
 	spin_lock(&n->list_lock);
 	n->nr_partial++;
-	list_add_tail(&page->lru, &n->partial);
-	spin_unlock(&n->list_lock);
-}
-
-static void add_partial(struct kmem_cache_node *n, struct page *page)
-{
-	spin_lock(&n->list_lock);
-	n->nr_partial++;
-	list_add(&page->lru, &n->partial);
+	if (tail)
+		list_add_tail(&page->lru, &n->partial);
+	else
+		list_add(&page->lru, &n->partial);
 	spin_unlock(&n->list_lock);
 }
 
@@ -1314,7 +1310,7 @@ static struct page *get_partial(struct k
  *
  * On exit the slab lock will have been dropped.
  */
-static void unfreeze_slab(struct kmem_cache *s, struct page *page)
+static void unfreeze_slab(struct kmem_cache *s, struct page *page, int tail)
 {
 	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
 
@@ -1322,7 +1318,7 @@ static void unfreeze_slab(struct kmem_ca
 	if (page->inuse) {
 
 		if (page->freelist)
-			add_partial(n, page);
+			add_partial(n, page, tail);
 		else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER))
 			add_full(n, page);
 		slab_unlock(page);
@@ -1337,7 +1333,7 @@ static void unfreeze_slab(struct kmem_ca
 			 * partial list stays small. kmem_cache_shrink can
 			 * reclaim empty slabs from the partial list.
 			 */
-			add_partial_tail(n, page);
+			add_partial(n, page, 1);
 			slab_unlock(page);
 		} else {
 			slab_unlock(page);
@@ -1352,6 +1348,7 @@ static void unfreeze_slab(struct kmem_ca
 static void deactivate_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
 {
 	struct page *page = c->page;
+	int tail = 1;
 	/*
 	 * Merge cpu freelist into freelist. Typically we get here
 	 * because both freelists are empty. So this is unlikely
@@ -1360,6 +1357,8 @@ static void deactivate_slab(struct kmem_
 	while (unlikely(c->freelist)) {
 		void **object;
 
+		tail = 0;	/* Hot objects. Put the slab first */
+
 		/* Retrieve object from cpu_freelist */
 		object = c->freelist;
 		c->freelist = c->freelist[c->offset];
@@ -1370,7 +1369,7 @@ static void deactivate_slab(struct kmem_
 		page->inuse--;
 	}
 	c->page = NULL;
-	unfreeze_slab(s, page);
+	unfreeze_slab(s, page, tail);
 }
 
 static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c)
@@ -1603,7 +1602,7 @@ checks_ok:
 	 * then add it.
 	 */
 	if (unlikely(!prior))
-		add_partial(get_node(s, page_to_nid(page)), page);
+		add_partial(get_node(s, page_to_nid(page)), page, 0);
 
 out_unlock:
 	slab_unlock(page);
@@ -2012,7 +2011,7 @@ static struct kmem_cache_node * __init e
 #endif
 	init_kmem_cache_node(n);
 	atomic_long_inc(&n->nr_slabs);
-	add_partial(n, page);
+	add_partial(n, page, 0);
 
 	/*
 	 * new_slab() disables interupts. If we do not reenable interrupts here

-- 


* [RFC 09/26] SLUB: Slab defrag core
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (7 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 08/26] SLUB: Consolidate add_partial and add_partial_tail to one function Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 10/26] SLUB: Trigger defragmentation from memory reclaim Christoph Lameter
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0009-slab_defrag_core.patch --]
[-- Type: text/plain, Size: 11836 bytes --]

Slab defragmentation (aside from Lumpy Reclaim) may occur:

1. Unconditionally, when kmem_cache_shrink() is called on a slab cache by
   the kernel.

2. Use of the slabinfo command line to trigger slab shrinking.

3. Per node defrag conditionally when kmem_cache_defrag(<node>) is called.

   Defragmentation is only performed if the fragmentation of the slab
   is lower than the specified percentage. Fragmentation ratios are measured
   by calculating the percentage of objects in use compared to the total
   number of objects that the slab cache could hold.

   kmem_cache_defrag takes a node parameter. This can either be -1 if
   defragmentation should be performed on all nodes, or a node number.
   If a node number was specified then defragmentation is only performed
   on a specific node.

   Slab defragmentation is a memory intensive operation that can be
   sped up in a NUMA system if mostly node-local memory is accessed. That
   is the case if we have just performed reclaim on a node.

In order for a slab cache to support defragmentation a couple of functions
must be set up via a call to kmem_cache_setup_defrag(). These are

void *get(struct kmem_cache *s, int nr, void **objects)

	Must obtain a reference to the listed objects. SLUB guarantees that
	the objects are still allocated. However, other threads may be blocked
	in slab_free attempting to free objects in the slab. These may succeed
	as soon as get() returns to the slab allocator. The function must
	be able to detect such situations and void the attempts to free such
	objects (by for example voiding the corresponding entry in the objects
	array).

	No slab operations may be performed in get(). Interrupts
	are disabled. What can be done is very limited. The slab lock
	for the page with the object is taken. Any attempt to perform a slab
	operation may lead to a deadlock.

	get() returns a private pointer that is passed to kick(). Should we
	be unable to obtain all references then that pointer may indicate
	to the kick() function that it should not attempt to remove or move
	any objects but simply drop the references that were already obtained.

void kick(struct kmem_cache *, int nr, void **objects, void *get_result)

	After SLUB has established references to the objects in a
	slab it will then drop all locks and use kick() to move objects out
	of the slab. The existence of the object is guaranteed by virtue of
	the earlier obtained references via get(). The callback may perform
	any slab operation since no locks are held at the time of call.

	The callback should remove the object from the slab in some way. This
	may be accomplished by reclaiming the object and then running
	kmem_cache_free(), or by reallocating it elsewhere and then running
	kmem_cache_free() on the original. Reallocation is advantageous because
	the partial list was just sorted to put the slabs with the most objects
	first. Reallocating objects is likely to fill up one of those slabs in
	addition to emptying the slab being vacated, so that the filled slab can
	also be removed from the partial list.

	Kick() does not return a result. SLUB will check the number of
	remaining objects in the slab. If all objects were removed then
	we know that the operation was successful.
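
A rough skeleton of what such a pair of callbacks could look like for a
hypothetical refcounted object type (an illustration of the contract above,
not code from this series; my_object, my_object_evict and my_object_put are
placeholders):

	static void *my_get(struct kmem_cache *s, int nr, void **v)
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct my_object *o = v[i];

			/* Slab lock held, interrupts off: only touch the object. */
			if (!atomic_inc_not_zero(&o->refcount))
				v[i] = NULL;	/* being freed, skip it in kick() */
		}
		return NULL;	/* no private state needed */
	}

	static void my_kick(struct kmem_cache *s, int nr, void **v, void *private)
	{
		int i;

		for (i = 0; i < nr; i++) {
			struct my_object *o = v[i];

			if (!o)
				continue;
			/*
			 * No locks held: evict the object and drop our reference,
			 * which frees it back to the slab if nobody else holds one.
			 */
			my_object_evict(o);
			my_object_put(o);
		}
	}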

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slab.c |    5 +
 mm/slub.c |  265 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
 2 files changed, 222 insertions(+), 48 deletions(-)

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2007-08-28 20:04:05.000000000 -0700
+++ linux-2.6/mm/slab.c	2007-08-28 20:04:54.000000000 -0700
@@ -2527,6 +2527,11 @@ int kmem_cache_shrink(struct kmem_cache 
 }
 EXPORT_SYMBOL(kmem_cache_shrink);
 
+int kmem_cache_defrag(int node)
+{
+	return 0;
+}
+
 /**
  * kmem_cache_destroy - delete a cache
  * @cachep: the cache to destroy
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-28 20:04:10.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-28 20:04:54.000000000 -0700
@@ -2639,75 +2639,244 @@ static unsigned long count_partial(struc
 }
 
 /*
- * kmem_cache_shrink removes empty slabs from the partial lists and sorts
- * the remaining slabs by the number of items in use. The slabs with the
- * most items in use come first. New allocations will then fill those up
- * and thus they can be removed from the partial lists.
+ * Vacate all objects in the given slab.
  *
- * The slabs with the least items are placed last. This results in them
- * being allocated from last increasing the chance that the last objects
- * are freed in them.
+ * The scratch area passed to the list function is sufficient to hold
+ * struct list_head times objects per slab. We use it to hold void ** times
+ * objects per slab plus a bitmap for each object.
  */
-int kmem_cache_shrink(struct kmem_cache *s)
+static int kmem_cache_vacate(struct page *page, void *scratch)
 {
-	int node;
-	int i;
-	struct kmem_cache_node *n;
+	void **vector = scratch;
+	void *p;
+	void *addr = page_address(page);
+	struct kmem_cache *s;
+	unsigned long *map;
+	int leftover;
+	int objects;
+	void *private;
+	unsigned long flags;
+	int tail = 1;
+
+	BUG_ON(!PageSlab(page) || !SlabFrozen(page));
+	local_irq_save(flags);
+	slab_lock(page);
+
+	s = page->slab;
+	map = scratch + s->objects * sizeof(void **);
+	if (!page->inuse || !s->kick)
+		goto out;
+
+	/* Determine used objects */
+	bitmap_fill(map, s->objects);
+	for_each_free_object(p, s, page->freelist)
+		__clear_bit(slab_index(p, s, addr), map);
+
+	objects = 0;
+	memset(vector, 0, s->objects * sizeof(void **));
+	for_each_object(p, s, addr)
+		if (test_bit(slab_index(p, s, addr), map))
+			vector[objects++] = p;
+
+	private = s->get(s, objects, vector);
+
+	/*
+	 * Got references. Now we can drop the slab lock. The slab
+	 * is frozen so it cannot vanish from under us nor will
+	 * allocations be performed on the slab. However, unlocking the
+	 * slab will allow concurrent slab_frees to proceed.
+	 */
+	slab_unlock(page);
+	local_irq_restore(flags);
+
+	/*
+	 * Perform the KICK callbacks to remove the objects.
+	 */
+	s->kick(s, objects, vector, private);
+
+	local_irq_save(flags);
+	slab_lock(page);
+	tail = 0;
+out:
+	/*
+	 * Check the result and unfreeze the slab
+	 */
+	leftover = page->inuse;
+	unfreeze_slab(s, page, tail);
+	local_irq_restore(flags);
+	return leftover;
+}
+
+/*
+ * Reclaim objects from a list of slab pages that have been gathered.
+ * Must be called with slabs that have been isolated before.
+ */
+int kmem_cache_reclaim(struct list_head *zaplist)
+{
+	int freed = 0;
+	void **scratch;
 	struct page *page;
-	struct page *t;
-	struct list_head *slabs_by_inuse =
-		kmalloc(sizeof(struct list_head) * s->objects, GFP_KERNEL);
+	struct page *page2;
+
+	if (list_empty(zaplist))
+		return 0;
+
+	scratch = alloc_scratch();
+	if (!scratch)
+		return 0;
+
+	list_for_each_entry_safe(page, page2, zaplist, lru) {
+		list_del(&page->lru);
+		if (kmem_cache_vacate(page, scratch) == 0)
+				freed++;
+	}
+	kfree(scratch);
+	return freed;
+}
+
+/*
+ * Shrink the slab cache on a particular node of the cache
+ * by releasing slabs with zero objects and trying to reclaim
+ * slabs with less than a quarter of objects allocated.
+ */
+static unsigned long __kmem_cache_shrink(struct kmem_cache *s,
+	struct kmem_cache_node *n)
+{
 	unsigned long flags;
+	struct page *page, *page2;
+	LIST_HEAD(zaplist);
+	int freed = 0;
+	int inuse;
 
-	if (!slabs_by_inuse)
-		return -ENOMEM;
+	spin_lock_irqsave(&n->list_lock, flags);
+	list_for_each_entry_safe(page, page2, &n->partial, lru) {
+		inuse = page->inuse;
 
-	flush_all(s);
-	for_each_online_node(node) {
-		n = get_node(s, node);
+		if (inuse > s->objects / 4)
+			continue;
 
-		if (!n->nr_partial)
+		if (!slab_trylock(page))
 			continue;
 
-		for (i = 0; i < s->objects; i++)
-			INIT_LIST_HEAD(slabs_by_inuse + i);
+		if (inuse) {
 
-		spin_lock_irqsave(&n->list_lock, flags);
+			list_move(&page->lru, &zaplist);
 
-		/*
-		 * Build lists indexed by the items in use in each slab.
-		 *
-		 * Note that concurrent frees may occur while we hold the
-		 * list_lock. page->inuse here is the upper limit.
-		 */
-		list_for_each_entry_safe(page, t, &n->partial, lru) {
-			if (!page->inuse && slab_trylock(page)) {
-				/*
-				 * Must hold slab lock here because slab_free
-				 * may have freed the last object and be
-				 * waiting to release the slab.
-				 */
-				list_del(&page->lru);
+			if (s->kick) {
 				n->nr_partial--;
-				slab_unlock(page);
-				discard_slab(s, page);
-			} else {
-				list_move(&page->lru,
-				slabs_by_inuse + page->inuse);
+				SetSlabFrozen(page);
 			}
+			slab_unlock(page);
+
+		} else {
+			list_del(&page->lru);
+			slab_unlock(page);
+			discard_slab(s, page);
+			freed++;
 		}
+	}
+
+	if (!s->kick)
+		/* Simply put the zaplist at the end */
+		list_splice(&zaplist, n->partial.prev);
 
+	spin_unlock_irqrestore(&n->list_lock, flags);
+
+	if (s->kick)
 		/*
-		 * Rebuild the partial list with the slabs filled up most
-		 * first and the least used slabs at the end.
+		 * Now we can free objects in the slabs on the zaplist
+		 * (or we simply reorder the list
 		 */
-		for (i = s->objects - 1; i >= 0; i--)
-			list_splice(slabs_by_inuse + i, n->partial.prev);
+		freed += kmem_cache_reclaim(&zaplist);
 
-		spin_unlock_irqrestore(&n->list_lock, flags);
+	return freed;
+}
+
+
+static unsigned long __kmem_cache_defrag(struct kmem_cache *s, int node)
+{
+	unsigned long capacity;
+	unsigned long objects_in_full_slabs;
+	unsigned long ratio;
+	struct kmem_cache_node *n = get_node(s, node);
+
+	/*
+	 * An insignificant number of partial slabs makes
+	 * the slab not interesting.
+	 */
+	if (n->nr_partial <= MAX_PARTIAL)
+		return 0;
+
+	capacity = atomic_long_read(&n->nr_slabs) * s->objects;
+	objects_in_full_slabs =
+			(atomic_long_read(&n->nr_slabs) - n->nr_partial)
+							* s->objects;
+	/*
+	 * Worst case calculation: If we would be over the ratio
+	 * even if all partial slabs would only have one object
+	 * then we can skip the further test that would require a scan
+	 * through all the partial page structs to sum up the actual
+	 * number of objects in the partial slabs.
+	 */
+	ratio = (objects_in_full_slabs + 1 * n->nr_partial) * 100 / capacity;
+	if (ratio > s->defrag_ratio)
+		return 0;
+
+	/*
+	 * Now for the real calculation. If usage ratio is more than required
+	 * then no defragmentation
+	 */
+	ratio = (objects_in_full_slabs + count_partial(n)) * 100 / capacity;
+	if (ratio > s->defrag_ratio)
+		return 0;
+
+	return __kmem_cache_shrink(s, n) << s->order;
+}
+
+/*
+ * Defrag slabs conditional on the fragmentation ratio on each node.
+ */
+int kmem_cache_defrag(int node)
+{
+	struct kmem_cache *s;
+	unsigned long pages = 0;
+
+	/*
+	 * kmem_cache_defrag may be called from the reclaim path which may be
+	 * called for any page allocator alloc. So there is the danger that we
+	 * get called in a situation where slub already acquired the slub_lock
+	 * for other purposes.
+	 */
+	if (!down_read_trylock(&slub_lock))
+		return 0;
+
+	list_for_each_entry(s, &slab_caches, list) {
+		if (node == -1) {
+			int nid;
+
+			for_each_online_node(nid)
+				pages += __kmem_cache_defrag(s, nid);
+		} else
+			pages += __kmem_cache_defrag(s, node);
 	}
+	up_read(&slub_lock);
+	return pages;
+}
+EXPORT_SYMBOL(kmem_cache_defrag);
+
+/*
+ * kmem_cache_shrink removes empty slabs from the partial lists.
+ * If the slab cache support defragmentation then objects are
+ * reclaimed.
+ */
+int kmem_cache_shrink(struct kmem_cache *s)
+{
+	int node;
+
+	flush_all(s);
+	for_each_online_node(node)
+		__kmem_cache_shrink(s, get_node(s, node));
 
-	kfree(slabs_by_inuse);
 	return 0;
 }
 EXPORT_SYMBOL(kmem_cache_shrink);

-- 


* [RFC 10/26] SLUB: Trigger defragmentation from memory reclaim
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (8 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 09/26] SLUB: Slab defrag core Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 11/26] VM: Allow get_page_unless_zero on compound pages Christoph Lameter
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0010-slab_defrag_trigger_defrag_from_reclaim.patch --]
[-- Type: text/plain, Size: 5672 bytes --]

This patch triggers slab defragmentation from memory reclaim.
The logical point for this is after slab shrinking has been performed in
vmscan.c. At that point the fragmentation of the slab caches may have
increased because objects were freed. So we call kmem_cache_defrag() from there.

shrink_slab() in vmscan.c is called in some contexts to do
global shrinking of slabs and in others to do shrinking for
a particular zone. Pass the zone to shrink_slab(), so that shrink_slab()
can call kmem_cache_defrag() and restrict the defragmentation to
the node that is under memory pressure.
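
For illustration only (not part of the patch; example_shrink_for_zone() is
a made-up helper), a reclaim path that targets a particular zone would now
use the extended interface roughly like this:

	static void example_shrink_for_zone(struct zone *zone, gfp_t gfp_mask,
						unsigned long scanned,
						unsigned long lru_pages)
	{
		/*
		 * Passing the zone lets shrink_slab() restrict slab
		 * defragmentation to the node under memory pressure.
		 * A NULL zone means a global shrink across all nodes.
		 */
		shrink_slab(scanned, gfp_mask, lru_pages, zone);
	}

Inside shrink_slab() the zone is then mapped to a node via zone_to_nid()
(or -1 for all nodes) before calling kmem_cache_defrag(), as the diff
below shows.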

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/drop_caches.c     |    2 +-
 include/linux/mm.h   |    2 +-
 include/linux/slab.h |    1 +
 mm/vmscan.c          |   27 ++++++++++++++++++++-------
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index 59375ef..fb58e63 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -50,7 +50,7 @@ void drop_slab(void)
 	int nr_objects;
 
 	do {
-		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000, NULL);
 	} while (nr_objects > 10);
 }
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a396aac..9fbb6ba 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1202,7 +1202,7 @@ int in_gate_area_no_task(unsigned long addr);
 int drop_caches_sysctl_handler(struct ctl_table *, int, struct file *,
 					void __user *, size_t *, loff_t *);
 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
-			unsigned long lru_pages);
+			unsigned long lru_pages, struct zone *zone);
 void drop_pagecache(void);
 void drop_slab(void);
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 848e9a7..7d8ec17 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -61,6 +61,7 @@ void kmem_cache_free(struct kmem_cache *, void *);
 unsigned int kmem_cache_size(struct kmem_cache *);
 const char *kmem_cache_name(struct kmem_cache *);
 int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
+int kmem_cache_defrag(int node);
 
 /*
  * Please use this macro to create slab caches. Simply specify the
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d419e10..c6882d8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -150,10 +150,18 @@ EXPORT_SYMBOL(unregister_shrinker);
  * are eligible for the caller's allocation attempt.  It is used for balancing
  * slab reclaim versus page reclaim.
  *
+ * zone is the zone for which we are shrinking the slabs. If the intent
+ * is to do a global shrink then zone may be NULL. Specification of a
+ * zone is currently only used to limit slab defragmentation to a NUMA node.
+ * The performance of shrink_slab() would be better (in particular under NUMA)
+ * if it could be targeted as a whole to the zone that is under memory
+ * pressure but the VFS infrastructure does not allow that at the present
+ * time.
+ *
  * Returns the number of slab objects which we shrunk.
  */
 unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
-			unsigned long lru_pages)
+			unsigned long lru_pages, struct zone *zone)
 {
 	struct shrinker *shrinker;
 	unsigned long ret = 0;
@@ -210,6 +218,8 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
+	if (gfp_mask & __GFP_FS)
+		kmem_cache_defrag(zone ? zone_to_nid(zone) : -1);
 	return ret;
 }
 
@@ -1151,7 +1161,8 @@ unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask)
 		if (!priority)
 			disable_swap_token();
 		nr_reclaimed += shrink_zones(priority, zones, &sc);
-		shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
+		shrink_slab(sc.nr_scanned, gfp_mask, lru_pages,
+						NULL);
 		if (reclaim_state) {
 			nr_reclaimed += reclaim_state->reclaimed_slab;
 			reclaim_state->reclaimed_slab = 0;
@@ -1321,7 +1332,7 @@ loop_again:
 			nr_reclaimed += shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
 			nr_slab = shrink_slab(sc.nr_scanned, GFP_KERNEL,
-						lru_pages);
+						lru_pages, zone);
 			nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_scanned += sc.nr_scanned;
 			if (zone->all_unreclaimable)
@@ -1559,7 +1570,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	/* If slab caches are huge, it's better to hit them first */
 	while (nr_slab >= lru_pages) {
 		reclaim_state.reclaimed_slab = 0;
-		shrink_slab(nr_pages, sc.gfp_mask, lru_pages);
+		shrink_slab(nr_pages, sc.gfp_mask, lru_pages, NULL);
 		if (!reclaim_state.reclaimed_slab)
 			break;
 
@@ -1597,7 +1608,7 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 
 			reclaim_state.reclaimed_slab = 0;
 			shrink_slab(sc.nr_scanned, sc.gfp_mask,
-					count_lru_pages());
+					count_lru_pages(), NULL);
 			ret += reclaim_state.reclaimed_slab;
 			if (ret >= nr_pages)
 				goto out;
@@ -1614,7 +1625,8 @@ unsigned long shrink_all_memory(unsigned long nr_pages)
 	if (!ret) {
 		do {
 			reclaim_state.reclaimed_slab = 0;
-			shrink_slab(nr_pages, sc.gfp_mask, count_lru_pages());
+			shrink_slab(nr_pages, sc.gfp_mask,
+					count_lru_pages(), NULL);
 			ret += reclaim_state.reclaimed_slab;
 		} while (ret < nr_pages && reclaim_state.reclaimed_slab > 0);
 	}
@@ -1774,7 +1786,8 @@ static int __zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
 		 * Note that shrink_slab will free memory on all zones and may
 		 * take a long time.
 		 */
-		while (shrink_slab(sc.nr_scanned, gfp_mask, order) &&
+		while (shrink_slab(sc.nr_scanned, gfp_mask, order,
+						zone) &&
 			zone_page_state(zone, NR_SLAB_RECLAIMABLE) >
 				slab_reclaimable - nr_pages)
 			;
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 11/26] VM: Allow get_page_unless_zero on compound pages
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (9 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 10/26] SLUB: Trigger defragmentation from memory reclaim Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 12/26] SLUB: Slab reclaim through Lumpy reclaim Christoph Lameter
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0011-slab_defrag_get_page_unless.patch --]
[-- Type: text/plain, Size: 788 bytes --]

SLUB uses compound pages for larger slabs. We need to increment
the page count of these pages in order to make sure that they are not
freed from under us while lumpy reclaim is working on them.

(The patch is also part of the large blocksize patchset)
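
For illustration (not part of the patch; example_pin_slab_page() is a
made-up name), this is the reference pattern that the relaxed check
permits on the head page of a compound slab:

	static int example_pin_slab_page(struct page *page)
	{
		/* Succeeds only if the page has not already been freed */
		if (!get_page_unless_zero(page))
			return -ENOENT;

		/* ... examine or isolate the slab here ... */

		put_page(page);
		return 0;
	}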

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/mm.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9fbb6ba..713d096 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -290,7 +290,7 @@ static inline int put_page_testzero(struct page *page)
  */
 static inline int get_page_unless_zero(struct page *page)
 {
-	VM_BUG_ON(PageCompound(page));
+	VM_BUG_ON(PageTail(page));
 	return atomic_inc_not_zero(&page->_count);
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 12/26] SLUB: Slab reclaim through Lumpy reclaim
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (10 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 11/26] VM: Allow get_page_unless_zero on compound pages Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts Christoph Lameter
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0012-slab_defrag_lumpy_reclaim.patch --]
[-- Type: text/plain, Size: 8011 bytes --]

Create two special functions, kmem_cache_isolate_slab() and kmem_cache_reclaim(),
to support lumpy reclaim.

In order to isolate pages we will have to handle slab page allocations in
such a way that we can determine if a slab is valid whenever we access it,
regardless of where the page is in its lifetime.

A valid slab that can be freed has PageSlab(page) set and page->inuse > 0.
So we need to make sure in new_slab() that page->inuse is zero before
PageSlab is set; otherwise kmem_cache_vacate() may operate on a slab that
has not been properly set up yet.

kmem_cache_isolate_slab() is called from lumpy reclaim to isolate slab pages
neighboring a page cache page that is being reclaimed. Lumpy reclaim will
gather the slabs and call kmem_cache_reclaim() on the list.

This means that we can remove a slab page that is in the way of
coalescing a higher order page.
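
As a sketch of the intended use (illustrative only; the real hook sits in
isolate_lru_pages() in the diff below, example_reclaim_slab_neighbors() is
a made-up name and the pfn range is assumed to be valid):

	static void example_reclaim_slab_neighbors(unsigned long start_pfn,
							unsigned long end_pfn)
	{
		unsigned long pfn;
		LIST_HEAD(slab_list);

		for (pfn = start_pfn; pfn < end_pfn; pfn++) {
			struct page *page = pfn_to_page(pfn);

			/*
			 * On success the slab is frozen, taken off its
			 * partial list and its LRU field is ours to use.
			 */
			if (kmem_cache_isolate_slab(page) == 0)
				list_add(&page->lru, &slab_list);
		}

		/*
		 * Vacate the remaining objects; slabs that become empty
		 * are freed, the others are unfrozen again.
		 */
		kmem_cache_reclaim(&slab_list);
	}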

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h |    2 +
 mm/slab.c            |   13 +++++++
 mm/slub.c            |   88 +++++++++++++++++++++++++++++++++++++++++++++++----
 mm/vmscan.c          |   15 ++++++--
 4 files changed, 109 insertions(+), 9 deletions(-)

Index: linux-2.6/include/linux/slab.h
===================================================================
--- linux-2.6.orig/include/linux/slab.h	2007-08-28 20:05:42.000000000 -0700
+++ linux-2.6/include/linux/slab.h	2007-08-28 20:06:22.000000000 -0700
@@ -62,6 +62,8 @@ unsigned int kmem_cache_size(struct kmem
 const char *kmem_cache_name(struct kmem_cache *);
 int kmem_ptr_validate(struct kmem_cache *cachep, const void *ptr);
 int kmem_cache_defrag(int node);
+int kmem_cache_isolate_slab(struct page *);
+int kmem_cache_reclaim(struct list_head *);
 
 /*
  * Please use this macro to create slab caches. Simply specify the
Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2007-08-28 20:04:54.000000000 -0700
+++ linux-2.6/mm/slab.c	2007-08-28 20:06:22.000000000 -0700
@@ -2532,6 +2532,19 @@ int kmem_cache_defrag(int node)
 	return 0;
 }
 
+/*
+ * SLAB does not support slab defragmentation
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+	return -ENOSYS;
+}
+
+int kmem_cache_reclaim(struct list_head *zaplist)
+{
+	return 0;
+}
+
 /**
  * kmem_cache_destroy - delete a cache
  * @cachep: the cache to destroy
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-28 20:04:54.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-28 20:10:37.000000000 -0700
@@ -1006,6 +1006,7 @@ static inline int slab_pad_check(struct 
 static inline int check_object(struct kmem_cache *s, struct page *page,
 			void *object, int active) { return 1; }
 static inline void add_full(struct kmem_cache_node *n, struct page *page) {}
+static inline void remove_full(struct kmem_cache *s, struct page *page) {}
 static inline void kmem_cache_open_debug_check(struct kmem_cache *s) {}
 #define slub_debug 0
 #endif
@@ -1068,11 +1069,9 @@ static struct page *new_slab(struct kmem
 	n = get_node(s, page_to_nid(page));
 	if (n)
 		atomic_long_inc(&n->nr_slabs);
+
+	page->inuse = 0;
 	page->slab = s;
-	page->flags |= 1 << PG_slab;
-	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
-			SLAB_STORE_USER | SLAB_TRACE))
-		SetSlabDebug(page);
 
 	start = page_address(page);
 	end = start + s->objects * s->size;
@@ -1090,8 +1089,18 @@ static struct page *new_slab(struct kmem
 	set_freepointer(s, last, NULL);
 
 	page->freelist = start;
-	page->inuse = 0;
-out:
+
+	/*
+	 * page->inuse must be 0 when PageSlab(page) becomes
+	 * true so that defrag knows that this slab is not in use.
+	 */
+	smp_wmb();
+	__SetPageSlab(page);
+	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
+			SLAB_STORE_USER | SLAB_TRACE))
+		SetSlabDebug(page);
+
+ out:
 	if (flags & __GFP_WAIT)
 		local_irq_disable();
 	return page;
@@ -2638,6 +2647,73 @@ static unsigned long count_partial(struc
 	return x;
 }
 
+/*
+ * Isolate page from the slab partial lists. Return 0 if successful.
+ *
+ * After isolation the LRU field can be used to put the page onto
+ * a reclaim list.
+ */
+int kmem_cache_isolate_slab(struct page *page)
+{
+	unsigned long flags;
+	struct kmem_cache *s;
+	int rc = -ENOENT;
+
+	if (!PageSlab(page) || SlabFrozen(page))
+		return rc;
+
+	/*
+	 * Get a reference to the page. Return if it is freed or being freed.
+	 * This is necessary to make sure that the page does not vanish
+	 * from under us before we are able to check the result.
+	 */
+	if (!get_page_unless_zero(page))
+		return rc;
+
+	local_irq_save(flags);
+	slab_lock(page);
+
+	/*
+	 * Check a variety of conditions to ensure that the page was not
+	 *  1. Freed
+	 *  2. Frozen
+	 *  3. Is in the process of being freed (min one remaining object)
+	 */
+	if (!PageSlab(page) || SlabFrozen(page) || !page->inuse) {
+		slab_unlock(page);
+		put_page(page);
+		goto out;
+	}
+
+	/*
+	 * Drop reference. There are objects remaining and therefore
+	 * the slab lock will be taken before the last objects can
+	 * be removed. So we cannot be in the process of freeing the
+	 * object.
+	 *
+	 * We set the slab frozen before releasing the lock. This means
+	 * that no free action will be performed. If it becomes empty
+	 * then we will free it during kmem_cache_reclaim().
+	 */
+	BUG_ON(page_count(page) <= 1);
+	put_page(page);
+
+	/*
+	 * Remove the slab from the lists and mark it frozen
+	 */
+	s = page->slab;
+	if (page->inuse < s->objects)
+		remove_partial(s, page);
+	else if (s->flags & SLAB_STORE_USER)
+		remove_full(s, page);
+	SetSlabFrozen(page);
+	slab_unlock(page);
+	rc = 0;
+out:
+	local_irq_restore(flags);
+	return rc;
+}
+
 /*
  * Vacate all objects in the given slab.
  *
Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c	2007-08-28 20:05:42.000000000 -0700
+++ linux-2.6/mm/vmscan.c	2007-08-28 20:06:22.000000000 -0700
@@ -657,6 +657,7 @@ static int __isolate_lru_page(struct pag
  */
 static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 		struct list_head *src, struct list_head *dst,
+		struct list_head *slab_pages,
 		unsigned long *scanned, int order, int mode)
 {
 	unsigned long nr_taken = 0;
@@ -730,7 +731,13 @@ static unsigned long isolate_lru_pages(u
 			case -EBUSY:
 				/* else it is being freed elsewhere */
 				list_move(&cursor_page->lru, src);
+				break;
+
 			default:
+				if (slab_pages &&
+				    kmem_cache_isolate_slab(cursor_page) == 0)
+						list_add(&cursor_page->lru,
+							slab_pages);
 				break;
 			}
 		}
@@ -766,6 +773,7 @@ static unsigned long shrink_inactive_lis
 				struct zone *zone, struct scan_control *sc)
 {
 	LIST_HEAD(page_list);
+	LIST_HEAD(slab_list);
 	struct pagevec pvec;
 	unsigned long nr_scanned = 0;
 	unsigned long nr_reclaimed = 0;
@@ -783,7 +791,7 @@ static unsigned long shrink_inactive_lis
 
 		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
 			     &zone->inactive_list,
-			     &page_list, &nr_scan, sc->order,
+			     &page_list, &slab_list, &nr_scan, sc->order,
 			     (sc->order > PAGE_ALLOC_COSTLY_ORDER)?
 					     ISOLATE_BOTH : ISOLATE_INACTIVE);
 		nr_active = clear_active_flags(&page_list);
@@ -793,6 +801,7 @@ static unsigned long shrink_inactive_lis
 						-(nr_taken - nr_active));
 		zone->pages_scanned += nr_scan;
 		spin_unlock_irq(&zone->lru_lock);
+		kmem_cache_reclaim(&slab_list);
 
 		nr_scanned += nr_scan;
 		nr_freed = shrink_page_list(&page_list, sc);
@@ -934,8 +943,8 @@ force_reclaim_mapped:
 
 	lru_add_drain();
 	spin_lock_irq(&zone->lru_lock);
-	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
-			    &l_hold, &pgscanned, sc->order, ISOLATE_ACTIVE);
+	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list, &l_hold,
+			NULL, &pgscanned, sc->order, ISOLATE_ACTIVE);
 	zone->pages_scanned += pgscanned;
 	__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
 	spin_unlock_irq(&zone->lru_lock);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (11 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 12/26] SLUB: Slab reclaim through Lumpy reclaim Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-19 15:08   ` Rik van Riel
  2007-09-01  1:41 ` [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support Christoph Lameter
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0013-slab_defrag_reclaim_flag.patch --]
[-- Type: text/plain, Size: 3425 bytes --]

Add a flag SlabReclaimable() that is set on slabs of caches that provide
a method for defrag/reclaim. Clear the flag if a reclaim action does not
succeed in reducing the number of objects in a slab. The reclaim
flag is set again once the slab has become fully allocated.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   42 ++++++++++++++++++++++++++++++++++++------
 1 file changed, 36 insertions(+), 6 deletions(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-28 20:10:37.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-28 20:10:47.000000000 -0700
@@ -107,6 +107,8 @@
 #define SLABDEBUG 0
 #endif
 
+#define SLABRECLAIMABLE (1 << PG_dirty)
+
 static inline int SlabFrozen(struct page *page)
 {
 	return page->flags & FROZEN;
@@ -137,6 +139,21 @@ static inline void ClearSlabDebug(struct
 	page->flags &= ~SLABDEBUG;
 }
 
+static inline int SlabReclaimable(struct page *page)
+{
+	return page->flags & SLABRECLAIMABLE;
+}
+
+static inline void SetSlabReclaimable(struct page *page)
+{
+	page->flags |= SLABRECLAIMABLE;
+}
+
+static inline void ClearSlabReclaimable(struct page *page)
+{
+	page->flags &= ~SLABRECLAIMABLE;
+}
+
 /*
  * Issues still to be resolved:
  *
@@ -1099,6 +1116,8 @@ static struct page *new_slab(struct kmem
 	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
 			SLAB_STORE_USER | SLAB_TRACE))
 		SetSlabDebug(page);
+	if (s->kick)
+		SetSlabReclaimable(page);
 
  out:
 	if (flags & __GFP_WAIT)
@@ -1155,6 +1174,7 @@ static void discard_slab(struct kmem_cac
 	atomic_long_dec(&n->nr_slabs);
 	reset_page_mapcount(page);
 	__ClearPageSlab(page);
+	ClearSlabReclaimable(page);
 	free_slab(s, page);
 }
 
@@ -1328,8 +1348,12 @@ static void unfreeze_slab(struct kmem_ca
 
 		if (page->freelist)
 			add_partial(n, page, tail);
-		else if (SlabDebug(page) && (s->flags & SLAB_STORE_USER))
-			add_full(n, page);
+		else {
+			if (SlabDebug(page) && (s->flags & SLAB_STORE_USER))
+				add_full(n, page);
+			if (s->kick && !SlabReclaimable(page))
+				SetSlabReclaimable(page);
+		}
 		slab_unlock(page);
 
 	} else {
@@ -2659,7 +2683,7 @@ int kmem_cache_isolate_slab(struct page 
 	struct kmem_cache *s;
 	int rc = -ENOENT;
 
-	if (!PageSlab(page) || SlabFrozen(page))
+	if (!PageSlab(page) || SlabFrozen(page) || !SlabReclaimable(page))
 		return rc;
 
 	/*
@@ -2729,7 +2753,7 @@ static int kmem_cache_vacate(struct page
 	struct kmem_cache *s;
 	unsigned long *map;
 	int leftover;
-	int objects;
+	int objects = -1;
 	void *private;
 	unsigned long flags;
 	int tail = 1;
@@ -2739,7 +2763,7 @@ static int kmem_cache_vacate(struct page
 	slab_lock(page);
 
 	s = page->slab;
-	map = scratch + s->objects * sizeof(void **);
+	map = scratch + max_defrag_slab_objects * sizeof(void **);
 	if (!page->inuse || !s->kick)
 		goto out;
 
@@ -2773,10 +2797,13 @@ static int kmem_cache_vacate(struct page
 	local_irq_save(flags);
 	slab_lock(page);
 	tail = 0;
-out:
+
 	/*
 	 * Check the result and unfreeze the slab
 	 */
+	if (page->inuse == objects)
+		ClearSlabReclaimable(page);
+out:
 	leftover = page->inuse;
 	unfreeze_slab(s, page, tail);
 	local_irq_restore(flags);
@@ -2831,6 +2858,9 @@ static unsigned long __kmem_cache_shrink
 		if (inuse > s->objects / 4)
 			continue;
 
+		if (s->kick && !SlabReclaimable(page))
+			continue;
+
 		if (!slab_trylock(page))
 			continue;
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (12 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  2:04   ` KAMEZAWA Hiroyuki
  2007-09-01  1:41 ` [RFC 15/26] bufferhead: Revert constructor removal Christoph Lameter
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0014-slab_defrag_movable.patch --]
[-- Type: text/plain, Size: 2707 bytes --]

Slabs that are reclaimable fit the definition of the objects in
ZONE_MOVABLE. So set __GFP_MOVABLE on their page allocations (this only
works on platforms without HIGHMEM; hopefully that restriction
will vanish at some point).

Also add the SLAB_TEMPORARY flag for slab caches that allocate objects with
a short lifetime. Slabs with SLAB_TEMPORARY are also allocated with
__GFP_MOVABLE. Reclaim on them works by isolating the slab for a while and
waiting for the objects to expire.

The skbuff_head_cache is a prime example of such a slab. Add the
SLAB_TEMPORARY flag to it.
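
As an illustration of the new flag (hypothetical cache: "example_obj" and
"example_cache" are made up), a cache holding only short-lived objects
would be set up like this:

	struct example_obj {
		struct list_head list;
		unsigned long data;
	};

	static struct kmem_cache *example_cache;

	static int __init example_cache_init(void)
	{
		/*
		 * SLAB_TEMPORARY tells SLUB that objects are short-lived,
		 * so the backing pages may be allocated __GFP_MOVABLE and
		 * reclaimed by isolating the slab and waiting.
		 */
		example_cache = kmem_cache_create("example_cache",
					sizeof(struct example_obj), 0,
					SLAB_HWCACHE_ALIGN | SLAB_TEMPORARY,
					NULL);
		return example_cache ? 0 : -ENOMEM;
	}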

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 include/linux/slab.h |    1 +
 mm/slub.c            |    8 +++++++-
 net/core/skbuff.c    |    2 +-
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 2923861..daffc22 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -23,6 +23,7 @@
 #define SLAB_POISON		0x00000800UL	/* DEBUG: Poison objects */
 #define SLAB_HWCACHE_ALIGN	0x00002000UL	/* Align objs on cache lines */
 #define SLAB_CACHE_DMA		0x00004000UL	/* Use GFP_DMA memory */
+#define SLAB_TEMPORARY		0x00008000UL	/* Only volatile objects */
 #define SLAB_STORE_USER		0x00010000UL	/* DEBUG: Store the last owner for bug hunting */
 #define SLAB_RECLAIM_ACCOUNT	0x00020000UL	/* Objects are reclaimable */
 #define SLAB_PANIC		0x00040000UL	/* Panic if kmem_cache_create() fails */
diff --git a/mm/slub.c b/mm/slub.c
index bad5291..85ba259 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1040,6 +1040,11 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (s->flags & SLAB_CACHE_DMA)
 		flags |= SLUB_DMA;
 
+#ifndef CONFIG_HIGHMEM
+	if (s->kick || s->flags & SLAB_TEMPORARY)
+		flags |= __GFP_MOVABLE;
+#endif
+
 	if (node == -1)
 		page = alloc_pages(flags, s->order);
 	else
@@ -1118,7 +1123,8 @@ static struct page *new_slab(struct kmem_cache *s, gfp_t flags, int node)
 	if (s->flags & (SLAB_DEBUG_FREE | SLAB_RED_ZONE | SLAB_POISON |
 			SLAB_STORE_USER | SLAB_TRACE))
 		SetSlabDebug(page);
-	if (s->kick)
+
+	if (s->flags & SLAB_TEMPORARY || s->kick)
 		SetSlabReclaimable(page);
 
  out:
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 35021eb..51b2236 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2020,7 +2020,7 @@ void __init skb_init(void)
 	skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
 					      sizeof(struct sk_buff),
 					      0,
-					      SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+			      SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TEMPORARY,
 					      NULL);
 	skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
 						(2*sizeof(struct sk_buff)) +
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 15/26] bufferhead: Revert constructor removal
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (13 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 16/26] Buffer heads: Support slab defrag Christoph Lameter
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0015-slab_defrag_buffer_head_revert.patch --]
[-- Type: text/plain, Size: 1554 bytes --]

The constructor for buffer_head slabs was removed recently. We need
the constructor in order to ensure that slab objects always have a definite
state, even before they have been allocated.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

---
 fs/buffer.c |   19 +++++++++++++++----
 1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0e5ec37..f4824d1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2960,9 +2960,8 @@ static void recalc_bh_state(void)
 	
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags)
 {
-	struct buffer_head *ret = kmem_cache_zalloc(bh_cachep, gfp_flags);
+	struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags);
 	if (ret) {
-		INIT_LIST_HEAD(&ret->b_assoc_buffers);
 		get_cpu_var(bh_accounting).nr++;
 		recalc_bh_state();
 		put_cpu_var(bh_accounting);
@@ -3003,12 +3002,24 @@ static int buffer_cpu_notify(struct notifier_block *self,
 	return NOTIFY_OK;
 }
 
+static void
+init_buffer_head(void *data, struct kmem_cache *cachep, unsigned long flags)
+{
+	struct buffer_head * bh = (struct buffer_head *)data;
+
+	memset(bh, 0, sizeof(*bh));
+	INIT_LIST_HEAD(&bh->b_assoc_buffers);
+}
+
 void __init buffer_init(void)
 {
 	int nrpages;
 
-	bh_cachep = KMEM_CACHE(buffer_head,
-			SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
+	bh_cachep = kmem_cache_create("buffer_head",
+			sizeof(struct buffer_head), 0,
+				(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
+				SLAB_MEM_SPREAD),
+				init_buffer_head);
 
 	/*
 	 * Limit the bh occupancy to 10% of ZONE_NORMAL
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 16/26] Buffer heads: Support slab defrag
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (14 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 15/26] bufferhead: Revert constructor removal Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 17/26] inodes: Support generic defragmentation Christoph Lameter
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0016-slab_defrag_buffer_head.patch --]
[-- Type: text/plain, Size: 3205 bytes --]

Defragmentation support for buffer heads. We convert the references to
buffers into struct page references and try to remove the buffers from
those pages. If the pages are dirty then we trigger writeout so that the
buffer heads can be removed later.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/buffer.c |  101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c	2007-08-28 20:13:08.000000000 -0700
+++ linux-2.6/fs/buffer.c	2007-08-28 20:14:30.000000000 -0700
@@ -3011,6 +3011,106 @@ init_buffer_head(void *data, struct kmem
 	INIT_LIST_HEAD(&bh->b_assoc_buffers);
 }
 
+/*
+ * Writeback a page to clean the dirty state
+ */
+static void trigger_write(struct page *page)
+{
+	struct address_space *mapping = page_mapping(page);
+	int rc;
+	struct writeback_control wbc = {
+		.sync_mode = WB_SYNC_NONE,
+		.nr_to_write = 1,
+		.range_start = 0,
+		.range_end = LLONG_MAX,
+		.nonblocking = 1,
+		.for_reclaim = 0
+	};
+
+	if (!mapping->a_ops->writepage)
+		/* No write method for the address space */
+		return;
+
+	if (!clear_page_dirty_for_io(page))
+		/* Someone else already triggered a write */
+		return;
+
+	rc = mapping->a_ops->writepage(page, &wbc);
+	if (rc < 0)
+		/* I/O Error writing */
+		return;
+
+	if (rc == AOP_WRITEPAGE_ACTIVATE)
+		unlock_page(page);
+}
+
+/*
+ * Get references on buffers.
+ *
+ * We obtain references on the page that uses the buffer. v[i] will point to
+ * the corresponding page after get_buffers() is through.
+ *
+ * We are safe from the underlying page being removed simply by doing
+ * a get_page_unless_zero. The buffer head removal may race at will.
+ * try_to_free_buffers() will later take appropriate locks to remove the
+ * buffers if they are still there.
+ */
+static void *get_buffers(struct kmem_cache *s, int nr, void **v)
+{
+	struct page *page;
+	struct buffer_head *bh;
+	int i,j;
+	int n = 0;
+
+	for (i = 0; i < nr; i++) {
+		bh = v[i];
+		v[i] = NULL;
+
+		page = bh->b_page;
+
+		if (page && PagePrivate(page)) {
+			for (j = 0; j < n; j++)
+				if (page == v[j])
+					goto cont;
+		}
+
+		if (get_page_unless_zero(page))
+			v[n++] = page;
+cont:	;
+	}
+	return NULL;
+}
+
+/*
+ * Despite its name: kick_buffers operates on a list of pointers to
+ * page structs that was set up by get_buffers().
+ */
+static void kick_buffers(struct kmem_cache *s, int nr, void **v,
+							void *private)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < nr; i++) {
+		page = v[i];
+
+		if (!page || PageWriteback(page))
+			continue;
+
+
+		if (!TestSetPageLocked(page)) {
+			if (PageDirty(page))
+				trigger_write(page);
+			else {
+				if (PagePrivate(page))
+					try_to_free_buffers(page);
+				unlock_page(page);
+			}
+		}
+		put_page(page);
+	}
+}
+
 void __init buffer_init(void)
 {
 	int nrpages;
@@ -3020,6 +3120,7 @@ void __init buffer_init(void)
 				(SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|
 				SLAB_MEM_SPREAD),
 				init_buffer_head);
+	kmem_cache_setup_defrag(bh_cachep, get_buffers, kick_buffers);
 
 	/*
 	 * Limit the bh occupancy to 10% of ZONE_NORMAL

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC 17/26] inodes: Support generic defragmentation
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (15 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 16/26] Buffer heads: Support slab defrag Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 18/26] FS: ExtX filesystem defrag Christoph Lameter
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0017-slab_defrag_generic_inode_defrag.patch --]
[-- Type: text/plain, Size: 3950 bytes --]

This implements the ability to remove the inodes in a particular slab
from the inode cache. In order to remove an inode we may have to write out
its pages, write out the inode itself and remove the dentries referring
to the inode.

Provide generic functionality so that filesystems that have their own
inode caches can also tie into the defragmentation functions
that are made available here.
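
For illustration (hypothetical "examplefs"; the ext2/3/4, XFS, reiserfs,
procfs and socket conversions later in this series all follow the same
pattern), a filesystem that embeds struct inode in its own inode structure
would hook in like this:

	struct examplefs_inode_info {
		unsigned long	i_private_flags;	/* fs private state */
		struct inode	vfs_inode;		/* embedded VFS inode */
	};

	static void *examplefs_get_inodes(struct kmem_cache *s, int nr, void **v)
	{
		/*
		 * Convert the fs inode pointers to struct inode pointers
		 * and take references on them under the inode_lock.
		 */
		return fs_get_inodes(s, nr, v,
			offsetof(struct examplefs_inode_info, vfs_inode));
	}

	static void examplefs_setup_defrag(struct kmem_cache *cachep)
	{
		/* kick_inodes() does the actual writeback and disposal */
		kmem_cache_setup_defrag(cachep, examplefs_get_inodes,
						kick_inodes);
	}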

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/inode.c         |   95 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/fs.h |    5 ++
 2 files changed, 100 insertions(+)

Index: linux-2.6/fs/inode.c
===================================================================
--- linux-2.6.orig/fs/inode.c	2007-08-28 19:48:07.000000000 -0700
+++ linux-2.6/fs/inode.c	2007-08-28 20:15:26.000000000 -0700
@@ -1351,6 +1351,100 @@ static int __init set_ihash_entries(char
 }
 __setup("ihash_entries=", set_ihash_entries);
 
+static void *get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	int i;
+
+	spin_lock(&inode_lock);
+	for (i = 0; i < nr; i++) {
+		struct inode *inode = v[i];
+
+		if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+			v[i] = NULL;
+		else
+			__iget(inode);
+	}
+	spin_unlock(&inode_lock);
+	return NULL;
+}
+
+/*
+ * Function for filesystems that embed struct inode into their own
+ * structures. The offset is the offset of the struct inode in the fs inode.
+ */
+void *fs_get_inodes(struct kmem_cache *s, int nr, void **v,
+						unsigned long offset)
+{
+	int i;
+
+	for (i = 0; i < nr; i++)
+		v[i] += offset;
+
+	return get_inodes(s, nr, v);
+}
+EXPORT_SYMBOL(fs_get_inodes);
+
+void kick_inodes(struct kmem_cache *s, int nr, void **v, void *private)
+{
+	struct inode *inode;
+	int i;
+	int abort = 0;
+	LIST_HEAD(freeable);
+	struct super_block *sb;
+
+	for (i = 0; i < nr; i++) {
+		inode = v[i];
+		if (!inode)
+			continue;
+
+		if (inode_has_buffers(inode) || inode->i_data.nrpages) {
+			if (remove_inode_buffers(inode))
+				invalidate_mapping_pages(&inode->i_data,
+								0, -1);
+		}
+
+		/* Invalidate children and dentry */
+		if (S_ISDIR(inode->i_mode)) {
+			struct dentry *d = d_find_alias(inode);
+
+			if (d) {
+				d_invalidate(d);
+				dput(d);
+			}
+		}
+
+		if (inode->i_state & I_DIRTY)
+			write_inode_now(inode, 1);
+
+		d_prune_aliases(inode);
+	}
+
+	mutex_lock(&iprune_mutex);
+	for (i = 0; i < nr; i++) {
+		inode = v[i];
+		if (!inode)
+			continue;
+
+		sb = inode->i_sb;
+		iput(inode);
+		if (abort || !(sb->s_flags & MS_ACTIVE))
+			continue;
+
+		spin_lock(&inode_lock);
+		abort =  !can_unuse(inode);
+
+		if (!abort) {
+			list_move(&inode->i_list, &freeable);
+			inode->i_state |= I_FREEING;
+			inodes_stat.nr_unused--;
+		}
+		spin_unlock(&inode_lock);
+	}
+	dispose_list(&freeable);
+	mutex_unlock(&iprune_mutex);
+}
+EXPORT_SYMBOL(kick_inodes);
+
 /*
  * Initialize the waitqueues and inode hash table.
  */
@@ -1390,6 +1484,7 @@ void __init inode_init(unsigned long mem
 					 SLAB_MEM_SPREAD),
 					 init_once);
 	register_shrinker(&icache_shrinker);
+	kmem_cache_setup_defrag(inode_cachep, get_inodes, kick_inodes);
 
 	/* Hash may have been set up in inode_init_early */
 	if (!hashdist)
Index: linux-2.6/include/linux/fs.h
===================================================================
--- linux-2.6.orig/include/linux/fs.h	2007-08-28 19:48:07.000000000 -0700
+++ linux-2.6/include/linux/fs.h	2007-08-28 20:15:26.000000000 -0700
@@ -1644,6 +1644,11 @@ static inline void insert_inode_hash(str
 	__insert_inode_hash(inode, inode->i_ino);
 }
 
+/* Helper functions for inode defragmentation support in filesystems */
+extern void kick_inodes(struct kmem_cache *, int, void **, void *);
+extern void *fs_get_inodes(struct kmem_cache *, int nr, void **,
+						unsigned long offset);
+
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
 extern void file_kill(struct file *f);

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC 18/26] FS: ExtX filesystem defrag
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (16 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 17/26] inodes: Support generic defragmentation Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  9:48   ` Jeff Garzik
  2007-09-01  1:41 ` [RFC 19/26] FS: XFS slab defragmentation Christoph Lameter
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0018-slab_defrag_ext234.patch --]
[-- Type: text/plain, Size: 2717 bytes --]

Support defragmentation for extX filesystem inodes

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/ext2/super.c |    9 +++++++++
 fs/ext3/super.c |    8 ++++++++
 fs/ext4/super.c |    8 ++++++++
 3 files changed, 25 insertions(+)

Index: linux-2.6/fs/ext2/super.c
===================================================================
--- linux-2.6.orig/fs/ext2/super.c	2007-08-28 19:48:06.000000000 -0700
+++ linux-2.6/fs/ext2/super.c	2007-08-28 20:16:05.000000000 -0700
@@ -168,6 +168,12 @@ static void init_once(void * foo, struct
 	inode_init_once(&ei->vfs_inode);
 }
 
+static void *ext2_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct ext2_inode_info, vfs_inode));
+}
+
 static int init_inodecache(void)
 {
 	ext2_inode_cachep = kmem_cache_create("ext2_inode_cache",
@@ -177,6 +183,9 @@ static int init_inodecache(void)
 					     init_once);
 	if (ext2_inode_cachep == NULL)
 		return -ENOMEM;
+
+	kmem_cache_setup_defrag(ext2_inode_cachep,
+			ext2_get_inodes, kick_inodes);
 	return 0;
 }
 
Index: linux-2.6/fs/ext3/super.c
===================================================================
--- linux-2.6.orig/fs/ext3/super.c	2007-08-28 19:48:06.000000000 -0700
+++ linux-2.6/fs/ext3/super.c	2007-08-28 20:16:05.000000000 -0700
@@ -484,6 +484,12 @@ static void init_once(void * foo, struct
 	inode_init_once(&ei->vfs_inode);
 }
 
+static void *ext3_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct ext3_inode_info, vfs_inode));
+}
+
 static int init_inodecache(void)
 {
 	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
@@ -493,6 +499,8 @@ static int init_inodecache(void)
 					     init_once);
 	if (ext3_inode_cachep == NULL)
 		return -ENOMEM;
+	kmem_cache_setup_defrag(ext3_inode_cachep,
+			ext3_get_inodes, kick_inodes);
 	return 0;
 }
 
Index: linux-2.6/fs/ext4/super.c
===================================================================
--- linux-2.6.orig/fs/ext4/super.c	2007-08-28 19:48:06.000000000 -0700
+++ linux-2.6/fs/ext4/super.c	2007-08-28 20:16:05.000000000 -0700
@@ -535,6 +535,12 @@ static void init_once(void * foo, struct
 	inode_init_once(&ei->vfs_inode);
 }
 
+static void *ext4_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct ext4_inode_info, vfs_inode));
+}
+
 static int init_inodecache(void)
 {
 	ext4_inode_cachep = kmem_cache_create("ext4_inode_cache",
@@ -544,6 +550,8 @@ static int init_inodecache(void)
 					     init_once);
 	if (ext4_inode_cachep == NULL)
 		return -ENOMEM;
+	kmem_cache_setup_defrag(ext4_inode_cachep,
+			ext4_get_inodes, kick_inodes);
 	return 0;
 }
 

-- 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [RFC 19/26] FS: XFS slab defragmentation
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (17 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 18/26] FS: ExtX filesystem defrag Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 20/26] FS: Proc filesystem support for slab defrag Christoph Lameter
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0019-slab_defrag_xfs.patch --]
[-- Type: text/plain, Size: 996 bytes --]

Support inode defragmentation for xfs

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 4528f9a..e60c90e 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -363,6 +363,11 @@ xfs_fs_inode_init_once(
 	inode_init_once(vn_to_inode((bhv_vnode_t *)vnode));
 }
 
+static void *xfs_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v, offsetof(bhv_vnode_t, v_inode));
+}
+
 STATIC int
 xfs_init_zones(void)
 {
@@ -376,6 +381,7 @@ xfs_init_zones(void)
 	xfs_ioend_zone = kmem_zone_init(sizeof(xfs_ioend_t), "xfs_ioend");
 	if (!xfs_ioend_zone)
 		goto out_destroy_vnode_zone;
+	kmem_cache_setup_defrag(xfs_vnode_zone, xfs_get_inodes, kick_inodes);
 
 	xfs_ioend_pool = mempool_create_slab_pool(4 * MAX_BUF_PER_PAGE,
 						  xfs_ioend_zone);
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 20/26] FS: Proc filesystem support for slab defrag
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (18 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 19/26] FS: XFS slab defragmentation Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 21/26] FS: Slab defrag: Reiserfs support Christoph Lameter
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0020-slab_defrag_proc.patch --]
[-- Type: text/plain, Size: 947 bytes --]

Support procfs inode defragmentation

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/proc/inode.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index a5b0dfd..83a66d7 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -113,6 +113,12 @@ static void init_once(void * foo, struct kmem_cache * cachep, unsigned long flag
 	inode_init_once(&ei->vfs_inode);
 }
 
+static void *proc_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct proc_inode, vfs_inode));
+}
+
 int __init proc_init_inodecache(void)
 {
 	proc_inode_cachep = kmem_cache_create("proc_inode_cache",
@@ -122,6 +128,8 @@ int __init proc_init_inodecache(void)
 					     init_once);
 	if (proc_inode_cachep == NULL)
 		return -ENOMEM;
+	kmem_cache_setup_defrag(proc_inode_cachep,
+				proc_get_inodes, kick_inodes);
 	return 0;
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 21/26] FS: Slab defrag: Reiserfs support
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (19 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 20/26] FS: Proc filesystem support for slab defrag Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 22/26] FS: Socket inode defragmentation Christoph Lameter
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0021-slab_defrag_reiserfs.patch --]
[-- Type: text/plain, Size: 981 bytes --]

Slab defragmentation: Support reiserfs inode defragmentation

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/reiserfs/super.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index 5b68dd3..0344be9 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -520,6 +520,12 @@ static void init_once(void *foo, struct kmem_cache * cachep, unsigned long flags
 #endif
 }
 
+static void *reiserfs_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct reiserfs_inode_info, vfs_inode));
+}
+
 static int init_inodecache(void)
 {
 	reiserfs_inode_cachep = kmem_cache_create("reiser_inode_cache",
@@ -530,6 +536,8 @@ static int init_inodecache(void)
 						  init_once);
 	if (reiserfs_inode_cachep == NULL)
 		return -ENOMEM;
+	kmem_cache_setup_defrag(reiserfs_inode_cachep,
+			reiserfs_get_inodes, kick_inodes);
 	return 0;
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 22/26] FS: Socket inode defragmentation
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (20 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 21/26] FS: Slab defrag: Reiserfs support Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 23/26] dentries: Extract common code to remove dentry from lru Christoph Lameter
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0022-slab_defrag_socket.patch --]
[-- Type: text/plain, Size: 928 bytes --]

Support inode defragmentation for sockets

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 net/socket.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index ec07703..89fc7a5 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -264,6 +264,12 @@ static void init_once(void *foo, struct kmem_cache *cachep, unsigned long flags)
 	inode_init_once(&ei->vfs_inode);
 }
 
+static void *sock_get_inodes(struct kmem_cache *s, int nr, void **v)
+{
+	return fs_get_inodes(s, nr, v,
+		offsetof(struct socket_alloc, vfs_inode));
+}
+
 static int init_inodecache(void)
 {
 	sock_inode_cachep = kmem_cache_create("sock_inode_cache",
@@ -275,6 +281,8 @@ static int init_inodecache(void)
 					      init_once);
 	if (sock_inode_cachep == NULL)
 		return -ENOMEM;
+	kmem_cache_setup_defrag(sock_inode_cachep,
+			sock_get_inodes, kick_inodes);
 	return 0;
 }
 
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 23/26] dentries: Extract common code to remove dentry from lru
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (21 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 22/26] FS: Socket inode defragmentation Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 24/26] dentries: Add constructor Christoph Lameter
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0023-slab_defrag_dentry_remove_lru.patch --]
[-- Type: text/plain, Size: 3184 bytes --]

Extract the common code to remove a dentry from the lru into a new function
dentry_lru_remove().

Two call sites used list_del() instead of list_del_init(). AFAIK the
performance of both is the same. dentry_lru_remove() does a list_del_init().

As a result dentry->d_lru is now always empty when a dentry is freed.
A consistent state makes it easier for slab defrag to establish the state of a dentry.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/dcache.c |   42 ++++++++++++++----------------------------
 1 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 678d39d..71e4877 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -95,6 +95,14 @@ static void d_free(struct dentry *dentry)
 		call_rcu(&dentry->d_u.d_rcu, d_callback);
 }
 
+static void dentry_lru_remove(struct dentry *dentry)
+{
+	if (!list_empty(&dentry->d_lru)) {
+		list_del_init(&dentry->d_lru);
+		dentry_stat.nr_unused--;
+	}
+}
+
 /*
  * Release the dentry's inode, using the filesystem
  * d_iput() operation if defined.
@@ -211,13 +219,7 @@ repeat:
 unhash_it:
 	__d_drop(dentry);
 kill_it:
-	/* If dentry was on d_lru list
-	 * delete it from there
-	 */
-	if (!list_empty(&dentry->d_lru)) {
-		list_del(&dentry->d_lru);
-		dentry_stat.nr_unused--;
-	}
+	dentry_lru_remove(dentry);
 	dentry = d_kill(dentry);
 	if (dentry)
 		goto repeat;
@@ -285,10 +287,7 @@ int d_invalidate(struct dentry * dentry)
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	atomic_inc(&dentry->d_count);
-	if (!list_empty(&dentry->d_lru)) {
-		dentry_stat.nr_unused--;
-		list_del_init(&dentry->d_lru);
-	}
+	dentry_lru_remove(dentry);
 	return dentry;
 }
 
@@ -407,10 +406,7 @@ static void prune_one_dentry(struct dentry * dentry, int prune_parents)
 
 		if (dentry->d_op && dentry->d_op->d_delete)
 			dentry->d_op->d_delete(dentry);
-		if (!list_empty(&dentry->d_lru)) {
-			list_del(&dentry->d_lru);
-			dentry_stat.nr_unused--;
-		}
+		dentry_lru_remove(dentry);
 		__d_drop(dentry);
 		dentry = d_kill(dentry);
 		spin_lock(&dcache_lock);
@@ -600,10 +596,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 
 	/* detach this root from the system */
 	spin_lock(&dcache_lock);
-	if (!list_empty(&dentry->d_lru)) {
-		dentry_stat.nr_unused--;
-		list_del_init(&dentry->d_lru);
-	}
+	dentry_lru_remove(dentry);
 	__d_drop(dentry);
 	spin_unlock(&dcache_lock);
 
@@ -617,11 +610,7 @@ static void shrink_dcache_for_umount_subtree(struct dentry *dentry)
 			spin_lock(&dcache_lock);
 			list_for_each_entry(loop, &dentry->d_subdirs,
 					    d_u.d_child) {
-				if (!list_empty(&loop->d_lru)) {
-					dentry_stat.nr_unused--;
-					list_del_init(&loop->d_lru);
-				}
-
+				dentry_lru_remove(loop);
 				__d_drop(loop);
 				cond_resched_lock(&dcache_lock);
 			}
@@ -803,10 +792,7 @@ resume:
 		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
 		next = tmp->next;
 
-		if (!list_empty(&dentry->d_lru)) {
-			dentry_stat.nr_unused--;
-			list_del_init(&dentry->d_lru);
-		}
+		dentry_lru_remove(dentry);
 		/* 
 		 * move only zero ref count dentries to the end 
 		 * of the unused list for prune_dcache
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 24/26] dentries: Add constructor
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (22 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 23/26] dentries: Extract common code to remove dentry from lru Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 25/26] dentries: dentry defragmentation Christoph Lameter
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0024-slab_defrag_dentry_state.patch --]
[-- Type: text/plain, Size: 2137 bytes --]

In order to support defragmentation on the dentry cache we need to have
a determined object state at all times. Without a constructor the object
would have a random state after allocation.

So provide a constructor.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/dcache.c |   26 ++++++++++++++------------
 1 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 71e4877..282a467 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -874,6 +874,16 @@ static struct shrinker dcache_shrinker = {
 	.seeks = DEFAULT_SEEKS,
 };
 
+void dcache_ctor(void *p, struct kmem_cache *s, unsigned long flags)
+{
+	struct dentry *dentry = p;
+
+	spin_lock_init(&dentry->d_lock);
+	dentry->d_inode = NULL;
+	INIT_LIST_HEAD(&dentry->d_lru);
+	INIT_LIST_HEAD(&dentry->d_alias);
+}
+
 /**
  * d_alloc	-	allocate a dcache entry
  * @parent: parent of entry to allocate
@@ -911,8 +921,6 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 
 	atomic_set(&dentry->d_count, 1);
 	dentry->d_flags = DCACHE_UNHASHED;
-	spin_lock_init(&dentry->d_lock);
-	dentry->d_inode = NULL;
 	dentry->d_parent = NULL;
 	dentry->d_sb = NULL;
 	dentry->d_op = NULL;
@@ -922,9 +930,7 @@ struct dentry *d_alloc(struct dentry * parent, const struct qstr *name)
 	dentry->d_cookie = NULL;
 #endif
 	INIT_HLIST_NODE(&dentry->d_hash);
-	INIT_LIST_HEAD(&dentry->d_lru);
 	INIT_LIST_HEAD(&dentry->d_subdirs);
-	INIT_LIST_HEAD(&dentry->d_alias);
 
 	if (parent) {
 		dentry->d_parent = dget(parent);
@@ -2098,14 +2104,10 @@ static void __init dcache_init(unsigned long mempages)
 {
 	int loop;
 
-	/* 
-	 * A constructor could be added for stable state like the lists,
-	 * but it is probably not worth it because of the cache nature
-	 * of the dcache. 
-	 */
-	dentry_cache = KMEM_CACHE(dentry,
-		SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD);
-	
+	dentry_cache = kmem_cache_create("dentry_cache", sizeof(struct dentry),
+		0, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC|SLAB_MEM_SPREAD,
+		dcache_ctor);
+
 	register_shrinker(&dcache_shrinker);
 
 	/* Hash may have been set up in dcache_init_early */
-- 
1.5.2.4

-- 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [RFC 25/26] dentries: dentry defragmentation
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (23 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 24/26] dentries: Add constructor Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-01  1:41 ` [RFC 26/26] SLUB: Add debugging for slab defrag Christoph Lameter
  2007-09-06 20:34 ` [RFC 00/26] Slab defragmentation V5 Jörn Engel
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0025-slab_defrag_dentry_defrag.patch --]
[-- Type: text/plain, Size: 3884 bytes --]

kick() is called after get() has been used and after the slab has dropped
all of its own locks. The dentry pruning for unused entries works in a
straightforward way.
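
As a rough sketch of the contract the two methods rely on (illustrative
only: example_vacate() is a made-up name and it is assumed that the
callbacks registered with kmem_cache_setup_defrag() are stored as s->get
and s->kick; the real driver is kmem_cache_vacate() in the defrag core):

	static void example_vacate(struct kmem_cache *s, int nr, void **objects)
	{
		void *private;

		/*
		 * get() runs while the slab is frozen and its objects cannot
		 * be freed; get_dentries() takes a reference on every dentry
		 * that may be reclaimable.
		 */
		private = s->get(s, nr, objects);

		/* ... the slab core drops all of its own locks here ... */

		/*
		 * kick() then reclaims the objects without any slab locks
		 * held; kick_dentries() invalidates and prunes the dentries.
		 */
		s->kick(s, nr, objects, private);
	}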

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 fs/dcache.c |  100 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 99 insertions(+), 1 deletion(-)

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c	2007-08-29 18:55:21.000000000 -0700
+++ linux-2.6/fs/dcache.c	2007-08-29 18:57:51.000000000 -0700
@@ -143,7 +143,10 @@ static struct dentry *d_kill(struct dent
 
 	list_del(&dentry->d_u.d_child);
 	dentry_stat.nr_dentry--;	/* For d_free, below */
-	/*drops the locks, at that point nobody can reach this dentry */
+	/*
+	 * drops the locks, at that point nobody (aside from defrag)
+	 * can reach this dentry
+	 */
 	dentry_iput(dentry);
 	parent = dentry->d_parent;
 	d_free(dentry);
@@ -2100,6 +2103,100 @@ static void __init dcache_init_early(voi
 		INIT_HLIST_HEAD(&dentry_hashtable[loop]);
 }
 
+/*
+ * The slab allocator is holding off frees. We can safely examine
+ * the object without the danger of it vanishing from under us.
+ */
+static void *get_dentries(struct kmem_cache *s, int nr, void **v)
+{
+	struct dentry *dentry;
+	int i;
+
+	spin_lock(&dcache_lock);
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+
+		/*
+		 * Three sorts of dentries cannot be reclaimed:
+		 *
+		 * 1. dentries that are in the process of being allocated
+		 *    or being freed. In that case the dentry is neither
+		 *    on the LRU nor hashed.
+		 *
+		 * 2. Fake hashed entries as used for anonymous dentries
+		 *    and pipe I/O. The fake hashed entries have d_flags
+		 *    set to indicate a hashed entry. However, the
+		 *    d_hash field indicates that the entry is not hashed.
+		 *
+		 * 3. dentries that have a backing store that is not
+		 *    writable. This is true for tmpsfs and other in
+		 *    memory filesystems. Removing dentries from them
+		 *    would loose dentries for good.
+		 */
+		if ((d_unhashed(dentry) && list_empty(&dentry->d_lru)) ||
+		   (!d_unhashed(dentry) && hlist_unhashed(&dentry->d_hash)) ||
+		   (dentry->d_inode &&
+		   !mapping_cap_writeback_dirty(dentry->d_inode->i_mapping)))
+		   	/* Ignore this dentry */
+			v[i] = NULL;
+		else
+			/* dget_locked will remove the dentry from the LRU */
+			dget_locked(dentry);
+	}
+	spin_unlock(&dcache_lock);
+	return NULL;
+}
+
+/*
+ * Slab has dropped all the locks. Get rid of the refcount obtained
+ * earlier and also free the object.
+ */
+static void kick_dentries(struct kmem_cache *s,
+				int nr, void **v, void *private)
+{
+	struct dentry *dentry;
+	int i;
+
+	/*
+	 * First invalidate the dentries without holding the dcache lock
+	 */
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+
+		if (dentry)
+			d_invalidate(dentry);
+	}
+
+	/*
+	 * If we are the last one holding a reference then the dentries can
+	 * be freed. We need the dcache_lock.
+	 */
+	spin_lock(&dcache_lock);
+	for (i = 0; i < nr; i++) {
+		dentry = v[i];
+		if (!dentry)
+			continue;
+
+		spin_lock(&dentry->d_lock);
+		if (atomic_read(&dentry->d_count) > 1) {
+			spin_unlock(&dentry->d_lock);
+			spin_unlock(&dcache_lock);
+			dput(dentry);
+			spin_lock(&dcache_lock);
+			continue;
+		}
+
+		prune_one_dentry(dentry, 1);
+	}
+	spin_unlock(&dcache_lock);
+
+	/*
+	 * dentries are freed using RCU so we need to wait until RCU
+	 * operations are complete
+	 */
+	synchronize_rcu();
+}
+
 static void __init dcache_init(unsigned long mempages)
 {
 	int loop;
@@ -2109,6 +2206,7 @@ static void __init dcache_init(unsigned 
 		dcache_ctor);
 
 	register_shrinker(&dcache_shrinker);
+	kmem_cache_setup_defrag(dentry_cache, get_dentries, kick_dentries);
 
 	/* Hash may have been set up in dcache_init_early */
 	if (!hashdist)

-- 

* [RFC 26/26] SLUB: Add debugging for slab defrag
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (24 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 25/26] dentries: dentry defragmentation Christoph Lameter
@ 2007-09-01  1:41 ` Christoph Lameter
  2007-09-06 20:34 ` [RFC 00/26] Slab defragmentation V5 Jörn Engel
  26 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  1:41 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, linux-mm, linux-fsdevel, Christoph Hellwig,
	Mel Gorman, David Chinner

[-- Attachment #1: 0026-debug.patch --]
[-- Type: text/plain, Size: 2179 bytes --]

Add some debugging printks for slab defragmentation

Signed-off-by: Christoph Lameter <clameter@sgi.com>
---
 mm/slub.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c	2007-08-28 20:11:34.000000000 -0700
+++ linux-2.6/mm/slub.c	2007-08-28 20:21:39.000000000 -0700
@@ -2697,8 +2697,10 @@ int kmem_cache_isolate_slab(struct page 
 	 * This is necessary to make sure that the page does not vanish
 	 * from under us before we are able to check the result.
 	 */
-	if (!get_page_unless_zero(page))
+	if (!get_page_unless_zero(page)) {
+		printk(KERN_ERR "isolate %p zero ref\n", page);
 		return rc;
+	}
 
 	local_irq_save(flags);
 	slab_lock(page);
@@ -2712,6 +2714,8 @@ int kmem_cache_isolate_slab(struct page 
 	if (!PageSlab(page) || SlabFrozen(page) || !page->inuse) {
 		slab_unlock(page);
 		put_page(page);
+		printk(KERN_ERR "isolate faillock %p flags=%lx %s\n",
+			page, page->flags, PageSlab(page)?page->slab->name:"--");
 		goto out;
 	}
 
@@ -2739,6 +2743,7 @@ int kmem_cache_isolate_slab(struct page 
 	SetSlabFrozen(page);
 	slab_unlock(page);
 	rc = 0;
+	printk(KERN_ERR "Isolated %s slab=%p objects=%d\n", s->name, page, page->inuse);
 out:
 	local_irq_restore(flags);
 	return rc;
@@ -2809,6 +2814,8 @@ static int kmem_cache_vacate(struct page
 	 */
 	if (page->inuse == objects)
 		ClearSlabReclaimable(page);
+	printk(KERN_ERR "Finish vacate %s slab=%p objects=%d->%d\n",
+		s->name, page, objects, page->inuse);
 out:
 	leftover = page->inuse;
 	unfreeze_slab(s, page, tail);
@@ -2826,6 +2833,7 @@ int kmem_cache_reclaim(struct list_head 
 	void **scratch;
 	struct page *page;
 	struct page *page2;
+	int pages = 0;
 
 	if (list_empty(zaplist))
 		return 0;
@@ -2836,10 +2844,13 @@ int kmem_cache_reclaim(struct list_head 
 
 	list_for_each_entry_safe(page, page2, zaplist, lru) {
 		list_del(&page->lru);
+		pages++;
 		if (kmem_cache_vacate(page, scratch) == 0)
 				freed++;
 	}
 	kfree(scratch);
+	printk(KERN_ERR "kmem_cache_reclaim recovered %d of %d slabs.\n",
+			freed, pages);
 	return freed;
 }
 

-- 

* Re: [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support
  2007-09-01  1:41 ` [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support Christoph Lameter
@ 2007-09-01  2:04   ` KAMEZAWA Hiroyuki
  2007-09-01  2:07     ` Christoph Lameter
  0 siblings, 1 reply; 34+ messages in thread
From: KAMEZAWA Hiroyuki @ 2007-09-01  2:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: apw, linux-kernel, linux-mm, linux-fsdevel, hch, mel, dgc

On Fri, 31 Aug 2007 18:41:21 -0700
Christoph Lameter <clameter@sgi.com> wrote:

> +#ifndef CONFIG_HIGHMEM
> +	if (s->kick || s->flags & SLAB_TEMPORARY)
> +		flags |= __GFP_MOVABLE;
> +#endif
> +

Should I do this as

#if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)

?
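
Spelled out against the hunk quoted above, the guard being proposed would read
roughly as follows (a sketch of the suggestion, not a tested change):

	#if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)
		/*
		 * mark defraggable and temporary slabs movable only when
		 * neither highmem nor memory hot-remove is configured
		 */
		if (s->kick || s->flags & SLAB_TEMPORARY)
			flags |= __GFP_MOVABLE;
	#endif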

-Kame

* Re: [RFC 14/26] SLUB: __GFP_MOVABLE and SLAB_TEMPORARY support
  2007-09-01  2:04   ` KAMEZAWA Hiroyuki
@ 2007-09-01  2:07     ` Christoph Lameter
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-01  2:07 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: apw, linux-kernel, linux-mm, linux-fsdevel, hch, mel, dgc

On Sat, 1 Sep 2007, KAMEZAWA Hiroyuki wrote:

> On Fri, 31 Aug 2007 18:41:21 -0700
> Christoph Lameter <clameter@sgi.com> wrote:
> 
> > +#ifndef CONFIG_HIGHMEM
> > +	if (s->kick || s->flags & SLAB_TEMPORARY)
> > +		flags |= __GFP_MOVABLE;
> > +#endif
> > +
> 
> Should I do this as
> 
> #if !defined(CONFIG_HIGHMEM) && !defined(CONFIG_MEMORY_HOTREMOVE)

Hmmm.... Not sure... I think the use of __GFP_MOVABLE the way it is up 
there will change as soon as Mel's antifrag patchset is merged.


* Re: [RFC 18/26] FS: ExtX filesystem defrag
  2007-09-01  1:41 ` [RFC 18/26] FS: ExtX filesystem defrag Christoph Lameter
@ 2007-09-01  9:48   ` Jeff Garzik
  2007-09-02 11:37     ` Christoph Lameter
  0 siblings, 1 reply; 34+ messages in thread
From: Jeff Garzik @ 2007-09-01  9:48 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andy Whitcroft, linux-kernel, linux-mm, linux-fsdevel,
	Christoph Hellwig, Mel Gorman, David Chinner

Please add 'slab' to the title, otherwise you conflict with a feature of 
the same name...



* Re: [RFC 18/26] FS: ExtX filesystem defrag
  2007-09-01  9:48   ` Jeff Garzik
@ 2007-09-02 11:37     ` Christoph Lameter
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-02 11:37 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andy Whitcroft, linux-kernel, linux-mm, linux-fsdevel,
	Christoph Hellwig, Mel Gorman, David Chinner

On Sat, 1 Sep 2007, Jeff Garzik wrote:

> Please add 'slab' to the title, otherwise you conflict with a feature of the
> same name...

Ok.


* Re: [RFC 00/26] Slab defragmentation V5
  2007-09-01  1:41 [RFC 00/26] Slab defragmentation V5 Christoph Lameter
                   ` (25 preceding siblings ...)
  2007-09-01  1:41 ` [RFC 26/26] SLUB: Add debugging for slab defrag Christoph Lameter
@ 2007-09-06 20:34 ` Jörn Engel
  26 siblings, 0 replies; 34+ messages in thread
From: Jörn Engel @ 2007-09-06 20:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andy Whitcroft, linux-kernel, linux-mm, linux-fsdevel,
	Christoph Hellwig, Mel Gorman, David Chinner

On Fri, 31 August 2007 18:41:07 -0700, Christoph Lameter wrote:
> 
> The trouble with this patchset is that it is difficult to validate.
> Activities are only performed when special load situations are encountered.
> Are there any tests that could give meaningful information about
> the effectiveness of these measures? I have run various tests here
> creating and deleting files and building kernels under low memory situations
> to trigger these reclaim mechanisms but how does one measure their
> effectiveness?

One could play with updatedb followed by a memhog.  How much time passes
and how many slab objects have to be freed before the memhog has
allocated N% of physical memory?  Both numbers are relevant.  The first
indicates how quickly pages are reclaimed from slab caches, while the
second shows how many objects remain cached for future lookups.  Updatedb
aside, caching objects is done for solid performance reasons.
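
A bare-bones memhog for such a run could be as small as the sketch below
(userspace C, illustrative only; the 50% default and the 16MB step are arbitrary
choices). Timing it, e.g. with time(1), and watching /proc/slabinfo while it
runs would give the two numbers above.

	/* memhog.c: allocate and touch a percentage of physical memory */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>
	#include <unistd.h>

	int main(int argc, char **argv)
	{
		long pages = sysconf(_SC_PHYS_PAGES);
		long page_size = sysconf(_SC_PAGE_SIZE);
		int percent = argc > 1 ? atoi(argv[1]) : 50;
		size_t target = (size_t)pages * page_size / 100 * percent;
		size_t chunk = 16 * 1024 * 1024;	/* allocate in 16MB steps */
		size_t done = 0;

		while (done < target) {
			char *p = malloc(chunk);

			if (!p)
				break;
			memset(p, 0xaa, chunk);	/* touch every page so it is really backed */
			done += chunk;
		}
		printf("allocated %zu MB of %zu MB requested\n",
		       done >> 20, target >> 20);
		pause();	/* hold the memory until interrupted */
		return 0;
	}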

Creating a qemu image with little memory and a huge directory hierarchy
filled with 0-byte files may be a nice test system.  Unless you beat me
to it, I'll try to set it up once logfs is in merge-worthy shape.

Jörn

-- 
A quarrel is quickly settled when deserted by one party; there is
no battle unless there be two.
-- Seneca

* Re: [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts
  2007-09-01  1:41 ` [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts Christoph Lameter
@ 2007-09-19 15:08   ` Rik van Riel
  2007-09-19 18:00     ` Christoph Lameter
  0 siblings, 1 reply; 34+ messages in thread
From: Rik van Riel @ 2007-09-19 15:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andy Whitcroft, linux-kernel, linux-mm, linux-fsdevel,
	Christoph Hellwig, Mel Gorman, David Chinner

Christoph Lameter wrote:
> Add a flag SlabReclaimable() that is set on slabs with a method
> that allows defrag/reclaim. Clear the flag if a reclaim action is not
> successful in reducing the number of objects in a slab. The reclaim
> flag is set again if all objects have been allocated from it.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> ---
>  mm/slub.c |   42 ++++++++++++++++++++++++++++++++++++------
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c	2007-08-28 20:10:37.000000000 -0700
> +++ linux-2.6/mm/slub.c	2007-08-28 20:10:47.000000000 -0700
> @@ -107,6 +107,8 @@
>  #define SLABDEBUG 0
>  #endif
>  
> +#define SLABRECLAIMABLE (1 << PG_dirty)
> +
>  static inline int SlabFrozen(struct page *page)
>  {
>  	return page->flags & FROZEN;
> @@ -137,6 +139,21 @@ static inline void ClearSlabDebug(struct
>  	page->flags &= ~SLABDEBUG;
>  }
>  
> +static inline int SlabReclaimable(struct page *page)
> +{
> +	return page->flags & SLABRECLAIMABLE;
> +}
> +
> +static inline void SetSlabReclaimable(struct page *page)
> +{
> +	page->flags |= SLABRECLAIMABLE;
> +}
> +
> +static inline void ClearSlabReclaimable(struct page *page)
> +{
> +	page->flags &= ~SLABRECLAIMABLE;
> +}

Why is it safe to not use the normal page flag bit operators
for these page flags operations?

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

* Re: [RFC 13/26] SLUB: Add SlabReclaimable() to avoid repeated reclaim attempts
  2007-09-19 15:08   ` Rik van Riel
@ 2007-09-19 18:00     ` Christoph Lameter
  0 siblings, 0 replies; 34+ messages in thread
From: Christoph Lameter @ 2007-09-19 18:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andy Whitcroft, linux-kernel, linux-mm, linux-fsdevel,
	Christoph Hellwig, Mel Gorman, David Chinner

On Wed, 19 Sep 2007, Rik van Riel wrote:

> Why is it safe to not use the normal page flag bit operators
> for these page flags operations?

Because SLUB always modifies page flags under PageLock.
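
That is, the plain |= and &= in the quoted helpers are assumed to be serialized
by the slab lock; a caller outside that lock would need the atomic bitops. A
rough side-by-side (the second helper is hypothetical, purely for comparison):

	/* From the quoted patch: a plain RMW, safe only under slab_lock(page). */
	static inline void SetSlabReclaimable(struct page *page)
	{
		page->flags |= SLABRECLAIMABLE;
	}

	/* Without that serialization the atomic bit operation would be needed. */
	static inline void SetSlabReclaimable_atomic(struct page *page)
	{
		set_bit(PG_dirty, &page->flags);	/* SLABRECLAIMABLE is (1 << PG_dirty) */
	}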

