linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree()
@ 2008-11-17 12:26 Lai Jiangshan
  2008-11-17 13:14 ` Johannes Weiner
  2008-11-18  6:57 ` [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Pekka Enberg
  0 siblings, 2 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-17 12:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Dave Airlie, Paul Menage, kamezawa.hiroyu,
	Balbir Singh, Arjan van de Ven, Jan Kara, Jes Sorensen,
	KOSAKI Motohiro, dada1, Alexey Dobriyan, Jens Axboe,
	Linux Kernel Mailing List


For some reason, we use alternative malloc.
1) we want NUMA spreading and reliable allocation.
2) we use kmalloc(), but we want vmalloc() rarely when required
   memory is very large.
......

there are N copied of such codes in the kernel, it's pure code duplication.
this patch provides two helper functions do such alternative malloc/free,
and duplication codes will be cleanup.

changed from v1:

1) name them kvmalloc()/kvfree() instead of simple_malloc()/simple_free()
2) implement them in mm/util.c
   Since vmalloc()/vfree() is potentially a very expensive operation,
   we should make kvmalloc()/kvfree() uninlined. No need to try to
   save 3 or 4 cpu cycles. This will save icache at least.
3) add gfp_t flags parameter.

changelog of v1:

some subsystem needs vmalloc() when required memory is large.
but current kernel has not APIs for this requirement.
this patch introduces simple_malloc() and simple_free().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ffee2f7..ff89388 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -278,6 +278,9 @@ static inline int is_vmalloc_addr(const void *x)
 #endif
 }
 
+void *kvmalloc(unsigned long size, gfp_t flags);
+void kvfree(void *ptr);
+
 static inline struct page *compound_head(struct page *page)
 {
 	if (unlikely(PageTail(page)))
diff --git a/mm/util.c b/mm/util.c
index cb00b74..66d1a12 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -5,6 +5,8 @@
 #include <linux/err.h>
 #include <linux/sched.h>
 #include <asm/uaccess.h>
+#include <linux/interrupt.h>
+#include <linux/vmalloc.h>
 
 /**
  * kstrdup - allocate space for and copy an existing string
@@ -163,6 +165,38 @@ char *strndup_user(const char __user *s, long n)
 }
 EXPORT_SYMBOL(strndup_user);
 
+/**
+ * kvmalloc - allocate memory by kmalloc() or vmalloc()
+ *
+ * if @size <= PAGE_SIZE, memory is allocated by kmalloc(),
+ * otherwise by vmalloc().
+ *
+ * NOTICE: Generally, you should use kmalloc() or vmalloc() directly.
+ * kvmalloc() is provided for some special condition.
+ */
+void *kvmalloc(unsigned long size, gfp_t flags)
+{
+	if (size <= PAGE_SIZE)
+		return kmalloc(size, flags & ~__GFP_HIGH);
+	else
+		return __vmalloc(size, flags | __GFP_HIGHMEM, PAGE_KERNEL);
+}
+EXPORT_SYMBOL(kvmalloc);
+
+/**
+ * kvfree - free the memory by kfree(), or vfree() if it is vmalloc addr
+ */
+void kvfree(void *ptr)
+{
+	BUG_ON(in_interrupt());
+
+	if (is_vmalloc_addr(ptr))
+		vfree(ptr);
+	else
+		kfree(ptr);
+}
+EXPORT_SYMBOL(kvfree);
+
 #ifndef HAVE_ARCH_PICK_MMAP_LAYOUT
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree()
  2008-11-17 12:26 [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Lai Jiangshan
@ 2008-11-17 13:14 ` Johannes Weiner
  2008-11-18  0:28   ` Lai Jiangshan
                     ` (5 more replies)
  2008-11-18  6:57 ` [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Pekka Enberg
  1 sibling, 6 replies; 12+ messages in thread
From: Johannes Weiner @ 2008-11-17 13:14 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List

Hi Lai,

On Mon, Nov 17, 2008 at 08:26:12PM +0800, Lai Jiangshan wrote:
> +/**
> + * kvfree - free the memory by kfree(), or vfree() if it is vmalloc addr

Could that be just `free the memory allocated by kvmalloc()'?

	Hannes

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree()
  2008-11-17 13:14 ` Johannes Weiner
@ 2008-11-18  0:28   ` Lai Jiangshan
  2008-11-18  8:50   ` [PATCH V2 0/4] use kvmalloc()/kvfree() the second part, about RCU Lai Jiangshan
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  0:28 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List

Johannes Weiner wrote:
> Hi Lai,
> 
> On Mon, Nov 17, 2008 at 08:26:12PM +0800, Lai Jiangshan wrote:
>> +/**
>> + * kvfree - free the memory by kfree(), or vfree() if it is vmalloc addr
> 
> Could that be just `free the memory allocated by kvmalloc()'?
> 
> 	Hannes
> 
> 
> 


We don't and we won't introduce another allocator, kvmalloc() is
a wrapper of kmalloc() and vmalloc(), and kvfree() is a wrapper
of kfree() and vfree()

	Lai.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree()
  2008-11-17 12:26 [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Lai Jiangshan
  2008-11-17 13:14 ` Johannes Weiner
@ 2008-11-18  6:57 ` Pekka Enberg
  1 sibling, 0 replies; 12+ messages in thread
From: Pekka Enberg @ 2008-11-18  6:57 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List

On Mon, Nov 17, 2008 at 2:26 PM, Lai Jiangshan <laijs@cn.fujitsu.com> wrote:
> @@ -163,6 +165,38 @@ char *strndup_user(const char __user *s, long n)
>  }
>  EXPORT_SYMBOL(strndup_user);
>
> +/**
> + * kvmalloc - allocate memory by kmalloc() or vmalloc()
> + *
> + * if @size <= PAGE_SIZE, memory is allocated by kmalloc(),
> + * otherwise by vmalloc().
> + *
> + * NOTICE: Generally, you should use kmalloc() or vmalloc() directly.
> + * kvmalloc() is provided for some special condition.

You might as well explicitly state what those special conditions are here.

> + */
> +void *kvmalloc(unsigned long size, gfp_t flags)
> +{
> +       if (size <= PAGE_SIZE)
> +               return kmalloc(size, flags & ~__GFP_HIGH);
> +       else
> +               return __vmalloc(size, flags | __GFP_HIGHMEM, PAGE_KERNEL);
> +}
> +EXPORT_SYMBOL(kvmalloc);
> +
> +/**
> + * kvfree - free the memory by kfree(), or vfree() if it is vmalloc addr
> + */
> +void kvfree(void *ptr)
> +{
> +       BUG_ON(in_interrupt());

I wonder if this should be a WARN_ON() to let the machine limp along
for a while if it can. At least in the kfree() case people can catch
the oops trace more easily that way.

> +
> +       if (is_vmalloc_addr(ptr))
> +               vfree(ptr);
> +       else
> +               kfree(ptr);
> +}
> +EXPORT_SYMBOL(kvfree);
> +
>  #ifndef HAVE_ARCH_PICK_MMAP_LAYOUT
>  void arch_pick_mmap_layout(struct mm_struct *mm)
>  {
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 0/4] use kvmalloc()/kvfree() the second part, about RCU
  2008-11-17 13:14 ` Johannes Weiner
  2008-11-18  0:28   ` Lai Jiangshan
@ 2008-11-18  8:50   ` Lai Jiangshan
  2008-11-18  8:51   ` [PATCH V2 1/4] vmalloc: introduce vfree_atomic() Lai Jiangshan
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  8:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Miller, Dave Airlie, Paul Menage, kamezawa.hiroyu,
	Balbir Singh, Arjan van de Ven, Jan Kara, Jes Sorensen,
	KOSAKI Motohiro, dada1, Alexey Dobriyan, Jens Axboe,
	Linux Kernel Mailing List, Paul E. McKenney, Nick Piggin,
	Al Viro, Rik van Riel, Pekka Enberg

Hi, all

I have sent patch for introducing kvmalloc()/kvfree().
http://lkml.org/lkml/2008/11/17/137

and 6 patches for using kvmalloc()/kvfree() (these patches need to be fixed)

fdtable and sysipc uses vfree() in the RCU callback.
Now, these patches cleanup for them.

any comments are welcome.

Thanx, Lai.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH V2 1/4] vmalloc: introduce vfree_atomic()
  2008-11-17 13:14 ` Johannes Weiner
  2008-11-18  0:28   ` Lai Jiangshan
  2008-11-18  8:50   ` [PATCH V2 0/4] use kvmalloc()/kvfree() the second part, about RCU Lai Jiangshan
@ 2008-11-18  8:51   ` Lai Jiangshan
  2008-11-18  9:19     ` Nick Piggin
  2008-11-18  8:51   ` [PATCH V2 2/4] mm: introduce kvfree_atomic() Lai Jiangshan
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg


fdtable and sysipc use vfree() in RCU callback. this patch
introduce vfree_atomic() for them.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 307b885..195379a 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -60,6 +60,7 @@ extern void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot);
 extern void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask,
 				pgprot_t prot);
 extern void vfree(const void *addr);
+extern void vfree_atomic(void *addr);
 
 extern void *vmap(struct page **pages, unsigned int count,
 			unsigned long flags, pgprot_t prot);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ba6b0f5..39c164e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -23,6 +23,7 @@
 #include <linux/rbtree.h>
 #include <linux/radix-tree.h>
 #include <linux/rcupdate.h>
+#include <linux/workqueue.h>
 
 #include <asm/atomic.h>
 #include <asm/uaccess.h>
@@ -1177,6 +1178,21 @@ void vfree(const void *addr)
 }
 EXPORT_SYMBOL(vfree);
 
+static void vfree_defer(struct work_struct *work)
+{
+	vfree(work);
+}
+
+void vfree_atomic(void *ptr)
+{
+	if (ptr) {
+		struct work_struct *work = ptr;
+		INIT_WORK(work, vfree_defer);
+		schedule_work(work);
+	}
+}
+EXPORT_SYMBOL_GPL(vfree_atomic);
+
 /**
  *	vunmap  -  release virtual mapping obtained by vmap()
  *	@addr:		memory base address


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 2/4] mm: introduce kvfree_atomic()
  2008-11-17 13:14 ` Johannes Weiner
                     ` (2 preceding siblings ...)
  2008-11-18  8:51   ` [PATCH V2 1/4] vmalloc: introduce vfree_atomic() Lai Jiangshan
@ 2008-11-18  8:51   ` Lai Jiangshan
  2008-11-18  8:51   ` [PATCH V2 3/4] files: use kvmalloc()/kvfree()/kvfree_atomic() Lai Jiangshan
  2008-11-18  8:51   ` [PATCH V2 4/4] sysipc: use kvmalloc()/kvfree_atomic() Lai Jiangshan
  5 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg


fdtable and sysipc use vfree() in RCU callback. this patch
introduce vfree_atomic() for them.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ff89388..aa7b5ca 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -280,6 +280,7 @@ static inline int is_vmalloc_addr(const void *x)
 
 void *kvmalloc(unsigned long size, gfp_t flags);
 void kvfree(void *ptr);
+void kvfree_atomic(void *ptr);
 
 static inline struct page *compound_head(struct page *page)
 {
diff --git a/mm/util.c b/mm/util.c
index 66d1a12..ed2d317 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -197,6 +197,15 @@ void kvfree(void *ptr)
 }
 EXPORT_SYMBOL(kvfree);
 
+void kvfree_atomic(void *ptr)
+{
+	if (is_vmalloc_addr(ptr))
+		vfree_atomic(ptr);
+	else
+		kfree(ptr);
+}
+EXPORT_SYMBOL(kvfree_atomic);
+
 #ifndef HAVE_ARCH_PICK_MMAP_LAYOUT
 void arch_pick_mmap_layout(struct mm_struct *mm)
 {


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 3/4] files: use kvmalloc()/kvfree()/kvfree_atomic()
  2008-11-17 13:14 ` Johannes Weiner
                     ` (3 preceding siblings ...)
  2008-11-18  8:51   ` [PATCH V2 2/4] mm: introduce kvfree_atomic() Lai Jiangshan
@ 2008-11-18  8:51   ` Lai Jiangshan
  2008-11-18  8:51   ` [PATCH V2 4/4] sysipc: use kvmalloc()/kvfree_atomic() Lai Jiangshan
  5 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg


RCU callback here use vfree()
use kvmalloc()/kvfree()/kvfree_atomic() make it simple

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 fs/file.c               |  122 +++++++-----------------------------------------
 include/linux/fdtable.h |    1
 2 files changed, 19 insertions(+), 104 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index f313314..a71fdf3 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -20,71 +20,13 @@
 #include <linux/rcupdate.h>
 #include <linux/workqueue.h>
 
-struct fdtable_defer {
-	spinlock_t lock;
-	struct work_struct wq;
-	struct fdtable *next;
-};
-
 int sysctl_nr_open __read_mostly = 1024*1024;
 int sysctl_nr_open_min = BITS_PER_LONG;
 int sysctl_nr_open_max = 1024 * 1024; /* raised later */
 
-/*
- * We use this list to defer free fdtables that have vmalloced
- * sets/arrays. By keeping a per-cpu list, we avoid having to embed
- * the work_struct in fdtable itself which avoids a 64 byte (i386) increase in
- * this per-task structure.
- */
-static DEFINE_PER_CPU(struct fdtable_defer, fdtable_defer_list);
-
-static inline void * alloc_fdmem(unsigned int size)
-{
-	if (size <= PAGE_SIZE)
-		return kmalloc(size, GFP_KERNEL);
-	else
-		return vmalloc(size);
-}
-
-static inline void free_fdarr(struct fdtable *fdt)
-{
-	if (fdt->max_fds <= (PAGE_SIZE / sizeof(struct file *)))
-		kfree(fdt->fd);
-	else
-		vfree(fdt->fd);
-}
-
-static inline void free_fdset(struct fdtable *fdt)
-{
-	if (fdt->max_fds <= (PAGE_SIZE * BITS_PER_BYTE / 2))
-		kfree(fdt->open_fds);
-	else
-		vfree(fdt->open_fds);
-}
-
-static void free_fdtable_work(struct work_struct *work)
-{
-	struct fdtable_defer *f =
-		container_of(work, struct fdtable_defer, wq);
-	struct fdtable *fdt;
-
-	spin_lock_bh(&f->lock);
-	fdt = f->next;
-	f->next = NULL;
-	spin_unlock_bh(&f->lock);
-	while(fdt) {
-		struct fdtable *next = fdt->next;
-		vfree(fdt->fd);
-		free_fdset(fdt);
-		kfree(fdt);
-		fdt = next;
-	}
-}
-
 void free_fdtable_rcu(struct rcu_head *rcu)
 {
 	struct fdtable *fdt = container_of(rcu, struct fdtable, rcu);
-	struct fdtable_defer *fddef;
 
 	BUG_ON(!fdt);
 
@@ -97,20 +39,9 @@ void free_fdtable_rcu(struct rcu_head *rcu)
 				container_of(fdt, struct files_struct, fdtab));
 		return;
 	}
-	if (fdt->max_fds <= (PAGE_SIZE / sizeof(struct file *))) {
-		kfree(fdt->fd);
-		kfree(fdt->open_fds);
-		kfree(fdt);
-	} else {
-		fddef = &get_cpu_var(fdtable_defer_list);
-		spin_lock(&fddef->lock);
-		fdt->next = fddef->next;
-		fddef->next = fdt;
-		/* vmallocs are handled from the workqueue context */
-		schedule_work(&fddef->wq);
-		spin_unlock(&fddef->lock);
-		put_cpu_var(fdtable_defer_list);
-	}
+	kvfree_atomic(fdt->fd);
+	kvfree_atomic(fdt->open_fds);
+	kfree(fdt);
 }
 
 /*
@@ -166,30 +97,36 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
 	if (!fdt)
 		goto out;
 	fdt->max_fds = nr;
-	data = alloc_fdmem(nr * sizeof(struct file *));
+	data = kvmalloc(nr * sizeof(struct file *), GFP_KERNEL);
 	if (!data)
 		goto out_fdt;
 	fdt->fd = (struct file **)data;
-	data = alloc_fdmem(max_t(unsigned int,
-				 2 * nr / BITS_PER_BYTE, L1_CACHE_BYTES));
+	data = kvmalloc(max_t(unsigned int, 2 * nr / BITS_PER_BYTE,
+			      L1_CACHE_BYTES), GFP_KERNEL);
 	if (!data)
 		goto out_arr;
 	fdt->open_fds = (fd_set *)data;
 	data += nr / BITS_PER_BYTE;
 	fdt->close_on_exec = (fd_set *)data;
 	INIT_RCU_HEAD(&fdt->rcu);
-	fdt->next = NULL;
 
 	return fdt;
 
 out_arr:
-	free_fdarr(fdt);
+	kvfree(fdt->fd);
 out_fdt:
 	kfree(fdt);
 out:
 	return NULL;
 }
 
+static void immediate_free_fdtable(struct fdtable *fdt)
+{
+	kvfree(fdt->fd);
+	kvfree(fdt->open_fds);
+	kfree(fdt);
+}
+
 /*
  * Expand the file descriptor table.
  * This function will allocate a new fdtable and both fd array and fdset, of
@@ -213,9 +150,7 @@ static int expand_fdtable(struct files_struct *files, int nr)
 	 * caller and alloc_fdtable().  Cheaper to catch it here...
 	 */
 	if (unlikely(new_fdt->max_fds <= nr)) {
-		free_fdarr(new_fdt);
-		free_fdset(new_fdt);
-		kfree(new_fdt);
+		immediate_free_fdtable(new_fdt);
 		return -EMFILE;
 	}
 	/*
@@ -231,9 +166,7 @@ static int expand_fdtable(struct files_struct *files, int nr)
 			free_fdtable(cur_fdt);
 	} else {
 		/* Somebody else expanded, so undo our attempt */
-		free_fdarr(new_fdt);
-		free_fdset(new_fdt);
-		kfree(new_fdt);
+		immediate_free_fdtable(new_fdt);
 	}
 	return 1;
 }
@@ -312,7 +245,6 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	new_fdt->open_fds = (fd_set *)&newf->open_fds_init;
 	new_fdt->fd = &newf->fd_array[0];
 	INIT_RCU_HEAD(&new_fdt->rcu);
-	new_fdt->next = NULL;
 
 	spin_lock(&oldf->file_lock);
 	old_fdt = files_fdtable(oldf);
@@ -324,11 +256,8 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 	while (unlikely(open_files > new_fdt->max_fds)) {
 		spin_unlock(&oldf->file_lock);
 
-		if (new_fdt != &newf->fdtab) {
-			free_fdarr(new_fdt);
-			free_fdset(new_fdt);
-			kfree(new_fdt);
-		}
+		if (new_fdt != &newf->fdtab)
+			immediate_free_fdtable(new_fdt);
 
 		new_fdt = alloc_fdtable(open_files - 1);
 		if (!new_fdt) {
@@ -338,9 +267,7 @@ struct files_struct *dup_fd(struct files_struct *oldf, int *errorp)
 
 		/* beyond sysctl_nr_open; nothing to do */
 		if (unlikely(new_fdt->max_fds < open_files)) {
-			free_fdarr(new_fdt);
-			free_fdset(new_fdt);
-			kfree(new_fdt);
+			immediate_free_fdtable(new_fdt);
 			*errorp = -EMFILE;
 			goto out_release;
 		}
@@ -404,19 +331,8 @@ out:
 	return NULL;
 }
 
-static void __devinit fdtable_defer_list_init(int cpu)
-{
-	struct fdtable_defer *fddef = &per_cpu(fdtable_defer_list, cpu);
-	spin_lock_init(&fddef->lock);
-	INIT_WORK(&fddef->wq, free_fdtable_work);
-	fddef->next = NULL;
-}
-
 void __init files_defer_init(void)
 {
-	int i;
-	for_each_possible_cpu(i)
-		fdtable_defer_list_init(i);
 	sysctl_nr_open_max = min((size_t)INT_MAX, ~(size_t)0/sizeof(void *)) &
 			     -BITS_PER_LONG;
 }
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index 4aab6f1..cacdae6 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -32,7 +32,6 @@ struct fdtable {
 	fd_set *close_on_exec;
 	fd_set *open_fds;
 	struct rcu_head rcu;
-	struct fdtable *next;
 };
 
 /*


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH V2 4/4] sysipc: use kvmalloc()/kvfree_atomic()
  2008-11-17 13:14 ` Johannes Weiner
                     ` (4 preceding siblings ...)
  2008-11-18  8:51   ` [PATCH V2 3/4] files: use kvmalloc()/kvfree()/kvfree_atomic() Lai Jiangshan
@ 2008-11-18  8:51   ` Lai Jiangshan
  5 siblings, 0 replies; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18  8:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, David Miller, Dave Airlie, Paul Menage,
	kamezawa.hiroyu, Balbir Singh, Arjan van de Ven, Jan Kara,
	Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg


RCU callback here use vfree()
use kvmalloc()/kvfree_atomic() make it simple


Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
---
 util.c |  116 ++++++++++-------------------------------------------------------
 1 file changed, 19 insertions(+), 97 deletions(-)
diff --git a/ipc/util.c b/ipc/util.c
index 49b3ea6..3ecf04d 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -468,51 +468,16 @@ void ipc_free(void* ptr, int size)
 		kfree(ptr);
 }
 
-/*
- * rcu allocations:
- * There are three headers that are prepended to the actual allocation:
- * - during use: ipc_rcu_hdr.
- * - during the rcu grace period: ipc_rcu_grace.
- * - [only if vmalloc]: ipc_rcu_sched.
- * Their lifetime doesn't overlap, thus the headers share the same memory.
- * Unlike a normal union, they are right-aligned, thus some container_of
- * forward/backward casting is necessary:
- */
 struct ipc_rcu_hdr
 {
-	int refcount;
-	int is_vmalloc;
-	void *data[0];
-};
-
-
-struct ipc_rcu_grace
-{
-	struct rcu_head rcu;
+	union {
+		int refcount;
+		struct rcu_head rcu;
+	};
 	/* "void *" makes sure alignment of following data is sane. */
 	void *data[0];
 };
 
-struct ipc_rcu_sched
-{
-	struct work_struct work;
-	/* "void *" makes sure alignment of following data is sane. */
-	void *data[0];
-};
-
-#define HDRLEN_KMALLOC		(sizeof(struct ipc_rcu_grace) > sizeof(struct ipc_rcu_hdr) ? \
-					sizeof(struct ipc_rcu_grace) : sizeof(struct ipc_rcu_hdr))
-#define HDRLEN_VMALLOC		(sizeof(struct ipc_rcu_sched) > HDRLEN_KMALLOC ? \
-					sizeof(struct ipc_rcu_sched) : HDRLEN_KMALLOC)
-
-static inline int rcu_use_vmalloc(int size)
-{
-	/* Too big for a single page? */
-	if (HDRLEN_KMALLOC + size > PAGE_SIZE)
-		return 1;
-	return 0;
-}
-
 /**
  *	ipc_rcu_alloc	-	allocate ipc and rcu space 
  *	@size: size desired
@@ -522,30 +487,18 @@ static inline int rcu_use_vmalloc(int size)
  *	NULL is returned if the allocation fails. 
  */
  
-void* ipc_rcu_alloc(int size)
+void *ipc_rcu_alloc(int size)
 {
-	void* out;
 	/* 
-	 * We prepend the allocation with the rcu struct, and
-	 * workqueue if necessary (for vmalloc). 
+	 * We prepend the allocation with the ipc_rcu_hdr
 	 */
-	if (rcu_use_vmalloc(size)) {
-		out = vmalloc(HDRLEN_VMALLOC + size);
-		if (out) {
-			out += HDRLEN_VMALLOC;
-			container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 1;
-			container_of(out, struct ipc_rcu_hdr, data)->refcount = 1;
-		}
-	} else {
-		out = kmalloc(HDRLEN_KMALLOC + size, GFP_KERNEL);
-		if (out) {
-			out += HDRLEN_KMALLOC;
-			container_of(out, struct ipc_rcu_hdr, data)->is_vmalloc = 0;
-			container_of(out, struct ipc_rcu_hdr, data)->refcount = 1;
-		}
-	}
+	struct ipc_rcu_hdr *hdr = kvmalloc(sizeof(struct ipc_rcu_hdr) + size,
+			GFP_KERNEL);
+	if (!hdr)
+		return NULL;
 
-	return out;
+	hdr->refcount = 1;
+	return hdr->data;
 }
 
 void ipc_rcu_getref(void *ptr)
@@ -553,56 +506,25 @@ void ipc_rcu_getref(void *ptr)
 	container_of(ptr, struct ipc_rcu_hdr, data)->refcount++;
 }
 
-static void ipc_do_vfree(struct work_struct *work)
-{
-	vfree(container_of(work, struct ipc_rcu_sched, work));
-}
-
 /**
- * ipc_schedule_free - free ipc + rcu space
+ * ipc_free_rcu - free ipc + ipc_rcu_hdr
  * @head: RCU callback structure for queued work
  * 
  * Since RCU callback function is called in bh,
- * we need to defer the vfree to schedule_work().
- */
-static void ipc_schedule_free(struct rcu_head *head)
-{
-	struct ipc_rcu_grace *grace;
-	struct ipc_rcu_sched *sched;
-
-	grace = container_of(head, struct ipc_rcu_grace, rcu);
-	sched = container_of(&(grace->data[0]), struct ipc_rcu_sched,
-				data[0]);
-
-	INIT_WORK(&sched->work, ipc_do_vfree);
-	schedule_work(&sched->work);
-}
-
-/**
- * ipc_immediate_free - free ipc + rcu space
- * @head: RCU callback structure that contains pointer to be freed
- *
- * Free from the RCU callback context.
+ * we need to use kvfree_atomic().
  */
-static void ipc_immediate_free(struct rcu_head *head)
+static void ipc_free_rcu(struct rcu_head *head)
 {
-	struct ipc_rcu_grace *free =
-		container_of(head, struct ipc_rcu_grace, rcu);
-	kfree(free);
+	kvfree_atomic(container_of(head, struct ipc_rcu_hdr, rcu));
 }
 
 void ipc_rcu_putref(void *ptr)
 {
-	if (--container_of(ptr, struct ipc_rcu_hdr, data)->refcount > 0)
+	struct ipc_rcu_hdr *hdr = container_of(ptr, struct ipc_rcu_hdr, data);
+	if (--hdr->refcount > 0)
 		return;
 
-	if (container_of(ptr, struct ipc_rcu_hdr, data)->is_vmalloc) {
-		call_rcu(&container_of(ptr, struct ipc_rcu_grace, data)->rcu,
-				ipc_schedule_free);
-	} else {
-		call_rcu(&container_of(ptr, struct ipc_rcu_grace, data)->rcu,
-				ipc_immediate_free);
-	}
+	call_rcu(&hdr->rcu, ipc_free_rcu);
 }
 
 /**



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/4] vmalloc: introduce vfree_atomic()
  2008-11-18  8:51   ` [PATCH V2 1/4] vmalloc: introduce vfree_atomic() Lai Jiangshan
@ 2008-11-18  9:19     ` Nick Piggin
  2008-11-18 11:38       ` Lai Jiangshan
  0 siblings, 1 reply; 12+ messages in thread
From: Nick Piggin @ 2008-11-18  9:19 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, Johannes Weiner, David Miller, Dave Airlie,
	Paul Menage, kamezawa.hiroyu, Balbir Singh, Arjan van de Ven,
	Jan Kara, Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg

On Tuesday 18 November 2008 19:51, Lai Jiangshan wrote:
> fdtable and sysipc use vfree() in RCU callback. this patch
> introduce vfree_atomic() for them.

AFAIKS, vfree is usable from atomic context? What am I missing?
Actually, one could argue that we don't want to perform such
costly operations in the atomic context, however with lazy
unmapping, vfree is very cheap now (amortized, at least).

But it should be much cheaper on average not to schedule this in
another context.

If there was any concern about the TLB flush from atomic context,
we should just defer the lazy flushing, rather than every single
vunmap.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/4] vmalloc: introduce vfree_atomic()
  2008-11-18  9:19     ` Nick Piggin
@ 2008-11-18 11:38       ` Lai Jiangshan
  2008-11-18 12:05         ` Nick Piggin
  0 siblings, 1 reply; 12+ messages in thread
From: Lai Jiangshan @ 2008-11-18 11:38 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Johannes Weiner, David Miller, Dave Airlie,
	Paul Menage, kamezawa.hiroyu, Balbir Singh, Arjan van de Ven,
	Jan Kara, Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg

Nick Piggin wrote:
> On Tuesday 18 November 2008 19:51, Lai Jiangshan wrote:
>> fdtable and sysipc use vfree() in RCU callback. this patch
>> introduce vfree_atomic() for them.
> 
> AFAIKS, vfree is usable from atomic context? What am I missing?

Hi, Nick Piggin,

Sorry for misled you.

fdtable and sysipc use vfree() in RCU callback.(_but defer it by schedule_work()_)
current vfree() is not usable from atomic context, so this patches are worthy.

even if vfree() is usable from atomic context soon,
[PATCH 3/4] [PATCH 4/4] are still worthy now. Because these two patches are
independent from vfree().(just needs to be changed one or two lines
when vfree() is usable from atomic context)

I suggest we can use vfree_atomic() before vfree() is available
for atomic context. Because fdtable and sysipc need a grace way for
using RCU and vmalloc/vfree. (actually, fdtable and sysipc have implemented
they own "vfree_atomic()", but it's very ugly)

Thanx, Lai.

> Actually, one could argue that we don't want to perform such
> costly operations in the atomic context, however with lazy
> unmapping, vfree is very cheap now (amortized, at least).
>

I'm looking forward to vfree() is available for atomic context.

> But it should be much cheaper on average not to schedule this in
> another context.
> 
> If there was any concern about the TLB flush from atomic context,
> we should just defer the lazy flushing, rather than every single
> vunmap.
> 
> 
> 



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH V2 1/4] vmalloc: introduce vfree_atomic()
  2008-11-18 11:38       ` Lai Jiangshan
@ 2008-11-18 12:05         ` Nick Piggin
  0 siblings, 0 replies; 12+ messages in thread
From: Nick Piggin @ 2008-11-18 12:05 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Andrew Morton, Johannes Weiner, David Miller, Dave Airlie,
	Paul Menage, kamezawa.hiroyu, Balbir Singh, Arjan van de Ven,
	Jan Kara, Jes Sorensen, KOSAKI Motohiro, dada1, Alexey Dobriyan,
	Jens Axboe, Linux Kernel Mailing List, Paul E. McKenney,
	Nick Piggin, Al Viro, Rik van Riel, Pekka Enberg

[-- Attachment #1: Type: text/plain, Size: 1786 bytes --]

On Tuesday 18 November 2008 22:38, Lai Jiangshan wrote:
> Nick Piggin wrote:
> > On Tuesday 18 November 2008 19:51, Lai Jiangshan wrote:
> >> fdtable and sysipc use vfree() in RCU callback. this patch
> >> introduce vfree_atomic() for them.
> >
> > AFAIKS, vfree is usable from atomic context? What am I missing?
>
> Hi, Nick Piggin,
>
> Sorry for misled you.
>
> fdtable and sysipc use vfree() in RCU callback.(_but defer it by
> schedule_work()_) current vfree() is not usable from atomic context, so
> this patches are worthy.

Hi,

I just wonder why vfree is not usable from atomic context? It is well
known that it can't be used in interrupt context, but just atomic
should work?


> even if vfree() is usable from atomic context soon,
> [PATCH 3/4] [PATCH 4/4] are still worthy now. Because these two patches are
> independent from vfree().(just needs to be changed one or two lines
> when vfree() is usable from atomic context)
>
> I suggest we can use vfree_atomic() before vfree() is available
> for atomic context. Because fdtable and sysipc need a grace way for
> using RCU and vmalloc/vfree. (actually, fdtable and sysipc have implemented
> they own "vfree_atomic()", but it's very ugly)

It's probably not a bad idea to consolidate these into one place.
Using vfree directly is not a trivial change, it could cause
regressions.


> Thanx, Lai.
>
> > Actually, one could argue that we don't want to perform such
> > costly operations in the atomic context, however with lazy
> > unmapping, vfree is very cheap now (amortized, at least).
>
> I'm looking forward to vfree() is available for atomic context.

Something like this. Actually, this is something we quite possibly
should be doing anyway, so that the expensive flush path can be
deferred to a less critical context.

[-- Attachment #2: mm-vfree-defer.patch --]
[-- Type: text/x-diff, Size: 1039 bytes --]

Index: linux-2.6/mm/vmalloc.c
===================================================================
--- linux-2.6.orig/mm/vmalloc.c
+++ linux-2.6/mm/vmalloc.c
@@ -531,6 +531,17 @@ static void purge_vmap_area_lazy(void)
 	__purge_vmap_area_lazy(&start, &end, 0, 0);
 }
 
+static void deferred_purge(struct work_struct *work)
+{
+	purge_vmap_area_lazy();
+}
+
+static struct work_struct purge_work;
+static void kick_purge_vmap_area_lazy(void)
+{
+	schedule_work(&purge_work);
+}
+
 /*
  * Free and unmap a vmap area
  */
@@ -539,7 +550,7 @@ static void free_unmap_vmap_area(struct 
 	va->flags |= VM_LAZY_FREE;
 	atomic_add((va->va_end - va->va_start) >> PAGE_SHIFT, &vmap_lazy_nr);
 	if (unlikely(atomic_read(&vmap_lazy_nr) > lazy_max_pages()))
-		purge_vmap_area_lazy();
+		kick_purge_vmap_area_lazy();
 }
 
 static struct vmap_area *find_vmap_area(unsigned long addr)
@@ -938,6 +949,7 @@ void __init vmalloc_init(void)
 {
 	int i;
 
+	INIT_WORK(&purge_work, deferred_purge);
 	for_each_possible_cpu(i) {
 		struct vmap_block_queue *vbq;
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2008-11-18 12:06 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-17 12:26 [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Lai Jiangshan
2008-11-17 13:14 ` Johannes Weiner
2008-11-18  0:28   ` Lai Jiangshan
2008-11-18  8:50   ` [PATCH V2 0/4] use kvmalloc()/kvfree() the second part, about RCU Lai Jiangshan
2008-11-18  8:51   ` [PATCH V2 1/4] vmalloc: introduce vfree_atomic() Lai Jiangshan
2008-11-18  9:19     ` Nick Piggin
2008-11-18 11:38       ` Lai Jiangshan
2008-11-18 12:05         ` Nick Piggin
2008-11-18  8:51   ` [PATCH V2 2/4] mm: introduce kvfree_atomic() Lai Jiangshan
2008-11-18  8:51   ` [PATCH V2 3/4] files: use kvmalloc()/kvfree()/kvfree_atomic() Lai Jiangshan
2008-11-18  8:51   ` [PATCH V2 4/4] sysipc: use kvmalloc()/kvfree_atomic() Lai Jiangshan
2008-11-18  6:57 ` [PATCH V2 1/1] mm: introduce kvmalloc()/kvfree() Pekka Enberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).