linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH 0/8] remove vm_struct list management
@ 2012-12-06 16:09 Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

This patchset removes vm_struct list management after vmalloc is initialized.
Adding an entry to or removing an entry from vmlist takes linear time, so
it is inefficient. As long as we maintain this list, the overall time
complexity of adding and removing an area in the vmalloc space is O(N),
even though we use an rbtree to find a vacant place, which takes only O(log N).

In addition, vmlist and vmlist_lock are used in many places outside of
vmalloc.c. It is preferable to hide this raw data structure and provide
well-defined functions instead, because that keeps callers from making
mistakes when manipulating these structures and makes the vmalloc layer
easier to maintain.

I'm not sure whether "7/8: makes vmlist only for kexec" is fine,
because it is related to a userspace program.
As far as I know, makedumpfile uses kexec's output information and it only
needs the first address of the vmalloc layer. My implementation reflects
this, but I'm not sure. Also, I have not fully tested this patchset yet.
Basic operations work well, but I have not tested kexec, so I am sending
this patchset as an RFC.

Please let me know what I am missing.

This series is based on v3.7-rc7, on top of the previously submitted
ARM patchset 'introduce static_vm for ARM-specific static mapped area'
https://lkml.org/lkml/2012/11/27/356
However, it also runs properly on x86 without the ARM patchset.

Joonsoo Kim (8):
  mm, vmalloc: change iterating a vmlist to find_vm_area()
  mm, vmalloc: move get_vmalloc_info() to vmalloc.c
  mm, vmalloc: protect va->vm by vmap_area_lock
  mm, vmalloc: iterate vmap_area_list, instead of vmlist in
    vread/vwrite()
  mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
  mm, vmalloc: iterate vmap_area_list, instead of vmlist, in
    vmallocinfo()
  mm, vmalloc: makes vmlist only for kexec
  mm, vmalloc: remove list management operation after initializing
    vmalloc

 arch/tile/mm/pgtable.c      |    7 +-
 arch/unicore32/mm/ioremap.c |   17 +--
 arch/x86/mm/ioremap.c       |    7 +-
 fs/proc/Makefile            |    2 +-
 fs/proc/internal.h          |   18 ---
 fs/proc/meminfo.c           |    1 +
 fs/proc/mmu.c               |   60 ----------
 include/linux/vmalloc.h     |   19 +++-
 mm/vmalloc.c                |  258 +++++++++++++++++++++++++++++--------------
 9 files changed, 204 insertions(+), 185 deletions(-)
 delete mode 100644 fs/proc/mmu.c

-- 
1.7.9.5


* [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-07  7:44   ` Pekka Enberg
                     ` (3 more replies)
  2012-12-06 16:09 ` [RFC PATCH 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c Joonsoo Kim
                   ` (9 subsequent siblings)
  10 siblings, 4 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton
  Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim,
	Chris Metcalf, Guan Xuetao, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

The purpose of iterating over vmlist here is to find the vm area with a
specific virtual address. find_vm_area() is provided for this purpose
and is more efficient, because it uses an rbtree.
So use it instead.

Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
index de0de0c..862782d 100644
--- a/arch/tile/mm/pgtable.c
+++ b/arch/tile/mm/pgtable.c
@@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
 	   in parallel. Reuse of the virtual address is prevented by
 	   leaving it in the global lists until we're done with it.
 	   cpa takes care of the direct mappings. */
-	read_lock(&vmlist_lock);
-	for (p = vmlist; p; p = p->next) {
-		if (p->addr == addr)
-			break;
-	}
-	read_unlock(&vmlist_lock);
+	p = find_vm_area((void *)addr);
 
 	if (!p) {
 		pr_err("iounmap: bad address %p\n", addr);
diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
index b7a6055..13068ee 100644
--- a/arch/unicore32/mm/ioremap.c
+++ b/arch/unicore32/mm/ioremap.c
@@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
 void __uc32_iounmap(volatile void __iomem *io_addr)
 {
 	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
-	struct vm_struct **p, *tmp;
+	struct vm_struct *vm;
 
 	/*
 	 * If this is a section based mapping we need to handle it
@@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
 	 * all the mappings before the area can be reclaimed
 	 * by someone else.
 	 */
-	write_lock(&vmlist_lock);
-	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
-		if ((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) {
-			if (tmp->flags & VM_UNICORE_SECTION_MAPPING) {
-				unmap_area_sections((unsigned long)tmp->addr,
-						    tmp->size);
-			}
-			break;
-		}
-	}
-	write_unlock(&vmlist_lock);
+	vm = find_vm_area(addr);
+	if (vm && (vm->flags & VM_IOREMAP) &&
+		(vm->flags & VM_UNICORE_SECTION_MAPPING))
+		unmap_area_sections((unsigned long)vm->addr, vm->size);
 
 	vunmap(addr);
 }
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 78fe3f1..9a1e658 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
 	   in parallel. Reuse of the virtual address is prevented by
 	   leaving it in the global lists until we're done with it.
 	   cpa takes care of the direct mappings. */
-	read_lock(&vmlist_lock);
-	for (p = vmlist; p; p = p->next) {
-		if (p->addr == (void __force *)addr)
-			break;
-	}
-	read_unlock(&vmlist_lock);
+	p = find_vm_area((void __force *)addr);
 
 	if (!p) {
 		printk(KERN_ERR "iounmap: bad address %p\n", addr);
-- 
1.7.9.5


* [RFC PATCH 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 3/8] mm, vmalloc: protect va->vm by vmap_area_lock Joonsoo Kim
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

get_vmalloc_info() currently lives in fs/proc/mmu.c. There is no reason
for this code to be there: its implementation needs vmlist_lock and
iterates over vmlist, which should be an internal data structure of vmalloc.

For maintainability, it is preferable that vmlist_lock and vmlist are used
only in vmalloc.c. So move the code to vmalloc.c.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index 99349ef..88092c1 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -5,7 +5,7 @@
 obj-y   += proc.o
 
 proc-y			:= nommu.o task_nommu.o
-proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
+proc-$(CONFIG_MMU)	:= task_mmu.o
 
 proc-y       += inode.o root.o base.o generic.o array.o \
 		proc_tty.o fd.o
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 43973b0..5a1eda2 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -28,24 +28,6 @@ extern int proc_net_init(void);
 static inline int proc_net_init(void) { return 0; }
 #endif
 
-struct vmalloc_info {
-	unsigned long	used;
-	unsigned long	largest_chunk;
-};
-
-#ifdef CONFIG_MMU
-#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
-extern void get_vmalloc_info(struct vmalloc_info *vmi);
-#else
-
-#define VMALLOC_TOTAL 0UL
-#define get_vmalloc_info(vmi)			\
-do {						\
-	(vmi)->used = 0;			\
-	(vmi)->largest_chunk = 0;		\
-} while(0)
-#endif
-
 extern int proc_tid_stat(struct seq_file *m, struct pid_namespace *ns,
 				struct pid *pid, struct task_struct *task);
 extern int proc_tgid_stat(struct seq_file *m, struct pid_namespace *ns,
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 80e4645..c594dfb 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -11,6 +11,7 @@
 #include <linux/swap.h>
 #include <linux/vmstat.h>
 #include <linux/atomic.h>
+#include <linux/vmalloc.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include "internal.h"
diff --git a/fs/proc/mmu.c b/fs/proc/mmu.c
deleted file mode 100644
index 8ae221d..0000000
--- a/fs/proc/mmu.c
+++ /dev/null
@@ -1,60 +0,0 @@
-/* mmu.c: mmu memory info files
- *
- * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
- * Written by David Howells (dhowells@redhat.com)
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-#include <linux/spinlock.h>
-#include <linux/vmalloc.h>
-#include <linux/highmem.h>
-#include <asm/pgtable.h>
-#include "internal.h"
-
-void get_vmalloc_info(struct vmalloc_info *vmi)
-{
-	struct vm_struct *vma;
-	unsigned long free_area_size;
-	unsigned long prev_end;
-
-	vmi->used = 0;
-
-	if (!vmlist) {
-		vmi->largest_chunk = VMALLOC_TOTAL;
-	}
-	else {
-		vmi->largest_chunk = 0;
-
-		prev_end = VMALLOC_START;
-
-		read_lock(&vmlist_lock);
-
-		for (vma = vmlist; vma; vma = vma->next) {
-			unsigned long addr = (unsigned long) vma->addr;
-
-			/*
-			 * Some archs keep another range for modules in vmlist
-			 */
-			if (addr < VMALLOC_START)
-				continue;
-			if (addr >= VMALLOC_END)
-				break;
-
-			vmi->used += vma->size;
-
-			free_area_size = addr - prev_end;
-			if (vmi->largest_chunk < free_area_size)
-				vmi->largest_chunk = free_area_size;
-
-			prev_end = vma->size + addr;
-		}
-
-		if (VMALLOC_END - prev_end > vmi->largest_chunk)
-			vmi->largest_chunk = VMALLOC_END - prev_end;
-
-		read_unlock(&vmlist_lock);
-	}
-}
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 6071e91..698b1e5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -158,4 +158,22 @@ pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 # endif
 #endif
 
+struct vmalloc_info {
+	unsigned long   used;
+	unsigned long   largest_chunk;
+};
+
+#ifdef CONFIG_MMU
+#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
+extern void get_vmalloc_info(struct vmalloc_info *vmi);
+#else
+
+#define VMALLOC_TOTAL 0UL
+#define get_vmalloc_info(vmi)			\
+do {						\
+	(vmi)->used = 0;			\
+	(vmi)->largest_chunk = 0;		\
+} while (0)
+#endif
+
 #endif /* _LINUX_VMALLOC_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 78e0830..16147bc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2642,5 +2642,49 @@ static int __init proc_vmalloc_init(void)
 	return 0;
 }
 module_init(proc_vmalloc_init);
+
+void get_vmalloc_info(struct vmalloc_info *vmi)
+{
+	struct vm_struct *vma;
+	unsigned long free_area_size;
+	unsigned long prev_end;
+
+	vmi->used = 0;
+
+	if (!vmlist) {
+		vmi->largest_chunk = VMALLOC_TOTAL;
+	} else {
+		vmi->largest_chunk = 0;
+
+		prev_end = VMALLOC_START;
+
+		read_lock(&vmlist_lock);
+
+		for (vma = vmlist; vma; vma = vma->next) {
+			unsigned long addr = (unsigned long) vma->addr;
+
+			/*
+			 * Some archs keep another range for modules in vmlist
+			 */
+			if (addr < VMALLOC_START)
+				continue;
+			if (addr >= VMALLOC_END)
+				break;
+
+			vmi->used += vma->size;
+
+			free_area_size = addr - prev_end;
+			if (vmi->largest_chunk < free_area_size)
+				vmi->largest_chunk = free_area_size;
+
+			prev_end = vma->size + addr;
+		}
+
+		if (VMALLOC_END - prev_end > vmi->largest_chunk)
+			vmi->largest_chunk = VMALLOC_END - prev_end;
+
+		read_unlock(&vmlist_lock);
+	}
+}
 #endif
 
-- 
1.7.9.5


* [RFC PATCH 3/8] mm, vmalloc: protect va->vm by vmap_area_lock
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() Joonsoo Kim
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

Inserting an entry into or removing an entry from vmlist takes linear time,
so it is inefficient. The following patches will try to remove vmlist
entirely, and this patch is a preparatory step for that.

To remove vmlist, code that iterates over vmlist should be changed to
iterate over vmap_area_list. Before doing that, we must make sure that
accessing va->vm while iterating over vmap_area_list does not race with
setting or clearing it. This patch ensures that by updating va->vm and the
VM_VM_AREA flag only while holding vmap_area_lock.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 16147bc..a0b85a6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1290,12 +1290,14 @@ struct vm_struct *vmlist;
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
+	spin_lock(&vmap_area_lock);
 	vm->flags = flags;
 	vm->addr = (void *)va->va_start;
 	vm->size = va->va_end - va->va_start;
 	vm->caller = caller;
 	va->vm = vm;
 	va->flags |= VM_VM_AREA;
+	spin_unlock(&vmap_area_lock);
 }
 
 static void insert_vmalloc_vmlist(struct vm_struct *vm)
@@ -1446,6 +1448,11 @@ struct vm_struct *remove_vm_area(const void *addr)
 	if (va && va->flags & VM_VM_AREA) {
 		struct vm_struct *vm = va->vm;
 
+		spin_lock(&vmap_area_lock);
+		va->vm = NULL;
+		va->flags &= ~VM_VM_AREA;
+		spin_unlock(&vmap_area_lock);
+
 		if (!(vm->flags & VM_UNLIST)) {
 			struct vm_struct *tmp, **p;
 			/*
-- 
1.7.9.5


* [RFC PATCH 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite()
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (2 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 3/8] mm, vmalloc: protect va->vm by vmap_area_lock Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() Joonsoo Kim
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

Now, while we hold vmap_area_lock, va->vm can't be discarded. So we can
safely access va->vm when iterating over vmap_area_list with
vmap_area_lock held. Using this property, change the vmlist iteration in
vread/vwrite() to iterate over vmap_area_list instead.

There is a small difference related to locking: vmlist_lock is an rwlock,
but vmap_area_lock is a spinlock, so readers no longer run concurrently
with each other. This may introduce extra spinning while vread/vwrite() is
executing. However, these are debug-oriented functions, so this overhead
is not a real problem in the common case.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a0b85a6..d21167f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2009,7 +2009,8 @@ static int aligned_vwrite(char *buf, char *addr, unsigned long count)
 
 long vread(char *buf, char *addr, unsigned long count)
 {
-	struct vm_struct *tmp;
+	struct vmap_area *va;
+	struct vm_struct *vm;
 	char *vaddr, *buf_start = buf;
 	unsigned long buflen = count;
 	unsigned long n;
@@ -2018,10 +2019,17 @@ long vread(char *buf, char *addr, unsigned long count)
 	if ((unsigned long) addr + count < count)
 		count = -(unsigned long) addr;
 
-	read_lock(&vmlist_lock);
-	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
-		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+	spin_lock(&vmap_area_lock);
+	list_for_each_entry(va, &vmap_area_list, list) {
+		if (!count)
+			break;
+
+		if (!(va->flags & VM_VM_AREA))
+			continue;
+
+		vm = va->vm;
+		vaddr = (char *) vm->addr;
+		if (addr >= vaddr + vm->size - PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -2031,10 +2039,10 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + vm->size - PAGE_SIZE - addr;
 		if (n > count)
 			n = count;
-		if (!(tmp->flags & VM_IOREMAP))
+		if (!(vm->flags & VM_IOREMAP))
 			aligned_vread(buf, addr, n);
 		else /* IOREMAP area is treated as memory hole */
 			memset(buf, 0, n);
@@ -2043,7 +2051,7 @@ long vread(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
-	read_unlock(&vmlist_lock);
+	spin_unlock(&vmap_area_lock);
 
 	if (buf == buf_start)
 		return 0;
@@ -2082,7 +2090,8 @@ finished:
 
 long vwrite(char *buf, char *addr, unsigned long count)
 {
-	struct vm_struct *tmp;
+	struct vmap_area *va;
+	struct vm_struct *vm;
 	char *vaddr;
 	unsigned long n, buflen;
 	int copied = 0;
@@ -2092,10 +2101,17 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		count = -(unsigned long) addr;
 	buflen = count;
 
-	read_lock(&vmlist_lock);
-	for (tmp = vmlist; count && tmp; tmp = tmp->next) {
-		vaddr = (char *) tmp->addr;
-		if (addr >= vaddr + tmp->size - PAGE_SIZE)
+	spin_lock(&vmap_area_lock);
+	list_for_each_entry(va, &vmap_area_list, list) {
+		if (!count)
+			break;
+
+		if (!(va->flags & VM_VM_AREA))
+			continue;
+
+		vm = va->vm;
+		vaddr = (char *) vm->addr;
+		if (addr >= vaddr + vm->size - PAGE_SIZE)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -2104,10 +2120,10 @@ long vwrite(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + tmp->size - PAGE_SIZE - addr;
+		n = vaddr + vm->size - PAGE_SIZE - addr;
 		if (n > count)
 			n = count;
-		if (!(tmp->flags & VM_IOREMAP)) {
+		if (!(vm->flags & VM_IOREMAP)) {
 			aligned_vwrite(buf, addr, n);
 			copied++;
 		}
@@ -2116,7 +2132,7 @@ long vwrite(char *buf, char *addr, unsigned long count)
 		count -= n;
 	}
 finished:
-	read_unlock(&vmlist_lock);
+	spin_unlock(&vmap_area_lock);
 	if (!copied)
 		return 0;
 	return buflen;
-- 
1.7.9.5


* [RFC PATCH 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (3 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() Joonsoo Kim
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

This patch is a preparatory step for removing vmlist entirely.
For this purpose, we change the code that iterates over vmlist to iterate
over vmap_area_list instead. It is a mostly trivial change, but one thing
should be noted.

vmlist lacks information about some areas in the vmalloc address space.
For example, vm_map_ram() allocates an area in the vmalloc address space,
but does not link it into vmlist. Providing full information about the
vmalloc address space is better, so we do not use va->vm and use
vmap_area directly instead.
This makes get_vmalloc_info() more precise.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d21167f..f7f4a35 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2668,46 +2668,47 @@ module_init(proc_vmalloc_init);
 
 void get_vmalloc_info(struct vmalloc_info *vmi)
 {
-	struct vm_struct *vma;
+	struct vmap_area *va;
 	unsigned long free_area_size;
 	unsigned long prev_end;
 
 	vmi->used = 0;
+	vmi->largest_chunk = 0;
 
-	if (!vmlist) {
-		vmi->largest_chunk = VMALLOC_TOTAL;
-	} else {
-		vmi->largest_chunk = 0;
+	prev_end = VMALLOC_START;
 
-		prev_end = VMALLOC_START;
+	spin_lock(&vmap_area_lock);
 
-		read_lock(&vmlist_lock);
+	if (list_empty(&vmap_area_list)) {
+		vmi->largest_chunk = VMALLOC_TOTAL;
+		goto out;
+	}
 
-		for (vma = vmlist; vma; vma = vma->next) {
-			unsigned long addr = (unsigned long) vma->addr;
+	list_for_each_entry(va, &vmap_area_list, list) {
+		unsigned long addr = va->va_start;
 
-			/*
-			 * Some archs keep another range for modules in vmlist
-			 */
-			if (addr < VMALLOC_START)
-				continue;
-			if (addr >= VMALLOC_END)
-				break;
+		/*
+		 * Some archs keep another range for modules in vmalloc space
+		 */
+		if (addr < VMALLOC_START)
+			continue;
+		if (addr >= VMALLOC_END)
+			break;
 
-			vmi->used += vma->size;
+		vmi->used += (va->va_end - va->va_start);
 
-			free_area_size = addr - prev_end;
-			if (vmi->largest_chunk < free_area_size)
-				vmi->largest_chunk = free_area_size;
+		free_area_size = addr - prev_end;
+		if (vmi->largest_chunk < free_area_size)
+			vmi->largest_chunk = free_area_size;
 
-			prev_end = vma->size + addr;
-		}
+		prev_end = va->va_end;
+	}
 
-		if (VMALLOC_END - prev_end > vmi->largest_chunk)
-			vmi->largest_chunk = VMALLOC_END - prev_end;
+	if (VMALLOC_END - prev_end > vmi->largest_chunk)
+		vmi->largest_chunk = VMALLOC_END - prev_end;
 
-		read_unlock(&vmlist_lock);
-	}
+out:
+	spin_unlock(&vmap_area_lock);
 }
 #endif
 
-- 
1.7.9.5


* [RFC PATCH 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo()
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (4 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 7/8] mm, vmalloc: makes vmlist only for kexec Joonsoo Kim
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

This patch is a preparatory step for removing vmlist entirely.
For this purpose, we change the code that iterates over vmlist to iterate
over vmap_area_list instead. It is a mostly trivial change, but one thing
should be noted.

Using vmap_area_list in vmallocinfo() introduces an ordering problem on SMP
systems. In s_show(), we read some values from the vm_struct, but those
values are not fully set up when va->vm is assigned. Completion of the
setup is signalled by clearing the VM_UNLIST flag without holding a lock.
When one CPU sees that VM_UNLIST has been cleared, it is not guaranteed
that the other vm_struct fields written by another CPU are visible yet.
So we need smp_[rw]mb to ensure that the proper values are visible once
VM_UNLIST is seen as cleared.

Therefore, this patch not only changes the list being iterated, but also
adds the appropriate smp_[rw]mb in the right places.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f7f4a35..f134950 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1304,7 +1304,14 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm)
 {
 	struct vm_struct *tmp, **p;
 
+	/*
+	 * Before removing VM_UNLIST,
+	 * we should make sure that vm has proper values.
+	 * Pair with smp_rmb() in show_numa_info().
+	 */
+	smp_wmb();
 	vm->flags &= ~VM_UNLIST;
+
 	write_lock(&vmlist_lock);
 	for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next) {
 		if (tmp->addr >= vm->addr)
@@ -2539,19 +2546,19 @@ void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms)
 
 #ifdef CONFIG_PROC_FS
 static void *s_start(struct seq_file *m, loff_t *pos)
-	__acquires(&vmlist_lock)
+	__acquires(&vmap_area_lock)
 {
 	loff_t n = *pos;
-	struct vm_struct *v;
+	struct vmap_area *va;
 
-	read_lock(&vmlist_lock);
-	v = vmlist;
-	while (n > 0 && v) {
+	spin_lock(&vmap_area_lock);
+	va = list_entry((&vmap_area_list)->next, typeof(*va), list);
+	while (n > 0 && &va->list != &vmap_area_list) {
 		n--;
-		v = v->next;
+		va = list_entry(va->list.next, typeof(*va), list);
 	}
-	if (!n)
-		return v;
+	if (!n && &va->list != &vmap_area_list)
+		return va;
 
 	return NULL;
 
@@ -2559,16 +2566,20 @@ static void *s_start(struct seq_file *m, loff_t *pos)
 
 static void *s_next(struct seq_file *m, void *p, loff_t *pos)
 {
-	struct vm_struct *v = p;
+	struct vmap_area *va = p, *next;
 
 	++*pos;
-	return v->next;
+	next = list_entry(va->list.next, typeof(*va), list);
+	if (&next->list != &vmap_area_list)
+		return next;
+
+	return NULL;
 }
 
 static void s_stop(struct seq_file *m, void *p)
-	__releases(&vmlist_lock)
+	__releases(&vmap_area_lock)
 {
-	read_unlock(&vmlist_lock);
+	spin_unlock(&vmap_area_lock);
 }
 
 static void show_numa_info(struct seq_file *m, struct vm_struct *v)
@@ -2579,6 +2590,11 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 		if (!counters)
 			return;
 
+		/* Pair with smp_wmb() in insert_vmalloc_vmlist() */
+		smp_rmb();
+		if (v->flags & VM_UNLIST)
+			return;
+
 		memset(counters, 0, nr_node_ids * sizeof(unsigned int));
 
 		for (nr = 0; nr < v->nr_pages; nr++)
@@ -2592,36 +2608,50 @@ static void show_numa_info(struct seq_file *m, struct vm_struct *v)
 
 static int s_show(struct seq_file *m, void *p)
 {
-	struct vm_struct *v = p;
+	struct vmap_area *va = p;
+	struct vm_struct *vm;
+
+	if (!(va->flags & VM_VM_AREA)) {
+		seq_printf(m, "0x%pK-0x%pK %7ld",
+			(void *)va->va_start, (void *)va->va_end,
+					va->va_end - va->va_start);
+		if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
+			seq_printf(m, " (freeing)");
+
+		seq_putc(m, '\n');
+		return 0;
+	}
+
+	vm = va->vm;
 
 	seq_printf(m, "0x%pK-0x%pK %7ld",
-		v->addr, v->addr + v->size, v->size);
+		vm->addr, vm->addr + vm->size, vm->size);
 
-	if (v->caller)
-		seq_printf(m, " %pS", v->caller);
+	if (vm->caller)
+		seq_printf(m, " %pS", vm->caller);
 
-	if (v->nr_pages)
-		seq_printf(m, " pages=%d", v->nr_pages);
+	if (vm->nr_pages)
+		seq_printf(m, " pages=%d", vm->nr_pages);
 
-	if (v->phys_addr)
-		seq_printf(m, " phys=%llx", (unsigned long long)v->phys_addr);
+	if (vm->phys_addr)
+		seq_printf(m, " phys=%llx", (unsigned long long)vm->phys_addr);
 
-	if (v->flags & VM_IOREMAP)
+	if (vm->flags & VM_IOREMAP)
 		seq_printf(m, " ioremap");
 
-	if (v->flags & VM_ALLOC)
+	if (vm->flags & VM_ALLOC)
 		seq_printf(m, " vmalloc");
 
-	if (v->flags & VM_MAP)
+	if (vm->flags & VM_MAP)
 		seq_printf(m, " vmap");
 
-	if (v->flags & VM_USERMAP)
+	if (vm->flags & VM_USERMAP)
 		seq_printf(m, " user");
 
-	if (v->flags & VM_VPAGES)
+	if (vm->flags & VM_VPAGES)
 		seq_printf(m, " vpages");
 
-	show_numa_info(m, v);
+	show_numa_info(m, vm);
 	seq_putc(m, '\n');
 	return 0;
 }
-- 
1.7.9.5


* [RFC PATCH 7/8] mm, vmalloc: makes vmlist only for kexec
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (5 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 16:09 ` [RFC PATCH 8/8] mm, vmalloc: remove list management operation after initializing vmalloc Joonsoo Kim
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC
  To: Andrew Morton
  Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim, Eric Biederman

Although our intention is to remove vmlist entirely, there is one
exception. kexec uses the vmlist symbol, and we can't remove it, because
it is related to a userspace program. When kexec dumps system information,
it writes the address of vmlist and the offset of the addr field in
vm_struct. A userspace program uses this information to get the first
address in the vmalloc space, and then dumps the memory contents of the
vmalloc space above this address. To support this optimization, we have
to keep vmlist.

But this doesn't mean that we must maintain the full vmlist;
a single vm_struct in vmlist is sufficient.
So use vmlist_early for the full chain of vm_structs and assign dummy_vm
to vmlist to support kexec.

Cc: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index f134950..8a1b959 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,6 +272,27 @@ static unsigned long cached_align;
 
 static unsigned long vmap_area_pcpu_hole;
 
+/*** Old vmalloc interfaces ***/
+DEFINE_RWLOCK(vmlist_lock);
+/* vmlist is only for kexec */
+struct vm_struct *vmlist;
+static struct vm_struct dummy_vm;
+
+/* This is only for kexec.
+ * It wants to know first vmalloc address for optimization */
+static void setup_vmlist(void)
+{
+	struct vmap_area *va;
+
+	if (list_empty(&vmap_area_list)) {
+		vmlist = NULL;
+	} else {
+		va = list_entry((&vmap_area_list)->next, typeof(*va), list);
+		dummy_vm.addr = (void *)va->va_start;
+		vmlist = &dummy_vm;
+	}
+}
+
 static struct vmap_area *__find_vmap_area(unsigned long addr)
 {
 	struct rb_node *n = vmap_area_root.rb_node;
@@ -313,7 +334,7 @@ static void __insert_vmap_area(struct vmap_area *va)
 	rb_link_node(&va->rb_node, parent, p);
 	rb_insert_color(&va->rb_node, &vmap_area_root);
 
-	/* address-sort this list so it is usable like the vmlist */
+	/* address-sort this list so it is usable like the vmlist_early */
 	tmp = rb_prev(&va->rb_node);
 	if (tmp) {
 		struct vmap_area *prev;
@@ -321,6 +342,8 @@ static void __insert_vmap_area(struct vmap_area *va)
 		list_add_rcu(&va->list, &prev->list);
 	} else
 		list_add_rcu(&va->list, &vmap_area_list);
+
+	setup_vmlist();
 }
 
 static void purge_vmap_area_lazy(void);
@@ -485,6 +508,8 @@ static void __free_vmap_area(struct vmap_area *va)
 		vmap_area_pcpu_hole = max(vmap_area_pcpu_hole, va->va_end);
 
 	kfree_rcu(va, rcu_head);
+
+	setup_vmlist();
 }
 
 /*
@@ -1125,11 +1150,13 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
 }
 EXPORT_SYMBOL(vm_map_ram);
 
+static struct vm_struct *vmlist_early;
+
 /**
  * vm_area_add_early - add vmap area early during boot
  * @vm: vm_struct to add
  *
- * This function is used to add fixed kernel vm area to vmlist before
+ * This function is used to add fixed kernel vm area to vmlist_early before
  * vmalloc_init() is called.  @vm->addr, @vm->size, and @vm->flags
  * should contain proper values and the other fields should be zero.
  *
@@ -1140,7 +1167,7 @@ void __init vm_area_add_early(struct vm_struct *vm)
 	struct vm_struct *tmp, **p;
 
 	BUG_ON(vmap_initialized);
-	for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next) {
+	for (p = &vmlist_early; (tmp = *p) != NULL; p = &tmp->next) {
 		if (tmp->addr >= vm->addr) {
 			BUG_ON(tmp->addr < vm->addr + vm->size);
 			break;
@@ -1190,8 +1217,8 @@ void __init vmalloc_init(void)
 		INIT_LIST_HEAD(&vbq->free);
 	}
 
-	/* Import existing vmlist entries. */
-	for (tmp = vmlist; tmp; tmp = tmp->next) {
+	/* Import existing vmlist_early entries. */
+	for (tmp = vmlist_early; tmp; tmp = tmp->next) {
 		va = kzalloc(sizeof(struct vmap_area), GFP_NOWAIT);
 		va->flags = VM_VM_AREA;
 		va->va_start = (unsigned long)tmp->addr;
@@ -1283,10 +1310,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page ***pages)
 }
 EXPORT_SYMBOL_GPL(map_vm_area);
 
-/*** Old vmalloc interfaces ***/
-DEFINE_RWLOCK(vmlist_lock);
-struct vm_struct *vmlist;
-
 static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
@@ -1313,7 +1336,7 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm)
 	vm->flags &= ~VM_UNLIST;
 
 	write_lock(&vmlist_lock);
-	for (p = &vmlist; (tmp = *p) != NULL; p = &tmp->next) {
+	for (p = &vmlist_early; (tmp = *p) != NULL; p = &tmp->next) {
 		if (tmp->addr >= vm->addr)
 			break;
 	}
@@ -1369,7 +1392,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 
 	/*
 	 * When this function is called from __vmalloc_node_range,
-	 * we do not add vm_struct to vmlist here to avoid
+	 * we do not add vm_struct to vmlist_early here to avoid
 	 * accessing uninitialized members of vm_struct such as
 	 * pages and nr_pages fields. They will be set later.
 	 * To distinguish it from others, we use a VM_UNLIST flag.
@@ -1468,7 +1491,8 @@ struct vm_struct *remove_vm_area(const void *addr)
 			 * confliction is maintained by vmap.)
 			 */
 			write_lock(&vmlist_lock);
-			for (p = &vmlist; (tmp = *p) != vm; p = &tmp->next)
+			for (p = &vmlist_early; (tmp = *p) != vm;
+							p = &tmp->next)
 				;
 			*p = tmp->next;
 			write_unlock(&vmlist_lock);
@@ -1694,7 +1718,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 
 	/*
 	 * In this function, newly allocated vm_struct is not added
-	 * to vmlist at __get_vm_area_node(). so, it is added here.
+	 * to vmlist_early at __get_vm_area_node(). so, it is added here.
 	 */
 	insert_vmalloc_vmlist(area);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 8/8] mm, vmalloc: remove list management operation after initializing vmalloc
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (6 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 7/8] mm, vmalloc: makes vmlist only for kexec Joonsoo Kim
@ 2012-12-06 16:09 ` Joonsoo Kim
  2012-12-06 22:45 ` [RFC PATCH 0/8] remove vm_struct list management Andrew Morton
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 28+ messages in thread
From: Joonsoo Kim @ 2012-12-06 16:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Joonsoo Kim

There is now no need to maintain vmlist_early after vmalloc is initialized,
so remove the related code and data structures.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 698b1e5..10d19c9 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -130,7 +130,6 @@ extern long vwrite(char *buf, char *addr, unsigned long count);
 /*
  *	Internals.  Dont't use..
  */
-extern rwlock_t vmlist_lock;
 extern struct vm_struct *vmlist;
 extern __init void vm_area_add_early(struct vm_struct *vm);
 extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 8a1b959..957a098 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,8 +272,6 @@ static unsigned long cached_align;
 
 static unsigned long vmap_area_pcpu_hole;
 
-/*** Old vmalloc interfaces ***/
-DEFINE_RWLOCK(vmlist_lock);
 /* vmlist is only for kexec */
 struct vm_struct *vmlist;
 static struct vm_struct dummy_vm;
@@ -334,7 +332,7 @@ static void __insert_vmap_area(struct vmap_area *va)
 	rb_link_node(&va->rb_node, parent, p);
 	rb_insert_color(&va->rb_node, &vmap_area_root);
 
-	/* address-sort this list so it is usable like the vmlist_early */
+	/* address-sort this list */
 	tmp = rb_prev(&va->rb_node);
 	if (tmp) {
 		struct vmap_area *prev;
@@ -1150,7 +1148,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
 }
 EXPORT_SYMBOL(vm_map_ram);
 
-static struct vm_struct *vmlist_early;
+static struct vm_struct *vmlist_early __initdata;
 
 /**
  * vm_area_add_early - add vmap area early during boot
@@ -1323,7 +1321,7 @@ static void setup_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 	spin_unlock(&vmap_area_lock);
 }
 
-static void insert_vmalloc_vmlist(struct vm_struct *vm)
+static void remove_vm_unlist(struct vm_struct *vm)
 {
 	struct vm_struct *tmp, **p;
 
@@ -1334,22 +1332,13 @@ static void insert_vmalloc_vmlist(struct vm_struct *vm)
 	 */
 	smp_wmb();
 	vm->flags &= ~VM_UNLIST;
-
-	write_lock(&vmlist_lock);
-	for (p = &vmlist_early; (tmp = *p) != NULL; p = &tmp->next) {
-		if (tmp->addr >= vm->addr)
-			break;
-	}
-	vm->next = *p;
-	*p = vm;
-	write_unlock(&vmlist_lock);
 }
 
 static void insert_vmalloc_vm(struct vm_struct *vm, struct vmap_area *va,
 			      unsigned long flags, const void *caller)
 {
 	setup_vmalloc_vm(vm, va, flags, caller);
-	insert_vmalloc_vmlist(vm);
+	remove_vm_unlist(vm);
 }
 
 static struct vm_struct *__get_vm_area_node(unsigned long size,
@@ -1392,10 +1381,9 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 
 	/*
 	 * When this function is called from __vmalloc_node_range,
-	 * we do not add vm_struct to vmlist_early here to avoid
-	 * accessing uninitialized members of vm_struct such as
-	 * pages and nr_pages fields. They will be set later.
-	 * To distinguish it from others, we use a VM_UNLIST flag.
+	 * we set the VM_UNLIST flag to avoid accessing uninitialized
+	 * members of vm_struct, such as the pages and nr_pages fields.
+	 * They will be set later.
 	 */
 	if (flags & VM_UNLIST)
 		setup_vmalloc_vm(area, va, flags, caller);
@@ -1483,21 +1471,6 @@ struct vm_struct *remove_vm_area(const void *addr)
 		va->flags &= ~VM_VM_AREA;
 		spin_unlock(&vmap_area_lock);
 
-		if (!(vm->flags & VM_UNLIST)) {
-			struct vm_struct *tmp, **p;
-			/*
-			 * remove from list and disallow access to
-			 * this vm_struct before unmap. (address range
-			 * confliction is maintained by vmap.)
-			 */
-			write_lock(&vmlist_lock);
-			for (p = &vmlist_early; (tmp = *p) != vm;
-							p = &tmp->next)
-				;
-			*p = tmp->next;
-			write_unlock(&vmlist_lock);
-		}
-
 		vmap_debug_free_range(va->va_start, va->va_end);
 		free_unmap_vmap_area(va);
 		vm->size -= PAGE_SIZE;
@@ -1717,10 +1690,11 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		return NULL;
 
 	/*
-	 * In this function, newly allocated vm_struct is not added
-	 * to vmlist_early at __get_vm_area_node(). so, it is added here.
+	 * In this function, the newly allocated vm_struct has the VM_UNLIST
+	 * flag set, which means it is not fully initialized yet.
+	 * Now that it is fully initialized, clear the flag here.
 	 */
-	insert_vmalloc_vmlist(area);
+	remove_vm_unlist(area);
 
 	/*
 	 * A ref_count = 3 is needed because the vm_struct and vmap_area
-- 
1.7.9.5
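After this patch, remove_vm_unlist() only issues smp_wmb() and then clears VM_UNLIST once pages and nr_pages have been filled in. The sketch below is a userspace analogue of that publish step. It assumes C11 release/acquire atomics as a stand-in for the kernel's barrier, which is a substitution rather than the kernel's primitives; the sketch_* names and the flag value are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag value for this sketch; not the kernel's VM_UNLIST. */
#define SKETCH_VM_UNLIST 0x1u

struct sketch_vm {
	void *pages;            /* set only after the area is reserved */
	unsigned int nr_pages;
	atomic_uint flags;      /* SKETCH_VM_UNLIST while not fully initialized */
};

/* Reserve: called before any reader can see 'vm', so atomic_init is enough. */
static void sketch_reserve(struct sketch_vm *vm)
{
	vm->pages = NULL;
	vm->nr_pages = 0;
	atomic_init(&vm->flags, SKETCH_VM_UNLIST);
}

/* Publish: fill in the fields first, then clear the flag with release
 * ordering, playing the role of smp_wmb() followed by the flag update. */
static void sketch_remove_unlist(struct sketch_vm *vm, void *pages,
				 unsigned int nr_pages)
{
	vm->pages = pages;
	vm->nr_pages = nr_pages;
	atomic_fetch_and_explicit(&vm->flags, ~SKETCH_VM_UNLIST,
				  memory_order_release);
}

/* Reader: the acquire load pairs with the release above. While the flag
 * is still set, the reader must not look at pages or nr_pages. */
static int sketch_read_nr_pages(struct sketch_vm *vm, unsigned int *out)
{
	unsigned int flags = atomic_load_explicit(&vm->flags,
						  memory_order_acquire);

	if (flags & SKETCH_VM_UNLIST)
		return 0;
	*out = vm->nr_pages;
	return 1;
}
```

Readers skip a flagged entry instead of waiting for it, so the flag may only be cleared after every field a reader might use has been written.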



* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (7 preceding siblings ...)
  2012-12-06 16:09 ` [RFC PATCH 8/8] mm, vmalloc: remove list management operation after initializing vmalloc Joonsoo Kim
@ 2012-12-06 22:45 ` Andrew Morton
  2012-12-07 13:05   ` JoonSoo Kim
  2012-12-06 22:50 ` Andrew Morton
  2012-12-07  3:37 ` Bob Liu
  10 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2012-12-06 22:45 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Russell King, linux-kernel, linux-mm, kexec

On Fri,  7 Dec 2012 01:09:27 +0900
Joonsoo Kim <js1304@gmail.com> wrote:

> This patchset remove vm_struct list management after initializing vmalloc.
> Adding and removing an entry to vmlist is linear time complexity, so
> it is inefficient. If we maintain this list, overall time complexity of
> adding and removing area to vmalloc space is O(N), although we use
> rbtree for finding vacant place and it's time complexity is just O(logN).
> 
> And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
> It is preferable that we hide this raw data structure and provide
> well-defined function for supporting them, because it makes that they
> cannot mistake when manipulating theses structure and it makes us easily
> maintain vmalloc layer.
> 
> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
> Because it is related to userspace program.
> As far as I know, makedumpfile use kexec's output information and it only
> need first address of vmalloc layer. So my implementation reflect this
> fact, but I'm not sure. And now, I don't fully test this patchset.
> Basic operation work well, but I don't test kexec. So I send this
> patchset with 'RFC'.
> 
> Please let me know what I am missing.
> 
> This series based on v3.7-rc7 and on top of submitted patchset for ARM.
> 'introduce static_vm for ARM-specific static mapped area'
> https://lkml.org/lkml/2012/11/27/356
> But, running properly on x86 without ARM patchset.

This all looks rather nice, but not mergeable into anything at this
stage in the release cycle.

What are the implications of "on top of submitted patchset for ARM"? 
Does it depend on the ARM patches in any way, or is it independently
mergeable and testable?



* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (8 preceding siblings ...)
  2012-12-06 22:45 ` [RFC PATCH 0/8] remove vm_struct list management Andrew Morton
@ 2012-12-06 22:50 ` Andrew Morton
  2012-12-07 13:16   ` JoonSoo Kim
  2012-12-07  3:37 ` Bob Liu
  10 siblings, 1 reply; 28+ messages in thread
From: Andrew Morton @ 2012-12-06 22:50 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Russell King, linux-kernel, linux-mm, kexec, Vivek Goyal

On Fri,  7 Dec 2012 01:09:27 +0900
Joonsoo Kim <js1304@gmail.com> wrote:

> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
> Because it is related to userspace program.
> As far as I know, makedumpfile use kexec's output information and it only
> need first address of vmalloc layer. So my implementation reflect this
> fact, but I'm not sure. And now, I don't fully test this patchset.
> Basic operation work well, but I don't test kexec. So I send this
> patchset with 'RFC'.

Yes, this is irritating.  Perhaps Vivek or one of the other kexec
people could take a look at this please - it would obviously be much
better if we can avoid merging [patch 7/8] at all.


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
                   ` (9 preceding siblings ...)
  2012-12-06 22:50 ` Andrew Morton
@ 2012-12-07  3:37 ` Bob Liu
  2012-12-07 13:35   ` JoonSoo Kim
  10 siblings, 1 reply; 28+ messages in thread
From: Bob Liu @ 2012-12-07  3:37 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec

Hi Joonsoo,

On Fri, Dec 7, 2012 at 12:09 AM, Joonsoo Kim <js1304@gmail.com> wrote:
> This patchset remove vm_struct list management after initializing vmalloc.
> Adding and removing an entry to vmlist is linear time complexity, so
> it is inefficient. If we maintain this list, overall time complexity of
> adding and removing area to vmalloc space is O(N), although we use
> rbtree for finding vacant place and it's time complexity is just O(logN).
>
> And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
> It is preferable that we hide this raw data structure and provide
> well-defined function for supporting them, because it makes that they
> cannot mistake when manipulating theses structure and it makes us easily
> maintain vmalloc layer.
>
> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
> Because it is related to userspace program.
> As far as I know, makedumpfile use kexec's output information and it only
> need first address of vmalloc layer. So my implementation reflect this
> fact, but I'm not sure. And now, I don't fully test this patchset.
> Basic operation work well, but I don't test kexec. So I send this
> patchset with 'RFC'.
>
> Please let me know what I am missing.
>

Nice work!
I also thought about this several weeks ago, but I think efficiency
may be a problem.

As you know, two locks (vmap_area_lock and vmlist_lock) are currently
used, so some work may be done in parallel (not proven).
If vmlist is removed, I'm afraid vmap_area_lock will become a
bottleneck, which will reduce efficiency.

> This series based on v3.7-rc7 and on top of submitted patchset for ARM.
> 'introduce static_vm for ARM-specific static mapped area'
> https://lkml.org/lkml/2012/11/27/356
> But, running properly on x86 without ARM patchset.
>
> Joonsoo Kim (8):
>   mm, vmalloc: change iterating a vmlist to find_vm_area()
>   mm, vmalloc: move get_vmalloc_info() to vmalloc.c
>   mm, vmalloc: protect va->vm by vmap_area_lock
>   mm, vmalloc: iterate vmap_area_list, instead of vmlist in
>     vread/vwrite()
>   mm, vmalloc: iterate vmap_area_list in get_vmalloc_info()
>   mm, vmalloc: iterate vmap_area_list, instead of vmlist, in
>     vmallocinfo()
>   mm, vmalloc: makes vmlist only for kexec
>   mm, vmalloc: remove list management operation after initializing
>     vmalloc
>
>  arch/tile/mm/pgtable.c      |    7 +-
>  arch/unicore32/mm/ioremap.c |   17 +--
>  arch/x86/mm/ioremap.c       |    7 +-
>  fs/proc/Makefile            |    2 +-
>  fs/proc/internal.h          |   18 ---
>  fs/proc/meminfo.c           |    1 +
>  fs/proc/mmu.c               |   60 ----------
>  include/linux/vmalloc.h     |   19 +++-
>  mm/vmalloc.c                |  258 +++++++++++++++++++++++++++++--------------
>  9 files changed, 204 insertions(+), 185 deletions(-)
>  delete mode 100644 fs/proc/mmu.c
>
> --
> 1.7.9.5
>

-- 
Regards,
--Bob


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
@ 2012-12-07  7:44   ` Pekka Enberg
  2012-12-07  8:15     ` Bob Liu
  2012-12-07 13:40     ` JoonSoo Kim
  2012-12-10  5:20   ` guanxuetao
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 28+ messages in thread
From: Pekka Enberg @ 2012-12-07  7:44 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec,
	Chris Metcalf, Guan Xuetao, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Thu, Dec 6, 2012 at 6:09 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> The purpose of iterating a vmlist is finding vm area with specific
> virtual address. find_vm_area() is provided for this purpose
> and more efficient, because it uses a rbtree.
> So change it.

You no longer take the 'vmlist_lock'. This is safe, because...?

> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
> index de0de0c..862782d 100644
> --- a/arch/tile/mm/pgtable.c
> +++ b/arch/tile/mm/pgtable.c
> @@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
>            in parallel. Reuse of the virtual address is prevented by
>            leaving it in the global lists until we're done with it.
>            cpa takes care of the direct mappings. */
> -       read_lock(&vmlist_lock);
> -       for (p = vmlist; p; p = p->next) {
> -               if (p->addr == addr)
> -                       break;
> -       }
> -       read_unlock(&vmlist_lock);
> +       p = find_vm_area((void *)addr);
>
>         if (!p) {
>                 pr_err("iounmap: bad address %p\n", addr);
> diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
> index b7a6055..13068ee 100644
> --- a/arch/unicore32/mm/ioremap.c
> +++ b/arch/unicore32/mm/ioremap.c
> @@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
>  void __uc32_iounmap(volatile void __iomem *io_addr)
>  {
>         void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
> -       struct vm_struct **p, *tmp;
> +       struct vm_struct *vm;
>
>         /*
>          * If this is a section based mapping we need to handle it
> @@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
>          * all the mappings before the area can be reclaimed
>          * by someone else.
>          */
> -       write_lock(&vmlist_lock);
> -       for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
> -               if ((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) {
> -                       if (tmp->flags & VM_UNICORE_SECTION_MAPPING) {
> -                               unmap_area_sections((unsigned long)tmp->addr,
> -                                                   tmp->size);
> -                       }
> -                       break;
> -               }
> -       }
> -       write_unlock(&vmlist_lock);
> +       vm = find_vm_area(addr);
> +       if (vm && (vm->flags & VM_IOREMAP) &&
> +               (vm->flags & VM_UNICORE_SECTION_MAPPING))
> +               unmap_area_sections((unsigned long)vm->addr, vm->size);
>
>         vunmap(addr);
>  }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 78fe3f1..9a1e658 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
>            in parallel. Reuse of the virtual address is prevented by
>            leaving it in the global lists until we're done with it.
>            cpa takes care of the direct mappings. */
> -       read_lock(&vmlist_lock);
> -       for (p = vmlist; p; p = p->next) {
> -               if (p->addr == (void __force *)addr)
> -                       break;
> -       }
> -       read_unlock(&vmlist_lock);
> +       p = find_vm_area((void __force *)addr);
>
>         if (!p) {
>                 printk(KERN_ERR "iounmap: bad address %p\n", addr);
> --
> 1.7.9.5
>


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-07  7:44   ` Pekka Enberg
@ 2012-12-07  8:15     ` Bob Liu
  2012-12-07 13:40     ` JoonSoo Kim
  1 sibling, 0 replies; 28+ messages in thread
From: Bob Liu @ 2012-12-07  8:15 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Joonsoo Kim, Andrew Morton, Russell King, linux-kernel, linux-mm,
	kexec, Chris Metcalf, Guan Xuetao, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

On Fri, Dec 7, 2012 at 3:44 PM, Pekka Enberg <penberg@kernel.org> wrote:
> On Thu, Dec 6, 2012 at 6:09 PM, Joonsoo Kim <js1304@gmail.com> wrote:
>> The purpose of iterating a vmlist is finding vm area with specific
>> virtual address. find_vm_area() is provided for this purpose
>> and more efficient, because it uses a rbtree.
>> So change it.
>
> You no longer take the 'vmlist_lock'. This is safe, because...?
>

I think it's because find_vm_area() -> find_vmap_area() will use
vmap_area_lock instead.

>> Cc: Chris Metcalf <cmetcalf@tilera.com>
>> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
>> index de0de0c..862782d 100644
>> --- a/arch/tile/mm/pgtable.c
>> +++ b/arch/tile/mm/pgtable.c
>> @@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
>>            in parallel. Reuse of the virtual address is prevented by
>>            leaving it in the global lists until we're done with it.
>>            cpa takes care of the direct mappings. */
>> -       read_lock(&vmlist_lock);
>> -       for (p = vmlist; p; p = p->next) {
>> -               if (p->addr == addr)
>> -                       break;
>> -       }
>> -       read_unlock(&vmlist_lock);
>> +       p = find_vm_area((void *)addr);
>>
>>         if (!p) {
>>                 pr_err("iounmap: bad address %p\n", addr);
>> diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
>> index b7a6055..13068ee 100644
>> --- a/arch/unicore32/mm/ioremap.c
>> +++ b/arch/unicore32/mm/ioremap.c
>> @@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
>>  void __uc32_iounmap(volatile void __iomem *io_addr)
>>  {
>>         void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
>> -       struct vm_struct **p, *tmp;
>> +       struct vm_struct *vm;
>>
>>         /*
>>          * If this is a section based mapping we need to handle it
>> @@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
>>          * all the mappings before the area can be reclaimed
>>          * by someone else.
>>          */
>> -       write_lock(&vmlist_lock);
>> -       for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
>> -               if ((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) {
>> -                       if (tmp->flags & VM_UNICORE_SECTION_MAPPING) {
>> -                               unmap_area_sections((unsigned long)tmp->addr,
>> -                                                   tmp->size);
>> -                       }
>> -                       break;
>> -               }
>> -       }
>> -       write_unlock(&vmlist_lock);
>> +       vm = find_vm_area(addr);
>> +       if (vm && (vm->flags & VM_IOREMAP) &&
>> +               (vm->flags & VM_UNICORE_SECTION_MAPPING))
>> +               unmap_area_sections((unsigned long)vm->addr, vm->size);
>>
>>         vunmap(addr);
>>  }
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 78fe3f1..9a1e658 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
>>            in parallel. Reuse of the virtual address is prevented by
>>            leaving it in the global lists until we're done with it.
>>            cpa takes care of the direct mappings. */
>> -       read_lock(&vmlist_lock);
>> -       for (p = vmlist; p; p = p->next) {
>> -               if (p->addr == (void __force *)addr)
>> -                       break;
>> -       }
>> -       read_unlock(&vmlist_lock);
>> +       p = find_vm_area((void __force *)addr);
>>
>>         if (!p) {
>>                 printk(KERN_ERR "iounmap: bad address %p\n", addr);
>> --
>> 1.7.9.5

-- 
Thanks,
--Bob


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-06 22:45 ` [RFC PATCH 0/8] remove vm_struct list management Andrew Morton
@ 2012-12-07 13:05   ` JoonSoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-07 13:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec

Hello, Andrew.

2012/12/7 Andrew Morton <akpm@linux-foundation.org>:
> On Fri,  7 Dec 2012 01:09:27 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
>> This patchset remove vm_struct list management after initializing vmalloc.
>> Adding and removing an entry to vmlist is linear time complexity, so
>> it is inefficient. If we maintain this list, overall time complexity of
>> adding and removing area to vmalloc space is O(N), although we use
>> rbtree for finding vacant place and it's time complexity is just O(logN).
>>
>> And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
>> It is preferable that we hide this raw data structure and provide
>> well-defined function for supporting them, because it makes that they
>> cannot mistake when manipulating theses structure and it makes us easily
>> maintain vmalloc layer.
>>
>> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
>> Because it is related to userspace program.
>> As far as I know, makedumpfile use kexec's output information and it only
>> need first address of vmalloc layer. So my implementation reflect this
>> fact, but I'm not sure. And now, I don't fully test this patchset.
>> Basic operation work well, but I don't test kexec. So I send this
>> patchset with 'RFC'.
>>
>> Please let me know what I am missing.
>>
>> This series based on v3.7-rc7 and on top of submitted patchset for ARM.
>> 'introduce static_vm for ARM-specific static mapped area'
>> https://lkml.org/lkml/2012/11/27/356
>> But, running properly on x86 without ARM patchset.
>
> This all looks rather nice, but not mergeable into anything at this
> stage in the release cycle.
>
> What are the implications of "on top of submitted patchset for ARM"?
> Does it depend on the ARM patches in any way, or is it independently
> mergeable and testable?
>

Yes. It depends on the ARM patches.
ARM has code that manipulates vmlist directly, so without the ARM
patches applied, this patchset causes a compile error on ARM.
The x86 build works fine with this patchset, though :)

The ARM patches remove the vmlist-related code in the same way as
1/8 of this patchset, but they also include some ARM-specific
optimizations, so I sent them separately.
If they can't be accepted, I can rework the ARM patches along the lines of 1/8.


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-06 22:50 ` Andrew Morton
@ 2012-12-07 13:16   ` JoonSoo Kim
  2012-12-07 14:59     ` Vivek Goyal
  0 siblings, 1 reply; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-07 13:16 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Russell King, linux-kernel, linux-mm, kexec, Vivek Goyal

2012/12/7 Andrew Morton <akpm@linux-foundation.org>:
> On Fri,  7 Dec 2012 01:09:27 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
>> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
>> Because it is related to userspace program.
>> As far as I know, makedumpfile use kexec's output information and it only
>> need first address of vmalloc layer. So my implementation reflect this
>> fact, but I'm not sure. And now, I don't fully test this patchset.
>> Basic operation work well, but I don't test kexec. So I send this
>> patchset with 'RFC'.
>
> Yes, this is irritating.  Perhaps Vivek or one of the other kexec
> people could take a look at this please - it would obviously be much
> better if we can avoid merging [patch 7/8] at all.

I'm not certain, but I'm fairly sure that [patch 7/8] has no problem.
In kexec.c, the kernel writes out the address of vmlist and the offset
of vm_struct's addr field.
This implies that the consumer of this information knows nothing else
about vm_struct and cannot use any other field of it. It can use *only*
the addr field.
So keeping just one vm_struct on vmlist, representing the first area
of the vmalloc layer, should be safe.

Still, it would be very helpful if the kexec people could validate this patch.
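The argument above rests on what the exported data lets a reader reach. Below is a minimal userspace simulation, assuming a dump tool that receives only the address of the vmlist variable and offsetof(struct vm_struct, addr). The cut-down struct and the exported_info/read_mem/first_vmalloc_addr names are illustrative, not the kernel's or makedumpfile's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for struct vm_struct; only the fields needed here. */
struct vm_struct {
	struct vm_struct *next;
	void *addr;
	unsigned long size;
};

/* What the dump tool is assumed to receive (hypothetical names). */
struct exported_info {
	uintptr_t vmlist_symbol;   /* address of the 'vmlist' pointer variable */
	size_t addr_offset;        /* offsetof(struct vm_struct, addr) */
};

/* Stand-in for reading the crashed kernel's memory; here it simply
 * reads our own address space. */
static void read_mem(uintptr_t src, void *dst, size_t len)
{
	memcpy(dst, (const void *)src, len);
}

/* Follow the pointer stored in vmlist, then read ->addr of the first
 * entry. Walking further would need the offset of ->next as well. */
static void *first_vmalloc_addr(const struct exported_info *info)
{
	uintptr_t first;
	void *addr;

	read_mem(info->vmlist_symbol, &first, sizeof(first));
	if (first == 0)
		return NULL;
	read_mem(first + info->addr_offset, &addr, sizeof(addr));
	return addr;
}
```

With only those two values, such a reader never touches anything but the first entry's addr, which is why a single entry carrying the first area's start address looks sufficient for it.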

Thanks.


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-07  3:37 ` Bob Liu
@ 2012-12-07 13:35   ` JoonSoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-07 13:35 UTC (permalink / raw)
  To: Bob Liu; +Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec

Hello, Bob.

2012/12/7 Bob Liu <lliubbo@gmail.com>:
> Hi Joonsoo,
>
> On Fri, Dec 7, 2012 at 12:09 AM, Joonsoo Kim <js1304@gmail.com> wrote:
>> This patchset remove vm_struct list management after initializing vmalloc.
>> Adding and removing an entry to vmlist is linear time complexity, so
>> it is inefficient. If we maintain this list, overall time complexity of
>> adding and removing area to vmalloc space is O(N), although we use
>> rbtree for finding vacant place and it's time complexity is just O(logN).
>>
>> And vmlist and vmlist_lock is used many places of outside of vmalloc.c.
>> It is preferable that we hide this raw data structure and provide
>> well-defined function for supporting them, because it makes that they
>> cannot mistake when manipulating theses structure and it makes us easily
>> maintain vmalloc layer.
>>
>> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
>> Because it is related to userspace program.
>> As far as I know, makedumpfile use kexec's output information and it only
>> need first address of vmalloc layer. So my implementation reflect this
>> fact, but I'm not sure. And now, I don't fully test this patchset.
>> Basic operation work well, but I don't test kexec. So I send this
>> patchset with 'RFC'.
>>
>> Please let me know what I am missing.
>>
>
> Nice work!
> I also thought about this several weeks ago, but I think efficiency
> may be a problem.
>
> As you know, two locks (vmap_area_lock and vmlist_lock) are currently
> used, so some work may be done in parallel (not proven).
> If vmlist is removed, I'm afraid vmap_area_lock will become a
> bottleneck, which will reduce efficiency.

Thanks for the comment!

Yes, there are some places where work may be done in parallel.
For example, accesses to '/proc/meminfo', '/proc/vmallocinfo' and '/proc/kcore'
may proceed in parallel. But these accesses are not the main
functionality of the vmalloc layer.
Optimizing the main functions, like vmalloc() and vfree(), is preferable.
And this patchset optimizes those main functions by removing the vmlist iteration.
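The vmlist maintenance being removed walks an address-sorted singly linked list with a pointer-to-pointer cursor, as the vm_area_add_early() and insert_vmalloc_vmlist() loops in the patches do. A minimal userspace sketch of that pattern (the node type and function names are hypothetical) makes the O(N) cost per insert and per removal explicit:

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	unsigned long addr;
};

/* Insert n keeping the list sorted by addr, using the same
 * pointer-to-pointer walk as the vmlist loops. Returns how many
 * entries were stepped over, which grows linearly with the list. */
static size_t sorted_insert(struct node **head, struct node *n)
{
	struct node **p, *tmp;
	size_t steps = 0;

	for (p = head; (tmp = *p) != NULL; p = &tmp->next) {
		if (tmp->addr >= n->addr)
			break;
		steps++;
	}
	n->next = *p;
	*p = n;
	return steps;
}

/* Unlink a node that is known to be on the list: also an O(N) walk. */
static void sorted_remove(struct node **head, struct node *n)
{
	struct node **p;

	for (p = head; *p != n; p = &(*p)->next)
		;
	*p = n->next;
}
```

An rbtree lookup or insert, by contrast, visits O(log N) nodes, which is the saving the series is after on the vmalloc()/vfree() paths.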

Thanks.


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-07  7:44   ` Pekka Enberg
  2012-12-07  8:15     ` Bob Liu
@ 2012-12-07 13:40     ` JoonSoo Kim
  1 sibling, 0 replies; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-07 13:40 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec,
	Chris Metcalf, Guan Xuetao, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin

Hello, Pekka.

2012/12/7 Pekka Enberg <penberg@kernel.org>:
> On Thu, Dec 6, 2012 at 6:09 PM, Joonsoo Kim <js1304@gmail.com> wrote:
>> The purpose of iterating a vmlist is finding vm area with specific
>> virtual address. find_vm_area() is provided for this purpose
>> and more efficient, because it uses a rbtree.
>> So change it.
>
> You no longer take the 'vmlist_lock'. This is safe, because...?

As Bob mentioned, find_vm_area() holds 'vmap_area_lock' while
searching for an area.
While we hold 'vmap_area_lock', an area can't be removed.
So this change is safe.
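The safety argument is that lookup and removal serialize on the same lock. Below is a userspace sketch of that idea, assuming an atomic_flag spinlock as a stand-in for vmap_area_lock and a plain (unbalanced) binary search tree as a stand-in for the rbtree; all names are hypothetical. It only shows that a search cannot race with an unlink; the pointer returned by find_area() is no longer covered by the lock once it is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct area {
	unsigned long va_start, va_end;
	struct area *left, *right;   /* binary search tree keyed by va_start */
};

static struct area *area_root;                  /* protected by area_lock */
static atomic_flag area_lock = ATOMIC_FLAG_INIT;

static void area_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&area_lock, memory_order_acquire))
		;   /* spin */
}

static void area_lock_release(void)
{
	atomic_flag_clear_explicit(&area_lock, memory_order_release);
}

/* Return the link pointing at the area starting at addr, or the empty
 * link where it would be inserted. Caller holds area_lock. */
static struct area **find_link(unsigned long addr)
{
	struct area **link = &area_root;

	while (*link && (*link)->va_start != addr)
		link = addr < (*link)->va_start ? &(*link)->left : &(*link)->right;
	return link;
}

static void insert_area(struct area *a)
{
	struct area **link;

	area_lock_acquire();
	link = find_link(a->va_start);
	if (!*link) {
		a->left = a->right = NULL;
		*link = a;
	}
	area_lock_release();
}

/* Lookup: the whole tree walk happens with area_lock held. */
static struct area *find_area(unsigned long addr)
{
	struct area *a;

	area_lock_acquire();
	a = *find_link(addr);
	area_lock_release();
	return a;   /* not protected by area_lock from here on */
}

/* Removal takes the same lock, so it cannot unlink nodes mid-search. */
static struct area *remove_area(unsigned long addr)
{
	struct area **link, *a;

	area_lock_acquire();
	link = find_link(addr);
	a = *link;
	if (a) {
		if (!a->left) {
			*link = a->right;
		} else if (!a->right) {
			*link = a->left;
		} else {
			/* Replace a with its in-order successor s. */
			struct area **succ = &a->right, *s;

			while ((*succ)->left)
				succ = &(*succ)->left;
			s = *succ;
			*succ = s->right;
			s->left = a->left;
			s->right = a->right;
			*link = s;
		}
	}
	area_lock_release();
	return a;
}
```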

Thanks.


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-07 13:16   ` JoonSoo Kim
@ 2012-12-07 14:59     ` Vivek Goyal
  2012-12-10 14:40       ` JoonSoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2012-12-07 14:59 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, Russell King, kexec, linux-kernel, linux-mm,
	Dave Anderson, Atsushi Kumagai

On Fri, Dec 07, 2012 at 10:16:55PM +0900, JoonSoo Kim wrote:
> 2012/12/7 Andrew Morton <akpm@linux-foundation.org>:
> > On Fri,  7 Dec 2012 01:09:27 +0900
> > Joonsoo Kim <js1304@gmail.com> wrote:
> >
> >> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
> >> Because it is related to userspace program.
> >> As far as I know, makedumpfile use kexec's output information and it only
> >> need first address of vmalloc layer. So my implementation reflect this
> >> fact, but I'm not sure. And now, I don't fully test this patchset.
> >> Basic operation work well, but I don't test kexec. So I send this
> >> patchset with 'RFC'.
> >
> > Yes, this is irritating.  Perhaps Vivek or one of the other kexec
> > people could take a look at this please - if would obviously be much
> > better if we can avoid merging [patch 7/8] at all.
> 
> I'm not sure, but I almost sure that [patch 7/8] have no problem.
> In kexec.c, they write an address of vmlist and offset of vm_struct's
> address field.
> It imply that user for this information doesn't have any other
> information about vm_struct,
> and they can't use other field of vm_struct. They can use *only* address field.
> So, remaining just one vm_struct for vmlist which represent first area
> of vmalloc layer
> may be safe.

I quickly browsed through the makedumpfile source. So yes, it does look like
we look at the ->addr field of the first vmlist element to figure out where
the vmalloc area starts.

Can we get the same information from this rb-tree of vmap_area? Is the
->va_start field communicating the same information as vmlist was
communicating? What's the difference between vmap_area_root and vmlist?

So, without knowing the details of both data structures, I think that if vmlist
is going away, then user space tools should be able to traverse the vmap_area_root
rb-tree. I am assuming it is sorted by the ->addr field and that we should be
able to get the vmalloc area start from there. It will just be a matter of
exporting the right fields to user space (instead of vmlist).

CCing Atsushi Kumagai and Dave Anderson. Atsushi-san is the one who
maintains makedumpfile. Dave Anderson maintains "crash", and it looks like
crash already has the capability to traverse the vmap_area_root
rb-tree.

So please let us know whether the leftmost element of the vmap_area_root
rb-tree gives us the start of the vmalloc area or not.

Thanks
Vivek


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
  2012-12-07  7:44   ` Pekka Enberg
@ 2012-12-10  5:20   ` guanxuetao
  2012-12-10 15:13   ` Chris Metcalf
  2013-01-24 15:50   ` Ingo Molnar
  3 siblings, 0 replies; 28+ messages in thread
From: guanxuetao @ 2012-12-10  5:20 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec,
	Joonsoo Kim, Chris Metcalf, Guan Xuetao, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin

> The purpose of iterating a vmlist is finding vm area with specific
> virtual address. find_vm_area() is provided for this purpose
> and more efficient, because it uses a rbtree.
> So change it.
>
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>

For UniCore32 bits:
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>

>
> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
> index de0de0c..862782d 100644
> --- a/arch/tile/mm/pgtable.c
> +++ b/arch/tile/mm/pgtable.c
> @@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
>  	   in parallel. Reuse of the virtual address is prevented by
>  	   leaving it in the global lists until we're done with it.
>  	   cpa takes care of the direct mappings. */
> -	read_lock(&vmlist_lock);
> -	for (p = vmlist; p; p = p->next) {
> -		if (p->addr == addr)
> -			break;
> -	}
> -	read_unlock(&vmlist_lock);
> +	p = find_vm_area((void *)addr);
>
>  	if (!p) {
>  		pr_err("iounmap: bad address %p\n", addr);
> diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
> index b7a6055..13068ee 100644
> --- a/arch/unicore32/mm/ioremap.c
> +++ b/arch/unicore32/mm/ioremap.c
> @@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
>  void __uc32_iounmap(volatile void __iomem *io_addr)
>  {
>  	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
> -	struct vm_struct **p, *tmp;
> +	struct vm_struct *vm;
>
>  	/*
>  	 * If this is a section based mapping we need to handle it
> @@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
>  	 * all the mappings before the area can be reclaimed
>  	 * by someone else.
>  	 */
> -	write_lock(&vmlist_lock);
> -	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
> -		if ((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) {
> -			if (tmp->flags & VM_UNICORE_SECTION_MAPPING) {
> -				unmap_area_sections((unsigned long)tmp->addr,
> -						    tmp->size);
> -			}
> -			break;
> -		}
> -	}
> -	write_unlock(&vmlist_lock);
> +	vm = find_vm_area(addr);
> +	if (vm && (vm->flags & VM_IOREMAP) &&
> +		(vm->flags & VM_UNICORE_SECTION_MAPPING))
> +		unmap_area_sections((unsigned long)vm->addr, vm->size);
>
>  	vunmap(addr);
>  }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 78fe3f1..9a1e658 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
>  	   in parallel. Reuse of the virtual address is prevented by
>  	   leaving it in the global lists until we're done with it.
>  	   cpa takes care of the direct mappings. */
> -	read_lock(&vmlist_lock);
> -	for (p = vmlist; p; p = p->next) {
> -		if (p->addr == (void __force *)addr)
> -			break;
> -	}
> -	read_unlock(&vmlist_lock);
> +	p = find_vm_area((void __force *)addr);
>
>  	if (!p) {
>  		printk(KERN_ERR "iounmap: bad address %p\n", addr);
> --
> 1.7.9.5
>



* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-07 14:59     ` Vivek Goyal
@ 2012-12-10 14:40       ` JoonSoo Kim
  2012-12-11 14:41         ` Dave Anderson
  2012-12-11 21:48         ` Vivek Goyal
  0 siblings, 2 replies; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-10 14:40 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, Russell King, kexec, linux-kernel, linux-mm,
	Dave Anderson, Atsushi Kumagai

Hello, Vivek.

2012/12/7 Vivek Goyal <vgoyal@redhat.com>:
> On Fri, Dec 07, 2012 at 10:16:55PM +0900, JoonSoo Kim wrote:
>> 2012/12/7 Andrew Morton <akpm@linux-foundation.org>:
>> > On Fri,  7 Dec 2012 01:09:27 +0900
>> > Joonsoo Kim <js1304@gmail.com> wrote:
>> >
>> >> I'm not sure that "7/8: makes vmlist only for kexec" is fine.
>> >> Because it is related to userspace program.
>> >> As far as I know, makedumpfile use kexec's output information and it only
>> >> need first address of vmalloc layer. So my implementation reflect this
>> >> fact, but I'm not sure. And now, I don't fully test this patchset.
>> >> Basic operation work well, but I don't test kexec. So I send this
>> >> patchset with 'RFC'.
>> >
>> > Yes, this is irritating.  Perhaps Vivek or one of the other kexec
>> > people could take a look at this please - if would obviously be much
>> > better if we can avoid merging [patch 7/8] at all.
>>
>> I'm not sure, but I almost sure that [patch 7/8] have no problem.
>> In kexec.c, they write an address of vmlist and offset of vm_struct's
>> address field.
>> It imply that user for this information doesn't have any other
>> information about vm_struct,
>> and they can't use other field of vm_struct. They can use *only* address field.
>> So, remaining just one vm_struct for vmlist which represent first area
>> of vmalloc layer
>> may be safe.
>
> I browsed through makedumpfile source quickly. So yes it does look like
> that we look at first vmlist element ->addr field to figure out where
> vmalloc area is starting.
>
> Can we get the same information from this rb-tree of vmap_area? Is
> ->va_start field communication same information as vmlist was
> communicating? What's the difference between vmap_area_root and vmlist.

Thanks for the comment.

Yes. vmap_area's va_start field represents the same information as vm_struct's addr.
vmap_area_root is the data structure for searching an area quickly.
vmap_area_list is an address-sorted list, so we can use it like vmlist.

There is a small difference between vmap_area_list and vmlist.
vmlist lacks information about some areas in the vmalloc address space.
For example, vm_map_ram() allocates an area in the vmalloc address space,
but doesn't link it into vmlist. To provide full information
about the vmalloc address space,
using vmap_area_list is more adequate.
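A small userspace model of the difference described above, with hypothetical names (not the kernel's structures): each modelled vmap_area may or may not point at a vm_struct, and only those that do would have been visible through vmlist. The assertion that vm->addr equals va_start reflects the statement that the two fields carry the same information.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct model_vm_struct {
	unsigned long addr;
};

struct model_vmap_area {
	unsigned long va_start, va_end;
	struct model_vm_struct *vm;   /* NULL for areas like vm_map_ram()'s */
	struct model_vmap_area *next; /* list kept sorted by va_start */
};

static struct model_vmap_area *model_vmap_area_list;

/* Insert in ascending address order, as an address-sorted list requires. */
static void model_add_area(struct model_vmap_area *va)
{
	struct model_vmap_area **p = &model_vmap_area_list;

	assert(va->va_start < va->va_end);
	assert(!va->vm || va->vm->addr == va->va_start);
	while (*p && (*p)->va_start < va->va_start)
		p = &(*p)->next;
	va->next = *p;
	*p = va;
}

/* Count every area in the modelled address space. */
static size_t model_count_all(void)
{
	size_t n = 0;
	const struct model_vmap_area *va;

	for (va = model_vmap_area_list; va; va = va->next)
		n++;
	return n;
}

/* Count only areas that carry a vm_struct, i.e. what a vmlist walk sees. */
static size_t model_count_with_vm(void)
{
	size_t n = 0;
	const struct model_vmap_area *va;

	for (va = model_vmap_area_list; va; va = va->next)
		if (va->vm)
			n++;
	return n;
}
```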

> So without knowing details of both the data structures, I think if vmlist
> is going away, then user space tools should be able to traverse vmap_area_root
> rb tree. I am assuming it is sorted using ->addr field and we should be
> able to get vmalloc area start from there. It will just be a matter of
> exporting right fields to user space (instead of vmlist).

There is an address-sorted list of vmap_area, vmap_area_list,
so we can use it for traversing vmalloc areas if necessary.
But, as I mentioned before, kexec writes *just* the address of vmlist and
the offset of vm_struct's addr field.
This implies that the tools don't traverse vmlist,
because kexec doesn't write the offset of vm_struct's next field, which is needed for traversal.
Without vm_struct's next field, they have no way to traverse the list.
So, IMHO, assigning a dummy vm_struct to vmlist, as implemented by [7/8], is
a safe way to maintain compatibility with userspace tools. :)
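A userspace sketch of why those two exported values are enough, assuming the consumer can read memory directly (a real dump tool would translate kernel addresses through the vmcore instead). The struct layout and function name are hypothetical. Only the symbol location and the offset of addr are used.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct vm_struct; the layout is illustrative. */
struct model_vm_struct {
	struct model_vm_struct *next;
	void *addr;
	unsigned long size;
};

/*
 * Read the addr field of the first list element using only two facts:
 * where the list-head pointer lives (cf. SYMBOL(vmlist)) and the offset
 * of addr inside the element (cf. OFFSET(vm_struct.addr)). The offset
 * of next is never needed, so the list is never walked.
 */
static void *model_first_area_addr(struct model_vm_struct *const *vmlist_sym,
				   size_t addr_off)
{
	const char *first = (const char *)*vmlist_sym;
	void *addr;

	assert(addr_off + sizeof(void *) <= sizeof(struct model_vm_struct));
	if (!first)
		return NULL;
	/* Copy raw bytes, as a dump reader would from an image. */
	memcpy(&addr, first + addr_off, sizeof(addr));
	return addr;
}
```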

Thanks.


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
  2012-12-07  7:44   ` Pekka Enberg
  2012-12-10  5:20   ` guanxuetao
@ 2012-12-10 15:13   ` Chris Metcalf
  2013-01-24 15:50   ` Ingo Molnar
  3 siblings, 0 replies; 28+ messages in thread
From: Chris Metcalf @ 2012-12-10 15:13 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec,
	Guan Xuetao, Thomas Gleixner, Ingo Molnar, H. Peter Anvin

On 12/6/2012 11:09 AM, Joonsoo Kim wrote:
> The purpose of iterating a vmlist is finding vm area with specific
> virtual address. find_vm_area() is provided for this purpose
> and more efficient, because it uses a rbtree.
> So change it.

If you get an Acked-by for the x86 change, feel free to apply it to the tile file as well.  You'll note that for tile it's under an #if 0, which in retrospect I shouldn't have pushed anyway.  So I don't feel strongly :-)

FWIW, the change certainly seems at least plausible to me.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-10 14:40       ` JoonSoo Kim
@ 2012-12-11 14:41         ` Dave Anderson
  2012-12-11 21:48         ` Vivek Goyal
  1 sibling, 0 replies; 28+ messages in thread
From: Dave Anderson @ 2012-12-11 14:41 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, Russell King, kexec, linux-kernel, linux-mm,
	Atsushi Kumagai, Vivek Goyal



----- Original Message -----

> > Can we get the same information from this rb-tree of vmap_area? Is
> > ->va_start field communication same information as vmlist was
> > communicating? What's the difference between vmap_area_root and vmlist.
> 
> Thanks for comment.
> 
> Yes. vmap_area's va_start field represent same information as vm_struct's addr.
> vmap_area_root is data structure for fast searching an area.
> vmap_area_list is address sorted list, so we can use it like as vmlist.
> 
> There is a little difference vmap_area_list and vmlist.
> vmlist is lack of information about some areas in vmalloc address space.
> For example, vm_map_ram() allocate area in vmalloc address space,
> but it doesn't make a link with vmlist. To provide full information
> about vmalloc address space, using vmap_area_list is more adequate.
> 
> > So without knowing details of both the data structures, I think if vmlist
> > is going away, then user space tools should be able to traverse vmap_area_root
> > rb tree. I am assuming it is sorted using ->addr field and we should be
> > able to get vmalloc area start from there. It will just be a matter of
> > exporting right fields to user space (instead of vmlist).
> 
> There is address sorted list of vmap_area, vmap_area_list.
> So we can use it for traversing vmalloc areas if it is necessary.
> But, as I mentioned before, kexec write *just* address of vmlist and
> offset of vm_struct's address field.  It imply that they don't traverse vmlist,
> because they didn't write vm_struct's next field which is needed for traversing.
> Without vm_struct's next field, they have no method for traversing.
> So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
> a safe way to maintain a compatibility of userspace tool. :)

Why bother keeping vmlist around?  kdump's makedumpfile command would not
even need to traverse the vmap_area rbtree, because it could simply look
at the first vmap_area in the sorted vmap_area_list, correct?

Dave Anderson




* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-10 14:40       ` JoonSoo Kim
  2012-12-11 14:41         ` Dave Anderson
@ 2012-12-11 21:48         ` Vivek Goyal
  2012-12-11 22:17           ` Dave Anderson
  1 sibling, 1 reply; 28+ messages in thread
From: Vivek Goyal @ 2012-12-11 21:48 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, Russell King, kexec, linux-kernel, linux-mm,
	Dave Anderson, Atsushi Kumagai

On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:

[..]
> > So without knowing details of both the data structures, I think if vmlist
> > is going away, then user space tools should be able to traverse vmap_area_root
> > rb tree. I am assuming it is sorted using ->addr field and we should be
> > able to get vmalloc area start from there. It will just be a matter of
> > exporting right fields to user space (instead of vmlist).
> 
> There is address sorted list of vmap_area, vmap_area_list.
> So we can use it for traversing vmalloc areas if it is necessary.
> But, as I mentioned before, kexec write *just* address of vmlist and
> offset of vm_struct's address field.
> It imply that they don't traverse vmlist,
> because they didn't write vm_struct's next field which is needed for traversing.
> Without vm_struct's next field, they have no method for traversing.
> So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
> a safe way to maintain a compatibility of userspace tool. :)

Actually, the design of the "makedumpfile" and "crash" tools is that they know
about kernel data structures and adapt to changes. So for major
changes they keep track of kernel version numbers and access the
data structures accordingly.

Currently we access the first element of vmlist to determine the start of the
vmalloc address range. True, we don't have to traverse the list.

But as you mentioned, we should be able to get the same information by
traversing to the leftmost element of the vmap_area_list rb-tree. So instead
of trying to retain the first vmlist element just for backward
compatibility, I would rather get rid of that code completely
from the kernel and let user space tools traverse the rbtree. Just export the
minimum info needed for traversal in user space.
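To illustrate the "leftmost element" idea: in any binary search tree ordered by start address, following left children from the root reaches the lowest address. A minimal sketch with hypothetical names, using an unbalanced tree as a stand-in for the kernel rbtree (vmap_area_root).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tree node keyed by the area's start address. */
struct model_node {
	unsigned long va_start;
	struct model_node *left, *right;
};

/* Plain (unbalanced) BST insert; returns the possibly new root. */
static struct model_node *model_insert(struct model_node *root,
				       struct model_node *n)
{
	struct model_node **p = &root;

	assert(n != NULL);
	while (*p)
		p = n->va_start < (*p)->va_start ? &(*p)->left : &(*p)->right;
	n->left = n->right = NULL;
	*p = n;
	return root;
}

/*
 * The leftmost node holds the smallest key, i.e. the lowest start
 * address. The kernel's rb_first() performs the same walk on an rbtree.
 */
static struct model_node *model_leftmost(struct model_node *root)
{
	if (!root)
		return NULL;
	while (root->left)
		root = root->left;
	return root;
}
```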

Thanks
Vivek


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-11 21:48         ` Vivek Goyal
@ 2012-12-11 22:17           ` Dave Anderson
  2012-12-12  5:56             ` Atsushi Kumagai
  0 siblings, 1 reply; 28+ messages in thread
From: Dave Anderson @ 2012-12-11 22:17 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Andrew Morton, Russell King, kexec, linux-kernel, linux-mm,
	Atsushi Kumagai, JoonSoo Kim



----- Original Message -----
> On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:
> 
> [..]
> > > So without knowing details of both the data structures, I think if vmlist
> > > is going away, then user space tools should be able to traverse vmap_area_root
> > > rb tree. I am assuming it is sorted using ->addr field and we should be
> > > able to get vmalloc area start from there. It will just be a matter of
> > > exporting right fields to user space (instead of vmlist).
> > 
> > There is address sorted list of vmap_area, vmap_area_list.
> > So we can use it for traversing vmalloc areas if it is necessary.
> > But, as I mentioned before, kexec write *just* address of vmlist and
> > offset of vm_struct's address field.  It imply that they don't traverse vmlist,
> > because they didn't write vm_struct's next field which is needed for traversing.
> > Without vm_struct's next field, they have no method for traversing.
> > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
> > a safe way to maintain a compatibility of userspace tool. :)
> 
> Actually the design of "makedumpfile" and "crash" tool is that they know
> about kernel data structures and they adopt to changes. So for major
> changes they keep track of kernel version numbers and if access the
> data structures accordingly.
> 
> Currently we access first element of vmlist to determine start of vmalloc
> address. True we don't have to traverse the list.
> 
> But as you mentioned we should be able to get same information by
> traversing to left most element of vmap_area_list rb tree. So I think
> instead of trying to retain vmlist first element just for backward
> compatibility, I will rather prefer get rid of that code completely
> from kernel and let user space tool traverse rbtree. Just export
> minimum needed info for traversal in user space.

There's no need to traverse the rbtree.  There is a vmap_area_list
linked list of vmap_area structures that is also sorted by virtual
address.

All that makedumpfile would have to do is to access the first vmap_area
in the vmap_area_list -- as opposed to the way that it does now, which is
by accessing the first vm_struct in the to-be-obsoleted vmlist list.

So it seems silly to keep the dummy "vmlist" around.
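A sketch of that single lookup, assuming the list is kept in ascending address order (as stated above) and that the tool is given the list head plus the offsets of the list linkage and va_start inside vmap_area. The names are hypothetical, and a real tool reads through the dump rather than following live pointers. Inside the kernel the equivalent step would be list_first_entry(&vmap_area_list, struct vmap_area, list).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct list_head and struct vmap_area. */
struct model_list_head {
	struct model_list_head *next, *prev;
};

struct model_vmap_area {
	unsigned long va_start;
	unsigned long va_end;
	struct model_list_head list;
};

static void model_list_init(struct model_list_head *head)
{
	head->next = head->prev = head;
}

/* Append n at the tail (just before head), like list_add_tail(). */
static void model_list_add_tail(struct model_list_head *n,
				struct model_list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Using only the list head plus the offsets of vmap_area.list and
 * vmap_area.va_start: follow head->next, step back to the enclosing
 * area, and copy out va_start. Returns 0 on success, -1 if empty.
 */
static int model_vmalloc_start(const struct model_list_head *head,
			       size_t list_off, size_t start_off,
			       unsigned long *out)
{
	const char *area;

	assert(head && out);
	if (head->next == head)
		return -1;
	area = (const char *)head->next - list_off;
	memcpy(out, area + start_off, sizeof(*out));
	return 0;
}
```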

Dave


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-11 22:17           ` Dave Anderson
@ 2012-12-12  5:56             ` Atsushi Kumagai
  2012-12-12 14:10               ` JoonSoo Kim
  0 siblings, 1 reply; 28+ messages in thread
From: Atsushi Kumagai @ 2012-12-12  5:56 UTC (permalink / raw)
  To: anderson; +Cc: vgoyal, akpm, rmk+kernel, kexec, linux-kernel, linux-mm, js1304

Hello,

On Tue, 11 Dec 2012 17:17:05 -0500 (EST)
Dave Anderson <anderson@redhat.com> wrote:

> 
> 
> ----- Original Message -----
> > On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:
> > 
> > [..]
> > > > So without knowing details of both the data structures, I think if vmlist
> > > > is going away, then user space tools should be able to traverse vmap_area_root
> > > > rb tree. I am assuming it is sorted using ->addr field and we should be
> > > > able to get vmalloc area start from there. It will just be a matter of
> > > > exporting right fields to user space (instead of vmlist).
> > > 
> > > There is address sorted list of vmap_area, vmap_area_list.
> > > So we can use it for traversing vmalloc areas if it is necessary.
> > > But, as I mentioned before, kexec write *just* address of vmlist and
> > > offset of vm_struct's address field.  It imply that they don't traverse vmlist,
> > > because they didn't write vm_struct's next field which is needed for traversing.
> > > Without vm_struct's next field, they have no method for traversing.
> > > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
> > > a safe way to maintain a compatibility of userspace tool. :)
> > 
> > Actually the design of "makedumpfile" and "crash" tool is that they know
> > about kernel data structures and they adopt to changes. So for major
> > changes they keep track of kernel version numbers and if access the
> > data structures accordingly.
> > 
> > Currently we access first element of vmlist to determine start of vmalloc
> > address. True we don't have to traverse the list.
> > 
> > But as you mentioned we should be able to get same information by
> > traversing to left most element of vmap_area_list rb tree. So I think
> > instead of trying to retain vmlist first element just for backward
> > compatibility, I will rather prefer get rid of that code completely
> > from kernel and let user space tool traverse rbtree. Just export
> > minimum needed info for traversal in user space.
> 
> There's no need to traverse the rbtree.  There is a vmap_area_list
> linked list of vmap_area structures that is also sorted by virtual
> address.
> 
> All that makedumpfile would have to do is to access the first vmap_area
> in the vmap_area_list -- as opposed to the way that it does now, which is
> by accessing the first vm_struct in the to-be-obsoleted vmlist list.
> 
> So it seems silly to keep the dummy "vmlist" around.

I think so. I will modify makedumpfile to get the start address of vmalloc
from vmap_area_list if the related symbols are provided in VMCOREINFO, as
they are for vmlist.
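For illustration, VMCOREINFO is a text note of KEY=value lines; kernel symbols appear as SYMBOL(name)=<hex address> and structure member offsets as OFFSET(struct.member)=<decimal>. Below is a minimal lookup helper with a hypothetical name; the vmap_area_list entries and all values used with it are made up, not taken from any real kernel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "KEY=value" in a VMCOREINFO-style note and parse the value in
 * the given base (16 for SYMBOL(...), 10 for OFFSET(...)). Returns 0 on
 * success, -1 if the key is absent.
 */
static int model_vmcoreinfo_get(const char *note, const char *key, int base,
				unsigned long *out)
{
	size_t klen;
	const char *line = note;

	assert(note && key && out);
	klen = strlen(key);
	while (line && *line) {
		/* Require '=' right after the key so prefixes don't match. */
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			*out = strtoul(line + klen + 1, NULL, base);
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```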

BTW, do we have to consider other tools?
If it is clear that we don't, I think we can get rid of the dummy vmlist.


Thanks
Atsushi Kumagai


* Re: [RFC PATCH 0/8] remove vm_struct list management
  2012-12-12  5:56             ` Atsushi Kumagai
@ 2012-12-12 14:10               ` JoonSoo Kim
  0 siblings, 0 replies; 28+ messages in thread
From: JoonSoo Kim @ 2012-12-12 14:10 UTC (permalink / raw)
  To: Atsushi Kumagai
  Cc: anderson, vgoyal, akpm, rmk+kernel, kexec, linux-kernel, linux-mm

Hello, Atsushi.

2012/12/12 Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>:
> Hello,
>
> On Tue, 11 Dec 2012 17:17:05 -0500 (EST)
> Dave Anderson <anderson@redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>> > On Mon, Dec 10, 2012 at 11:40:47PM +0900, JoonSoo Kim wrote:
>> >
>> > [..]
>> > > > So without knowing details of both the data structures, I think if vmlist
>> > > > is going away, then user space tools should be able to traverse vmap_area_root
>> > > > rb tree. I am assuming it is sorted using ->addr field and we should be
>> > > > able to get vmalloc area start from there. It will just be a matter of
>> > > > exporting right fields to user space (instead of vmlist).
>> > >
>> > > There is address sorted list of vmap_area, vmap_area_list.
>> > > So we can use it for traversing vmalloc areas if it is necessary.
>> > > But, as I mentioned before, kexec write *just* address of vmlist and
>> > > offset of vm_struct's address field.  It imply that they don't traverse vmlist,
>> > > because they didn't write vm_struct's next field which is needed for traversing.
>> > > Without vm_struct's next field, they have no method for traversing.
>> > > So, IMHO, assigning dummy vm_struct to vmlist which is implemented by [7/8] is
>> > > a safe way to maintain a compatibility of userspace tool. :)
>> >
>> > Actually the design of "makedumpfile" and "crash" tool is that they know
>> > about kernel data structures and they adopt to changes. So for major
>> > changes they keep track of kernel version numbers and if access the
>> > data structures accordingly.
>> >
>> > Currently we access first element of vmlist to determine start of vmalloc
>> > address. True we don't have to traverse the list.
>> >
>> > But as you mentioned we should be able to get same information by
>> > traversing to left most element of vmap_area_list rb tree. So I think
>> > instead of trying to retain vmlist first element just for backward
>> > compatibility, I will rather prefer get rid of that code completely
>> > from kernel and let user space tool traverse rbtree. Just export
>> > minimum needed info for traversal in user space.
>>
>> There's no need to traverse the rbtree.  There is a vmap_area_list
>> linked list of vmap_area structures that is also sorted by virtual
>> address.
>>
>> All that makedumpfile would have to do is to access the first vmap_area
>> in the vmap_area_list -- as opposed to the way that it does now, which is
>> by accessing the first vm_struct in the to-be-obsoleted vmlist list.
>>
>> So it seems silly to keep the dummy "vmlist" around.
>
> I think so, I will modify makedumpfile to get the start address of vmalloc
> with vmap_area_list if the related symbols are provided as VMCOREINFO like
> vmlist.
> BTW, have we to consider other tools ?
> If it is clear, I think we can get rid of the dummy vmlist.

Good!
In the next spin, I will remove the dummy vmlist and export the vmap_area_list
symbol for makedumpfile.

I don't know of any other tools.
If anyone knows of one, please let me know.

Thanks, Atsushi, Dave and Vivek!


* Re: [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area()
  2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
                     ` (2 preceding siblings ...)
  2012-12-10 15:13   ` Chris Metcalf
@ 2013-01-24 15:50   ` Ingo Molnar
  3 siblings, 0 replies; 28+ messages in thread
From: Ingo Molnar @ 2013-01-24 15:50 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Russell King, linux-kernel, linux-mm, kexec,
	Chris Metcalf, Guan Xuetao, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin


* Joonsoo Kim <js1304@gmail.com> wrote:

> The purpose of iterating a vmlist is finding vm area with specific
> virtual address. find_vm_area() is provided for this purpose
> and more efficient, because it uses a rbtree.
> So change it.
> 
> Cc: Chris Metcalf <cmetcalf@tilera.com>
> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/arch/tile/mm/pgtable.c b/arch/tile/mm/pgtable.c
> index de0de0c..862782d 100644
> --- a/arch/tile/mm/pgtable.c
> +++ b/arch/tile/mm/pgtable.c
> @@ -592,12 +592,7 @@ void iounmap(volatile void __iomem *addr_in)
>  	   in parallel. Reuse of the virtual address is prevented by
>  	   leaving it in the global lists until we're done with it.
>  	   cpa takes care of the direct mappings. */
> -	read_lock(&vmlist_lock);
> -	for (p = vmlist; p; p = p->next) {
> -		if (p->addr == addr)
> -			break;
> -	}
> -	read_unlock(&vmlist_lock);
> +	p = find_vm_area((void *)addr);
>  
>  	if (!p) {
>  		pr_err("iounmap: bad address %p\n", addr);
> diff --git a/arch/unicore32/mm/ioremap.c b/arch/unicore32/mm/ioremap.c
> index b7a6055..13068ee 100644
> --- a/arch/unicore32/mm/ioremap.c
> +++ b/arch/unicore32/mm/ioremap.c
> @@ -235,7 +235,7 @@ EXPORT_SYMBOL(__uc32_ioremap_cached);
>  void __uc32_iounmap(volatile void __iomem *io_addr)
>  {
>  	void *addr = (void *)(PAGE_MASK & (unsigned long)io_addr);
> -	struct vm_struct **p, *tmp;
> +	struct vm_struct *vm;
>  
>  	/*
>  	 * If this is a section based mapping we need to handle it
> @@ -244,17 +244,10 @@ void __uc32_iounmap(volatile void __iomem *io_addr)
>  	 * all the mappings before the area can be reclaimed
>  	 * by someone else.
>  	 */
> -	write_lock(&vmlist_lock);
> -	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
> -		if ((tmp->flags & VM_IOREMAP) && (tmp->addr == addr)) {
> -			if (tmp->flags & VM_UNICORE_SECTION_MAPPING) {
> -				unmap_area_sections((unsigned long)tmp->addr,
> -						    tmp->size);
> -			}
> -			break;
> -		}
> -	}
> -	write_unlock(&vmlist_lock);
> +	vm = find_vm_area(addr);
> +	if (vm && (vm->flags & VM_IOREMAP) &&
> +		(vm->flags & VM_UNICORE_SECTION_MAPPING))
> +		unmap_area_sections((unsigned long)vm->addr, vm->size);
>  
>  	vunmap(addr);
>  }
> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
> index 78fe3f1..9a1e658 100644
> --- a/arch/x86/mm/ioremap.c
> +++ b/arch/x86/mm/ioremap.c
> @@ -282,12 +282,7 @@ void iounmap(volatile void __iomem *addr)
>  	   in parallel. Reuse of the virtual address is prevented by
>  	   leaving it in the global lists until we're done with it.
>  	   cpa takes care of the direct mappings. */
> -	read_lock(&vmlist_lock);
> -	for (p = vmlist; p; p = p->next) {
> -		if (p->addr == (void __force *)addr)
> -			break;
> -	}
> -	read_unlock(&vmlist_lock);
> +	p = find_vm_area((void __force *)addr);
>  
>  	if (!p) {
>  		printk(KERN_ERR "iounmap: bad address %p\n", addr);

For the x86 bits, provided it gets some good testing:

Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


end of thread

Thread overview: 28+ messages
2012-12-06 16:09 [RFC PATCH 0/8] remove vm_struct list management Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 1/8] mm, vmalloc: change iterating a vmlist to find_vm_area() Joonsoo Kim
2012-12-07  7:44   ` Pekka Enberg
2012-12-07  8:15     ` Bob Liu
2012-12-07 13:40     ` JoonSoo Kim
2012-12-10  5:20   ` guanxuetao
2012-12-10 15:13   ` Chris Metcalf
2013-01-24 15:50   ` Ingo Molnar
2012-12-06 16:09 ` [RFC PATCH 2/8] mm, vmalloc: move get_vmalloc_info() to vmalloc.c Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 3/8] mm, vmalloc: protect va->vm by vmap_area_lock Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 4/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist in vread/vwrite() Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 5/8] mm, vmalloc: iterate vmap_area_list in get_vmalloc_info() Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 6/8] mm, vmalloc: iterate vmap_area_list, instead of vmlist, in vmallocinfo() Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 7/8] mm, vmalloc: makes vmlist only for kexec Joonsoo Kim
2012-12-06 16:09 ` [RFC PATCH 8/8] mm, vmalloc: remove list management operation after initializing vmalloc Joonsoo Kim
2012-12-06 22:45 ` [RFC PATCH 0/8] remove vm_struct list management Andrew Morton
2012-12-07 13:05   ` JoonSoo Kim
2012-12-06 22:50 ` Andrew Morton
2012-12-07 13:16   ` JoonSoo Kim
2012-12-07 14:59     ` Vivek Goyal
2012-12-10 14:40       ` JoonSoo Kim
2012-12-11 14:41         ` Dave Anderson
2012-12-11 21:48         ` Vivek Goyal
2012-12-11 22:17           ` Dave Anderson
2012-12-12  5:56             ` Atsushi Kumagai
2012-12-12 14:10               ` JoonSoo Kim
2012-12-07  3:37 ` Bob Liu
2012-12-07 13:35   ` JoonSoo Kim
