linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
@ 2023-01-13  3:19 Baoquan He
  2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
                   ` (6 more replies)
  0 siblings, 7 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Problem:
***
Stephen reported vread() will skip vm_map_ram areas when reading out
/proc/kcore with drgn utility. Please see below link to get more
details. 

  /proc/kcore reads 0's for vmap_block
  https://lore.kernel.org/all/87ilk6gos2.fsf@oracle.com/T/#u

Root cause:
***
The normal vmalloc API uses struct vmap_area to manage the virtual
kernel area allocated, and associate a vm_struct to store more
information and pass out. However, area reserved through vm_map_ram()
interface doesn't allocate vm_struct to associate with. So the current
code in vread() will skip the vm_map_ram area through 'if (!va->vm)'
conditional checking.

Solution:
***
To mark the area reserved through vm_map_ram() interface, add field 'flags'
into struct vmap_area. Bit 0 indicates this is vm_map_ram area created
through vm_map_ram() interface, bit 1 marks out the type of vm_map_ram area
which makes use of vmap_block to manage split regions via vb_alloc/free().

And also add bitmap field 'used_map' into struct vmap_block to mark those
further subdivided regions being used to differentiate with dirty and free
regions in vmap_block.

With the help of above vmap_area->flags and vmap_block->used_map, we can
recognize and handle vm_map_ram areas successfully. All these are done
in patch 1~3.

Meanwhile, do some improvement on areas related to vm_map_ram areas in
patch 4, 5. And also change area flag from VM_ALLOC to VM_IOREMAP in
patch 6, 7 because this will show them as 'ioremap' in /proc/vmallocinfo,
and exclude them from /proc/kcore.

Testing
***
Only did the basic testing on kvm guest, and run below command to
access kcore file to do statistics:

	makedumpfile --mem-usage /proc/kcore

Changelog
***
v2->v3:
- Benefited from find_unlink_vmap_area() introduced by Uladzislau, do
  not need to worry about the va->vm and va->flags reset during freeing.
- Change to identify vm_map_area with VMAP_RAM in vmap->flags in
  vread(). 
- Rename the old vb_vread() to vmap_ram_vread().
- Handle two kinds of vm_map_area area reading in vmap_ram_vread()
  respectively. 
- Fix bug of the while loop code block in vmap_block reading, found by
  Lorenzo.
- Improve conditional check related to vm_map_ram area, suggested by
  Lorenzo.

v1->v2:
- Change alloc_vmap_area() to pass in va_flags so that we can pass and
  set vmap_area->flags for vm_map_ram area. With this, no extra action
  need be added to acquire vmap_area_lock when doing the vmap_area->flags
  setting. Uladzislau reviewed v1 and pointed out the issue.
- Improve vb_vread() to cover the case where reading is started from a
  dirty or free region.

RFC->v1:
- Add a new field flags in vmap_area to mark vm_map_ram area. It cold be
  risky reusing the vm union in vmap_area in RFC. I will consider
  reusing the union in vmap_area to save memory later. Now just take the
  simpler way to let's focus on resolving the main problem.
- Add patch 4~7 for optimization.

Baoquan He (7):
  mm/vmalloc.c: add used_map into vmap_block to track space of
    vmap_block
  mm/vmalloc.c: add flags to mark vm_map_ram area
  mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  mm/vmalloc: explicitly identify vm_map_ram area when shown in
    /proc/vmcoreinfo
  mm/vmalloc: skip the uninitilized vmalloc areas
  powerpc: mm: add VM_IOREMAP flag to the vmalloc area
  sh: mm: set VM_IOREMAP flag to the vmalloc area

 arch/powerpc/kernel/pci_64.c |   2 +-
 arch/sh/kernel/cpu/sh4/sq.c  |   2 +-
 include/linux/vmalloc.h      |   1 +
 mm/vmalloc.c                 | 114 ++++++++++++++++++++++++++++++-----
 4 files changed, 101 insertions(+), 18 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:39   ` Uladzislau Rezki
  2023-01-16 12:22   ` Lorenzo Stoakes
  2023-01-13  3:19 ` [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

In one vmap_block area, there could be three types of regions: region
being used which is allocated through vb_alloc(), dirty region which
is freed via vb_free() and free region. Among them, only used region
has available data. While there's no way to track those used regions
currently.

Here, add bitmap field used_map into vmap_block, and set/clear it during
allocation or freeing regions of vmap_block area.

This is a preparatoin for later use.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmalloc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 428e0bee5c9c..d6ff058ef4d0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1922,6 +1922,7 @@ struct vmap_block {
 	spinlock_t lock;
 	struct vmap_area *va;
 	unsigned long free, dirty;
+	DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
 	unsigned long dirty_min, dirty_max; /*< dirty range */
 	struct list_head free_list;
 	struct rcu_head rcu_head;
@@ -1998,10 +1999,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	vb->va = va;
 	/* At least something should be left free */
 	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+	bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
 	vb->free = VMAP_BBMAP_BITS - (1UL << order);
 	vb->dirty = 0;
 	vb->dirty_min = VMAP_BBMAP_BITS;
 	vb->dirty_max = 0;
+	bitmap_set(vb->used_map, 0, (1UL << order));
 	INIT_LIST_HEAD(&vb->free_list);
 
 	vb_idx = addr_to_vb_idx(va->va_start);
@@ -2111,6 +2114,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 		pages_off = VMAP_BBMAP_BITS - vb->free;
 		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
 		vb->free -= 1UL << order;
+		bitmap_set(vb->used_map, pages_off, (1UL << order));
 		if (vb->free == 0) {
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
@@ -2144,6 +2148,9 @@ static void vb_free(unsigned long addr, unsigned long size)
 	order = get_order(size);
 	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
 	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
+	spin_lock(&vb->lock);
+	bitmap_clear(vb->used_map, offset, (1UL << order));
+	spin_unlock(&vb->lock);
 
 	vunmap_range_noflush(addr, addr + size);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
  2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:42   ` Uladzislau Rezki
  2023-01-16 12:23   ` Lorenzo Stoakes
  2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Through vmalloc API, a virtual kernel area is reserved for physical
address mapping. And vmap_area is used to track them, while vm_struct
is allocated to associate with the vmap_area to store more information
and passed out.

However, area reserved via vm_map_ram() is an exception. It doesn't have
vm_struct to associate with vmap_area. And we can't recognize the
vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
freeing path will set va->vm = NULL before unmapping, please see
function remove_vm_area().

Meanwhile, there are two kinds of handling for vm_map_ram area. One is
the whole vmap_area being reserved and mapped at one time through
vm_map_area() interface; the other is the whole vmap_area with
VMAP_BLOCK_SIZE size being reserved, while mapped into split regions
with smaller size via vb_alloc().

To mark the area reserved through vm_map_ram(), add flags field into
struct vmap_area. Bit 0 indicates this is vm_map_ram area created
through vm_map_ram() interface, while bit 1 marks out the type of
vm_map_ram area which makes use of vmap_block to manage split regions
via vb_alloc/free().

This is a preparatoin for later use.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 include/linux/vmalloc.h |  1 +
 mm/vmalloc.c            | 16 ++++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 096d48aa3437..69250efa03d1 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -76,6 +76,7 @@ struct vmap_area {
 		unsigned long subtree_max_size; /* in "free" tree */
 		struct vm_struct *vm;           /* in "busy" tree */
 	};
+	unsigned long flags; /* mark type of vm_map_ram area */
 };
 
 /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d6ff058ef4d0..ab4825050b5c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1589,7 +1589,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 static struct vmap_area *alloc_vmap_area(unsigned long size,
 				unsigned long align,
 				unsigned long vstart, unsigned long vend,
-				int node, gfp_t gfp_mask)
+				int node, gfp_t gfp_mask,
+				unsigned long va_flags)
 {
 	struct vmap_area *va;
 	unsigned long freed;
@@ -1635,6 +1636,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 	va->va_start = addr;
 	va->va_end = addr + size;
 	va->vm = NULL;
+	va->flags = va_flags;
 
 	spin_lock(&vmap_area_lock);
 	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
@@ -1913,6 +1915,10 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
 
 #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
 
+#define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
+#define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
+#define VMAP_FLAGS_MASK		0x3
+
 struct vmap_block_queue {
 	spinlock_t lock;
 	struct list_head free;
@@ -1988,7 +1994,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 
 	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
 					VMALLOC_START, VMALLOC_END,
-					node, gfp_mask);
+					node, gfp_mask,
+					VMAP_RAM|VMAP_BLOCK);
 	if (IS_ERR(va)) {
 		kfree(vb);
 		return ERR_CAST(va);
@@ -2297,7 +2304,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
 	} else {
 		struct vmap_area *va;
 		va = alloc_vmap_area(size, PAGE_SIZE,
-				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+				VMALLOC_START, VMALLOC_END,
+				node, GFP_KERNEL, VMAP_RAM);
 		if (IS_ERR(va))
 			return NULL;
 
@@ -2537,7 +2545,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
 	if (!(flags & VM_NO_GUARD))
 		size += PAGE_SIZE;
 
-	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
+	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
 	if (IS_ERR(va)) {
 		kfree(area);
 		return NULL;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
  2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
  2023-01-13  3:19 ` [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-14  7:57   ` Dan Carpenter
                     ` (2 more replies)
  2023-01-13  3:19 ` [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo Baoquan He
                   ` (3 subsequent siblings)
  6 siblings, 3 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Currently, vread can read out vmalloc areas which is associated with
a vm_struct. While this doesn't work for areas created by vm_map_ram()
interface because it doesn't have an associated vm_struct. Then in vread(),
these areas are all skipped.

Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
The area created with vmap_ram_vread() interface directly can be handled
like the other normal vmap areas with aligned_vread(). While areas
which will be further subdivided and managed with vmap_block need
carefully read out page-aligned small regions and zero fill holes.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 73 insertions(+), 7 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ab4825050b5c..13875bc41e27 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3544,6 +3544,65 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
 	return copied;
 }
 
+static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
+{
+	char *start;
+	struct vmap_block *vb;
+	unsigned long offset;
+	unsigned int rs, re, n;
+
+	/*
+	 * If it's area created by vm_map_ram() interface directly, but
+	 * not further subdividing and delegating management to vmap_block,
+	 * handle it here.
+	 */
+	if (!(flags & VMAP_BLOCK)) {
+		aligned_vread(buf, addr, count);
+		return;
+	}
+
+	/*
+	 * Area is split into regions and tracked with vmap_block, read out
+	 * each region and zero fill the hole between regions.
+	 */
+	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
+
+	spin_lock(&vb->lock);
+	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
+		spin_unlock(&vb->lock);
+		memset(buf, 0, count);
+		return;
+	}
+	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
+		if (!count)
+			break;
+		start = vmap_block_vaddr(vb->va->va_start, rs);
+		while (addr < start) {
+			if (count == 0)
+				break;
+			*buf = '\0';
+			buf++;
+			addr++;
+			count--;
+		}
+		/*it could start reading from the middle of used region*/
+		offset = offset_in_page(addr);
+		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
+		if (n > count)
+			n = count;
+		aligned_vread(buf, start+offset, n);
+
+		buf += n;
+		addr += n;
+		count -= n;
+	}
+	spin_unlock(&vb->lock);
+
+	/* zero-fill the left dirty or free regions */
+	if (count)
+		memset(buf, 0, count);
+}
+
 /**
  * vread() - read vmalloc area in a safe way.
  * @buf:     buffer for reading data
@@ -3574,7 +3633,7 @@ long vread(char *buf, char *addr, unsigned long count)
 	struct vm_struct *vm;
 	char *vaddr, *buf_start = buf;
 	unsigned long buflen = count;
-	unsigned long n;
+	unsigned long n, size, flags;
 
 	addr = kasan_reset_tag(addr);
 
@@ -3595,12 +3654,16 @@ long vread(char *buf, char *addr, unsigned long count)
 		if (!count)
 			break;
 
-		if (!va->vm)
+		vm = va->vm;
+		flags = va->flags & VMAP_FLAGS_MASK;
+
+		if (!vm && !flags)
 			continue;
 
-		vm = va->vm;
-		vaddr = (char *) vm->addr;
-		if (addr >= vaddr + get_vm_area_size(vm))
+		vaddr = (char *) va->va_start;
+		size = vm ? get_vm_area_size(vm) : va_size(va);
+
+		if (addr >= vaddr + size)
 			continue;
 		while (addr < vaddr) {
 			if (count == 0)
@@ -3610,10 +3673,13 @@ long vread(char *buf, char *addr, unsigned long count)
 			addr++;
 			count--;
 		}
-		n = vaddr + get_vm_area_size(vm) - addr;
+		n = vaddr + size - addr;
 		if (n > count)
 			n = count;
-		if (!(vm->flags & VM_IOREMAP))
+
+		if (flags & VMAP_RAM)
+			vmap_ram_vread(buf, addr, n, flags);
+		else if (!(vm->flags & VM_IOREMAP))
 			aligned_vread(buf, addr, n);
 		else /* IOREMAP area is treated as memory hole */
 			memset(buf, 0, n);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
                   ` (2 preceding siblings ...)
  2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:43   ` Uladzislau Rezki
  2023-01-16 12:24   ` Lorenzo Stoakes
  2023-01-13  3:19 ` [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas Baoquan He
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Now, by marking VMAP_RAM in vmap_area->flags for vm_map_ram, we can
clearly differentiate it with other vmalloc areas. So identify
vm_map_area area by checking VMAP_RAM of vmap_area->flags when shown
in /proc/vmcoreinfo.

Meanwhile, the code comment above vm_map_ram area checking in s_show()
is not needed any more, remove it here.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmalloc.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 13875bc41e27..4a10b3b692fa 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -4225,11 +4225,7 @@ static int s_show(struct seq_file *m, void *p)
 
 	va = list_entry(p, struct vmap_area, list);
 
-	/*
-	 * s_show can encounter race with remove_vm_area, !vm on behalf
-	 * of vmap area is being tear down or vm_map_ram allocation.
-	 */
-	if (!va->vm) {
+	if (!va->vm && (va->flags & VMAP_RAM)) {
 		seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
 			(void *)va->va_start, (void *)va->va_end,
 			va->va_end - va->va_start);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
                   ` (3 preceding siblings ...)
  2023-01-13  3:19 ` [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:44   ` Uladzislau Rezki
  2023-01-16 12:24   ` Lorenzo Stoakes
  2023-01-13  3:19 ` [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area Baoquan He
  2023-01-13  3:19 ` [PATCH v3 7/7] sh: mm: set " Baoquan He
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

For areas allocated via vmalloc_xxx() APIs, it searches for unmapped area
to reserve and allocates new pages to map into, please see function
__vmalloc_node_range(). During the process, flag VM_UNINITIALIZED is set
in vm->flags to indicate that the pages allocation and mapping haven't
been done, until clear_vm_uninitialized_flag() is called to clear it.

For this kind of area, if VM_UNINITIALIZED is still set, let's ignore
it in vread() because pages newly allocated and being mapped in that
area only contains zero data. reading them out by aligned_vread() is
wasting time.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/vmalloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 4a10b3b692fa..dbcdcad2276b 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3660,6 +3660,11 @@ long vread(char *buf, char *addr, unsigned long count)
 		if (!vm && !flags)
 			continue;
 
+		if (vm && (vm->flags & VM_UNINITIALIZED))
+			continue;
+		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
+		smp_rmb();
+
 		vaddr = (char *) va->va_start;
 		size = vm ? get_vm_area_size(vm) : va_size(va);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
                   ` (4 preceding siblings ...)
  2023-01-13  3:19 ` [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:46   ` Uladzislau Rezki
  2023-01-16 12:25   ` Lorenzo Stoakes
  2023-01-13  3:19 ` [PATCH v3 7/7] sh: mm: set " Baoquan He
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
specific alignment clamping in __get_vm_area_node(), they will be
 1) Shown as ioremap in /proc/vmallocinfo;
 2) Ignored by /proc/kcore reading via vread()

So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP
in flag to make it handled correctly as above.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/powerpc/kernel/pci_64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 0c7cfb9fab04..fd42059ae2a5 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -132,7 +132,7 @@ void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size)
 	 * address decoding but I'd rather not deal with those outside of the
 	 * reserved 64K legacy region.
 	 */
-	area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
+	area = __get_vm_area_caller(size, VM_IOREMAP, PHB_IO_BASE, PHB_IO_END,
 				    __builtin_return_address(0));
 	if (!area)
 		return NULL;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 7/7] sh: mm: set VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
                   ` (5 preceding siblings ...)
  2023-01-13  3:19 ` [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area Baoquan He
@ 2023-01-13  3:19 ` Baoquan He
  2023-01-16 11:46   ` Uladzislau Rezki
  2023-01-16 12:25   ` Lorenzo Stoakes
  6 siblings, 2 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-13  3:19 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, urezki, lstoakes, stephen.s.brennan, willy, akpm,
	hch, Baoquan He

Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
specific alignment clamping in __get_vm_area_node(), they will be
1) Shown as ioremap in /proc/vmallocinfo;
2) Ignored by /proc/kcore reading via vread()

So for the ioremap in __sq_remap() of sh, we should set VM_IOREMAP
in flag to make it handled correctly as above.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/sh/kernel/cpu/sh4/sq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
index a76b94e41e91..27f2e3da5aa2 100644
--- a/arch/sh/kernel/cpu/sh4/sq.c
+++ b/arch/sh/kernel/cpu/sh4/sq.c
@@ -103,7 +103,7 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
 #if defined(CONFIG_MMU)
 	struct vm_struct *vma;
 
-	vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
+	vma = __get_vm_area_caller(map->size, VM_IOREMAP, map->sq_addr,
 			SQ_ADDRMAX, __builtin_return_address(0));
 	if (!vma)
 		return -ENOMEM;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
@ 2023-01-14  7:57   ` Dan Carpenter
  2023-01-15 14:08     ` Baoquan He
  2023-01-16 11:50   ` Uladzislau Rezki
  2023-01-16 19:01   ` Matthew Wilcox
  2 siblings, 1 reply; 31+ messages in thread
From: Dan Carpenter @ 2023-01-14  7:57 UTC (permalink / raw)
  To: oe-kbuild, Baoquan He, linux-mm
  Cc: lkp, oe-kbuild-all, linux-kernel, urezki, lstoakes,
	stephen.s.brennan, willy, akpm, hch, Baoquan He

Hi Baoquan,

url:    https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/mm-vmalloc-c-add-used_map-into-vmap_block-to-track-space-of-vmap_block/20230113-112149
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230113031921.64716-4-bhe%40redhat.com
patch subject: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
config: i386-randconfig-m021
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>

smatch warnings:
mm/vmalloc.c:3682 vread() error: we previously assumed 'vm' could be null (see line 3664)

vim +/vm +3682 mm/vmalloc.c

^1da177e4c3f415 Linus Torvalds          2005-04-16  3630  long vread(char *buf, char *addr, unsigned long count)
^1da177e4c3f415 Linus Torvalds          2005-04-16  3631  {
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3632  	struct vmap_area *va;
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3633  	struct vm_struct *vm;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3634  	char *vaddr, *buf_start = buf;
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3635  	unsigned long buflen = count;
129dbdf298d7383 Baoquan He              2023-01-13  3636  	unsigned long n, size, flags;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3637  
4aff1dc4fb3a5a3 Andrey Konovalov        2022-03-24  3638  	addr = kasan_reset_tag(addr);
4aff1dc4fb3a5a3 Andrey Konovalov        2022-03-24  3639  
^1da177e4c3f415 Linus Torvalds          2005-04-16  3640  	/* Don't allow overflow */
^1da177e4c3f415 Linus Torvalds          2005-04-16  3641  	if ((unsigned long) addr + count < count)
^1da177e4c3f415 Linus Torvalds          2005-04-16  3642  		count = -(unsigned long) addr;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3643  
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3644  	spin_lock(&vmap_area_lock);
f181234a5a21fd0 Chen Wandun             2021-09-02  3645  	va = find_vmap_area_exceed_addr((unsigned long)addr);
f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3646  	if (!va)
f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3647  		goto finished;
f181234a5a21fd0 Chen Wandun             2021-09-02  3648  
f181234a5a21fd0 Chen Wandun             2021-09-02  3649  	/* no intersects with alive vmap_area */
f181234a5a21fd0 Chen Wandun             2021-09-02  3650  	if ((unsigned long)addr + count <= va->va_start)
f181234a5a21fd0 Chen Wandun             2021-09-02  3651  		goto finished;
f181234a5a21fd0 Chen Wandun             2021-09-02  3652  
f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3653  	list_for_each_entry_from(va, &vmap_area_list, list) {
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3654  		if (!count)
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3655  			break;
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3656  
129dbdf298d7383 Baoquan He              2023-01-13  3657  		vm = va->vm;
129dbdf298d7383 Baoquan He              2023-01-13  3658  		flags = va->flags & VMAP_FLAGS_MASK;
129dbdf298d7383 Baoquan He              2023-01-13  3659  
129dbdf298d7383 Baoquan He              2023-01-13  3660  		if (!vm && !flags)
                                                                            ^^^
vm can be NULL if a flag in VMAP_FLAGS_MASK is set.

e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3661  			continue;
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3662  
129dbdf298d7383 Baoquan He              2023-01-13  3663  		vaddr = (char *) va->va_start;
129dbdf298d7383 Baoquan He              2023-01-13 @3664  		size = vm ? get_vm_area_size(vm) : va_size(va);
                                                                               ^^

129dbdf298d7383 Baoquan He              2023-01-13  3665  
129dbdf298d7383 Baoquan He              2023-01-13  3666  		if (addr >= vaddr + size)
^1da177e4c3f415 Linus Torvalds          2005-04-16  3667  			continue;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3668  		while (addr < vaddr) {
^1da177e4c3f415 Linus Torvalds          2005-04-16  3669  			if (count == 0)
^1da177e4c3f415 Linus Torvalds          2005-04-16  3670  				goto finished;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3671  			*buf = '\0';
^1da177e4c3f415 Linus Torvalds          2005-04-16  3672  			buf++;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3673  			addr++;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3674  			count--;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3675  		}
129dbdf298d7383 Baoquan He              2023-01-13  3676  		n = vaddr + size - addr;
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3677  		if (n > count)
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3678  			n = count;
129dbdf298d7383 Baoquan He              2023-01-13  3679  
129dbdf298d7383 Baoquan He              2023-01-13  3680  		if (flags & VMAP_RAM)

assume VMAP_RAM is not set

129dbdf298d7383 Baoquan He              2023-01-13  3681  			vmap_ram_vread(buf, addr, n, flags);
129dbdf298d7383 Baoquan He              2023-01-13 @3682  		else if (!(vm->flags & VM_IOREMAP))
                                                                                   ^^^^^^^^^
Unchecked dereference.  Should this be "flags" instead of "vm->flags"?

d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3683  			aligned_vread(buf, addr, n);
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3684  		else /* IOREMAP area is treated as memory hole */
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3685  			memset(buf, 0, n);
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3686  		buf += n;
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3687  		addr += n;
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3688  		count -= n;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3689  	}
^1da177e4c3f415 Linus Torvalds          2005-04-16  3690  finished:
e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3691  	spin_unlock(&vmap_area_lock);
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3692  
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3693  	if (buf == buf_start)
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3694  		return 0;
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3695  	/* zero-fill memory holes */
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3696  	if (buf != buf_start + buflen)
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3697  		memset(buf, 0, buflen - (buf - buf_start));
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3698  
d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3699  	return buflen;
^1da177e4c3f415 Linus Torvalds          2005-04-16  3700  }

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-14  7:57   ` Dan Carpenter
@ 2023-01-15 14:08     ` Baoquan He
  2023-01-16 13:08       ` Dan Carpenter
  0 siblings, 1 reply; 31+ messages in thread
From: Baoquan He @ 2023-01-15 14:08 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: oe-kbuild, linux-mm, lkp, oe-kbuild-all, linux-kernel, urezki,
	lstoakes, stephen.s.brennan, willy, akpm, hch

Hi Dan,

On 01/14/23 at 10:57am, Dan Carpenter wrote:
> Hi Baoquan,
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Baoquan-He/mm-vmalloc-c-add-used_map-into-vmap_block-to-track-space-of-vmap_block/20230113-112149
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20230113031921.64716-4-bhe%40redhat.com
> patch subject: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
> config: i386-randconfig-m021
> compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Reported-by: Dan Carpenter <error27@gmail.com>
> 
> smatch warnings:
> mm/vmalloc.c:3682 vread() error: we previously assumed 'vm' could be null (see line 3664)

Thanks for checking this. I went through the code flow again, personally
think that the issue and risk pointed out could not exist. Please see
the comment at below.

> 
> vim +/vm +3682 mm/vmalloc.c
> 
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3630  long vread(char *buf, char *addr, unsigned long count)
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3631  {
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3632  	struct vmap_area *va;
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3633  	struct vm_struct *vm;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3634  	char *vaddr, *buf_start = buf;
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3635  	unsigned long buflen = count;
> 129dbdf298d7383 Baoquan He              2023-01-13  3636  	unsigned long n, size, flags;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3637  
> 4aff1dc4fb3a5a3 Andrey Konovalov        2022-03-24  3638  	addr = kasan_reset_tag(addr);
> 4aff1dc4fb3a5a3 Andrey Konovalov        2022-03-24  3639  
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3640  	/* Don't allow overflow */
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3641  	if ((unsigned long) addr + count < count)
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3642  		count = -(unsigned long) addr;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3643  
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3644  	spin_lock(&vmap_area_lock);
> f181234a5a21fd0 Chen Wandun             2021-09-02  3645  	va = find_vmap_area_exceed_addr((unsigned long)addr);
> f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3646  	if (!va)
> f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3647  		goto finished;
> f181234a5a21fd0 Chen Wandun             2021-09-02  3648  
> f181234a5a21fd0 Chen Wandun             2021-09-02  3649  	/* no intersects with alive vmap_area */
> f181234a5a21fd0 Chen Wandun             2021-09-02  3650  	if ((unsigned long)addr + count <= va->va_start)
> f181234a5a21fd0 Chen Wandun             2021-09-02  3651  		goto finished;
> f181234a5a21fd0 Chen Wandun             2021-09-02  3652  
> f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3653  	list_for_each_entry_from(va, &vmap_area_list, list) {
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3654  		if (!count)
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3655  			break;
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3656  
> 129dbdf298d7383 Baoquan He              2023-01-13  3657  		vm = va->vm;
> 129dbdf298d7383 Baoquan He              2023-01-13  3658  		flags = va->flags & VMAP_FLAGS_MASK;
> 129dbdf298d7383 Baoquan He              2023-01-13  3659  
> 129dbdf298d7383 Baoquan He              2023-01-13  3660  		if (!vm && !flags)
>                                                                             ^^^
> vm can be NULL if a flag in VMAP_FLAGS_MASK is set.
> 
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3661  			continue;

Right, after the 'continue;' line, only two cases could happen when it
comes here. (vm != null) or (vm->flags & VMAP_RAM) is true.

> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3662  
> 129dbdf298d7383 Baoquan He              2023-01-13  3663  		vaddr = (char *) va->va_start;
> 129dbdf298d7383 Baoquan He              2023-01-13 @3664  		size = vm ? get_vm_area_size(vm) : va_size(va);
>                                                                                ^^
> 
> 129dbdf298d7383 Baoquan He              2023-01-13  3665  
> 129dbdf298d7383 Baoquan He              2023-01-13  3666  		if (addr >= vaddr + size)
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3667  			continue;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3668  		while (addr < vaddr) {
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3669  			if (count == 0)
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3670  				goto finished;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3671  			*buf = '\0';
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3672  			buf++;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3673  			addr++;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3674  			count--;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3675  		}
> 129dbdf298d7383 Baoquan He              2023-01-13  3676  		n = vaddr + size - addr;
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3677  		if (n > count)
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3678  			n = count;
> 129dbdf298d7383 Baoquan He              2023-01-13  3679  
> 129dbdf298d7383 Baoquan He              2023-01-13  3680  		if (flags & VMAP_RAM)
> 
> assume VMAP_RAM is not set

OK, then vm is not null.
> 
> 129dbdf298d7383 Baoquan He              2023-01-13  3681  			vmap_ram_vread(buf, addr, n, flags);
> 129dbdf298d7383 Baoquan He              2023-01-13 @3682  		else if (!(vm->flags & VM_IOREMAP))
>                                                                                    ^^^^^^^^^
> Unchecked dereference.  Should this be "flags" instead of "vm->flags"?

Thus, here, in 'else if', vm is not null. And in this 'else if', we are
intending to check vm->flags. I don't see issue or risk here. Please
help check if I miss anything.

Thanks
Baoquan

> 
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3683  			aligned_vread(buf, addr, n);
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3684  		else /* IOREMAP area is treated as memory hole */
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3685  			memset(buf, 0, n);
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3686  		buf += n;
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3687  		addr += n;
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3688  		count -= n;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3689  	}
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3690  finished:
> e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3691  	spin_unlock(&vmap_area_lock);
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3692  
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3693  	if (buf == buf_start)
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3694  		return 0;
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3695  	/* zero-fill memory holes */
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3696  	if (buf != buf_start + buflen)
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3697  		memset(buf, 0, buflen - (buf - buf_start));
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3698  
> d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3699  	return buflen;
> ^1da177e4c3f415 Linus Torvalds          2005-04-16  3700  }
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block
  2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
@ 2023-01-16 11:39   ` Uladzislau Rezki
  2023-01-16 12:22   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:39 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:15AM +0800, Baoquan He wrote:
> In one vmap_block area, there could be three types of regions: region
> being used which is allocated through vb_alloc(), dirty region which
> is freed via vb_free() and free region. Among them, only used region
> has available data. While there's no way to track those used regions
> currently.
> 
> Here, add bitmap field used_map into vmap_block, and set/clear it during
> allocation or freeing regions of vmap_block area.
> 
> This is a preparatoin for later use.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 428e0bee5c9c..d6ff058ef4d0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1922,6 +1922,7 @@ struct vmap_block {
>  	spinlock_t lock;
>  	struct vmap_area *va;
>  	unsigned long free, dirty;
> +	DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
>  	unsigned long dirty_min, dirty_max; /*< dirty range */
>  	struct list_head free_list;
>  	struct rcu_head rcu_head;
> @@ -1998,10 +1999,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>  	vb->va = va;
>  	/* At least something should be left free */
>  	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
> +	bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
>  	vb->free = VMAP_BBMAP_BITS - (1UL << order);
>  	vb->dirty = 0;
>  	vb->dirty_min = VMAP_BBMAP_BITS;
>  	vb->dirty_max = 0;
> +	bitmap_set(vb->used_map, 0, (1UL << order));
>  	INIT_LIST_HEAD(&vb->free_list);
>  
>  	vb_idx = addr_to_vb_idx(va->va_start);
> @@ -2111,6 +2114,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
>  		pages_off = VMAP_BBMAP_BITS - vb->free;
>  		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
>  		vb->free -= 1UL << order;
> +		bitmap_set(vb->used_map, pages_off, (1UL << order));
>  		if (vb->free == 0) {
>  			spin_lock(&vbq->lock);
>  			list_del_rcu(&vb->free_list);
> @@ -2144,6 +2148,9 @@ static void vb_free(unsigned long addr, unsigned long size)
>  	order = get_order(size);
>  	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
>  	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
> +	spin_lock(&vb->lock);
> +	bitmap_clear(vb->used_map, offset, (1UL << order));
> +	spin_unlock(&vb->lock);
>  
>  	vunmap_range_noflush(addr, addr + size);
>  
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area
  2023-01-13  3:19 ` [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
@ 2023-01-16 11:42   ` Uladzislau Rezki
  2023-01-16 12:23   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:42 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:16AM +0800, Baoquan He wrote:
> Through vmalloc API, a virtual kernel area is reserved for physical
> address mapping. And vmap_area is used to track them, while vm_struct
> is allocated to associate with the vmap_area to store more information
> and passed out.
> 
> However, area reserved via vm_map_ram() is an exception. It doesn't have
> vm_struct to associate with vmap_area. And we can't recognize the
> vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
> freeing path will set va->vm = NULL before unmapping, please see
> function remove_vm_area().
> 
> Meanwhile, there are two kinds of handling for vm_map_ram area. One is
> the whole vmap_area being reserved and mapped at one time through
> vm_map_area() interface; the other is the whole vmap_area with
> VMAP_BLOCK_SIZE size being reserved, while mapped into split regions
> with smaller size via vb_alloc().
> 
> To mark the area reserved through vm_map_ram(), add flags field into
> struct vmap_area. Bit 0 indicates this is vm_map_ram area created
> through vm_map_ram() interface, while bit 1 marks out the type of
> vm_map_ram area which makes use of vmap_block to manage split regions
> via vb_alloc/free().
> 
> This is a preparatoin for later use.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/vmalloc.c            | 16 ++++++++++++----
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..69250efa03d1 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -76,6 +76,7 @@ struct vmap_area {
>  		unsigned long subtree_max_size; /* in "free" tree */
>  		struct vm_struct *vm;           /* in "busy" tree */
>  	};
> +	unsigned long flags; /* mark type of vm_map_ram area */
>  };
>  
>  /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d6ff058ef4d0..ab4825050b5c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1589,7 +1589,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
>  static struct vmap_area *alloc_vmap_area(unsigned long size,
>  				unsigned long align,
>  				unsigned long vstart, unsigned long vend,
> -				int node, gfp_t gfp_mask)
> +				int node, gfp_t gfp_mask,
> +				unsigned long va_flags)
>  {
>  	struct vmap_area *va;
>  	unsigned long freed;
> @@ -1635,6 +1636,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	va->va_start = addr;
>  	va->va_end = addr + size;
>  	va->vm = NULL;
> +	va->flags = va_flags;
>  
>  	spin_lock(&vmap_area_lock);
>  	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> @@ -1913,6 +1915,10 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
>  
>  #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
>  
> +#define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
> +#define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
> +#define VMAP_FLAGS_MASK		0x3
> +
>  struct vmap_block_queue {
>  	spinlock_t lock;
>  	struct list_head free;
> @@ -1988,7 +1994,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>  
>  	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
>  					VMALLOC_START, VMALLOC_END,
> -					node, gfp_mask);
> +					node, gfp_mask,
> +					VMAP_RAM|VMAP_BLOCK);
>  	if (IS_ERR(va)) {
>  		kfree(vb);
>  		return ERR_CAST(va);
> @@ -2297,7 +2304,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
>  	} else {
>  		struct vmap_area *va;
>  		va = alloc_vmap_area(size, PAGE_SIZE,
> -				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
> +				VMALLOC_START, VMALLOC_END,
> +				node, GFP_KERNEL, VMAP_RAM);
>  		if (IS_ERR(va))
>  			return NULL;
>  
> @@ -2537,7 +2545,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>  	if (!(flags & VM_NO_GUARD))
>  		size += PAGE_SIZE;
>  
> -	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
> +	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
>  	if (IS_ERR(va)) {
>  		kfree(area);
>  		return NULL;
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo
  2023-01-13  3:19 ` [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo Baoquan He
@ 2023-01-16 11:43   ` Uladzislau Rezki
  2023-01-16 12:24   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:43 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:18AM +0800, Baoquan He wrote:
> Now, by marking VMAP_RAM in vmap_area->flags for vm_map_ram, we can
> clearly differentiate it with other vmalloc areas. So identify
> vm_map_area area by checking VMAP_RAM of vmap_area->flags when shown
> in /proc/vmcoreinfo.
> 
> Meanwhile, the code comment above vm_map_ram area checking in s_show()
> is not needed any more, remove it here.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 13875bc41e27..4a10b3b692fa 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4225,11 +4225,7 @@ static int s_show(struct seq_file *m, void *p)
>  
>  	va = list_entry(p, struct vmap_area, list);
>  
> -	/*
> -	 * s_show can encounter race with remove_vm_area, !vm on behalf
> -	 * of vmap area is being tear down or vm_map_ram allocation.
> -	 */
> -	if (!va->vm) {
> +	if (!va->vm && (va->flags & VMAP_RAM)) {
>  		seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
>  			(void *)va->va_start, (void *)va->va_end,
>  			va->va_end - va->va_start);
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas
  2023-01-13  3:19 ` [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas Baoquan He
@ 2023-01-16 11:44   ` Uladzislau Rezki
  2023-01-16 12:24   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:44 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:19AM +0800, Baoquan He wrote:
> For areas allocated via vmalloc_xxx() APIs, it searches for unmapped area
> to reserve and allocates new pages to map into, please see function
> __vmalloc_node_range(). During the process, flag VM_UNINITIALIZED is set
> in vm->flags to indicate that the pages allocation and mapping haven't
> been done, until clear_vm_uninitialized_flag() is called to clear it.
> 
> For this kind of area, if VM_UNINITIALIZED is still set, let's ignore
> it in vread() because pages newly allocated and being mapped in that
> area only contains zero data. reading them out by aligned_vread() is
> wasting time.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 4a10b3b692fa..dbcdcad2276b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3660,6 +3660,11 @@ long vread(char *buf, char *addr, unsigned long count)
>  		if (!vm && !flags)
>  			continue;
>  
> +		if (vm && (vm->flags & VM_UNINITIALIZED))
> +			continue;
> +		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
> +		smp_rmb();
> +
>  		vaddr = (char *) va->va_start;
>  		size = vm ? get_vm_area_size(vm) : va_size(va);
>  
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 ` [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area Baoquan He
@ 2023-01-16 11:46   ` Uladzislau Rezki
  2023-01-16 12:25   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:46 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:20AM +0800, Baoquan He wrote:
> Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
> specific alignment clamping in __get_vm_area_node(), they will be
>  1) Shown as ioremap in /proc/vmallocinfo;
>  2) Ignored by /proc/kcore reading via vread()
> 
> So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP
> in flag to make it handled correctly as above.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 0c7cfb9fab04..fd42059ae2a5 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -132,7 +132,7 @@ void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size)
>  	 * address decoding but I'd rather not deal with those outside of the
>  	 * reserved 64K legacy region.
>  	 */
> -	area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
> +	area = __get_vm_area_caller(size, VM_IOREMAP, PHB_IO_BASE, PHB_IO_END,
>  				    __builtin_return_address(0));
>  	if (!area)
>  		return NULL;
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/7] sh: mm: set VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 ` [PATCH v3 7/7] sh: mm: set " Baoquan He
@ 2023-01-16 11:46   ` Uladzislau Rezki
  2023-01-16 12:25   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:46 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:21AM +0800, Baoquan He wrote:
> Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
> specific alignment clamping in __get_vm_area_node(), they will be
> 1) Shown as ioremap in /proc/vmallocinfo;
> 2) Ignored by /proc/kcore reading via vread()
> 
> So for the ioremap in __sq_remap() of sh, we should set VM_IOREMAP
> in flag to make it handled correctly as above.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/sh/kernel/cpu/sh4/sq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
> index a76b94e41e91..27f2e3da5aa2 100644
> --- a/arch/sh/kernel/cpu/sh4/sq.c
> +++ b/arch/sh/kernel/cpu/sh4/sq.c
> @@ -103,7 +103,7 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
>  #if defined(CONFIG_MMU)
>  	struct vm_struct *vma;
>  
> -	vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
> +	vma = __get_vm_area_caller(map->size, VM_IOREMAP, map->sq_addr,
>  			SQ_ADDRMAX, __builtin_return_address(0));
>  	if (!vma)
>  		return -ENOMEM;
> -- 
> 2.34.1
> 
Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com>

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
  2023-01-14  7:57   ` Dan Carpenter
@ 2023-01-16 11:50   ` Uladzislau Rezki
  2023-01-19  9:52     ` Baoquan He
  2023-01-16 19:01   ` Matthew Wilcox
  2 siblings, 1 reply; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-16 11:50 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan,
	willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> Currently, vread can read out vmalloc areas which is associated with
> a vm_struct. While this doesn't work for areas created by vm_map_ram()
> interface because it doesn't have an associated vm_struct. Then in vread(),
> these areas are all skipped.
> 
> Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
> The area created with vmap_ram_vread() interface directly can be handled
> like the other normal vmap areas with aligned_vread(). While areas
> which will be further subdivided and managed with vmap_block need
> carefully read out page-aligned small regions and zero fill holes.
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 73 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index ab4825050b5c..13875bc41e27 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3544,6 +3544,65 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
>  	return copied;
>  }
>  
> +static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
> +{
> +	char *start;
> +	struct vmap_block *vb;
> +	unsigned long offset;
> +	unsigned int rs, re, n;
> +
> +	/*
> +	 * If it's area created by vm_map_ram() interface directly, but
> +	 * not further subdividing and delegating management to vmap_block,
> +	 * handle it here.
> +	 */
> +	if (!(flags & VMAP_BLOCK)) {
> +		aligned_vread(buf, addr, count);
> +		return;
> +	}
> +
> +	/*
> +	 * Area is split into regions and tracked with vmap_block, read out
> +	 * each region and zero fill the hole between regions.
> +	 */
> +	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> +
> +	spin_lock(&vb->lock);
> +	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
>
CPU-X invokes free_vmap_block() whereas we take the vb->lock and do
some manipulations with vb that might be already freed over RCU-core.

Should we protect it by the rcu_read_lock() also here?

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block
  2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
  2023-01-16 11:39   ` Uladzislau Rezki
@ 2023-01-16 12:22   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:22 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:15AM +0800, Baoquan He wrote:
> In one vmap_block area, there could be three types of regions: region
> being used which is allocated through vb_alloc(), dirty region which
> is freed via vb_free() and free region. Among them, only used region
> has available data. While there's no way to track those used regions
> currently.
>
> Here, add bitmap field used_map into vmap_block, and set/clear it during
> allocation or freeing regions of vmap_block area.
>
> This is a preparatoin for later use.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 428e0bee5c9c..d6ff058ef4d0 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1922,6 +1922,7 @@ struct vmap_block {
>  	spinlock_t lock;
>  	struct vmap_area *va;
>  	unsigned long free, dirty;
> +	DECLARE_BITMAP(used_map, VMAP_BBMAP_BITS);
>  	unsigned long dirty_min, dirty_max; /*< dirty range */
>  	struct list_head free_list;
>  	struct rcu_head rcu_head;
> @@ -1998,10 +1999,12 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>  	vb->va = va;
>  	/* At least something should be left free */
>  	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
> +	bitmap_zero(vb->used_map, VMAP_BBMAP_BITS);
>  	vb->free = VMAP_BBMAP_BITS - (1UL << order);
>  	vb->dirty = 0;
>  	vb->dirty_min = VMAP_BBMAP_BITS;
>  	vb->dirty_max = 0;
> +	bitmap_set(vb->used_map, 0, (1UL << order));
>  	INIT_LIST_HEAD(&vb->free_list);
>
>  	vb_idx = addr_to_vb_idx(va->va_start);
> @@ -2111,6 +2114,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
>  		pages_off = VMAP_BBMAP_BITS - vb->free;
>  		vaddr = vmap_block_vaddr(vb->va->va_start, pages_off);
>  		vb->free -= 1UL << order;
> +		bitmap_set(vb->used_map, pages_off, (1UL << order));
>  		if (vb->free == 0) {
>  			spin_lock(&vbq->lock);
>  			list_del_rcu(&vb->free_list);
> @@ -2144,6 +2148,9 @@ static void vb_free(unsigned long addr, unsigned long size)
>  	order = get_order(size);
>  	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
>  	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));
> +	spin_lock(&vb->lock);
> +	bitmap_clear(vb->used_map, offset, (1UL << order));
> +	spin_unlock(&vb->lock);
>
>  	vunmap_range_noflush(addr, addr + size);
>
> --
> 2.34.1
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area
  2023-01-13  3:19 ` [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
  2023-01-16 11:42   ` Uladzislau Rezki
@ 2023-01-16 12:23   ` Lorenzo Stoakes
  2023-01-18  2:13     ` Baoquan He
  1 sibling, 1 reply; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:23 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:16AM +0800, Baoquan He wrote:
> Through vmalloc API, a virtual kernel area is reserved for physical
> address mapping. And vmap_area is used to track them, while vm_struct
> is allocated to associate with the vmap_area to store more information
> and passed out.
>
> However, area reserved via vm_map_ram() is an exception. It doesn't have
> vm_struct to associate with vmap_area. And we can't recognize the
> vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
> freeing path will set va->vm = NULL before unmapping, please see
> function remove_vm_area().
>
> Meanwhile, there are two kinds of handling for vm_map_ram area. One is
> the whole vmap_area being reserved and mapped at one time through
> vm_map_area() interface; the other is the whole vmap_area with
> VMAP_BLOCK_SIZE size being reserved, while mapped into split regions
> with smaller size via vb_alloc().
>
> To mark the area reserved through vm_map_ram(), add flags field into
> struct vmap_area. Bit 0 indicates this is vm_map_ram area created
> through vm_map_ram() interface, while bit 1 marks out the type of
> vm_map_ram area which makes use of vmap_block to manage split regions
> via vb_alloc/free().
>
> This is a preparatoin for later use.
>

Small typo here :)

> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/vmalloc.c            | 16 ++++++++++++----
>  2 files changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 096d48aa3437..69250efa03d1 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -76,6 +76,7 @@ struct vmap_area {
>  		unsigned long subtree_max_size; /* in "free" tree */
>  		struct vm_struct *vm;           /* in "busy" tree */
>  	};
> +	unsigned long flags; /* mark type of vm_map_ram area */
>  };
>
>  /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d6ff058ef4d0..ab4825050b5c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1589,7 +1589,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
>  static struct vmap_area *alloc_vmap_area(unsigned long size,
>  				unsigned long align,
>  				unsigned long vstart, unsigned long vend,
> -				int node, gfp_t gfp_mask)
> +				int node, gfp_t gfp_mask,
> +				unsigned long va_flags)
>  {
>  	struct vmap_area *va;
>  	unsigned long freed;
> @@ -1635,6 +1636,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	va->va_start = addr;
>  	va->va_end = addr + size;
>  	va->vm = NULL;
> +	va->flags = va_flags;
>
>  	spin_lock(&vmap_area_lock);
>  	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> @@ -1913,6 +1915,10 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
>
>  #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
>
> +#define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
> +#define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
> +#define VMAP_FLAGS_MASK		0x3
> +
>  struct vmap_block_queue {
>  	spinlock_t lock;
>  	struct list_head free;
> @@ -1988,7 +1994,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>
>  	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
>  					VMALLOC_START, VMALLOC_END,
> -					node, gfp_mask);
> +					node, gfp_mask,
> +					VMAP_RAM|VMAP_BLOCK);
>  	if (IS_ERR(va)) {
>  		kfree(vb);
>  		return ERR_CAST(va);
> @@ -2297,7 +2304,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
>  	} else {
>  		struct vmap_area *va;
>  		va = alloc_vmap_area(size, PAGE_SIZE,
> -				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
> +				VMALLOC_START, VMALLOC_END,
> +				node, GFP_KERNEL, VMAP_RAM);
>  		if (IS_ERR(va))
>  			return NULL;
>
> @@ -2537,7 +2545,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>  	if (!(flags & VM_NO_GUARD))
>  		size += PAGE_SIZE;
>
> -	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
> +	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
>  	if (IS_ERR(va)) {
>  		kfree(area);
>  		return NULL;
> --
> 2.34.1
>

Other than that typo,

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo
  2023-01-13  3:19 ` [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo Baoquan He
  2023-01-16 11:43   ` Uladzislau Rezki
@ 2023-01-16 12:24   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:18AM +0800, Baoquan He wrote:
> Now, by marking VMAP_RAM in vmap_area->flags for vm_map_ram, we can
> clearly differentiate it with other vmalloc areas. So identify
> vm_map_area area by checking VMAP_RAM of vmap_area->flags when shown
> in /proc/vmcoreinfo.
>
> Meanwhile, the code comment above vm_map_ram area checking in s_show()
> is not needed any more, remove it here.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 13875bc41e27..4a10b3b692fa 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -4225,11 +4225,7 @@ static int s_show(struct seq_file *m, void *p)
>
>  	va = list_entry(p, struct vmap_area, list);
>
> -	/*
> -	 * s_show can encounter race with remove_vm_area, !vm on behalf
> -	 * of vmap area is being tear down or vm_map_ram allocation.
> -	 */
> -	if (!va->vm) {
> +	if (!va->vm && (va->flags & VMAP_RAM)) {
>  		seq_printf(m, "0x%pK-0x%pK %7ld vm_map_ram\n",
>  			(void *)va->va_start, (void *)va->va_end,
>  			va->va_end - va->va_start);
> --
> 2.34.1
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas
  2023-01-13  3:19 ` [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas Baoquan He
  2023-01-16 11:44   ` Uladzislau Rezki
@ 2023-01-16 12:24   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:24 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:19AM +0800, Baoquan He wrote:
> For areas allocated via vmalloc_xxx() APIs, it searches for unmapped area
> to reserve and allocates new pages to map into, please see function
> __vmalloc_node_range(). During the process, flag VM_UNINITIALIZED is set
> in vm->flags to indicate that the pages allocation and mapping haven't
> been done, until clear_vm_uninitialized_flag() is called to clear it.
>
> For this kind of area, if VM_UNINITIALIZED is still set, let's ignore
> it in vread() because pages newly allocated and being mapped in that
> area only contains zero data. reading them out by aligned_vread() is
> wasting time.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/vmalloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 4a10b3b692fa..dbcdcad2276b 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3660,6 +3660,11 @@ long vread(char *buf, char *addr, unsigned long count)
>  		if (!vm && !flags)
>  			continue;
>
> +		if (vm && (vm->flags & VM_UNINITIALIZED))
> +			continue;
> +		/* Pair with smp_wmb() in clear_vm_uninitialized_flag() */
> +		smp_rmb();
> +
>  		vaddr = (char *) va->va_start;
>  		size = vm ? get_vm_area_size(vm) : va_size(va);
>
> --
> 2.34.1
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 ` [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area Baoquan He
  2023-01-16 11:46   ` Uladzislau Rezki
@ 2023-01-16 12:25   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:20AM +0800, Baoquan He wrote:
> Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
> specific alignment clamping in __get_vm_area_node(), they will be
>  1) Shown as ioremap in /proc/vmallocinfo;
>  2) Ignored by /proc/kcore reading via vread()
>
> So for the io mapping in ioremap_phb() of ppc, we should set VM_IOREMAP
> in flag to make it handled correctly as above.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/powerpc/kernel/pci_64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
> index 0c7cfb9fab04..fd42059ae2a5 100644
> --- a/arch/powerpc/kernel/pci_64.c
> +++ b/arch/powerpc/kernel/pci_64.c
> @@ -132,7 +132,7 @@ void __iomem *ioremap_phb(phys_addr_t paddr, unsigned long size)
>  	 * address decoding but I'd rather not deal with those outside of the
>  	 * reserved 64K legacy region.
>  	 */
> -	area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
> +	area = __get_vm_area_caller(size, VM_IOREMAP, PHB_IO_BASE, PHB_IO_END,
>  				    __builtin_return_address(0));
>  	if (!area)
>  		return NULL;
> --
> 2.34.1
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 7/7] sh: mm: set VM_IOREMAP flag to the vmalloc area
  2023-01-13  3:19 ` [PATCH v3 7/7] sh: mm: set " Baoquan He
  2023-01-16 11:46   ` Uladzislau Rezki
@ 2023-01-16 12:25   ` Lorenzo Stoakes
  1 sibling, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 12:25 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On Fri, Jan 13, 2023 at 11:19:21AM +0800, Baoquan He wrote:
> Currently, for vmalloc areas with flag VM_IOREMAP set, except of the
> specific alignment clamping in __get_vm_area_node(), they will be
> 1) Shown as ioremap in /proc/vmallocinfo;
> 2) Ignored by /proc/kcore reading via vread()
>
> So for the ioremap in __sq_remap() of sh, we should set VM_IOREMAP
> in flag to make it handled correctly as above.
>
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  arch/sh/kernel/cpu/sh4/sq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
> index a76b94e41e91..27f2e3da5aa2 100644
> --- a/arch/sh/kernel/cpu/sh4/sq.c
> +++ b/arch/sh/kernel/cpu/sh4/sq.c
> @@ -103,7 +103,7 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
>  #if defined(CONFIG_MMU)
>  	struct vm_struct *vma;
>
> -	vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
> +	vma = __get_vm_area_caller(map->size, VM_IOREMAP, map->sq_addr,
>  			SQ_ADDRMAX, __builtin_return_address(0));
>  	if (!vma)
>  		return -ENOMEM;
> --
> 2.34.1
>

Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-15 14:08     ` Baoquan He
@ 2023-01-16 13:08       ` Dan Carpenter
  2023-01-18  2:17         ` Baoquan He
  0 siblings, 1 reply; 31+ messages in thread
From: Dan Carpenter @ 2023-01-16 13:08 UTC (permalink / raw)
  To: Baoquan He
  Cc: oe-kbuild, linux-mm, lkp, oe-kbuild-all, linux-kernel, urezki,
	lstoakes, stephen.s.brennan, willy, akpm, hch

On Sun, Jan 15, 2023 at 10:08:55PM +0800, Baoquan He wrote:
> > f181234a5a21fd0 Chen Wandun             2021-09-02  3650  	if ((unsigned long)addr + count <= va->va_start)
> > f181234a5a21fd0 Chen Wandun             2021-09-02  3651  		goto finished;
> > f181234a5a21fd0 Chen Wandun             2021-09-02  3652  
> > f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3653  	list_for_each_entry_from(va, &vmap_area_list, list) {
> > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3654  		if (!count)
> > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3655  			break;
> > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3656  
> > 129dbdf298d7383 Baoquan He              2023-01-13  3657  		vm = va->vm;
> > 129dbdf298d7383 Baoquan He              2023-01-13  3658  		flags = va->flags & VMAP_FLAGS_MASK;
> > 129dbdf298d7383 Baoquan He              2023-01-13  3659  
> > 129dbdf298d7383 Baoquan He              2023-01-13  3660  		if (!vm && !flags)
> >                                                                             ^^^
> > vm can be NULL if a flag in VMAP_FLAGS_MASK is set.
> > 
> > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3661  			continue;
> 
> Right, after the 'continue;' line, only two cases could happen when it
> comes here. (vm != null) or (vm->flags & VMAP_RAM) is true.
>

You're saying VMAP_RAM, but strictly speaking the code is checking
VMAP_FLAGS_MASK and not VMAP_RAM.

+#define VMAP_RAM               0x1 /* indicates vm_map_ram area*/
+#define VMAP_BLOCK             0x2 /* mark out the vmap_block sub-type*/
+#define VMAP_FLAGS_MASK                0x3

If we assume that vm is NULL, VMAP_BLOCK is set and VMAP_RAM is clear
then it would lead to a NULL dereference.  There might be reasons why
that combination is impossible outside the function but we can't tell
from the information we have here.

Which is fine, outside information is a common reason for false
positives with this check.  But I was just concerned about the mix of
VMAP_FLAGS_MASK and VMAP_RAM.

regards,
dan carpenter

 
> > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3662  
> > 129dbdf298d7383 Baoquan He              2023-01-13  3663  		vaddr = (char *) va->va_start;
> > 129dbdf298d7383 Baoquan He              2023-01-13 @3664  		size = vm ? get_vm_area_size(vm) : va_size(va);
> >                                                                                ^^
> > 
> > 129dbdf298d7383 Baoquan He              2023-01-13  3665  
> > 129dbdf298d7383 Baoquan He              2023-01-13  3666  		if (addr >= vaddr + size)
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3667  			continue;
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3668  		while (addr < vaddr) {
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3669  			if (count == 0)
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3670  				goto finished;
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3671  			*buf = '\0';
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3672  			buf++;
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3673  			addr++;
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3674  			count--;
> > ^1da177e4c3f415 Linus Torvalds          2005-04-16  3675  		}
> > 129dbdf298d7383 Baoquan He              2023-01-13  3676  		n = vaddr + size - addr;
> > d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3677  		if (n > count)
> > d0107eb07320b5d KAMEZAWA Hiroyuki       2009-09-21  3678  			n = count;
> > 129dbdf298d7383 Baoquan He              2023-01-13  3679  
> > 129dbdf298d7383 Baoquan He              2023-01-13  3680  		if (flags & VMAP_RAM)
> > 
> > assume VMAP_RAM is not set
> 
> OK, then vm is not null.
> > 
> > 129dbdf298d7383 Baoquan He              2023-01-13  3681  			vmap_ram_vread(buf, addr, n, flags);
> > 129dbdf298d7383 Baoquan He              2023-01-13 @3682  		else if (!(vm->flags & VM_IOREMAP))
> >                                                                                    ^^^^^^^^^
> > Unchecked dereference.  Should this be "flags" instead of "vm->flags"?
> 
> Thus, here, in 'else if', vm is not null. And in this 'else if', we are
> intending to check vm->flags. I don't see issue or risk here. Please
> help check if I miss anything.
> 
> Thanks
> Baoquan
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
  2023-01-14  7:57   ` Dan Carpenter
  2023-01-16 11:50   ` Uladzislau Rezki
@ 2023-01-16 19:01   ` Matthew Wilcox
  2023-01-16 19:48     ` Lorenzo Stoakes
  2 siblings, 1 reply; 31+ messages in thread
From: Matthew Wilcox @ 2023-01-16 19:01 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, urezki, lstoakes, stephen.s.brennan, akpm, hch

On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> +	spin_lock(&vb->lock);
> +	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> +		spin_unlock(&vb->lock);
> +		memset(buf, 0, count);
> +		return;
> +	}
> +	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> +		if (!count)
> +			break;
> +		start = vmap_block_vaddr(vb->va->va_start, rs);
> +		while (addr < start) {
> +			if (count == 0)
> +				break;
> +			*buf = '\0';
> +			buf++;
> +			addr++;
> +			count--;
> +		}
> +		/*it could start reading from the middle of used region*/
> +		offset = offset_in_page(addr);
> +		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
> +		if (n > count)
> +			n = count;
> +		aligned_vread(buf, start+offset, n);

The whole vread() interface is rather suboptimal.  The only user is proc,
which is trying to copy to userspace.  But the vread() interface copies
to a kernel address, so kcore has to copy to a bounce buffer.  That makes
this spinlock work, but the price is that we can't copy to a user address
in the future.  Ideally, read_kcore() would be kcore_read_iter() and
we'd pass an iov_iter into vread().  vread() would then need to use a
mutex rather than a spinlock.

I don't think this needs to be done now, but if someone's looking for
a project ...


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-16 19:01   ` Matthew Wilcox
@ 2023-01-16 19:48     ` Lorenzo Stoakes
  0 siblings, 0 replies; 31+ messages in thread
From: Lorenzo Stoakes @ 2023-01-16 19:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Baoquan He, linux-mm, linux-kernel, urezki, stephen.s.brennan, akpm, hch

On Mon, Jan 16, 2023 at 07:01:33PM +0000, Matthew Wilcox wrote:
> On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> > +	spin_lock(&vb->lock);
> > +	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > +		spin_unlock(&vb->lock);
> > +		memset(buf, 0, count);
> > +		return;
> > +	}
> > +	for_each_set_bitrange(rs, re, vb->used_map, VMAP_BBMAP_BITS) {
> > +		if (!count)
> > +			break;
> > +		start = vmap_block_vaddr(vb->va->va_start, rs);
> > +		while (addr < start) {
> > +			if (count == 0)
> > +				break;
> > +			*buf = '\0';
> > +			buf++;
> > +			addr++;
> > +			count--;
> > +		}
> > +		/*it could start reading from the middle of used region*/
> > +		offset = offset_in_page(addr);
> > +		n = ((re - rs + 1) << PAGE_SHIFT) - offset;
> > +		if (n > count)
> > +			n = count;
> > +		aligned_vread(buf, start+offset, n);
>
> The whole vread() interface is rather suboptimal.  The only user is proc,
> which is trying to copy to userspace.  But the vread() interface copies
> to a kernel address, so kcore has to copy to a bounce buffer.  That makes
> this spinlock work, but the price is that we can't copy to a user address
> in the future.  Ideally, read_kcore() would be kcore_read_iter() and
> we'd pass an iov_iter into vread().  vread() would then need to use a
> mutex rather than a spinlock.
>
> I don't think this needs to be done now, but if someone's looking for
> a project ...

Interesting! I may take a look at this if I get the time :)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area
  2023-01-16 12:23   ` Lorenzo Stoakes
@ 2023-01-18  2:13     ` Baoquan He
  0 siblings, 0 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-18  2:13 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, linux-kernel, urezki, stephen.s.brennan, willy, akpm, hch

On 01/16/23 at 12:23pm, Lorenzo Stoakes wrote:
> On Fri, Jan 13, 2023 at 11:19:16AM +0800, Baoquan He wrote:
> > Through vmalloc API, a virtual kernel area is reserved for physical
> > address mapping. And vmap_area is used to track them, while vm_struct
> > is allocated to associate with the vmap_area to store more information
> > and passed out.
> >
> > However, area reserved via vm_map_ram() is an exception. It doesn't have
> > vm_struct to associate with vmap_area. And we can't recognize the
> > vmap_area with '->vm == NULL' as a vm_map_ram() area because the normal
> > freeing path will set va->vm = NULL before unmapping, please see
> > function remove_vm_area().
> >
> > Meanwhile, there are two kinds of handling for vm_map_ram area. One is
> > the whole vmap_area being reserved and mapped at one time through
> > vm_map_area() interface; the other is the whole vmap_area with
> > VMAP_BLOCK_SIZE size being reserved, while mapped into split regions
> > with smaller size via vb_alloc().
> >
> > To mark the area reserved through vm_map_ram(), add flags field into
> > struct vmap_area. Bit 0 indicates this is vm_map_ram area created
> > through vm_map_ram() interface, while bit 1 marks out the type of
> > vm_map_ram area which makes use of vmap_block to manage split regions
> > via vb_alloc/free().
> >
> > This is a preparatoin for later use.
> >
> 
> Small typo here :)

Good catch, will fix it.

> 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  include/linux/vmalloc.h |  1 +
> >  mm/vmalloc.c            | 16 ++++++++++++----
> >  2 files changed, 13 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> > index 096d48aa3437..69250efa03d1 100644
> > --- a/include/linux/vmalloc.h
> > +++ b/include/linux/vmalloc.h
> > @@ -76,6 +76,7 @@ struct vmap_area {
> >  		unsigned long subtree_max_size; /* in "free" tree */
> >  		struct vm_struct *vm;           /* in "busy" tree */
> >  	};
> > +	unsigned long flags; /* mark type of vm_map_ram area */
> >  };
> >
> >  /* archs that select HAVE_ARCH_HUGE_VMAP should override one or more of these */
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d6ff058ef4d0..ab4825050b5c 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -1589,7 +1589,8 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
> >  static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  				unsigned long align,
> >  				unsigned long vstart, unsigned long vend,
> > -				int node, gfp_t gfp_mask)
> > +				int node, gfp_t gfp_mask,
> > +				unsigned long va_flags)
> >  {
> >  	struct vmap_area *va;
> >  	unsigned long freed;
> > @@ -1635,6 +1636,7 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> >  	va->va_start = addr;
> >  	va->va_end = addr + size;
> >  	va->vm = NULL;
> > +	va->flags = va_flags;
> >
> >  	spin_lock(&vmap_area_lock);
> >  	insert_vmap_area(va, &vmap_area_root, &vmap_area_list);
> > @@ -1913,6 +1915,10 @@ static struct vmap_area *find_unlink_vmap_area(unsigned long addr)
> >
> >  #define VMAP_BLOCK_SIZE		(VMAP_BBMAP_BITS * PAGE_SIZE)
> >
> > +#define VMAP_RAM		0x1 /* indicates vm_map_ram area*/
> > +#define VMAP_BLOCK		0x2 /* mark out the vmap_block sub-type*/
> > +#define VMAP_FLAGS_MASK		0x3
> > +
> >  struct vmap_block_queue {
> >  	spinlock_t lock;
> >  	struct list_head free;
> > @@ -1988,7 +1994,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
> >
> >  	va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
> >  					VMALLOC_START, VMALLOC_END,
> > -					node, gfp_mask);
> > +					node, gfp_mask,
> > +					VMAP_RAM|VMAP_BLOCK);
> >  	if (IS_ERR(va)) {
> >  		kfree(vb);
> >  		return ERR_CAST(va);
> > @@ -2297,7 +2304,8 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node)
> >  	} else {
> >  		struct vmap_area *va;
> >  		va = alloc_vmap_area(size, PAGE_SIZE,
> > -				VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
> > +				VMALLOC_START, VMALLOC_END,
> > +				node, GFP_KERNEL, VMAP_RAM);
> >  		if (IS_ERR(va))
> >  			return NULL;
> >
> > @@ -2537,7 +2545,7 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
> >  	if (!(flags & VM_NO_GUARD))
> >  		size += PAGE_SIZE;
> >
> > -	va = alloc_vmap_area(size, align, start, end, node, gfp_mask);
> > +	va = alloc_vmap_area(size, align, start, end, node, gfp_mask, 0);
> >  	if (IS_ERR(va)) {
> >  		kfree(area);
> >  		return NULL;
> > --
> > 2.34.1
> >
> 
> Other than that typo,
> 
> Reviewed-by: Lorenzo Stoakes <lstoakes@gmail.com>
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-16 13:08       ` Dan Carpenter
@ 2023-01-18  2:17         ` Baoquan He
  0 siblings, 0 replies; 31+ messages in thread
From: Baoquan He @ 2023-01-18  2:17 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: oe-kbuild, linux-mm, lkp, oe-kbuild-all, linux-kernel, urezki,
	lstoakes, stephen.s.brennan, willy, akpm, hch

On 01/16/23 at 04:08pm, Dan Carpenter wrote:
> On Sun, Jan 15, 2023 at 10:08:55PM +0800, Baoquan He wrote:
> > > f181234a5a21fd0 Chen Wandun             2021-09-02  3650  	if ((unsigned long)addr + count <= va->va_start)
> > > f181234a5a21fd0 Chen Wandun             2021-09-02  3651  		goto finished;
> > > f181234a5a21fd0 Chen Wandun             2021-09-02  3652  
> > > f608788cd2d6cae Serapheim Dimitropoulos 2021-04-29  3653  	list_for_each_entry_from(va, &vmap_area_list, list) {
> > > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3654  		if (!count)
> > > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3655  			break;
> > > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3656  
> > > 129dbdf298d7383 Baoquan He              2023-01-13  3657  		vm = va->vm;
> > > 129dbdf298d7383 Baoquan He              2023-01-13  3658  		flags = va->flags & VMAP_FLAGS_MASK;
> > > 129dbdf298d7383 Baoquan He              2023-01-13  3659  
> > > 129dbdf298d7383 Baoquan He              2023-01-13  3660  		if (!vm && !flags)
> > >                                                                             ^^^
> > > vm can be NULL if a flag in VMAP_FLAGS_MASK is set.
> > > 
> > > e81ce85f960c2e2 Joonsoo Kim             2013-04-29  3661  			continue;
> > 
> > Right, after the 'continue;' line, only two cases could happen when it
> > comes here. (vm != null) or (vm->flags & VMAP_RAM) is true.
> >
> 
> You're saying VMAP_RAM, but strictly speaking the code is checking
> VMAP_FLAGS_MASK and not VMAP_RAM.
> 
> +#define VMAP_RAM               0x1 /* indicates vm_map_ram area*/
> +#define VMAP_BLOCK             0x2 /* mark out the vmap_block sub-type*/
> +#define VMAP_FLAGS_MASK                0x3
> 
> If we assume that vm is NULL, VMAP_BLOCK is set and VMAP_RAM is clear
> then it would lead to a NULL dereference.  There might be reasons why
> that combination is impossible outside the function but we can't tell
> from the information we have here.

VMAP_BLOCK has no chance to be set alone. It has to be set together with
VMAP_RAM if needed.

> 
> Which is fine, outside information is a common reason for false
> positives with this check.  But I was just concerned about the mix of
> VMAP_FLAGS_MASK and VMAP_RAM.

Thanks, I see your point now, will consider how to improve it.



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-16 11:50   ` Uladzislau Rezki
@ 2023-01-19  9:52     ` Baoquan He
  2023-01-19 12:48       ` Baoquan He
  0 siblings, 1 reply; 31+ messages in thread
From: Baoquan He @ 2023-01-19  9:52 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, linux-kernel, lstoakes, stephen.s.brennan, willy, akpm, hch

On 01/16/23 at 12:50pm, Uladzislau Rezki wrote:
> On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> > Currently, vread can read out vmalloc areas which is associated with
> > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > interface because it doesn't have an associated vm_struct. Then in vread(),
> > these areas are all skipped.
> > 
> > Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
> > The area created with vmap_ram_vread() interface directly can be handled
> > like the other normal vmap areas with aligned_vread(). While areas
> > which will be further subdivided and managed with vmap_block need
> > carefully read out page-aligned small regions and zero fill holes.
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  mm/vmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 73 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index ab4825050b5c..13875bc41e27 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -3544,6 +3544,65 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> >  	return copied;
> >  }
> >  
> > +static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
> > +{
> > +	char *start;
> > +	struct vmap_block *vb;
> > +	unsigned long offset;
> > +	unsigned int rs, re, n;
> > +
> > +	/*
> > +	 * If it's area created by vm_map_ram() interface directly, but
> > +	 * not further subdividing and delegating management to vmap_block,
> > +	 * handle it here.
> > +	 */
> > +	if (!(flags & VMAP_BLOCK)) {
> > +		aligned_vread(buf, addr, count);
> > +		return;
> > +	}
> > +
> > +	/*
> > +	 * Area is split into regions and tracked with vmap_block, read out
> > +	 * each region and zero fill the hole between regions.
> > +	 */
> > +	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > +
> > +	spin_lock(&vb->lock);
> > +	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> >
> CPU-X invokes free_vmap_block() whereas we take the vb->lock and do
> some manipulations with vb that might be already freed over RCU-core.
> 
> Should we protect it by the rcu_read_lock() also here?

Just go over the vb and vbq code again, seems we don't need the
rcu_read_lock() here. The rcu lock is needed when operating on the
vmap_block_queue->free list. I don't see race between the vb accessing
here and those list adding or removing on vmap_block_queue->free with
rcu. If I miss some race windows between them, please help point out.

However, when I check free_vmap_block(), I do find a risk. As you said,
CPU-x invokes free_vmap_block() and executed xa_erase() to remove the vb
from vmap_blocks tree. Then vread() comes into vmap_ram_vread() and call
xa_load(), it would be null. I should check the returned vb in
free_vmap_block().


static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
{
......
if (!(flags & VMAP_BLOCK)) {
                aligned_vread(buf, addr, count);
                return;
        }

        /*
         * Area is split into regions and tracked with vmap_block, read out
         * each region and zero fill the hole between regions.
         */
        vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
	if (!vb)    <-- vb need be checked here to avoid accessing erased vb from vmap_blocks tree
		memset(buf, 0, count);
......
}



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-19  9:52     ` Baoquan He
@ 2023-01-19 12:48       ` Baoquan He
  2023-01-20 11:54         ` Uladzislau Rezki
  0 siblings, 1 reply; 31+ messages in thread
From: Baoquan He @ 2023-01-19 12:48 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: linux-mm, linux-kernel, lstoakes, stephen.s.brennan, willy, akpm, hch

On 01/19/23 at 05:52pm, Baoquan He wrote:
> On 01/16/23 at 12:50pm, Uladzislau Rezki wrote:
> > On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> > > Currently, vread can read out vmalloc areas which is associated with
> > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > interface because it doesn't have an associated vm_struct. Then in vread(),
> > > these areas are all skipped.
> > > 
> > > Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
> > > The area created with vmap_ram_vread() interface directly can be handled
> > > like the other normal vmap areas with aligned_vread(). While areas
> > > which will be further subdivided and managed with vmap_block need
> > > carefully read out page-aligned small regions and zero fill holes.
> > > 
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  mm/vmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 73 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > index ab4825050b5c..13875bc41e27 100644
> > > --- a/mm/vmalloc.c
> > > +++ b/mm/vmalloc.c
> > > @@ -3544,6 +3544,65 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> > >  	return copied;
> > >  }
> > >  
> > > +static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
> > > +{
> > > +	char *start;
> > > +	struct vmap_block *vb;
> > > +	unsigned long offset;
> > > +	unsigned int rs, re, n;
> > > +
> > > +	/*
> > > +	 * If it's area created by vm_map_ram() interface directly, but
> > > +	 * not further subdividing and delegating management to vmap_block,
> > > +	 * handle it here.
> > > +	 */
> > > +	if (!(flags & VMAP_BLOCK)) {
> > > +		aligned_vread(buf, addr, count);
> > > +		return;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Area is split into regions and tracked with vmap_block, read out
> > > +	 * each region and zero fill the hole between regions.
> > > +	 */
> > > +	vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > > +
> > > +	spin_lock(&vb->lock);
> > > +	if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > >
> > CPU-X invokes free_vmap_block() whereas we take the vb->lock and do
> > some manipulations with vb that might be already freed over RCU-core.
> > 
> > Should we protect it by the rcu_read_lock() also here?
> 
> Just go over the vb and vbq code again, seems we don't need the
> rcu_read_lock() here. The rcu lock is needed when operating on the
> vmap_block_queue->free list. I don't see race between the vb accessing
> here and those list adding or removing on vmap_block_queue->free with
> rcu. If I miss some race windows between them, please help point out.
> 
> However, when I check free_vmap_block(), I do find a risk. As you said,

Forgot to add details about why there's no race between free_vmap_block()
and vmap_ram_vread() because we have taken vmap_area_lock at the beginning
of vread(). So, except of the missing checking on returned value from
xa_load(), free_vmap_block() either is blocked to wait for vmap_area_lock
before calling unlink_va(), or finishes calling unlink_va() to remove
the vmap from vmap_area_root tree. In both cases, no race happened.

> CPU-x invokes free_vmap_block() and executed xa_erase() to remove the vb
> from vmap_blocks tree. Then vread() comes into vmap_ram_vread() and call
> xa_load(), it would be null. I should check the returned vb in
> free_vmap_block().
> 
> 
> static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
> {
> ......
> if (!(flags & VMAP_BLOCK)) {
>                 aligned_vread(buf, addr, count);
>                 return;
>         }
> 
>         /*
>          * Area is split into regions and tracked with vmap_block, read out
>          * each region and zero fill the hole between regions.
>          */
>         vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> 	if (!vb)    <-- vb need be checked here to avoid accessing erased vb from vmap_blocks tree
> 		memset(buf, 0, count);
> ......
> }
> 



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas
  2023-01-19 12:48       ` Baoquan He
@ 2023-01-20 11:54         ` Uladzislau Rezki
  0 siblings, 0 replies; 31+ messages in thread
From: Uladzislau Rezki @ 2023-01-20 11:54 UTC (permalink / raw)
  To: Baoquan He
  Cc: linux-mm, linux-kernel, lstoakes, stephen.s.brennan, willy, akpm, hch

>
> On 01/19/23 at 05:52pm, Baoquan He wrote:
> > On 01/16/23 at 12:50pm, Uladzislau Rezki wrote:
> > > On Fri, Jan 13, 2023 at 11:19:17AM +0800, Baoquan He wrote:
> > > > Currently, vread can read out vmalloc areas which is associated with
> > > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > > interface because it doesn't have an associated vm_struct. Then in vread(),
> > > > these areas are all skipped.
> > > >
> > > > Here, add a new function vmap_ram_vread() to read out vm_map_ram areas.
> > > > The area created with vmap_ram_vread() interface directly can be handled
> > > > like the other normal vmap areas with aligned_vread(). While areas
> > > > which will be further subdivided and managed with vmap_block need
> > > > carefully read out page-aligned small regions and zero fill holes.
> > > >
> > > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > > ---
> > > >  mm/vmalloc.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 73 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > > > index ab4825050b5c..13875bc41e27 100644
> > > > --- a/mm/vmalloc.c
> > > > +++ b/mm/vmalloc.c
> > > > @@ -3544,6 +3544,65 @@ static int aligned_vread(char *buf, char *addr, unsigned long count)
> > > >   return copied;
> > > >  }
> > > >
> > > > +static void vmap_ram_vread(char *buf, char *addr, int count, unsigned long flags)
> > > > +{
> > > > + char *start;
> > > > + struct vmap_block *vb;
> > > > + unsigned long offset;
> > > > + unsigned int rs, re, n;
> > > > +
> > > > + /*
> > > > +  * If it's area created by vm_map_ram() interface directly, but
> > > > +  * not further subdividing and delegating management to vmap_block,
> > > > +  * handle it here.
> > > > +  */
> > > > + if (!(flags & VMAP_BLOCK)) {
> > > > +         aligned_vread(buf, addr, count);
> > > > +         return;
> > > > + }
> > > > +
> > > > + /*
> > > > +  * Area is split into regions and tracked with vmap_block, read out
> > > > +  * each region and zero fill the hole between regions.
> > > > +  */
> > > > + vb = xa_load(&vmap_blocks, addr_to_vb_idx((unsigned long)addr));
> > > > +
> > > > + spin_lock(&vb->lock);
> > > > + if (bitmap_empty(vb->used_map, VMAP_BBMAP_BITS)) {
> > > >
> > > CPU-X invokes free_vmap_block() whereas we take the vb->lock and do
> > > some manipulations with vb that might be already freed over RCU-core.
> > >
> > > Should we protect it by the rcu_read_lock() also here?
> >
> > Just go over the vb and vbq code again, seems we don't need the
> > rcu_read_lock() here. The rcu lock is needed when operating on the
> > vmap_block_queue->free list. I don't see race between the vb accessing
> > here and those list adding or removing on vmap_block_queue->free with
> > rcu. If I miss some race windows between them, please help point out.
> >
> > However, when I check free_vmap_block(), I do find a risk. As you said,
>
> Forgot to add details about why there's no race between free_vmap_block()
> and vmap_ram_vread() because we have taken vmap_area_lock at the beginning
> of vread(). So, except of the missing checking on returned value from
> xa_load(), free_vmap_block() either is blocked to wait for vmap_area_lock
> before calling unlink_va(), or finishes calling unlink_va() to remove
> the vmap from vmap_area_root tree. In both cases, no race happened.
>
Agree. xa_load()s return value should be checked. Because it can be that
there is no vmap_block associated with an address if xa_erase() was done
earlier.

--
Uladzislau Rezki


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-01-20 11:55 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-13  3:19 [PATCH v3 0/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
2023-01-13  3:19 ` [PATCH v3 1/7] mm/vmalloc.c: add used_map into vmap_block to track space of vmap_block Baoquan He
2023-01-16 11:39   ` Uladzislau Rezki
2023-01-16 12:22   ` Lorenzo Stoakes
2023-01-13  3:19 ` [PATCH v3 2/7] mm/vmalloc.c: add flags to mark vm_map_ram area Baoquan He
2023-01-16 11:42   ` Uladzislau Rezki
2023-01-16 12:23   ` Lorenzo Stoakes
2023-01-18  2:13     ` Baoquan He
2023-01-13  3:19 ` [PATCH v3 3/7] mm/vmalloc.c: allow vread() to read out vm_map_ram areas Baoquan He
2023-01-14  7:57   ` Dan Carpenter
2023-01-15 14:08     ` Baoquan He
2023-01-16 13:08       ` Dan Carpenter
2023-01-18  2:17         ` Baoquan He
2023-01-16 11:50   ` Uladzislau Rezki
2023-01-19  9:52     ` Baoquan He
2023-01-19 12:48       ` Baoquan He
2023-01-20 11:54         ` Uladzislau Rezki
2023-01-16 19:01   ` Matthew Wilcox
2023-01-16 19:48     ` Lorenzo Stoakes
2023-01-13  3:19 ` [PATCH v3 4/7] mm/vmalloc: explicitly identify vm_map_ram area when shown in /proc/vmcoreinfo Baoquan He
2023-01-16 11:43   ` Uladzislau Rezki
2023-01-16 12:24   ` Lorenzo Stoakes
2023-01-13  3:19 ` [PATCH v3 5/7] mm/vmalloc: skip the uninitilized vmalloc areas Baoquan He
2023-01-16 11:44   ` Uladzislau Rezki
2023-01-16 12:24   ` Lorenzo Stoakes
2023-01-13  3:19 ` [PATCH v3 6/7] powerpc: mm: add VM_IOREMAP flag to the vmalloc area Baoquan He
2023-01-16 11:46   ` Uladzislau Rezki
2023-01-16 12:25   ` Lorenzo Stoakes
2023-01-13  3:19 ` [PATCH v3 7/7] sh: mm: set " Baoquan He
2023-01-16 11:46   ` Uladzislau Rezki
2023-01-16 12:25   ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).