* [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space
@ 2015-03-13 12:12 ` Roman Pen
  0 siblings, 0 replies; 35+ messages in thread
From: Roman Pen @ 2015-03-13 12:12 UTC (permalink / raw)
  Cc: Roman Pen, Andrew Morton, Nick Piggin, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Gioh Kim, Rob Jones, linux-mm, linux-kernel, stable

Hello all.

Recently I came across high fragmentation in the vm_map_ram allocator: a vmap_block
has free space, but new blocks keep appearing anyway.  Further investigation
showed that a certain mapping/unmapping sequence can exhaust the vmalloc space.  On
small 32-bit systems that's not a big problem, because purging is triggered on
the first allocation failure (in alloc_vmap_area), but on 64-bit machines, e.g.
x86_64 with its 45 bits of vmalloc space, it can be a disaster.

While fixing this I also tweaked the allocation logic for a new vmap block and
replaced the dirty bitmap with min/max dirty range values to simplify the logic.

I would like to receive comments on the following three patches.

Thanks.

Roman Pen (3):
  mm/vmalloc: fix possible exhaustion of vmalloc space caused by
    vm_map_ram allocator
  mm/vmalloc: occupy newly allocated vmap block just after allocation
  mm/vmalloc: get rid of dirty bitmap inside vmap_block structure

 mm/vmalloc.c | 94 ++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 54 insertions(+), 40 deletions(-)

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
-- 
1.9.3



* [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-13 12:12 ` Roman Pen
@ 2015-03-13 12:12   ` Roman Pen
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Pen @ 2015-03-13 12:12 UTC (permalink / raw)
  Cc: Roman Pen, Andrew Morton, Nick Piggin, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Gioh Kim, Rob Jones, linux-mm, linux-kernel, stable

If a suitable block can't be found, a new block is allocated and put at the head
of the free list, so on the next iteration this new block will be found first.

That's bad, because old blocks in the free list never get a chance to be fully
used, so fragmentation grows.

Let's consider this simple example:

 #1 We have one block in a free list which is partially used, and where only
    one page is free:

    HEAD |xxxxxxxxx-| TAIL
                   ^
                   free space for 1 page, order 0

 #2 A new allocation request of order 1 (2 pages) comes in; a new block is
    allocated, since there is no free space left to satisfy the request.  The
    new block is put at the head of the free list:

    HEAD |----------|xxxxxxxxx-| TAIL

 #3 Two pages get occupied in the newly found block:

    HEAD |xx--------|xxxxxxxxx-| TAIL
          ^
          two pages mapped here

 #4 A new allocation request of order 0 (1 page) comes in.  The block created
    in step #2 sits at the beginning of the free list, so it is found first:

    HEAD |xxX-------|xxxxxxxxx-| TAIL
            ^                 ^
            page mapped here, but better to use this hole

Obviously, it is better to satisfy the request in step #4 from the old block,
which still has free space; otherwise fragmentation increases sharply.

But fragmentation is not the only problem.  Worse, I can easily create a
scenario in which the whole vmalloc space is exhausted by blocks which are no
longer used, but are already dirty and still have several free pages.

Consider this function, whose execution should be pinned to one CPU:

 ------------------------------------------------------------------------------
/* Here we consider that our block is equal to 1MB, thus 256 pages */
static void exhaust_virtual_space(struct page *pages[256], int iters)
{
	/* First we map a big chunk, e.g. 16 pages.
	 * Then we occupy the remaining space with smaller
	 * chunks of 8 pages each.  At the end a small hole should
	 * remain, so after the allocation sequence the block looks
	 * like this:
	 *                XX  big chunk
	 * |XXxxxxxxx-|    x  small chunk
	 *                 -  hole, which is big enough for a small
	 *                    chunk, but not for a big chunk
	 */
	unsigned big_allocs   = 1;
	/* -1 for the hole, which is left at the end of each block
	 * to keep it partially used, with some free space available */
	unsigned small_allocs = (256 - 16) / 8 - 1;
	void    *vaddrs[big_allocs + small_allocs];

	while (iters--) {
		int i = 0, j;

		/* Map big chunk */
		vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);

		/* Map small chunks */
		for (j = 0; j < small_allocs; j++)
			vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
						 PAGE_KERNEL);

		/* Unmap everything */
		while (i--)
			vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
	}
}
 ------------------------------------------------------------------------------

On every iteration a new block (1 MB of vm area in my case) is allocated and
then fully occupied, without any attempt to satisfy the small allocation
requests from previously allocated blocks in the free list.

This patch simply puts a newly allocated block at the tail of the free list,
which reduces fragmentation by giving allocation requests a chance to be
satisfied from older blocks that may still have holes left.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 39c3388..db6bffb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -837,7 +837,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 
 	vbq = &get_cpu_var(vmap_block_queue);
 	spin_lock(&vbq->lock);
-	list_add_rcu(&vb->free_list, &vbq->free);
+	list_add_tail_rcu(&vb->free_list, &vbq->free);
 	spin_unlock(&vbq->lock);
 	put_cpu_var(vmap_block_queue);
 
-- 
1.9.3



* [PATCH 2/3] mm/vmalloc: occupy newly allocated vmap block just after allocation
  2015-03-13 12:12 ` Roman Pen
@ 2015-03-13 12:12   ` Roman Pen
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Pen @ 2015-03-13 12:12 UTC (permalink / raw)
  Cc: Roman Pen, Nick Piggin, Andrew Morton, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Gioh Kim, Rob Jones, linux-mm, linux-kernel

The previous implementation allocates a new vmap block and then repeats the
search for a free block from the very beginning, iterating over the CPU free
list.

Why is that suboptimal?

1. Allocation can happen on one CPU, but the search can then be done on
   another CPU.  In the worst case we preallocate as many vmap blocks as
   there are CPUs in the system.

2. In the previous patch I added the newly allocated block to the tail of the
   free list, to avoid premature exhaustion of virtual space and to give
   blocks which were allocated long ago a chance to be occupied.  Thus, to
   find the newly allocated block, the whole search sequence would have to be
   repeated, which is not efficient.

In this patch the newly allocated block is occupied right away and the
virtual address is returned to the caller, so there is no need to repeat the
search sequence; the allocation job is done.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index db6bffb..9759c92 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -791,13 +791,30 @@ static unsigned long addr_to_vb_idx(unsigned long addr)
 	return addr;
 }
 
-static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
+static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off)
+{
+	unsigned long addr = va_start + (pages_off << PAGE_SHIFT);
+	BUG_ON(addr_to_vb_idx(addr) != addr_to_vb_idx(va_start));
+	return (void *)addr;
+}
+
+/**
+ * new_vmap_block - allocates a new vmap_block and occupies 2^order pages in
+ *                  this block. Of course the page count can't exceed VMAP_BBMAP_BITS
+ * @order:    how many 2^order pages should be occupied in the newly allocated block
+ * @gfp_mask: flags for the page level allocator
+ * @addr:     output virtual address of the newly allocated block
+ *
+ * Returns: address of virtual space in a block or ERR_PTR
+ */
+static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 {
 	struct vmap_block_queue *vbq;
 	struct vmap_block *vb;
 	struct vmap_area *va;
 	unsigned long vb_idx;
 	int node, err;
+	void *vaddr;
 
 	node = numa_node_id();
 
@@ -821,9 +838,12 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 		return ERR_PTR(err);
 	}
 
+	vaddr = vmap_block_vaddr(va->va_start, 0);
 	spin_lock_init(&vb->lock);
 	vb->va = va;
-	vb->free = VMAP_BBMAP_BITS;
+	/* At least something should be left free */
+	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+	vb->free = VMAP_BBMAP_BITS - (1UL << order);
 	vb->dirty = 0;
 	bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
 	INIT_LIST_HEAD(&vb->free_list);
@@ -841,7 +861,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 	spin_unlock(&vbq->lock);
 	put_cpu_var(vmap_block_queue);
 
-	return vb;
+	return vaddr;
 }
 
 static void free_vmap_block(struct vmap_block *vb)
@@ -905,7 +925,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 {
 	struct vmap_block_queue *vbq;
 	struct vmap_block *vb;
-	unsigned long addr = 0;
+	void *vaddr = NULL;
 	unsigned int order;
 
 	BUG_ON(size & ~PAGE_MASK);
@@ -920,43 +940,38 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 	}
 	order = get_order(size);
 
-again:
 	rcu_read_lock();
 	vbq = &get_cpu_var(vmap_block_queue);
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-		int i;
+		unsigned long pages_nr;
 
 		spin_lock(&vb->lock);
-		if (vb->free < 1UL << order)
-			goto next;
+		if (vb->free < (1UL << order)) {
+			spin_unlock(&vb->lock);
+			continue;
+		}
 
-		i = VMAP_BBMAP_BITS - vb->free;
-		addr = vb->va->va_start + (i << PAGE_SHIFT);
-		BUG_ON(addr_to_vb_idx(addr) !=
-				addr_to_vb_idx(vb->va->va_start));
+		pages_nr = VMAP_BBMAP_BITS - vb->free;
+		vaddr = vmap_block_vaddr(vb->va->va_start, pages_nr);
 		vb->free -= 1UL << order;
 		if (vb->free == 0) {
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
 			spin_unlock(&vbq->lock);
 		}
+
 		spin_unlock(&vb->lock);
 		break;
-next:
-		spin_unlock(&vb->lock);
 	}
 
 	put_cpu_var(vmap_block_queue);
 	rcu_read_unlock();
 
-	if (!addr) {
-		vb = new_vmap_block(gfp_mask);
-		if (IS_ERR(vb))
-			return vb;
-		goto again;
-	}
+	/* Allocate new block if nothing was found */
+	if (!vaddr)
+		vaddr = new_vmap_block(order, gfp_mask);
 
-	return (void *)addr;
+	return vaddr;
 }
 
 static void vb_free(const void *addr, unsigned long size)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] mm/vmalloc: occupy newly allocated vmap block just after allocation
@ 2015-03-13 12:12   ` Roman Pen
  0 siblings, 0 replies; 35+ messages in thread
From: Roman Pen @ 2015-03-13 12:12 UTC (permalink / raw)
  Cc: Roman Pen, Nick Piggin, Andrew Morton, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Gioh Kim, Rob Jones, linux-mm, linux-kernel

Previous implementation allocates new vmap block and repeats search of a free
block from the very beginning, iterating over the CPU free list.

Why it can be better??

1. Allocation can happen on one CPU, but search can be done on another CPU.
   In worst case we preallocate amount of vmap blocks which is equal to
   CPU number on the system.

2. In previous patch I added newly allocated block to the tail of free list
   to avoid soon exhaustion of virtual space and give a chance to occupy
   blocks which were allocated long time ago.  Thus to find newly allocated
   block all the search sequence should be repeated, seems it is not efficient.

In this patch newly allocated block is occupied right away, address of virtual
space is returned to the caller, so there is no any need to repeat the search
sequence, allocation job is done.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 21 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index db6bffb..9759c92 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -791,13 +791,30 @@ static unsigned long addr_to_vb_idx(unsigned long addr)
 	return addr;
 }
 
-static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
+static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off)
+{
+	unsigned long addr = va_start + (pages_off << PAGE_SHIFT);
+	BUG_ON(addr_to_vb_idx(addr) != addr_to_vb_idx(va_start));
+	return (void *)addr;
+}
+
+/**
+ * new_vmap_block - allocates new vmap_block and occupies 2^order pages in this
+ *                  block. Of course pages number can't exceed VMAP_BBMAP_BITS
+ * @order:    how many 2^order pages should be occupied in newly allocated block
+ * @gfp_mask: flags for the page level allocator
+ * @addr:     output virtual address of a newly allocator block
+ *
+ * Returns: address of virtual space in a block or ERR_PTR
+ */
+static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 {
 	struct vmap_block_queue *vbq;
 	struct vmap_block *vb;
 	struct vmap_area *va;
 	unsigned long vb_idx;
 	int node, err;
+	void *vaddr;
 
 	node = numa_node_id();
 
@@ -821,9 +838,12 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 		return ERR_PTR(err);
 	}
 
+	vaddr = vmap_block_vaddr(va->va_start, 0);
 	spin_lock_init(&vb->lock);
 	vb->va = va;
-	vb->free = VMAP_BBMAP_BITS;
+	/* At least something should be left free */
+	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
+	vb->free = VMAP_BBMAP_BITS - (1UL << order);
 	vb->dirty = 0;
 	bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
 	INIT_LIST_HEAD(&vb->free_list);
@@ -841,7 +861,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
 	spin_unlock(&vbq->lock);
 	put_cpu_var(vmap_block_queue);
 
-	return vb;
+	return vaddr;
 }
 
 static void free_vmap_block(struct vmap_block *vb)
@@ -905,7 +925,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 {
 	struct vmap_block_queue *vbq;
 	struct vmap_block *vb;
-	unsigned long addr = 0;
+	void *vaddr = NULL;
 	unsigned int order;
 
 	BUG_ON(size & ~PAGE_MASK);
@@ -920,43 +940,38 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
 	}
 	order = get_order(size);
 
-again:
 	rcu_read_lock();
 	vbq = &get_cpu_var(vmap_block_queue);
 	list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-		int i;
+		unsigned long pages_nr;
 
 		spin_lock(&vb->lock);
-		if (vb->free < 1UL << order)
-			goto next;
+		if (vb->free < (1UL << order)) {
+			spin_unlock(&vb->lock);
+			continue;
+		}
 
-		i = VMAP_BBMAP_BITS - vb->free;
-		addr = vb->va->va_start + (i << PAGE_SHIFT);
-		BUG_ON(addr_to_vb_idx(addr) !=
-				addr_to_vb_idx(vb->va->va_start));
+		pages_nr = VMAP_BBMAP_BITS - vb->free;
+		vaddr = vmap_block_vaddr(vb->va->va_start, pages_nr);
 		vb->free -= 1UL << order;
 		if (vb->free == 0) {
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
 			spin_unlock(&vbq->lock);
 		}
+
 		spin_unlock(&vb->lock);
 		break;
-next:
-		spin_unlock(&vb->lock);
 	}
 
 	put_cpu_var(vmap_block_queue);
 	rcu_read_unlock();
 
-	if (!addr) {
-		vb = new_vmap_block(gfp_mask);
-		if (IS_ERR(vb))
-			return vb;
-		goto again;
-	}
+	/* Allocate new block if nothing was found */
+	if (!vaddr)
+		vaddr = new_vmap_block(order, gfp_mask);
 
-	return (void *)addr;
+	return vaddr;
 }
 
 static void vb_free(const void *addr, unsigned long size)
-- 
1.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
  2015-03-13 12:12 ` Roman Pen
@ 2015-03-13 12:12   ` Roman Pen
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Pen @ 2015-03-13 12:12 UTC (permalink / raw)
  Cc: Roman Pen, Nick Piggin, Zhang Yanfei, Andrew Morton,
	Eric Dumazet, Joonsoo Kim, David Rientjes, WANG Chao,
	Fabian Frederick, Christoph Lameter, Gioh Kim, Rob Jones,
	linux-mm, linux-kernel

In the original implementation of vm_map_ram by Nick Piggin there were two
bitmaps: alloc_map and dirty_map.  Neither was used as intended, i.e. for
finding a suitable free hole for the next allocation within a block:
vm_map_ram allocates space sequentially in a block and marks pages as dirty
on free, so freed space can never be reused.

It would actually be very interesting to know the original intent of those
bitmaps; perhaps the implementation was simply never completed.

Long ago Zhang Yanfei removed alloc_map with these two commits:

  mm/vmalloc.c: remove dead code in vb_alloc
     3fcd76e8028e0be37b02a2002b4f56755daeda06
  mm/vmalloc.c: remove alloc_map from vmap_block
     b8e748b6c32999f221ea4786557b8e7e6c4e4e7a

This patch replaces dirty_map with two range variables: dirty_min and
dirty_max.  They store the minimum and maximum positions of dirty space in a
block, since we only need to know the dirty range, not the exact positions
of dirty pages.

Why this change?  Several reasons: first, at a glance it seems that the
vm_map_ram allocator cares about fragmentation and uses the bitmap to find
free holes, but that is not the case.  Given that, it is better to use
something simple, like min/max range values.  Second, the code becomes
simpler, with no iteration over a bitmap, just value comparisons in the min
and max macros.  Third, the bitmap occupies up to 1024 bits (4MB is the
maximum size of a block); here the whole bitmap is replaced with two longs.

Finally, vm_unmap_aliases should be slightly faster, and the whole
vmap_block structure occupies less memory.

Signed-off-by: Roman Pen <r.peniaev@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Rientjes <rientjes@google.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Christoph Lameter <cl@linux.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: Rob Jones <rob.jones@codethink.co.uk>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9759c92..77c4b8a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -760,7 +760,7 @@ struct vmap_block {
 	spinlock_t lock;
 	struct vmap_area *va;
 	unsigned long free, dirty;
-	DECLARE_BITMAP(dirty_map, VMAP_BBMAP_BITS);
+	unsigned long dirty_min, dirty_max; /*< dirty range */
 	struct list_head free_list;
 	struct rcu_head rcu_head;
 	struct list_head purge;
@@ -845,7 +845,8 @@ static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
 	BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
 	vb->free = VMAP_BBMAP_BITS - (1UL << order);
 	vb->dirty = 0;
-	bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
+	vb->dirty_min = VMAP_BBMAP_BITS;
+	vb->dirty_max = 0;
 	INIT_LIST_HEAD(&vb->free_list);
 
 	vb_idx = addr_to_vb_idx(va->va_start);
@@ -896,7 +897,8 @@ static void purge_fragmented_blocks(int cpu)
 		if (vb->free + vb->dirty == VMAP_BBMAP_BITS && vb->dirty != VMAP_BBMAP_BITS) {
 			vb->free = 0; /* prevent further allocs after releasing lock */
 			vb->dirty = VMAP_BBMAP_BITS; /* prevent purging it again */
-			bitmap_fill(vb->dirty_map, VMAP_BBMAP_BITS);
+			vb->dirty_min = 0;
+			vb->dirty_max = VMAP_BBMAP_BITS;
 			spin_lock(&vbq->lock);
 			list_del_rcu(&vb->free_list);
 			spin_unlock(&vbq->lock);
@@ -989,6 +991,7 @@ static void vb_free(const void *addr, unsigned long size)
 	order = get_order(size);
 
 	offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
+	offset >>= PAGE_SHIFT;
 
 	vb_idx = addr_to_vb_idx((unsigned long)addr);
 	rcu_read_lock();
@@ -999,7 +1002,10 @@ static void vb_free(const void *addr, unsigned long size)
 	vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
 
 	spin_lock(&vb->lock);
-	BUG_ON(bitmap_allocate_region(vb->dirty_map, offset >> PAGE_SHIFT, order));
+
+	/* Expand dirty range */
+	vb->dirty_min = min(vb->dirty_min, offset);
+	vb->dirty_max = max(vb->dirty_max, offset + (1UL << order));
 
 	vb->dirty += 1UL << order;
 	if (vb->dirty == VMAP_BBMAP_BITS) {
@@ -1038,25 +1044,18 @@ void vm_unmap_aliases(void)
 
 		rcu_read_lock();
 		list_for_each_entry_rcu(vb, &vbq->free, free_list) {
-			int i, j;
-
 			spin_lock(&vb->lock);
-			i = find_first_bit(vb->dirty_map, VMAP_BBMAP_BITS);
-			if (i < VMAP_BBMAP_BITS) {
+			if (vb->dirty) {
+				unsigned long va_start = vb->va->va_start;
 				unsigned long s, e;
 
-				j = find_last_bit(vb->dirty_map,
-							VMAP_BBMAP_BITS);
-				j = j + 1; /* need exclusive index */
+				s = va_start + (vb->dirty_min << PAGE_SHIFT);
+				e = va_start + (vb->dirty_max << PAGE_SHIFT);
 
-				s = vb->va->va_start + (i << PAGE_SHIFT);
-				e = vb->va->va_start + (j << PAGE_SHIFT);
-				flush = 1;
+				start = min(s, start);
+				end   = max(e, end);
 
-				if (s < start)
-					start = s;
-				if (e > end)
-					end = e;
+				flush = 1;
 			}
 			spin_unlock(&vb->lock);
 		}
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 35+ messages in thread


* Re: [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space
  2015-03-13 12:12 ` Roman Pen
  (?)
@ 2015-03-16 10:28   ` Gioh Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Gioh Kim @ 2015-03-16 10:28 UTC (permalink / raw)
  To: Roman Pen
  Cc: Andrew Morton, Nick Piggin, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Rob Jones, linux-mm, linux-kernel, stable



On 2015-03-13 at 9:12 PM, Roman Pen wrote:
> Hello all.
> 
> Recently I came across high fragmentation of vm_map_ram allocator: vmap_block
> has free space, but still new blocks continue to appear.  Further investigation
> showed that certain mapping/unmapping sequence can exhaust vmalloc space.  On
> small 32bit systems that's not a big problem, cause purging will be called soon
> on a first allocation failure (alloc_vmap_area), but on 64bit machines, e.g.
> x86_64 has 45 bits of vmalloc space, that can be a disaster.

I think the problem you describe is already known; that is why I added a
comment about it: "it could consume lots of address space through
fragmentation".

Could you tell me about your situation, and why it should be avoided?


> 
> Fixing this I also did some tweaks in allocation logic of a new vmap block and
> replaced dirty bitmap with min/max dirty range values to make the logic simpler.
> 
> I would like to receive comments on the following three patches.
> 
> Thanks.
> 
> Roman Pen (3):
>    mm/vmalloc: fix possible exhaustion of vmalloc space caused by
>      vm_map_ram allocator
>    mm/vmalloc: occupy newly allocated vmap block just after allocation
>    mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
> 
>   mm/vmalloc.c | 94 ++++++++++++++++++++++++++++++++++--------------------------
>   1 file changed, 54 insertions(+), 40 deletions(-)
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: WANG Chao <chaowang@redhat.com>
> Cc: Fabian Frederick <fabf@skynet.be>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Gioh Kim <gioh.kim@lge.com>
> Cc: Rob Jones <rob.jones@codethink.co.uk>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space
  2015-03-16 10:28   ` Gioh Kim
@ 2015-03-16 10:49     ` Roman Peniaev
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Peniaev @ 2015-03-16 10:49 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Eric Dumazet, Joonsoo Kim, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Rob Jones,
	linux-mm, linux-kernel, stable

On Mon, Mar 16, 2015 at 7:28 PM, Gioh Kim <gioh.kim@lge.com> wrote:
>
>
> On 2015-03-13 at 9:12 PM, Roman Pen wrote:
>> Hello all.
>>
>> Recently I came across high fragmentation of vm_map_ram allocator: vmap_block
>> has free space, but still new blocks continue to appear.  Further investigation
>> showed that certain mapping/unmapping sequence can exhaust vmalloc space.  On
>> small 32bit systems that's not a big problem, cause purging will be called soon
>> on a first allocation failure (alloc_vmap_area), but on 64bit machines, e.g.
>> x86_64 has 45 bits of vmalloc space, that can be a disaster.
>
> I think the problem you describe is already known; that is why I added a
> comment about it: "it could consume lots of address space through
> fragmentation".
>
> Could you tell me about your situation, and why it should be avoided?

In the first patch of this set I explicitly described a function which
exhausts vmalloc space without any chance of being purged: the vm_map_ram
allocator is greedy and first tries to occupy a newly allocated block, even
when old blocks contain enough free space.

This can easily be fixed by putting a newly allocated block (which has
enough space to satisfy further requests) at the tail of the free list, to
give the old blocks a chance.

Why should it be avoided?  Strange question.  To me it looks like a bug in
the allocator, which should be fair and should not continuously allocate
new blocks without lazy purging (it seems vmap_lazy_nr and
__purge_vmap_area_lazy were created for exactly this reason: to avoid
unbounded allocations).


--
Roman


>
>
>>
>> Fixing this I also did some tweaks in allocation logic of a new vmap block and
>> replaced dirty bitmap with min/max dirty range values to make the logic simpler.
>>
>> I would like to receive comments on the following three patches.
>>
>> Thanks.
>>
>> Roman Pen (3):
>>    mm/vmalloc: fix possible exhaustion of vmalloc space caused by
>>      vm_map_ram allocator
>>    mm/vmalloc: occupy newly allocated vmap block just after allocation
>>    mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
>>
>>   mm/vmalloc.c | 94 ++++++++++++++++++++++++++++++++++--------------------------
>>   1 file changed, 54 insertions(+), 40 deletions(-)
>>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: David Rientjes <rientjes@google.com>
>> Cc: WANG Chao <chaowang@redhat.com>
>> Cc: Fabian Frederick <fabf@skynet.be>
>> Cc: Christoph Lameter <cl@linux.com>
>> Cc: Gioh Kim <gioh.kim@lge.com>
>> Cc: Rob Jones <rob.jones@codethink.co.uk>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> Cc: stable@vger.kernel.org
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space
  2015-03-16 10:49     ` Roman Peniaev
@ 2015-03-16 10:57       ` Roman Peniaev
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Peniaev @ 2015-03-16 10:57 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Eric Dumazet, Joonsoo Kim, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Rob Jones,
	linux-mm, linux-kernel, stable

On Mon, Mar 16, 2015 at 7:49 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Mon, Mar 16, 2015 at 7:28 PM, Gioh Kim <gioh.kim@lge.com> wrote:
>>
>>
>> On 2015-03-13 at 9:12 PM, Roman Pen wrote:
>>> Hello all.
>>>
>>> Recently I came across high fragmentation of vm_map_ram allocator: vmap_block
>>> has free space, but still new blocks continue to appear.  Further investigation
>>> showed that certain mapping/unmapping sequence can exhaust vmalloc space.  On
>>> small 32bit systems that's not a big problem, cause purging will be called soon
>>> on a first allocation failure (alloc_vmap_area), but on 64bit machines, e.g.
>>> x86_64 has 45 bits of vmalloc space, that can be a disaster.
>>
>> I think the problem you describe is already known; that is why I added a
>> comment about it: "it could consume lots of address space through
>> fragmentation".
>>
>> Could you tell me about your situation, and why it should be avoided?
>
> In the first patch of this set I explicitly described a function which
> exhausts vmalloc space without any chance of being purged: the vm_map_ram
> allocator is greedy and first tries to occupy a newly allocated block,
> even when old blocks contain enough free space.
>
> This can easily be fixed by putting a newly allocated block (which has
> enough space to satisfy further requests) at the tail of the free list,
> to give the old blocks a chance.
>
> Why should it be avoided?  Strange question.  To me it looks like a bug
> in the allocator, which should be fair and should not continuously
> allocate new blocks without lazy purging (it seems vmap_lazy_nr and
> __purge_vmap_area_lazy were created for exactly this reason: to avoid
> unbounded allocations).


And if you are referring to your commit 364376383, which adds this comment:

 * If you use this function for less than VMAP_MAX_ALLOC pages, it could be
 * faster than vmap so it's good.  But if you mix long-life and short-life
 * objects with vm_map_ram(), it could consume lots of address space through
 * fragmentation (especially on a 32bit machine).  You could see failures in
 * the end.  Please use this function for short-lived objects.

that comment does not cover this case: if a block is pinned, i.e. some
pages are still in use, we can't do anything about it.

I am talking about blocks which are completely freed, but dirty.


--
Roman

>
>
> --
> Roman
>
>
>>
>>
>>>
>>> Fixing this I also did some tweaks in allocation logic of a new vmap block and
>>> replaced dirty bitmap with min/max dirty range values to make the logic simpler.
>>>
>>> I would like to receive comments on the following three patches.
>>>
>>> Thanks.
>>>
>>> Roman Pen (3):
>>>    mm/vmalloc: fix possible exhaustion of vmalloc space caused by
>>>      vm_map_ram allocator
>>>    mm/vmalloc: occupy newly allocated vmap block just after allocation
>>>    mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
>>>
>>>   mm/vmalloc.c | 94 ++++++++++++++++++++++++++++++++++--------------------------
>>>   1 file changed, 54 insertions(+), 40 deletions(-)
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Nick Piggin <npiggin@kernel.dk>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> Cc: David Rientjes <rientjes@google.com>
>>> Cc: WANG Chao <chaowang@redhat.com>
>>> Cc: Fabian Frederick <fabf@skynet.be>
>>> Cc: Christoph Lameter <cl@linux.com>
>>> Cc: Gioh Kim <gioh.kim@lge.com>
>>> Cc: Rob Jones <rob.jones@codethink.co.uk>
>>> Cc: linux-mm@kvack.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: stable@vger.kernel.org
>>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space
@ 2015-03-16 10:57       ` Roman Peniaev
  0 siblings, 0 replies; 35+ messages in thread
From: Roman Peniaev @ 2015-03-16 10:57 UTC (permalink / raw)
  To: Gioh Kim
  Cc: Andrew Morton, Eric Dumazet, Joonsoo Kim, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Rob Jones,
	linux-mm, linux-kernel, stable

On Mon, Mar 16, 2015 at 7:49 PM, Roman Peniaev <r.peniaev@gmail.com> wrote:
> On Mon, Mar 16, 2015 at 7:28 PM, Gioh Kim <gioh.kim@lge.com> wrote:
>>
>>
>> 2015-03-13 오후 9:12에 Roman Pen 이(가) 쓴 글:
>>> Hello all.
>>>
>>> Recently I came across high fragmentation of vm_map_ram allocator: vmap_block
>>> has free space, but still new blocks continue to appear.  Further investigation
>>> showed that certain mapping/unmapping sequence can exhaust vmalloc space.  On
>>> small 32bit systems that's not a big problem, cause purging will be called soon
>>> on a first allocation failure (alloc_vmap_area), but on 64bit machines, e.g.
>>> x86_64 has 45 bits of vmalloc space, that can be a disaster.
>>
>> I think the problem you comments is already known so that I wrote comments about it as
>> "it could consume lots of address space through fragmentation".
>>
>> Could you tell me about your situation and reason why it should be avoided?
>
> In the first patch of this set I explicitly described the function,
> which exhausts
> vmalloc space without any chance to be purged: vm_map_ram allocator is
> greedy and firstly
> tries to occupy newly allocated block, even old blocks contain enough
> free space.
>
> This can be easily fixed if we put newly allocated block (which has
> enough space to
> complete further requests) to the tail of a free list, to give a
> chance to old blocks.
>
> Why it should be avoided?  Strange question.  For me it looks like a
> bug of an allocator,
> which should be fair and should not continuously allocate new blocks
> without lazy purging
> (seems vmap_lazy_nr and  __purge_vmap_area_lazy were created exactly
> for those reasons:
>  to avoid infinite allocations)


And if you are referring to your commit 364376383, which adds this comment:
 * If you use this function for less than VMAP_MAX_ALLOC pages, it could be
 * faster than vmap so it's good.  But if you mix long-life and short-life
 * objects with vm_map_ram(), it could consume lots of address space through
 * fragmentation (especially on a 32bit machine).  You could see failures in
 * the end.  Please use this function for short-lived objects.

This is not that case, because if a block is pinned, i.e. some of its pages
are still in use, we can't do anything about it.

I am talking about blocks which are completely freed, but still dirty.
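The distinction can be put in code with a toy model (illustrative fields, not
the kernel's struct vmap_block; the block size of 256 pages is taken from the
example in the patch description):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a 256-page vmap block (illustrative names only,
 * not the kernel's struct vmap_block). */
struct block {
	unsigned free;  /* pages still available for mapping   */
	unsigned dirty; /* pages mapped once and then unmapped */
};

/* "Pinned": some pages are still mapped, nothing can be done. */
static bool pinned(const struct block *b)
{
	return b->free + b->dirty < 256;
}

/* Fully freed but dirty: every used page was unmapped again, so the
 * virtual range is dead weight until it is purged and reusable. */
static bool purgeable(const struct block *b)
{
	return !pinned(b) && b->dirty > 0;
}
```

A pinned block legitimately cannot be reclaimed; the complaint here is only
about blocks for which purgeable() would return true.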


--
Roman

>
>
> --
> Roman
>
>
>>
>>
>>>
>>> Fixing this I also did some tweaks in allocation logic of a new vmap block and
>>> replaced dirty bitmap with min/max dirty range values to make the logic simpler.
>>>
>>> I would like to receive comments on the following three patches.
>>>
>>> Thanks.
>>>
>>> Roman Pen (3):
>>>    mm/vmalloc: fix possible exhaustion of vmalloc space caused by
>>>      vm_map_ram allocator
>>>    mm/vmalloc: occupy newly allocated vmap block just after allocation
>>>    mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
>>>
>>>   mm/vmalloc.c | 94 ++++++++++++++++++++++++++++++++++--------------------------
>>>   1 file changed, 54 insertions(+), 40 deletions(-)
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Nick Piggin <npiggin@kernel.dk>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> Cc: David Rientjes <rientjes@google.com>
>>> Cc: WANG Chao <chaowang@redhat.com>
>>> Cc: Fabian Frederick <fabf@skynet.be>
>>> Cc: Christoph Lameter <cl@linux.com>
>>> Cc: Gioh Kim <gioh.kim@lge.com>
>>> Cc: Rob Jones <rob.jones@codethink.co.uk>
>>> Cc: linux-mm@kvack.org
>>> Cc: linux-kernel@vger.kernel.org
>>> Cc: stable@vger.kernel.org
>>>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org


* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-13 12:12   ` Roman Pen
@ 2015-03-17  4:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-17  4:56 UTC (permalink / raw)
  To: Roman Pen
  Cc: Andrew Morton, Nick Piggin, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, linux-mm, linux-kernel, stable

On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
> If suitable block can't be found, new block is allocated and put into a head
> of a free list, so on next iteration this new block will be found first.
> 
> That's bad, because old blocks in a free list will not get a chance to be fully
> used, thus fragmentation will grow.
> 
> Let's consider this simple example:
> 
>  #1 We have one block in a free list which is partially used, and where only
>     one page is free:
> 
>     HEAD |xxxxxxxxx-| TAIL
>                    ^
>                    free space for 1 page, order 0
> 
>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
>     since we do not have free space to complete this request. New block is put
>     into a head of a free list:
> 
>     HEAD |----------|xxxxxxxxx-| TAIL
> 
>  #3 Two pages were occupied in a new found block:
> 
>     HEAD |xx--------|xxxxxxxxx-| TAIL
>           ^
>           two pages mapped here
> 
>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
>     on #2 step, is located at the beginning of a free list, so it will be found
>     first:
> 
>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>           ^                 ^
>           page mapped here, but better to use this hole
> 
> It is obvious, that it is better to complete request of #4 step using the old
> block, where free space is left, because in other case fragmentation will be
> highly increased.
> 
> But fragmentation is not the only issue.  The worst thing is that I can
> easily create a scenario in which the whole vmalloc space is exhausted by
> blocks which are no longer used, but are already dirty and have several
> free pages.
> 
> Let's consider this function which execution should be pinned to one CPU:
> 
>  ------------------------------------------------------------------------------
> /* Here we consider that our block is equal to 1MB, thus 256 pages */
> static void exhaust_virtual_space(struct page *pages[256], int iters)
> {
> 	/* Firstly we have to map a big chunk, e.g. 16 pages.
> 	 * Then we have to occupy the remaining space with smaller
> 	 * chunks, i.e. 8 pages. At the end small hole should remain.
> 	 * So at the end of our allocation sequence block looks like
> 	 * this:
> 	 *                XX  big chunk
> 	 * |XXxxxxxxx-|    x  small chunk
> 	 *                 -  hole, which is enough for a small chunk,
> 	 *                    but not for a big chunk
> 	 */
> 	unsigned big_allocs   = 1;
> 	/* -1 for hole, which should be left at the end of each block
> 	 * to keep it partially used, with some free space available */
> 	unsigned small_allocs = (256 - 16) / 8 - 1;
> 	void    *vaddrs[big_allocs + small_allocs];
> 
> 	while (iters--) {
> 		int i = 0, j;
> 
> 		/* Map big chunk */
> 		vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
> 
> 		/* Map small chunks */
> 		for (j = 0; j < small_allocs; j++)
> 			vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
> 						 PAGE_KERNEL);
> 
> 		/* Unmap everything */
> 		while (i--)
> 			vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
> 	}
> }
>  ------------------------------------------------------------------------------
> 
> On every iteration new block (1MB of vm area in my case) will be allocated and
> then will be occupied, without attempt to resolve small allocation request
> using previously allocated blocks in a free list.
> 
> In current patch I simply put newly allocated block to the tail of a free list,
> thus reduce fragmentation, giving a chance to resolve allocation request using
> older blocks with possible holes left.

Hello,

I think that if you put the newly allocated block at the tail of the free
list, the example below would result in enormous performance degradation.

new block: 1MB (256 pages)

while (iters--) {
  vm_map_ram(3, or some other size that does not evenly divide 256) * 85
  vm_unmap_ram(3) * 85
}

On every iteration it needs a newly allocated block, and since that block is
put at the tail of the free list, finding it consumes a large amount of time.

Is there any other solution to prevent your problem?

Thanks.
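The cost concern in the loop above can be illustrated with a crude walk-count
model (an assumption for illustration, not a measurement of the real
allocator): each allocation walks the free list from the head until it finds a
block with room, and with tail placement every stale dirty block in front of
the fresh one is visited first.

```c
#include <assert.h>

/* Count list-walk steps for one iteration of the scenario:
 * 'allocs_per_iter' allocations, each scanning past 'stale_blocks'
 * unusable blocks when the only usable block sits at the tail. */
static unsigned long walk_steps(unsigned stale_blocks,
				unsigned allocs_per_iter,
				int fresh_at_tail)
{
	unsigned long steps = 0;
	unsigned i;

	for (i = 0; i < allocs_per_iter; i++)
		/* head placement: fresh block found in one step;
		 * tail placement: all stale blocks visited first. */
		steps += fresh_at_tail ? stale_blocks + 1UL : 1UL;
	return steps;
}
```

With, say, 100 stale blocks accumulated, the 85 allocations per iteration cost
8585 walk steps under tail placement versus 85 under head placement in this
model.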

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17  4:56     ` Joonsoo Kim
@ 2015-03-17  5:12       ` Roman Peniaev
  -1 siblings, 0 replies; 35+ messages in thread
From: Roman Peniaev @ 2015-03-17  5:12 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Nick Piggin, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, linux-mm, linux-kernel, stable

On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
>> If suitable block can't be found, new block is allocated and put into a head
>> of a free list, so on next iteration this new block will be found first.
>>
>> That's bad, because old blocks in a free list will not get a chance to be fully
>> used, thus fragmentation will grow.
>>
>> Let's consider this simple example:
>>
>>  #1 We have one block in a free list which is partially used, and where only
>>     one page is free:
>>
>>     HEAD |xxxxxxxxx-| TAIL
>>                    ^
>>                    free space for 1 page, order 0
>>
>>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
>>     since we do not have free space to complete this request. New block is put
>>     into a head of a free list:
>>
>>     HEAD |----------|xxxxxxxxx-| TAIL
>>
>>  #3 Two pages were occupied in a new found block:
>>
>>     HEAD |xx--------|xxxxxxxxx-| TAIL
>>           ^
>>           two pages mapped here
>>
>>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
>>     on #2 step, is located at the beginning of a free list, so it will be found
>>     first:
>>
>>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>>           ^                 ^
>>           page mapped here, but better to use this hole
>>
>> It is obvious, that it is better to complete request of #4 step using the old
>> block, where free space is left, because in other case fragmentation will be
>> highly increased.
>>
>> But fragmentation is not only the case.  The most worst thing is that I can
>> easily create scenario, when the whole vmalloc space is exhausted by blocks,
>> which are not used, but already dirty and have several free pages.
>>
>> Let's consider this function which execution should be pinned to one CPU:
>>
>>  ------------------------------------------------------------------------------
>> /* Here we consider that our block is equal to 1MB, thus 256 pages */
>> static void exhaust_virtual_space(struct page *pages[256], int iters)
>> {
>>       /* Firstly we have to map a big chunk, e.g. 16 pages.
>>        * Then we have to occupy the remaining space with smaller
>>        * chunks, i.e. 8 pages. At the end small hole should remain.
>>        * So at the end of our allocation sequence block looks like
>>        * this:
>>        *                XX  big chunk
>>        * |XXxxxxxxx-|    x  small chunk
>>        *                 -  hole, which is enough for a small chunk,
>>        *                    but not for a big chunk
>>        */
>>       unsigned big_allocs   = 1;
>>       /* -1 for hole, which should be left at the end of each block
>>        * to keep it partially used, with some free space available */
>>       unsigned small_allocs = (256 - 16) / 8 - 1;
>>       void    *vaddrs[big_allocs + small_allocs];
>>
>>       while (iters--) {
>>               int i = 0, j;
>>
>>               /* Map big chunk */
>>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
>>
>>               /* Map small chunks */
>>               for (j = 0; j < small_allocs; j++)
>>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
>>                                                PAGE_KERNEL);
>>
>>               /* Unmap everything */
>>               while (i--)
>>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
>>       }
>> }
>>  ------------------------------------------------------------------------------
>>
>> On every iteration new block (1MB of vm area in my case) will be allocated and
>> then will be occupied, without attempt to resolve small allocation request
>> using previously allocated blocks in a free list.
>>
>> In current patch I simply put newly allocated block to the tail of a free list,
>> thus reduce fragmentation, giving a chance to resolve allocation request using
>> older blocks with possible holes left.
>
> Hello,
>
> I think that if you put newly allocated block to the tail of a free
> list, below example would results in enormous performance degradation.
>
> new block: 1MB (256 pages)
>
> while (iters--) {
>   vm_map_ram(3 or something else not dividable for 256) * 85
>   vm_unmap_ram(3) * 85
> }
>
> On every iteration, it needs newly allocated block and it is put to the
> tail of a free list so finding it consumes large amount of time.
>
> Is there any other solution to prevent your problem?

Hello.

My second patch fixes this problem.
I occupy the block on allocation and avoid jumping to the search loop.

Also, the problem is much wider.  Since we allocate a block on one CPU, but
the search for a free block can be done on another CPU (preemption turned on),
allocation can happen again.  In the worst case, an allocation will happen for
each CPU available on the system.

This scenario should also be fixed by occupying the block on allocation.
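The "occupy on allocation" idea can be sketched in userland terms as follows
(hypothetical names and layout, not mm/vmalloc.c): the constructor reserves
the requested pages itself, so the caller returns immediately instead of
re-entering the free-list search, where another CPU could race it into
allocating yet another block.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGES_PER_BLOCK 256U

/* Toy vmap-block stand-in: a bump-pointer over 256 pages. */
struct blk {
	unsigned free;     /* pages still available   */
	unsigned next_off; /* first unused page index */
};

/* Allocate a block with the first request already satisfied:
 * the caller gets its page offset back and never re-searches. */
static struct blk *new_blk_and_occupy(unsigned npages, unsigned *page_off)
{
	struct blk *b;

	if (npages == 0 || npages > PAGES_PER_BLOCK)
		return NULL;
	b = malloc(sizeof(*b));
	if (!b)
		return NULL;
	b->free = PAGES_PER_BLOCK - npages; /* occupied right away */
	b->next_off = npages;
	*page_off = 0;                      /* request starts at page 0 */
	return b;
}
```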

--
Roman

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17  5:12       ` Roman Peniaev
@ 2015-03-17  7:29         ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-17  7:29 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Andrew Morton, Nick Piggin, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, linux-mm, linux-kernel, stable

On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote:
> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
> >> If suitable block can't be found, new block is allocated and put into a head
> >> of a free list, so on next iteration this new block will be found first.
> >>
> >> That's bad, because old blocks in a free list will not get a chance to be fully
> >> used, thus fragmentation will grow.
> >>
> >> Let's consider this simple example:
> >>
> >>  #1 We have one block in a free list which is partially used, and where only
> >>     one page is free:
> >>
> >>     HEAD |xxxxxxxxx-| TAIL
> >>                    ^
> >>                    free space for 1 page, order 0
> >>
> >>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
> >>     since we do not have free space to complete this request. New block is put
> >>     into a head of a free list:
> >>
> >>     HEAD |----------|xxxxxxxxx-| TAIL
> >>
> >>  #3 Two pages were occupied in a new found block:
> >>
> >>     HEAD |xx--------|xxxxxxxxx-| TAIL
> >>           ^
> >>           two pages mapped here
> >>
> >>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
> >>     on #2 step, is located at the beginning of a free list, so it will be found
> >>     first:
> >>
> >>   HEAD |xxX-------|xxxxxxxxx-| TAIL
> >>           ^                 ^
> >>           page mapped here, but better to use this hole
> >>
> >> It is obvious, that it is better to complete request of #4 step using the old
> >> block, where free space is left, because in other case fragmentation will be
> >> highly increased.
> >>
> >> But fragmentation is not only the case.  The most worst thing is that I can
> >> easily create scenario, when the whole vmalloc space is exhausted by blocks,
> >> which are not used, but already dirty and have several free pages.
> >>
> >> Let's consider this function which execution should be pinned to one CPU:
> >>
> >>  ------------------------------------------------------------------------------
> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */
> >> static void exhaust_virtual_space(struct page *pages[256], int iters)
> >> {
> >>       /* Firstly we have to map a big chunk, e.g. 16 pages.
> >>        * Then we have to occupy the remaining space with smaller
> >>        * chunks, i.e. 8 pages. At the end small hole should remain.
> >>        * So at the end of our allocation sequence block looks like
> >>        * this:
> >>        *                XX  big chunk
> >>        * |XXxxxxxxx-|    x  small chunk
> >>        *                 -  hole, which is enough for a small chunk,
> >>        *                    but not for a big chunk
> >>        */
> >>       unsigned big_allocs   = 1;
> >>       /* -1 for hole, which should be left at the end of each block
> >>        * to keep it partially used, with some free space available */
> >>       unsigned small_allocs = (256 - 16) / 8 - 1;
> >>       void    *vaddrs[big_allocs + small_allocs];
> >>
> >>       while (iters--) {
> >>               int i = 0, j;
> >>
> >>               /* Map big chunk */
> >>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
> >>
> >>               /* Map small chunks */
> >>               for (j = 0; j < small_allocs; j++)
> >>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
> >>                                                PAGE_KERNEL);
> >>
> >>               /* Unmap everything */
> >>               while (i--)
> >>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
> >>       }
> >> }
> >>  ------------------------------------------------------------------------------
> >>
> >> On every iteration new block (1MB of vm area in my case) will be allocated and
> >> then will be occupied, without attempt to resolve small allocation request
> >> using previously allocated blocks in a free list.
> >>
> >> In current patch I simply put newly allocated block to the tail of a free list,
> >> thus reduce fragmentation, giving a chance to resolve allocation request using
> >> older blocks with possible holes left.
> >
> > Hello,
> >
> > I think that if you put newly allocated block to the tail of a free
> > list, below example would results in enormous performance degradation.
> >
> > new block: 1MB (256 pages)
> >
> > while (iters--) {
> >   vm_map_ram(3 or something else not dividable for 256) * 85
> >   vm_unmap_ram(3) * 85
> > }
> >
> > On every iteration, it needs newly allocated block and it is put to the
> > tail of a free list so finding it consumes large amount of time.
> >
> > Is there any other solution to prevent your problem?
> 
> Hello.
> 
> My second patch fixes this problem.
> I occupy the block on allocation and avoid jumping to the search loop.

I'm not sure that this fixes the above case.
'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.

The first vm_map_ram(3) caller could benefit from your second patch.
But the second caller, and the other callers in each iteration, could not
benefit: they would have to iterate the whole list to find a suitable free
block, because this free block is put at the tail of the list. Am I missing
something?

Thanks.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
@ 2015-03-17  7:29         ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-17  7:29 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Andrew Morton, Nick Piggin, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, linux-mm, linux-kernel, stable

On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote:
> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
> >> If suitable block can't be found, new block is allocated and put into a head
> >> of a free list, so on next iteration this new block will be found first.
> >>
> >> That's bad, because old blocks in a free list will not get a chance to be fully
> >> used, thus fragmentation will grow.
> >>
> >> Let's consider this simple example:
> >>
> >>  #1 We have one block in a free list which is partially used, and where only
> >>     one page is free:
> >>
> >>     HEAD |xxxxxxxxx-| TAIL
> >>                    ^
> >>                    free space for 1 page, order 0
> >>
> >>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
> >>     since we do not have free space to complete this request. New block is put
> >>     into a head of a free list:
> >>
> >>     HEAD |----------|xxxxxxxxx-| TAIL
> >>
> >>  #3 Two pages were occupied in a new found block:
> >>
> >>     HEAD |xx--------|xxxxxxxxx-| TAIL
> >>           ^
> >>           two pages mapped here
> >>
> >>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
> >>     on #2 step, is located at the beginning of a free list, so it will be found
> >>     first:
> >>
> >>   HEAD |xxX-------|xxxxxxxxx-| TAIL
> >>           ^                 ^
> >>           page mapped here, but better to use this hole
> >>
> >> It is obvious, that it is better to complete request of #4 step using the old
> >> block, where free space is left, because in other case fragmentation will be
> >> highly increased.
> >>
> >> But fragmentation is not only the case.  The most worst thing is that I can
> >> easily create scenario, when the whole vmalloc space is exhausted by blocks,
> >> which are not used, but already dirty and have several free pages.
> >>
> >> Let's consider this function which execution should be pinned to one CPU:
> >>
> >>  ------------------------------------------------------------------------------
> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */
> >> static void exhaust_virtual_space(struct page *pages[256], int iters)
> >> {
> >>       /* Firstly we have to map a big chunk, e.g. 16 pages.
> >>        * Then we have to occupy the remaining space with smaller
> >>        * chunks, i.e. 8 pages. At the end small hole should remain.
> >>        * So at the end of our allocation sequence block looks like
> >>        * this:
> >>        *                XX  big chunk
> >>        * |XXxxxxxxx-|    x  small chunk
> >>        *                 -  hole, which is enough for a small chunk,
> >>        *                    but not for a big chunk
> >>        */
> >>       unsigned big_allocs   = 1;
> >>       /* -1 for hole, which should be left at the end of each block
> >>        * to keep it partially used, with some free space available */
> >>       unsigned small_allocs = (256 - 16) / 8 - 1;
> >>       void    *vaddrs[big_allocs + small_allocs];
> >>
> >>       while (iters--) {
> >>               int i = 0, j;
> >>
> >>               /* Map big chunk */
> >>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
> >>
> >>               /* Map small chunks */
> >>               for (j = 0; j < small_allocs; j++)
> >>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
> >>                                                PAGE_KERNEL);
> >>
> >>               /* Unmap everything */
> >>               while (i--)
> >>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
> >>       }
> >> }
> >>  ------------------------------------------------------------------------------
> >>
> >> On every iteration a new block (1MB of vm area in my case) will be
> >> allocated and then occupied, without any attempt to resolve the small
> >> allocation request using previously allocated blocks in the free list.
> >>
> >> In the current patch I simply put the newly allocated block at the tail of
> >> the free list, thus reducing fragmentation and giving allocation requests a
> >> chance to be resolved from older blocks with possible holes left.
> >
> > Hello,
> >
> > I think that if you put a newly allocated block at the tail of a free
> > list, the example below would result in enormous performance degradation.
> >
> > new block: 1MB (256 pages)
> >
> > while (iters--) {
> >   vm_map_ram(3, or something else that does not divide 256 evenly) * 85
> >   vm_unmap_ram(3) * 85
> > }
> >
> > On every iteration, a newly allocated block is needed, and it is put at
> > the tail of a free list, so finding it consumes a large amount of time.
> >
> > Is there any other solution to prevent your problem?
> 
> Hello.
> 
> My second patch fixes this problem.
> I occupy the block on allocation and avoid jumping to the search loop.

I'm not sure that this fixes the above case.
'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.

The first vm_map_ram(3) caller could benefit from your second patch.
But the second caller and the other callers in each iteration could not
benefit, and would have to iterate the whole list to find a suitable free
block, because this free block is put at the tail of the list. Am I missing
something?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17  7:29         ` Joonsoo Kim
@ 2015-03-17  8:22           ` Roman Peniaev
  0 siblings, 0 replies; 35+ messages in thread
From: Roman Peniaev @ 2015-03-17  8:22 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, Eric Dumazet, David Rientjes, WANG Chao,
	Fabian Frederick, Christoph Lameter, Gioh Kim, Rob Jones,
	linux-mm, linux-kernel, stable

On Tue, Mar 17, 2015 at 4:29 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
> On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote:
>> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
>> >> If a suitable block can't be found, a new block is allocated and put at the
>> >> head of the free list, so on the next iteration this new block will be
>> >> found first.
>> >>
>> >> That's bad, because old blocks in the free list never get a chance to be
>> >> fully used, so fragmentation grows.
>> >>
>> >> Let's consider this simple example:
>> >>
>> >>  #1 We have one block in a free list which is partially used, and where only
>> >>     one page is free:
>> >>
>> >>     HEAD |xxxxxxxxx-| TAIL
>> >>                    ^
>> >>                    free space for 1 page, order 0
>> >>
>> >>  #2 A new allocation request of order 1 (2 pages) comes, and a new block is
>> >>     allocated since we do not have free space to complete this request. The
>> >>     new block is put at the head of the free list:
>> >>
>> >>     HEAD |----------|xxxxxxxxx-| TAIL
>> >>
>> >>  #3 Two pages were occupied in the newly found block:
>> >>
>> >>     HEAD |xx--------|xxxxxxxxx-| TAIL
>> >>           ^
>> >>           two pages mapped here
>> >>
>> >>  #4 A new allocation request of order 0 (1 page) comes.  The block which was
>> >>     created in step #2 is located at the beginning of the free list, so it
>> >>     will be found first:
>> >>
>> >>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>> >>           ^                 ^
>> >>           page mapped here, but better to use this hole
>> >>
>> >> It is obvious that it is better to complete the request of step #4 using
>> >> the old block, where free space is left, because otherwise fragmentation
>> >> will increase significantly.
>> >>
>> >> But fragmentation is not the only problem.  The worst thing is that I can
>> >> easily create a scenario in which the whole vmalloc space is exhausted by
>> >> blocks which are not used, but are already dirty and have several free pages.
>> >>
>> >> Let's consider this function, whose execution should be pinned to one CPU:
>> >>
>> >>  ------------------------------------------------------------------------------
>> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */
>> >> static void exhaust_virtual_space(struct page *pages[256], int iters)
>> >> {
>> >>       /* Firstly we have to map a big chunk, e.g. 16 pages.
>> >>        * Then we have to occupy the remaining space with smaller
>> >>        * chunks, i.e. 8 pages. At the end small hole should remain.
>> >>        * So at the end of our allocation sequence block looks like
>> >>        * this:
>> >>        *                XX  big chunk
>> >>        * |XXxxxxxxx-|    x  small chunk
>> >>        *                 -  hole, which is enough for a small chunk,
>> >>        *                    but not for a big chunk
>> >>        */
>> >>       unsigned big_allocs   = 1;
>> >>       /* -1 for hole, which should be left at the end of each block
>> >>        * to keep it partially used, with some free space available */
>> >>       unsigned small_allocs = (256 - 16) / 8 - 1;
>> >>       void    *vaddrs[big_allocs + small_allocs];
>> >>
>> >>       while (iters--) {
>> >>               int i = 0, j;
>> >>
>> >>               /* Map big chunk */
>> >>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
>> >>
>> >>               /* Map small chunks */
>> >>               for (j = 0; j < small_allocs; j++)
>> >>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
>> >>                                                PAGE_KERNEL);
>> >>
>> >>               /* Unmap everything */
>> >>               while (i--)
>> >>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
>> >>       }
>> >> }
>> >>  ------------------------------------------------------------------------------
>> >>
>> >> On every iteration a new block (1MB of vm area in my case) will be
>> >> allocated and then occupied, without any attempt to resolve the small
>> >> allocation request using previously allocated blocks in the free list.
>> >>
>> >> In the current patch I simply put the newly allocated block at the tail of
>> >> the free list, thus reducing fragmentation and giving allocation requests a
>> >> chance to be resolved from older blocks with possible holes left.
>> >
>> > Hello,
>> >
>> > I think that if you put a newly allocated block at the tail of a free
>> > list, the example below would result in enormous performance degradation.
>> >
>> > new block: 1MB (256 pages)
>> >
>> > while (iters--) {
>> >   vm_map_ram(3, or something else that does not divide 256 evenly) * 85
>> >   vm_unmap_ram(3) * 85
>> > }
>> >
>> > On every iteration, a newly allocated block is needed, and it is put at
>> > the tail of a free list, so finding it consumes a large amount of time.
>> >
>> > Is there any other solution to prevent your problem?
>>
>> Hello.
>>
>> My second patch fixes this problem.
>> I occupy the block on allocation and avoid jumping to the search loop.
>
> I'm not sure that this fixes the above case.
> 'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.
>
> The first vm_map_ram(3) caller could benefit from your second patch.
> But the second caller and the other callers in each iteration could not
> benefit, and would have to iterate the whole list to find a suitable free
> block, because this free block is put at the tail of the list. Am I missing
> something?

You are missing the fact that we occupy blocks in 2^n chunks.
So in your example 4 page slots will be occupied (order 2), not 3.

The maximum allocation size is 32 pages on a 32-bit system
(if you try to map more, the original alloc_vmap_area() will be called).

So the maximum order is 5.  That means the worst case, before we make the
decision to allocate a new block, is to iterate over 6 blocks:

HEAD
1st block - has 1  page slot  free (order 0)
2nd block - has 2  page slots free (order 1)
3rd block - has 4  page slots free (order 2)
4th block - has 8  page slots free (order 3)
5th block - has 16 page slots free (order 4)
6th block - has 32 page slots free (order 5)
TAIL

So the worst scenario is that each CPU queue can have 6 blocks in its free list.

This can happen if and only if you allocate blocks with increasing order
(as I did in the function in the comment of the first patch).
This is a weird and rare case, but it is still possible.
Afterwards you will have 6 blocks in the list.

All further requests will either be placed in a newly allocated block, or
some free slots will be found in the free list.  That does not seem
dramatically awful.


--
Roman

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17  8:22           ` Roman Peniaev
@ 2015-03-17 21:58             ` Andrew Morton
  0 siblings, 0 replies; 35+ messages in thread
From: Andrew Morton @ 2015-03-17 21:58 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Joonsoo Kim, Eric Dumazet, David Rientjes, WANG Chao,
	Fabian Frederick, Christoph Lameter, Gioh Kim, Rob Jones,
	linux-mm, linux-kernel, stable

On Tue, 17 Mar 2015 17:22:46 +0900 Roman Peniaev <r.peniaev@gmail.com> wrote:

> >> My second patch fixes this problem.
> >> I occupy the block on allocation and avoid jumping to the search loop.
> >
> > I'm not sure that this fixes the above case.
> > 'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.
> >
> > The first vm_map_ram(3) caller could benefit from your second patch.
> > But the second caller and the other callers in each iteration could not
> > benefit, and would have to iterate the whole list to find a suitable free
> > block, because this free block is put at the tail of the list. Am I missing
> > something?
> 
> You are missing the fact that we occupy blocks in 2^n chunks.
> So in your example 4 page slots will be occupied (order 2), not 3.

Could you please

- update the changelogs so they answer the questions which Joonsoo
  Kim and Gioh Kim asked

- write a little in-kernel benchmark to test the scenario which
  Joonsoo described and include the before and after timing results in
  the changelogs

- resend the patchset

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17  8:22           ` Roman Peniaev
@ 2015-03-18  5:05             ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-18  5:05 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Joonsoo Kim, Andrew Morton, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, Linux Memory Management List, linux-kernel, stable

2015-03-17 17:22 GMT+09:00 Roman Peniaev <r.peniaev@gmail.com>:
> On Tue, Mar 17, 2015 at 4:29 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote:
>>> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
>>> >> If a suitable block can't be found, a new block is allocated and put at the
>>> >> head of the free list, so on the next iteration this new block will be
>>> >> found first.
>>> >>
>>> >> That's bad, because old blocks in the free list never get a chance to be
>>> >> fully used, so fragmentation grows.
>>> >>
>>> >> Let's consider this simple example:
>>> >>
>>> >>  #1 We have one block in a free list which is partially used, and where only
>>> >>     one page is free:
>>> >>
>>> >>     HEAD |xxxxxxxxx-| TAIL
>>> >>                    ^
>>> >>                    free space for 1 page, order 0
>>> >>
>>> >>  #2 A new allocation request of order 1 (2 pages) comes, and a new block is
>>> >>     allocated since we do not have free space to complete this request. The
>>> >>     new block is put at the head of the free list:
>>> >>
>>> >>     HEAD |----------|xxxxxxxxx-| TAIL
>>> >>
>>> >>  #3 Two pages were occupied in the newly found block:
>>> >>
>>> >>     HEAD |xx--------|xxxxxxxxx-| TAIL
>>> >>           ^
>>> >>           two pages mapped here
>>> >>
>>> >>  #4 A new allocation request of order 0 (1 page) comes.  The block which was
>>> >>     created in step #2 is located at the beginning of the free list, so it
>>> >>     will be found first:
>>> >>
>>> >>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>>> >>           ^                 ^
>>> >>           page mapped here, but better to use this hole
>>> >>
>>> >> It is obvious that it is better to complete the request of step #4 using
>>> >> the old block, where free space is left, because otherwise fragmentation
>>> >> will increase significantly.
>>> >>
>>> >> But fragmentation is not the only problem.  The worst thing is that I can
>>> >> easily create a scenario in which the whole vmalloc space is exhausted by
>>> >> blocks which are not used, but are already dirty and have several free pages.
>>> >>
>>> >> Let's consider this function, whose execution should be pinned to one CPU:
>>> >>
>>> >>  ------------------------------------------------------------------------------
>>> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */
>>> >> static void exhaust_virtual_space(struct page *pages[256], int iters)
>>> >> {
>>> >>       /* Firstly we have to map a big chunk, e.g. 16 pages.
>>> >>        * Then we have to occupy the remaining space with smaller
>>> >>        * chunks, i.e. 8 pages. At the end small hole should remain.
>>> >>        * So at the end of our allocation sequence block looks like
>>> >>        * this:
>>> >>        *                XX  big chunk
>>> >>        * |XXxxxxxxx-|    x  small chunk
>>> >>        *                 -  hole, which is enough for a small chunk,
>>> >>        *                    but not for a big chunk
>>> >>        */
>>> >>       unsigned big_allocs   = 1;
>>> >>       /* -1 for hole, which should be left at the end of each block
>>> >>        * to keep it partially used, with some free space available */
>>> >>       unsigned small_allocs = (256 - 16) / 8 - 1;
>>> >>       void    *vaddrs[big_allocs + small_allocs];
>>> >>
>>> >>       while (iters--) {
>>> >>               int i = 0, j;
>>> >>
>>> >>               /* Map big chunk */
>>> >>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
>>> >>
>>> >>               /* Map small chunks */
>>> >>               for (j = 0; j < small_allocs; j++)
>>> >>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
>>> >>                                                PAGE_KERNEL);
>>> >>
>>> >>               /* Unmap everything */
>>> >>               while (i--)
>>> >>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
>>> >>       }
>>> >> }
>>> >>  ------------------------------------------------------------------------------
>>> >>
>>> >> On every iteration a new block (1MB of vm area in my case) will be
>>> >> allocated and then occupied, without any attempt to resolve the small
>>> >> allocation request using previously allocated blocks in the free list.
>>> >>
>>> >> In the current patch I simply put the newly allocated block at the tail of
>>> >> the free list, thus reducing fragmentation and giving allocation requests a
>>> >> chance to be resolved from older blocks with possible holes left.
>>> >
>>> > Hello,
>>> >
>>> > I think that if you put a newly allocated block at the tail of a free
>>> > list, the example below would result in enormous performance degradation.
>>> >
>>> > new block: 1MB (256 pages)
>>> >
>>> > while (iters--) {
>>> >   vm_map_ram(3, or something else that does not divide 256 evenly) * 85
>>> >   vm_unmap_ram(3) * 85
>>> > }
>>> >
>>> > On every iteration, a newly allocated block is needed, and it is put at
>>> > the tail of a free list, so finding it consumes a large amount of time.
>>> >
>>> > Is there any other solution to prevent your problem?
>>>
>>> Hello.
>>>
>>> My second patch fixes this problem.
>>> I occupy the block on allocation and avoid jumping to the search loop.
>>
>> I'm not sure that this fixes the above case.
>> 'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.
>>
>> The first vm_map_ram(3) caller could benefit from your second patch.
>> But the second caller and the other callers in each iteration could not
>> benefit, and would have to iterate the whole list to find a suitable free
>> block, because this free block is put at the tail of the list. Am I missing
>> something?
>
> You are missing the fact that we occupy blocks in 2^n chunks.
> So in your example 4 page slots will be occupied (order 2), not 3.

Ah... Okay. I understand now.

> The maximum allocation size is 32 pages on a 32-bit system
> (if you try to map more, the original alloc_vmap_area() will be called).
>
> So the maximum order is 5.  That means the worst case, before we make the
> decision to allocate a new block, is to iterate over 6 blocks:
>
> HEAD
> 1st block - has 1  page slot  free (order 0)
> 2nd block - has 2  page slots free (order 1)
> 3rd block - has 4  page slots free (order 2)
> 4th block - has 8  page slots free (order 3)
> 5th block - has 16 page slots free (order 4)
> 6th block - has 32 page slots free (order 5)
> TAIL
>
> So the worst scenario is that each CPU queue can have 6 blocks in its free list.

Okay.

> This can happen if and only if you allocate blocks with increasing order
> (as I did in the function in the comment of the first patch).
> This is a weird and rare case, but it is still possible.
> Afterwards you will have 6 blocks in the list.
>
> All further requests will either be placed in a newly allocated block, or
> some free slots will be found in the free list.  That does not seem
> dramatically awful.

I think so, too.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
@ 2015-03-18  5:05             ` Joonsoo Kim
  0 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-18  5:05 UTC (permalink / raw)
  To: Roman Peniaev
  Cc: Joonsoo Kim, Andrew Morton, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, Linux Memory Management List, linux-kernel, stable

2015-03-17 17:22 GMT+09:00 Roman Peniaev <r.peniaev@gmail.com>:
> On Tue, Mar 17, 2015 at 4:29 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>> On Tue, Mar 17, 2015 at 02:12:14PM +0900, Roman Peniaev wrote:
>>> On Tue, Mar 17, 2015 at 1:56 PM, Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:
>>> > On Fri, Mar 13, 2015 at 09:12:55PM +0900, Roman Pen wrote:
>>> >> If suitable block can't be found, new block is allocated and put into a head
>>> >> of a free list, so on next iteration this new block will be found first.
>>> >>
>>> >> That's bad, because old blocks in a free list will not get a chance to be fully
>>> >> used, thus fragmentation will grow.
>>> >>
>>> >> Let's consider this simple example:
>>> >>
>>> >>  #1 We have one block in a free list which is partially used, and where only
>>> >>     one page is free:
>>> >>
>>> >>     HEAD |xxxxxxxxx-| TAIL
>>> >>                    ^
>>> >>                    free space for 1 page, order 0
>>> >>
>>> >>  #2 New allocation request of order 1 (2 pages) comes, new block is allocated
>>> >>     since we do not have free space to complete this request. New block is put
>>> >>     into a head of a free list:
>>> >>
>>> >>     HEAD |----------|xxxxxxxxx-| TAIL
>>> >>
>>> >>  #3 Two pages were occupied in a new found block:
>>> >>
>>> >>     HEAD |xx--------|xxxxxxxxx-| TAIL
>>> >>           ^
>>> >>           two pages mapped here
>>> >>
>>> >>  #4 New allocation request of order 0 (1 page) comes.  Block, which was created
>>> >>     on #2 step, is located at the beginning of a free list, so it will be found
>>> >>     first:
>>> >>
>>> >>   HEAD |xxX-------|xxxxxxxxx-| TAIL
>>> >>           ^                 ^
>>> >>           page mapped here, but better to use this hole
>>> >>
>>> >> It is obvious, that it is better to complete request of #4 step using the old
>>> >> block, where free space is left, because in other case fragmentation will be
>>> >> highly increased.
>>> >>
>>> >> But fragmentation is not only the case.  The most worst thing is that I can
>>> >> easily create scenario, when the whole vmalloc space is exhausted by blocks,
>>> >> which are not used, but already dirty and have several free pages.
>>> >>
>>> >> Let's consider this function which execution should be pinned to one CPU:
>>> >>
>>> >>  ------------------------------------------------------------------------------
>>> >> /* Here we consider that our block is equal to 1MB, thus 256 pages */
>>> >> static void exhaust_virtual_space(struct page *pages[256], int iters)
>>> >> {
>>> >>       /* Firstly we have to map a big chunk, e.g. 16 pages.
>>> >>        * Then we have to occupy the remaining space with smaller
>>> >>        * chunks, i.e. 8 pages. At the end small hole should remain.
>>> >>        * So at the end of our allocation sequence block looks like
>>> >>        * this:
>>> >>        *                XX  big chunk
>>> >>        * |XXxxxxxxx-|    x  small chunk
>>> >>        *                 -  hole, which is enough for a small chunk,
>>> >>        *                    but not for a big chunk
>>> >>        */
>>> >>       unsigned big_allocs   = 1;
>>> >>       /* -1 for hole, which should be left at the end of each block
>>> >>        * to keep it partially used, with some free space available */
>>> >>       unsigned small_allocs = (256 - 16) / 8 - 1;
>>> >>       void    *vaddrs[big_allocs + small_allocs];
>>> >>
>>> >>       while (iters--) {
>>> >>               int i = 0, j;
>>> >>
>>> >>               /* Map big chunk */
>>> >>               vaddrs[i++] = vm_map_ram(pages, 16, -1, PAGE_KERNEL);
>>> >>
>>> >>               /* Map small chunks */
>>> >>               for (j = 0; j < small_allocs; j++)
>>> >>                       vaddrs[i++] = vm_map_ram(pages + 16 + j * 8, 8, -1,
>>> >>                                                PAGE_KERNEL);
>>> >>
>>> >>               /* Unmap everything */
>>> >>               while (i--)
>>> >>                       vm_unmap_ram(vaddrs[i], (i ? 8 : 16));
>>> >>       }
>>> >> }
>>> >>  ------------------------------------------------------------------------------
>>> >>
>>> >> On every iteration a new block (1MB of vm area in my case) is allocated
>>> >> and then occupied, without any attempt to satisfy the small allocation
>>> >> requests from previously allocated blocks in the free list.
>>> >>
>>> >> In the current patch I simply put the newly allocated block at the tail
>>> >> of the free list, which reduces fragmentation by giving older blocks with
>>> >> holes left a chance to satisfy allocation requests.
>>> >
>>> > Hello,
>>> >
>>> > I think that if you put the newly allocated block at the tail of the free
>>> > list, the example below would result in enormous performance degradation.
>>> >
>>> > new block: 1MB (256 pages)
>>> >
>>> > while (iters--) {
>>> >   vm_map_ram(3, or some other count that does not divide 256 evenly) * 85
>>> >   vm_unmap_ram(3) * 85
>>> > }
>>> >
>>> > On every iteration it needs a newly allocated block, which is put at the
>>> > tail of the free list, so finding it consumes a large amount of time.
>>> >
>>> > Is there another solution that prevents your problem?
>>>
>>> Hello.
>>>
>>> My second patch fixes this problem:
>>> I occupy the block on allocation and avoid jumping back to the search loop.
>>
>> I'm not sure that this fixes the above case.
>> 'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.
>>
>> The first vm_map_ram(3) caller could get the benefit of your second patch.
>> But the second caller, and the other callers in each iteration, could not,
>> and would have to iterate the whole list to find a suitable free block,
>> because this free block is put at the tail of the list.  Am I missing
>> something?
>
> You are missing the fact that we occupy block space in chunks of 2^n pages.
> So in your example 4 page slots will be occupied (the order is 2), not 3.

Ah... Okay. I understand now.

> The maximum allocation size is 32 pages on a 32-bit system
> (if you try to map more, the original alloc_vmap_area will be called).
>
> So the maximum order is 5.  That means the worst case, before we make the
> decision to allocate a new block, is iterating over 6 blocks:
>
> HEAD
> 1st block - has 1  page slot  free (order 0)
> 2nd block - has 2  page slots free (order 1)
> 3rd block - has 4  page slots free (order 2)
> 4th block - has 8  page slots free (order 3)
> 5th block - has 16 page slots free (order 4)
> 6th block - has 32 page slots free (order 5)
> TAIL
>
> So the worst scenario is that each CPU queue can have 6 blocks in a free list.

Okay.

> This can happen if and only if you allocate blocks in increasing order
> (as I did in the function written in the comment of the first patch).
> This is a weird and rare case, but it is still possible.
> Afterwards you will get 6 blocks in the list.
>
> All further requests will either be placed in a newly allocated block or be
> satisfied by free slots found in the free list.  That does not look
> dramatically awful.

I think so, too.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator
  2015-03-17 21:58             ` Andrew Morton
@ 2015-03-18  5:07               ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-18  5:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Roman Peniaev, Joonsoo Kim, Eric Dumazet, David Rientjes,
	WANG Chao, Fabian Frederick, Christoph Lameter, Gioh Kim,
	Rob Jones, Linux Memory Management List, linux-kernel, stable

2015-03-18 6:58 GMT+09:00 Andrew Morton <akpm@linux-foundation.org>:
> On Tue, 17 Mar 2015 17:22:46 +0900 Roman Peniaev <r.peniaev@gmail.com> wrote:
>
>> >> My second patch fixes this problem:
>> >> I occupy the block on allocation and avoid jumping back to the search loop.
>> >
>> > I'm not sure that this fixes the above case.
>> > 'vm_map_ram(3) * 85' means 85 vm_map_ram() calls.
>> >
>> > The first vm_map_ram(3) caller could get the benefit of your second patch.
>> > But the second caller, and the other callers in each iteration, could not,
>> > and would have to iterate the whole list to find a suitable free block,
>> > because this free block is put at the tail of the list.  Am I missing
>> > something?
>>
>> You are missing the fact that we occupy block space in chunks of 2^n pages.
>> So in your example 4 page slots will be occupied (the order is 2), not 3.
>
> Could you please
>
> - update the changelogs so they answer the questions which Joonsoo
>   Kim and Gioh Kim asked
>
> - write a little in-kernel benchmark to test the scenario which
>   Joonsoo described and include the before and after timing results in
>   the changelogs

I misunderstood before; my scenario isn't possible.  So I'm fine with the
current patch.

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: occupy newly allocated vmap block just after allocation
  2015-03-13 12:12   ` Roman Pen
@ 2015-03-18  5:51     ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-18  5:51 UTC (permalink / raw)
  To: Roman Pen
  Cc: Nick Piggin, Andrew Morton, Eric Dumazet, Joonsoo Kim,
	David Rientjes, WANG Chao, Fabian Frederick, Christoph Lameter,
	Gioh Kim, Rob Jones, Linux Memory Management List, LKML

2015-03-13 21:12 GMT+09:00 Roman Pen <r.peniaev@gmail.com>:
> The previous implementation allocates a new vmap block and then repeats the
> search for a free block from the very beginning, iterating over the CPU free
> list.
>
> Why is occupying the new block right away better?
>
> 1. Allocation can happen on one CPU, but the search can be done on another
>    CPU.  In the worst case we preallocate as many vmap blocks as there are
>    CPUs on the system.
>
> 2. In the previous patch I added the newly allocated block to the tail of
>    the free list, to avoid premature exhaustion of virtual space and to give
>    blocks which were allocated long ago a chance to be occupied.  Thus, to
>    find the newly allocated block, the whole search sequence would have to
>    be repeated, which is not efficient.
>
> In this patch the newly allocated block is occupied right away and the
> address of the virtual space is returned to the caller, so there is no need
> to repeat the search sequence: the allocation job is done.
>
> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: WANG Chao <chaowang@redhat.com>
> Cc: Fabian Frederick <fabf@skynet.be>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Gioh Kim <gioh.kim@lge.com>
> Cc: Rob Jones <rob.jones@codethink.co.uk>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/vmalloc.c | 57 ++++++++++++++++++++++++++++++++++++---------------------
>  1 file changed, 36 insertions(+), 21 deletions(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index db6bffb..9759c92 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -791,13 +791,30 @@ static unsigned long addr_to_vb_idx(unsigned long addr)
>         return addr;
>  }
>
> -static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
> +static void *vmap_block_vaddr(unsigned long va_start, unsigned long pages_off)
> +{
> +       unsigned long addr = va_start + (pages_off << PAGE_SHIFT);
> +       BUG_ON(addr_to_vb_idx(addr) != addr_to_vb_idx(va_start));

A blank line is needed between the above two lines.
Please run scripts/checkpatch.pl.

> +       return (void *)addr;
> +}
> +
> +/**
> + * new_vmap_block - allocates new vmap_block and occupies 2^order pages in this
> + *                  block. Of course pages number can't exceed VMAP_BBMAP_BITS
> + * @order:    how many 2^order pages should be occupied in newly allocated block
> + * @gfp_mask: flags for the page level allocator
> + * @addr:     output virtual address of a newly allocator block
> + *
> + * Returns: address of virtual space in a block or ERR_PTR
> + */
> +static void *new_vmap_block(unsigned int order, gfp_t gfp_mask)
>  {
>         struct vmap_block_queue *vbq;
>         struct vmap_block *vb;
>         struct vmap_area *va;
>         unsigned long vb_idx;
>         int node, err;
> +       void *vaddr;
>
>         node = numa_node_id();
>
> @@ -821,9 +838,12 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
>                 return ERR_PTR(err);
>         }
>
> +       vaddr = vmap_block_vaddr(va->va_start, 0);
>         spin_lock_init(&vb->lock);
>         vb->va = va;
> -       vb->free = VMAP_BBMAP_BITS;
> +       /* At least something should be left free */
> +       BUG_ON(VMAP_BBMAP_BITS <= (1UL << order));
> +       vb->free = VMAP_BBMAP_BITS - (1UL << order);
>         vb->dirty = 0;
>         bitmap_zero(vb->dirty_map, VMAP_BBMAP_BITS);
>         INIT_LIST_HEAD(&vb->free_list);
> @@ -841,7 +861,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
>         spin_unlock(&vbq->lock);
>         put_cpu_var(vmap_block_queue);
>
> -       return vb;
> +       return vaddr;
>  }
>
>  static void free_vmap_block(struct vmap_block *vb)
> @@ -905,7 +925,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
>  {
>         struct vmap_block_queue *vbq;
>         struct vmap_block *vb;
> -       unsigned long addr = 0;
> +       void *vaddr = NULL;
>         unsigned int order;
>
>         BUG_ON(size & ~PAGE_MASK);
> @@ -920,43 +940,38 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
>         }
>         order = get_order(size);
>
> -again:
>         rcu_read_lock();
>         vbq = &get_cpu_var(vmap_block_queue);
>         list_for_each_entry_rcu(vb, &vbq->free, free_list) {
> -               int i;
> +               unsigned long pages_nr;

I think that pages_off would be a better name.  Is there any reason for this naming?

Anyway, patch looks okay to me.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: get rid of dirty bitmap inside vmap_block structure
  2015-03-13 12:12   ` Roman Pen
@ 2015-03-18  5:52     ` Joonsoo Kim
  -1 siblings, 0 replies; 35+ messages in thread
From: Joonsoo Kim @ 2015-03-18  5:52 UTC (permalink / raw)
  To: Roman Pen
  Cc: Nick Piggin, Zhang Yanfei, Andrew Morton, Eric Dumazet,
	Joonsoo Kim, David Rientjes, WANG Chao, Fabian Frederick,
	Christoph Lameter, Gioh Kim, Rob Jones,
	Linux Memory Management List, LKML

2015-03-13 21:12 GMT+09:00 Roman Pen <r.peniaev@gmail.com>:
> In the original implementation of vm_map_ram, made by Nick Piggin, there were
> two bitmaps: alloc_map and dirty_map.  Neither was used as intended, i.e. for
> finding a suitable free hole for the next allocation in a block: vm_map_ram
> allocates space sequentially within a block and, on free, marks pages as
> dirty, so freed space can't be reused anymore.
>
> Actually, it would be very interesting to know the real intent of those
> bitmaps; maybe the implementation was incomplete, etc.
>
> But a long time ago Zhang Yanfei removed alloc_map with these two commits:
>
>   mm/vmalloc.c: remove dead code in vb_alloc
>      3fcd76e8028e0be37b02a2002b4f56755daeda06
>   mm/vmalloc.c: remove alloc_map from vmap_block
>      b8e748b6c32999f221ea4786557b8e7e6c4e4e7a
>
> In the current patch I replaced dirty_map with two range variables: dirty min
> and max.  These variables store the minimum and maximum positions of dirty
> space in a block, since we only need to know the dirty range, not the exact
> positions of dirty pages.
>
> Why was this done?  Several reasons: at first glance it seems that the
> vm_map_ram allocator cares about fragmentation and thus uses bitmaps to find
> free holes, but that is not true.  To avoid complexity, it is better to use
> something simple, like min/max range values.  Secondly, the code also becomes
> simpler, with no iteration over a bitmap, just value comparisons in min and
> max macros.  Thirdly, the bitmap occupies up to 1024 bits (4MB is the maximum
> block size); here I replaced the whole bitmap with two longs.
>
> Finally, vm_unmap_aliases should be slightly faster, and the whole
> vmap_block structure occupies less memory.
>
> Signed-off-by: Roman Pen <r.peniaev@gmail.com>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: WANG Chao <chaowang@redhat.com>
> Cc: Fabian Frederick <fabf@skynet.be>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Gioh Kim <gioh.kim@lge.com>
> Cc: Rob Jones <rob.jones@codethink.co.uk>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2015-03-18  5:52 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-13 12:12 [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space Roman Pen
2015-03-13 12:12 ` Roman Pen
2015-03-13 12:12 ` Roman Pen
2015-03-13 12:12 ` [PATCH 1/3] mm/vmalloc: fix possible exhaustion of vmalloc space caused by vm_map_ram allocator Roman Pen
2015-03-13 12:12   ` Roman Pen
2015-03-13 12:12   ` Roman Pen
2015-03-17  4:56   ` Joonsoo Kim
2015-03-17  4:56     ` Joonsoo Kim
2015-03-17  5:12     ` Roman Peniaev
2015-03-17  5:12       ` Roman Peniaev
2015-03-17  7:29       ` Joonsoo Kim
2015-03-17  7:29         ` Joonsoo Kim
2015-03-17  8:22         ` Roman Peniaev
2015-03-17  8:22           ` Roman Peniaev
2015-03-17 21:58           ` Andrew Morton
2015-03-17 21:58             ` Andrew Morton
2015-03-18  5:07             ` Joonsoo Kim
2015-03-18  5:07               ` Joonsoo Kim
2015-03-18  5:05           ` Joonsoo Kim
2015-03-18  5:05             ` Joonsoo Kim
2015-03-13 12:12 ` [PATCH 2/3] mm/vmalloc: occupy newly allocated vmap block just after allocation Roman Pen
2015-03-13 12:12   ` Roman Pen
2015-03-18  5:51   ` Joonsoo Kim
2015-03-18  5:51     ` Joonsoo Kim
2015-03-13 12:12 ` [PATCH 3/3] mm/vmalloc: get rid of dirty bitmap inside vmap_block structure Roman Pen
2015-03-13 12:12   ` Roman Pen
2015-03-18  5:52   ` Joonsoo Kim
2015-03-18  5:52     ` Joonsoo Kim
2015-03-16 10:28 ` [PATCH 0/3] [RFC] mm/vmalloc: fix possible exhaustion of vmalloc space Gioh Kim
2015-03-16 10:28   ` Gioh Kim
2015-03-16 10:28   ` Gioh Kim
2015-03-16 10:49   ` Roman Peniaev
2015-03-16 10:49     ` Roman Peniaev
2015-03-16 10:57     ` Roman Peniaev
2015-03-16 10:57       ` Roman Peniaev
