All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] mm/vmalloc: fix size check and few cleanups
@ 2019-01-03 14:59 ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

The first patch in series fixes size check for remap_vmalloc_range_partial()
which can lead to kernel crash by unfaithful userspace mapping.  Others two
are minor cleanups.

Roman Penyaev (3):
  mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
  mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()

 mm/vmalloc.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
-- 
2.19.1


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/3] mm/vmalloc: fix size check and few cleanups
@ 2019-01-03 14:59 ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

The first patch in series fixes size check for remap_vmalloc_range_partial()
which can lead to kernel crash by unfaithful userspace mapping.  Others two
are minor cleanups.

Roman Penyaev (3):
  mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
  mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()

 mm/vmalloc.c | 48 ++++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 28 deletions(-)

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
-- 
2.19.1

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 14:59 ` Roman Penyaev
@ 2019-01-03 14:59   ` Roman Penyaev
  -1 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel, stable

area->size can include adjacent guard page but get_vm_area_size()
returns actual size of the area.

This fixes possible kernel crash when userspace tries to map area
on 1 page bigger: size check passes but the following vmalloc_to_page()
returns NULL on last guard (non-existing) page.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 871e41c55e23..2cd24186ba84 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (kaddr + size > area->addr + area->size)
+	if (kaddr + size > area->addr + get_vm_area_size(area))
 		return -EINVAL;
 
 	do {
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
@ 2019-01-03 14:59   ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel, stable

area->size can include adjacent guard page but get_vm_area_size()
returns actual size of the area.

This fixes possible kernel crash when userspace tries to map area
on 1 page bigger: size check passes but the following vmalloc_to_page()
returns NULL on last guard (non-existing) page.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
---
 mm/vmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 871e41c55e23..2cd24186ba84 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
 	if (!(area->flags & VM_USERMAP))
 		return -EINVAL;
 
-	if (kaddr + size > area->addr + area->size)
+	if (kaddr + size > area->addr + get_vm_area_size(area))
 		return -EINVAL;
 
 	do {
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
  2019-01-03 14:59 ` Roman Penyaev
@ 2019-01-03 14:59   ` Roman Penyaev
  -1 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

__vmalloc_area_node() calls vfree() on error path, which in turn calls
kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2cd24186ba84..dc6a62bca503 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
 	__vfree_deferred(addr);
 }
 
+static void __vfree(const void *addr)
+{
+	if (unlikely(in_interrupt()))
+		__vfree_deferred(addr);
+	else
+		__vunmap(addr, 1);
+}
+
 /**
  *	vfree  -  release memory allocated by vmalloc()
  *	@addr:		memory base address
@@ -1591,10 +1599,8 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
-		__vfree_deferred(addr);
-	else
-		__vunmap(addr, 1);
+
+	__vfree(addr);
 }
 EXPORT_SYMBOL(vfree);
 
@@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	warn_alloc(gfp_mask, NULL,
 			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
-	vfree(area->addr);
+	__vfree(area->addr);
 	return NULL;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
@ 2019-01-03 14:59   ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

__vmalloc_area_node() calls vfree() on error path, which in turn calls
kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 2cd24186ba84..dc6a62bca503 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
 	__vfree_deferred(addr);
 }
 
+static void __vfree(const void *addr)
+{
+	if (unlikely(in_interrupt()))
+		__vfree_deferred(addr);
+	else
+		__vunmap(addr, 1);
+}
+
 /**
  *	vfree  -  release memory allocated by vmalloc()
  *	@addr:		memory base address
@@ -1591,10 +1599,8 @@ void vfree(const void *addr)
 
 	if (!addr)
 		return;
-	if (unlikely(in_interrupt()))
-		__vfree_deferred(addr);
-	else
-		__vunmap(addr, 1);
+
+	__vfree(addr);
 }
 EXPORT_SYMBOL(vfree);
 
@@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 	warn_alloc(gfp_mask, NULL,
 			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
 			  (area->nr_pages*PAGE_SIZE), area->size);
-	vfree(area->addr);
+	__vfree(area->addr);
 	return NULL;
 }
 
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()
  2019-01-03 14:59 ` Roman Penyaev
@ 2019-01-03 14:59   ` Roman Penyaev
  -1 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

vmalloc_user*() calls differ from normal vmalloc() only in that they
set VM_USERMAP flags for the area.  During the whole history of
vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
directly to __vmalloc_node_range() call instead of finding the area
(which obviously takes time) after the allocation.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dc6a62bca503..83fa4c642f5e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1865,18 +1865,10 @@ EXPORT_SYMBOL(vzalloc);
  */
 void *vmalloc_user(unsigned long size)
 {
-	struct vm_struct *area;
-	void *ret;
-
-	ret = __vmalloc_node(size, SHMLBA,
-			     GFP_KERNEL | __GFP_ZERO,
-			     PAGE_KERNEL, NUMA_NO_NODE,
-			     __builtin_return_address(0));
-	if (ret) {
-		area = find_vm_area(ret);
-		area->flags |= VM_USERMAP;
-	}
-	return ret;
+	return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+				    VM_USERMAP, NUMA_NO_NODE,
+				    __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_user);
 
@@ -1970,16 +1962,10 @@ EXPORT_SYMBOL(vmalloc_32);
  */
 void *vmalloc_32_user(unsigned long size)
 {
-	struct vm_struct *area;
-	void *ret;
-
-	ret = __vmalloc_node(size, 1, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
-			     NUMA_NO_NODE, __builtin_return_address(0));
-	if (ret) {
-		area = find_vm_area(ret);
-		area->flags |= VM_USERMAP;
-	}
-	return ret;
+	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
+				    GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+				    VM_USERMAP, NUMA_NO_NODE,
+				    __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()
@ 2019-01-03 14:59   ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 14:59 UTC (permalink / raw)
  Cc: Roman Penyaev, Andrew Morton, Michal Hocko, Andrey Ryabinin,
	Joe Perches, Luis R. Rodriguez, linux-mm, linux-kernel

vmalloc_user*() calls differ from normal vmalloc() only in that they
set VM_USERMAP flags for the area.  During the whole history of
vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
directly to __vmalloc_node_range() call instead of finding the area
(which obviously takes time) after the allocation.

Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Joe Perches <joe@perches.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
---
 mm/vmalloc.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dc6a62bca503..83fa4c642f5e 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1865,18 +1865,10 @@ EXPORT_SYMBOL(vzalloc);
  */
 void *vmalloc_user(unsigned long size)
 {
-	struct vm_struct *area;
-	void *ret;
-
-	ret = __vmalloc_node(size, SHMLBA,
-			     GFP_KERNEL | __GFP_ZERO,
-			     PAGE_KERNEL, NUMA_NO_NODE,
-			     __builtin_return_address(0));
-	if (ret) {
-		area = find_vm_area(ret);
-		area->flags |= VM_USERMAP;
-	}
-	return ret;
+	return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
+				    GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+				    VM_USERMAP, NUMA_NO_NODE,
+				    __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_user);
 
@@ -1970,16 +1962,10 @@ EXPORT_SYMBOL(vmalloc_32);
  */
 void *vmalloc_32_user(unsigned long size)
 {
-	struct vm_struct *area;
-	void *ret;
-
-	ret = __vmalloc_node(size, 1, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
-			     NUMA_NO_NODE, __builtin_return_address(0));
-	if (ret) {
-		area = find_vm_area(ret);
-		area->flags |= VM_USERMAP;
-	}
-	return ret;
+	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
+				    GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+				    VM_USERMAP, NUMA_NO_NODE,
+				    __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32_user);
 
-- 
2.19.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 14:59   ` Roman Penyaev
  (?)
@ 2019-01-03 15:13   ` Michal Hocko
  2019-01-03 19:27     ` Roman Penyaev
  -1 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2019-01-03 15:13 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.

Can this actually happen? I am not really familiar with all the callers
of this API but VM_NO_GUARD is not really used wildly in the kernel.
All I can see is kasan na arm64 which doesn't really seem to use it
for vmalloc.

So is the problem real or this is a mere cleanup?
 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
>  
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
>  
>  	do {
> -- 
> 2.19.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()
  2019-01-03 14:59   ` Roman Penyaev
  (?)
@ 2019-01-03 15:23   ` Michal Hocko
  -1 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2019-01-03 15:23 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On Thu 03-01-19 15:59:54, Roman Penyaev wrote:
> vmalloc_user*() calls differ from normal vmalloc() only in that they
> set VM_USERMAP flags for the area.  During the whole history of
> vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
> directly to __vmalloc_node_range() call instead of finding the area
> (which obviously takes time) after the allocation.

Yes, this looks correct and a nice cleanup

> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/vmalloc.c | 30 ++++++++----------------------
>  1 file changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dc6a62bca503..83fa4c642f5e 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1865,18 +1865,10 @@ EXPORT_SYMBOL(vzalloc);
>   */
>  void *vmalloc_user(unsigned long size)
>  {
> -	struct vm_struct *area;
> -	void *ret;
> -
> -	ret = __vmalloc_node(size, SHMLBA,
> -			     GFP_KERNEL | __GFP_ZERO,
> -			     PAGE_KERNEL, NUMA_NO_NODE,
> -			     __builtin_return_address(0));
> -	if (ret) {
> -		area = find_vm_area(ret);
> -		area->flags |= VM_USERMAP;
> -	}
> -	return ret;
> +	return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
> +				    GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
> +				    VM_USERMAP, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
>  }
>  EXPORT_SYMBOL(vmalloc_user);
>  
> @@ -1970,16 +1962,10 @@ EXPORT_SYMBOL(vmalloc_32);
>   */
>  void *vmalloc_32_user(unsigned long size)
>  {
> -	struct vm_struct *area;
> -	void *ret;
> -
> -	ret = __vmalloc_node(size, 1, GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
> -			     NUMA_NO_NODE, __builtin_return_address(0));
> -	if (ret) {
> -		area = find_vm_area(ret);
> -		area->flags |= VM_USERMAP;
> -	}
> -	return ret;
> +	return __vmalloc_node_range(size, 1,  VMALLOC_START, VMALLOC_END,
> +				    GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
> +				    VM_USERMAP, NUMA_NO_NODE,
> +				    __builtin_return_address(0));
>  }
>  EXPORT_SYMBOL(vmalloc_32_user);
>  
> -- 
> 2.19.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 15:13   ` Michal Hocko
@ 2019-01-03 19:27     ` Roman Penyaev
  2019-01-03 19:40       ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 19:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On 2019-01-03 16:13, Michal Hocko wrote:
> On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> area->size can include adjacent guard page but get_vm_area_size()
>> returns actual size of the area.
>> 
>> This fixes possible kernel crash when userspace tries to map area
>> on 1 page bigger: size check passes but the following 
>> vmalloc_to_page()
>> returns NULL on last guard (non-existing) page.
> 
> Can this actually happen? I am not really familiar with all the callers
> of this API but VM_NO_GUARD is not really used wildly in the kernel.

Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
thus the area->size will be bigger.  The bug is not reproduced if
VM_NO_GUARD is set.

> All I can see is kasan na arm64 which doesn't really seem to use it
> for vmalloc.
> 
> So is the problem real or this is a mere cleanup?

This is the real problem, try this hunk for any file descriptor which 
provides
mapping, or say modify epoll as example:

--------------------------------
diff --git a/fs/eventpoll.c b/fs/eventpoll.c

+static int ep_mmap(struct file *file, struct vm_area_struct *vma)
+{
+       void *mem;
+
+       mem = vmalloc_user(4096);
+       BUG_ON(!mem);
+       /* Do not care about mem leak */
+
+       return remap_vmalloc_range(vma, mem, 0);
+}
+
  /* File callbacks that implement the eventpoll file behaviour */
  static const struct file_operations eventpoll_fops = {
  #ifdef CONFIG_PROC_FS
         .show_fdinfo    = ep_show_fdinfo,
  #endif
+       .mmap           = ep_mmap,
         .release        = ep_eventpoll_release,
--------------------------------

and the following code from userspace, which maps 2 pages,
instead of 1:

--------------------------------
epfd = epoll_create1(0);
assert(epfd >= 0);

p = mmap(NULL, 2<<12, PROT_WRITE|PROT_READ, MAP_PRIVATE, epfd, 0);
assert(p != MAP_FAILED);
--------------------------------

You immediately get the following oops:

[   38.894571] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000008
[   38.899048] #PF error: [normal kernel read fault]
[   38.901487] PGD 0 P4D 0
[   38.902801] Oops: 0000 [#1] PREEMPT SMP PTI
[   38.904984] CPU: 2 PID: 399 Comm: mmap-epoll Not tainted 4.20.0-1 
#238
[   38.914064] RIP: 0010:vm_insert_page+0x3b/0x1d0
[   38.941181] Call Trace:
[   38.941656]  remap_vmalloc_range_partial+0x8d/0xd0
[   38.942417]  mmap_region+0x3c7/0x630
[   38.942982]  do_mmap+0x38d/0x560
[   38.943479]  vm_mmap_pgoff+0x9a/0xf0
[   38.944028]  ksys_mmap_pgoff+0x18e/0x220
[   38.944554]  do_syscall_64+0x48/0xf0
[   38.945076]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

--
Roman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 19:27     ` Roman Penyaev
@ 2019-01-03 19:40       ` Michal Hocko
  2019-01-03 20:31         ` Roman Penyaev
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2019-01-03 19:40 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> On 2019-01-03 16:13, Michal Hocko wrote:
> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > area->size can include adjacent guard page but get_vm_area_size()
> > > returns actual size of the area.
> > > 
> > > This fixes possible kernel crash when userspace tries to map area
> > > on 1 page bigger: size check passes but the following
> > > vmalloc_to_page()
> > > returns NULL on last guard (non-existing) page.
> > 
> > Can this actually happen? I am not really familiar with all the callers
> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
> 
> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> thus the area->size will be bigger.  The bug is not reproduced if
> VM_NO_GUARD is set.
> 
> > All I can see is kasan na arm64 which doesn't really seem to use it
> > for vmalloc.
> > 
> > So is the problem real or this is a mere cleanup?
> 
> This is the real problem, try this hunk for any file descriptor which
> provides
> mapping, or say modify epoll as example:

OK, my response was more confusing than I intended. I meant to say. Is
there any in kernel code that would allow the bug have had in mind?
In other words can userspace trick any existing code?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 14:59   ` Roman Penyaev
  (?)
  (?)
@ 2019-01-03 19:59   ` Michal Hocko
  -1 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2019-01-03 19:59 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org

Forgot to add
Acked-by: Michal Hocko <mhocko@suse.com>
Although I am not really sure the stable backport is really needed as I
haven't seen any explicit example of a buggy kernel code to trigger the
issue.

> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
>  
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
>  
>  	do {
> -- 
> 2.19.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 19:40       ` Michal Hocko
@ 2019-01-03 20:31         ` Roman Penyaev
  2019-01-04  9:38           ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Roman Penyaev @ 2019-01-03 20:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On 2019-01-03 20:40, Michal Hocko wrote:
> On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
>> On 2019-01-03 16:13, Michal Hocko wrote:
>> > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
>> > > area->size can include adjacent guard page but get_vm_area_size()
>> > > returns actual size of the area.
>> > >
>> > > This fixes possible kernel crash when userspace tries to map area
>> > > on 1 page bigger: size check passes but the following
>> > > vmalloc_to_page()
>> > > returns NULL on last guard (non-existing) page.
>> >
>> > Can this actually happen? I am not really familiar with all the callers
>> > of this API but VM_NO_GUARD is not really used wildly in the kernel.
>> 
>> Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
>> thus the area->size will be bigger.  The bug is not reproduced if
>> VM_NO_GUARD is set.
>> 
>> > All I can see is kasan na arm64 which doesn't really seem to use it
>> > for vmalloc.
>> >
>> > So is the problem real or this is a mere cleanup?
>> 
>> This is the real problem, try this hunk for any file descriptor which
>> provides
>> mapping, or say modify epoll as example:
> 
> OK, my response was more confusing than I intended. I meant to say. Is
> there any in kernel code that would allow the bug have had in mind?
> In other words can userspace trick any existing code?

In theory any existing caller of remap_vmalloc_range() which does
not have an explicit size check should trigger an oops, e.g. this is
a good candidate:

*** drivers/media/usb/stkwebcam/stk-webcam.c:
v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, 
sbuf->buffer, 0);

According to the code no explicit size check, should be easy to 
reproduce:
mmap the frame buffer and you are done.

Other callers are not so easy to follow. But wait, here is another 
example:

(drivers/video/fbdev/core/fbmem.c)
static int
fb_mmap(struct file *file, struct vm_area_struct * vma)
         ...
    	res = fb->fb_mmap(info, vma);

(drivers/video/fbdev/vfb.c)
static int vfb_mmap(struct fb_info *info,
		    struct vm_area_struct *vma)
{
	return remap_vmalloc_range(vma, (void *)info->fix.smem_start, 
vma->vm_pgoff);
}

No checks, naked calls, should be also the candidate.


--
Roman




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 20:31         ` Roman Penyaev
@ 2019-01-04  9:38           ` Michal Hocko
  2019-01-04 10:21             ` Roman Penyaev
  0 siblings, 1 reply; 22+ messages in thread
From: Michal Hocko @ 2019-01-04  9:38 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On Thu 03-01-19 21:31:58, Roman Penyaev wrote:
> On 2019-01-03 20:40, Michal Hocko wrote:
> > On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
> > > On 2019-01-03 16:13, Michal Hocko wrote:
> > > > On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
> > > > > area->size can include adjacent guard page but get_vm_area_size()
> > > > > returns actual size of the area.
> > > > >
> > > > > This fixes possible kernel crash when userspace tries to map area
> > > > > on 1 page bigger: size check passes but the following
> > > > > vmalloc_to_page()
> > > > > returns NULL on last guard (non-existing) page.
> > > >
> > > > Can this actually happen? I am not really familiar with all the callers
> > > > of this API but VM_NO_GUARD is not really used wildly in the kernel.
> > > 
> > > Exactly, by default (VM_NO_GUARD is not set) each area has guard page,
> > > thus the area->size will be bigger.  The bug is not reproduced if
> > > VM_NO_GUARD is set.
> > > 
> > > > All I can see is kasan na arm64 which doesn't really seem to use it
> > > > for vmalloc.
> > > >
> > > > So is the problem real or this is a mere cleanup?
> > > 
> > > This is the real problem, try this hunk for any file descriptor which
> > > provides
> > > mapping, or say modify epoll as example:
> > 
> > OK, my response was more confusing than I intended. I meant to say. Is
> > there any in kernel code that would allow the bug have had in mind?
> > In other words can userspace trick any existing code?
> 
> In theory any existing caller of remap_vmalloc_range() which does
> not have an explicit size check should trigger an oops, e.g. this is
> a good candidate:
> 
> *** drivers/media/usb/stkwebcam/stk-webcam.c:
> v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, sbuf->buffer,
> 0);

Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
that is indeed not enforced to be <= length AFAICS. So you are right!

Can we have an example in the changelog please?

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-04  9:38           ` Michal Hocko
@ 2019-01-04 10:21             ` Roman Penyaev
  2019-01-04 10:28               ` Michal Hocko
  0 siblings, 1 reply; 22+ messages in thread
From: Roman Penyaev @ 2019-01-04 10:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On 2019-01-04 10:38, Michal Hocko wrote:

[...]

>> >
>> > OK, my response was more confusing than I intended. I meant to say. Is
>> > there any in kernel code that would allow the bug have had in mind?
>> > In other words can userspace trick any existing code?
>> 
>> In theory any existing caller of remap_vmalloc_range() which does
>> not have an explicit size check should trigger an oops, e.g. this is
>> a good candidate:
>> 
>> *** drivers/media/usb/stkwebcam/stk-webcam.c:
>> v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma, 
>> sbuf->buffer,
>> 0);
> 
> Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> that is indeed not enforced to be <= length AFAICS. So you are right!
> 
> Can we have an example in the changelog please?

You mean to resend this particular patch with the list of possible
candidates for oops in a comment message?  Sure thing.

--
Roman


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-04 10:21             ` Roman Penyaev
@ 2019-01-04 10:28               ` Michal Hocko
  0 siblings, 0 replies; 22+ messages in thread
From: Michal Hocko @ 2019-01-04 10:28 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

On Fri 04-01-19 11:21:39, Roman Penyaev wrote:
> On 2019-01-04 10:38, Michal Hocko wrote:
> 
> [...]
> 
> > > >
> > > > OK, my response was more confusing than I intended. I meant to say. Is
> > > > there any in kernel code that would allow the bug have had in mind?
> > > > In other words can userspace trick any existing code?
> > > 
> > > In theory any existing caller of remap_vmalloc_range() which does
> > > not have an explicit size check should trigger an oops, e.g. this is
> > > a good candidate:
> > > 
> > > *** drivers/media/usb/stkwebcam/stk-webcam.c:
> > > v4l_stk_mmap[789]              ret = remap_vmalloc_range(vma,
> > > sbuf->buffer,
> > > 0);
> > 
> > Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have
> > buf->v4lbuf.length. mmap callback maps this buffer to the vma size and
> > that is indeed not enforced to be <= length AFAICS. So you are right!
> > 
> > Can we have an example in the changelog please?
> 
> You mean to resend this particular patch with the list of possible
> candidates for oops in a comment message?  Sure thing.

I would just reply to the original patch with an updated changelog
wording (to include the above example and explain how the vma setup is
completely independent on the buffer allocation and ask Andrew to update
the changelog of the patch that is already in the mmotm tree).

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 14:59   ` Roman Penyaev
                     ` (2 preceding siblings ...)
  (?)
@ 2019-01-04 11:06   ` Roman Penyaev
  -1 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-04 11:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, Andrey Ryabinin, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable

Hi Andrew,

Could you please update the commit message with something as
the following (I hope it will become a bit clearer now):

-------------
When VM_NO_GUARD is not set area->size includes adjacent guard page,
thus for correct size checking get_vm_area_size() should be used,
but not area->size.

This fixes possible kernel oops when userspace tries to mmap an area
on 1 page bigger than was allocated by vmalloc_user() call: the size
check inside remap_vmalloc_range_partial() accounts non-existing
guard page also, so check successfully passes but vmalloc_to_page()
returns NULL (guard page does not physically exist).

The following code pattern example should trigger an oops:

  static int oops_mmap(struct file *file, struct vm_area_struct *vma)
  {
        void *mem;

        mem = vmalloc_user(4096);
        BUG_ON(!mem);
        /* Do not care about mem leak */

        return remap_vmalloc_range(vma, mem, 0);
  }

And userspace simply mmaps size + PAGE_SIZE:

  mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);

Possible candidates for oops which do not have any explicit size
checks:

   *** drivers/media/usb/stkwebcam/stk-webcam.c:
   v4l_stk_mmap[789]   ret = remap_vmalloc_range(vma, sbuf->buffer, 0);

Or the following one:

   *** drivers/video/fbdev/core/fbmem.c
   static int
   fb_mmap(struct file *file, struct vm_area_struct * vma)
        ...
        res = fb->fb_mmap(info, vma);

Where fb_mmap callback calls remap_vmalloc_range() directly without
any explicit checks:

   *** drivers/video/fbdev/vfb.c
   static int vfb_mmap(struct fb_info *info,
             struct vm_area_struct *vma)
   {
       return remap_vmalloc_range(vma, (void *)info->fix.smem_start, 
vma->vm_pgoff);
   }
--------

--
Roman


On 2019-01-03 15:59, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  mm/vmalloc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 871e41c55e23..2cd24186ba84 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct
> vm_area_struct *vma, unsigned long uaddr,
>  	if (!(area->flags & VM_USERMAP))
>  		return -EINVAL;
> 
> -	if (kaddr + size > area->addr + area->size)
> +	if (kaddr + size > area->addr + get_vm_area_size(area))
>  		return -EINVAL;
> 
>  	do {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial()
  2019-01-03 14:59   ` Roman Penyaev
                     ` (3 preceding siblings ...)
  (?)
@ 2019-01-11 19:19   ` Andrey Ryabinin
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrey Ryabinin @ 2019-01-11 19:19 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Michal Hocko, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel, stable



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> area->size can include adjacent guard page but get_vm_area_size()
> returns actual size of the area.
> 
> This fixes possible kernel crash when userspace tries to map area
> on 1 page bigger: size check passes but the following vmalloc_to_page()
> returns NULL on last guard (non-existing) page.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---

Fixes: e69e9d4aee71 ("vmalloc: introduce remap_vmalloc_range_partial")
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range()
  2019-01-03 14:59   ` Roman Penyaev
  (?)
  (?)
@ 2019-01-11 19:19   ` Andrey Ryabinin
  -1 siblings, 0 replies; 22+ messages in thread
From: Andrey Ryabinin @ 2019-01-11 19:19 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Michal Hocko, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> vmalloc_user*() calls differ from normal vmalloc() only in that they
> set VM_USERMAP flags for the area.  During the whole history of
> vmalloc.c changes now it is possible simply to pass VM_USERMAP flags
> directly to __vmalloc_node_range() call instead of finding the area
> (which obviously takes time) after the allocation.
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org

Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
  2019-01-03 14:59   ` Roman Penyaev
  (?)
@ 2019-01-11 19:26   ` Andrey Ryabinin
  2019-01-12 17:19     ` Roman Penyaev
  -1 siblings, 1 reply; 22+ messages in thread
From: Andrey Ryabinin @ 2019-01-11 19:26 UTC (permalink / raw)
  To: Roman Penyaev
  Cc: Andrew Morton, Michal Hocko, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel



On 1/3/19 5:59 PM, Roman Penyaev wrote:
> __vmalloc_area_node() calls vfree() on error path, which in turn calls
> kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().
> 
> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Joe Perches <joe@perches.com>
> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  mm/vmalloc.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 2cd24186ba84..dc6a62bca503 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
>  	__vfree_deferred(addr);
>  }
>  
> +static void __vfree(const void *addr)
> +{
> +	if (unlikely(in_interrupt()))
> +		__vfree_deferred(addr);
> +	else
> +		__vunmap(addr, 1);
> +}
> +
>  /**
>   *	vfree  -  release memory allocated by vmalloc()
>   *	@addr:		memory base address
> @@ -1591,10 +1599,8 @@ void vfree(const void *addr)
>  
>  	if (!addr)
>  		return;
> -	if (unlikely(in_interrupt()))
> -		__vfree_deferred(addr);
> -	else
> -		__vunmap(addr, 1);
> +
> +	__vfree(addr);
>  }
>  EXPORT_SYMBOL(vfree);
>  
> @@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
>  	warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>  			  (area->nr_pages*PAGE_SIZE), area->size);
> -	vfree(area->addr);
> +	__vfree(area->addr);

This can't be an interrupt context for a several reasons. One of them is BUG_ON(in_interrupt()) in __get_vm_area_node()
which is called right before __vmalloc_are_node().

So you can just do __vunmap(area->addr, 1); instead of __vfree().


>  	return NULL;
>  }
>  
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory
  2019-01-11 19:26   ` Andrey Ryabinin
@ 2019-01-12 17:19     ` Roman Penyaev
  0 siblings, 0 replies; 22+ messages in thread
From: Roman Penyaev @ 2019-01-12 17:19 UTC (permalink / raw)
  To: Andrey Ryabinin
  Cc: Andrew Morton, Michal Hocko, Joe Perches, Luis R. Rodriguez,
	linux-mm, linux-kernel

On 2019-01-11 20:26, Andrey Ryabinin wrote:
> On 1/3/19 5:59 PM, Roman Penyaev wrote:
>> __vmalloc_area_node() calls vfree() on error path, which in turn calls
>> kmemleak_free(), but area is not yet accounted by kmemleak_vmalloc().
>> 
>> Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
>> Cc: Joe Perches <joe@perches.com>
>> Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
>> Cc: linux-mm@kvack.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  mm/vmalloc.c | 16 +++++++++++-----
>>  1 file changed, 11 insertions(+), 5 deletions(-)
>> 
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 2cd24186ba84..dc6a62bca503 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -1565,6 +1565,14 @@ void vfree_atomic(const void *addr)
>>  	__vfree_deferred(addr);
>>  }
>> 
>> +static void __vfree(const void *addr)
>> +{
>> +	if (unlikely(in_interrupt()))
>> +		__vfree_deferred(addr);
>> +	else
>> +		__vunmap(addr, 1);
>> +}
>> +
>>  /**
>>   *	vfree  -  release memory allocated by vmalloc()
>>   *	@addr:		memory base address
>> @@ -1591,10 +1599,8 @@ void vfree(const void *addr)
>> 
>>  	if (!addr)
>>  		return;
>> -	if (unlikely(in_interrupt()))
>> -		__vfree_deferred(addr);
>> -	else
>> -		__vunmap(addr, 1);
>> +
>> +	__vfree(addr);
>>  }
>>  EXPORT_SYMBOL(vfree);
>> 
>> @@ -1709,7 +1715,7 @@ static void *__vmalloc_area_node(struct 
>> vm_struct *area, gfp_t gfp_mask,
>>  	warn_alloc(gfp_mask, NULL,
>>  			  "vmalloc: allocation failure, allocated %ld of %ld bytes",
>>  			  (area->nr_pages*PAGE_SIZE), area->size);
>> -	vfree(area->addr);
>> +	__vfree(area->addr);
> 
> This can't be an interrupt context for a several reasons. One of them
> is BUG_ON(in_interrupt()) in __get_vm_area_node()
> which is called right before __vmalloc_are_node().
> 
> So you can just do __vunmap(area->addr, 1); instead of __vfree().

Thanks, I missed that BUG_ON and could not prove, that we can call only
from a task context, thus decided not to make it strict.  Of course
simple __vunmap() is much better.  The other reason is that we call a
spin_lock without disabling the interrupts.  Now I see.

Andrew, may I resend just an updated version of this patch?

--
Roman


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-01-12 17:19 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-03 14:59 [PATCH 0/3] mm/vmalloc: fix size check and few cleanups Roman Penyaev
2019-01-03 14:59 ` Roman Penyaev
2019-01-03 14:59 ` [PATCH 1/3] mm/vmalloc: fix size check for remap_vmalloc_range_partial() Roman Penyaev
2019-01-03 14:59   ` Roman Penyaev
2019-01-03 15:13   ` Michal Hocko
2019-01-03 19:27     ` Roman Penyaev
2019-01-03 19:40       ` Michal Hocko
2019-01-03 20:31         ` Roman Penyaev
2019-01-04  9:38           ` Michal Hocko
2019-01-04 10:21             ` Roman Penyaev
2019-01-04 10:28               ` Michal Hocko
2019-01-03 19:59   ` Michal Hocko
2019-01-04 11:06   ` Roman Penyaev
2019-01-11 19:19   ` Andrey Ryabinin
2019-01-03 14:59 ` [PATCH 2/3] mm/vmalloc: do not call kmemleak_free() on not yet accounted memory Roman Penyaev
2019-01-03 14:59   ` Roman Penyaev
2019-01-11 19:26   ` Andrey Ryabinin
2019-01-12 17:19     ` Roman Penyaev
2019-01-03 14:59 ` [PATCH 3/3] mm/vmalloc: pass VM_USERMAP flags directly to __vmalloc_node_range() Roman Penyaev
2019-01-03 14:59   ` Roman Penyaev
2019-01-03 15:23   ` Michal Hocko
2019-01-11 19:19   ` Andrey Ryabinin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.