linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: convert three more cases to kvmalloc
@ 2017-06-29  3:24 Mikulas Patocka
  2017-06-29  7:10 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2017-06-29  3:24 UTC (permalink / raw)
  To: Michal Hocko, Andrew Morton, Stephen Rothwell
  Cc: Vlastimil Babka, Andreas Dilger, John Hubbard, David Miller,
	linux-kernel, linux-mm

Hi

I'm submitting this for the next merge window.

Mikulas



From: Mikulas Patocka <mpatocka@redhat.com>

The patch a7c3e901 ("mm: introduce kv[mz]alloc helpers") converted a lot 
of kernel code to kvmalloc. This patch converts three more forgotten 
cases.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
 fs/file.c                 |   12 +-----------
 kernel/bpf/syscall.c      |   11 +----------
 kernel/cgroup/cgroup-v1.c |    7 +------
 3 files changed, 3 insertions(+), 27 deletions(-)

Index: linux-2.6/fs/file.c
===================================================================
--- linux-2.6.orig/fs/file.c
+++ linux-2.6/fs/file.c
@@ -32,17 +32,7 @@ unsigned int sysctl_nr_open_max =
 
 static void *alloc_fdmem(size_t size)
 {
-	/*
-	 * Very large allocations can stress page reclaim, so fall back to
-	 * vmalloc() if the allocation size will be considered "large" by the VM.
-	 */
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		void *data = kmalloc(size, GFP_KERNEL_ACCOUNT |
-				     __GFP_NOWARN | __GFP_NORETRY);
-		if (data != NULL)
-			return data;
-	}
-	return __vmalloc(size, GFP_KERNEL_ACCOUNT, PAGE_KERNEL);
+	return kvmalloc(size, GFP_KERNEL_ACCOUNT);
 }
 
 static void __free_fdtable(struct fdtable *fdt)
Index: linux-2.6/kernel/bpf/syscall.c
===================================================================
--- linux-2.6.orig/kernel/bpf/syscall.c
+++ linux-2.6/kernel/bpf/syscall.c
@@ -58,16 +58,7 @@ void *bpf_map_area_alloc(size_t size)
 	 * trigger under memory pressure as we really just want to
 	 * fail instead.
 	 */
-	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
-	void *area;
-
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-		area = kmalloc(size, GFP_USER | flags);
-		if (area != NULL)
-			return area;
-	}
-
-	return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL);
+	return kvmalloc(size, GFP_USER | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO);
 }
 
 void bpf_map_area_free(void *area)
Index: linux-2.6/kernel/cgroup/cgroup-v1.c
===================================================================
--- linux-2.6.orig/kernel/cgroup/cgroup-v1.c
+++ linux-2.6/kernel/cgroup/cgroup-v1.c
@@ -184,15 +184,10 @@ struct cgroup_pidlist {
 /*
  * The following two functions "fix" the issue where there are more pids
  * than kmalloc will give memory for; in such cases, we use vmalloc/vfree.
- * TODO: replace with a kernel-wide solution to this problem
  */
-#define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2))
 static void *pidlist_allocate(int count)
 {
-	if (PIDLIST_TOO_LARGE(count))
-		return vmalloc(count * sizeof(pid_t));
-	else
-		return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
+	return kvmalloc(count * sizeof(pid_t), GFP_KERNEL);
 }
 
 static void pidlist_free(void *p)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: convert three more cases to kvmalloc
  2017-06-29  3:24 [PATCH] mm: convert three more cases to kvmalloc Mikulas Patocka
@ 2017-06-29  7:10 ` Michal Hocko
  2017-06-30  2:13   ` Mikulas Patocka
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2017-06-29  7:10 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Andrew Morton, Stephen Rothwell, Vlastimil Babka, Andreas Dilger,
	John Hubbard, David Miller, linux-kernel, linux-mm

On Wed 28-06-17 23:24:10, Mikulas Patocka wrote:
[...]
> From: Mikulas Patocka <mpatocka@redhat.com>
> 
> The patch a7c3e901 ("mm: introduce kv[mz]alloc helpers") converted a lot 
> of kernel code to kvmalloc. This patch converts three more forgotten 
> cases.

Thanks! I have two remarks below but other than that feel free to add

> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>
[...]
> Index: linux-2.6/kernel/bpf/syscall.c
> ===================================================================
> --- linux-2.6.orig/kernel/bpf/syscall.c
> +++ linux-2.6/kernel/bpf/syscall.c
> @@ -58,16 +58,7 @@ void *bpf_map_area_alloc(size_t size)
>  	 * trigger under memory pressure as we really just want to
>  	 * fail instead.
>  	 */
> -	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> -	void *area;
> -
> -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> -		area = kmalloc(size, GFP_USER | flags);
> -		if (area != NULL)
> -			return area;
> -	}
> -
> -	return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL);
> +	return kvmalloc(size, GFP_USER | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO);

kvzalloc without additional flags would be more appropriate.
__GFP_NORETRY is explicitly documented as non-supported and NOWARN
wouldn't be applied everywhere in the vmalloc path.

>  }
>  
>  void bpf_map_area_free(void *area)
> Index: linux-2.6/kernel/cgroup/cgroup-v1.c
> ===================================================================
> --- linux-2.6.orig/kernel/cgroup/cgroup-v1.c
> +++ linux-2.6/kernel/cgroup/cgroup-v1.c
> @@ -184,15 +184,10 @@ struct cgroup_pidlist {
>  /*
>   * The following two functions "fix" the issue where there are more pids
>   * than kmalloc will give memory for; in such cases, we use vmalloc/vfree.
> - * TODO: replace with a kernel-wide solution to this problem
>   */
> -#define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2))
>  static void *pidlist_allocate(int count)
>  {
> -	if (PIDLIST_TOO_LARGE(count))
> -		return vmalloc(count * sizeof(pid_t));
> -	else
> -		return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
> +	return kvmalloc(count * sizeof(pid_t), GFP_KERNEL);
>  }

I would rather use kvmalloc_array to have an overflow protection as
well.

>  
>  static void pidlist_free(void *p)

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: convert three more cases to kvmalloc
  2017-06-29  7:10 ` Michal Hocko
@ 2017-06-30  2:13   ` Mikulas Patocka
  2017-06-30  7:21     ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Mikulas Patocka @ 2017-06-30  2:13 UTC (permalink / raw)
  To: Michal Hocko, Alexei Starovoitov, Daniel Borkmann
  Cc: Andrew Morton, Stephen Rothwell, Vlastimil Babka, Andreas Dilger,
	John Hubbard, David Miller, linux-kernel, linux-mm, netdev



On Thu, 29 Jun 2017, Michal Hocko wrote:

> On Wed 28-06-17 23:24:10, Mikulas Patocka wrote:
> [...]
> > From: Mikulas Patocka <mpatocka@redhat.com>
> > 
> > The patch a7c3e901 ("mm: introduce kv[mz]alloc helpers") converted a lot 
> > of kernel code to kvmalloc. This patch converts three more forgotten 
> > cases.
> 
> Thanks! I have two remarks below but other than that feel free to add
> 
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> 
> Acked-by: Michal Hocko <mhocko@suse.com>
> [...]
> > Index: linux-2.6/kernel/bpf/syscall.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/bpf/syscall.c
> > +++ linux-2.6/kernel/bpf/syscall.c
> > @@ -58,16 +58,7 @@ void *bpf_map_area_alloc(size_t size)
> >  	 * trigger under memory pressure as we really just want to
> >  	 * fail instead.
> >  	 */
> > -	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> > -	void *area;
> > -
> > -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> > -		area = kmalloc(size, GFP_USER | flags);
> > -		if (area != NULL)
> > -			return area;
> > -	}
> > -
> > -	return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL);
> > +	return kvmalloc(size, GFP_USER | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO);
> 
> kvzalloc without additional flags would be more appropriate.
> __GFP_NORETRY is explicitly documented as non-supported

How is __GFP_NORETRY non-supported?

> and NOWARN wouldn't be applied everywhere in the vmalloc path.

__GFP_NORETRY and __GFP_NOWARN wouldn't be applied in the page-table 
allocation and they would be applied in the page allocation - that seems 
acceptable.

But the problem here is that if the system is under memory stress, 
__GFP_NORETRY allocations would randomly fail (they would fail for example 
if there's a plenty of free swap space and the system is busy swapping) 
and that would make the BFP creation code randomly fail.

BPF maintainers, please explain, how are you dealing with the random 
memory allocation failures? Is there some other code in the BPF stack that 
retries the failed allocations?

> >  }
> >  
> >  void bpf_map_area_free(void *area)
> > Index: linux-2.6/kernel/cgroup/cgroup-v1.c
> > ===================================================================
> > --- linux-2.6.orig/kernel/cgroup/cgroup-v1.c
> > +++ linux-2.6/kernel/cgroup/cgroup-v1.c
> > @@ -184,15 +184,10 @@ struct cgroup_pidlist {
> >  /*
> >   * The following two functions "fix" the issue where there are more pids
> >   * than kmalloc will give memory for; in such cases, we use vmalloc/vfree.
> > - * TODO: replace with a kernel-wide solution to this problem
> >   */
> > -#define PIDLIST_TOO_LARGE(c) ((c) * sizeof(pid_t) > (PAGE_SIZE * 2))
> >  static void *pidlist_allocate(int count)
> >  {
> > -	if (PIDLIST_TOO_LARGE(count))
> > -		return vmalloc(count * sizeof(pid_t));
> > -	else
> > -		return kmalloc(count * sizeof(pid_t), GFP_KERNEL);
> > +	return kvmalloc(count * sizeof(pid_t), GFP_KERNEL);
> >  }
> 
> I would rather use kvmalloc_array to have an overflow protection as
> well.

Yes.

Mikulas

> >  
> >  static void pidlist_free(void *p)
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] mm: convert three more cases to kvmalloc
  2017-06-30  2:13   ` Mikulas Patocka
@ 2017-06-30  7:21     ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2017-06-30  7:21 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrew Morton,
	Stephen Rothwell, Vlastimil Babka, Andreas Dilger, John Hubbard,
	David Miller, linux-kernel, linux-mm, netdev

On Thu 29-06-17 22:13:26, Mikulas Patocka wrote:
> 
> 
> On Thu, 29 Jun 2017, Michal Hocko wrote:
[...]
> > > Index: linux-2.6/kernel/bpf/syscall.c
> > > ===================================================================
> > > --- linux-2.6.orig/kernel/bpf/syscall.c
> > > +++ linux-2.6/kernel/bpf/syscall.c
> > > @@ -58,16 +58,7 @@ void *bpf_map_area_alloc(size_t size)
> > >  	 * trigger under memory pressure as we really just want to
> > >  	 * fail instead.
> > >  	 */
> > > -	const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
> > > -	void *area;
> > > -
> > > -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
> > > -		area = kmalloc(size, GFP_USER | flags);
> > > -		if (area != NULL)
> > > -			return area;
> > > -	}
> > > -
> > > -	return __vmalloc(size, GFP_KERNEL | flags, PAGE_KERNEL);
> > > +	return kvmalloc(size, GFP_USER | __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO);
> > 
> > kvzalloc without additional flags would be more appropriate.
> > __GFP_NORETRY is explicitly documented as non-supported
> 
> How is __GFP_NORETRY non-supported?

Because its semantic cannot be guaranteed throughout the alloaction
stack. vmalloc will ignore it e.g. for page table allocations.

> > and NOWARN wouldn't be applied everywhere in the vmalloc path.
> 
> __GFP_NORETRY and __GFP_NOWARN wouldn't be applied in the page-table 
> allocation and they would be applied in the page allocation - that seems 
> acceptable.

This is rather muddy semantic to me. Both page table and the page is an
order-0 allocation. Page table allocations are much less likely but I've
explicitly documented that explicit __GFP_NORETRY is unsupported. Slab
allocation is already __GFP_NORETRY (unless you specify
__GFP_RETRY_MAYFAIL in the current mmotm tree).
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-06-30  7:21 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-29  3:24 [PATCH] mm: convert three more cases to kvmalloc Mikulas Patocka
2017-06-29  7:10 ` Michal Hocko
2017-06-30  2:13   ` Mikulas Patocka
2017-06-30  7:21     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).