From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933445AbbGGVl1 (ORCPT ); Tue, 7 Jul 2015 17:41:27 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:58713 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933034AbbGGVlT (ORCPT ); Tue, 7 Jul 2015 17:41:19 -0400 Date: Tue, 7 Jul 2015 14:41:17 -0700 From: Andrew Morton To: Mikulas Patocka Cc: Mike Snitzer , "Alasdair G. Kergon" , Edward Thornber , David Rientjes , Vivek Goyal , linux-kernel@vger.kernel.org, linux-mm@kvack.org, dm-devel@redhat.com, Linus Torvalds Subject: Re: [PATCH 2/7] mm: introduce kvmalloc and kvmalloc_node Message-Id: <20150707144117.5b38ac38efda238af8a1f536@linux-foundation.org> In-Reply-To: References: X-Mailer: Sylpheed 3.4.1 (GTK+ 2.24.23; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 7 Jul 2015 11:10:09 -0400 (EDT) Mikulas Patocka wrote: > Introduce the functions kvmalloc and kvmalloc_node. These functions > provide reliable allocation of object of arbitrary size. They attempt to > do allocation with kmalloc and if it fails, use vmalloc. Memory allocated > with these functions should be freed with kvfree. Sigh. We've resisted doing this because vmalloc() is somewhat of a bad thing, and we don't want to make it easy for people to do bad things. And vmalloc is bad because a) it's slow and b) it does GFP_KERNEL allocations for page tables and c) it is susceptible to arena fragmentation. We'd prefer that people fix their junk so it doesn't depend upon large contiguous allocations. This isn't userspace - kernel space is hostile and kernel code should be robust. So I dunno. Should we continue to make it a bit more awkward to use vmalloc()? Probably that tactic isn't being very successful - people will just go ahead and open-code it. And given the surprising amount of stuff you've placed in kvmalloc_node(), they'll implement it incorrectly... How about we compromise: add kvmalloc_node(), but include a BUG_ON("you suck") to it? > > ... > > +void *kvmalloc_node(size_t size, gfp_t gfp, int node) > +{ > + void *p; > + unsigned uninitialized_var(noio_flag); > + > + /* vmalloc doesn't support no-wait allocations */ > + WARN_ON(!(gfp & __GFP_WAIT)); It could be a WARN_ON_ONCE, but that doesn't seem very important. > + if (likely(size <= KMALLOC_MAX_SIZE)) { > + p = kmalloc_node(size, gfp | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN, node); There is no way in which a reader will be able to work out the reason for this selection of flags. Heck, this reviewer can't work it out either. Can we please have a code comment in there which reveals all this? Also, it would be nice to find a tasteful way of squeezing this into 80 cols. > + if (likely(p != NULL)) > + return p; > + } > + if ((gfp & (__GFP_IO | __GFP_FS)) != (__GFP_IO | __GFP_FS)) { > + /* > + * vmalloc allocates page tables with GFP_KERNEL, regardless > + * of GFP flags passed to it. If we are no GFP_NOIO context, > + * we call memalloc_noio_save, so that all allocations are > + * implicitly done with GFP_NOIO. OK. But why do we turn on __GFP_HIGH? > + */ > + noio_flag = memalloc_noio_save(); > + gfp |= __GFP_HIGH; > + } > + p = __vmalloc_node_flags(size, node, gfp | __GFP_REPEAT | __GFP_HIGHMEM); Again, please document the __GFP_REPEAT reasoning. __vmalloc_node_flags() handles __GFP_ZERO, I believe? So we presently don't have a kvzalloc() - callers are to open-code the __GFP_ZERO. I suppose we may as well go ahead and add the 4-line wrapper for kvzalloc(). > + if ((gfp & (__GFP_IO | __GFP_FS)) != (__GFP_IO | __GFP_FS)) { > + memalloc_noio_restore(noio_flag); > + } scripts/checkpatch.pl is your friend! > + return p; > +} > +EXPORT_SYMBOL(kvmalloc_node); > + > +void *kvmalloc(size_t size, gfp_t gfp) > +{ > + return kvmalloc_node(size, gfp, NUMA_NO_NODE); > +} > +EXPORT_SYMBOL(kvmalloc); It's probably better to switch this to a static inline. That's a bit faster and will save a bit of stack on a stack-heavy code path. Unless gcc manages to do a tailcall, but it doesn't seem to do that much.