Re: [kvm-unit-tests PATCH v2 6/6] lib/x86/vm: enable malloc and friends

From: Andrew Jones <drjones@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, lvivier@redhat.com, thuth@redhat.com
Subject: Re: [kvm-unit-tests PATCH v2 6/6] lib/x86/vm: enable malloc and friends
Date: Thu, 3 Nov 2016 17:51:29 +0100	[thread overview]
Message-ID: <20161103165129.dtb265bwjqexdl7y@kamzik.brq.redhat.com> (raw)
In-Reply-To: <cfb80792-b31c-207a-fe41-b24e53803ec7@redhat.com>

On Thu, Nov 03, 2016 at 05:00:01PM +0100, Paolo Bonzini wrote:
> 
> 
> On 03/11/2016 15:58, Andrew Jones wrote:
> > Actually I have it envisioned the other way around. phys_alloc should
> > only be for early allocations, before enabling the MMU (which is why
> > it's called "phys"_alloc).
> 
> Yes, I agree.  Putting your other suggestions together, you'd have for
> early allocation:
> 
>    phys        // provide contiguous physical addresses
>     |
>    morecore    // provide contiguous RAM
>     |
>    malloc      // full API
> 
> Later, you'd switch morecore to also take care of installing PTEs:
> 
>     page     virt
>       \       /
>       morecore
>           |
>        malloc

Yup, I think we're on the same page (no pun intended). However, as
morecore needs to take both size and alignment arguments in order
to be able to cope with any offsets that a memalign call would create,
then I was planning to just keep it named early_memalign.

> 
> Now morecore talks to both page and virt.  page (alloc_page) is
> initialized with whatever memory wasn't allocated by phys, and takes up
> its job.  virt (alloc_vpages) allocates virtual address space.  morecore
> is now taking not necessarily consecutive physical memory at page
> granularity, assigning consecutive virtual address to it, and if needed
> slicing pages into smaller allocations.

vmalloc does all that except the slicing up feature, which is left for
a rainy day... So I think the implementation I have here for vmemalign
is still valid for x86's "morecore".

> 
> morecore is not an external API, it's just malloc's pluggable interface.
>  page/virt/malloc are all external APIs.  phys could be, but nothing

Right, both vmemalign and early_memalign are internal only (static) and
alloc_page and alloc_vpages and malloc/calloc/memalign/free are external.
x86 also has vmalloc/vfree (the building block for vmemalign) which allow
callers to explicitly ask for virtual addresses corresponding to a new
allocation. We should keep that external because malloc doesn't imply
you're getting PTEs set, as it depends on the stage/degree of test init.

> uses phys_alloc_aligned and phys_zalloc_aligned so we might as well zap it.

These two are useful for finding/getting the "whatever memory wasn't
allocated by phys" you state above. To do that, you simple allocate one
more region containing the amount of memory you want the new allocator
to have with one of them. I'll zap phys_alloc and phys_zalloc though.

> 
> Note that you can choose separately at each layer whether to implement
> freeing or not.  I think we only need to have it in "page" and "malloc"
> (though in the latter it can be dummy, as it is now, since "malloc" is
> just a small veneer around morecore).

Yup, check.

> 
> >> 	addr = malloc_free;
> >> 	available = malloc_top - addr;
> >> 	if (available < size) {
> >> 		if (vfree_bottom == malloc_top + 1) {
> >> 			// can reuse top of previous segment
> >> 			vsize = PAGE_ALIGN(size - available);
> >> 			vaddr = alloc_vpages(vsize >> PAGE_SHIFT);
> >> 			assert(vaddr == malloc_top + 1);
> >> 		} else {
> >> 			// ignore free space at end of previous segment
> >> 			vsize = PAGE_ALIGN(size);
> >> 			addr = vaddr =
> >> 				alloc_vpages(vsize >> PAGE_SHIFT);
> >> 		}
> >> 		malloc_top = vaddr + vsize - 1;
> >> 	}
> >> 	malloc_free = addr + size;
> >> 	return addr;
> > 
> > I agree lib/x86/vm.c needs to change it's virtual address allocation
> > scheme. It allocates down and never allows an address to be reused
> > (but maybe that's by design to avoid TLB flushes?) I didn't feel like
> > tackling that right now though. I'm not sure what the function you've
> > written should be used for. To me, it looks like it's an extension of
> > the virtual address management already available. Where do alloc_page()
> > and install_page() come in, like vmalloc has?
> 
> The alloc_page()/install_page() loop would be before setting malloc_top.

Ok, so if we don't want to rework vmalloc/alloc_vpages for this series
to use a different virtual address allocation scheme (I don't ;-), then
I think this part can be left for another time.

Thanks,
drew