From mboxrd@z Thu Jan  1 00:00:00 1970
From: catalin.marinas@arm.com (Catalin Marinas)
Date: Wed, 8 Oct 2014 10:47:04 +0100
Subject: [PATCH v2 1/3] arm64: KVM: Implement 48 VA support for KVM EL2
 and Stage-2
In-Reply-To: <20141007193954.GI3717@cbox>
References: <1412627427-28629-1-git-send-email-christoffer.dall@linaro.org>
 <1412627427-28629-2-git-send-email-christoffer.dall@linaro.org>
 <20141007104846.GB12675@e104818-lin.cambridge.arm.com>
 <5433EA8B.6010502@arm.com> <20141007193954.GI3717@cbox>
Message-ID: <20141008094703.GB5906@e104818-lin.cambridge.arm.com>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Tue, Oct 07, 2014 at 08:39:54PM +0100, Christoffer Dall wrote:
> I came up with the following based on your feedback, but I personally
> don't find it a lot easier to read than what I had already.  Suggestions
> are welcome:

At least PTRS_PER_S2_PGD and KVM_PREALLOC_LEVEL are clearer to me as
formulas than the magic numbers.

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index a030d16..7941a51 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
[...]
> +/*
> + * In the case where PGDIR_SHIFT is larger than KVM_PHYS_SHIFT, we can address
> + * the entire IPA input range with a single pgd entry, and we would only need
> + * one pgd entry.
> + */

It may be worth here stating that this pgd is actually fake (covered
below as well). Maybe something like "single (fake) pgd entry".

> +#if PGDIR_SHIFT > KVM_PHYS_SHIFT
> +#define PTRS_PER_S2_PGD                (1)
> +#else
> +#define PTRS_PER_S2_PGD                (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
> +#endif
> +#define S2_PGD_ORDER           get_order(PTRS_PER_S2_PGD * sizeof(pgd_t))
> 
> +/*
> + * If we are concatenating first level stage-2 page tables, we would have less
> + * than or equal to 16 pointers in the fake PGD, because that's what the
> + * architecture allows.  In this case, (4 - CONFIG_ARM64_PGTABLE_LEVELS)
> + * represents the first level for the host, and we add 1 to go to the next
> + * level (which uses contatenation) for the stage-2 tables.
> + */
> +#if PTRS_PER_S2_PGD <= 16
> +#define KVM_PREALLOC_LEVEL     (4 - CONFIG_ARM64_PGTABLE_LEVELS + 1)
> +#else
> +#define KVM_PREALLOC_LEVEL     (0)
> +#endif
> +
> +/**
> + * kvm_prealloc_hwpgd - allocate inital table for VTTBR
> + * @kvm:       The KVM struct pointer for the VM.
> + * @pgd:       The kernel pseudo pgd
> + *
> + * When the kernel uses more levels of page tables than the guest, we allocate
> + * a fake PGD and pre-populate it to point to the next-level page table, which
> + * will be the real initial page table pointed to by the VTTBR.
> + *
> + * When KVM_PREALLOC_LEVEL==2, we allocate a single page for the PMD and
> + * the kernel will use folded pud.  When KVM_PREALLOC_LEVEL==1, we
> + * allocate 2 consecutive PUD pages.
> + */

I don't have a strong preference here, if you find the code easier to
read as separate kvm_prealloc_hwpgd() functions, use those, as per your
original patch. My point was to no longer define the functions based on
#if 64K && 3-levels etc. but only on KVM_PREALLOC_LEVEL.

Anyway, I think the code below looks ok, with some fixes.

> +static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
> +{
> +       pud_t *pud;
> +       pmd_t *pmd;
> +       unsigned int order, i;
> +       unsigned long hwpgd;
> +
> +       if (KVM_PREALLOC_LEVEL == 0)
> +               return 0;
> +
> +       order = get_order(PTRS_PER_S2_PGD);

Isn't order always 0 here? Based on our IRC discussion, PTRS_PER_S2_PGD
is 16 or less and the order should not be used.

> +       hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, order);

I assume you need __get_free_pages() for alignment.

> +       if (!hwpgd)
> +               return -ENOMEM;
> +
> +       if (KVM_PREALLOC_LEVEL == 1) {
> +               pud = (pud_t *)hwpgd;
> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> +                       pgd_populate(NULL, pgd + i, pud + i * PTRS_PER_PUD);
> +       } else if (KVM_PREALLOC_LEVEL == 2) {
> +               pud = pud_offset(pgd, 0);
> +               pmd = (pmd_t *)hwpgd;
> +               for (i = 0; i < PTRS_PER_S2_PGD; i++)
> +                       pud_populate(NULL, pud + i, pmd + i * PTRS_PER_PMD);
> +       }

It could be slightly shorter as (I can't guarantee clearer ;)):

	for (i = 0; i < PTRS_PER_S2_PGD; i++) {
		if (KVM_PREALLOC_LEVEL == 1)
			pgd_populate(NULL, pgd + i,
				     (pud_t *)hwpgd + i * PTRS_PER_PUD);
		else if (KVM_PREALLOC_LEVEL == 2)
			pud_populate(NULL, pud_offset(pgd, 0) + i,
				     (pmd_t *)hwpgd + i * PTRS_PER_PMD)
	}

Or you could write a kvm_populate_swpgd() to handle the ifs and casting.

> +
> +       return 0;
> +}
> +
> +static inline void *kvm_get_hwpgd(struct kvm *kvm)
> +{
> +       pgd_t *pgd = kvm->arch.pgd;
> +       pud_t *pud;
> +       pmd_t *pmd;
> +
> +       switch (KVM_PREALLOC_LEVEL) {
> +       case 0:
> +               return pgd;
> +       case 1:
> +               pud = pud_offset(pgd, 0);
> +               return pud;
> +       case 2:
> +               pud = pud_offset(pgd, 0);
> +               pmd = pmd_offset(pud, 0);
> +               return pmd;
> +       default:
> +               BUG();
> +               return NULL;
> +       }

	/* not needed? Use BUG_ON or BUILD_BUG_ON */
	if (KVM_PREALLOC_LEVEL == 0)
		return pgd;

	pud = pud_offset(pgd, 0);
	if (KVM_PREALLOC_LEVEL == 1)
		return pud;

	return pmd_offset(pud, 0);

You don't need KVM_PREALLOC_LEVEL == 0 case since this function wouldn't
be called. So you could do with some (BUILD_)BUG_ON and 4 lines after.

-- 
Catalin