From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Subject: Re: [PATCH resend RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions
Date: Mon, 27 Sep 2021 18:37:59 +0200
Message-ID: <20210927183759.72a29645@p-imbrenda>
In-Reply-To: <20210909162248.14969-5-david@redhat.com>

On Thu,  9 Sep 2021 18:22:43 +0200
David Hildenbrand <david@redhat.com> wrote:

> There are multiple things broken about our storage key handling
> functions:
> 
> 1. We should not walk/touch page tables outside of VMA boundaries when
>    holding only the mmap sem in read mode. Evil user space can modify the
>    VMA layout just before this function runs and e.g., trigger races with
>    page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
>    with read mmap_sem in munmap"). gfn_to_hva() will only translate using
>    KVM memory regions, but won't validate the VMA.
> 
> 2. We should not allocate page tables outside of VMA boundaries: if
>    evil user space decides to map hugetlbfs to these ranges, bad things
>    will happen because we suddenly have PTE or PMD page tables where we
>    shouldn't have them.
> 
> 3. We don't handle large PUDs that might suddenly appear inside our page
>    table hierarchy.
> 
> Don't manually allocate page tables, properly validate that we have a VMA
> and bail out on pud_large().
> 
> All callers of page table handling functions, except
> get_guest_storage_key(), call fixup_user_fault() in case they
> receive an -EFAULT and retry; this will allocate the necessary page tables
> if required.
> 
> To keep get_guest_storage_key() working as expected and not requiring
> kvm_s390_get_skeys() to call fixup_user_fault(), distinguish between
> "there is simply no page table or huge page yet and the key is assumed
> to be 0" and "this is a fault to be reported".
> 
> Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling")
> introduced most of the affected code, it was actually already broken
> before when using get_locked_pte() without any VMA checks.
> 
> Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key
> handling") we can no longer set a guest storage key (for example from
> QEMU during VM live migration) without actually resolving a fault.
> Although we would have created most page tables, we would choke on the
> !pmd_present(), requiring a call to fixup_user_fault(). I would
> have thought that this is problematic in combination with postcopy live
> migration ... but nobody noticed and this patch doesn't change the
> situation. So maybe it's just fine.
> 
> Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key")
> Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key")
> Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility")
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/mm/pgtable.c | 57 +++++++++++++++++++++++++++++-------------
>  1 file changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 54969e0f3a94..5fb409ff7842 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -429,22 +429,36 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
>  }
>  
>  #ifdef CONFIG_PGSTE
> -static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
> +static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
>  {
> +	struct vm_area_struct *vma;
>  	pgd_t *pgd;
>  	p4d_t *p4d;
>  	pud_t *pud;
> -	pmd_t *pmd;
> +
> +	/* We need a valid VMA, otherwise this is clearly a fault. */
> +	vma = vma_lookup(mm, addr);
> +	if (!vma)
> +		return -EFAULT;
>  
>  	pgd = pgd_offset(mm, addr);
> -	p4d = p4d_alloc(mm, pgd, addr);
> -	if (!p4d)
> -		return NULL;
> -	pud = pud_alloc(mm, p4d, addr);
> -	if (!pud)
> -		return NULL;
> -	pmd = pmd_alloc(mm, pud, addr);
> -	return pmd;
> +	if (!pgd_present(*pgd))
> +		return -ENOENT;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return -ENOENT;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (!pud_present(*pud))
> +		return -ENOENT;
> +
> +	/* Large PUDs are not supported yet. */
> +	if (pud_large(*pud))
> +		return -EFAULT;
> +
> +	*pmdp = pmd_offset(pud, addr);
> +	return 0;
>  }
>  #endif
>  
> @@ -778,8 +792,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
> @@ -881,8 +894,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
>  	pte_t *ptep;
>  	int cc = 0;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
> @@ -935,15 +947,24 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	/*
> +	 * If we don't have a PTE table and if there is no huge page mapped,
> +	 * the storage key is 0.
> +	 */
> +	*key = 0;
> +
> +	switch (pmd_lookup(mm, addr, &pmdp)) {
> +	case -ENOENT:
> +		return 0;
> +	case 0:
> +		break;
> +	default:
>  		return -EFAULT;
> +	}
>  
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
> -		/* Not yet mapped memory has a zero key */
>  		spin_unlock(ptl);
> -		*key = 0;
>  		return 0;
>  	}
>  
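For readers following the error convention, here is a minimal user-space
sketch of how the patch separates "nothing mapped yet" (-ENOENT) from "real
fault" (-EFAULT). It is only a model under stated assumptions, not the kernel
code: struct addr_state and the model_*() helpers are hypothetical names
invented for illustration, and the page table walk is reduced to three flags.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state of one address; not a kernel type. */
struct addr_state {
	bool has_vma;        /* a VMA covers the address */
	bool tables_present; /* pgd, p4d and pud entries are all present */
	bool large_pud;      /* the pud maps a large page */
	unsigned char key;   /* storage key of the mapped page */
};

/* Models the return convention of pmd_lookup() in the quoted patch. */
static int model_pmd_lookup(const struct addr_state *s)
{
	if (!s->has_vma)
		return -EFAULT;	/* no VMA: clearly a fault */
	if (!s->tables_present)
		return -ENOENT;	/* valid VMA, but nothing populated yet */
	if (s->large_pud)
		return -EFAULT;	/* large PUDs are not supported */
	return 0;
}

/* Models get_guest_storage_key(): -ENOENT means the key is simply 0. */
static int model_get_key(const struct addr_state *s, unsigned char *key)
{
	*key = 0;

	switch (model_pmd_lookup(s)) {
	case -ENOENT:
		return 0;
	case 0:
		break;
	default:
		return -EFAULT;
	}

	*key = s->key;
	return 0;
}

/* Models the setters: every lookup failure is reported as -EFAULT. */
static int model_set_key(struct addr_state *s, unsigned char key)
{
	if (model_pmd_lookup(s))
		return -EFAULT;

	s->key = key;
	return 0;
}
```

In this model only the getter treats -ENOENT as success. The setter folds it
into -EFAULT, which matches the commit message: those callers are expected to
call fixup_user_fault() and retry.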

