linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
	kvm@vger.kernel.org, linux-mm@kvack.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Niklas Schnelle <schnelle@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Subject: Re: [PATCH resend RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions
Date: Mon, 27 Sep 2021 18:37:59 +0200	[thread overview]
Message-ID: <20210927183759.72a29645@p-imbrenda> (raw)
In-Reply-To: <20210909162248.14969-5-david@redhat.com>

On Thu,  9 Sep 2021 18:22:43 +0200
David Hildenbrand <david@redhat.com> wrote:

> There are multiple things broken about our storage key handling
> functions:
> 
> 1. We should not walk/touch page tables outside of VMA boundaries when
>    holding only the mmap sem in read mode. Evil user space can modify the
>    VMA layout just before this function runs and e.g., trigger races with
>    page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
>    with read mmap_sem in munmap"). gfn_to_hva() will only translate using
>    KVM memory regions, but won't validate the VMA.
> 
> 2. We should not allocate page tables outside of VMA boundaries: if
>    evil user space decides to map hugetlbfs to these ranges, bad things
>    will happen because we suddenly have PTE or PMD page tables where we
>    shouldn't have them.
> 
> 3. We don't handle large PUDs that might suddenly appeared inside our page
>    table hierarchy.
> 
> Don't manually allocate page tables, properly validate that we have VMA and
> bail out on pud_large().
> 
> All callers of page table handling functions, except
> get_guest_storage_key(), call fixup_user_fault() in case they
> receive an -EFAULT and retry; this will allocate the necessary page tables
> if required.
> 
> To keep get_guest_storage_key() working as expected and not requiring
> kvm_s390_get_skeys() to call fixup_user_fault() distinguish between
> "there is simply no page table or huge page yet and the key is assumed
> to be 0" and "this is a fault to be reported".
> 
> Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling")
> introduced most of the affected code, it was actually already broken
> before when using get_locked_pte() without any VMA checks.
> 
> Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key
> handling") we can no longer set a guest storage key (for example from
> QEMU during VM live migration) without actually resolving a fault.
> Although we would have created most page tables, we would choke on the
> !pmd_present(), requiring a call to fixup_user_fault(). I would
> have thought that this is problematic in combination with postcopy life
> migration ... but nobody noticed and this patch doesn't change the
> situation. So maybe it's just fine.
> 
> Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key")
> Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key")
> Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility")
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

> ---
>  arch/s390/mm/pgtable.c | 57 +++++++++++++++++++++++++++++-------------
>  1 file changed, 39 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 54969e0f3a94..5fb409ff7842 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -429,22 +429,36 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
>  }
>  
>  #ifdef CONFIG_PGSTE
> -static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
> +static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
>  {
> +	struct vm_area_struct *vma;
>  	pgd_t *pgd;
>  	p4d_t *p4d;
>  	pud_t *pud;
> -	pmd_t *pmd;
> +
> +	/* We need a valid VMA, otherwise this is clearly a fault. */
> +	vma = vma_lookup(mm, addr);
> +	if (!vma)
> +		return -EFAULT;
>  
>  	pgd = pgd_offset(mm, addr);
> -	p4d = p4d_alloc(mm, pgd, addr);
> -	if (!p4d)
> -		return NULL;
> -	pud = pud_alloc(mm, p4d, addr);
> -	if (!pud)
> -		return NULL;
> -	pmd = pmd_alloc(mm, pud, addr);
> -	return pmd;
> +	if (!pgd_present(*pgd))
> +		return -ENOENT;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return -ENOENT;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (!pud_present(*pud))
> +		return -ENOENT;
> +
> +	/* Large PUDs are not supported yet. */
> +	if (pud_large(*pud))
> +		return -EFAULT;
> +
> +	*pmdp = pmd_offset(pud, addr);
> +	return 0;
>  }
>  #endif
>  
> @@ -778,8 +792,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
> @@ -881,8 +894,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
>  	pte_t *ptep;
>  	int cc = 0;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
> @@ -935,15 +947,24 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	/*
> +	 * If we don't have a PTE table and if there is no huge page mapped,
> +	 * the storage key is 0.
> +	 */
> +	*key = 0;
> +
> +	switch (pmd_lookup(mm, addr, &pmdp)) {
> +	case -ENOENT:
> +		return 0;
> +	case 0:
> +		break;
> +	default:
>  		return -EFAULT;
> +	}
>  
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
> -		/* Not yet mapped memory has a zero key */
>  		spin_unlock(ptl);
> -		*key = 0;
>  		return 0;
>  	}
>  


  reply	other threads:[~2021-09-27 16:42 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-09 16:22 [PATCH resend RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
2021-09-09 16:22 ` [PATCH resend RFC 1/9] s390/gmap: validate VMA in __gmap_zap() David Hildenbrand
2021-09-14 16:53   ` Claudio Imbrenda
2021-09-09 16:22 ` [PATCH resend RFC 2/9] s390/gmap: don't unconditionally call pte_unmap_unlock() " David Hildenbrand
2021-09-14 16:52   ` Claudio Imbrenda
2021-09-09 16:22 ` [PATCH resend RFC 3/9] s390/mm: validate VMA in PGSTE manipulation functions David Hildenbrand
2021-09-14 16:54   ` Claudio Imbrenda
2021-09-09 16:22 ` [PATCH resend RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions David Hildenbrand
2021-09-27 16:37   ` Claudio Imbrenda [this message]
2021-09-09 16:22 ` [PATCH resend RFC 5/9] s390/uv: fully validate the VMA before calling follow_page() David Hildenbrand
2021-09-14 16:53   ` Claudio Imbrenda
2021-09-14 22:41   ` Liam Howlett
2021-09-09 16:22 ` [PATCH resend RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte() David Hildenbrand
2021-09-14 16:54   ` Claudio Imbrenda
2021-09-14 22:41   ` Liam Howlett
2021-09-09 16:22 ` [PATCH resend RFC 7/9] s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present David Hildenbrand
2021-09-14 16:54   ` Claudio Imbrenda
2021-09-14 17:23     ` David Hildenbrand
2021-09-09 16:22 ` [PATCH resend RFC 8/9] s390/mm: optimize set_guest_storage_key() David Hildenbrand
2021-09-27 17:01   ` Claudio Imbrenda
2021-09-09 16:22 ` [PATCH resend RFC 9/9] s390/mm: optimize reset_guest_reference_bit() David Hildenbrand
2021-09-27 17:02   ` Claudio Imbrenda
2021-09-14 16:50 ` [PATCH resend RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers Claudio Imbrenda
2021-09-14 18:06   ` David Hildenbrand
2021-09-28 10:59 ` Heiko Carstens
2021-09-28 11:06   ` Christian Borntraeger
2021-09-28 14:38     ` Claudio Imbrenda
2021-09-28 16:03 ` Christian Borntraeger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210927183759.72a29645@p-imbrenda \
    --to=imbrenda@linux.ibm.com \
    --cc=Ulrich.Weigand@de.ibm.com \
    --cc=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=david@redhat.com \
    --cc=frankja@linux.ibm.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=schnelle@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).