From: Claudio Imbrenda <imbrenda@linux.ibm.com>
To: David Hildenbrand <david@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-s390@vger.kernel.org,
kvm@vger.kernel.org, linux-mm@kvack.org,
Christian Borntraeger <borntraeger@de.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Cornelia Huck <cohuck@redhat.com>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Niklas Schnelle <schnelle@linux.ibm.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Ulrich Weigand <Ulrich.Weigand@de.ibm.com>
Subject: Re: [PATCH resend RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions
Date: Mon, 27 Sep 2021 18:37:59 +0200
Message-ID: <20210927183759.72a29645@p-imbrenda>
In-Reply-To: <20210909162248.14969-5-david@redhat.com>
On Thu, 9 Sep 2021 18:22:43 +0200
David Hildenbrand <david@redhat.com> wrote:
> There are multiple things broken about our storage key handling
> functions:
>
> 1. We should not walk/touch page tables outside of VMA boundaries when
> holding only the mmap sem in read mode. Evil user space can modify the
> VMA layout just before this function runs and e.g., trigger races with
> page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> with read mmap_sem in munmap"). gfn_to_hva() will only translate using
> KVM memory regions, but won't validate the VMA.
>
> 2. We should not allocate page tables outside of VMA boundaries: if
> evil user space decides to map hugetlbfs to these ranges, bad things
> will happen because we suddenly have PTE or PMD page tables where we
> shouldn't have them.
>
> 3. We don't handle large PUDs that might suddenly appear inside our page
> table hierarchy.
>
> Don't manually allocate page tables, properly validate that we have a VMA,
> and bail out on pud_large().
>
> All callers of page table handling functions, except
> get_guest_storage_key(), call fixup_user_fault() in case they
> receive an -EFAULT and retry; this will allocate the necessary page tables
> if required.
>
> To keep get_guest_storage_key() working as expected without requiring
> kvm_s390_get_skeys() to call fixup_user_fault(), distinguish between
> "there is simply no page table or huge page yet and the key is assumed
> to be 0" and "this is a fault to be reported".
>
> Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling")
> introduced most of the affected code, it was actually already broken
> before when using get_locked_pte() without any VMA checks.
>
> Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key
> handling") we can no longer set a guest storage key (for example from
> QEMU during VM live migration) without actually resolving a fault.
> Although we would have created most page tables, we would choke on the
> !pmd_present(), requiring a call to fixup_user_fault(). I would
> have thought that this is problematic in combination with postcopy live
> migration ... but nobody noticed and this patch doesn't change the
> situation. So maybe it's just fine.
>
> Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key")
> Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key")
> Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility")
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
> ---
> arch/s390/mm/pgtable.c | 57 +++++++++++++++++++++++++++++-------------
> 1 file changed, 39 insertions(+), 18 deletions(-)
>
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 54969e0f3a94..5fb409ff7842 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -429,22 +429,36 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
>  }
>  
>  #ifdef CONFIG_PGSTE
> -static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
> +static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
>  {
> +	struct vm_area_struct *vma;
>  	pgd_t *pgd;
>  	p4d_t *p4d;
>  	pud_t *pud;
> -	pmd_t *pmd;
> +
> +	/* We need a valid VMA, otherwise this is clearly a fault. */
> +	vma = vma_lookup(mm, addr);
> +	if (!vma)
> +		return -EFAULT;
>  
>  	pgd = pgd_offset(mm, addr);
> -	p4d = p4d_alloc(mm, pgd, addr);
> -	if (!p4d)
> -		return NULL;
> -	pud = pud_alloc(mm, p4d, addr);
> -	if (!pud)
> -		return NULL;
> -	pmd = pmd_alloc(mm, pud, addr);
> -	return pmd;
> +	if (!pgd_present(*pgd))
> +		return -ENOENT;
> +
> +	p4d = p4d_offset(pgd, addr);
> +	if (!p4d_present(*p4d))
> +		return -ENOENT;
> +
> +	pud = pud_offset(p4d, addr);
> +	if (!pud_present(*pud))
> +		return -ENOENT;
> +
> +	/* Large PUDs are not supported yet. */
> +	if (pud_large(*pud))
> +		return -EFAULT;
> +
> +	*pmdp = pmd_offset(pud, addr);
> +	return 0;
>  }
>  #endif
>
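
For illustration only: a rough userspace sketch of the lookup-only walk in
the hunk above. It is not kernel code. Every toy_* name is hypothetical, and
the two-level table is an assumed stand-in for the pgd/p4d/pud/pmd hierarchy.
It only shows the three outcomes the new helper reports: no covering region
is a fault, a missing level is -ENOENT, and a large upper-level entry is a
fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_LEAVES_PER_UPPER 4UL

/* Hypothetical leaf entry, standing in for a pmd. */
struct toy_leaf { int present; };

/* Hypothetical upper entry, standing in for a pud. */
struct toy_upper { int present; int large; struct toy_leaf *leaves; };

/* Hypothetical address range [start, end), standing in for a VMA. */
struct toy_region { unsigned long start, end; };

struct toy_mm {
	const struct toy_region *regions;
	size_t nr_regions;
	struct toy_upper *upper;
	size_t nr_upper;
};

/* Return 1 if some region covers addr, 0 otherwise. */
static int toy_region_covers(const struct toy_mm *mm, unsigned long addr)
{
	size_t i;

	for (i = 0; i < mm->nr_regions; i++)
		if (addr >= mm->regions[i].start && addr < mm->regions[i].end)
			return 1;
	return 0;
}

/* Look up the leaf for addr without ever allocating a table. */
static int toy_lookup(const struct toy_mm *mm, unsigned long addr,
		      struct toy_leaf **leafp)
{
	unsigned long idx = addr / TOY_LEAVES_PER_UPPER;
	struct toy_upper *up;

	assert(leafp != NULL);

	/* No covering region: clearly a fault. */
	if (!toy_region_covers(mm, addr))
		return -EFAULT;

	/* Missing level: nothing mapped yet, but not an error as such. */
	if (idx >= mm->nr_upper || !mm->upper[idx].present)
		return -ENOENT;

	/* Large upper-level entries are not handled in this sketch. */
	up = &mm->upper[idx];
	if (up->large)
		return -EFAULT;

	*leafp = &up->leaves[addr % TOY_LEAVES_PER_UPPER];
	return 0;
}
```

The point of the sketch is that the walk never creates a table as a side
effect; the caller decides what a missing level means.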
> @@ -778,8 +792,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
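
For illustration only: the commit message says callers other than
get_guest_storage_key() call fixup_user_fault() on -EFAULT and retry. A
minimal userspace sketch of that retry pattern follows. It is not the actual
KVM caller code; all toy_* names and the struct fields are hypothetical, and
toy_fixup_fault() is an assumed stand-in for resolving the fault.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical guest state: are the backing tables there yet? */
struct toy_guest {
	int mapped;		/* tables exist for the address */
	int can_fault_in;	/* a fault on the address can be resolved */
	unsigned char key;	/* stored key */
	int faults;		/* number of faults resolved so far */
};

/* Set the key; report -EFAULT if the tables are not there yet. */
static int toy_set_key(struct toy_guest *g, unsigned char key)
{
	assert(g != NULL);
	if (!g->mapped)
		return -EFAULT;
	g->key = key;
	return 0;
}

/* Stand-in for resolving the fault, which creates the missing tables. */
static int toy_fixup_fault(struct toy_guest *g)
{
	if (!g->can_fault_in)
		return -EFAULT;
	g->mapped = 1;
	g->faults++;
	return 0;
}

/* Retry the operation after each successfully resolved fault. */
static int toy_set_key_retry(struct toy_guest *g, unsigned char key)
{
	int rc;

	for (;;) {
		rc = toy_set_key(g, key);
		if (rc != -EFAULT)
			return rc;
		rc = toy_fixup_fault(g);
		if (rc)
			return rc;
	}
}
```

In this sketch the table allocation happens only in the fault path, so the
key-setting function itself never has to allocate anything.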
> @@ -881,8 +894,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
>  	pte_t *ptep;
>  	int cc = 0;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	if (pmd_lookup(mm, addr, &pmdp))
>  		return -EFAULT;
>  
>  	ptl = pmd_lock(mm, pmdp);
> @@ -935,15 +947,24 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
>  	pmd_t *pmdp;
>  	pte_t *ptep;
>  
> -	pmdp = pmd_alloc_map(mm, addr);
> -	if (unlikely(!pmdp))
> +	/*
> +	 * If we don't have a PTE table and if there is no huge page mapped,
> +	 * the storage key is 0.
> +	 */
> +	*key = 0;
> +
> +	switch (pmd_lookup(mm, addr, &pmdp)) {
> +	case -ENOENT:
> +		return 0;
> +	case 0:
> +		break;
> +	default:
>  		return -EFAULT;
> +	}
>  
>  	ptl = pmd_lock(mm, pmdp);
>  	if (!pmd_present(*pmdp)) {
> -		/* Not yet mapped memory has a zero key */
>  		spin_unlock(ptl);
> -		*key = 0;
>  		return 0;
>  	}
>
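
For illustration only: the three-way handling in the last hunk, reduced to a
userspace sketch. It is not the kernel function. toy_get_key() and its
parameters are hypothetical; lookup_rc stands for whatever the lookup
returned, and stored_key for the key that would be read once an entry exists.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Map the lookup result to the caller-visible result:
 *   -ENOENT -> success, key reported as 0 (nothing mapped yet)
 *   0       -> success, key read from the entry
 *   other   -> -EFAULT, a fault to be reported
 */
static int toy_get_key(int lookup_rc, unsigned char stored_key,
		       unsigned char *key)
{
	assert(key != NULL);

	/* Default: memory that is not mapped yet has a zero key. */
	*key = 0;

	switch (lookup_rc) {
	case -ENOENT:
		return 0;
	case 0:
		break;
	default:
		return -EFAULT;
	}

	*key = stored_key;
	return 0;
}
```

Setting the key to 0 up front means every early-return success path reports
a zero key without having to repeat the assignment.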