* [PATCH RFC 1/9] s390/gmap: validate VMA in __gmap_zap()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 2/9] s390/gmap: don't unconditionally call pte_unmap_unlock() " David Hildenbrand
` (7 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap"). The mere presence of an entry in our
guest_to_host radix tree does not imply that there is a VMA.
Further, we should not allocate page tables (via get_locked_pte()) outside
of VMA boundaries: if evil user space decides to map hugetlbfs to these
ranges, bad things will happen because we suddenly have PTE or PMD page
tables where we shouldn't have them.
Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().
Note that gmap_discard() is different:
zap_page_range()->unmap_single_vma() makes sure to stay within VMA
boundaries.
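The hazard can be sketched with a tiny user-space model (purely
illustrative; the structure and helper below are hypothetical stand-ins,
not the kernel's gmap code): a cached guest-to-host translation can
outlive the VMA it was created under, so the cached result must be
re-validated against the current VMA before touching page tables:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model: one cached gaddr -> vmaddr entry and
 * one optional VMA covering [vm_start, vm_end). */
struct model {
	unsigned long cached_vmaddr;	/* stale-able translation cache */
	bool has_vma;
	unsigned long vm_start, vm_end;
};

/* Re-validate: the cached translation alone must not be trusted. */
static bool may_touch_page_tables(const struct model *m)
{
	return m->has_vma &&
	       m->cached_vmaddr >= m->vm_start &&
	       m->cached_vmaddr < m->vm_end;
}
```

If user space unmaps or replaces the VMA between the cache fill and this
check, the model correctly refuses to proceed.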
Fixes: b31288fa83b2 ("s390/kvm: support collaborative memory management")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/gmap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 9bb2c7512cd5..b6b56cd4ca64 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -673,6 +673,7 @@ EXPORT_SYMBOL_GPL(gmap_fault);
*/
void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
{
+ struct vm_area_struct *vma;
unsigned long vmaddr;
spinlock_t *ptl;
pte_t *ptep;
@@ -682,6 +683,11 @@ void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
gaddr >> PMD_SHIFT);
if (vmaddr) {
vmaddr |= gaddr & ~PMD_MASK;
+
+ vma = vma_lookup(gmap->mm, vmaddr);
+ if (!vma || is_vm_hugetlb_page(vma))
+ return;
+
/* Get pointer to the page table entry */
ptep = get_locked_pte(gmap->mm, vmaddr, &ptl);
if (likely(ptep))
--
2.31.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH RFC 2/9] s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 1/9] s390/gmap: validate VMA in __gmap_zap() David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 3/9] s390/mm: validate VMA in PGSTE manipulation functions David Hildenbrand
` (6 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
... otherwise we will try to unlock, through a garbage pointer, a
spinlock that was never locked.
At the time we reach this code path, we usually successfully looked up
a PGSTE already; however, evil user space could have manipulated the VMA
layout in the meantime and triggered removal of the page table.
Fixes: 1e133ab296f3 ("s390/mm: split arch/s390/mm/pgtable.c")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/gmap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index b6b56cd4ca64..9023bf3ced89 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -690,9 +690,10 @@ void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
/* Get pointer to the page table entry */
ptep = get_locked_pte(gmap->mm, vmaddr, &ptl);
- if (likely(ptep))
+ if (likely(ptep)) {
ptep_zap_unused(gmap->mm, vmaddr, ptep, 0);
- pte_unmap_unlock(ptep, ptl);
+ pte_unmap_unlock(ptep, ptl);
+ }
}
}
EXPORT_SYMBOL_GPL(__gmap_zap);
--
2.31.1
* [PATCH RFC 3/9] s390/mm: validate VMA in PGSTE manipulation functions
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 1/9] s390/gmap: validate VMA in __gmap_zap() David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 2/9] s390/gmap: don't unconditionally call pte_unmap_unlock() " David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions David Hildenbrand
` (5 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap"). gfn_to_hva() will only translate using
KVM memory regions, but won't validate the VMA.
Further, we should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things will
happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.
Similarly, we have to check if we suddenly find a hugetlbfs VMA, before
calling get_locked_pte().
Fixes: 2d42f9477320 ("s390/kvm: Add PGSTE manipulation functions")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/pgtable.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index eec3a9d7176e..54969e0f3a94 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -988,6 +988,7 @@ EXPORT_SYMBOL(get_guest_storage_key);
int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
unsigned long *oldpte, unsigned long *oldpgste)
{
+ struct vm_area_struct *vma;
unsigned long pgstev;
spinlock_t *ptl;
pgste_t pgste;
@@ -997,6 +998,10 @@ int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
WARN_ON_ONCE(orc > ESSA_MAX);
if (unlikely(orc > ESSA_MAX))
return -EINVAL;
+
+ vma = vma_lookup(mm, hva);
+ if (!vma || is_vm_hugetlb_page(vma))
+ return -EFAULT;
ptep = get_locked_pte(mm, hva, &ptl);
if (unlikely(!ptep))
return -EFAULT;
@@ -1089,10 +1094,14 @@ EXPORT_SYMBOL(pgste_perform_essa);
int set_pgste_bits(struct mm_struct *mm, unsigned long hva,
unsigned long bits, unsigned long value)
{
+ struct vm_area_struct *vma;
spinlock_t *ptl;
pgste_t new;
pte_t *ptep;
+ vma = vma_lookup(mm, hva);
+ if (!vma || is_vm_hugetlb_page(vma))
+ return -EFAULT;
ptep = get_locked_pte(mm, hva, &ptl);
if (unlikely(!ptep))
return -EFAULT;
@@ -1117,9 +1126,13 @@ EXPORT_SYMBOL(set_pgste_bits);
*/
int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep)
{
+ struct vm_area_struct *vma;
spinlock_t *ptl;
pte_t *ptep;
+ vma = vma_lookup(mm, hva);
+ if (!vma || is_vm_hugetlb_page(vma))
+ return -EFAULT;
ptep = get_locked_pte(mm, hva, &ptl);
if (unlikely(!ptep))
return -EFAULT;
--
2.31.1
* [PATCH RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (2 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 3/9] s390/mm: validate VMA in PGSTE manipulation functions David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 5/9] s390/uv: fully validate the VMA before calling follow_page() David Hildenbrand
` (4 subsequent siblings)
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
There are multiple things broken about our storage key handling
functions:
1. We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap"). gfn_to_hva() will only translate using
KVM memory regions, but won't validate the VMA.
2. We should not allocate page tables outside of VMA boundaries: if
evil user space decides to map hugetlbfs to these ranges, bad things
will happen because we suddenly have PTE or PMD page tables where we
shouldn't have them.
3. We don't handle large PUDs that might suddenly appear inside our page
table hierarchy.
Don't manually allocate page tables; properly validate that we have a VMA and
bail out on pud_large().
All callers of page table handling functions, except
get_guest_storage_key(), call fixup_user_fault() in case they
receive an -EFAULT and retry; this will allocate the necessary page tables
if required.
To keep get_guest_storage_key() working as expected and not requiring
kvm_s390_get_skeys() to call fixup_user_fault(), distinguish between
"there is simply no page table or huge page yet and the key is assumed
to be 0" and "this is a fault to be reported".
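That distinction can be modeled in plain C (a sketch under assumed
names; `model_pmd_lookup()` and `model_get_storage_key()` are
hypothetical stand-ins, not the kernel functions): a lookup that reports
"nothing mapped yet" becomes a successful key of 0, while a real fault
is propagated to the caller:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical three-state lookup result, mirroring the pmd_lookup()
 * introduced below: 0 = found, -ENOENT = no page table yet,
 * -EFAULT = real fault (e.g. no VMA, unsupported large PUD). */
static int model_pmd_lookup(int state)
{
	return state;	/* stand-in; the real lookup walks pgd/p4d/pud */
}

/* Sketch of the get_guest_storage_key() policy: a missing page table
 * means "key is 0"; only a real fault is reported to the caller. */
static int model_get_storage_key(int lookup_state, unsigned char *key)
{
	*key = 0;
	switch (model_pmd_lookup(lookup_state)) {
	case -ENOENT:	/* no page table and no huge page: key stays 0 */
		return 0;
	case 0:		/* found: the caller would now read the real key */
		return 0;
	default:
		return -EFAULT;
	}
}
```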
Although commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key handling")
introduced most of the affected code, it was actually already broken
before when using get_locked_pte() without any VMA checks.
Note: Ever since commit 637ff9efe5ea ("s390/mm: Add huge pmd storage key
handling") we can no longer set a guest storage key (for example from
QEMU during VM live migration) without actually resolving a fault.
Although we would have created most page tables, we would choke on the
!pmd_present(), requiring a call to fixup_user_fault(). I would
have thought that this is problematic in combination with postcopy live
migration ... but nobody noticed and this patch doesn't change the
situation. So maybe it's just fine.
Fixes: 9fcf93b5de06 ("KVM: S390: Create helper function get_guest_storage_key")
Fixes: 24d5dd0208ed ("s390/kvm: Provide function for setting the guest storage key")
Fixes: a7e19ab55ffd ("KVM: s390: handle missing storage-key facility")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/pgtable.c | 57 +++++++++++++++++++++++++++++-------------
1 file changed, 39 insertions(+), 18 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54969e0f3a94..5fb409ff7842 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -429,22 +429,36 @@ static inline pmd_t pmdp_flush_lazy(struct mm_struct *mm,
}
#ifdef CONFIG_PGSTE
-static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
+static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
{
+ struct vm_area_struct *vma;
pgd_t *pgd;
p4d_t *p4d;
pud_t *pud;
- pmd_t *pmd;
+
+ /* We need a valid VMA, otherwise this is clearly a fault. */
+ vma = vma_lookup(mm, addr);
+ if (!vma)
+ return -EFAULT;
pgd = pgd_offset(mm, addr);
- p4d = p4d_alloc(mm, pgd, addr);
- if (!p4d)
- return NULL;
- pud = pud_alloc(mm, p4d, addr);
- if (!pud)
- return NULL;
- pmd = pmd_alloc(mm, pud, addr);
- return pmd;
+ if (!pgd_present(*pgd))
+ return -ENOENT;
+
+ p4d = p4d_offset(pgd, addr);
+ if (!p4d_present(*p4d))
+ return -ENOENT;
+
+ pud = pud_offset(p4d, addr);
+ if (!pud_present(*pud))
+ return -ENOENT;
+
+ /* Large PUDs are not supported yet. */
+ if (pud_large(*pud))
+ return -EFAULT;
+
+ *pmdp = pmd_offset(pud, addr);
+ return 0;
}
#endif
@@ -778,8 +792,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp;
pte_t *ptep;
- pmdp = pmd_alloc_map(mm, addr);
- if (unlikely(!pmdp))
+ if (pmd_lookup(mm, addr, &pmdp))
return -EFAULT;
ptl = pmd_lock(mm, pmdp);
@@ -881,8 +894,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
pte_t *ptep;
int cc = 0;
- pmdp = pmd_alloc_map(mm, addr);
- if (unlikely(!pmdp))
+ if (pmd_lookup(mm, addr, &pmdp))
return -EFAULT;
ptl = pmd_lock(mm, pmdp);
@@ -935,15 +947,24 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp;
pte_t *ptep;
- pmdp = pmd_alloc_map(mm, addr);
- if (unlikely(!pmdp))
+ /*
+ * If we don't have a PTE table and if there is no huge page mapped,
+ * the storage key is 0.
+ */
+ *key = 0;
+
+ switch (pmd_lookup(mm, addr, &pmdp)) {
+ case -ENOENT:
+ return 0;
+ case 0:
+ break;
+ default:
return -EFAULT;
+ }
ptl = pmd_lock(mm, pmdp);
if (!pmd_present(*pmdp)) {
- /* Not yet mapped memory has a zero key */
spin_unlock(ptl);
- *key = 0;
return 0;
}
--
2.31.1
* [PATCH RFC 5/9] s390/uv: fully validate the VMA before calling follow_page()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (3 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 4/9] s390/mm: fix VMA and page table handling code in storage key handling functions David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-10 14:14 ` Liam Howlett
2021-09-09 14:59 ` [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte() David Hildenbrand
` (3 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap").
find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.
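The semantic difference can be illustrated with a small user-space model
(a sketch only; `model_find_vma()`/`model_vma_lookup()` are hypothetical
stand-ins mimicking the documented behavior, not the kernel
implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model: VMAs cover [vm_start, vm_end), sorted, no overlap. */
struct model_vma { unsigned long vm_start, vm_end; };

/* find_vma() semantics: first VMA with vm_end > addr; the result may
 * start *above* addr, so addr is not necessarily inside it. */
static struct model_vma *model_find_vma(struct model_vma *v, size_t n,
					unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (v[i].vm_end > addr)
			return &v[i];
	return NULL;
}

/* vma_lookup() semantics: find_vma() plus the vm_start <= addr check. */
static struct model_vma *model_vma_lookup(struct model_vma *v, size_t n,
					  unsigned long addr)
{
	struct model_vma *vma = model_find_vma(v, n, addr);

	return (vma && addr >= vma->vm_start) ? vma : NULL;
}
```

For an address in a hole below a VMA, the find_vma()-style lookup still
returns that VMA, while the vma_lookup()-style lookup returns NULL.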
Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/kernel/uv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index aeb0a15bcbb7..193205fb2777 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -227,7 +227,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
uaddr = __gmap_translate(gmap, gaddr);
if (IS_ERR_VALUE(uaddr))
goto out;
- vma = find_vma(gmap->mm, uaddr);
+ vma = vma_lookup(gmap->mm, uaddr);
if (!vma)
goto out;
/*
--
2.31.1
* Re: [PATCH RFC 5/9] s390/uv: fully validate the VMA before calling follow_page()
2021-09-09 14:59 ` [PATCH RFC 5/9] s390/uv: fully validate the VMA before calling follow_page() David Hildenbrand
@ 2021-09-10 14:14 ` Liam Howlett
0 siblings, 0 replies; 17+ messages in thread
From: Liam Howlett @ 2021-09-10 14:14 UTC (permalink / raw)
To: David Hildenbrand; +Cc: linux-kernel, linux-s390, kvm, linux-mm
* David Hildenbrand <david@redhat.com> [210909 11:01]:
> We should not walk/touch page tables outside of VMA boundaries when
> holding only the mmap sem in read mode. Evil user space can modify the
> VMA layout just before this function runs and e.g., trigger races with
> page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> with read mmap_sem in munmap").
>
> find_vma() does not check if the address is >= the VMA start address;
> use vma_lookup() instead.
>
> Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/s390/kernel/uv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
> index aeb0a15bcbb7..193205fb2777 100644
> --- a/arch/s390/kernel/uv.c
> +++ b/arch/s390/kernel/uv.c
> @@ -227,7 +227,7 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
> uaddr = __gmap_translate(gmap, gaddr);
> if (IS_ERR_VALUE(uaddr))
> goto out;
> - vma = find_vma(gmap->mm, uaddr);
> + vma = vma_lookup(gmap->mm, uaddr);
> if (!vma)
> goto out;
> /*
> --
> 2.31.1
>
>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
* [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (4 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 5/9] s390/uv: fully validate the VMA before calling follow_page() David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-10 8:22 ` Niklas Schnelle
2021-09-09 14:59 ` [PATCH RFC 7/9] s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present David Hildenbrand
` (2 subsequent siblings)
8 siblings, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We should not walk/touch page tables outside of VMA boundaries when
holding only the mmap sem in read mode. Evil user space can modify the
VMA layout just before this function runs and e.g., trigger races with
page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
with read mmap_sem in munmap").
find_vma() does not check if the address is >= the VMA start address;
use vma_lookup() instead.
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/pci/pci_mmio.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index ae683aa623ac..c5b35ea129cf 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
mmap_read_lock(current->mm);
ret = -EINVAL;
- vma = find_vma(current->mm, mmio_addr);
+ vma = vma_lookup(current->mm, mmio_addr);
if (!vma)
goto out_unlock_mmap;
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
@@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
mmap_read_lock(current->mm);
ret = -EINVAL;
- vma = find_vma(current->mm, mmio_addr);
+ vma = vma_lookup(current->mm, mmio_addr);
if (!vma)
goto out_unlock_mmap;
if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
--
2.31.1
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-09 14:59 ` [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte() David Hildenbrand
@ 2021-09-10 8:22 ` Niklas Schnelle
2021-09-10 9:23 ` David Hildenbrand
0 siblings, 1 reply; 17+ messages in thread
From: Niklas Schnelle @ 2021-09-10 8:22 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel; +Cc: linux-s390, kvm, linux-mm
On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
> We should not walk/touch page tables outside of VMA boundaries when
> holding only the mmap sem in read mode. Evil user space can modify the
> VMA layout just before this function runs and e.g., trigger races with
> page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> with read mmap_sem in munmap").
>
> find_vma() does not check if the address is >= the VMA start address;
> use vma_lookup() instead.
>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> arch/s390/pci/pci_mmio.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> index ae683aa623ac..c5b35ea129cf 100644
> --- a/arch/s390/pci/pci_mmio.c
> +++ b/arch/s390/pci/pci_mmio.c
> @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>
> mmap_read_lock(current->mm);
> ret = -EINVAL;
> - vma = find_vma(current->mm, mmio_addr);
> + vma = vma_lookup(current->mm, mmio_addr);
> if (!vma)
> goto out_unlock_mmap;
> if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
>
> mmap_read_lock(current->mm);
> ret = -EINVAL;
> - vma = find_vma(current->mm, mmio_addr);
> + vma = vma_lookup(current->mm, mmio_addr);
> if (!vma)
> goto out_unlock_mmap;
> if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
Oh wow great find thanks! If I may say so these are not great function
names. Looking at the code, vma_lookup() is indeed find_vma() plus the
check that the looked up address is indeed inside the vma.
I think this is pretty independent of the rest of the patches, so do
you want me to apply this patch independently or do you want to wait
for the others?
In any case:
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-10 8:22 ` Niklas Schnelle
@ 2021-09-10 9:23 ` David Hildenbrand
2021-09-10 12:48 ` Niklas Schnelle
2021-09-10 14:12 ` Liam Howlett
0 siblings, 2 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-10 9:23 UTC (permalink / raw)
To: Niklas Schnelle, linux-kernel; +Cc: linux-s390, kvm, linux-mm
On 10.09.21 10:22, Niklas Schnelle wrote:
> On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
>> We should not walk/touch page tables outside of VMA boundaries when
>> holding only the mmap sem in read mode. Evil user space can modify the
>> VMA layout just before this function runs and e.g., trigger races with
>> page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
>> with read mmap_sem in munmap").
>>
>> find_vma() does not check if the address is >= the VMA start address;
>> use vma_lookup() instead.
>>
>> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> arch/s390/pci/pci_mmio.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
>> index ae683aa623ac..c5b35ea129cf 100644
>> --- a/arch/s390/pci/pci_mmio.c
>> +++ b/arch/s390/pci/pci_mmio.c
>> @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
>>
>> mmap_read_lock(current->mm);
>> ret = -EINVAL;
>> - vma = find_vma(current->mm, mmio_addr);
>> + vma = vma_lookup(current->mm, mmio_addr);
>> if (!vma)
>> goto out_unlock_mmap;
>> if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
>> @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
>>
>> mmap_read_lock(current->mm);
>> ret = -EINVAL;
>> - vma = find_vma(current->mm, mmio_addr);
>> + vma = vma_lookup(current->mm, mmio_addr);
>> if (!vma)
>> goto out_unlock_mmap;
>> if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
>
> Oh wow great find thanks! If I may say so these are not great function
> names. Looking at the code vma_lookup() is inded find_vma() plus the
> check that the looked up address is indeed inside the vma.
>
IIRC, vma_lookup() was introduced fairly recently. Before that, this
additional check was open coded (and still is in some instances). It's
confusing, I agree.
> I think this is pretty independent of the rest of the patches, so do
> you want me to apply this patch independently or do you want to wait
> for the others?
Sure, please go ahead and apply independently. It'd be great if you
could give it a quick sanity test, although I don't expect surprises --
unfortunately, the environment I have easily at hand is not very well
suited (#cpu, #mem, #disk ...) for anything that exceeds basic compile
tests (and even cross-compiling is significantly faster ...).
>
> In any case:
>
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>
Thanks!
--
Thanks,
David / dhildenb
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-10 9:23 ` David Hildenbrand
@ 2021-09-10 12:48 ` Niklas Schnelle
2021-09-10 14:12 ` Liam Howlett
1 sibling, 0 replies; 17+ messages in thread
From: Niklas Schnelle @ 2021-09-10 12:48 UTC (permalink / raw)
To: David Hildenbrand, linux-kernel; +Cc: linux-s390, kvm, linux-mm
On Fri, 2021-09-10 at 11:23 +0200, David Hildenbrand wrote:
> On 10.09.21 10:22, Niklas Schnelle wrote:
> > On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
> > > We should not walk/touch page tables outside of VMA boundaries when
> > > holding only the mmap sem in read mode. Evil user space can modify the
> > > VMA layout just before this function runs and e.g., trigger races with
> > > page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> > > with read mmap_sem in munmap").
> > >
> > > find_vma() does not check if the address is >= the VMA start address;
> > > use vma_lookup() instead.
> > >
> > > Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > arch/s390/pci/pci_mmio.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > > index ae683aa623ac..c5b35ea129cf 100644
> > > --- a/arch/s390/pci/pci_mmio.c
> > > +++ b/arch/s390/pci/pci_mmio.c
> > > @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> > >
> > > mmap_read_lock(current->mm);
> > > ret = -EINVAL;
> > > - vma = find_vma(current->mm, mmio_addr);
> > > + vma = vma_lookup(current->mm, mmio_addr);
> > > if (!vma)
> > > goto out_unlock_mmap;
> > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> > >
> > > mmap_read_lock(current->mm);
> > > ret = -EINVAL;
> > > - vma = find_vma(current->mm, mmio_addr);
> > > + vma = vma_lookup(current->mm, mmio_addr);
> > > if (!vma)
> > > goto out_unlock_mmap;
> > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> >
> > Oh wow great find thanks! If I may say so these are not great function
> > names. Looking at the code vma_lookup() is inded find_vma() plus the
> > check that the looked up address is indeed inside the vma.
> >
>
> IIRC, vma_lookup() was introduced fairly recently. Before that, this
> additional check was open coded (and still are in some instances). It's
> confusing, I agree.
>
> > I think this is pretty independent of the rest of the patches, so do
> > you want me to apply this patch independently or do you want to wait
> > for the others?
>
> Sure, please go ahead and apply independently. It'd be great if you
> could give it a quick sanity test, although I don't expect surprises --
> unfortunately, the environment I have easily at hand is not very well
> suited (#cpu, #mem, #disk ...) for anything that exceeds basic compile
> tests (and even cross-compiling is significantly faster ...).
Yes and even if you had more hardware this code path is only hit by
very specialized workloads doing MMIO access of PCI devices from
userspace. I did test with such a workload (ib_send_bw test utility)
and all looks good.
Applied and will be sent out by Heiko or Vasily as part of the s390
tree.
>
> > In any case:
> >
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> >
>
> Thanks!
>
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-10 9:23 ` David Hildenbrand
2021-09-10 12:48 ` Niklas Schnelle
@ 2021-09-10 14:12 ` Liam Howlett
2021-09-10 14:31 ` Niklas Schnelle
1 sibling, 1 reply; 17+ messages in thread
From: Liam Howlett @ 2021-09-10 14:12 UTC (permalink / raw)
To: David Hildenbrand
Cc: Niklas Schnelle, linux-kernel, linux-s390, kvm, linux-mm
* David Hildenbrand <david@redhat.com> [210910 05:23]:
> On 10.09.21 10:22, Niklas Schnelle wrote:
> > On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
> > > We should not walk/touch page tables outside of VMA boundaries when
> > > holding only the mmap sem in read mode. Evil user space can modify the
> > > VMA layout just before this function runs and e.g., trigger races with
> > > page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> > > with read mmap_sem in munmap").
> > >
> > > find_vma() does not check if the address is >= the VMA start address;
> > > use vma_lookup() instead.
> > >
> > > Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > ---
> > > arch/s390/pci/pci_mmio.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > > index ae683aa623ac..c5b35ea129cf 100644
> > > --- a/arch/s390/pci/pci_mmio.c
> > > +++ b/arch/s390/pci/pci_mmio.c
> > > @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> > > mmap_read_lock(current->mm);
> > > ret = -EINVAL;
> > > - vma = find_vma(current->mm, mmio_addr);
> > > + vma = vma_lookup(current->mm, mmio_addr);
> > > if (!vma)
> > > goto out_unlock_mmap;
> > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> > > mmap_read_lock(current->mm);
> > > ret = -EINVAL;
> > > - vma = find_vma(current->mm, mmio_addr);
> > > + vma = vma_lookup(current->mm, mmio_addr);
> > > if (!vma)
> > > goto out_unlock_mmap;
> > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> >
> > Oh wow great find thanks! If I may say so these are not great function
> > names. Looking at the code vma_lookup() is indeed find_vma() plus the
> > check that the looked up address is indeed inside the vma.
> >
>
> IIRC, vma_lookup() was introduced fairly recently. Before that, this
> additional check was open coded (and still is in some instances). It's
> confusing, I agree.
This confusion is why I introduced vma_lookup(). My hope is to reduce
the users of find_vma() to only those that actually need the added
functionality, which are mostly in the mm code.
>
> > I think this is pretty independent of the rest of the patches, so do
> > you want me to apply this patch independently or do you want to wait
> > for the others?
>
> Sure, please go ahead and apply independently. It'd be great if you could
> give it a quick sanity test, although I don't expect surprises --
> unfortunately, the environment I have easily at hand is not very well suited
> (#cpu, #mem, #disk ...) for anything that exceeds basic compile tests (and
> even cross-compiling is significantly faster ...).
>
> >
> > In any case:
> >
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> >
>
> Thanks!
>
> --
> Thanks,
>
> David / dhildenb
>
>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-10 14:12 ` Liam Howlett
@ 2021-09-10 14:31 ` Niklas Schnelle
2021-09-10 14:52 ` Liam Howlett
0 siblings, 1 reply; 17+ messages in thread
From: Niklas Schnelle @ 2021-09-10 14:31 UTC (permalink / raw)
To: Liam Howlett, David Hildenbrand; +Cc: linux-kernel, linux-s390, kvm, linux-mm
On Fri, 2021-09-10 at 14:12 +0000, Liam Howlett wrote:
> * David Hildenbrand <david@redhat.com> [210910 05:23]:
> > On 10.09.21 10:22, Niklas Schnelle wrote:
> > > On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
> > > > We should not walk/touch page tables outside of VMA boundaries when
> > > > holding only the mmap sem in read mode. Evil user space can modify the
> > > > VMA layout just before this function runs and e.g., trigger races with
> > > > page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> > > > with read mmap_sem in munmap").
> > > >
> > > > find_vma() does not check if the address is >= the VMA start address;
> > > > use vma_lookup() instead.
> > > >
> > > > Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > ---
> > > > arch/s390/pci/pci_mmio.c | 4 ++--
> > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > > > index ae683aa623ac..c5b35ea129cf 100644
> > > > --- a/arch/s390/pci/pci_mmio.c
> > > > +++ b/arch/s390/pci/pci_mmio.c
> > > > @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> > > > mmap_read_lock(current->mm);
> > > > ret = -EINVAL;
> > > > - vma = find_vma(current->mm, mmio_addr);
> > > > + vma = vma_lookup(current->mm, mmio_addr);
> > > > if (!vma)
> > > > goto out_unlock_mmap;
> > > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > > @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> > > > mmap_read_lock(current->mm);
> > > > ret = -EINVAL;
> > > > - vma = find_vma(current->mm, mmio_addr);
> > > > + vma = vma_lookup(current->mm, mmio_addr);
> > > > if (!vma)
> > > > goto out_unlock_mmap;
> > > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > >
> > > Oh wow, great find, thanks! If I may say so, these are not great function
> > > names. Looking at the code, vma_lookup() is indeed find_vma() plus a
> > > check that the looked-up address actually lies inside the VMA.
> > >
> >
> > IIRC, vma_lookup() was introduced fairly recently. Before that, this
> > additional check was open coded (and still is in some instances). It's
> > confusing, I agree.
>
> This confusion is why I introduced vma_lookup(). My hope is to reduce
> the users of find_vma() to only those that actually need the added
> functionality, which are mostly in the mm code.
Ah, I see, so the confusingly similar names are in the hope of one day
making find_vma() visible, or at least used, only in mm code. That does
make more sense then. Thanks for the explanation! Maybe this would be a
good candidate for a tree-wide change/Coccinelle script? Then again, I
guess sometimes one really wants find_vma(), and the two are hard to
tell apart.
>
..snip..
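The semantic difference discussed in this subthread can be sketched with a small user-space model. The `model_` helpers below are illustrative stand-ins, not the kernel implementation; the real definitions live in include/linux/mm.h:

```c
#include <stddef.h>

/*
 * Minimal user-space model of the two lookups. "vmas" stands in for the
 * mm's sorted VMA list.
 */
struct vma {
	unsigned long vm_start; /* inclusive */
	unsigned long vm_end;   /* exclusive */
};

/* Like find_vma(): first VMA with vm_end > addr. Note that addr may
 * lie in the gap *below* vm_start and the lookup still succeeds. */
struct vma *model_find_vma(struct vma *vmas, size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (vmas[i].vm_end > addr)
			return &vmas[i];
	return NULL;
}

/* Like vma_lookup(): only succeeds if vm_start <= addr < vm_end. */
struct vma *model_vma_lookup(struct vma *vmas, size_t n, unsigned long addr)
{
	struct vma *vma = model_find_vma(vmas, n, addr);

	if (vma && addr < vma->vm_start)
		return NULL;
	return vma;
}
```

For an address in a hole below a VMA, model_find_vma() returns the next-higher VMA while model_vma_lookup() returns NULL; that extra bounds check is exactly what the pci_mmio patch in this thread relies on.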
* Re: [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte()
2021-09-10 14:31 ` Niklas Schnelle
@ 2021-09-10 14:52 ` Liam Howlett
0 siblings, 0 replies; 17+ messages in thread
From: Liam Howlett @ 2021-09-10 14:52 UTC (permalink / raw)
To: Niklas Schnelle
Cc: David Hildenbrand, linux-kernel, linux-s390, kvm, linux-mm
* Niklas Schnelle <schnelle@linux.ibm.com> [210910 10:31]:
> On Fri, 2021-09-10 at 14:12 +0000, Liam Howlett wrote:
> > * David Hildenbrand <david@redhat.com> [210910 05:23]:
> > > On 10.09.21 10:22, Niklas Schnelle wrote:
> > > > On Thu, 2021-09-09 at 16:59 +0200, David Hildenbrand wrote:
> > > > > We should not walk/touch page tables outside of VMA boundaries when
> > > > > holding only the mmap sem in read mode. Evil user space can modify the
> > > > > VMA layout just before this function runs and e.g., trigger races with
> > > > > page table removal code since commit dd2283f2605e ("mm: mmap: zap pages
> > > > > with read mmap_sem in munmap").
> > > > >
> > > > > find_vma() does not check if the address is >= the VMA start address;
> > > > > use vma_lookup() instead.
> > > > >
> > > > > Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> > > > > Signed-off-by: David Hildenbrand <david@redhat.com>
> > > > > ---
> > > > > arch/s390/pci/pci_mmio.c | 4 ++--
> > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
> > > > > index ae683aa623ac..c5b35ea129cf 100644
> > > > > --- a/arch/s390/pci/pci_mmio.c
> > > > > +++ b/arch/s390/pci/pci_mmio.c
> > > > > @@ -159,7 +159,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
> > > > > mmap_read_lock(current->mm);
> > > > > ret = -EINVAL;
> > > > > - vma = find_vma(current->mm, mmio_addr);
> > > > > + vma = vma_lookup(current->mm, mmio_addr);
> > > > > if (!vma)
> > > > > goto out_unlock_mmap;
> > > > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > > > @@ -298,7 +298,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
> > > > > mmap_read_lock(current->mm);
> > > > > ret = -EINVAL;
> > > > > - vma = find_vma(current->mm, mmio_addr);
> > > > > + vma = vma_lookup(current->mm, mmio_addr);
> > > > > if (!vma)
> > > > > goto out_unlock_mmap;
> > > > > if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
> > > >
> > > > Oh wow, great find, thanks! If I may say so, these are not great function
> > > > names. Looking at the code, vma_lookup() is indeed find_vma() plus a
> > > > check that the looked-up address actually lies inside the VMA.
> > > >
> > >
> > > IIRC, vma_lookup() was introduced fairly recently. Before that, this
> > > additional check was open coded (and still is in some instances). It's
> > > confusing, I agree.
> >
> > This confusion is why I introduced vma_lookup(). My hope is to reduce
> > the users of find_vma() to only those that actually need the added
> > functionality, which are mostly in the mm code.
>
> Ah, I see, so the confusingly similar names are in the hope of one day
> making find_vma() visible, or at least used, only in mm code. That does
> make more sense then. Thanks for the explanation! Maybe this would be a
> good candidate for a tree-wide change/Coccinelle script? Then again, I
> guess sometimes one really wants find_vma(), and the two are hard to
> tell apart.
>
find_vma() does not describe what the code actually does, so I think it
is a good candidate for a tree-wide change. I'm not sure it would be
popular, though, and I couldn't come up with a name that would be worth
the effort. If the name does change, then find_vma_intersection() should
change as well, and the nommu code also has a find_vma_exact(). Given
the unraveling a rename would require, I thought it best to try to clean
up the current code and make it less error-prone with a new mm API.
* [PATCH RFC 7/9] s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (5 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 6/9] s390/pci_mmio: fully validate the VMA before calling follow_pte() David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 8/9] s390/mm: optimize set_guest_storage_key() David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 9/9] s390/mm: optimize reset_guest_reference_bit() David Hildenbrand
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
pte_offset_map_lock() is sufficient: we already checked that the pmd is
present, so the PTE table exists and no allocation can be needed.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/pgtable.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5fb409ff7842..4e77b8ebdcc5 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -814,10 +814,7 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
}
spin_unlock(ptl);
- ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
- if (unlikely(!ptep))
- return -EFAULT;
-
+ ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
new = old = pgste_get_lock(ptep);
pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
PGSTE_ACC_BITS | PGSTE_FP_BIT);
@@ -912,10 +909,7 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
}
spin_unlock(ptl);
- ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
- if (unlikely(!ptep))
- return -EFAULT;
-
+ ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
new = old = pgste_get_lock(ptep);
/* Reset guest reference bit only */
pgste_val(new) &= ~PGSTE_GR_BIT;
@@ -977,10 +971,7 @@ int get_guest_storage_key(struct mm_struct *mm, unsigned long addr,
}
spin_unlock(ptl);
- ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
- if (unlikely(!ptep))
- return -EFAULT;
-
+ ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
pgste = pgste_get_lock(ptep);
*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
paddr = pte_val(*ptep) & PAGE_MASK;
--
2.31.1
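The reasoning behind this patch ("if the pmd is known to be present, the PTE table exists, so no allocation can be needed") can be modeled in user space. The `model_` names below are illustrative stand-ins, not the kernel's pte_alloc_map_lock()/pte_offset_map_lock():

```c
#include <stdlib.h>

/* A pmd entry either points to a PTE table or is empty. */
struct model_pmd {
	int *pte_table;
};

/* Like pte_alloc_map_lock(): allocate the PTE table if it is missing,
 * and fail (return NULL) only if that allocation fails. */
int *model_pte_alloc_map(struct model_pmd *pmd)
{
	if (!pmd->pte_table)
		pmd->pte_table = calloc(512, sizeof(int));
	return pmd->pte_table;
}

/* Like pte_offset_map_lock(): assume the PTE table already exists.
 * Safe here because the callers in this patch only reach this point
 * after observing pmd_present() under the pmd lock. */
int *model_pte_offset_map(struct model_pmd *pmd)
{
	return pmd->pte_table;
}
```

With the table known to exist, both helpers return the same table and the alloc variant's failure path is dead code, which is why the patch can drop it.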
* [PATCH RFC 8/9] s390/mm: optimize set_guest_storage_key()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (6 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 7/9] s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
2021-09-09 14:59 ` [PATCH RFC 9/9] s390/mm: optimize reset_guest_reference_bit() David Hildenbrand
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We already optimize get_guest_storage_key() to assume that, if we have
neither a PTE table nor a huge page mapped, the storage key is 0.
Similarly, optimize set_guest_storage_key() to simply do nothing in that
case when the key to set is 0: the key already is 0, so there is nothing
to change.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/pgtable.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 4e77b8ebdcc5..534939a3eca5 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -792,13 +792,23 @@ int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
pmd_t *pmdp;
pte_t *ptep;
- if (pmd_lookup(mm, addr, &pmdp))
+ /*
+ * If we don't have a PTE table and if there is no huge page mapped,
+ * we can ignore attempts to set the key to 0, because it already is 0.
+ */
+ switch (pmd_lookup(mm, addr, &pmdp)) {
+ case -ENOENT:
+ return key ? -EFAULT : 0;
+ case 0:
+ break;
+ default:
return -EFAULT;
+ }
ptl = pmd_lock(mm, pmdp);
if (!pmd_present(*pmdp)) {
spin_unlock(ptl);
- return -EFAULT;
+ return key ? -EFAULT : 0;
}
if (pmd_large(*pmdp)) {
--
2.31.1
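The control flow this patch introduces can be sketched as a small user-space model. The names and return codes are illustrative and simplified; the real code also handles the huge-page and locking paths shown in the diff:

```c
#include <errno.h>   /* provides ENOENT and EFAULT */
#include <stdbool.h>

/* Models pmd_lookup(): 0 on success, -ENOENT if no page table exists
 * for the address, another negative value on a malformed mapping. */
int model_pmd_lookup(bool have_table)
{
	return have_table ? 0 : -ENOENT;
}

/* Sketch of the optimized set_guest_storage_key() entry path: with no
 * PTE table and no huge page mapped, the storage key is already 0, so
 * setting key 0 can succeed without faulting anything in. */
int model_set_storage_key(bool have_table, unsigned char key)
{
	switch (model_pmd_lookup(have_table)) {
	case -ENOENT:
		return key ? -EFAULT : 0;
	case 0:
		return 0; /* the real key update is elided in this model */
	default:
		return -EFAULT;
	}
}
```

Setting a nonzero key through a nonexistent table still fails with -EFAULT, so only the already-correct no-op case is short-circuited.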
* [PATCH RFC 9/9] s390/mm: optimize reset_guest_reference_bit()
2021-09-09 14:59 [PATCH RFC 0/9] s390: fixes, cleanups and optimizations for page table walkers David Hildenbrand
` (7 preceding siblings ...)
2021-09-09 14:59 ` [PATCH RFC 8/9] s390/mm: optimize set_guest_storage_key() David Hildenbrand
@ 2021-09-09 14:59 ` David Hildenbrand
8 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-09-09 14:59 UTC (permalink / raw)
To: linux-kernel; +Cc: linux-s390, kvm, linux-mm, David Hildenbrand
We already optimize get_guest_storage_key() to assume that, if we have
neither a PTE table nor a huge page mapped, the storage key is 0.
Similarly, optimize reset_guest_reference_bit() to simply do nothing if
there is no PTE table and no huge page mapped: there is no reference bit
to reset.
Signed-off-by: David Hildenbrand <david@redhat.com>
---
arch/s390/mm/pgtable.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 534939a3eca5..50ab2fed3397 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -901,13 +901,23 @@ int reset_guest_reference_bit(struct mm_struct *mm, unsigned long addr)
pte_t *ptep;
int cc = 0;
- if (pmd_lookup(mm, addr, &pmdp))
+ /*
+ * If we don't have a PTE table and if there is no huge page mapped,
+ * the storage key is 0 and there is nothing for us to do.
+ */
+ switch (pmd_lookup(mm, addr, &pmdp)) {
+ case -ENOENT:
+ return 0;
+ case 0:
+ break;
+ default:
return -EFAULT;
+ }
ptl = pmd_lock(mm, pmdp);
if (!pmd_present(*pmdp)) {
spin_unlock(ptl);
- return -EFAULT;
+ return 0;
}
if (pmd_large(*pmdp)) {
--
2.31.1