* [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process
From: Dominik Dingel @ 2014-10-22 11:09 UTC
  To: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin, Dominik Dingel

s390 has the special notion of storage keys: a set of per-page flags that are
associated with physical pages and live outside of directly addressable memory.
These storage keys can be queried and changed with a dedicated set of
instructions, which behave well under virtualization:
- for an invalid pte, the instructions work on the storage key copy kept in
  the host page table (the PGSTE)
- for a valid pte, the instructions work on the real storage key
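
As background, the storage key of a page is read and written with the ISKE
and SSKE instructions. Below is a minimal sketch of the accessors, simplified
from the kernel's arch/s390/include/asm/page.h (the nonquiescing SSKE variant
selected by "mapped" is omitted here):

	/* read the storage key of the physical page at addr */
	static inline unsigned char page_get_storage_key(unsigned long addr)
	{
		unsigned char skey;

		asm volatile("iske %0,%1" : "=d" (skey) : "a" (addr));
		return skey;
	}

	/* set the storage key; mapped selects the (omitted) nq variant */
	static inline void page_set_storage_key(unsigned long addr,
						unsigned char skey, int mapped)
	{
		asm volatile("sske %0,%1" : : "d" (skey), "a" (addr));
	}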

Thanks to Martin's software reference and dirty bit tracking, the kernel
itself no longer issues any storage key instructions and takes a purely
software-based approach instead; distributions in the wild, on the other
hand, are still using them.

However, for virtualized guests we still have a problem with guest pages
mapped to the zero page and with kernel samepage merging (KSM): in both
cases multiple guest pages point to the same physical page and therefore
share the same storage key.

Let's fix this by introducing a new function, which s390 defines to forbid
new zero page mappings. If the guest issues a storage key related
instruction, we flag the mm_struct, drop existing zero page mappings and
unmerge the guest memory.
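
At its heart the series adds a per-mm predicate with a generic no-op
fallback; its shape, taken from patches 2/4 and 3/4:

	/* include/linux/mm.h (patch 2/4): generic fallback, zero page allowed */
	#ifndef mm_forbids_zeropage
	#define mm_forbids_zeropage(X)  (0)
	#endif

	/* arch/s390/include/asm/pgtable.h (patch 3/4): no zero pages
	 * once the guest uses storage keys */
	#define mm_forbids_zeropage mm_use_skey

The anonymous and THP fault paths then check !mm_forbids_zeropage(mm) before
backing a read fault with the shared zero page.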

v2 -> v3:
 - cleared up the patch description of patch 3/4
 - removed an unnecessary flag in mmu_context (Paolo)

v1 -> v2:
 - removed the vma flag, following Dave's and Paolo's suggestion

Dominik Dingel (4):
  s390/mm: refactor global pgste updates
  mm: introduce mm_forbids_zeropage function
  s390/mm: prevent and break zero page mappings in case of storage keys
  s390/mm: disable KSM for storage key enabled pages

 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   8 +-
 arch/s390/kvm/kvm-s390.c        |   2 +-
 arch/s390/kvm/priv.c            |  17 ++--
 arch/s390/mm/pgtable.c          | 180 ++++++++++++++++++----------------------
 include/linux/mm.h              |   4 +
 mm/huge_memory.c                |   2 +-
 mm/memory.c                     |   2 +-
 8 files changed, 106 insertions(+), 111 deletions(-)

-- 
1.8.5.5


* [PATCH 1/4] s390/mm: refactor global pgste updates
From: Dominik Dingel @ 2014-10-22 11:09 UTC
  To: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin, Dominik Dingel

Replace the s390-specific page table walker for the pgste updates with a
call to the common code function walk_page_range. There are now two pte
modification callbacks: one resets the CMMA state, the other initializes
the storage keys.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/include/asm/pgalloc.h |   2 -
 arch/s390/include/asm/pgtable.h |   1 +
 arch/s390/kvm/kvm-s390.c        |   2 +-
 arch/s390/mm/pgtable.c          | 153 ++++++++++++++--------------------------
 4 files changed, 56 insertions(+), 102 deletions(-)

diff --git a/arch/s390/include/asm/pgalloc.h b/arch/s390/include/asm/pgalloc.h
index 9e18a61..120e126 100644
--- a/arch/s390/include/asm/pgalloc.h
+++ b/arch/s390/include/asm/pgalloc.h
@@ -22,8 +22,6 @@ unsigned long *page_table_alloc(struct mm_struct *, unsigned long);
 void page_table_free(struct mm_struct *, unsigned long *);
 void page_table_free_rcu(struct mmu_gather *, unsigned long *);
 
-void page_table_reset_pgste(struct mm_struct *, unsigned long, unsigned long,
-			    bool init_skey);
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned long key, bool nq);
 
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 5efb2fe..1e991f6a 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1750,6 +1750,7 @@ extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern void s390_enable_skey(void);
+extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
  * No page table caches to initialise
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 81b0e11..7a33c11 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -281,7 +281,7 @@ static int kvm_s390_mem_control(struct kvm *kvm, struct kvm_device_attr *attr)
 	case KVM_S390_VM_MEM_CLR_CMMA:
 		mutex_lock(&kvm->lock);
 		idx = srcu_read_lock(&kvm->srcu);
-		page_table_reset_pgste(kvm->arch.gmap->mm, 0, TASK_SIZE, false);
+		s390_reset_cmma(kvm->arch.gmap->mm);
 		srcu_read_unlock(&kvm->srcu, idx);
 		mutex_unlock(&kvm->lock);
 		ret = 0;
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 5404a62..ab55ba8 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -885,99 +885,6 @@ static inline void page_table_free_pgste(unsigned long *table)
 	__free_page(page);
 }
 
-static inline unsigned long page_table_reset_pte(struct mm_struct *mm, pmd_t *pmd,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	pte_t *start_pte, *pte;
-	spinlock_t *ptl;
-	pgste_t pgste;
-
-	start_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	pte = start_pte;
-	do {
-		pgste = pgste_get_lock(pte);
-		pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
-		if (init_skey) {
-			unsigned long address;
-
-			pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
-					      PGSTE_GR_BIT | PGSTE_GC_BIT);
-
-			/* skip invalid and not writable pages */
-			if (pte_val(*pte) & _PAGE_INVALID ||
-			    !(pte_val(*pte) & _PAGE_WRITE)) {
-				pgste_set_unlock(pte, pgste);
-				continue;
-			}
-
-			address = pte_val(*pte) & PAGE_MASK;
-			page_set_storage_key(address, PAGE_DEFAULT_KEY, 1);
-		}
-		pgste_set_unlock(pte, pgste);
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-	pte_unmap_unlock(start_pte, ptl);
-
-	return addr;
-}
-
-static inline unsigned long page_table_reset_pmd(struct mm_struct *mm, pud_t *pud,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	unsigned long next;
-	pmd_t *pmd;
-
-	pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		if (pmd_none_or_clear_bad(pmd))
-			continue;
-		next = page_table_reset_pte(mm, pmd, addr, next, init_skey);
-	} while (pmd++, addr = next, addr != end);
-
-	return addr;
-}
-
-static inline unsigned long page_table_reset_pud(struct mm_struct *mm, pgd_t *pgd,
-			unsigned long addr, unsigned long end, bool init_skey)
-{
-	unsigned long next;
-	pud_t *pud;
-
-	pud = pud_offset(pgd, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		if (pud_none_or_clear_bad(pud))
-			continue;
-		next = page_table_reset_pmd(mm, pud, addr, next, init_skey);
-	} while (pud++, addr = next, addr != end);
-
-	return addr;
-}
-
-void page_table_reset_pgste(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool init_skey)
-{
-	unsigned long addr, next;
-	pgd_t *pgd;
-
-	down_write(&mm->mmap_sem);
-	if (init_skey && mm_use_skey(mm))
-		goto out_up;
-	addr = start;
-	pgd = pgd_offset(mm, addr);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (pgd_none_or_clear_bad(pgd))
-			continue;
-		next = page_table_reset_pud(mm, pgd, addr, next, init_skey);
-	} while (pgd++, addr = next, addr != end);
-	if (init_skey)
-		current->mm->context.use_skey = 1;
-out_up:
-	up_write(&mm->mmap_sem);
-}
-EXPORT_SYMBOL(page_table_reset_pgste);
-
 int set_guest_storage_key(struct mm_struct *mm, unsigned long addr,
 			  unsigned long key, bool nq)
 {
@@ -1044,11 +951,6 @@ static inline unsigned long *page_table_alloc_pgste(struct mm_struct *mm,
 	return NULL;
 }
 
-void page_table_reset_pgste(struct mm_struct *mm, unsigned long start,
-			    unsigned long end, bool init_skey)
-{
-}
-
 static inline void page_table_free_pgste(unsigned long *table)
 {
 }
@@ -1400,13 +1302,66 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
  * Enable storage key handling from now on and initialize the storage
  * keys with the default key.
  */
+static int __s390_enable_skey(pte_t *pte, unsigned long addr,
+			      unsigned long next, struct mm_walk *walk)
+{
+	unsigned long ptev;
+	pgste_t pgste;
+
+	pgste = pgste_get_lock(pte);
+	/* Clear storage key */
+	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
+			      PGSTE_GR_BIT | PGSTE_GC_BIT);
+	ptev = pte_val(*pte);
+	if (!(ptev & _PAGE_INVALID) && (ptev & _PAGE_WRITE))
+		page_set_storage_key(ptev & PAGE_MASK, PAGE_DEFAULT_KEY, 1);
+	pgste_set_unlock(pte, pgste);
+	return 0;
+}
+
 void s390_enable_skey(void)
 {
-	page_table_reset_pgste(current->mm, 0, TASK_SIZE, true);
+	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
+	struct mm_struct *mm = current->mm;
+
+	down_write(&mm->mmap_sem);
+	if (mm_use_skey(mm))
+		goto out_up;
+	walk.mm = mm;
+	walk_page_range(0, TASK_SIZE, &walk);
+	mm->context.use_skey = 1;
+
+out_up:
+	up_write(&mm->mmap_sem);
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
 /*
+ * Reset CMMA state, make all pages stable again.
+ */
+static int __s390_reset_cmma(pte_t *pte, unsigned long addr,
+			     unsigned long next, struct mm_walk *walk)
+{
+	pgste_t pgste;
+
+	pgste = pgste_get_lock(pte);
+	pgste_val(pgste) &= ~_PGSTE_GPS_USAGE_MASK;
+	pgste_set_unlock(pte, pgste);
+	return 0;
+}
+
+void s390_reset_cmma(struct mm_struct *mm)
+{
+	struct mm_walk walk = { .pte_entry = __s390_reset_cmma };
+
+	down_write(&mm->mmap_sem);
+	walk.mm = mm;
+	walk_page_range(0, TASK_SIZE, &walk);
+	up_write(&mm->mmap_sem);
+}
+EXPORT_SYMBOL_GPL(s390_reset_cmma);
+
+/*
  * Test and reset if a guest page is dirty
  */
 bool gmap_test_and_clear_dirty(unsigned long address, struct gmap *gmap)
-- 
1.8.5.5


* [PATCH 2/4] mm: introduce mm_forbids_zeropage function
From: Dominik Dingel @ 2014-10-22 11:09 UTC
  To: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin, Dominik Dingel

Add a new function stub that allows architectures to disable, for a given
mm_struct, the backing of non-present anonymous pages with the read-only
empty zero page.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 include/linux/mm.h | 4 ++++
 mm/huge_memory.c   | 2 +-
 mm/memory.c        | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cd33ae2..0a2022e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
 #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
 #endif
 
+#ifndef mm_forbids_zeropage
+#define mm_forbids_zeropage(X)  (0)
+#endif
+
 extern unsigned long sysctl_user_reserve_kbytes;
 extern unsigned long sysctl_admin_reserve_kbytes;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index de98415..357a381 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
 		return VM_FAULT_OOM;
-	if (!(flags & FAULT_FLAG_WRITE) &&
+	if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm) &&
 			transparent_hugepage_use_zero_page()) {
 		spinlock_t *ptl;
 		pgtable_t pgtable;
diff --git a/mm/memory.c b/mm/memory.c
index 64f82aa..f275a9d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
+	if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm)) {
 		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
 						vma->vm_page_prot));
 		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-- 
1.8.5.5


* [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys
From: Dominik Dingel @ 2014-10-22 11:09 UTC
  To: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin, Dominik Dingel

As soon as storage keys are enabled, we need to stop working on zero page
mappings to prevent inconsistencies between storage keys and the pgste.

Otherwise, the following data corruption could happen (illustrated with
hypothetical key values below):
1) guest enables storage keys
2) guest sets the storage key for a not yet mapped page X
   -> the change goes to the PGSTE
3) guest reads from page X
   -> as X was not dirty before, the page will be zero page backed, and
      the storage key from the PGSTE for X goes to the storage key of the
      zero page
4) guest sets the storage key for a not yet mapped page Y (same logic as
   above)
5) guest reads from page Y
   -> as Y was not dirty before, the page will be zero page backed, and
      the storage key from the PGSTE for Y goes to the storage key of the
      zero page, overwriting the storage key for X
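
With hypothetical key values, the sequence above collapses the keys like
this:

	guest: SSKE X, key 5   -> key stored in the PGSTE of X
	guest: read from X     -> X backed by the zero page, key 5 moves there
	guest: SSKE Y, key 9   -> key stored in the PGSTE of Y
	guest: read from Y     -> Y backed by the same zero page, key becomes 9
	guest: ISKE X          -> returns 9 instead of the expected 5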

While holding mmap_sem, we are safe against changes to entries we have
already fixed, as every fault would need to take mmap_sem (for reading).

Other vCPUs executing storage key instructions will get a one-time
interception and are also serialized via mmap_sem.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 arch/s390/include/asm/pgtable.h |  5 +++++
 arch/s390/mm/pgtable.c          | 13 ++++++++++++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 1e991f6a..0da98d6 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -481,6 +481,11 @@ static inline int mm_has_pgste(struct mm_struct *mm)
 	return 0;
 }
 
+/*
+ * If a guest uses storage keys, faults should no longer
+ * be backed by zero pages.
+ */
+#define mm_forbids_zeropage mm_use_skey
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index ab55ba8..58d7eb2 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	pgste_t pgste;
 
 	pgste = pgste_get_lock(pte);
+	/*
+	 * Remove all zero page mappings: after establishing a policy
+	 * to forbid zero page mappings, subsequent faults on these
+	 * pages will get fresh anonymous pages.
+	 */
+	if (is_zero_pfn(pte_pfn(*pte))) {
+		ptep_flush_direct(walk->mm, addr, pte);
+		pte_val(*pte) = _PAGE_INVALID;
+	}
 	/* Clear storage key */
 	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
 			      PGSTE_GR_BIT | PGSTE_GC_BIT);
@@ -1327,9 +1336,11 @@ void s390_enable_skey(void)
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
+
+	mm->context.use_skey = 1;
+
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
-	mm->context.use_skey = 1;
 
 out_up:
 	up_write(&mm->mmap_sem);
-- 
1.8.5.5
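
To make "zero page backed" concrete: every not-yet-written anonymous
page of a process maps the same global zero page, which is why two
guest pages in the scenario above end up sharing one storage key.  A
hypothetical userspace sketch (not part of the series) that shows the
shared backing via /proc/self/pagemap; it assumes 4 KiB pages and a
2014-era kernel, as later kernels hide the PFN field from unprivileged
readers:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

static uint64_t pfn_of(void *addr)
{
	uint64_t entry = 0;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd >= 0) {
		/* one 64-bit entry per virtual page */
		pread(fd, &entry, sizeof(entry),
		      ((uintptr_t)addr / 4096) * sizeof(entry));
		close(fd);
	}
	return entry & ((1ULL << 55) - 1);	/* bits 0-54: PFN */
}

int main(void)
{
	char *p = mmap(NULL, 2 * 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	volatile char c;

	if (p == MAP_FAILED)
		return 1;
	c = p[0];	/* read fault: backed by the shared zero page */
	c = p[4096];	/* read fault: same zero page */
	(void)c;
	printf("pfn0=%llx pfn1=%llx (equal: both zero page backed)\n",
	       (unsigned long long)pfn_of(p),
	       (unsigned long long)pfn_of(p + 4096));
	p[0] = 1;	/* write fault: COW to a fresh anonymous page */
	printf("pfn0 after write=%llx\n", (unsigned long long)pfn_of(p));
	return 0;
}

With mm_forbids_zeropage() in effect for a guest mm, the two read
faults would instead each get a fresh anonymous page, so the storage
keys stay distinct.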


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages
  2014-10-22 11:09 ` Dominik Dingel
@ 2014-10-22 11:09   ` Dominik Dingel
  -1 siblings, 0 replies; 50+ messages in thread
From: Dominik Dingel @ 2014-10-22 11:09 UTC (permalink / raw)
  To: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin, Dominik Dingel

When storage keys are enabled, unmerge already merged pages and prevent
new pages from being merged.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/include/asm/pgtable.h |  2 +-
 arch/s390/kvm/priv.c            | 17 ++++++++++++-----
 arch/s390/mm/pgtable.c          | 16 +++++++++++++++-
 3 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 0da98d6..dfb38af 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1754,7 +1754,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
 extern int vmem_add_mapping(unsigned long start, unsigned long size);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
-extern void s390_enable_skey(void);
+extern int s390_enable_skey(void);
 extern void s390_reset_cmma(struct mm_struct *mm);
 
 /*
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index f89c1cd..e0967fd 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static void __skey_check_enable(struct kvm_vcpu *vcpu)
+static int __skey_check_enable(struct kvm_vcpu *vcpu)
 {
+	int rc = 0;
 	if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
-		return;
+		return rc;
 
-	s390_enable_skey();
+	rc = s390_enable_skey();
 	trace_kvm_s390_skey_related_inst(vcpu);
 	vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
+	return rc;
 }
 
 
 static int handle_skey(struct kvm_vcpu *vcpu)
 {
-	__skey_check_enable(vcpu);
+	int rc = __skey_check_enable(vcpu);
 
+	if (rc)
+		return rc;
 	vcpu->stat.instruction_storage_key++;
 
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
@@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 		}
 
 		if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
-			__skey_check_enable(vcpu);
+			int rc = __skey_check_enable(vcpu);
+
+			if (rc)
+				return rc;
 			if (set_guest_storage_key(current->mm, useraddr,
 					vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
 					vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 58d7eb2..82aa528 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -18,6 +18,8 @@
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
 #include <linux/swapops.h>
+#include <linux/ksm.h>
+#include <linux/mman.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -1328,22 +1330,34 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 	return 0;
 }
 
-void s390_enable_skey(void)
+int s390_enable_skey(void)
 {
 	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
 	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	int rc = 0;
 
 	down_write(&mm->mmap_sem);
 	if (mm_use_skey(mm))
 		goto out_up;
 
 	mm->context.use_skey = 1;
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
+				MADV_UNMERGEABLE, &vma->vm_flags)) {
+			mm->context.use_skey = 0;
+			rc = -ENOMEM;
+			goto out_up;
+		}
+	}
+	mm->def_flags &= ~VM_MERGEABLE;
 
 	walk.mm = mm;
 	walk_page_range(0, TASK_SIZE, &walk);
 
 out_up:
 	up_write(&mm->mmap_sem);
+	return rc;
 }
 EXPORT_SYMBOL_GPL(s390_enable_skey);
 
-- 
1.8.5.5
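
For context, the ksm_madvise(..., MADV_UNMERGEABLE, ...) loop above
revokes merge advice that userspace (for example QEMU, when KSM is in
use) gave earlier via madvise(2).  A minimal sketch of that opt-in,
assuming CONFIG_KSM and 4 KiB pages:

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0x5a, len);	/* identical content: merge candidates */
	/* opt the range into KSM; ksmd may later merge these pages */
	if (madvise(buf, len, MADV_MERGEABLE))
		return 1;
	return 0;
}

After s390_enable_skey() runs, already merged pages are unmerged again
and VM_MERGEABLE is dropped from mm->def_flags, so new mappings do not
inherit it.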


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process
  2014-10-22 11:09 ` Dominik Dingel
@ 2014-10-22 13:59   ` Paolo Bonzini
  -1 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-10-22 13:59 UTC (permalink / raw)
  To: Dominik Dingel, Andrew Morton, linux-mm, Mel Gorman,
	Michal Hocko, Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin



On 10/22/2014 01:09 PM, Dominik Dingel wrote:
> s390 has the special notion of storage keys which are some sort of page flags
> associated with physical pages and live outside of direct addressable memory.
> These storage keys can be queried and changed with a special set of instructions.
> The mentioned instructions behave quite nicely under virtualization, if there is: 
> - an invalid pte, then the instructions will work on memory in the host page table
> - a valid pte, then the instructions will work with the real storage key
> 
> Thanks to Martin with his software reference and dirty bit tracking,
> the kernel does not issue any storage key instructions as now a 
> software based approach will be taken, on the other hand distributions 
> in the wild are currently using them.
> 
> However, for virtualized guests we still have a problem with guest pages 
> mapped to zero pages and the kernel same page merging.  
> With each one multiple guest pages will point to the same physical page
> and share the same storage key.
> 
> Let's fix this by introducing a new function which s390 will define to
> forbid new zero page mappings.  If the guest issues a storage key related 
> instruction we flag the mm_struct, drop existing zero page mappings
> and unmerge the guest memory.
> 
> v2 -> v3:
>  - Clearing up patch description Patch 3/4
>  - removing unnecessary flag in mmu_context (Paolo)

... and zero the mm_use_skey flag correctly, too. :)

> v1 -> v2: 
>  - Following Dave and Paolo suggestion removing the vma flag

Thanks, the patches look good.  I expect that they will either go in
through the s390 tree, or come in via Christian.

If the latter, Martin, please reply with your Acked-by.

Paolo

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/4] mm: introduce mm_forbids_zeropage function
  2014-10-22 11:09   ` Dominik Dingel
@ 2014-10-22 13:59     ` Paolo Bonzini
  -1 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-10-22 13:59 UTC (permalink / raw)
  To: Dominik Dingel, Andrew Morton, linux-mm, Mel Gorman,
	Michal Hocko, Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

On 10/22/2014 01:09 PM, Dominik Dingel wrote:
> Add a new function stub to allow architectures to disable, for
> an mm_struct, the backing of non-present anonymous pages with
> read-only empty zero pages.
> 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> ---
>  include/linux/mm.h | 4 ++++
>  mm/huge_memory.c   | 2 +-
>  mm/memory.c        | 2 +-
>  3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cd33ae2..0a2022e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
>  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>  #endif
>  
> +#ifndef mm_forbids_zeropage
> +#define mm_forbids_zeropage(X)  (0)
> +#endif
> +
>  extern unsigned long sysctl_user_reserve_kbytes;
>  extern unsigned long sysctl_admin_reserve_kbytes;
>  
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index de98415..357a381 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -805,7 +805,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		return VM_FAULT_OOM;
>  	if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
>  		return VM_FAULT_OOM;
> -	if (!(flags & FAULT_FLAG_WRITE) &&
> +	if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm) &&
>  			transparent_hugepage_use_zero_page()) {
>  		spinlock_t *ptl;
>  		pgtable_t pgtable;
> diff --git a/mm/memory.c b/mm/memory.c
> index 64f82aa..f275a9d 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2640,7 +2640,7 @@ static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  		return VM_FAULT_SIGBUS;
>  
>  	/* Use the zero-page for reads */
> -	if (!(flags & FAULT_FLAG_WRITE)) {
> +	if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm)) {
>  		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
>  						vma->vm_page_prot));
>  		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys
  2014-10-22 11:09   ` Dominik Dingel
@ 2014-10-22 14:00     ` Paolo Bonzini
  -1 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-10-22 14:00 UTC (permalink / raw)
  To: Dominik Dingel, Andrew Morton, linux-mm, Mel Gorman,
	Michal Hocko, Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

On 10/22/2014 01:09 PM, Dominik Dingel wrote:
> As soon as storage keys are enabled, we need to stop working on zero page
> mappings to prevent inconsistencies between storage keys and the PGSTE.
> 
> Otherwise the following data corruption could happen:
> 1) guest enables storage keys
> 2) guest sets storage key for unmapped page X
>    -> change goes to PGSTE
> 3) guest reads from page X
>    -> as X was not dirty before, the page will be zero page backed;
>       the storage key from the PGSTE for X goes to the storage key for
>       the zero page
> 4) guest sets storage key for unmapped page Y (same logic as above)
> 5) guest reads from page Y
>    -> as Y was not dirty before, the page will be zero page backed;
>       the storage key from the PGSTE for Y goes to the storage key for
>       the zero page, overwriting the storage key for X
> 
> While holding the mmap_sem, we are safe against changes on entries we
> have already fixed, as every fault would need to take the mmap_sem (read).
> 
> Other vCPUs executing storage key instructions will get a one-time
> interception and are also serialized by the mmap_sem.
> 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> ---
>  arch/s390/include/asm/pgtable.h |  5 +++++
>  arch/s390/mm/pgtable.c          | 13 ++++++++++++-
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 1e991f6a..0da98d6 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -481,6 +481,11 @@ static inline int mm_has_pgste(struct mm_struct *mm)
>  	return 0;
>  }
>  
> +/*
> + * If a guest uses storage keys, faults should no longer
> + * be backed by zero pages.
> + */
> +#define mm_forbids_zeropage mm_use_skey
>  static inline int mm_use_skey(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_PGSTE
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index ab55ba8..58d7eb2 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -1309,6 +1309,15 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>  	pgste_t pgste;
>  
>  	pgste = pgste_get_lock(pte);
> +	/*
> +	 * Remove all zero page mappings: after establishing a policy
> +	 * to forbid zero page mappings, subsequent faults on these
> +	 * pages will get fresh anonymous pages.
> +	 */
> +	if (is_zero_pfn(pte_pfn(*pte))) {
> +		ptep_flush_direct(walk->mm, addr, pte);
> +		pte_val(*pte) = _PAGE_INVALID;
> +	}
>  	/* Clear storage key */
>  	pgste_val(pgste) &= ~(PGSTE_ACC_BITS | PGSTE_FP_BIT |
>  			      PGSTE_GR_BIT | PGSTE_GC_BIT);
> @@ -1327,9 +1336,11 @@ void s390_enable_skey(void)
>  	down_write(&mm->mmap_sem);
>  	if (mm_use_skey(mm))
>  		goto out_up;
> +
> +	mm->context.use_skey = 1;
> +
>  	walk.mm = mm;
>  	walk_page_range(0, TASK_SIZE, &walk);
> -	mm->context.use_skey = 1;
>  
>  out_up:
>  	up_write(&mm->mmap_sem);
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages
  2014-10-22 11:09   ` Dominik Dingel
@ 2014-10-22 14:00     ` Paolo Bonzini
  0 siblings, 0 replies; 50+ messages in thread
From: Paolo Bonzini @ 2014-10-22 14:00 UTC (permalink / raw)
  To: Dominik Dingel, Andrew Morton, linux-mm, Mel Gorman,
	Michal Hocko, Dave Hansen, Rik van Riel
  Cc: Andrea Arcangeli, Andy Lutomirski, Aneesh Kumar K.V, Bob Liu,
	Christian Borntraeger, Cornelia Huck, Gleb Natapov,
	Heiko Carstens, H. Peter Anvin, Hugh Dickins, Ingo Molnar,
	Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov, kvm, linux390,
	linux-kernel, linux-s390, Martin Schwidefsky, Peter Zijlstra,
	Sasha Levin

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

(missing R-b on patch 1 is _not_ a mistake :))

Paolo

On 10/22/2014 01:09 PM, Dominik Dingel wrote:
> When storage keys are enabled, unmerge already merged pages and prevent
> new pages from being merged.
> 
> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  arch/s390/include/asm/pgtable.h |  2 +-
>  arch/s390/kvm/priv.c            | 17 ++++++++++++-----
>  arch/s390/mm/pgtable.c          | 16 +++++++++++++++-
>  3 files changed, 28 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
> index 0da98d6..dfb38af 100644
> --- a/arch/s390/include/asm/pgtable.h
> +++ b/arch/s390/include/asm/pgtable.h
> @@ -1754,7 +1754,7 @@ static inline pte_t mk_swap_pte(unsigned long type, unsigned long offset)
>  extern int vmem_add_mapping(unsigned long start, unsigned long size);
>  extern int vmem_remove_mapping(unsigned long start, unsigned long size);
>  extern int s390_enable_sie(void);
> -extern void s390_enable_skey(void);
> +extern int s390_enable_skey(void);
>  extern void s390_reset_cmma(struct mm_struct *mm);
>  
>  /*
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index f89c1cd..e0967fd 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -156,21 +156,25 @@ static int handle_store_cpu_address(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>  
> -static void __skey_check_enable(struct kvm_vcpu *vcpu)
> +static int __skey_check_enable(struct kvm_vcpu *vcpu)
>  {
> +	int rc = 0;
>  	if (!(vcpu->arch.sie_block->ictl & (ICTL_ISKE | ICTL_SSKE | ICTL_RRBE)))
> -		return;
> +		return rc;
>  
> -	s390_enable_skey();
> +	rc = s390_enable_skey();
>  	trace_kvm_s390_skey_related_inst(vcpu);
>  	vcpu->arch.sie_block->ictl &= ~(ICTL_ISKE | ICTL_SSKE | ICTL_RRBE);
> +	return rc;
>  }
>  
>  
>  static int handle_skey(struct kvm_vcpu *vcpu)
>  {
> -	__skey_check_enable(vcpu);
> +	int rc = __skey_check_enable(vcpu);
>  
> +	if (rc)
> +		return rc;
>  	vcpu->stat.instruction_storage_key++;
>  
>  	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
> @@ -692,7 +696,10 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
>  		}
>  
>  		if (vcpu->run->s.regs.gprs[reg1] & PFMF_SK) {
> -			__skey_check_enable(vcpu);
> +			int rc = __skey_check_enable(vcpu);
> +
> +			if (rc)
> +				return rc;
>  			if (set_guest_storage_key(current->mm, useraddr,
>  					vcpu->run->s.regs.gprs[reg1] & PFMF_KEY,
>  					vcpu->run->s.regs.gprs[reg1] & PFMF_NQ))
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 58d7eb2..82aa528 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -18,6 +18,8 @@
>  #include <linux/rcupdate.h>
>  #include <linux/slab.h>
>  #include <linux/swapops.h>
> +#include <linux/ksm.h>
> +#include <linux/mman.h>
>  
>  #include <asm/pgtable.h>
>  #include <asm/pgalloc.h>
> @@ -1328,22 +1330,34 @@ static int __s390_enable_skey(pte_t *pte, unsigned long addr,
>  	return 0;
>  }
>  
> -void s390_enable_skey(void)
> +int s390_enable_skey(void)
>  {
>  	struct mm_walk walk = { .pte_entry = __s390_enable_skey };
>  	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +	int rc = 0;
>  
>  	down_write(&mm->mmap_sem);
>  	if (mm_use_skey(mm))
>  		goto out_up;
>  
>  	mm->context.use_skey = 1;
> +	for (vma = mm->mmap; vma; vma = vma->vm_next) {
> +		if (ksm_madvise(vma, vma->vm_start, vma->vm_end,
> +				MADV_UNMERGEABLE, &vma->vm_flags)) {
> +			mm->context.use_skey = 0;
> +			rc = -ENOMEM;
> +			goto out_up;
> +		}
> +	}
> +	mm->def_flags &= ~VM_MERGEABLE;
>  
>  	walk.mm = mm;
>  	walk_page_range(0, TASK_SIZE, &walk);
>  
>  out_up:
>  	up_write(&mm->mmap_sem);
> +	return rc;
>  }
>  EXPORT_SYMBOL_GPL(s390_enable_skey);
>  
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread
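The ksm_madvise() loop above is the in-kernel counterpart of the
userspace KSM opt-in/opt-out interface. A small sketch of that
interface, assuming a kernel built with CONFIG_KSM and the KSM daemon
enabled through /sys/kernel/mm/ksm/run:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 4096;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 0x5a, len);	/* identical content: merge candidates */

	/* Opt the range in, as a VMM typically does for guest memory ... */
	if (madvise(p, len, MADV_MERGEABLE))
		perror("MADV_MERGEABLE");

	/* ... and opt it out again.  Breaking sharing may have to
	 * allocate one private page per formerly merged page, which is
	 * why the unmerge can fail and why s390_enable_skey() now
	 * propagates -ENOMEM instead of returning void. */
	if (madvise(p, len, MADV_UNMERGEABLE))
		perror("MADV_UNMERGEABLE");

	return 0;
}

Clearing VM_MERGEABLE from mm->def_flags in addition to the per-VMA
unmerge keeps later mappings from defaulting back to mergeable behind
the guest's back.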

* Re: [PATCH 2/4] mm: introduce mm_forbids_zeropage function
  2014-10-22 11:09   ` Dominik Dingel
@ 2014-10-22 19:22     ` Andrew Morton
  0 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2014-10-22 19:22 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini, Dave Hansen,
	Rik van Riel, Andrea Arcangeli, Andy Lutomirski,
	Aneesh Kumar K.V, Bob Liu, Christian Borntraeger, Cornelia Huck,
	Gleb Natapov, Heiko Carstens, H. Peter Anvin, Hugh Dickins,
	Ingo Molnar, Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov,
	kvm, linux390, linux-kernel, linux-s390, Martin Schwidefsky,
	Peter Zijlstra, Sasha Levin

On Wed, 22 Oct 2014 13:09:28 +0200 Dominik Dingel <dingel@linux.vnet.ibm.com> wrote:

> Add a new function stub to allow architectures to disable for
> an mm_struct the backing of non-present, anonymous pages with
> read-only empty zero pages.
> 
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
>  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
>  #endif
>  
> +#ifndef mm_forbids_zeropage
> +#define mm_forbids_zeropage(X)  (0)
> +#endif

Can we document this please?  What it does, why it does it.  We should
also specify precisely which arch header file is responsible for
defining mm_forbids_zeropage.



^ permalink raw reply	[flat|nested] 50+ messages in thread
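For context, the consumer side of the stub sits in the anonymous fault
path. The shape of the mm/memory.c change in this series is roughly the
following (a simplified sketch with hypothetical helper names, not the
verbatim hunk):

/* Sketch: common mm code consults the per-mm predicate before backing
 * a read fault with the shared zero page.  map_zero_page() and
 * alloc_anon_page() are hypothetical stand-ins for the real
 * do_anonymous_page() internals. */
static int do_anonymous_page_sketch(struct mm_struct *mm,
				    struct vm_area_struct *vma,
				    unsigned long address,
				    unsigned int flags)
{
	if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(mm))
		return map_zero_page(vma, address);	/* shared, read-only */

	return alloc_anon_page(vma, address);		/* fresh page */
}

Because the default definition expands to (0), architectures that do
not define the macro see the forbid branch folded away at compile time.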

* Re: [PATCH 2/4] mm: introduce mm_forbids_zeropage function
  2014-10-22 19:22     ` Andrew Morton
@ 2014-10-22 19:45       ` Dominik Dingel
  0 siblings, 0 replies; 50+ messages in thread
From: Dominik Dingel @ 2014-10-22 19:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini, Dave Hansen,
	Rik van Riel, Andrea Arcangeli, Andy Lutomirski,
	Aneesh Kumar K.V, Bob Liu, Christian Borntraeger, Cornelia Huck,
	Gleb Natapov, Heiko Carstens, H. Peter Anvin, Hugh Dickins,
	Ingo Molnar, Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov,
	kvm, linux390, linux-kernel, linux-s390, Martin Schwidefsky,
	Peter Zijlstra, Sasha Levin

On Wed, 22 Oct 2014 12:22:23 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 22 Oct 2014 13:09:28 +0200 Dominik Dingel <dingel@linux.vnet.ibm.com> wrote:
> 
> > Add a new function stub to allow architectures to disable for
> > an mm_struct the backing of non-present, anonymous pages with
> > read-only empty zero pages.
> > 
> > ...
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -56,6 +56,10 @@ extern int sysctl_legacy_va_layout;
> >  #define __pa_symbol(x)  __pa(RELOC_HIDE((unsigned long)(x), 0))
> >  #endif
> >  
> > +#ifndef mm_forbids_zeropage
> > +#define mm_forbids_zeropage(X)  (0)
> > +#endif
> 
> Can we document this please?  What it does, why it does it.  We should
> also specify precisely which arch header file is responsible for
> defining mm_forbids_zeropage.
> 

I will add a comment like:

/*
 * To prevent common memory management code establishing
 * a zero page mapping on a read fault.
 * This function should be implemented within <asm/pgtable.h>.
 * s390 does this to prevent multiplexing of hardware bits
 * related to the physical page in case of virtualization.
 */

Okay?




^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/4] mm: introduce mm_forbids_zeropage function
  2014-10-22 19:45       ` Dominik Dingel
@ 2014-10-22 19:49         ` Andrew Morton
  0 siblings, 0 replies; 50+ messages in thread
From: Andrew Morton @ 2014-10-22 19:49 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini, Dave Hansen,
	Rik van Riel, Andrea Arcangeli, Andy Lutomirski,
	Aneesh Kumar K.V, Bob Liu, Christian Borntraeger, Cornelia Huck,
	Gleb Natapov, Heiko Carstens, H. Peter Anvin, Hugh Dickins,
	Ingo Molnar, Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov,
	kvm, linux390, linux-kernel, linux-s390, Martin Schwidefsky,
	Peter Zijlstra, Sasha Levin

On Wed, 22 Oct 2014 21:45:52 +0200 Dominik Dingel <dingel@linux.vnet.ibm.com> wrote:

> > > +#ifndef mm_forbids_zeropage
> > > +#define mm_forbids_zeropage(X)  (0)
> > > +#endif
> > 
> > Can we document this please?  What it does, why it does it.  We should
> > also specify precisely which arch header file is responsible for
> > defining mm_forbids_zeropage.
> > 
> 
> I will add a comment like:
> 
> /*
>  * To prevent common memory management code establishing
>  * a zero page mapping on a read fault.
>  * This function should be implemented within <asm/pgtable.h>.

s/function should be implemented/macro should be defined/

>  * s390 does this to prevent multiplexing of hardware bits
>  * related to the physical page in case of virtualization.
>  */
> 
> Okay?

Looks great, thanks.

^ permalink raw reply	[flat|nested] 50+ messages in thread
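Folding Andrew's wording fix into Dominik's proposal, the documented
stub in include/linux/mm.h would read roughly as follows (a sketch of
the expected final hunk, not a quote of the applied commit):

/*
 * To prevent common memory management code establishing
 * a zero page mapping on a read fault.
 * This macro should be defined within <asm/pgtable.h>.
 * s390 does this to prevent multiplexing of hardware bits
 * related to the physical page in case of virtualization.
 */
#ifndef mm_forbids_zeropage
#define mm_forbids_zeropage(X)	(0)
#endif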

* Re: [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process
  2014-10-22 11:09 ` Dominik Dingel
@ 2014-10-23 10:19   ` Martin Schwidefsky
  0 siblings, 0 replies; 50+ messages in thread
From: Martin Schwidefsky @ 2014-10-23 10:19 UTC (permalink / raw)
  To: Dominik Dingel
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michal Hocko, Paolo Bonzini,
	Dave Hansen, Rik van Riel, Andrea Arcangeli, Andy Lutomirski,
	Aneesh Kumar K.V, Bob Liu, Christian Borntraeger, Cornelia Huck,
	Gleb Natapov, Heiko Carstens, H. Peter Anvin, Hugh Dickins,
	Ingo Molnar, Jianyu Zhan, Johannes Weiner, Kirill A. Shutemov,
	kvm, linux390, linux-kernel, linux-s390, Peter Zijlstra,
	Sasha Levin

On Wed, 22 Oct 2014 13:09:26 +0200
Dominik Dingel <dingel@linux.vnet.ibm.com> wrote:

> s390 has the special notion of storage keys which are some sort of page flags
> associated with physical pages and live outside of direct addressable memory.
> These storage keys can be queried and changed with a special set of instructions.
> The mentioned instructions behave quite nicely under virtualization, if there is: 
> - an invalid pte, then the instructions will work on memory in the host page table
> - a valid pte, then the instructions will work with the real storage key
> 
> Thanks to Martin with his software reference and dirty bit tracking,
> the kernel does not issue any storage key instructions as now a 
> software based approach will be taken, on the other hand distributions 
> in the wild are currently using them.
> 
> However, for virtualized guests we still have a problem with guest pages 
> mapped to zero pages and the kernel same page merging.  
> With each one multiple guest pages will point to the same physical page
> and share the same storage key.
> 
> Let's fix this by introducing a new function which s390 will define to
> forbid new zero page mappings.  If the guest issues a storage key related 
> instruction we flag the mm_struct, drop existing zero page mappings
> and unmerge the guest memory.
> 
> v2 -> v3:
>  - Clearing up patch description Patch 3/4
>  - removing unnecessary flag in mmu_context (Paolo)
> 
> v1 -> v2: 
>  - Following Dave and Paolo suggestion removing the vma flag
> 
> Dominik Dingel (4):
>   s390/mm: recfactor global pgste updates
>   mm: introduce mm_forbids_zeropage function
>   s390/mm: prevent and break zero page mappings in case of storage keys
>   s390/mm: disable KSM for storage key enabled pages
> 
>  arch/s390/include/asm/pgalloc.h |   2 -
>  arch/s390/include/asm/pgtable.h |   8 +-
>  arch/s390/kvm/kvm-s390.c        |   2 +-
>  arch/s390/kvm/priv.c            |  17 ++--
>  arch/s390/mm/pgtable.c          | 180 ++++++++++++++++++----------------------
>  include/linux/mm.h              |   4 +
>  mm/huge_memory.c                |   2 +-
>  mm/memory.c                     |   2 +-
>  8 files changed, 106 insertions(+), 111 deletions(-)
 
Patches look good to me, and as nobody seems to disagree with the proposed
solution, I will add the code to the features branch of the s390 tree.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.


^ permalink raw reply	[flat|nested] 50+ messages in thread
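The diffstat above also touches mm/huge_memory.c: the huge zero page
used for transparent hugepage read faults needs the same treatment as
the small zero page. A rough sketch of that check (again simplified,
not the verbatim hunk; transparent_hugepage_use_zero_page() is the
existing THP sysfs knob check):

/* Sketch: the THP fault path applies the same per-mm predicate before
 * mapping the huge zero page on a read fault. */
static bool use_huge_zero_page_sketch(struct mm_struct *mm,
				      unsigned int flags)
{
	return !(flags & FAULT_FLAG_WRITE) &&
	       !mm_forbids_zeropage(mm) &&
	       transparent_hugepage_use_zero_page();
}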

end of thread

Thread overview: 50+ messages
2014-10-22 11:09 [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process Dominik Dingel
2014-10-22 11:09 ` [PATCH 1/4] s390/mm: recfactor global pgste updates Dominik Dingel
2014-10-22 11:09 ` [PATCH 2/4] mm: introduce mm_forbids_zeropage function Dominik Dingel
2014-10-22 13:59   ` Paolo Bonzini
2014-10-22 19:22   ` Andrew Morton
2014-10-22 19:45     ` Dominik Dingel
2014-10-22 19:49       ` Andrew Morton
2014-10-22 11:09 ` [PATCH 3/4] s390/mm: prevent and break zero page mappings in case of storage keys Dominik Dingel
2014-10-22 14:00   ` Paolo Bonzini
2014-10-22 11:09 ` [PATCH 4/4] s390/mm: disable KSM for storage key enabled pages Dominik Dingel
2014-10-22 14:00   ` Paolo Bonzini
2014-10-22 13:59 ` [PATCH v3 0/4] mm: new function to forbid zeropage mappings for a process Paolo Bonzini
2014-10-23 10:19 ` Martin Schwidefsky
