linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.vnet.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: [PATCH 4.12 32/43] s390/mm: avoid empty zero pages for KVM guests to avoid postcopy hangs
Date: Fri,  8 Sep 2017 15:19:20 +0200
Message-ID: <20170908131827.724196709@linuxfoundation.org>
In-Reply-To: <20170908131826.555428826@linuxfoundation.org>

4.12-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Christian Borntraeger <borntraeger@de.ibm.com>

commit fa41ba0d08de7c975c3e94d0067553f9b934221f upstream.

Right now there is a potential hang situation for postcopy migrations
if the guest enables storage keys on the target system during the
postcopy process.

For storage key virtualization, we have to forbid the empty zero page,
as the storage key is a property of the physical page frame.  As we
enable storage key handling lazily, we then drop all mappings of empty
zero pages so that they are lazily refaulted later on.

This does not work with postcopy migration, which relies on the empty
zero page never triggering a fault again in the future.  The reason is
that, for a page known to be zero, postcopy migration simply reads that
page on the target system to fault in an empty zero page.  At the same
time, postcopy remembers that this page was already transferred, so, to
avoid races, the page will NOT be retransmitted on any future userfault
for it.

If the guest now enters storage key mode while in postcopy, we break
this assumption of postcopy.
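
The sequence above can be sketched as a small userspace model.  This is
only an illustration of the bookkeeping, not kernel or QEMU code: the
struct, the enums and all function names below are hypothetical, and the
"received" flag stands in for whatever postcopy uses to remember that a
page was already transferred.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical model types, not taken from the kernel or QEMU. */
enum mapping { UNMAPPED, ZERO_PAGE, PRIVATE_PAGE };
enum outcome { ACCESS_OK, ACCESS_HANGS };

struct guest_model {
	enum mapping map[NPAGES];	/* host mapping behind each guest page */
	bool received[NPAGES];		/* postcopy: page already transferred */
};

/*
 * Postcopy handles a known zero page by reading it, which faults in
 * the empty zero page, and records the page as transferred.
 */
static void postcopy_place_zero(struct guest_model *g, size_t page)
{
	assert(page < NPAGES);
	g->map[page] = ZERO_PAGE;
	g->received[page] = true;
}

/* Lazy storage key enablement before the patch: drop zero page mappings. */
static void enable_skey_lazy(struct guest_model *g)
{
	for (size_t i = 0; i < NPAGES; i++)
		if (g->map[i] == ZERO_PAGE)
			g->map[i] = UNMAPPED;
}

/*
 * A guest access to an unmapped page raises a userfault.  Pages that
 * are already marked as transferred are not retransmitted, so in this
 * model the access can never complete.
 */
static enum outcome guest_access(struct guest_model *g, size_t page)
{
	assert(page < NPAGES);
	if (g->map[page] != UNMAPPED)
		return ACCESS_OK;
	if (g->received[page])
		return ACCESS_HANGS;
	g->map[page] = PRIVATE_PAGE;	/* page arrives from the source */
	g->received[page] = true;
	return ACCESS_OK;
}
```

In this model, calling postcopy_place_zero() for a page, then
enable_skey_lazy(), then guest_access() on the same page yields
ACCESS_HANGS, while a page that postcopy has not yet seen is still
served normally.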

The solution is to disable the empty zero page for KVM guests early on,
rather than during storage key enablement.  With this change, the
postcopy migration process is guaranteed to start only after no zero
pages are left.
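
The ordering can be illustrated with a minimal userspace sketch.  It
only mirrors the idea of the change (the zero page policy keyed on
has_pgste, and the zap done once when SIE is enabled).  All names in it
are hypothetical stand-ins and it makes no attempt to model page table
locking or THP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical model types, not the kernel's data structures. */
enum mapping { UNMAPPED, ZERO_PAGE, PRIVATE_PAGE };

struct mm_model {
	bool has_pgste;			/* process was turned into a KVM guest */
	bool use_skey;			/* guest started using storage keys */
	enum mapping map[NPAGES];
};

/* Policy after the patch: forbid zero pages once pgstes are in use. */
static bool forbids_zeropage(const struct mm_model *mm)
{
	return mm->has_pgste;
}

/* A read fault on an unmapped page is backed by the zero page if allowed. */
static void read_fault(struct mm_model *mm, size_t page)
{
	assert(page < NPAGES);
	if (mm->map[page] != UNMAPPED)
		return;
	mm->map[page] = forbids_zeropage(mm) ? PRIVATE_PAGE : ZERO_PAGE;
}

/*
 * Stand-in for enabling SIE: set the policy first so no new zero pages
 * appear, then drop the existing ones for lazy refaulting.
 */
static void enable_sie(struct mm_model *mm)
{
	mm->has_pgste = true;
	for (size_t i = 0; i < NPAGES; i++)
		if (mm->map[i] == ZERO_PAGE)
			mm->map[i] = UNMAPPED;
}

/* Stand-in for storage key enablement: no mappings are dropped here. */
static void enable_skey(struct mm_model *mm)
{
	mm->use_skey = true;
}

static size_t count_zero_pages(const struct mm_model *mm)
{
	size_t n = 0;

	for (size_t i = 0; i < NPAGES; i++)
		if (mm->map[i] == ZERO_PAGE)
			n++;
	return n;
}
```

Once enable_sie() has run in this model, count_zero_pages() returns 0
and every later read_fault() produces a private page, so enable_skey()
has no mapping left to invalidate.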

As guest pages are very likely not empty zero pages anyway, the memory
overhead is also pretty small.

While at it, this also adds proper page table locking to the zero page
removal.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/s390/include/asm/pgtable.h |    2 +-
 arch/s390/mm/gmap.c             |   39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 8 deletions(-)

--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -502,7 +502,7 @@ static inline int mm_alloc_pgste(struct
  * In the case that a guest uses storage keys
  * faults should no longer be backed by zero pages
  */
-#define mm_forbids_zeropage mm_use_skey
+#define mm_forbids_zeropage mm_has_pgste
 static inline int mm_use_skey(struct mm_struct *mm)
 {
 #ifdef CONFIG_PGSTE
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2118,6 +2118,37 @@ static inline void thp_split_mm(struct m
 }
 
 /*
+ * Remove all empty zero pages from the mapping for lazy refaulting
+ * - This must be called after mm->context.has_pgste is set, to avoid
+ *   future creation of zero pages
+ * - This must be called after THP was enabled
+ */
+static int __zap_zero_pages(pmd_t *pmd, unsigned long start,
+			   unsigned long end, struct mm_walk *walk)
+{
+	unsigned long addr;
+
+	for (addr = start; addr != end; addr += PAGE_SIZE) {
+		pte_t *ptep;
+		spinlock_t *ptl;
+
+		ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+		if (is_zero_pfn(pte_pfn(*ptep)))
+			ptep_xchg_direct(walk->mm, addr, ptep, __pte(_PAGE_INVALID));
+		pte_unmap_unlock(ptep, ptl);
+	}
+	return 0;
+}
+
+static inline void zap_zero_pages(struct mm_struct *mm)
+{
+	struct mm_walk walk = { .pmd_entry = __zap_zero_pages };
+
+	walk.mm = mm;
+	walk_page_range(0, TASK_SIZE, &walk);
+}
+
+/*
  * switch on pgstes for its userspace process (for kvm)
  */
 int s390_enable_sie(void)
@@ -2134,6 +2165,7 @@ int s390_enable_sie(void)
 	mm->context.has_pgste = 1;
 	/* split thp mappings and disable thp for future mappings */
 	thp_split_mm(mm);
+	zap_zero_pages(mm);
 	up_write(&mm->mmap_sem);
 	return 0;
 }
@@ -2146,13 +2178,6 @@ EXPORT_SYMBOL_GPL(s390_enable_sie);
 static int __s390_enable_skey(pte_t *pte, unsigned long addr,
 			      unsigned long next, struct mm_walk *walk)
 {
-	/*
-	 * Remove all zero page mappings,
-	 * after establishing a policy to forbid zero page mappings
-	 * following faults for that page will get fresh anonymous pages
-	 */
-	if (is_zero_pfn(pte_pfn(*pte)))
-		ptep_xchg_direct(walk->mm, addr, pte, __pte(_PAGE_INVALID));
 	/* Clear storage key */
 	ptep_zap_key(walk->mm, addr, pte);
 	return 0;
