linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Yang Shi <yang.shi@linux.alibaba.com>,
	Vlastimil Babka <vbabka@suse.cz>, Michal Hocko <mhocko@suse.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.19 04/85] mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified
Date: Thu, 22 Aug 2019 10:18:37 -0700	[thread overview]
Message-ID: <20190822171731.190101948@linuxfoundation.org> (raw)
In-Reply-To: <20190822171731.012687054@linuxfoundation.org>

From: Yang Shi <yang.shi@linux.alibaba.com>

commit d883544515aae54842c21730b880172e7894fde9 upstream.

When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
try best to migrate misplaced pages, if some of the pages could not be
migrated, then return -EIO.

There are three different sub-cases:
 1. vma is not migratable
 2. vma is migratable, but there are unmovable pages
 3. vma is migratable, pages are movable, but migrate_pages() fails

If #1 happens, kernel would just abort immediately, then return -EIO,
after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
MPOL_MF_STRICT is specified").

If #3 happens, kernel would set policy and migrate pages with
best-effort, but won't rollback the migrated pages and reset the policy
back.

Before that commit, they behaves in the same way.  It'd better to keep
their behavior consistent.  But, rolling back the migrated pages and
resetting the policy back sounds not feasible, so just make #1 behave as
same as #3.

Userspace will know that not everything was successfully migrated (via
-EIO), and can take whatever steps it deems necessary - attempt
rollback, determine which exact page(s) are violating the policy, etc.

Make queue_pages_range() return 1 to indicate there are unmovable pages
or vma is not migratable.

The #2 is not handled correctly in the current kernel, the following
patch will fix it.

[yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
  Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/mempolicy.c |   68 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 20 deletions(-)

--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -429,11 +429,14 @@ static inline bool queue_pages_required(
 }
 
 /*
- * queue_pages_pmd() has three possible return values:
- * 1 - pages are placed on the right node or queued successfully.
- * 0 - THP was split.
- * -EIO - is migration entry or MPOL_MF_STRICT was specified and an existing
- *        page was already on a node that does not follow the policy.
+ * queue_pages_pmd() has four possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ *     specified.
+ * 2 - THP was split.
+ * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
+ *        existing page was already on a node that does not follow the
+ *        policy.
  */
 static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
@@ -451,19 +454,17 @@ static int queue_pages_pmd(pmd_t *pmd, s
 	if (is_huge_zero_page(page)) {
 		spin_unlock(ptl);
 		__split_huge_pmd(walk->vma, pmd, addr, false, NULL);
+		ret = 2;
 		goto out;
 	}
-	if (!queue_pages_required(page, qp)) {
-		ret = 1;
+	if (!queue_pages_required(page, qp))
 		goto unlock;
-	}
 
-	ret = 1;
 	flags = qp->flags;
 	/* go to thp migration */
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 		if (!vma_migratable(walk->vma)) {
-			ret = -EIO;
+			ret = 1;
 			goto unlock;
 		}
 
@@ -479,6 +480,13 @@ out:
 /*
  * Scan through pages checking if pages follow certain conditions,
  * and move them to the pagelist if they do.
+ *
+ * queue_pages_pte_range() has three possible return values:
+ * 0 - pages are placed on the right node or queued successfully.
+ * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ *     specified.
+ * -EIO - only MPOL_MF_STRICT was specified and an existing page was already
+ *        on a node that does not follow the policy.
  */
 static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
@@ -488,17 +496,17 @@ static int queue_pages_pte_range(pmd_t *
 	struct queue_pages *qp = walk->private;
 	unsigned long flags = qp->flags;
 	int ret;
+	bool has_unmovable = false;
 	pte_t *pte;
 	spinlock_t *ptl;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
 		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
-		if (ret > 0)
-			return 0;
-		else if (ret < 0)
+		if (ret != 2)
 			return ret;
 	}
+	/* THP was split, fall through to pte walk */
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
@@ -519,14 +527,21 @@ static int queue_pages_pte_range(pmd_t *
 		if (!queue_pages_required(page, qp))
 			continue;
 		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
-			if (!vma_migratable(vma))
+			/* MPOL_MF_STRICT must be specified if we get here */
+			if (!vma_migratable(vma)) {
+				has_unmovable = true;
 				break;
+			}
 			migrate_page_add(page, qp->pagelist, flags);
 		} else
 			break;
 	}
 	pte_unmap_unlock(pte - 1, ptl);
 	cond_resched();
+
+	if (has_unmovable)
+		return 1;
+
 	return addr != end ? -EIO : 0;
 }
 
@@ -639,7 +654,13 @@ static int queue_pages_test_walk(unsigne
  *
  * If pages found in a given range are on a set of nodes (determined by
  * @nodes and @flags,) it's isolated and queued to the pagelist which is
- * passed via @private.)
+ * passed via @private.
+ *
+ * queue_pages_range() has three possible return values:
+ * 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
+ *     specified.
+ * 0 - queue pages successfully or no misplaced page.
+ * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
  */
 static int
 queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
@@ -1168,6 +1189,7 @@ static long do_mbind(unsigned long start
 	struct mempolicy *new;
 	unsigned long end;
 	int err;
+	int ret;
 	LIST_HEAD(pagelist);
 
 	if (flags & ~(unsigned long)MPOL_MF_VALID)
@@ -1229,10 +1251,15 @@ static long do_mbind(unsigned long start
 	if (err)
 		goto mpol_out;
 
-	err = queue_pages_range(mm, start, end, nmask,
+	ret = queue_pages_range(mm, start, end, nmask,
 			  flags | MPOL_MF_INVERT, &pagelist);
-	if (!err)
-		err = mbind_range(mm, start, end, new);
+
+	if (ret < 0) {
+		err = -EIO;
+		goto up_out;
+	}
+
+	err = mbind_range(mm, start, end, new);
 
 	if (!err) {
 		int nr_failed = 0;
@@ -1245,13 +1272,14 @@ static long do_mbind(unsigned long start
 				putback_movable_pages(&pagelist);
 		}
 
-		if (nr_failed && (flags & MPOL_MF_STRICT))
+		if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
 			err = -EIO;
 	} else
 		putback_movable_pages(&pagelist);
 
+up_out:
 	up_write(&mm->mmap_sem);
- mpol_out:
+mpol_out:
 	mpol_put(new);
 	return err;
 }



  parent reply	other threads:[~2019-08-22 17:32 UTC|newest]

Thread overview: 91+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-22 17:18 [PATCH 4.19 00/85] 4.19.68-stable review Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 01/85] sh: kernel: hw_breakpoint: Fix missing break in switch statement Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 02/85] seq_file: fix problem when seeking mid-record Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 03/85] mm/hmm: fix bad subpage pointer in try_to_unmap_one Greg Kroah-Hartman
2019-08-22 17:18 ` Greg Kroah-Hartman [this message]
2019-08-22 17:18 ` [PATCH 4.19 05/85] mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 06/85] mm/memcontrol.c: fix use after free in mem_cgroup_iter() Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 07/85] mm/usercopy: use memory range to be accessed for wraparound check Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 08/85] Revert "pwm: Set class for exported channels in sysfs" Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 09/85] cpufreq: schedutil: Dont skip freq update when limits change Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 10/85] xtensa: add missing isync to the cpu_reset TLB code Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 11/85] ALSA: hda/realtek - Add quirk for HP Envy x360 Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 12/85] ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 13/85] ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 14/85] ALSA: hda - Apply workaround for another AMD chip 1022:1487 Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 15/85] ALSA: hda - Fix a memory leak bug Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 16/85] ALSA: hda - Add a generic reboot_notify Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 17/85] ALSA: hda - Let all conexant codec enter D3 when rebooting Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 18/85] HID: holtek: test for sanity of intfdata Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 19/85] HID: hiddev: avoid opening a disconnected device Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 20/85] HID: hiddev: do cleanup in failure of opening a device Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 21/85] Input: kbtab - sanity check for endpoint type Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 22/85] Input: iforce - add sanity checks Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 23/85] net: usb: pegasus: fix improper read if get_registers() fail Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 24/85] netfilter: ebtables: also count base chain policies Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 25/85] riscv: Make __fstate_clean() work correctly Greg Kroah-Hartman
2019-08-22 17:18 ` [PATCH 4.19 26/85] clk: at91: generated: Truncate divisor to GENERATED_MAX_DIV + 1 Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 27/85] clk: sprd: Select REGMAP_MMIO to avoid compile errors Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 28/85] clk: renesas: cpg-mssr: Fix reset control race condition Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 29/85] xen/pciback: remove set but not used variable old_state Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 30/85] irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 31/85] irqchip/irq-imx-gpcv2: Forward irq type to parent Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 32/85] perf header: Fix divide by zero error if f_header.attr_size==0 Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 33/85] perf header: Fix use of unitialized value warning Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 34/85] libata: zpodd: Fix small read overflow in zpodd_get_mech_type() Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 35/85] drm/bridge: lvds-encoder: Fix build error while CONFIG_DRM_KMS_HELPER=m Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 36/85] Btrfs: fix deadlock between fiemap and transaction commits Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 37/85] scsi: hpsa: correct scsi command status issue after reset Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 38/85] scsi: qla2xxx: Fix possible fcport null-pointer dereferences Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 39/85] drm/amdgpu: fix a potential information leaking bug Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 40/85] ata: libahci: do not complain in case of deferred probe Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 41/85] kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 42/85] kbuild: Check for unknown options with cc-option usage in Kconfig and clang Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 43/85] arm64/efi: fix variable si set but not used Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 44/85] arm64: unwind: Prohibit probing on return_address() Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 45/85] arm64/mm: fix variable pud set but not used Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 46/85] IB/core: Add mitigation for Spectre V1 Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 47/85] IB/mlx5: Fix MR registration flow to use UMR properly Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 48/85] IB/mad: Fix use-after-free in ib mad completion handling Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 49/85] drm: msm: Fix add_gpu_components Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 50/85] drm/exynos: fix missing decrement of retry counter Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 51/85] Revert "kmemleak: allow to coexist with fault injection" Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 52/85] ocfs2: remove set but not used variable last_hash Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 53/85] asm-generic: fix -Wtype-limits compiler warnings Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 54/85] arm64: KVM: regmap: Fix unexpected switch fall-through Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 55/85] KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 56/85] staging: comedi: dt3000: Fix signed integer overflow divider * base Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 57/85] staging: comedi: dt3000: Fix rounding up of timer divisor Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 58/85] iio: adc: max9611: Fix temperature reading in probe Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 59/85] USB: core: Fix races in character device registration and deregistraion Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 60/85] usb: gadget: udc: renesas_usb3: Fix sysfs interface of "role" Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 61/85] usb: cdc-acm: make sure a refcount is taken early enough Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 62/85] USB: CDC: fix sanity checks in CDC union parser Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 63/85] USB: serial: option: add D-Link DWM-222 device ID Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 64/85] USB: serial: option: Add support for ZTE MF871A Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 65/85] USB: serial: option: add the BroadMobi BM818 card Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 66/85] USB: serial: option: Add Motorola modem UARTs Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 67/85] drm/i915/cfl: Add a new CFL PCI ID Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 68/85] dm: disable DISCARD if the underlying storage no longer supports it Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 69/85] arm64: ftrace: Ensure module ftrace trampoline is coherent with I-side Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 70/85] netfilter: conntrack: Use consistent ct id hash calculation Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 71/85] Input: psmouse - fix build error of multiple definition Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 72/85] iommu/amd: Move iommu_init_pci() to .init section Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 73/85] bnx2x: Fix VFs VLAN reconfiguration in reload Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 74/85] bonding: Add vlan tx offload to hw_enc_features Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 75/85] net: dsa: Check existence of .port_mdb_add callback before calling it Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 76/85] net/mlx4_en: fix a memory leak bug Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 77/85] net/packet: fix race in tpacket_snd() Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 78/85] sctp: fix memleak in sctp_send_reset_streams Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 79/85] sctp: fix the transport error_count check Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 80/85] team: Add vlan tx offload to hw_enc_features Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 81/85] tipc: initialise addr_trail_end when setting node addresses Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 82/85] xen/netback: Reset nr_frags before freeing skb Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 83/85] net/mlx5e: Only support tx/rx pause setting for port owner Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 84/85] net/mlx5e: Use flow keys dissector to parse packets for ARFS Greg Kroah-Hartman
2019-08-22 17:19 ` [PATCH 4.19 85/85] mmc: sdhci-of-arasan: Do now show error message in case of deffered probe Greg Kroah-Hartman
2019-08-22 21:17 ` [PATCH 4.19 00/85] 4.19.68-stable review kernelci.org bot
2019-08-23  2:08 ` Jon Hunter
2019-08-23  8:06 ` Naresh Kamboju
2019-08-23 14:28 ` Guenter Roeck
2019-08-24 17:51 ` shuah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190822171731.190101948@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).