From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Zi Yan"
To: "Michal Hocko"
Cc: "Andrew Morton", "Naoya Horiguchi", "Kirill A. Shutemov", "Vlastimil Babka", "Andrea Reale", "Anshuman Khandual", linux-mm@kvack.org, LKML, "Michal Hocko"
Subject: Re: [PATCH 1/3] mm, numa: rework do_pages_move
Date: Mon, 29 Jan 2018 17:06:14 -0500
Message-ID: <8ECFD324-D8A0-47DC-A6FD-B9F7D29445DC@cs.rutgers.edu>
In-Reply-To: <20180103082555.14592-2-mhocko@kernel.org>
References: <20180103082555.14592-1-mhocko@kernel.org> <20180103082555.14592-2-mhocko@kernel.org>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=_MailMate_20A6E0EC-1FE1-43CC-A91A-F9813CA0E004_="; micalg=pgp-sha512; protocol="application/pgp-signature"
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

This is an OpenPGP/MIME signed message (RFC 3156 and 4880).

--=_MailMate_20A6E0EC-1FE1-43CC-A91A-F9813CA0E004_=
Content-Type: text/plain

Hi Michal,

I discovered that this patch does not hold mmap_sem while migrating pages in
do_move_pages_to_node().
A simple fix below moves mmap_sem from add_page_for_migration()
to the outmost do_pages_move():

diff --git a/mm/migrate.c b/mm/migrate.c
index 5d0dc7b85f90..28b9e126cb38 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1487,7 +1487,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	unsigned int follflags;
 	int err;

-	down_read(&mm->mmap_sem);
 	err = -EFAULT;
 	vma = find_vma(mm, addr);
 	if (!vma || addr < vma->vm_start || !vma_migratable(vma))
@@ -1540,7 +1539,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 	 */
 	put_page(page);
 out:
-	up_read(&mm->mmap_sem);
 	return err;
 }

@@ -1561,6 +1559,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,

 	migrate_prep();

+	down_read(&mm->mmap_sem);
 	for (i = start = 0; i < nr_pages; i++) {
 		const void __user *p;
 		unsigned long addr;
@@ -1628,6 +1627,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	if (!err)
 		err = err1;
 out:
+	up_read(&mm->mmap_sem);
 	return err;
 }

--
Best Regards
Yan Zi

On 3 Jan 2018, at 3:25, Michal Hocko wrote:

> From: Michal Hocko
>
> do_pages_move is supposed to move user defined memory (an array of
> addresses) to the user defined numa nodes (an array of nodes one for
> each address). The user provided status array then contains resulting
> numa node for each address or an error. The semantic of this function is
> little bit confusing because only some errors are reported back. Notably
> migrate_pages error is only reported via the return value. This patch
> doesn't try to address these semantic nuances but rather change the
> underlying implementation.
>
> Currently we are processing user input (which can be really large)
> in batches which are stored to a temporarily allocated page. Each
> address is resolved to its struct page and stored to page_to_node
> structure along with the requested target numa node.
> The array of these structures is then conveyed down the page migration
> path via private argument. new_page_node then finds the corresponding
> structure and allocates the proper target page.
>
> What is the problem with the current implementation and why to change
> it? Apart from being quite ugly it also doesn't cope with unexpected
> pages showing up on the migration list inside migrate_pages path.
> That doesn't happen currently but the follow up patch would like to
> make the thp migration code more clear and that would need to split a
> THP into the list for some cases.
>
> How does the new implementation work? Well, instead of batching into a
> fixed size array we simply batch all pages that should be migrated to
> the same node and isolate all of them into a linked list which doesn't
> require any additional storage. This should work reasonably well because
> page migration usually migrates larger ranges of memory to a specific
> node. So the common case should work equally well as the current
> implementation. Even if somebody constructs an input where the target
> numa nodes would be interleaved we shouldn't see a large performance
> impact because page migration alone doesn't really benefit from
> batching. mmap_sem batching for the lookup is quite questionable and
> isolate_lru_page which would benefit from batching is not using it even
> in the current implementation.
>
> Acked-by: Kirill A. Shutemov
> Signed-off-by: Michal Hocko
> ---
>  mm/internal.h  |   1 +
>  mm/mempolicy.c |   5 +-
>  mm/migrate.c   | 306 +++++++++++++++++++++++++--------------------------------
>  3 files changed, 138 insertions(+), 174 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 3e5dc95dc259..745e247aca9c 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -540,4 +540,5 @@ static inline bool is_migrate_highatomic_page(struct page *page)
>  }
>
>  void setup_zone_pageset(struct zone *zone);
> +extern struct page *alloc_new_node_page(struct page *page, unsigned long node, int **x);
>  #endif	/* __MM_INTERNAL_H */
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index f604b22ebb65..66c9c79b21be 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -942,7 +942,8 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
>  	}
>  }
>
> -static struct page *new_node_page(struct page *page, unsigned long node, int **x)
> +/* page allocation callback for NUMA node migration */
> +struct page *alloc_new_node_page(struct page *page, unsigned long node, int **x)
>  {
>  	if (PageHuge(page))
>  		return alloc_huge_page_node(page_hstate(compound_head(page)),
> @@ -986,7 +987,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
>  			flags | MPOL_MF_DISCONTIG_OK, &pagelist);
>
>  	if (!list_empty(&pagelist)) {
> -		err = migrate_pages(&pagelist, new_node_page, NULL, dest,
> +		err = migrate_pages(&pagelist, alloc_new_node_page, NULL, dest,
>  					MIGRATE_SYNC, MR_SYSCALL);
>  		if (err)
>  			putback_movable_pages(&pagelist);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4d0be47a322a..8fb90bcd44a7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1444,141 +1444,103 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>  }
>
>  #ifdef CONFIG_NUMA
> -/*
> - * Move a list of individual pages
> - */
> -struct page_to_node {
> -	unsigned long addr;
> -	struct page *page;
> -	int node;
> -	int status;
> -};
>
> -static struct page *new_page_node(struct page *p, unsigned long private,
> -		int **result)
> +static int store_status(int __user *status, int start, int value, int nr)
>  {
> -	struct page_to_node *pm = (struct page_to_node *)private;
> -
> -	while (pm->node != MAX_NUMNODES && pm->page != p)
> -		pm++;
> +	while (nr-- > 0) {
> +		if (put_user(value, status + start))
> +			return -EFAULT;
> +		start++;
> +	}
>
> -	if (pm->node == MAX_NUMNODES)
> -		return NULL;
> +	return 0;
> +}
>
> -	*result = &pm->status;
> +static int do_move_pages_to_node(struct mm_struct *mm,
> +		struct list_head *pagelist, int node)
> +{
> +	int err;
>
> -	if (PageHuge(p))
> -		return alloc_huge_page_node(page_hstate(compound_head(p)),
> -					pm->node);
> -	else if (thp_migration_supported() && PageTransHuge(p)) {
> -		struct page *thp;
> +	if (list_empty(pagelist))
> +		return 0;
>
> -		thp = alloc_pages_node(pm->node,
> -			(GFP_TRANSHUGE | __GFP_THISNODE) & ~__GFP_RECLAIM,
> -			HPAGE_PMD_ORDER);
> -		if (!thp)
> -			return NULL;
> -		prep_transhuge_page(thp);
> -		return thp;
> -	} else
> -		return __alloc_pages_node(pm->node,
> -				GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
> +	err = migrate_pages(pagelist, alloc_new_node_page, NULL, node,
> +			MIGRATE_SYNC, MR_SYSCALL);
> +	if (err)
> +		putback_movable_pages(pagelist);
> +	return err;
>  }
>
>  /*
> - * Move a set of pages as indicated in the pm array. The addr
> - * field must be set to the virtual address of the page to be moved
> - * and the node number must contain a valid target node.
> - * The pm array ends with node = MAX_NUMNODES.
> + * Resolves the given address to a struct page, isolates it from the LRU and
> + * puts it to the given pagelist.
> + * Returns -errno if the page cannot be found/isolated or 0 when it has been
> + * queued or the page doesn't need to be migrated because it is already on
> + * the target node
>   */
> -static int do_move_page_to_node_array(struct mm_struct *mm,
> -				      struct page_to_node *pm,
> -				      int migrate_all)
> +static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
> +		int node, struct list_head *pagelist, bool migrate_all)
>  {
> +	struct vm_area_struct *vma;
> +	struct page *page;
> +	unsigned int follflags;
>  	int err;
> -	struct page_to_node *pp;
> -	LIST_HEAD(pagelist);
>
>  	down_read(&mm->mmap_sem);
> +	err = -EFAULT;
> +	vma = find_vma(mm, addr);
> +	if (!vma || addr < vma->vm_start || !vma_migratable(vma))
> +		goto out;
>
> -	/*
> -	 * Build a list of pages to migrate
> -	 */
> -	for (pp = pm; pp->node != MAX_NUMNODES; pp++) {
> -		struct vm_area_struct *vma;
> -		struct page *page;
> -		struct page *head;
> -		unsigned int follflags;
> -
> -		err = -EFAULT;
> -		vma = find_vma(mm, pp->addr);
> -		if (!vma || pp->addr < vma->vm_start || !vma_migratable(vma))
> -			goto set_status;
> -
> -		/* FOLL_DUMP to ignore special (like zero) pages */
> -		follflags = FOLL_GET | FOLL_DUMP;
> -		if (!thp_migration_supported())
> -			follflags |= FOLL_SPLIT;
> -		page = follow_page(vma, pp->addr, follflags);
> +	/* FOLL_DUMP to ignore special (like zero) pages */
> +	follflags = FOLL_GET | FOLL_DUMP;
> +	if (!thp_migration_supported())
> +		follflags |= FOLL_SPLIT;
> +	page = follow_page(vma, addr, follflags);
>
> -		err = PTR_ERR(page);
> -		if (IS_ERR(page))
> -			goto set_status;
> +	err = PTR_ERR(page);
> +	if (IS_ERR(page))
> +		goto out;
>
> -		err = -ENOENT;
> -		if (!page)
> -			goto set_status;
> +	err = -ENOENT;
> +	if (!page)
> +		goto out;
>
> -		err = page_to_nid(page);
> +	err = 0;
> +	if (page_to_nid(page) == node)
> +		goto out_putpage;
>
> -		if (err == pp->node)
> -			/*
> -			 * Node already in the right place
> -			 */
> -			goto put_and_set;
> +	err = -EACCES;
> +	if (page_mapcount(page) > 1 && !migrate_all)
> +		goto out_putpage;
>
> -		err = -EACCES;
> -		if (page_mapcount(page) > 1 &&
> -				!migrate_all)
> -			goto put_and_set;
> -
> -		if (PageHuge(page)) {
> -			if (PageHead(page)) {
> -				isolate_huge_page(page, &pagelist);
> -				err = 0;
> -				pp->page = page;
> -			}
> -			goto put_and_set;
> +	if (PageHuge(page)) {
> +		if (PageHead(page)) {
> +			isolate_huge_page(page, pagelist);
> +			err = 0;
>  		}
> +	} else {
> +		struct page *head;
>
> -		pp->page = compound_head(page);
>  		head = compound_head(page);
>  		err = isolate_lru_page(head);
> -		if (!err) {
> -			list_add_tail(&head->lru, &pagelist);
> -			mod_node_page_state(page_pgdat(head),
> -				NR_ISOLATED_ANON + page_is_file_cache(head),
> -				hpage_nr_pages(head));
> -		}
> -put_and_set:
> -		/*
> -		 * Either remove the duplicate refcount from
> -		 * isolate_lru_page() or drop the page ref if it was
> -		 * not isolated.
> -		 */
> -		put_page(page);
> -set_status:
> -		pp->status = err;
> -	}
> -
> -	err = 0;
> -	if (!list_empty(&pagelist)) {
> -		err = migrate_pages(&pagelist, new_page_node, NULL,
> -				(unsigned long)pm, MIGRATE_SYNC, MR_SYSCALL);
>  		if (err)
> -			putback_movable_pages(&pagelist);
> -	}
> +			goto out_putpage;
>
> +		err = 0;
> +		list_add_tail(&head->lru, pagelist);
> +		mod_node_page_state(page_pgdat(head),
> +			NR_ISOLATED_ANON + page_is_file_cache(head),
> +			hpage_nr_pages(head));
> +	}
> +out_putpage:
> +	/*
> +	 * Either remove the duplicate refcount from
> +	 * isolate_lru_page() or drop the page ref if it was
> +	 * not isolated.
> +	 */
> +	put_page(page);
> +out:
>  	up_read(&mm->mmap_sem);
>  	return err;
>  }
> @@ -1593,79 +1555,79 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
>  			 const int __user *nodes,
>  			 int __user *status, int flags)
>  {
> -	struct page_to_node *pm;
> -	unsigned long chunk_nr_pages;
> -	unsigned long chunk_start;
> -	int err;
> -
> -	err = -ENOMEM;
> -	pm = (struct page_to_node *)__get_free_page(GFP_KERNEL);
> -	if (!pm)
> -		goto out;
> +	int current_node = NUMA_NO_NODE;
> +	LIST_HEAD(pagelist);
> +	int start, i;
> +	int err = 0, err1;
>
>  	migrate_prep();
>
> -	/*
> -	 * Store a chunk of page_to_node array in a page,
> -	 * but keep the last one as a marker
> -	 */
> -	chunk_nr_pages = (PAGE_SIZE / sizeof(struct page_to_node)) - 1;
> -
> -	for (chunk_start = 0;
> -	     chunk_start < nr_pages;
> -	     chunk_start += chunk_nr_pages) {
> -		int j;
> +	for (i = start = 0; i < nr_pages; i++) {
> +		const void __user *p;
> +		unsigned long addr;
> +		int node;
>
> -		if (chunk_start + chunk_nr_pages > nr_pages)
> -			chunk_nr_pages = nr_pages - chunk_start;
> -
> -		/* fill the chunk pm with addrs and nodes from user-space */
> -		for (j = 0; j < chunk_nr_pages; j++) {
> -			const void __user *p;
> -			int node;
> -
> -			err = -EFAULT;
> -			if (get_user(p, pages + j + chunk_start))
> -				goto out_pm;
> -			pm[j].addr = (unsigned long) p;
> -
> -			if (get_user(node, nodes + j + chunk_start))
> -				goto out_pm;
> -
> -			err = -ENODEV;
> -			if (node < 0 || node >= MAX_NUMNODES)
> -				goto out_pm;
> -
> -			if (!node_state(node, N_MEMORY))
> -				goto out_pm;
> -
> -			err = -EACCES;
> -			if (!node_isset(node, task_nodes))
> -				goto out_pm;
> +		err = -EFAULT;
> +		if (get_user(p, pages + i))
> +			goto out_flush;
> +		if (get_user(node, nodes + i))
> +			goto out_flush;
> +		addr = (unsigned long)p;
> +
> +		err = -ENODEV;
> +		if (node < 0 || node >= MAX_NUMNODES)
> +			goto out_flush;
> +		if (!node_state(node, N_MEMORY))
> +			goto out_flush;
>
> -			pm[j].node = node;
> +		err = -EACCES;
> +		if (!node_isset(node, task_nodes))
> +			goto out_flush;
> +
> +		if (current_node == NUMA_NO_NODE) {
> +			current_node = node;
> +			start = i;
> +		} else if (node != current_node) {
> +			err = do_move_pages_to_node(mm, &pagelist, current_node);
> +			if (err)
> +				goto out;
> +			err = store_status(status, start, current_node, i - start);
> +			if (err)
> +				goto out;
> +			start = i;
> +			current_node = node;
>  		}
>
> -		/* End marker for this chunk */
> -		pm[chunk_nr_pages].node = MAX_NUMNODES;
> +		/*
> +		 * Errors in the page lookup or isolation are not fatal and we simply
> +		 * report them via status
> +		 */
> +		err = add_page_for_migration(mm, addr, current_node,
> +				&pagelist, flags & MPOL_MF_MOVE_ALL);
> +		if (!err)
> +			continue;
>
> -		/* Migrate this chunk */
> -		err = do_move_page_to_node_array(mm, pm,
> -						 flags & MPOL_MF_MOVE_ALL);
> -		if (err < 0)
> -			goto out_pm;
> +		err = store_status(status, i, err, 1);
> +		if (err)
> +			goto out_flush;
>
> -		/* Return status information */
> -		for (j = 0; j < chunk_nr_pages; j++)
> -			if (put_user(pm[j].status, status + j + chunk_start)) {
> -				err = -EFAULT;
> -				goto out_pm;
> -			}
> +		err = do_move_pages_to_node(mm, &pagelist, current_node);
> +		if (err)
> +			goto out;
> +		if (i > start) {
> +			err = store_status(status, start, current_node, i - start);
> +			if (err)
> +				goto out;
> +		}
> +		current_node = NUMA_NO_NODE;
>  	}
> -	err = 0;
> -
> -out_pm:
> -	free_page((unsigned long)pm);
> +out_flush:
> +	/* Make sure we do not overwrite the existing error */
> +	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
> +	if (!err1)
> +		err1 = store_status(status, start, current_node, i - start);
> +	if (!err)
> +		err = err1;
>  out:
>  	return err;
>  }
> --
> 2.15.1

--=_MailMate_20A6E0EC-1FE1-43CC-A91A-F9813CA0E004_=
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename=signature.asc
Content-Type: application/pgp-signature; name=signature.asc

-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQEcBAEBCgAGBQJab5rWAAoJEEGLLxGcTqbMl2cIAKl+L5Ny63mpMl93chp1CsgF
g6AX4/aZAdb9E6++dc6+dhYiIG0cpJTjG2jbYiNTSorwk8QN2C8uFOii+m7Sy6Kg
hhVpi8XSznXS/SH7k/0WBpClQf3nHd9obHU3VFlHgRE3GY1hxy4JtoGZgrexstRX
eY6iwbFiut0205lFT1VzA8W/YwGkok/2yEverZzqZVAVX41q8+qnZmMfRZkp6h6a
PplX0GY+mbtTWxsCR1P3fa6aJTrPjIKshmHAQdw77HD2BnTu/BJrQ5diyFyqjT3e
j4yB9iE81MBQ267NLpSSWiS2QMRrrj5VgTzfnj1shD7DoEpgz5KgPO2refy8IVY=
=4UtN
-----END PGP SIGNATURE-----

--=_MailMate_20A6E0EC-1FE1-43CC-A91A-F9813CA0E004_=--
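
The batching scheme described in the quoted commit message (collect consecutive
addresses that target the same node, flush the batch when the target node
changes) can be sketched in plain userspace C. This is only an illustration
under stated assumptions, not kernel code: NO_NODE stands in for NUMA_NO_NODE,
and struct batch, struct batch_log, flush_batch() and batch_by_node() are
hypothetical names that do not exist in mm/migrate.c. Where the kernel would
migrate the pages and store their status, the sketch just records the range.
It also leaves out the per-page error path that flushes a batch early.

```c
#include <assert.h>
#include <stddef.h>

#define NO_NODE (-1)     /* assumption: stand-in for the kernel's NUMA_NO_NODE */
#define MAX_BATCHES 16   /* arbitrary capacity for this illustration */

/* One flushed batch: `count` consecutive entries from `start`, all for `node`. */
struct batch {
	int node;
	size_t start;
	size_t count;
};

/* Hypothetical log that records flushes instead of migrating anything. */
struct batch_log {
	struct batch b[MAX_BATCHES];
	size_t n;
};

/* Record one batch; stands in for "migrate the list, then store status". */
static void flush_batch(struct batch_log *log, int node, size_t start,
			size_t count)
{
	if (count == 0)
		return;
	assert(log->n < MAX_BATCHES);
	log->b[log->n++] = (struct batch){ node, start, count };
}

/*
 * Walk the requested target nodes (assumed to be >= 0) and group
 * consecutive entries with the same node into one batch. Returns the
 * number of batches that were flushed.
 */
static size_t batch_by_node(const int *nodes, size_t nr, struct batch_log *log)
{
	int current = NO_NODE;
	size_t start = 0;
	size_t i;

	log->n = 0;
	for (i = 0; i < nr; i++) {
		if (current == NO_NODE) {
			/* first entry opens a batch */
			current = nodes[i];
			start = i;
		} else if (nodes[i] != current) {
			/* target node changed: flush and open a new batch */
			flush_batch(log, current, start, i - start);
			current = nodes[i];
			start = i;
		}
	}
	/* flush whatever is still pending at the end of the input */
	if (current != NO_NODE)
		flush_batch(log, current, start, nr - start);
	return log->n;
}
```

For a node array of {0, 0, 1, 1, 1, 0} the sketch produces three batches
(two entries for node 0, three for node 1, one for node 0). A fully
interleaved input degrades to one batch per entry, which is the case the
commit message argues should not cost much.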