From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, UNPARSEABLE_RELAY,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 66347C43603 for ; Thu, 5 Dec 2019 17:20:21 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1A9862064B for ; Thu, 5 Dec 2019 17:20:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1A9862064B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.alibaba.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BB5F66B114B; Thu, 5 Dec 2019 12:20:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B3E4D6B114C; Thu, 5 Dec 2019 12:20:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A067B6B114D; Thu, 5 Dec 2019 12:20:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0093.hostedemail.com [216.40.44.93]) by kanga.kvack.org (Postfix) with ESMTP id 872256B114B for ; Thu, 5 Dec 2019 12:20:20 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with SMTP id 464CE52A0 for ; Thu, 5 Dec 2019 17:20:20 +0000 (UTC) X-FDA: 76231751400.08.vest71_3d021df64f806 X-HE-Tag: vest71_3d021df64f806 X-Filterd-Recvd-Size: 10304 Received: from out30-132.freemail.mail.aliyun.com (out30-132.freemail.mail.aliyun.com [115.124.30.132]) by imf16.hostedemail.com (Postfix) with ESMTP for ; Thu, 5 Dec 2019 17:20:18 +0000 (UTC) X-Alimail-AntiSpam:AC=PASS;BC=-1|-1;BR=01201311R871e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e04395;MF=yang.shi@linux.alibaba.com;NM=1;PH=DS;RN=10;SR=0;TI=SMTPD_---0Tk3V.AY_1575566412; Received: from US-143344MP.local(mailfrom:yang.shi@linux.alibaba.com fp:SMTPD_---0Tk3V.AY_1575566412) by smtp.aliyun-inc.com(127.0.0.1); Fri, 06 Dec 2019 01:20:15 +0800 Subject: Re: [v2 PATCH] mm: move_pages: return valid node id in status if the page is already on the target node To: John Hubbard , fabecassis@nvidia.com, mhocko@suse.com, cl@linux.com, vbabka@suse.cz, mgorman@techsingularity.net, akpm@linux-foundation.org Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org References: <1575519678-86510-1-git-send-email-yang.shi@linux.alibaba.com> From: Yang Shi Message-ID: <45c6de9c-fa4b-c060-045a-1c7dde3fac36@linux.alibaba.com> Date: Thu, 5 Dec 2019 09:20:10 -0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 12/4/19 9:44 PM, John Hubbard wrote: > On 12/4/19 8:21 PM, Yang Shi wrote: >> Felix Abecassis reports move_pages() would return random status if the >> pages are already on the target node by the below test program: >> >> ---8<--- > > This is correct correct code, so: > > Reviewed-by: John Hubbard > > ...with a few nitpicky notes about comments, below, that might help: Thanks, John. Will take in new version. > >> >> int main(void) >> { >> =C2=A0=C2=A0=C2=A0=C2=A0const long node_id =3D 1; >> =C2=A0=C2=A0=C2=A0=C2=A0const long page_size =3D sysconf(_SC_PAGESIZE)= ; >> =C2=A0=C2=A0=C2=A0=C2=A0const int64_t num_pages =3D 8; >> >> =C2=A0=C2=A0=C2=A0=C2=A0unsigned long nodemask =3D=C2=A0 1 << node_id; >> =C2=A0=C2=A0=C2=A0=C2=A0long ret =3D set_mempolicy(MPOL_BIND, &nodemas= k, sizeof(nodemask)); >> =C2=A0=C2=A0=C2=A0=C2=A0if (ret < 0) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (EXIT_FAILURE); >> >> =C2=A0=C2=A0=C2=A0=C2=A0void **pages =3D malloc(sizeof(void*) * num_pa= ges); >> =C2=A0=C2=A0=C2=A0=C2=A0for (int i =3D 0; i < num_pages; ++i) { >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pages[i] =3D mmap(NULL, pag= e_size, PROT_WRITE | PROT_READ, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 -1, 0); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (pages[i] =3D=3D MAP_FAI= LED) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret= urn (EXIT_FAILURE); >> =C2=A0=C2=A0=C2=A0=C2=A0} >> >> =C2=A0=C2=A0=C2=A0=C2=A0ret =3D set_mempolicy(MPOL_DEFAULT, NULL, 0); >> =C2=A0=C2=A0=C2=A0=C2=A0if (ret < 0) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return (EXIT_FAILURE); >> >> =C2=A0=C2=A0=C2=A0=C2=A0int *nodes =3D malloc(sizeof(int) * num_pages)= ; >> =C2=A0=C2=A0=C2=A0=C2=A0int *status =3D malloc(sizeof(int) * num_pages= ); >> =C2=A0=C2=A0=C2=A0=C2=A0for (int i =3D 0; i < num_pages; ++i) { >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 nodes[i] =3D node_id; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 status[i] =3D 0xd0; /* simu= late garbage values */ >> =C2=A0=C2=A0=C2=A0=C2=A0} >> >> =C2=A0=C2=A0=C2=A0=C2=A0ret =3D move_pages(0, num_pages, pages, nodes,= status, MPOL_MF_MOVE); >> =C2=A0=C2=A0=C2=A0=C2=A0printf("move_pages: %ld\n", ret); >> =C2=A0=C2=A0=C2=A0=C2=A0for (int i =3D 0; i < num_pages; ++i) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 printf("status[%d] =3D %d\n= ", i, status[i]); >> } >> ---8<--- >> >> Then running the program would return nonsense status values: >> $ ./move_pages_bug >> move_pages: 0 >> status[0] =3D 208 >> status[1] =3D 208 >> status[2] =3D 208 >> status[3] =3D 208 >> status[4] =3D 208 >> status[5] =3D 208 >> status[6] =3D 208 >> status[7] =3D 208 >> >> This is because the status is not set if the page is already on the >> target node, but move_pages() should return valid status as long as it >> succeeds.=C2=A0 The valid status may be errno or node id. >> >> We can't simply initialize status array to zero since the pages may be >> not on node 0.=C2=A0 Fix it by updating status with node id which the = page is >> already on.=C2=A0 And, it looks we have to update the status inside >> add_page_for_migration() since the page struct is not available outsid= e >> it. >> >> Make add_page_for_migration() return 1 if store_status() is failed in >> order to not mix up the status value since -EFAULT is also a valid >> status. >> >> Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move") >> Reported-by: Felix Abecassis >> Tested-by: Felix Abecassis >> Cc: John Hubbard >> Cc: Michal Hocko >> Cc: Christoph Lameter >> Cc: Vlastimil Babka >> Cc: Mel Gorman >> Cc: 4.17+ >> Signed-off-by: Yang Shi >> --- >> v2: *Correted the return value when add_page_for_migration() returns 1= . >> >> John noticed another return value inconsistency between the=20 >> implementation and >> the manpage.=C2=A0 The manpage says it should return -ENOENT if the pa= ge=20 >> is already >> on the target node, but it doesn't.=C2=A0 It looks the original code=20 >> didn't return >> -ENOENT either, I'm not sure if this is a document issue or not.=C2=A0= =20 >> Anyway this >> is another issue, once we confirm it we can fix it later. >> >> =C2=A0 mm/migrate.c | 36 ++++++++++++++++++++++++++++++------ >> =C2=A0 1 file changed, 30 insertions(+), 6 deletions(-) >> >> diff --git a/mm/migrate.c b/mm/migrate.c >> index a8f87cb..f1090a0 100644 >> --- a/mm/migrate.c >> +++ b/mm/migrate.c >> @@ -1512,17 +1512,21 @@ static int do_move_pages_to_node(struct=20 >> mm_struct *mm, >> =C2=A0 /* >> =C2=A0=C2=A0 * Resolves the given address to a struct page, isolates i= t from=20 >> the LRU and >> =C2=A0=C2=A0 * puts it to the given pagelist. >> - * Returns -errno if the page cannot be found/isolated or 0 when it=20 >> has been >> - * queued or the page doesn't need to be migrated because it is=20 >> already on >> - * the target node >> + * Returns: >> + *=C2=A0=C2=A0=C2=A0=C2=A0 errno - if the page cannot be found/isolat= ed >> + *=C2=A0=C2=A0=C2=A0=C2=A0 0 - when it has been queued or the page do= esn't need to be=20 >> migrated >> + *=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 because it is alre= ady on the target node >> + *=C2=A0=C2=A0=C2=A0=C2=A0 1 - if store_status() is failed > > > I recommend this wording instead: > > =C2=A0* Returns: > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0 errno - if the page cannot be found/iso= lated > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0 0 - when it has been queued or the page= doesn't need to be=20 > migrated > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 because it is a= lready on the target node > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0 1 - The page doesn't need to be migrate= d because it is already=20 > on the > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 target node. Ho= wever, attempting to store the node ID in=20 > the status > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 array failed. U= nlike other failures in this function, this=20 > case > =C2=A0*=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 needs to turn i= nto a fatal failure in the calling function. > > >> =C2=A0=C2=A0 */ >> =C2=A0 static int add_page_for_migration(struct mm_struct *mm, unsigne= d=20 >> long addr, >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int node, struct list_head= *pagelist, bool migrate_all) >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int node, struct list_head= *pagelist, bool migrate_all, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int __user *status, int st= art) >> =C2=A0 { >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct vm_area_struct *vma; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 struct page *page; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 unsigned int follflags; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int err; >> +=C2=A0=C2=A0=C2=A0 bool same_node =3D false; >> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 down_read(&mm->mmap_sem); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D -EFAULT; >> @@ -1543,8 +1547,10 @@ static int add_page_for_migration(struct=20 >> mm_struct *mm, unsigned long addr, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto out; >> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D 0; >> -=C2=A0=C2=A0=C2=A0 if (page_to_nid(page) =3D=3D node) >> +=C2=A0=C2=A0=C2=A0 if (page_to_nid(page) =3D=3D node) { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 same_node =3D true; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto out_putpag= e; >> +=C2=A0=C2=A0=C2=A0 } >> =C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D -EACCES; >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (page_mapcount(page) > 1 && !migrate= _all) >> @@ -1578,6 +1584,16 @@ static int add_page_for_migration(struct=20 >> mm_struct *mm, unsigned long addr, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 put_page(page); >> =C2=A0 out: >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 up_read(&mm->mmap_sem); >> + >> +=C2=A0=C2=A0=C2=A0 /* >> +=C2=A0=C2=A0=C2=A0=C2=A0 * Must call store_status() after releasing m= map_sem since put_user >> +=C2=A0=C2=A0=C2=A0=C2=A0 * need acquire mmap_sem too, otherwise poten= tial deadlock may=20 >> exist. >> +=C2=A0=C2=A0=C2=A0=C2=A0 */ >> +=C2=A0=C2=A0=C2=A0 if (same_node) { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (store_status(status, s= tart, node, 1)) >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 er= r =3D 1; >> +=C2=A0=C2=A0=C2=A0 } >> + >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return err; >> =C2=A0 } >> =C2=A0 @@ -1639,10 +1655,18 @@ static int do_pages_move(struct mm_stru= ct=20 >> *mm, nodemask_t task_nodes, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * report = them via status >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ > > Let's change the comment above add_page_for_migration(), to read: > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * Most errors in the p= age lookup or isolation are not fatal > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * and we simply report= them via the status array. However, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * positive error value= s are fatal. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ > > >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D add_pag= e_for_migration(mm, addr, current_node, >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 &pagelist, flags & MPOL_MF_MOVE_ALL); >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 &pagelist, flags & MPOL_MF_MOVE_ALL, status, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 i); >> + >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!err) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 continue; >> =C2=A0 +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* store_status() f= ailed in add_page_for_migration() */ > > ...and let's replace the above line, with the following: > > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * Most errors in the p= age lookup or isolation are not fatal > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * and we simply report= them via the status array. However, > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * positive error value= s are fatal. > =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ > > >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (err > 0) { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 er= r =3D -EFAULT; >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 go= to out_flush; >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >> + >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 err =3D store_s= tatus(status, i, err, 1); >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (err) >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 goto out_flush; >> > > And with that, I think the comments help a little bit more, in reading > through the code. > > > thanks,