Subject: Re: [PATCH 2/2] mm/mempolicy: Skip walking HUGETLB vma if MPOL_MF_STRICT is specified alone
From: Yang Shi
To: Li Xinhai, Mike Kravetz, linux-mm@kvack.org
Cc: akpm, mhocko, n-horiguchi
Date: Wed, 15 Jan 2020 09:16:14 -0800

On 1/14/20 11:36 PM, Li Xinhai wrote:
> On 2020-01-15 at 13:23 Yang Shi wrote:
>>
>> On 1/14/20 8:28 PM, Mike Kravetz wrote:
>>> On 1/14/20 5:24 PM, Yang Shi wrote:
>>>> On 1/14/20 5:07 PM, Mike Kravetz wrote:
>>>>> On 1/14/20 6:09 AM, Li Xinhai wrote:
>>>>>> Add cc to
>>>>>> Yang Shi
>>>>>> Naoya Horiguchi
>>>>>> , who have worked on this part
>>>>>>
>>>>>> On 2020-01-14 at 17:16 Li Xinhai wrote:
>>>>>>> Checking MPOL_MF_STRICT is ignored for HUGETLB vma according to mbind man
>>>>>>> page:
>>>>>>>
>>>>>>>    Notes
>>>>>>>      MPOL_MF_STRICT is ignored on huge page mappings.
>>>>>>>
>>>>>>> If MPOL_MF_STRICT is specified alone without any MOVE flag, we should
>>>>>>> indicate, from test_walk, that walking this vma should be skipped even if
>>>>>>> there are misplaced pages.
>>>>>>>
>>>>>>> Signed-off-by: Li Xinhai
>>>>>>> Cc: Michal Hocko
>>>>>>> Cc: Mike Kravetz
>>>>> I do not necessarily disagree with the change.  However, this has made me
>>>>> question a couple of things:
>>>>> 1) Why does the man page say MPOL_MF_STRICT is ignored on huge page mappings?
>>>>>       - Is that left over from the days when huge page migration was not
>>>>>         supported?
>>>>>       - Is it just because huge page migration is more likely to fail than
>>>>>         base page migration?
>>>>> 2) Does the mbind code function properly when it is unable to migrate a huge
>>>>>       page and MPOL_MF_STRICT is set?  A quick look at the code suggests it
>>>>>       returns EIO.
> For question (2),
> looking into queue_pages_hugetlb(), a misplaced page does not
> cause -EIO to be reported, both for STRICT set alone and for STRICT set with
> MOVE*; that means STRICT is effectively ignored during the isolation phase.
>
> In the unmap and move phase, -EIO is reported if a page fails to move and
> STRICT is set.
>
>>>> I don't know the answer to question #1; I didn't dig into the history.
>>>> queue_pages_hugetlb() returns 0 unconditionally, and I think this is
>>>> what "MPOL_MF_STRICT is ignored on huge page mappings" means in code.
>>>>
>>>> It would return -EIO for base pages or THP, as the man page describes.
>>>>
>>> I was thinking about a migration failure after isolation.  This block of
>>> code in do_mbind() comes after queue_pages_range() and mbind_range():
>>>
>>>          if (!err) {
>>>                  int nr_failed = 0;
>>>
>>>                  if (!list_empty(&pagelist)) {
>>>                          WARN_ON_ONCE(flags & MPOL_MF_LAZY);
>>>                          nr_failed = migrate_pages(&pagelist, new_page, NULL,
>>>                                  start, MIGRATE_SYNC, MR_MEMPOLICY_MBIND);
>>>                          if (nr_failed)
>>>                                  putback_movable_pages(&pagelist);
>>>                  }
>>>
>>>                  if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
>>>                          err = -EIO;
>> Hmm.. I agree this part of the man page does look ambiguous. Should we assume
>> "MPOL_MF_STRICT is ignored on huge page mappings." applies only when
>> MPOL_MF_STRICT is specified alone? If a MOVE flag is specified, it should
>> return -EIO if some pages could not be moved, as the man page describes.
>>
> It looks to me that the current code has no feasible way to ignore the STRICT
> flag for hugetlb pages when a failure happens in the unmap & move phase,
> because mbind may handle multiple VMAs (i.e., hugetlb VMAs mixed with
> other VMAs) in one call.

Yes, that is the case if you have both MPOL_MF_STRICT and MPOL_MF_MOVE{,_ALL}
specified, so I speculate the man page means MPOL_MF_STRICT specified alone,
but it doesn't elaborate on this. This is also what the code does.
But I'm not sure whether my understanding is correct.

>
>> I don't know what the intention was in the first place. We may have to
>> dig into the history.
>>
>
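For illustration only, here is a minimal C sketch of the two decisions discussed in this thread. skip_hugetlb_vma() is a hypothetical helper modeling the check the patch proposes for test_walk (skip a hugetlb VMA when MPOL_MF_STRICT is given without MPOL_MF_MOVE or MPOL_MF_MOVE_ALL); it is not the patch's actual code. mbind_final_err() is a hypothetical helper that mirrors only the final -EIO condition in the do_mbind() excerpt quoted above, with ret standing in for the value returned by queue_pages_range() and nr_failed for the count returned by migrate_pages(). The flag values are taken from include/uapi/linux/mempolicy.h; everything else is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in include/uapi/linux/mempolicy.h. */
#define MPOL_MF_STRICT   (1 << 0)
#define MPOL_MF_MOVE     (1 << 1)
#define MPOL_MF_MOVE_ALL (1 << 2)

/*
 * Hypothetical model of the proposed test_walk check: a hugetlb VMA
 * is skipped when MPOL_MF_STRICT is given without any MOVE flag,
 * since STRICT is ignored for huge pages and nothing would be moved.
 */
static bool skip_hugetlb_vma(bool is_hugetlb, unsigned long flags)
{
	return is_hugetlb && (flags & MPOL_MF_STRICT) &&
	       !(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL));
}

/*
 * Hypothetical model of the final error decision in do_mbind():
 * -EIO if the isolation phase returned a positive value, or if some
 * pages failed to migrate while MPOL_MF_STRICT was set.
 */
static int mbind_final_err(int ret, int nr_failed, unsigned long flags)
{
	assert(nr_failed >= 0);	/* migrate_pages() count in this model */
	if (ret > 0 || (nr_failed && (flags & MPOL_MF_STRICT)))
		return -EIO;
	return 0;
}
```

With these helpers, MPOL_MF_STRICT alone skips a hugetlb VMA, while MPOL_MF_STRICT | MPOL_MF_MOVE still walks it, and a migration failure in that case yields -EIO. This matches Li Xinhai's point that STRICT cannot be ignored for hugetlb pages in the unmap and move phase.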