From: Yang Shi
Date: Mon, 7 Jun 2021 15:02:39 -0700
Subject: Re: [PATCH] mm: mempolicy: don't have to split pmd for huge zero page
To: Michal Hocko
Cc: Zi Yan, nao.horiguchi@gmail.com, "Kirill A. Shutemov", Hugh Dickins, Andrew Morton, Linux MM, Linux Kernel Mailing List
References: <20210604203513.240709-1-shy828301@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 7, 2021 at 11:55 AM Michal Hocko wrote:
>
> On Mon 07-06-21 10:00:01, Yang Shi wrote:
> > On Sun, Jun 6, 2021 at 11:21 PM Michal Hocko wrote:
> > >
> > > On Fri 04-06-21 13:35:13, Yang Shi wrote:
> > > > When trying to migrate pages to obey mempolicy, the huge zero page is
> > > > split then the page table walk at PTE level just skips zero page. So it
> > > > seems pointless to split huge zero page, it could be just skipped like
> > > > base zero page.
> > >
> > > My THP knowledge is not the best but this is incorrect AFAICS. Huge zero
> > > page is not split. We do split the pmd which is mapping the said page. I
> > > suspect you refer to vm_normal_page when talking about a zero page but
> > > please be aware that huge zero page is not a normal zero page. It is
> > > allocated dynamically (see get_huge_zero_page).
> >
> > For a normal huge page, yes, split_huge_pmd() just splits pmd. But
> > actually the base zero pfn will be inserted to PTEs when splitting
> > huge zero pmd. Please check __split_huge_zero_page_pmd() out.
>
> My bad. I didn't have a look all the way down there. The naming
> suggested that this is purely page table operations and I have suspected
> that ptes just point to the offset of the THP.
>
> But I am obviously wrong here. Sorry about that.
>
> > I should make this point clearer in the commit log. Sorry for the confusion.
> >
> > >
> > > So in the end your patch disables mbind of zero pages to a target node
> > > and that is a regression.
> >
> > Do we really migrate zero page? IIUC zero page is just skipped by
> > vm_normal_page() check in queue_pages_pte_range(), isn't it?
>
> Yeah, normal zero pages are skipped indeed. I haven't studied why this
> is the case yet. It surely sounds a bit suspicious because this is an
> explicit request to migrate memory and if the zero page is misplaced it
> should be moved. On the other hand this would increase RSS so maybe this
> is the point.

The zero page is a globally shared page, so I don't think "misplaced"
applies to it. It doesn't make much sense to migrate a shared page.
Actually, there is a page mapcount check in migrate_page_add() that
skips shared normal pages as well.

>
> > > Have you tested the patch?
> >
> > No, just build test. I thought this change was straightforward.
> >
> > >
> > > > Set ACTION_CONTINUE to prevent walk_page_range() from splitting the
> > > > pmd for this case.
> > >
> > > Btw. this changelog is missing a problem statement. I suspect there is
> > > no actual problem that it should fix and it is likely driven by reading
> > > the code. Right?
> >
> > The actual problem is it is pointless to split a huge zero pmd. Yes,
> > it is driven by visual inspection.
>
> Is there any actual workload that cares? This is quite a subtle area so
> I would be careful to do changes just because...

I'm not sure whether there is a measurable improvement for real
workloads, but I believe this change does eliminate some unnecessary
work: splitting a huge zero pmd populates a whole page table of PTEs
pointing at the base zero page, which the PTE-level walk then skips.

I think the test shown in the previous email gives us some confidence
that the change doesn't introduce a regression.

> --
> Michal Hocko
> SUSE Labs
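To make the "unnecessary work" argument above concrete, here is a small userspace model in C. It is not kernel code and does not call any kernel API: every name in it (struct walk_stats, MODEL_ZERO_PFN, MODEL_PTRS_PER_PTE, model_split_huge_zero_pmd(), model_queue_pte_range(), walk_with_split(), walk_skip_huge_zero()) is hypothetical. It assumes, as described in the thread, that splitting a huge zero pmd fills every PTE with the base zero pfn (cf. __split_huge_zero_page_pmd()) and that the PTE-level walk skips zero-page PTEs (cf. the vm_normal_page() check in queue_pages_pte_range()). The value 512 assumes x86-64 with 4 KiB base pages, and MODEL_ZERO_PFN is an arbitrary stand-in value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model constants, not the kernel's definitions.
 * 512 PTEs per page table assumes x86-64 with 4 KiB base pages. */
#define MODEL_PTRS_PER_PTE 512
/* Arbitrary stand-in for the pfn of the base zero page. */
#define MODEL_ZERO_PFN 0UL

struct walk_stats {
	size_t pte_writes;   /* PTEs populated by splitting the pmd */
	size_t pages_queued; /* pages queued for migration */
};

/* Model of splitting a huge zero pmd: every resulting PTE points at the
 * base zero page, as __split_huge_zero_page_pmd() is described to do. */
static void model_split_huge_zero_pmd(unsigned long ptes[MODEL_PTRS_PER_PTE],
				      struct walk_stats *st)
{
	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++) {
		ptes[i] = MODEL_ZERO_PFN;
		st->pte_writes++;
	}
}

/* Model of the PTE-level walk: zero-page PTEs are skipped, mirroring the
 * vm_normal_page() check mentioned in the thread. */
static void model_queue_pte_range(const unsigned long ptes[MODEL_PTRS_PER_PTE],
				  struct walk_stats *st)
{
	for (size_t i = 0; i < MODEL_PTRS_PER_PTE; i++) {
		if (ptes[i] == MODEL_ZERO_PFN)
			continue; /* zero page: never queued */
		st->pages_queued++;
	}
}

/* Behaviour before the patch: split the huge zero pmd, then walk the PTEs. */
static struct walk_stats walk_with_split(void)
{
	unsigned long ptes[MODEL_PTRS_PER_PTE];
	struct walk_stats st = { 0, 0 };

	model_split_huge_zero_pmd(ptes, &st);
	assert(st.pte_writes == MODEL_PTRS_PER_PTE); /* whole table populated */
	model_queue_pte_range(ptes, &st);
	return st;
}

/* Behaviour proposed by the patch: recognise the huge zero pmd and skip it
 * without splitting (the patch does this by setting ACTION_CONTINUE). */
static struct walk_stats walk_skip_huge_zero(void)
{
	struct walk_stats st = { 0, 0 };
	return st;
}
```

In this model, walk_with_split() reports 512 PTE writes and 0 queued pages, while walk_skip_huge_zero() reports 0 and 0. Both paths queue nothing for migration; the only difference is the page table population that the split performs and the walk then throws away. The model deliberately ignores locking and the cost of allocating the new page table, so it says nothing about how large the saving is on a real workload.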