From: teawater
Subject: Re: [PATCH] mm, memcg: Add memory.transparent_hugepage_disabled
Date: Tue, 24 Mar 2020 20:30:32 +0800
To: Michal Hocko
Cc: Hui Zhu, Johannes Weiner, Vladimir Davydov, Andrew Morton,
 hughd@google.com, yang.shi@linux.alibaba.com, kirill@shutemov.name,
 dan.j.williams@intel.com, aneesh.kumar@linux.ibm.com,
 sean.j.christopherson@intel.com, thellstrom@vmware.com, guro@fb.com,
 shakeelb@google.com, chris@chrisdown.name, tj@kernel.org,
 tglx@linutronix.de, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org
Message-Id: <816B70EC-20AD-4BB8-AD13-4F5640EBAB35@linux.alibaba.com>
In-Reply-To: <20200324110034.GH19542@dhcp22.suse.cz>
References: <1585045916-27339-1-git-send-email-teawater@gmail.com>
 <20200324110034.GH19542@dhcp22.suse.cz>

> On Mar 24, 2020, at 19:00, Michal Hocko wrote:
>
> On Tue 24-03-20 18:31:56, Hui Zhu wrote:
>> /sys/kernel/mm/transparent_hugepage/enabled is the only interface to
>> control if the application can use THP in system level.
>> Sometime, we would not want an application use THP even if
>> transparent_hugepage/enabled is set to "always" or "madvise" because
>> thp may need more cpu and memory resources in some cases.
>
> Could you specify that sometime by a real usecase in the memcg context
> please?

Thanks for your review.

We use THP + balloon to give VMs more memory flexibility.
https://lore.kernel.org/linux-mm/1584893097-12317-1-git-send-email-teawater@gmail.com/
That is another thread, in which I am working on THP + balloon.

Other applications are already deployed on these machines.
transparent_hugepage/enabled is set to "never" there because these
applications used to have a lot of THP-related performance issues,
and some of them may call madvise() to request THP themselves.
So even if I set transparent_hugepage/enabled to "madvise", these
programs are still at risk of THP-related performance issues. That is
why I need this cgroup THP switch.

>
>> This commit add a new interface memory.transparent_hugepage_disabled
>> in memcg.
>> When it set to 1, the application inside the cgroup cannot use THP
>> except dax.
>
> Why should this interface differ from the global semantic. How does it
> relate to the kcompactd. Your patch also doesn't seem to define this
> knob to have hierarchical semantic. Why?
>

Given the use case described above, I did not see any need for an
interface modeled on transparent_hugepage/enabled. That is why I just
added a switch.

What about adding an interface like transparent_hugepage/enabled
instead?

Thanks,
Hui

> All that being said, this patch is lacking both proper justification and
> the semantic is dubious to be honest. I have also say that I am not a
> great fan. THP semantic is overly complex already and adding more on top
> would require really strong usecase.
>
>> Signed-off-by: Hui Zhu
>> ---
>>  include/linux/huge_mm.h    | 18 ++++++++++++++++--
>>  include/linux/memcontrol.h |  2 ++
>>  mm/memcontrol.c            | 42 ++++++++++++++++++++++++++++++++++++++++++
>>  mm/shmem.c                 |  4 ++++
>>  4 files changed, 64 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>> index 5aca3d1..fd81479 100644
>> --- a/include/linux/huge_mm.h
>> +++ b/include/linux/huge_mm.h
>> @@ -91,6 +91,16 @@ extern bool is_vma_temporary_stack(struct vm_area_struct *vma);
>>
>>  extern unsigned long transparent_hugepage_flags;
>>
>> +#ifdef CONFIG_MEMCG
>> +extern bool memcg_transparent_hugepage_disabled(struct vm_area_struct *vma);
>> +#else
>> +static inline bool
>> +memcg_transparent_hugepage_disabled(struct vm_area_struct *vma)
>> +{
>> +	return false;
>> +}
>> +#endif
>> +
>>  /*
>>   * to be used on vmas which are known to support THP.
>>   * Use transparent_hugepage_enabled otherwise
>> @@ -106,8 +116,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>  	if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>>  		return false;
>>
>> -	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>> -		return true;
>>  	/*
>>  	 * For dax vmas, try to always use hugepage mappings. If the kernel does
>>  	 * not support hugepages, fsdax mappings will fallback to PAGE_SIZE
>> @@ -117,6 +125,12 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>>  	if (vma_is_dax(vma))
>>  		return true;
>>
>> +	if (memcg_transparent_hugepage_disabled(vma))
>> +		return false;
>> +
>> +	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
>> +		return true;
>> +
>>  	if (transparent_hugepage_flags &
>>  				(1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
>>  		return !!(vma->vm_flags & VM_HUGEPAGE);
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index a7a0a1a5..abc3142 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -320,6 +320,8 @@ struct mem_cgroup {
>>
>>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>  	struct deferred_split deferred_split_queue;
>> +
>> +	bool transparent_hugepage_disabled;
>>  #endif
>>
>>  	struct mem_cgroup_per_node *nodeinfo[0];
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 7a4bd8b..b6d91b6 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -5011,6 +5011,14 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>  	if (parent) {
>>  		memcg->swappiness = mem_cgroup_swappiness(parent);
>>  		memcg->oom_kill_disable = parent->oom_kill_disable;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +		memcg->transparent_hugepage_disabled
>> +			= parent->transparent_hugepage_disabled;
>> +#endif
>> +	} else {
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +		memcg->transparent_hugepage_disabled = false;
>> +#endif
>>  	}
>>  	if (parent && parent->use_hierarchy) {
>>  		memcg->use_hierarchy = true;
>> @@ -6126,6 +6134,24 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>>  	return nbytes;
>>  }
>>
>> +static u64 transparent_hugepage_disabled_read(struct cgroup_subsys_state *css,
>> +					      struct cftype *cft)
>> +{
>> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>> +
>> +	return memcg->transparent_hugepage_disabled;
>> +}
>> +
>> +static int transparent_hugepage_disabled_write(struct cgroup_subsys_state *css,
>> +					       struct cftype *cft, u64 val)
>> +{
>> +	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>> +
>> +	memcg->transparent_hugepage_disabled = !!val;
>> +
>> +	return 0;
>> +}
>> +
>>  static struct cftype memory_files[] = {
>>  	{
>>  		.name = "current",
>> @@ -6179,6 +6205,12 @@ static struct cftype memory_files[] = {
>>  		.seq_show = memory_oom_group_show,
>>  		.write = memory_oom_group_write,
>>  	},
>> +	{
>> +		.name = "transparent_hugepage_disabled",
>> +		.flags = CFTYPE_NOT_ON_ROOT,
>> +		.read_u64 = transparent_hugepage_disabled_read,
>> +		.write_u64 = transparent_hugepage_disabled_write,
>> +	},
>>  	{ }	/* terminate */
>>  };
>>
>> @@ -6787,6 +6819,16 @@ void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages)
>>  	refill_stock(memcg, nr_pages);
>>  }
>>
>> +bool memcg_transparent_hugepage_disabled(struct vm_area_struct *vma)
>> +{
>> +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(vma->vm_mm);
>> +
>> +	if (memcg && memcg->transparent_hugepage_disabled)
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  static int __init cgroup_memory(char *s)
>>  {
>>  	char *token;
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index aad3ba7..253b63b 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1810,6 +1810,10 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>>  		goto alloc_nohuge;
>>  	if (shmem_huge == SHMEM_HUGE_DENY || sgp_huge == SGP_NOHUGE)
>>  		goto alloc_nohuge;
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +	if (memcg_transparent_hugepage_disabled(vma))
>> +		goto alloc_nohuge;
>> +#endif
>>  	if (shmem_huge == SHMEM_HUGE_FORCE)
>>  		goto alloc_huge;
>>  	switch (sbinfo->huge) {
>> --
>> 2.7.4
>
> --
> Michal Hocko
> SUSE Labs
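
For readers following the huge_mm.h hunk quoted above, here is a minimal
userspace sketch of the order in which the patched
__transparent_hugepage_enabled() appears to make its decision. It is
not kernel code. The names thp_query, thp_global_mode and
thp_enabled_model are hypothetical and exist only for this
illustration, and the model covers only the checks visible in the
quoted hunk, so any checks outside that context are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global transparent_hugepage/enabled mode. */
enum thp_global_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical snapshot of the state the quoted hunk consults. */
struct thp_query {
	bool mm_thp_disabled;    /* models MMF_DISABLE_THP on the mm */
	bool vma_is_dax;         /* models vma_is_dax(vma) */
	bool memcg_thp_disabled; /* models memory.transparent_hugepage_disabled */
	bool vma_hugepage;       /* models VM_HUGEPAGE set on the vma */
	enum thp_global_mode mode;
};

/* Mirrors the check order of the patched hunk, and nothing more. */
bool thp_enabled_model(const struct thp_query *q)
{
	if (q->mm_thp_disabled)
		return false;
	/* dax is tested before the memcg switch: "cannot use THP except dax" */
	if (q->vma_is_dax)
		return true;
	/* the proposed per-cgroup switch overrides "always" and "madvise" */
	if (q->memcg_thp_disabled)
		return false;
	if (q->mode == THP_ALWAYS)
		return true;
	if (q->mode == THP_MADVISE)
		return q->vma_hugepage;
	return false;
}
```

With mode set to THP_MADVISE, vma_hugepage set and memcg_thp_disabled
set, the model returns false. That is the case described in the reply:
a program that requests THP through madvise is still kept off THP by
the cgroup switch, while a dax mapping in the same cgroup is not.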