From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 35646C433EF for ; Thu, 30 Jun 2022 02:32:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C24498E0002; Wed, 29 Jun 2022 22:32:47 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BD38E8E0001; Wed, 29 Jun 2022 22:32:47 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AC2788E0002; Wed, 29 Jun 2022 22:32:47 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 9DF738E0001 for ; Wed, 29 Jun 2022 22:32:47 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 7126165C for ; Thu, 30 Jun 2022 02:32:47 +0000 (UTC) X-FDA: 79633329174.22.773AD46 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf14.hostedemail.com (Postfix) with ESMTP id 003C7100031 for ; Thu, 30 Jun 2022 02:32:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1656556366; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=d9VAuTb7CFGHMBX9juSTF+au9uAW7Xwp03NsaiwwIc8=; b=YDuQO1iD1BYkFUGQai1653zGKHf+93PMgpqEsCWxtAlWYIavGIvxCd2zeIQUsIWvfFWC5q 5KDd4hMy83tkz7pe47N1phUKx5u9gMrhJC000V/PsNrVzi8o7kJ5mC9XtlYnJpOJ9nQKp7 gxI92FerXg1i2uxD5+Q1fDWU2VzWv6I= Received: from mail-il1-f198.google.com (mail-il1-f198.google.com [209.85.166.198]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-599-u1svaNk3NZalvTXe1AGhQQ-1; Wed, 29 Jun 2022 22:32:44 -0400 X-MC-Unique: u1svaNk3NZalvTXe1AGhQQ-1 Received: by mail-il1-f198.google.com with SMTP id h5-20020a056e021b8500b002d8f50441a2so9917909ili.13 for ; Wed, 29 Jun 2022 19:32:44 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=d9VAuTb7CFGHMBX9juSTF+au9uAW7Xwp03NsaiwwIc8=; b=XwI44w3rd2orORZRoT/duiB/rXIIgevroBcnmNGmxL8EiFLvxxUXjZGQ+QaG47Ay9s 6L6Hj73qOCvRwGGDLNoubzpmkcbEldJht8/rFaTNbRqghz4dli5ix6zumjOZtkJaLQmq PTrtOoBvZ0TIu0ehpTahpt8sGraZBkTL5J+eLeH2i0W9cQ++H8e+Z+QHbVZMXqO3jlWS B/RGDp9RqRfakqYz2d+Yxj/mQyGG3qU39cOX5GjNSzEQesbtEsnowo6blHOKmDSivNhL ZaGvfHfIHFo9L8cUsso5GpC/t+qHaTwjJ17hEwlfzGjQwDw8TtUCjz/mabHBHmTAYtNu Yb3w== X-Gm-Message-State: AJIora8rR3sI0QNGJALU83a9cg0IMRkgMbaJp62u1+q+ut/ZJyc2h1vD IZ8cCdfsCHvhStAWc2cW6BZLq6RRUf8ATThYQCjYkS2GhGelrbQNhLYNpPS6x0diCh1wI7nIG6w JxMEhsHUlGao= X-Received: by 2002:a05:6602:1221:b0:675:4bfe:bb4 with SMTP id z1-20020a056602122100b006754bfe0bb4mr3254153iot.92.1656556363826; Wed, 29 Jun 2022 19:32:43 -0700 (PDT) X-Google-Smtp-Source: AGRyM1ultWMMAOXF6ULL+74tTX6Rz5C+7bWtZZKCNK6ZcwUb14dqMJO3wCxPoQDjBL5huWID+sk37w== X-Received: by 2002:a05:6602:1221:b0:675:4bfe:bb4 with SMTP id z1-20020a056602122100b006754bfe0bb4mr3254119iot.92.1656556363578; Wed, 29 Jun 2022 19:32:43 -0700 (PDT) Received: from xz-m1.local (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id m28-20020a02a15c000000b00339e5e105a2sm7931520jah.117.2022.06.29.19.32.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Jun 2022 19:32:42 -0700 (PDT) Date: Wed, 29 Jun 2022 22:32:39 -0400 From: Peter Xu To: Zach O'Keefe Cc: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , Rongwei Wang , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org, Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer Subject: Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled Message-ID: References: <20220604004004.954674-1-zokeefe@google.com> <20220604004004.954674-9-zokeefe@google.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1656556367; a=rsa-sha256; cv=none; b=XVkJlgNnoRqeSIY46AzJd0F+ygOg00g/iJyKhgFfRuLomqh0Ywo5AdaH9W5xKMIxBcwU99 d9XB7eIflRWG2cpaw3FbyOltN3htEt4iuJbVWy2UFWg5j1muGGH1s3eA0jQzZekWxLL58z 83dsavJmLcI+Fdn70YJqrqP4mQA+Nos= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=YDuQO1iD; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf14.hostedemail.com: domain of peterx@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1656556367; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=d9VAuTb7CFGHMBX9juSTF+au9uAW7Xwp03NsaiwwIc8=; b=fa7nDJO1fGXI0kqcnbGE7/R16NToiP0WIBlMnoNACdCEzTLTuedKFMcrtvahFVD4cWRBZW 0yzhicNNaznLaLCqismvY9tNpiU8s2PI7NTIPDMkr/aNIbyd4yVhB+04vlmy8FbfaA0q7h IjKn5Eb6m4uSGJWvIcaGgBNsw3QM9r0= X-Rspamd-Queue-Id: 003C7100031 X-Rspam-User: Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=YDuQO1iD; dmarc=pass (policy=none) header.from=redhat.com; spf=none (imf14.hostedemail.com: domain of peterx@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=peterx@redhat.com X-Rspamd-Server: rspam02 X-Stat-Signature: ct19semrcaut8g8onzqfriwpxbeqffnp X-HE-Tag: 1656556366-164315 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote: > On Jun 29 19:21, Peter Xu wrote: > > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote: > > > Add enforce_thp_enabled flag to struct collapse_control that allows context > > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled. > > > > > > This flag is set in khugepaged collapse context to preserve existing > > > khugepaged behavior. > > > > > > This flag will be used (unset) when introducing madvise collapse > > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled > > > to sysfs THP settings. Most notably, for the purpose of eventual > > > madvise_collapse(2) support, this allows userspace to trigger THP collapse > > > on behalf of another processes, without adding support to meddle with > > > the VMA flags of said process, or change sysfs THP settings. > > > > > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled, > > > but it can be expanded to include > > > /sys/kernel/transparent_hugepage/shmem_enabled later. > > > > > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/ > > > > > > Signed-off-by: Zach O'Keefe > > > --- > > > mm/khugepaged.c | 34 +++++++++++++++++++++++++++------- > > > 1 file changed, 27 insertions(+), 7 deletions(-) > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > index c3589b3e238d..4ad04f552347 100644 > > > --- a/mm/khugepaged.c > > > +++ b/mm/khugepaged.c > > > @@ -94,6 +94,11 @@ struct collapse_control { > > > */ > > > bool enforce_page_heuristics; > > > > > > + /* Enforce constraints of > > > + * /sys/kernel/mm/transparent_hugepage/enabled > > > + */ > > > + bool enforce_thp_enabled; > > > > Small nitpick that we could have merged the two booleans if they always > > match, but no strong opinions if you think these two are clearer. Or maybe > > there's other plan of using them? > > > > > + > > > /* Num pages scanned per node */ > > > int node_load[MAX_NUMNODES]; > > > > > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) > > > */ > > > > > > static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > - struct vm_area_struct **vmap) > > > + struct vm_area_struct **vmap, > > > + struct collapse_control *cc) > > > { > > > struct vm_area_struct *vma; > > > unsigned long hstart, hend; > > > + unsigned long vma_flags; > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > return SCAN_ANY_PROCESS; > > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > hend = vma->vm_end & HPAGE_PMD_MASK; > > > if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > return SCAN_ADDRESS_RANGE; > > > - if (!hugepage_vma_check(vma, vma->vm_flags)) > > > + > > > + /* > > > + * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that > > > + * hugepage_vma_check() can pass even if > > > + * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode). > > > + * Note that hugepage_vma_check() doesn't enforce that > > > + * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG > > > + * must be set (i.e. "never" mode). > > > + */ > > > + vma_flags = cc->enforce_thp_enabled ? vma->vm_flags > > > + : vma->vm_flags | VM_HUGEPAGE; > > > > Another nitpick.. > > > > We could get a weird vm_flags when VM_NOHUGEPAGE is set. I don't think > > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO > > we shouldn't rely on that as it seems error prone (e.g. when accidentally > > moved things around). > > > > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE? Or pass over > > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc. > > Passing in the boolean has one benefit that we don't really need the > > complicated comment above since the code should be able to explain itself. > > Hey Peter, thanks again for taking the time to review. > > Answering both of the above at the time: > > As in this series so far, I've tried to keep context functionally-declarative - > specifying the intended behavior (e.g. "enforce_page_heuristics") rather than > adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }" > around the code which, IMO, makes it difficult to follow. Unfortunately, I've > ran into the 2 problems you've stated here: > > 1) *Right now* all the behavior knobs are either off/on at the same time > 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central > authority on THP eligibility), things are complicated enough that I > couldn't find a clean way to describe the parameters of the context without > explicitly mentioning the caller. > > For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior, > I think we need to package these contexts into a single set of flags: > > enum thp_ctx_flags { > THP_CTX_ANON_FAULT = 1 << 1, > THP_CTX_KHUGEPAGED = 1 << 2, > THP_CTX_SMAPS = 1 << 3, > THP_CTX_MADVISE_COLLAPSE = 1 << 4, > }; > > That will avoid hacking vma flags passed to hugepage_vma_check(). > > And, if we have these anyways, I might as well do away with some of the > (semantically meaningful but functionally redundant) flags in > struct collapse_control and just specify a single .thp_ctx_flags member. I'm > not entirely happy with it - but that's what I'm planning. > > WDYT? Firstly I think I wrongly sent previous email privately.. :( Let me try to add the list back.. IMHO we don't need to worry too much on the "if... else if... else", because they shouldn't be more complicated than when you spread the meanings into multiple flags, or how could it be? :) IMHO it should literally be as simple as applying: s/enforce_{A|B|C|...}/khugepaged_initiated/g Throughout the patches, then we squash the patches introducing enforce_X. If you worry it's not clear on "what does khugepaged_initiated mean", we could add whatever comment above the variable explaining A/B/C/D will be covered when this is set, and we could postpone to do the flag split only until there're real user. Adding these flags could add unnecessary bit-and instructions into the code generated at last, and if it's only about readability issue that's really what comment is for? Thanks, -- Peter Xu