Date: Thu, 28 Apr 2022 11:52:21 -0400
From: Peter Xu
To: Zach O'Keefe
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox, Michal Hocko,
	Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Yang Shi,
	Zi Yan, linux-mm@kvack.org, Andrea Arcangeli, Andrew Morton,
	Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel,
	Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley",
	Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov,
	Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov,
	Thomas Bogendoerfer, kernel test robot
Subject: Re: [PATCH v3 03/12] mm/khugepaged: make hugepage allocation context-specific
References: <20220426144412.742113-1-zokeefe@google.com>
	<20220426144412.742113-4-zokeefe@google.com>

On Thu, Apr 28, 2022 at 08:37:06AM -0700, Zach O'Keefe wrote:
> On Thu, Apr 28, 2022 at 7:51 AM Peter Xu wrote:
> >
> > On Wed, Apr 27, 2022 at 05:51:53PM -0700, Zach O'Keefe wrote:
> > > > > @@ -1110,13 +1115,14 @@ static void collapse_huge_page(struct mm_struct *mm,
> > > > >  	 */
> > > > >  	mmap_read_unlock(mm);
> > > > >
> > > > > +	node = khugepaged_find_target_node(cc);
> > > > >  	/* sched to specified node before huge page memory copy */
> > > > >  	if (task_node(current) != node) {
> > > > >  		cpumask = cpumask_of_node(node);
> > > > >  		if (!cpumask_empty(cpumask))
> > > > >  			set_cpus_allowed_ptr(current, cpumask);
> > > > >  	}
> > > > > -	new_page = khugepaged_alloc_page(hpage, gfp, node);
> > > > > +	new_page = cc->alloc_hpage(cc, gfp, node);
> > > >
> > > > AFAICT you removed all references of khugepaged_alloc_page() in this patch,
> > > > so you'd better drop the function for both NUMA and UMA.
> > > >
> > >
> > > Sorry, I'm not sure I follow. In khugepaged context, the logic WRT
> > > khugepaged_alloc_page() is unchanged - it's just called indirectly
> > > through ->alloc_hpage().
> >
> > Ah you're right, sorry for the confusion.
> >
> > > > That said, I think it's actually better to keep them, as they do useful
> > > > things. For example, AFAICT this ->alloc_hpage() change can already leak
> > > > the hpage allocated for UMA, which is why I think keeping them makes
> > > > sense; IIUC the BUG_ON would already trigger with UMA.
> > > >
> > > > I saw that you've moved khugepaged_find_target_node() into this function,
> > > > which looks definitely good, but if we could keep khugepaged_alloc_page()
> > > > then IMHO we could even move khugepaged_find_target_node() into that
> > > > and drop the "node" parameter in khugepaged_alloc_page().
> > > >
> > >
> > > I actually had done this, until commit 4272063db38c ("mm/khugepaged:
> > > sched to numa node when collapse huge page") forced me to keep
> > > "node" visible in this function.
> >
> > Right, I was looking at my local tree and that patch seems to have been
> > added to -next only very recently.
> >
> > I'm curious why it wasn't applied to file THPs too if it is worthwhile,
> > since if so that's also a suitable candidate to be moved into the same
> > hook. I've asked in that thread instead.
> >
> > Before that, feel free to keep your code as is.
> >
>
> Thanks for checking on that. On second thought, it seems like we would
> actually want to move this sched logic into the khugepaged hook impl,
> since we probably don't want to be moving around the calling process
> when in madvise context.

Sounds correct: if it's moved over, it must only go into the new helper (if
you're going to introduce it, like below :) that is used only by khugepaged.

Actually I'm really starting to question that idea more. E.g., even for
khugepaged the target node can change rapidly depending on the owner of the
pages planned for collapse. Whether it makes sense to bounce the khugepaged
thread itself still seems questionable to me, and it's definitely not a good
idea for the madvise() context. The only evidence in that patch was a
testbench result, and I'm not sure how reliable that will be in real-world
scenarios.

> > > >
> > > > I'd even consider moving cc->gfp() all into it if possible, since the gfp
> > > > seems to always come with __GFP_THISNODE anyway.
> > > >
> > >
> > > I would have liked to do this, but the gfp flags are referenced again
> > > when calling mem_cgroup_charge(), so I couldn't quite get rid of them
> > > from here.
> >
> > Yeah, maybe we could move mem_cgroup_charge() into the hook too? The code
> > below is duplicated between file & anon, so IMHO it's already a good
> > candidate for a new helper anyway:
> >
> > 	/* Only allocate from the target node */
> > 	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
> >
> > 	new_page = khugepaged_alloc_page(hpage, gfp, node);
> > 	if (!new_page) {
> > 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
> > 		goto out;
> > 	}
> >
> > 	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
> > 		result = SCAN_CGROUP_CHARGE_FAIL;
> > 		goto out;
> > 	}
> > 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
> >
> > If we want to generalize it, maybe we want to return the "result" instead of
> > the new page, so the new page can be passed back via a pointer.
> >
> > There's one mmap_read_unlock(mm) in the anonymous code, but I think that
> > can simply be moved above the chunk.
> >
> > No strong opinion here, please feel free to think about what's the best
> > approach for landing this series.
> >
>
> Thanks for the suggestion. I'll play around with it a little and see
> what looks good.

Sounds good!  If the new helper is welcomed then it can be a standalone
prerequisite patch to clean things up first.  Let's also see how it goes with
the other patch, because there's a chance you can also clean up
khugepaged_find_target_node() in the same patch along with all of the above.

-- 
Peter Xu
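For concreteness, a minimal sketch of the helper discussed above might look
like the following. This is purely illustrative: the name alloc_charge_hpage,
its exact signature, and the choice to call khugepaged_alloc_page() directly
(rather than the series' cc->alloc_hpage() hook) are assumptions for the sake
of the example, not code from the posted series. It relies on the existing
mm/khugepaged.c internals already quoted in the thread (the SCAN_* result
codes, alloc_hugepage_khugepaged_gfpmask(), khugepaged_find_target_node()).

	/*
	 * Illustrative sketch only: allocate and memcg-charge a huge page
	 * for collapse, returning a SCAN_* result and handing the page back
	 * through *hpage (khugepaged_alloc_page() stores it there).
	 */
	static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
				      struct collapse_control *cc)
	{
		/* Only allocate from the target node */
		gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
		int node = khugepaged_find_target_node(cc);
		struct page *new_page;

		new_page = khugepaged_alloc_page(hpage, gfp, node);
		if (!new_page)
			return SCAN_ALLOC_HUGE_PAGE_FAIL;

		if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp)))
			return SCAN_CGROUP_CHARGE_FAIL;

		count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
		return SCAN_SUCCEED;
	}

Both the anon and file collapse paths could then replace the duplicated chunk
with something along these lines, with the anonymous path doing its
mmap_read_unlock(mm) just before the call as suggested in the thread:

	result = alloc_charge_hpage(hpage, mm, cc);
	if (result != SCAN_SUCCEED)
		goto out;
	new_page = *hpage;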