From: James Houghton
Date: Thu, 30 Jun 2022 09:23:49 -0700
Subject: Re: [RFC PATCH 04/26] hugetlb: make huge_pte_lockptr take an explicit shift argument.
To: Muchun Song
Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes, Axel Rasmussen,
    Mina Almasry, Jue Wang, Manish Mishra, "Dr. David Alan Gilbert",
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20220624173656.2033256-1-jthoughton@google.com> <20220624173656.2033256-5-jthoughton@google.com>

David Alan Gilbert" , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jun 30, 2022 at 2:35 AM Muchun Song wrote: > > On Wed, Jun 29, 2022 at 03:24:45PM -0700, Mike Kravetz wrote: > > On 06/29/22 14:39, James Houghton wrote: > > > On Wed, Jun 29, 2022 at 2:04 PM Mike Kravetz wrote: > > > > > > > > On 06/29/22 14:09, Muchun Song wrote: > > > > > On Mon, Jun 27, 2022 at 01:51:53PM -0700, Mike Kravetz wrote: > > > > > > On 06/24/22 17:36, James Houghton wrote: > > > > > > > This is needed to handle PTL locking with high-granularity mapping. We > > > > > > > won't always be using the PMD-level PTL even if we're using the 2M > > > > > > > hugepage hstate. It's possible that we're dealing with 4K PTEs, in which > > > > > > > case, we need to lock the PTL for the 4K PTE. > > > > > > > > > > > > I'm not really sure why this would be required. > > > > > > Why not use the PMD level lock for 4K PTEs? Seems that would scale better > > > > > > with less contention than using the more coarse mm lock. > > > > > > > > > > > > > > > > Your words make me thing of another question unrelated to this patch. > > > > > We __know__ that arm64 supports continues PTE HugeTLB. huge_pte_lockptr() > > > > > did not consider this case, in this case, those HugeTLB pages are contended > > > > > with mm lock. Seems we should optimize this case. Something like: > > > > > > > > > > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h > > > > > index 0d790fa3f297..68a1e071bfc0 100644 > > > > > --- a/include/linux/hugetlb.h > > > > > +++ b/include/linux/hugetlb.h > > > > > @@ -893,7 +893,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) > > > > > static inline spinlock_t *huge_pte_lockptr(struct hstate *h, > > > > > struct mm_struct *mm, pte_t *pte) > > > > > { > > > > > - if (huge_page_size(h) == PMD_SIZE) > > > > > + if (huge_page_size(h) <= PMD_SIZE) > > > > > return pmd_lockptr(mm, (pmd_t *) pte); > > > > > VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); > > > > > return &mm->page_table_lock; > > > > > > > > > > I did not check if elsewhere needs to be changed as well. Just a primary > > > > > thought. > > > > > > I'm not sure if this works. If hugetlb_pte_size(hpte) is PAGE_SIZE, > > > then `hpte.ptep` will be a pte_t, not a pmd_t -- I assume that breaks > > > things. So I think, when doing a HugeTLB PT walk down to PAGE_SIZE, we > > > need to separately keep track of the location of the PMD so that we > > > can use it to get the PMD lock. > > > > I assume Muchun was talking about changing this in current code (before > > your changes) where huge_page_size(h) can not be PAGE_SIZE. > > > > Yes, that's what I meant. Right -- but I think my point still stands. If `huge_page_size(h)` is CONT_PTE_SIZE, then the `pte_t *` passed to `huge_pte_lockptr` will *actually* point to a `pte_t` and not a `pmd_t` (I'm pretty sure the distinction is important). So it seems like we need to separately keep track of the real pmd_t that is being used in the CONT_PTE_SIZE case (and therefore, when considering HGM, the PAGE_SIZE case). However, we *can* make this optimization for CONT_PMD_SIZE (maybe this is what you originally meant, Muchun?), so instead of `huge_page_size(h) == PMD_SIZE`, we could do `huge_page_size(h) >= PMD_SIZE && huge_page_size(h) < PUD_SIZE`. > > Thanks.