From: Dan Williams
Date: Wed, 5 May 2021 15:20:55 -0700
Subject: Re: [PATCH v1 04/11] mm/memremap: add ZONE_DEVICE support for compound pages
To: Joao Martins
Cc: Linux MM, Ira Weiny, linux-nvdimm, Matthew Wilcox, Jason Gunthorpe,
 Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton
References: <20210325230938.30752-1-joao.m.martins@oracle.com>
 <20210325230938.30752-5-joao.m.martins@oracle.com>

On Wed, May 5, 2021 at 12:50 PM Joao Martins wrote:
>
> On 5/5/21 7:44 PM, Dan Williams wrote:
> > On Thu, Mar 25, 2021 at 4:10 PM Joao Martins wrote:
> >>
> >> Add a new align property for struct dev_pagemap which specifies that a
> >> pagemap is composed of a set of compound pages of size @align, instead
> >> of base pages. When these pages are initialised, most are initialised as
> >> tail pages instead of order-0 pages.
> >>
> >> For certain ZONE_DEVICE users like device-dax which have a fixed page
> >> size, this creates an opportunity to optimize GUP and GUP-fast walkers,
> >> treating it the same way as THP or hugetlb pages.
> >>
> >> Signed-off-by: Joao Martins
> >> ---
> >>  include/linux/memremap.h | 13 +++++++++++++
> >>  mm/memremap.c            |  8 ++++++--
> >>  mm/page_alloc.c          | 24 +++++++++++++++++++++++-
> >>  3 files changed, 42 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
> >> index b46f63dcaed3..bb28d82dda5e 100644
> >> --- a/include/linux/memremap.h
> >> +++ b/include/linux/memremap.h
> >> @@ -114,6 +114,7 @@ struct dev_pagemap {
> >>         struct completion done;
> >>         enum memory_type type;
> >>         unsigned int flags;
> >> +       unsigned long align;
> >
> > I think this wants some kernel-doc above to indicate that non-zero
> > means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
> > means "use non-compound base pages".
>
> Got it. Are you thinking a kernel-doc on top of the variable above, or
> preferably on top of pgmap_align()?

I was thinking in dev_pagemap, because this value is more than just a
plain alignment: it restructures the layout and construction of the
memmap(). But when I say it that way it sounds much less like a vanilla
align value and much more like a construction / geometry mode setting.

> > The non-zero value must be
> > PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE. Hmm, maybe it should be an
> > enum:
> >
> > enum devmap_geometry {
> >     DEVMAP_PTE,
> >     DEVMAP_PMD,
> >     DEVMAP_PUD,
> > }
> >
> I suppose a converter between devmap_geometry and page_size would be
> needed too?
> And maybe the whole dax/nvdimm align values change meanwhile (as a
> followup improvement)?

I think it is ok for dax/nvdimm to continue to maintain their align
value because it should be ok to have a 4MB align if the device really
wanted. However, when it goes to map that alignment with
memremap_pages() it can pick a mode. For example, it's already the case
that dax->align == 1GB is mapped with DEVMAP_PTE today, so they're
already separate concepts that can stay separate.

>
> Although to be fair we only ever care about compound page size in this
> series (and similarly dax/nvdimm @align properties).
>
> > ...because it's more than just an alignment, it's a structural
> > definition of how the memmap is laid out.
> >
> >>         const struct dev_pagemap_ops *ops;
> >>         void *owner;
> >>         int nr_range;
> >> @@ -130,6 +131,18 @@ static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap)
> >>         return NULL;
> >>  }
> >>
> >> +static inline unsigned long pgmap_align(struct dev_pagemap *pgmap)
> >> +{
> >> +       if (!pgmap || !pgmap->align)
> >> +               return PAGE_SIZE;
> >> +       return pgmap->align;
> >> +}
> >> +
> >> +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
> >> +{
> >> +       return PHYS_PFN(pgmap_align(pgmap));
> >> +}
> >> +
> >>  #ifdef CONFIG_ZONE_DEVICE
> >>  bool pfn_zone_device_reserved(unsigned long pfn);
> >>  void *memremap_pages(struct dev_pagemap *pgmap, int nid);
> >> diff --git a/mm/memremap.c b/mm/memremap.c
> >> index 805d761740c4..d160853670c4 100644
> >> --- a/mm/memremap.c
> >> +++ b/mm/memremap.c
> >> @@ -318,8 +318,12 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> >>         memmap_init_zone_device(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
> >>                                 PHYS_PFN(range->start),
> >>                                 PHYS_PFN(range_len(range)), pgmap);
> >> -       percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> >> -                       - pfn_first(pgmap, range_id));
> >> +       if (pgmap_align(pgmap) > PAGE_SIZE)
> >> +               percpu_ref_get_many(pgmap->ref, (pfn_end(pgmap, range_id)
> >> +                       - pfn_first(pgmap, range_id)) / pgmap_pfn_align(pgmap));
> >> +       else
> >> +               percpu_ref_get_many(pgmap->ref, pfn_end(pgmap, range_id)
> >> +                       - pfn_first(pgmap, range_id));
> >>         return 0;
> >>
> >>  err_add_memory:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 58974067bbd4..3a77f9e43f3a 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6285,6 +6285,8 @@ void __ref memmap_init_zone_device(struct zone *zone,
> >>         unsigned long pfn, end_pfn = start_pfn + nr_pages;
> >>         struct pglist_data *pgdat = zone->zone_pgdat;
> >>         struct vmem_altmap *altmap = pgmap_altmap(pgmap);
> >> +       unsigned int pfn_align = pgmap_pfn_align(pgmap);
> >> +       unsigned int order_align = order_base_2(pfn_align);
> >>         unsigned long zone_idx = zone_idx(zone);
> >>         unsigned long start = jiffies;
> >>         int nid = pgdat->node_id;
> >> @@ -6302,10 +6304,30 @@ void __ref memmap_init_zone_device(struct zone *zone,
> >>                 nr_pages = end_pfn - start_pfn;
> >>         }
> >>
> >> -       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >> +       for (pfn = start_pfn; pfn < end_pfn; pfn += pfn_align) {
> >
> > pfn_align is in bytes and pfn is in pages... is there a "pfn_align >>=
> > PAGE_SHIFT" I missed somewhere?
> >
> @pfn_align is in pages too. It's pgmap_align() which is in bytes:
>
> +static inline unsigned long pgmap_pfn_align(struct dev_pagemap *pgmap)
> +{
> +       return PHYS_PFN(pgmap_align(pgmap));
> +}

Ah yup, my eyes glazed over that. I think this is another place that
benefits from a more specific name than "align".
"pfns_per_compound" "compound_pfns"?