From: Dan Williams
Date: Tue, 1 Jun 2021 17:36:48 -0700
Subject: Re: [PATCH v1 10/11] device-dax: compound pagemap support
In-Reply-To: <20210325230938.30752-11-joao.m.martins@oracle.com>
To: Joao Martins
Cc: Linux MM, Ira Weiny, linux-nvdimm, Matthew Wilcox,
    Jason Gunthorpe, Jane Chu, Muchun Song, Mike Kravetz, Andrew Morton

On Thu, Mar 25, 2021 at 4:10 PM Joao Martins wrote:
>
> dax devices are created with a fixed @align (huge page size) which
> is enforced through as well at mmap() of the device. Faults,
> consequently happen too at the specified @align specified at the
> creation, and those don't change through out dax device lifetime.
> MCEs poisons a whole dax huge page, as well as splits occurring at
> at the configured page size.

This paragraph last...

>
> Use the newly added compound pagemap facility which maps the
> assigned dax ranges as compound pages at a page size of @align.
> Currently, this means, that region/namespace bootstrap would take
> considerably less, given that you would initialize considerably less
> pages.

This paragraph should go first...

>
> On setups with 128G NVDIMMs the initialization with DRAM stored struct pages
> improves from ~268-358 ms to ~78-100 ms with 2M pages, and to less than
> a 1msec with 1G pages.

This paragraph second...
The reason for this ordering is to give increasingly more detail as the
changelog is read, so that people who don't care about the details get
the main theme immediately, and others who wonder why device-dax is able
to support this can read deeper.

>
> Signed-off-by: Joao Martins
> ---
>  drivers/dax/device.c | 58 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index db92573c94e8..e3dcc4ad1727 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -192,6 +192,43 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>  }
>  #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>
> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
> +			     unsigned long fault_size,
> +			     struct address_space *f_mapping)
> +{
> +	unsigned long i;
> +	pgoff_t pgoff;
> +
> +	pgoff = linear_page_index(vmf->vma, vmf->address
> +			& ~(fault_size - 1));

I know you are just copying this style from whoever wrote it this way
originally, but that person (me) was wrong; this should be:

	pgoff = linear_page_index(vmf->vma, ALIGN_DOWN(vmf->address, fault_size));

...you might do a lead-in cleanup patch before this one.

> +
> +	for (i = 0; i < fault_size / PAGE_SIZE; i++) {
> +		struct page *page;
> +
> +		page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
> +		if (page->mapping)
> +			continue;
> +		page->mapping = f_mapping;
> +		page->index = pgoff + i;
> +	}
> +}
> +
> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
> +				 unsigned long fault_size,
> +				 struct address_space *f_mapping)
> +{
> +	struct page *head;
> +
> +	head = pfn_to_page(pfn_t_to_pfn(pfn));
> +	head = compound_head(head);
> +	if (head->mapping)
> +		return;
> +
> +	head->mapping = f_mapping;
> +	head->index = linear_page_index(vmf->vma, vmf->address
> +			& ~(fault_size - 1));
> +}
> +
>  static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>  		enum page_entry_size pe_size)
>  {
> @@ -225,8 +262,7 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>  	}
>
>  	if (rc == VM_FAULT_NOPAGE) {
> -		unsigned long i;
> -		pgoff_t pgoff;
> +		struct dev_pagemap *pgmap = pfn_t_to_page(pfn)->pgmap;

The device should already know its pagemap...

There is a distinction in dev_dax_probe() for "static" vs "dynamic"
pgmap, but once the pgmap is allocated it should be fine to assign it
back to dev_dax->pgmap in the "dynamic" case. That could be a lead-in
patch to make dev_dax->pgmap always valid.

>
>  		/*
>  		 * In the device-dax case the only possibility for a
> @@ -234,17 +270,10 @@ static vm_fault_t dev_dax_huge_fault(struct vm_fault *vmf,
>  		 * mapped. No need to consider the zero page, or racing
>  		 * conflicting mappings.
>  		 */
> -		pgoff = linear_page_index(vmf->vma, vmf->address
> -				& ~(fault_size - 1));
> -		for (i = 0; i < fault_size / PAGE_SIZE; i++) {
> -			struct page *page;
> -
> -			page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
> -			if (page->mapping)
> -				continue;
> -			page->mapping = filp->f_mapping;
> -			page->index = pgoff + i;
> -		}
> +		if (pgmap->align > PAGE_SIZE)
> +			set_compound_mapping(vmf, pfn, fault_size, filp->f_mapping);
> +		else
> +			set_page_mapping(vmf, pfn, fault_size, filp->f_mapping);
>  	}
>  	dax_read_unlock(id);
>
> @@ -426,6 +455,9 @@ int dev_dax_probe(struct dev_dax *dev_dax)
>  	}
>
>  	pgmap->type = MEMORY_DEVICE_GENERIC;
> +	if (dev_dax->align > PAGE_SIZE)
> +		pgmap->align = dev_dax->align;

This just needs updating for whatever renames you do for the "compound
geometry" terminology, rather than relying on subtle side effects of
"align". Other than that, looks good to me.