From: Dan Williams <dan.j.williams@intel.com>
Date: Wed, 28 Jul 2021 12:03:07 -0700
Subject: Re: [PATCH v3 12/14] device-dax: compound pagemap support
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Linux MM, Vishal Verma, Dave Jiang, Naoya Horiguchi, Matthew Wilcox,
 Jason Gunthorpe, John Hubbard, Jane Chu, Muchun Song, Mike Kravetz,
 Andrew Morton, Jonathan Corbet, Linux NVDIMM, Linux Doc Mailing List

On Wed, Jul 28, 2021 at 11:59 AM Joao Martins wrote:
>
> On 7/28/21 7:51 PM, Dan Williams wrote:
> > On Wed, Jul 28, 2021 at 2:36 AM Joao Martins wrote:
> >>
> >> On 7/28/21 12:51 AM, Dan Williams wrote:
> >>> On Thu, Jul 15, 2021 at 5:01 AM Joao Martins wrote:
> >>>> On 7/15/21 12:36 AM, Dan Williams wrote:
> >>>>> On Wed, Jul 14, 2021 at 12:36 PM Joao Martins wrote:
> >>>>
> >>>> This patch is not the culprit; the flaw is earlier in the series,
> >>>> specifically in the fourth patch.
> >>>>
> >>>> The fourth patch needs the change below because of the elevated page
> >>>> refcount at zone device memmap init. put_page() is called here in
> >>>> memunmap_pages():
> >>>>
> >>>>         for (i = 0; i < pgmap->nr_ranges; i++)
> >>>>                 for_each_device_pfn(pfn, pgmap, i)
> >>>>                         put_page(pfn_to_page(pfn));
> >>>>
> >>>> ... and on a zone_device compound memmap it would otherwise always
> >>>> decrease the head page refcount by the @geometry pfn amount (leading
> >>>> to the aforementioned splat you reported).
> >>>>
> >>>> diff --git a/mm/memremap.c b/mm/memremap.c
> >>>> index b0e7b8cf3047..79a883af788e 100644
> >>>> --- a/mm/memremap.c
> >>>> +++ b/mm/memremap.c
> >>>> @@ -102,15 +102,15 @@ static unsigned long pfn_end(struct dev_pagemap *pgmap, int range_id)
> >>>>         return (range->start + range_len(range)) >> PAGE_SHIFT;
> >>>>  }
> >>>>
> >>>> -static unsigned long pfn_next(unsigned long pfn)
> >>>> +static unsigned long pfn_next(struct dev_pagemap *pgmap, unsigned long pfn)
> >>>>  {
> >>>>         if (pfn % 1024 == 0)
> >>>>                 cond_resched();
> >>>> -       return pfn + 1;
> >>>> +       return pfn + pgmap_pfn_geometry(pgmap);
> >>>
> >>> The cond_resched() would need to be fixed up too, to something like:
> >>>
> >>>         if (pfn % (1024 << pgmap_geometry_order(pgmap)) == 0)
> >>>                 cond_resched();
> >>>
> >>> ...because the goal is to take a break every 1024 iterations, not
> >>> every 1024 pfns.
> >>>
> >>
> >> Ah, good point.
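For reference, folding your pfn_next() change together with that
cond_resched() adjustment would give something like the sketch below.
Take it as a sketch only: pgmap_pfn_geometry() and
pgmap_geometry_order() are the helpers this series introduces, so the
final names/signatures may differ.

	/*
	 * Sketch: iterate the device pfns one compound page at a time,
	 * yielding the CPU every 1024 iterations rather than every
	 * 1024 pfns.
	 */
	static unsigned long pfn_next(struct dev_pagemap *pgmap,
				      unsigned long pfn)
	{
		/* 1024 compound pages == (1024 << order) pfns */
		if (pfn % (1024 << pgmap_geometry_order(pgmap)) == 0)
			cond_resched();
		/* step over the head page and all of its tails */
		return pfn + pgmap_pfn_geometry(pgmap);
	}

	#define for_each_device_pfn(pfn, map, i) \
		for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); \
		     pfn = pfn_next(map, pfn))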
> >>
> >>>>  }
> >>>>
> >>>>  #define for_each_device_pfn(pfn, map, i) \
> >>>> -       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(pfn))
> >>>> +       for (pfn = pfn_first(map, i); pfn < pfn_end(map, i); pfn = pfn_next(map, pfn))
> >>>>
> >>>>  static void dev_pagemap_kill(struct dev_pagemap *pgmap)
> >>>>  {
> >>>>
> >>>> It could also get the hunk below, but that one is somewhat redundant
> >>>> provided we never touch the tail page refcount throughout the devmap
> >>>> pages' lifetime. This setting of the tail page refcount to zero was
> >>>> in the pre-v5.14 series, but it got removed under the assumption that
> >>>> the pages come from the page allocator (where tail pages already have
> >>>> a zero refcount).
> >>>
> >>> Wait, devmap pages never see the page allocator?
> >>>
> >> "where tail pages are already zeroed in refcount" actually meant
> >> 'freshly allocated pages', and I was referring to commit 7118fc2906e2
> >> ("hugetlb: address ref count racing in prep_compound_gigantic_page"),
> >> which removed set_page_count() because setting the page refcount to
> >> zero was redundant.
> >
> > Ah, maybe include that reference in the changelog?
> >
> Yeap, will do.
>
> >>
> >> Albeit devmap pages don't come from the page allocator -- as you know,
> >> they live in a separate zone, and these pages aren't part of the
> >> regular page pools (e.g. accessible via alloc_pages()). Unless, of
> >> course, we reassign them via dax_kmem, but then the way we map the
> >> struct pages would be regular, without any devmap stuff.
> >
> > Got it. I think the back reference to that commit (7118fc2906e2)
> > resolves my confusion.
> >
> >>
> >>>>
> >>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>> index 96975edac0a8..469a7aa5cf38 100644
> >>>> --- a/mm/page_alloc.c
> >>>> +++ b/mm/page_alloc.c
> >>>> @@ -6623,6 +6623,7 @@ static void __ref memmap_init_compound(struct page *page, unsigned long pfn,
> >>>>                 __init_zone_device_page(page + i, pfn + i, zone_idx,
> >>>>                                         nid, pgmap);
> >>>>                 prep_compound_tail(page, i);
> >>>> +               set_page_count(page + i, 0);
> >>>
> >>> Looks good to me, and perhaps add a check for an elevated tail page
> >>> refcount at teardown, as a sanity check that the tail pages were
> >>> never pinned directly?
> >>>
> >> Sorry, I didn't completely follow.
> >>
> >> Did you mean to set the tail page refcount back to 1 at teardown if it
> >> was kept at 0 (e.g. in memunmap_pages() after put_page()), or to check
> >> that the refcount is indeed kept at zero after the put_page() in
> >> memunmap_pages()?
> >
> > The latter, i.e. would it be worth it to check that a tail page did
> > not get accidentally pinned instead of a head page? I'm also ok to
> > leave out that sanity checking for now.
> >
> What makes me not worry too much about the sanity checking is that this
> put_page() is supposed to disappear here:
>
> https://lore.kernel.org/linux-mm/20210717192135.9030-3-alex.sierra@amd.com/
>
> ... and, in fact, none of the hunks here:
>
> https://lore.kernel.org/linux-mm/f7217b61-c845-eaed-501e-c9e7067a6b87@oracle.com/
>
> ... would matter, as there would no longer exist an elevated page
> refcount to deal with.

Ah, good point. It's past time to take care of that... if only that
patch kit had been Cc'd to the DAX maintainer...
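For completeness, if the sanity check does go in before that lands, a
minimal sketch of what I had in mind is below. It assumes the
pgmap_pfn_geometry() helper from this series and that tail refcounts
stay at zero for the devmap lifetime; the placement inside
memunmap_pages() is hypothetical.

	/*
	 * Sketch only: after dropping the head reference, warn if any
	 * tail page was pinned directly. Moot once the elevated devmap
	 * refcount (and this put_page() loop) goes away entirely.
	 */
	for (i = 0; i < pgmap->nr_ranges; i++)
		for_each_device_pfn(pfn, pgmap, i) {
			struct page *page = pfn_to_page(pfn);
			unsigned long j;

			put_page(page);
			/* tail pages must have stayed at refcount zero */
			for (j = 1; j < pgmap_pfn_geometry(pgmap); j++)
				VM_WARN_ON_ONCE(page_ref_count(page + j));
		}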