Date: Fri, 13 Aug 2021 00:08:46 +0300
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dave Hansen
Cc: Borislav Petkov, Andy Lutomirski, Sean Christopherson, Andrew Morton,
 Joerg Roedel, Andi Kleen, Kuppuswamy Sathyanarayanan, David Rientjes,
 Vlastimil Babka, Tom Lendacky, Thomas Gleixner, Peter Zijlstra,
 Paolo Bonzini, Ingo Molnar, Varad Gautam, Dario Faggioli, x86@kernel.org,
 linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org,
 "Kirill A. Shutemov"
Subject: Re: [PATCH 1/5] mm: Add support for unaccepted memory
Message-ID: <20210812210846.bfalflrvn4bfpyyh@box.shutemov.name>
References: <20210810062626.1012-1-kirill.shutemov@linux.intel.com>
 <20210810062626.1012-2-kirill.shutemov@linux.intel.com>

On Tue, Aug 10, 2021 at 01:50:57PM -0700, Dave Hansen wrote:
> On 8/10/21 11:13 AM, Dave Hansen wrote:
> >> @@ -1001,6 +1004,9 @@ static inline void del_page_from_free_list(struct page *page, struct zone *zone,
> >>  	if (page_reported(page))
> >>  		__ClearPageReported(page);
> >>  
> >> +	if (PageOffline(page))
> >> +		clear_page_offline(page, order);
> >> +
> >>  	list_del(&page->lru);
> >>  	__ClearPageBuddy(page);
> >>  	set_page_private(page, 0);
> > So, this is right in the fast path of the page allocator.  It's a
> > one-time thing per 2M page, so it's not permanent.
> >
> > *But* there's both a global spinlock and a firmware call hidden in
> > clear_page_offline().  That's *GOT* to hurt if you were, for instance,
> > running a benchmark while this code path is being tickled.  Not just to
> >
> > That could be just downright catastrophic for scalability, albeit
> > temporarily.
> 
> One more thing...
> 
> How long are these calls?  You have to make at least 512 calls into the
> SEAM module.  Assuming they're syscall-ish, so ~1,000 cycles each,
> that's ~500,000 cycles, even if we ignore the actual time it takes to
> zero that 2MB worth of memory and all other overhead within the SEAM
> module.

I hope to get away with two calls per 2M: one MapGPA and one
TDACCEPTPAGE (or three for MAX_ORDER -- 4M -- pages). I don't have any
numbers yet.

> So, we're sitting on one CPU with interrupts off, blocking all the other
> CPUs from doing page allocation in this zone.

I agree that's not good. Let's see if it's going to be okay with
accepting in 2M chunks.

> Then, we're holding a global lock which prevents any other NUMA nodes
> from accepting pages.

Looking at this again, the global lock is avoidable: the caller owns the
pfn range, so nobody else can touch these bits in the bitmap. We can
replace bitmap_clear() with an atomic clear_bit() loop and drop the lock
completely.
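Something along these lines (untested sketch; the bitmap name and the
bit-per-2M mapping are placeholders for whatever the real patch ends up
using):

	#include <linux/bitops.h>

	/* Placeholder: one bit per unaccepted 2M chunk. */
	extern unsigned long unaccepted_memory_bitmap[];

	/*
	 * Clear the "unaccepted" bits for a range the caller already
	 * owns. clear_bit() is atomic per bit, so no spinlock is
	 * needed: no two callers can race on the same bits, and
	 * readers only ever see the one-way unaccepted -> accepted
	 * transition.
	 */
	static void clear_unaccepted_range(unsigned long start_bit,
					   unsigned long end_bit)
	{
		unsigned long bit;

		for (bit = start_bit; bit < end_bit; bit++)
			clear_bit(bit, unaccepted_memory_bitmap);
	}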
> If the other node happens to *try* to do an
> accept, it will sit with its zone lock held waiting for this one.
> Maybe nobody will ever notice.  But, it seems like an awfully big risk
> to me.  I'd at least *try* to do these calls outside of the zone lock.
> Then the collateral damage will at least be limited to things doing
> accepts rather than all zone->lock users.
> 
> Couldn't we delay the acceptance to, say, the place where we've dropped
> the zone->lock and do the __GFP_ZERO memset() like at prep_new_page()?
> Or is there some concern that the page has been split at that point?

It *will* be split by that point. If you ask for an order-0 page and
there are none left, the page allocator will try higher orders until it
finds something. At order-9 it would hit an unaccepted page. At that
point the page gets split and put on the free lists accordingly. That
all happens under the zone lock:

	__rmqueue_smallest ->
		del_page_from_free_list()
		expand()

> I guess that makes it more complicated because you might have a 4k page
> but you need to go accept a 2M page.  You might end up having to check
> the bitmap 511 more times because you might see 511 more PageOffline()
> pages come through.
> 
> You shouldn't even need the bitmap lock to read since it's a one-way
> trip from unaccepted->accepted.

Yeah. Unless we want to flip it back when making the range shared. I
think we do. Otherwise it will cause problems for kexec.

-- 
 Kirill A. Shutemov