Date: Thu, 7 Oct 2021 12:15:35 +0300
From: "Kirill A. Shutemov"
To: Kent Overstreet, Vlastimil Babka
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, hannes@cmpxchg.org, willy@infradead.org, rientjes@google.com
Subject: Re: Compaction & folios
Message-ID: <20211007091535.7ocsvylljmfva2fy@box.shutemov.name>

On Wed, Oct 06, 2021 at 06:53:41PM -0400, Kent Overstreet wrote:
> So I have some observations on memory compaction & hugepages.
>
> Right now, the working assumption in MM is that compaction is hard and
> expensive, and right now it is - because most allocations are order 0, with a
> small subset being hugepage order allocations. This means any time we need a
> hugepage, compaction has to move a bunch of order 0 pages around, and memory
> reclaim is no help here - when we reclaim memory, it's coming back as fragmented
> order 0 pages.
>
> But what if compaction wasn't such a difficult, expensive operation?
>
> With folios, and then folios for anonymous pages, we won't see nearly so many
> order 0 allocations anymore - we'll see a spread of allocation sizes based on a
> mixture of application usage patterns - something much closer to a poisson
> distribution, vs. our current very bimodal distribution.
> And since we won't be
> fragmenting all our allocations up front, memory reclaim will be freeing
> allocations in this same distribution.
>
> Which means that any time an order n allocation fails, it's likely that we'll
> still have order n-1 pages free - and of those free order n-1 pages, one will
> likely have a buddy that's moveable and hasn't been fragmented - meaning the
> common case is that compaction will have to move _one_ (higher order) page -
> we'll almost never be having to move a bunch of 4k pages.
>
> Another way of thinking of this is that memory reclaim will be doing most of the
> work that compaction has to do now to allocate a high order page. Compaction
> will go from an expensive, somewhat unreliable operation to one that mostly just
> works - it's going to be _much_ less of a pain point.
>
> It may turn out that allocating hugepages still doesn't work as reliably as we'd
> like - but folios are still a big help even when we can't allocate a 2MB page,
> because we'll be able to fall back to an order 6 or 7 or 8 allocation, which is
> something we can't do now. And, since multiple CPU vendors now support
> coalescing contiguous PTE entries in the TLB, this will still get us most of the
> performance benefits of using hugepages.

At the moment, compaction is built on the assumption that compound pages are
PMD-mappable or larger, so it doesn't make sense to move them:

		/*
		 * Regardless of being on LRU, compound pages such as THP and
		 * hugetlbfs are not to be compacted unless we are attempting
		 * an allocation much larger than the huge page size (eg CMA).
		 * We can potentially save a lot of iterations if we skip them
		 * at once. The check is racy, but we can consider only valid
		 * values and the only danger is skipping too much.
		 */
		if (PageCompound(page) && !cc->alloc_contig) {
			const unsigned int order = compound_order(page);

			if (likely(order < MAX_ORDER))
				low_pfn += (1UL << order) - 1;
			goto isolate_fail;
		}

The same check will apply to folios after a direct conversion.
This check has to be reworked sooner rather than later if we want to be more
flexible about folio sizes; otherwise we risk making the compaction situation
worse.

-- 
 Kirill A. Shutemov