From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <8304d1e2-2848-858d-e25b-5eeef8606754@arm.com>
Date: Mon, 10 Jul 2023 11:07:47 +0100
Subject: Re: [PATCH v2 0/5] variable-order, large folios for anonymous memory
From: Ryan Roberts
To: David Hildenbrand, Matthew Wilcox
Cc: Andrew Morton, "Kirill A. Shutemov", Yin Fengwei, Yu Zhao,
 Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
 <78159ed0-a233-9afb-712f-2df1a4858b22@redhat.com>
 <4d4c45a2-0037-71de-b182-f516fee07e67@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 07/07/2023 14:24, David Hildenbrand wrote:
> On 07.07.23 15:12, Matthew Wilcox wrote:
>> On Fri, Jul 07, 2023 at 01:40:53PM +0200, David Hildenbrand wrote:
>>> On 06.07.23 10:02, Ryan Roberts wrote:
>>> But can you comment on the page migration part (IOW did you try it already)?
>>>
>>> For example, memory hotunplug, CMA, MCE handling, compaction all rely on
>>> page migration of something that was allocated using GFP_MOVABLE to actually
>>> work.
>>>
>>> Compaction seems to skip any higher-order folios, but the question is if the
>>> underlying migration itself works.
>>>
>>> If it already works: great! If not, this really has to be tackled early,
>>> because otherwise we'll be breaking the GFP_MOVABLE semantics.
>>
>> I have looked at this a bit.  _Migration_ should be fine.  _Compaction_
>> is not.
>
> Thanks! Very nice if at least ordinary migration works.

That's good to hear - I hadn't personally investigated.

>
>>
>> If you look at a function like folio_migrate_mapping(), it all seems
>> appropriately folio-ised.  There might be something in there that is
>> slightly wrong, but that would just be a bug to fix, not a huge
>> architectural problem.
>>
>> The problem comes in the callers of migrate_pages().  They pass a
>> new_folio_t callback.  alloc_migration_target() is the usual one passed
>> and as far as I can tell is fine.  I've seen no problems reported with it.
>>
>> compaction_alloc() is a disaster, and I don't know how to fix it.
>> The compaction code has its own allocator which is populated with order-0
>> folios.  How it populates that freelist is awful ... see split_map_pages()

I think this compaction issue also affects large folios in the page cache? So
really it is a pre-existing bug in the code base that needs to be fixed
independently of large anon folios? Should I assume you are tackling this, Matthew?

>
> Yeah, all that code was written under the assumption that we're moving order-0
> pages (which is what the anon+pagecache pages are).
>
> From what I recall, we're allocating order-0 pages from the high memory
> addresses, so we can migrate from low memory addresses, effectively freeing up
> low memory addresses and filling high memory addresses.
>
> Adjusting that will be ... interesting. Instead of allocating order-0 pages from
> high addresses, we might want to allocate "as large as possible" ("grab what we
> can") from high addresses and then have our own kind of buddy for allocating
> from that pool a compaction destination page, depending on our source page. Nasty.
>
> What should always work is the split->migrate. But that's definitely not what we
> want in many cases.
>
>>
>>> Is swapping working as expected? zswap?
>>
>> Suboptimally.  Swap will split folios in order to swap them.  Somebody
>> needs to fix that, but it should work.
>
> Good!
>
> It would be great to have some kind of a feature matrix that tells us what works
> perfectly, sub-optimally, barely, not at all (and what has not been tested).
> Maybe (likely!) we'll also find things that are sub-optimal for ordinary THP
> (like swapping, not even sure about).

I'm building a list of known issues, but so far it has been based on code I've
found during review and things raised by people in these threads. Are there test
suites that explicitly test these features? If so I'll happily run them against
large anon folios, but at the moment I'm ignorant I'm afraid.
I have been trying to get mm selftests up and running, but I currently have a
bunch of failures on arm64, even without any of my patches - something I'm
working through.

>
> I suspect that KSM should work mostly fine with flexible-thp. When
> deduplicating, we'll simply split the compound page and proceed as expected. But
> might be worth testing as well.
>