Date: Mon, 24 Jan 2022 10:48:55 -0800 (PST)
From: David Rientjes
To: Peter Xu, Zach O'Keefe, SeongJae Park
Cc: Shakeel Butt, David Hildenbrand, "Kirill A. Shutemov", Yang Shi, Zi Yan,
    Matthew Wilcox, Andrew Morton, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: split thp synchronously on MADV_DONTNEED
References: <20211120201230.920082-1-shakeelb@google.com>
    <25b36a5c-5bbd-5423-0c67-05cd6c1432a7@redhat.com>

On Fri, 26 Nov 2021, Peter Xu wrote:

> Some side notes: I digged out the old MADV_COLLAPSE proposal right after I
> thought about MADV_SPLIT (or any of its variance):
>
> https://lore.kernel.org/all/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/
>
> My memory was that there's some issue to be solved so that was blocked, however
> when I read the thread it sounds like the list was mostly reaching a consensus
> on considering MADV_COLLAPSE being beneficial.  Still copying DavidR in case I
> missed something important.
>
> If we think MADV_COLLAPSE can help to implement an userspace (and more
> importantly, data-aware) khugepaged, then MADV_SPLIT can be the other side of
> kcompactd, perhaps.
>
> That's probably a bit off topic of this specific discussion on the specific use
> case, but so far it seems all reasonable and discussable.
>

Hi Peter,

Providing a (late) update since we now have some better traction on this,
I think we'll be ready to post an RFC soon that introduces MADV_COLLAPSE.
The work is being driven by Zach, now cc'd.

Let's also include SeongJae Park and keep him in the loop, since DAMON
could easily be extended with a DAMOS_COLLAPSE action to use
MADV_COLLAPSE for hot regions of memory.

Idea for initial approach:

 - MADV_COLLAPSE core code based on the proposal you cite above, with
   anon memory as the inaugural support: collapse memory into thp in
   process context

 - Batching support to collapse ranges of memory into multiple THP

 - Wire this up for madvise(2) (and process_madvise(2))

 - Enlightenment for file-backed thp

I think Zach's RFC will cover the first three; it could be debated whether
the initial patch series *must* support file-backed thp.  We'll see based
on the feedback to the RFC.

There's also an extension where MADV_COLLAPSE could potentially be useful
for hugetlb-backed memory.  We have another effort underway, which we've
been talking with Mike Kravetz about, that allows hugetlb memory to be
mapped at multiple levels of the page tables.  There are several use cases,
but one of the driving factors is the performance of post-copy live
migration; in this case, you'd be able to send smaller pages over the wire
rather than, say, a 1GB gigantic page.

In this case, MADV_COLLAPSE could be useful to map smaller pages with a
larger page table entry before all of the smaller pages have been live
migrated.

That said, we have not invested time into an MADV_SPLIT yet.

Do you (or anybody else) have concerns about this approach?  Ideas for
extensions?

Thanks!
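
For illustration only, here is a rough userspace sketch of how a caller might
use the madvise(2) wiring described in the list above.  None of this comes
from the RFC.  The helper names (hpage_aligned_subrange, collapse_range) are
hypothetical, the 2MB huge page size assumes PMD-sized THP on x86-64, and the
fallback value 25 for MADV_COLLAPSE is an assumption for systems whose headers
do not define it; on a kernel without support the call would be expected to
fail with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: advice value for headers that do not define MADV_COLLAPSE. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Assumption: PMD-sized THP of 2MB (x86-64); other configurations differ. */
#define THP_SIZE_ASSUMED ((size_t)2 << 20)

/*
 * Hypothetical helper: shrink [addr, addr + len) to the largest subrange
 * whose start and end are both huge-page aligned.  Stores the aligned start
 * in *start and returns the subrange length in bytes (0 if none fits).
 */
static size_t hpage_aligned_subrange(uintptr_t addr, size_t len,
                                     uintptr_t *start)
{
    const uintptr_t mask = (uintptr_t)THP_SIZE_ASSUMED - 1;
    uintptr_t lo, hi;

    assert(start != NULL);
    lo = (addr + mask) & ~mask;   /* round start up   */
    hi = (addr + len) & ~mask;    /* round end down   */
    *start = lo;
    return hi > lo ? (size_t)(hi - lo) : 0;
}

/*
 * Hypothetical helper: ask the kernel to collapse the aligned part of the
 * range into huge pages, synchronously, in the caller's context.
 * Returns 0 on success or a negative errno value.
 */
static int collapse_range(void *addr, size_t len)
{
    uintptr_t start;
    size_t n = hpage_aligned_subrange((uintptr_t)addr, len, &start);

    if (n == 0)
        return -EINVAL;           /* no full huge page inside the range */
    if (madvise((void *)start, n, MADV_COLLAPSE) != 0)
        return -errno;
    return 0;
}
```

A caller would pass an anonymous mapping it has already touched, e.g.
collapse_range(buf, buf_len), and treat a negative return as "leave the
range as small pages".  A single call over a multi-huge-page range stands in
for the batching item above; process_madvise(2) would take an iovec array
instead.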