From: "Huang, Ying"
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
	David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Yang Shi
Subject: Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
	<20230703135330.1865927-3-ryan.roberts@arm.com>
	<877crcgmj1.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<6379dd13-551e-3c73-422a-56ce40b27deb@arm.com>
	<87ttucfht7.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 10 Jul 2023 17:01:39 +0800
In-Reply-To: (Ryan Roberts's message of "Mon, 10 Jul 2023 09:29:57 +0100")
Message-ID: <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
X-Mailing-List: linux-kernel@vger.kernel.org

Ryan Roberts writes:

> On 10/07/2023 06:37, Huang, Ying wrote:
>> Ryan Roberts writes:
>>
>>> Somehow I managed to reply only to the linux-arm-kernel list on first
>>> attempt so resending:
>>>
>>> On 07/07/2023 09:21, Huang, Ying wrote:
>>>> Ryan Roberts writes:
>>>>
>>>>> With the introduction of large folios for anonymous memory, we would
>>>>> like to be able to split them when they have unmapped subpages, in
>>>>> order to free those unused pages under memory pressure. So remove the
>>>>> artificial requirement that the large folio needed to be at least
>>>>> PMD-sized.
>>>>>
>>>>> Signed-off-by: Ryan Roberts
>>>>> Reviewed-by: Yu Zhao
>>>>> Reviewed-by: Yin Fengwei
>>>>> ---
>>>>>  mm/rmap.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>  	 * page of the folio is unmapped and at least one page
>>>>>  	 * is still mapped.
>>>>>  	 */
>>>>> -	if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>>>>> +	if (folio_test_large(folio) && folio_test_anon(folio))
>>>>>  		if (!compound || nr < nr_pmdmapped)
>>>>>  			deferred_split_folio(folio);
>>>>>  }
>>>>
>>>> One possible issue is that even for large folios mapped only in one
>>>> process, in zap_pte_range(), we will always call deferred_split_folio()
>>>> unnecessarily before freeing a large folio.
>>>
>>> Hi Huang, thanks for reviewing!
>>>
>>> I have a patch that solves this problem by determining a range of ptes
>>> covered by a single folio and doing a "batch zap". This prevents the
>>> need to add the folio to the deferred split queue, only to remove it
>>> again shortly afterwards. This reduces lock contention and I can
>>> measure a performance improvement for the kernel compilation
>>> benchmark. See [1].
>>>
>>> However, I decided to remove it from this patch set on Yu Zhao's
>>> advice. We are aiming for the minimal patch set to start with and
>>> wanted to focus people on that. I intend to submit it separately later
>>> on.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/
>>
>> Thanks for your information! "batch zap" can solve the problem.
>>
>> And, I agree with Matthew's comments to fix the large folios interaction
>> issues before merging the patches to allocate large folios, as in the
>> following email.
>>
>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/
>>
>> If so, we don't need to introduce the above problem or a large patchset.
>
> I appreciate Matthew's and others' position about not wanting to merge a
> minimal implementation while there are some fundamental features (e.g.
> compaction) it doesn't play well with - I'm working to create a
> definitive list so these items can be tracked and tackled.

Good to know this, thanks!

> That said, I don't see this "batch zap" patch as an example of this.
> It's just a performance enhancement that improves things even further
> than large anon folios on their own. I'd rather concentrate on the core
> changes first then deal with this type of thing later. Does that work
> for you?

IIUC, allocating large folios upon page fault depends on splitting large
folios in page_remove_rmap() to avoid memory wastage. Splitting large
folios in page_remove_rmap() depends on "batch zap" to avoid a
performance regression in zap_pte_range(). So we need them to be done
earlier. Or do I miss something?
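To make the churn being discussed concrete, here is a tiny illustrative model in plain Python -- emphatically not kernel code. The `Folio` class, the `zap()` helper, and the "queue touch" counter are simplifications invented for this sketch; they only loosely mirror the roles of page_remove_rmap(), deferred_split_folio(), and split_queue_lock in the quoted discussion.

```python
# Illustrative model ONLY (plain Python, not kernel code). It sketches why
# a per-PTE zap of a large anon folio repeatedly hits the deferred-split
# queue, while a "batch zap" never sees the folio partially mapped.

class Folio:
    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        self.mapcount = nr_pages   # one mapping per subpage initially
        self.large = nr_pages > 1  # stand-in for folio_test_large()
        self.anon = True           # stand-in for folio_test_anon()

def zap(folio_pages, batch):
    """Unmap a folio one PTE at a time or in a single batch; return how
    often the deferred-split queue is touched (a rough proxy for lock
    contention in the scenario described in the thread)."""
    folio = Folio(folio_pages)
    queue, queue_touches = set(), 0
    steps = [folio_pages] if batch else [1] * folio_pages
    for nr in steps:
        folio.mapcount -= nr
        # Patched condition: a large anon folio that is still partially
        # mapped gets queued for deferred splitting.
        if folio.large and folio.anon and folio.mapcount > 0:
            queue_touches += 1     # queue lock would be taken here
            queue.add(folio)
    if folio.mapcount == 0:
        queue.discard(folio)       # freeing the folio dequeues it again
    return queue_touches

print(zap(16, batch=False))  # per-PTE zap: queue touched on every partial unmap
print(zap(16, batch=True))   # batch zap: folio is never partially mapped
```

In this toy model, unmapping a 16-page folio PTE by PTE touches the queue on each of the 15 intermediate, partially mapped states, only for the folio to be dequeued when it is freed; the batch variant goes straight from fully mapped to fully unmapped and never queues it, which is the effect the "batch zap" patch is after.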
Best Regards,
Huang, Ying