From: "Huang, Ying"
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei,
	David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon,
	Anshuman Khandual, Yang Shi
Subject: Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios
References: <20230703135330.1865927-1-ryan.roberts@arm.com>
	<20230703135330.1865927-3-ryan.roberts@arm.com>
	<877crcgmj1.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<6379dd13-551e-3c73-422a-56ce40b27deb@arm.com>
	<87ttucfht7.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Mon, 10 Jul 2023 17:01:39 +0800
In-Reply-To: (Ryan Roberts's message of "Mon, 10 Jul 2023 09:29:57 +0100")
Message-ID: <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
X-Mailing-List: linux-kernel@vger.kernel.org

Ryan Roberts writes:

> On 10/07/2023 06:37, Huang, Ying wrote:
>> Ryan Roberts writes:
>>
>>> Somehow I managed to reply only to the linux-arm-kernel list on first
>>> attempt so resending:
>>>
>>> On 07/07/2023 09:21, Huang, Ying wrote:
>>>> Ryan Roberts writes:
>>>>
>>>>> With the introduction of large folios for anonymous memory, we would
>>>>> like to be able to split them when they have unmapped subpages, in
>>>>> order to free those unused pages under memory pressure. So remove the
>>>>> artificial requirement that the large folio needed to be at least
>>>>> PMD-sized.
>>>>>
>>>>> Signed-off-by: Ryan Roberts
>>>>> Reviewed-by: Yu Zhao
>>>>> Reviewed-by: Yin Fengwei
>>>>> ---
>>>>>  mm/rmap.c | 2 +-
>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>  	 * page of the folio is unmapped and at least one page
>>>>>  	 * is still mapped.
>>>>>  	 */
>>>>> -	if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>>>>> +	if (folio_test_large(folio) && folio_test_anon(folio))
>>>>>  		if (!compound || nr < nr_pmdmapped)
>>>>>  			deferred_split_folio(folio);
>>>>>  }
>>>>
>>>> One possible issue is that even for large folios mapped only in one
>>>> process, in zap_pte_range(), we will always call deferred_split_folio()
>>>> unnecessarily before freeing a large folio.
>>>
>>> Hi Huang, thanks for reviewing!
>>>
>>> I have a patch that solves this problem by determining a range of ptes
>>> covered by a single folio and doing a "batch zap". This prevents the
>>> need to add the folio to the deferred split queue, only to remove it
>>> again shortly afterwards. This reduces lock contention and I can
>>> measure a performance improvement for the kernel compilation
>>> benchmark. See [1].
>>>
>>> However, I decided to remove it from this patch set on Yu Zhao's
>>> advice. We are aiming for the minimal patch set to start with and
>>> wanted to focus people on that. I intend to submit it separately later
>>> on.
>>>
>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/
>>
>> Thanks for your information! "batch zap" can solve the problem.
>>
>> And, I agree with Matthew's comments to fix the large folios interaction
>> issues before merging the patches to allocate large folios, as in the
>> following email.
>>
>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/
>>
>> If so, we don't need to introduce the above problem or a large patchset.
>
> I appreciate Matthew's and others' position about not wanting to merge a
> minimal implementation while there are some fundamental features (e.g.
> compaction) it doesn't play well with - I'm working to create a
> definitive list so these items can be tracked and tackled.

Good to know this, thanks!

> That said, I don't see this "batch zap" patch as an example of this.
> It's just a performance enhancement that improves things even further
> than large anon folios on their own. I'd rather concentrate on the core
> changes first then deal with this type of thing later. Does that work
> for you?

IIUC, allocating large folios upon page fault depends on splitting large
folios in page_remove_rmap() to avoid memory wastage. Splitting large
folios in page_remove_rmap() depends on "batch zap" to avoid a
performance regression in zap_pte_range(). So we need them to be done
earlier. Or do I miss something?
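To make the churn being discussed concrete, here is a tiny illustrative model in plain Python -- emphatically not kernel code. The `Folio` class, the `zap()` helper, and the "queue touch" counter are simplifications invented for this sketch; they only loosely mirror the roles of page_remove_rmap(), deferred_split_folio(), and split_queue_lock in the quoted discussion.

```python
# Illustrative model ONLY (plain Python, not kernel code). It sketches why
# a per-PTE zap of a large anon folio repeatedly hits the deferred-split
# queue, while a "batch zap" never sees the folio partially mapped.

class Folio:
    def __init__(self, nr_pages):
        self.nr_pages = nr_pages
        self.mapcount = nr_pages   # one mapping per subpage initially
        self.large = nr_pages > 1  # stand-in for folio_test_large()
        self.anon = True           # stand-in for folio_test_anon()

def zap(folio_pages, batch):
    """Unmap a folio one PTE at a time or in a single batch; return how
    often the deferred-split queue is touched (a rough proxy for lock
    contention in the scenario described in the thread)."""
    folio = Folio(folio_pages)
    queue, queue_touches = set(), 0
    steps = [folio_pages] if batch else [1] * folio_pages
    for nr in steps:
        folio.mapcount -= nr
        # Patched condition: a large anon folio that is still partially
        # mapped gets queued for deferred splitting.
        if folio.large and folio.anon and folio.mapcount > 0:
            queue_touches += 1     # queue lock would be taken here
            queue.add(folio)
    if folio.mapcount == 0:
        queue.discard(folio)       # freeing the folio dequeues it again
    return queue_touches

print(zap(16, batch=False))  # per-PTE zap: queue touched on every partial unmap
print(zap(16, batch=True))   # batch zap: folio is never partially mapped
```

In this toy model, unmapping a 16-page folio PTE by PTE touches the queue on each of the 15 intermediate, partially mapped states, only for the folio to be dequeued when it is freed; the batch variant goes straight from fully mapped to fully unmapped and never queues it, which is the effect the "batch zap" patch is after.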
Best Regards,
Huang, Ying