From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH v18 06/32] mm/thp: narrow lru locking
To: Matthew Wilcox <willy@infradead.org>
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net, tj@kernel.org,
 hughd@google.com, khlebnikov@yandex-team.ru, daniel.m.jordan@oracle.com,
 hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, shakeelb@google.com,
 iamjoonsoo.kim@lge.com, richard.weiyang@gmail.com, kirill@shutemov.name,
 alexander.duyck@gmail.com, rong.a.chen@intel.com, mhocko@suse.com,
 vdavydov.dev@gmail.com, shy828301@gmail.com,
 Andrea Arcangeli <aarcange@redhat.com>
References: <1598273705-69124-1-git-send-email-alex.shi@linux.alibaba.com>
 <1598273705-69124-7-git-send-email-alex.shi@linux.alibaba.com>
 <20200910134923.GR6583@casper.infradead.org>
From: Alex Shi <alex.shi@linux.alibaba.com>
Message-ID: <514f6afa-dbf7-11c5-5431-1d558d2c20c9@linux.alibaba.com>
Date: Fri, 11 Sep 2020 11:37:50 +0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:68.0)
 Gecko/20100101 Thunderbird/68.7.0
MIME-Version: 1.0
In-Reply-To: <20200910134923.GR6583@casper.infradead.org>
Content-Type: text/plain; charset=gbk
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>


On 2020/9/10 at 9:49 PM, Matthew Wilcox wrote:
> On Mon, Aug 24, 2020 at 08:54:39PM +0800, Alex Shi wrote:
>> lru_lock and page cache xa_lock have no reason with current sequence,
>> put them together isn't necessary. let's narrow the lru locking, but
>> left the local_irq_disable to block interrupt re-entry and statistic update.
>
> What stats are you talking about here?

Hi Matthew,

Thanks for the comments!

Statistics updates such as __dec_node_page_state(head, NR_SHMEM_THPS);
would otherwise trigger a preemption warning...

>
>> +++ b/mm/huge_memory.c
>> @@ -2397,7 +2397,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
>>  }
>>  
>>  static void __split_huge_page(struct page *page, struct list_head *list,
>> -		pgoff_t end, unsigned long flags)
>> +			      pgoff_t end)
>
> Please don't change this whitespace.  It's really annoying having to
> adjust the whitespace when renaming a function.  Just two tabs indentation
> to give a clear separation of arguments from code is fine.
>
>
> How about this patch instead?  It occurred to me we already have
> perfectly good infrastructure to track whether or not interrupts are
> already disabled, and so we should use that instead of ensuring that
> interrupts are disabled, or tracking that ourselves.

So your proposal takes the locks in this order:
1. xa_lock_irq(&mapping->i_pages); (optional)
2. spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
3. spin_lock_irqsave(&pgdat->lru_lock, flags);

Do the saved flags in the 2nd and 3rd calls carry any meaning?

IIRC, I made a similar proposal to yours, with the flags used in
xa_lock_irqsave(), but Hugh objected to it.

Thanks
Alex

>
> But I may have missed something else that's relying on having
> interrupts disabled.  Please check carefully.
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2ccff8472cd4..74cae6c032f9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2376,17 +2376,16 @@ static void __split_huge_page_tail(struct page *head, int tail,
>  }
>  
>  static void __split_huge_page(struct page *page, struct list_head *list,
> -		pgoff_t end, unsigned long flags)
> +		pgoff_t end)
>  {
>  	struct page *head = compound_head(page);
>  	pg_data_t *pgdat = page_pgdat(head);
>  	struct lruvec *lruvec;
>  	struct address_space *swap_cache = NULL;
>  	unsigned long offset = 0;
> +	unsigned long flags;
>  	int i;
>  
> -	lruvec = mem_cgroup_page_lruvec(head, pgdat);
> -
>  	/* complete memcg works before add pages to LRU */
>  	mem_cgroup_split_huge_fixup(head);
>  
> @@ -2395,9 +2394,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  
>  		offset = swp_offset(entry);
>  		swap_cache = swap_address_space(entry);
> -		xa_lock(&swap_cache->i_pages);
> +		xa_lock_irq(&swap_cache->i_pages);
>  	}
>  
> +	/* prevent PageLRU to go away from under us, and freeze lru stats */
> +	spin_lock_irqsave(&pgdat->lru_lock, flags);
> +	lruvec = mem_cgroup_page_lruvec(head, pgdat);
> +
>  	for (i = HPAGE_PMD_NR - 1; i >= 1; i--) {
>  		__split_huge_page_tail(head, i, lruvec, list);
>  		/* Some pages can be beyond i_size: drop them from page cache */
> @@ -2417,6 +2420,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  	}
>  
>  	ClearPageCompound(head);
> +	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
>  
>  	split_page_owner(head, HPAGE_PMD_ORDER);
>  
> @@ -2425,18 +2429,16 @@ static void __split_huge_page(struct page *page, struct list_head *list,
>  		/* Additional pin to swap cache */
>  		if (PageSwapCache(head)) {
>  			page_ref_add(head, 2);
> -			xa_unlock(&swap_cache->i_pages);
> +			xa_unlock_irq(&swap_cache->i_pages);
>  		} else {
>  			page_ref_inc(head);
>  		}
>  	} else {
>  		/* Additional pin to page cache */
>  		page_ref_add(head, 2);
> -		xa_unlock(&head->mapping->i_pages);
> +		xa_unlock_irq(&head->mapping->i_pages);
>  	}
>  
> -	spin_unlock_irqrestore(&pgdat->lru_lock, flags);
> -
>  	remap_page(head);
>  
>  	for (i = 0; i < HPAGE_PMD_NR; i++) {
> @@ -2574,7 +2576,6 @@ bool can_split_huge_page(struct page *page, int *pextra_pins)
>  int split_huge_page_to_list(struct page *page, struct list_head *list)
>  {
>  	struct page *head = compound_head(page);
> -	struct pglist_data *pgdata = NODE_DATA(page_to_nid(head));
>  	struct deferred_split *ds_queue = get_deferred_split_queue(head);
>  	struct anon_vma *anon_vma = NULL;
>  	struct address_space *mapping = NULL;
> @@ -2640,9 +2641,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  	unmap_page(head);
>  	VM_BUG_ON_PAGE(compound_mapcount(head), head);
>  
> -	/* prevent PageLRU to go away from under us, and freeze lru stats */
> -	spin_lock_irqsave(&pgdata->lru_lock, flags);
> -
>  	if (mapping) {
>  		XA_STATE(xas, &mapping->i_pages, page_index(head));
>  
> @@ -2650,13 +2648,13 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  		 * Check if the head page is present in page cache.
>  		 * We assume all tail are present too, if head is there.
>  		 */
> -		xa_lock(&mapping->i_pages);
> +		xa_lock_irq(&mapping->i_pages);
>  		if (xas_load(&xas) != head)
>  			goto fail;
>  	}
>  
>  	/* Prevent deferred_split_scan() touching ->_refcount */
> -	spin_lock(&ds_queue->split_queue_lock);
> +	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
>  	count = page_count(head);
>  	mapcount = total_mapcount(head);
>  	if (!mapcount && page_ref_freeze(head, 1 + extra_pins)) {
> @@ -2664,7 +2662,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  			ds_queue->split_queue_len--;
>  			list_del(page_deferred_list(head));
>  		}
> -		spin_unlock(&ds_queue->split_queue_lock);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  		if (mapping) {
>  			if (PageSwapBacked(head))
>  				__dec_node_page_state(head, NR_SHMEM_THPS);
> @@ -2672,7 +2670,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  				__dec_node_page_state(head, NR_FILE_THPS);
>  		}
>  
> -		__split_huge_page(page, list, end, flags);
> +		__split_huge_page(page, list, end);
>  		if (PageSwapCache(head)) {
>  			swp_entry_t entry = { .val = page_private(head) };
>  
> @@ -2688,10 +2686,9 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>  			dump_page(page, "total_mapcount(head) > 0");
>  			BUG();
>  		}
> -		spin_unlock(&ds_queue->split_queue_lock);
> +		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>  fail:		if (mapping)
> -			xa_unlock(&mapping->i_pages);
> -		spin_unlock_irqrestore(&pgdata->lru_lock, flags);
> +			xa_unlock_irq(&mapping->i_pages);
>  		remap_page(head);
>  		ret = -EBUSY;
>  	}
>