From: Alex Shi <alex.shi@linux.alibaba.com>
To: Hugh Dickins <hughd@google.com>
Cc: akpm@linux-foundation.org, mgorman@techsingularity.net,
	tj@kernel.org, khlebnikov@yandex-team.ru,
	daniel.m.jordan@oracle.com, willy@infradead.org,
	hannes@cmpxchg.org, lkp@intel.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	shakeelb@google.com, iamjoonsoo.kim@lge.com,
	richard.weiyang@gmail.com, kirill@shutemov.name,
	alexander.duyck@gmail.com, rong.a.chen@intel.com,
	mhocko@suse.com, vdavydov.dev@gmail.com, shy828301@gmail.com,
	Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH v18 16/32] mm/lru: introduce TestClearPageLRU
Date: Tue, 22 Sep 2020 11:53:37 +0800	[thread overview]
Message-ID: <d3a3cee4-2680-8a04-edaf-4ddcc2e95058@linux.alibaba.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2009211504200.5214@eggly.anvils>



On 2020/9/22 7:16 AM, Hugh Dickins wrote:
> On Mon, 24 Aug 2020, Alex Shi wrote:
> 
>> Currently lru_lock guards both the lru list and the page's lru bit, and
>> that's fine. But if we want to use a per-lruvec lock for the page, we
>> need to pin down the page's lruvec/memcg while locking. Just taking the
>> lruvec lock first may be undermined by the page's memcg charge/migration.
>> To fix this problem, we clear the lru bit outside the lock and use it as
>> a pin to block page isolation during a memcg change.
>>
>> So now the standard steps of page isolation are:
>> 	1, get_page();	       #pin the page so it can't be freed
>> 	2, TestClearPageLRU(); #block other isolation, like memcg change
>> 	3, spin_lock on lru_lock; #serialize lru list access
>> 	4, delete page from lru list;
>> Step 2 could be optimized/replaced in scenarios where the page is
>> unlikely to be accessed or moved between memcgs.
>>
>> This patch starts with the first part: TestClearPageLRU, which combines
>> the PageLRU check and ClearPageLRU into a macro function TestClearPageLRU.
>> This function will be used as a page isolation precondition to prevent
>> other isolations elsewhere. As a result there may be !PageLRU pages on an
>> lru list, so the BUG() checks need to be removed accordingly.
>>
>> There are 2 rules for the lru bit now:
>> 1, the lru bit still indicates whether a page is on an lru list, except
>>    for a temporary moment (isolation) when the page may have no lru bit
>>    while it is on the lru list. But the page must be on an lru list
>>    whenever the lru bit is set.
>> 2, the lru bit has to be cleared before the page is deleted from the
>>    lru list.
>>
>> Hugh Dickins pointed out that when a page is on the free path and nobody
>> else can take it, non-atomic lru bit clearing is better, as in
>> __page_cache_release and release_pages.
>> And there is no need to get_page() before clearing the lru bit in
>> isolate_lru_page, since it '(1) Must be called with an elevated refcount
>> on the page'.
> 
> Delete that paragraph: you're justifying changes made during the
> course of earlier review, but not needed here.  If we start to
> comment on everything that is not done...!
> 

Will delete it!

>>
>> As Andrew Morton mentioned, this change would dirty the cacheline for a
>> page that isn't on the LRU. But the cost is acceptable according to
>> Rong Chen's <rong.a.chen@intel.com> report:
>> https://lkml.org/lkml/2020/3/4/173
> 
> Please use a lore link instead, lkml.org is nice but unreliable:
> https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Yes, will replace the link.

> 
>>
>> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
>> Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
> 
> Acked-by: Hugh Dickins <hughd@google.com>
> when you make the changes suggested above and below.

Thanks!

> 
> I still have long-standing reservations about this TestClearPageLRU
> technique (it's hard to reason about, and requires additional atomic ops
> in some places); but it's working, so I'd like it to go in, then later
> we can experiment with whether lock_page_memcg() does a better job, or
> rechecking memcg when getting the lru_lock (my original technique).
> 
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-kernel@vger.kernel.org
>> Cc: cgroups@vger.kernel.org
>> Cc: linux-mm@kvack.org
>> ---
>>  include/linux/page-flags.h |  1 +
>>  mm/mlock.c                 |  3 +--
>>  mm/swap.c                  |  5 ++---
>>  mm/vmscan.c                | 18 +++++++-----------
>>  4 files changed, 11 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 6be1aa559b1e..9554ed1387dc 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -326,6 +326,7 @@ static inline void page_init_poison(struct page *page, size_t size)
>>  PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
>>  	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
>>  PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
>> +	TESTCLEARFLAG(LRU, lru, PF_HEAD)
>>  PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
>>  	TESTCLEARFLAG(Active, active, PF_HEAD)
>>  PAGEFLAG(Workingset, workingset, PF_HEAD)
>> diff --git a/mm/mlock.c b/mm/mlock.c
>> index 93ca2bf30b4f..3762d9dd5b31 100644
>> --- a/mm/mlock.c
>> +++ b/mm/mlock.c
>> @@ -107,13 +107,12 @@ void mlock_vma_page(struct page *page)
>>   */
>>  static bool __munlock_isolate_lru_page(struct page *page, bool getpage)
>>  {
>> -	if (PageLRU(page)) {
>> +	if (TestClearPageLRU(page)) {
>>  		struct lruvec *lruvec;
>>  
>>  		lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
>>  		if (getpage)
>>  			get_page(page);
>> -		ClearPageLRU(page);
>>  		del_page_from_lru_list(page, lruvec, page_lru(page));
>>  		return true;
>>  	}
>> diff --git a/mm/swap.c b/mm/swap.c
>> index f80ccd6f3cb4..446ffe280809 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -83,10 +83,9 @@ static void __page_cache_release(struct page *page)
>>  		struct lruvec *lruvec;
>>  		unsigned long flags;
>>  
>> +		__ClearPageLRU(page);
>>  		spin_lock_irqsave(&pgdat->lru_lock, flags);
>>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> -		VM_BUG_ON_PAGE(!PageLRU(page), page);
>> -		__ClearPageLRU(page);
>>  		del_page_from_lru_list(page, lruvec, page_off_lru(page));
>>  		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
>>  	}
>> @@ -880,9 +879,9 @@ void release_pages(struct page **pages, int nr)
>>  				spin_lock_irqsave(&locked_pgdat->lru_lock, flags);
>>  			}
>>  
>> -			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
>>  			VM_BUG_ON_PAGE(!PageLRU(page), page);
>>  			__ClearPageLRU(page);
>> +			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
>>  			del_page_from_lru_list(page, lruvec, page_off_lru(page));
>>  		}
>>  
> 
> Please delete all those mods to mm/swap.c from this patch.  This patch
> is about introducing TestClearPageLRU, but that is not involved here.
> Several versions ago, yes it was, then I pointed out that these are
> operations on refcount 0 pages, and we don't want to add unnecessary
> atomic operations on them.  I expect you want to keep the rearrangements,
> but do them where you need them later (I expect that's in 20/32).

When I looked into the 20th patch, "replace lru_lock", it seemed this change
doesn't belong there either. And I'm trying to trim code changes from the
20th patch, since it's already big enough; that makes it hard to bisect if
anything goes wrong.

So this patch faces the same dilemma. To stay bisection-friendly, maybe it's
better to split this part out?

Thanks!

> 
> And I notice that one VM_BUG_ON_PAGE was kept and the other deleted:
> though one can certainly argue that they're redundant (as all BUGs
> should be), I think most people will feel safer to keep them both.

Right, will keep the BUG check here.

> 
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index 7b7b36bd1448..1b3e0eeaad64 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1665,8 +1665,6 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
>>  		page = lru_to_page(src);
>>  		prefetchw_prev_lru_page(page, src, flags);
>>  
>> -		VM_BUG_ON_PAGE(!PageLRU(page), page);
>> -
>>  		nr_pages = compound_nr(page);
>>  		total_scan += nr_pages;
>>  
> 
> It is not enough to remove just that one VM_BUG_ON_PAGE there.
> This is a patch series, and we don't need it to be perfect at every
> bisection point between patches, but we do need it to be reasonably
> robust, so as not to waste unrelated bughunters' time.  It didn't
> take me very long to crash on the "default: BUG()" further down
> isolate_lru_pages(), because now PageLRU may get cleared at any
> instant, whatever locks are held.
> 
> (But you're absolutely right to leave the compaction and pagevec
> mods to subsequent patches: it's fairly safe to separate those out,
> and much easier for reviewers that you did so.)
> 
> This patch is much more robust with __isolate_lru_page() mods below
> on top.  I agree there's other ways to do it, but given that nobody
> cares what the error return is from __isolate_lru_page(), except for
> the isolate_lru_pages() switch statement BUG() which has become
> invalid, I suggest just use -EBUSY throughout __isolate_lru_page().
> Yes, we can and should change that switch statement to an
> "if {} else {}" without any BUG(), but I don't want to mess
> you around at this time, leave cleanup like that until later.
> Please fold in this patch on top:
> 

Thanks a lot! will merge it.

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1540,7 +1540,7 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
>   */
>  int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  {
> -	int ret = -EINVAL;
> +	int ret = -EBUSY;
>  
>  	/* Only take pages on the LRU. */
>  	if (!PageLRU(page))
> @@ -1550,8 +1550,6 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
>  		return ret;
>  
> -	ret = -EBUSY;
> -
>  	/*
>  	 * To minimise LRU disruption, the caller can indicate that it only
>  	 * wants to isolate pages it will be able to operate on without
> @@ -1598,8 +1596,10 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
>  		 * sure the page is not being freed elsewhere -- the
>  		 * page release code relies on it.
>  		 */
> -		ClearPageLRU(page);
> -		ret = 0;
> +		if (TestClearPageLRU(page))
> +			ret = 0;
> +		else
> +			put_page(page);
>  	}

This code will eventually be removed in the next patch, but it's better to keep it here for now.
Thanks!

>  
>  	return ret;
> 
>> @@ -1763,21 +1761,19 @@ int isolate_lru_page(struct page *page)
>>  	VM_BUG_ON_PAGE(!page_count(page), page);
>>  	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");
>>  
>> -	if (PageLRU(page)) {
>> +	if (TestClearPageLRU(page)) {
>>  		pg_data_t *pgdat = page_pgdat(page);
>>  		struct lruvec *lruvec;
>> +		int lru = page_lru(page);
>>  
>> -		spin_lock_irq(&pgdat->lru_lock);
>> +		get_page(page);
>>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>> -		if (PageLRU(page)) {
>> -			int lru = page_lru(page);
>> -			get_page(page);
>> -			ClearPageLRU(page);
>> -			del_page_from_lru_list(page, lruvec, lru);
>> -			ret = 0;
>> -		}
>> +		spin_lock_irq(&pgdat->lru_lock);
>> +		del_page_from_lru_list(page, lruvec, lru);
>>  		spin_unlock_irq(&pgdat->lru_lock);
>> +		ret = 0;
>>  	}
>> +
>>  	return ret;
>>  }
> 
> And a small mod to isolate_lru_page() to be folded in.  I had
> never noticed this before, but here you are evaluating page_lru()
> after clearing PageLRU, but before getting lru_lock: that seems unsafe.
> I'm pretty sure it's unsafe at this stage of the series; I did once
> persuade myself that it becomes safe by the end of the series,
> but I've already forgotten the argument for that (I have already
> said TestClearPageLRU is difficult to reason about).  Please don't
> force us to have to think about this! Just get page_lru after lru_lock.
> 
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1764,12 +1764,11 @@ int isolate_lru_page(struct page *page)
>  	if (TestClearPageLRU(page)) {
>  		pg_data_t *pgdat = page_pgdat(page);
>  		struct lruvec *lruvec;
> -		int lru = page_lru(page);
>  
>  		get_page(page);
>  		lruvec = mem_cgroup_page_lruvec(page, pgdat);
>  		spin_lock_irq(&pgdat->lru_lock);
> -		del_page_from_lru_list(page, lruvec, lru);
> +		del_page_from_lru_list(page, lruvec, page_lru(page));
>  		spin_unlock_irq(&pgdat->lru_lock);
>  		ret = 0;
>  	}
> 

Taken, thanks!

> And lastly, please do check_move_unevictable_pages()'s TestClearPageLRU
> mod here at the end of mm/vmscan.c in this patch: I noticed that your
> lruv19 branch is doing it in a later patch, but it fits better here.
> 

Will move that part of the change here.
Thanks!
Alex

  reply	other threads:[~2020-09-22  3:55 UTC|newest]

Thread overview: 201+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-24 12:54 [PATCH v18 00/32] per memcg lru_lock Alex Shi
2020-08-24 12:54 ` [PATCH v18 01/32] mm/memcg: warning on !memcg after readahead page charged Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 02/32] mm/memcg: bail out early from swap accounting when memcg is disabled Alex Shi
2020-08-24 12:54 ` [PATCH v18 03/32] mm/thp: move lru_add_page_tail func to huge_memory.c Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 04/32] mm/thp: clean up lru_add_page_tail Alex Shi
2020-08-24 12:54 ` [PATCH v18 05/32] mm/thp: remove code path which never got into Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 06/32] mm/thp: narrow lru locking Alex Shi
2020-09-10 13:49   ` Matthew Wilcox
2020-09-10 13:49     ` Matthew Wilcox
2020-09-11  3:37     ` Alex Shi
2020-09-11  3:37       ` Alex Shi
2020-09-13 15:27       ` Matthew Wilcox
2020-09-13 15:27         ` Matthew Wilcox
2020-09-19  1:00         ` Hugh Dickins
2020-09-19  1:00           ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 07/32] mm/swap.c: stop deactivate_file_page if page not on lru Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 08/32] mm/vmscan: remove unnecessary lruvec adding Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 09/32] mm/page_idle: no unlikely double check for idle page counting Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 10/32] mm/compaction: rename compact_deferred as compact_should_defer Alex Shi
2020-08-24 12:54 ` [PATCH v18 11/32] mm/memcg: add debug checking in lock_page_memcg Alex Shi
2020-08-24 12:54 ` [PATCH v18 12/32] mm/memcg: optimize mem_cgroup_page_lruvec Alex Shi
2020-08-24 12:54 ` [PATCH v18 13/32] mm/swap.c: fold vm event PGROTATED into pagevec_move_tail_fn Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 14/32] mm/lru: move lru_lock holding in func lru_note_cost_page Alex Shi
2020-08-24 12:54 ` [PATCH v18 15/32] mm/lru: move lock into lru_note_cost Alex Shi
2020-09-21 21:36   ` Hugh Dickins
2020-09-21 21:36     ` Hugh Dickins
2020-09-21 21:36     ` Hugh Dickins
2020-09-21 22:03     ` Hugh Dickins
2020-09-21 22:03       ` Hugh Dickins
2020-09-22  3:39       ` Alex Shi
2020-09-22  3:39         ` Alex Shi
2020-09-22  3:38     ` Alex Shi
2020-09-22  3:38       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 16/32] mm/lru: introduce TestClearPageLRU Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-09-21 23:16   ` Hugh Dickins
2020-09-21 23:16     ` Hugh Dickins
2020-09-21 23:16     ` Hugh Dickins
2020-09-22  3:53     ` Alex Shi [this message]
2020-09-22  3:53       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 17/32] mm/compaction: do page isolation first in compaction Alex Shi
2020-09-21 23:49   ` Hugh Dickins
2020-09-21 23:49     ` Hugh Dickins
2020-09-22  4:57     ` Alex Shi
2020-09-22  4:57       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 18/32] mm/thp: add tail pages into lru anyway in split_huge_page() Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 19/32] mm/swap.c: serialize memcg changes in pagevec_lru_move_fn Alex Shi
2020-09-22  0:42   ` Hugh Dickins
2020-09-22  0:42     ` Hugh Dickins
2020-09-22  0:42     ` Hugh Dickins
2020-09-22  5:00     ` Alex Shi
2020-09-22  5:00       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 20/32] mm/lru: replace pgdat lru_lock with lruvec lock Alex Shi
2020-09-22  5:27   ` Hugh Dickins
2020-09-22  5:27     ` Hugh Dickins
2020-09-22  5:27     ` Hugh Dickins
2020-09-22  8:58     ` Alex Shi
2020-09-22  8:58       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 21/32] mm/lru: introduce the relock_page_lruvec function Alex Shi
2020-09-22  5:40   ` Hugh Dickins
2020-09-22  5:40     ` Hugh Dickins
2020-09-22  5:40     ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 22/32] mm/vmscan: use relock for move_pages_to_lru Alex Shi
2020-09-22  5:44   ` Hugh Dickins
2020-09-22  5:44     ` Hugh Dickins
2020-09-23  1:55     ` Alex Shi
2020-09-23  1:55       ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 23/32] mm/lru: revise the comments of lru_lock Alex Shi
2020-08-24 12:54   ` Alex Shi
2020-09-22  5:48   ` Hugh Dickins
2020-09-22  5:48     ` Hugh Dickins
2020-08-24 12:54 ` [PATCH v18 24/32] mm/pgdat: remove pgdat lru_lock Alex Shi
2020-09-22  5:53   ` Hugh Dickins
2020-09-22  5:53     ` Hugh Dickins
2020-09-23  1:55     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 25/32] mm/mlock: remove lru_lock on TestClearPageMlocked in munlock_vma_page Alex Shi
2020-08-26  5:52   ` Alex Shi
2020-09-22  6:13   ` Hugh Dickins
2020-09-23  1:58     ` Alex Shi
2020-08-24 12:54 ` [PATCH v18 26/32] mm/mlock: remove __munlock_isolate_lru_page Alex Shi
2020-08-24 12:55 ` [PATCH v18 27/32] mm/swap.c: optimizing __pagevec_lru_add lru_lock Alex Shi
2020-08-26  9:07   ` Alex Shi
2020-08-24 12:55 ` [PATCH v18 28/32] mm/compaction: Drop locked from isolate_migratepages_block Alex Shi
2020-08-24 12:55 ` [PATCH v18 29/32] mm: Identify compound pages sooner in isolate_migratepages_block Alex Shi
2020-08-24 12:55 ` [PATCH v18 30/32] mm: Drop use of test_and_set_skip in favor of just setting skip Alex Shi
2020-08-24 12:55 ` [PATCH v18 31/32] mm: Add explicit page decrement in exception path for isolate_lru_pages Alex Shi
2020-09-09  1:01   ` Matthew Wilcox
2020-09-09 15:43     ` Alexander Duyck
2020-09-09 17:07       ` Matthew Wilcox
2020-09-09 18:24       ` Hugh Dickins
2020-09-09 20:15         ` Matthew Wilcox
2020-09-09 21:05           ` Hugh Dickins
2020-09-09 21:17         ` Alexander Duyck
2020-08-24 12:55 ` [PATCH v18 32/32] mm: Split release_pages work into 3 passes Alex Shi
2020-08-24 18:42 ` [PATCH v18 00/32] per memcg lru_lock Andrew Morton
2020-08-24 19:50   ` Qian Cai
2020-08-24 20:24   ` Hugh Dickins
2020-08-25  1:56     ` Daniel Jordan
2020-08-25  3:26       ` Alex Shi
2020-08-25 11:39         ` Matthew Wilcox
2020-08-26  1:19         ` Daniel Jordan
2020-08-26  8:59           ` Alex Shi
2020-08-28  1:40             ` Daniel Jordan
2020-08-28  5:22               ` Alex Shi
2020-09-09  2:44               ` Aaron Lu
2020-09-09 11:40                 ` Michal Hocko
2020-08-25  8:52       ` Alex Shi
2020-08-25 13:00         ` Alex Shi
2020-08-27  7:01     ` Hugh Dickins
2020-08-27 12:20       ` Race between freeing and waking page Matthew Wilcox
2020-09-08 23:41       ` [PATCH v18 00/32] per memcg lru_lock: reviews Hugh Dickins
2020-09-09  2:24         ` Wei Yang
2020-09-09 15:08         ` Alex Shi
2020-09-09 23:16           ` Hugh Dickins
2020-09-11  2:50             ` Alex Shi
2020-09-12  2:13               ` Hugh Dickins
2020-09-13 14:21                 ` Alex Shi
2020-09-15  8:21                   ` Hugh Dickins
2020-09-15 16:58                     ` Daniel Jordan
2020-09-16 12:44                       ` Alex Shi
2020-09-17  2:37                       ` Alex Shi
2020-09-17 14:35                         ` Daniel Jordan
2020-09-17 15:39                           ` Alexander Duyck
2020-09-17 16:48                             ` Daniel Jordan
2020-09-12  8:38           ` Hugh Dickins
2020-09-13 14:22             ` Alex Shi
2020-09-09 16:11         ` Alexander Duyck
2020-09-10  0:32           ` Hugh Dickins
2020-09-10 14:24             ` Alexander Duyck
2020-09-12  5:12               ` Hugh Dickins
2020-08-25  7:21   ` [PATCH v18 00/32] per memcg lru_lock Michal Hocko
