Linux-mm Archive on lore.kernel.org
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	 Tejun Heo <tj@kernel.org>, Hugh Dickins <hughd@google.com>,
	 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	 Yang Shi <yang.shi@linux.alibaba.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	kbuild test robot <lkp@intel.com>, linux-mm <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org,  Shakeel Butt <shakeelb@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Wei Yang <richard.weiyang@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	 Rong Chen <rong.a.chen@intel.com>
Subject: Re: [PATCH v17 14/21] mm/compaction: do page isolation first in compaction
Date: Tue, 11 Aug 2020 07:47:17 -0700
Message-ID: <CAKgT0Ues0ShkSbb1XtA7z7EYB8NCPgLGq8zZUjrXK_jcWn8mDQ@mail.gmail.com>
In-Reply-To: <d9818e06-95f1-9f21-05c0-98f29ea96d89@linux.alibaba.com>

On Tue, Aug 11, 2020 at 1:23 AM Alex Shi <alex.shi@linux.alibaba.com> wrote:
>
>
>
> On 2020/8/10 at 10:41 PM, Alexander Duyck wrote:
> > On Mon, Aug 10, 2020 at 6:10 AM Alex Shi <alex.shi@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2020/8/7 at 10:51 PM, Alexander Duyck wrote:
> >>> I wonder if this entire section shouldn't be restructured. This is the
> >>> only spot I can see where we are resetting the LRU flag instead of
> >>> pulling the page from the LRU list with the lock held. Looking over
> >>> the code it seems like something like that should be possible. I am
> >>> not sure the LRU lock is really protecting us in either the
> >>> PageCompound check or the skip bits. It seems like holding a
> >>> reference on the page should prevent it from switching between
> >>> compound or not, and the skip bits are per pageblock with the LRU bits
> >>> being per node/memcg which I would think implies that we could have
> >>> multiple LRU locks that could apply to a single skip bit.
> >>
> >> Hi Alexander,
> >>
> >> I haven't found a problem yet with the compound or skip bit usage. Would you
> >> clarify the issue you are concerned about?
> >>
> >> Thanks!
> >
> > The point I was getting at is that the LRU lock is being used to
> > protect these and with your changes I don't think that makes sense
> > anymore.
> >
> > The skip bits are per-pageblock bits. With your change the LRU lock is
> > now per memcg first and then per node. As such I do not believe it
> > really provides any sort of exclusive access to the skip bits. I still
> > have to look into this more, but it seems like you need a lock per
> > either section or zone that can be used to protect those bits and deal
> > with this sooner rather than waiting until you have found an LRU page.
> > The one part that is confusing though is that the definition of the
> > skip bits seems to call out that they are a hint since they are not
> > protected by a lock, but that is exactly what has been happening here.
> >
>
> The skip bits are safe here, since even if it races with another skip action,
> it will still skip out. The skip action tries to avoid compacting too much;
> it is not an exclusive action that needs to avoid races.

That would be the case if the skip bits didn't have the impact that they
currently do on the compaction process. What I am getting at is that a
race was introduced when you placed this test between the clearing of
the LRU flag and the actual pulling of the page from the LRU list. If
you tested the skip bits before clearing the LRU flag I would be okay
with the code. However, because the test triggers an abort after the
LRU flag is cleared, you are creating a situation where multiple
processes will be stomping all over each other: each thread can
essentially take a page via the LRU flag, but only one thread will
process a page, and it could skip over all other pages that
preemptively had their LRU flag cleared.

If you take a look at test_and_set_skip(), the function only acts on
the pageblock-aligned PFN for a given range. With the changes you have
in place now that would mean that only one thread would ever actually
call this function anyway since the first PFN would take the LRU flag
so no other thread could follow through and test or set the bit as
well. The expectation before was that all threads would encounter this
test and either proceed after setting the bit for the first PFN or
abort after testing the first PFN. With your changes only the first
thread actually runs this test and then it and the others will likely
encounter multiple failures as they are all clearing LRU bits
simultaneously and tripping each other up. That is why the skip bit
must have a test and set done before you even get to the point of
clearing the LRU flag.
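
As a rough illustration of the ordering argued for above, here is a
user-space sketch, not kernel code. Everything in it (struct model_page,
struct model_pageblock, model_test_and_set_skip(), model_test_clear_lru(),
model_isolate_block()) is a hypothetical stand-in. It assumes one skip
bit per pageblock and one LRU flag per page, and it leaves out the
pageblock-alignment check and the compact_control hints that the real
test_and_set_skip() consults.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; these are not the kernel's types. */
struct model_page {
	atomic_bool lru;	/* stands in for the PageLRU flag */
};

struct model_pageblock {
	atomic_bool skip;	/* stands in for the pageblock skip bit */
};

/* Set the skip bit; return true if it was already set (caller aborts). */
static bool model_test_and_set_skip(struct model_pageblock *pb)
{
	return atomic_exchange(&pb->skip, true);
}

/* Clear the LRU flag; return true if this caller was the one to clear it. */
static bool model_test_clear_lru(struct model_page *page)
{
	return atomic_exchange(&page->lru, false);
}

/*
 * Decide whether to skip the block before touching any LRU flag, so a
 * scanner that aborts leaves every page's flag exactly as it found it.
 * Returns the number of pages this caller took off the model LRU.
 */
static size_t model_isolate_block(struct model_pageblock *pb,
				  struct model_page *pages, size_t nr)
{
	size_t isolated = 0;

	if (model_test_and_set_skip(pb))
		return 0;	/* another scanner has the block */

	for (size_t i = 0; i < nr; i++) {
		if (model_test_clear_lru(&pages[i]))
			isolated++;
	}
	return isolated;
}
```

In this model a second scanner that reaches the same block returns
before it has cleared any LRU flag, so it cannot leave pages stranded
with the flag cleared but never isolated.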

> > The point I was getting at with the PageCompound check is that instead
> > of needing the LRU lock you should be able to look at PageCompound as
> > soon as you call get_page_unless_zero() and preempt the need to set
> > the LRU bit again. Instead of trying to rely on the LRU lock to
> > guarantee that the page hasn't been merged you could just rely on the
> > fact that you are holding a reference to it so it isn't going to
> > switch between being compound or order 0 since it cannot be freed. It
> > spoils the idea I originally had of combining the logic for
> > get_page_unless_zero and TestClearPageLRU into a single function, but
> > the advantage is you aren't clearing the LRU flag unless you are
> > actually going to pull the page from the LRU list.
>
> Sorry, I still cannot follow you here. The compound code part is unchanged
> and follows the original logic. So would you like to post new code to
> see if it works?

No, there are significant changes as you reordered all of the
operations. Prior to your change the LRU bit was checked, but not
cleared before testing for PageCompound. Now you are clearing it
before you are testing if it is a compound page. So if compaction is
running we will be seeing the pages in the LRU stay put, but the
compound bit flickering off and on if the compound page is encountered
with the wrong or NULL lruvec. What I was suggesting is that the
PageCompound test probably doesn't need to be concerned with the lock
after your changes. You could test it after you call
get_page_unless_zero() and before you call
__isolate_lru_page_prepare(). Instead of relying on the LRU lock to
protect us from the page switching between compound and not, we would
be relying on the fact that we are holding a reference to the page, so
it should not be freed and transition between compound and not.
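
A rough user-space sketch of that ordering follows. It is not kernel
code: struct sketch_page, sketch_get_page_unless_zero(),
sketch_put_page() and sketch_try_isolate() are hypothetical names, and
the sketch simply assumes, as argued above, that the compound state
cannot change while a reference is held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of a page; not the kernel's struct page. */
struct sketch_page {
	atomic_int refcount;	/* 0 means the page is free */
	bool compound;		/* assumed stable while a reference is held */
	atomic_bool lru;	/* stands in for the PageLRU flag */
};

enum sketch_result {
	SKETCH_ISOLATED,
	SKETCH_SKIP_FREE,
	SKETCH_SKIP_COMPOUND,
	SKETCH_SKIP_NOT_LRU,
};

/* Take a reference only if the count is not already zero. */
static bool sketch_get_page_unless_zero(struct sketch_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure, old is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void sketch_put_page(struct sketch_page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/* Check for a compound page after taking the reference, before the LRU flag. */
static enum sketch_result sketch_try_isolate(struct sketch_page *page)
{
	if (!sketch_get_page_unless_zero(page))
		return SKETCH_SKIP_FREE;

	if (page->compound) {
		/* Bail out without ever having cleared the LRU flag. */
		sketch_put_page(page);
		return SKETCH_SKIP_COMPOUND;
	}

	if (!atomic_exchange(&page->lru, false)) {
		sketch_put_page(page);
		return SKETCH_SKIP_NOT_LRU;
	}

	/* The caller now holds a reference and owns the cleared LRU flag. */
	return SKETCH_ISOLATED;
}
```

With this ordering the compound case never clears the LRU flag, so
there is nothing to set back afterwards.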

