Linux-mm Archive on lore.kernel.org
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Alex Shi <alex.shi@linux.alibaba.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	 Tejun Heo <tj@kernel.org>, Hugh Dickins <hughd@google.com>,
	 Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
	Daniel Jordan <daniel.m.jordan@oracle.com>,
	 Yang Shi <yang.shi@linux.alibaba.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	kbuild test robot <lkp@intel.com>, linux-mm <linux-mm@kvack.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	cgroups@vger.kernel.org,  Shakeel Butt <shakeelb@google.com>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 Wei Yang <richard.weiyang@gmail.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	 Rong Chen <rong.a.chen@intel.com>,
	Michal Hocko <mhocko@kernel.org>,
	 Vladimir Davydov <vdavydov.dev@gmail.com>
Subject: Re: [PATCH v17 17/21] mm/lru: replace pgdat lru_lock with lruvec lock
Date: Tue, 28 Jul 2020 18:27:16 -0700
Message-ID: <CAKgT0UfCv9u3UaJnzh7CYu_nCggV8yesZNu4oxMGn4+mJYiFUw@mail.gmail.com>
In-Reply-To: <09aeced7-cc36-0c9a-d40b-451db9dc54cc@linux.alibaba.com>

On Tue, Jul 28, 2020 at 6:00 PM Alex Shi <alex.shi@linux.alibaba.com> wrote:
>
>
>
> On 2020/7/28 10:54 PM, Alexander Duyck wrote:
> > On Tue, Jul 28, 2020 at 4:20 AM Alex Shi <alex.shi@linux.alibaba.com> wrote:
> >>
> >>
> >>
> >> On 2020/7/28 7:34 AM, Alexander Duyck wrote:
> >>>> @@ -1876,6 +1876,12 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
> >>>>                  *                                        list_add(&page->lru,)
> >>>>                  *     list_add(&page->lru,) //corrupt
> >>>>                  */
> >>>> +               new_lruvec = mem_cgroup_page_lruvec(page, page_pgdat(page));
> >>>> +               if (new_lruvec != lruvec) {
> >>>> +                       if (lruvec)
> >>>> +                               spin_unlock_irq(&lruvec->lru_lock);
> >>>> +                       lruvec = lock_page_lruvec_irq(page);
> >>>> +               }
> >>>>                 SetPageLRU(page);
> >>>>
> >>>>                 if (unlikely(put_page_testzero(page))) {
> >>> I was going through the code of the entire patch set and I noticed
> >>> these changes in move_pages_to_lru. What is the reason for adding the
> >>> new_lruvec logic? My understanding is that we are moving the pages to
> >>> the lruvec provided, are we not? If so, why do we need to add code to get
> >>> a new lruvec? The code itself seems to stand out from the rest of the
> >>> patch as it is introducing new code instead of replacing existing
> >>> locking code, and it doesn't match up with the description of what
> >>> this function is supposed to do since it changes the lruvec.
> >>
> >> This new_lruvec is the replacement for the removed line shown in the following code:
> >>>> -               lruvec = mem_cgroup_page_lruvec(page, pgdat);
> >> This recheck is for a page that has moved to the root memcg; otherwise it causes the bug:
> >
> > Okay, now I see where the issue is. You moved this code so now it has
> > a different effect than it did before. You are relocking things before
> > you needed to. Don't forget that when you came into this function you
> > already had the lock. In addition the patch is broken as it currently
> > stands as you aren't using similar logic in the code just above this
> > addition if you encounter an unevictable page. As a result this is
> > really difficult to review as there are subtle bugs here.
>
> Why do you think it's a bug? The relock only happens if the locked lruvec is
> different, and it unlocks the old one.

The section I am talking about with the bug is this section here:
       while (!list_empty(list)) {
+               struct lruvec *new_lruvec = NULL;
+
                page = lru_to_page(list);
                VM_BUG_ON_PAGE(PageLRU(page), page);
                list_del(&page->lru);
                if (unlikely(!page_evictable(page))) {
-                       spin_unlock_irq(&pgdat->lru_lock);
+                       spin_unlock_irq(&lruvec->lru_lock);
                        putback_lru_page(page);
-                       spin_lock_irq(&pgdat->lru_lock);
+                       spin_lock_irq(&lruvec->lru_lock);
                        continue;
                }

Basically, it is probably not advisable to retake lruvec->lru_lock
directly, as the lruvec may have changed, so it would not be the
correct lock for the next page. It would make more sense to use your
API and call unlock_page_lruvec_irq and lock_page_lruvec_irq instead
of taking the lock directly.
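
As a rough illustration of that suggestion, here is a userspace sketch
(not kernel code, and not taken from the patch set). The lock is
modelled as a plain flag, and every sk_* name is hypothetical. The
assumption is that the unevictable path drops whatever lock is held and
forgets it, so the next page looks up and locks its own lruvec rather
than retaking a possibly stale one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-ins; these are NOT the kernel's definitions. */
struct sk_lruvec {
	bool locked;   /* models lru_lock; true while held */
	int nr_pages;  /* pages added to this lruvec's list */
};

struct sk_page {
	struct sk_lruvec *lruvec; /* lruvec the page currently belongs to */
	bool evictable;
};

/* Look up the page's current lruvec and lock that one. */
static struct sk_lruvec *sk_lock_page_lruvec(struct sk_page *page)
{
	struct sk_lruvec *lruvec = page->lruvec;

	assert(!lruvec->locked); /* catch double locking in the model */
	lruvec->locked = true;
	return lruvec;
}

static void sk_unlock_page_lruvec(struct sk_lruvec *lruvec)
{
	assert(lruvec->locked);
	lruvec->locked = false;
}

/*
 * Move pages to their lruvecs. 'held' is the lruvec locked on entry
 * (or NULL). Returns the lruvec still locked on exit (or NULL).
 */
static struct sk_lruvec *sk_move_pages(struct sk_page **pages, size_t n,
				       struct sk_lruvec *held)
{
	for (size_t i = 0; i < n; i++) {
		struct sk_page *page = pages[i];

		if (!page->evictable) {
			/* Drop the lock for the putback and forget it. */
			if (held) {
				sk_unlock_page_lruvec(held);
				held = NULL;
			}
			/* The putback itself would happen here, unlocked. */
			continue;
		}

		/* Relock only when this page needs a different lruvec. */
		if (page->lruvec != held) {
			if (held)
				sk_unlock_page_lruvec(held);
			held = sk_lock_page_lruvec(page);
		}
		held->nr_pages++;
	}
	return held;
}
```

The point of the model is only the lock discipline: no path retakes a
lock based on a pointer cached before the lock was dropped.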

> >
> > I suppose the correct fix is to get rid of this line, but it should
> > be placed everywhere the original function was calling
> > spin_lock_irq().
> >
> > In addition I would consider changing the arguments/documentation for
> > move_pages_to_lru. You aren't moving the pages to lruvec, so there is
> > probably no need to pass that as an argument. Instead I would pass
> > pgdat since that isn't going to be moving and is the only thing you
> > actually derive based on the original lruvec.
>
> Yes, the comments should be changed; that line was introduced long ago. :)
> Anyway, I am wondering if it is worth resending as a v18?

So I have been looking over the function itself and I wonder if it
isn't worth looking at rewriting this to optimize the locking behavior
to minimize the number of times we have to take the LRU lock. I have
some code I am working on that I plan to submit as an RFC in the next
day or so after I can get it smoke tested. The basic idea would be to
defer returning the unevictable pages or freeing the compound pages
until after we have processed the pages that can be moved while still
holding the lock. I would think it should reduce the lock contention
significantly while improving the throughput.
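
To make the deferral idea concrete, here is a toy userspace model (not
the RFC code, which is not shown in this message). All toy_* names are
hypothetical, the lock is a flag with a counter, and a single lruvec is
assumed for simplicity. It compares dropping and retaking the lock
around each unevictable page with a single locked pass that sets those
pages aside and returns them after the lock is released. Freeing of
compound pages is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: one lruvec, lock modelled as a flag. */
struct toy_lruvec {
	bool locked;
	int lock_count; /* how many times the lock was taken */
	int nr_pages;   /* pages moved onto the LRU */
};

struct toy_page {
	bool evictable;
	bool put_back;  /* set once an unevictable page was returned */
};

static void toy_lock(struct toy_lruvec *v)
{
	assert(!v->locked);
	v->locked = true;
	v->lock_count++;
}

static void toy_unlock(struct toy_lruvec *v)
{
	assert(v->locked);
	v->locked = false;
}

/* Stand-in for putback_lru_page(); must run without the lock held. */
static void toy_putback(struct toy_lruvec *v, struct toy_page *page)
{
	assert(!v->locked);
	page->put_back = true;
}

/* Existing shape: drop and retake the lock around every putback. */
static void toy_move_relock(struct toy_lruvec *v, struct toy_page *pages,
			    size_t n)
{
	toy_lock(v);
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].evictable) {
			toy_unlock(v);
			toy_putback(v, &pages[i]);
			toy_lock(v);
			continue;
		}
		v->nr_pages++;
	}
	toy_unlock(v);
}

/*
 * Deferred shape: one locked pass, then handle leftovers unlocked.
 * 'deferred' is caller-provided scratch space with room for n entries.
 */
static void toy_move_deferred(struct toy_lruvec *v, struct toy_page *pages,
			      size_t n, struct toy_page **deferred)
{
	size_t nr_deferred = 0;

	toy_lock(v);
	for (size_t i = 0; i < n; i++) {
		if (!pages[i].evictable)
			deferred[nr_deferred++] = &pages[i];
		else
			v->nr_pages++;
	}
	toy_unlock(v);

	for (size_t i = 0; i < nr_deferred; i++)
		toy_putback(v, deferred[i]);
}
```

In this model the relock variant takes the lock once plus once per
unevictable page, while the deferred variant takes it once per call.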



