All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>, Zi Yan <ziy@nvidia.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Shuang Zhai <zhais@google.com>
Subject: Re: [PATCH 2/3] mm: free zapped tail pages when splitting isolated thp
Date: Sun, 8 Aug 2021 11:49:33 -0600	[thread overview]
Message-ID: <CAOUHufZdefs-QQnKb_8M_KCrUHB4qurB6ULGOy3vc8A_R3gFPA@mail.gmail.com> (raw)
In-Reply-To: <CAHbLzkq4Bk2U8gEOum=uspwtjh=4ikoxdh7oJmyBLNvch8uvyA@mail.gmail.com>

On Wed, Aug 4, 2021 at 6:13 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Fri, Jul 30, 2021 at 11:39 PM Yu Zhao <yuzhao@google.com> wrote:
> >
> > If a tail page has only two references left, one inherited from the
> > isolation of its head and the other from lru_add_page_tail() which we
> > are about to drop, it means this tail page was concurrently zapped.
> > Then we can safely free it and save page reclaim or migration the
> > trouble of trying it.
> >
> > Signed-off-by: Yu Zhao <yuzhao@google.com>
> > Tested-by: Shuang Zhai <zhais@google.com>
> > ---
> >  include/linux/vm_event_item.h |  1 +
> >  mm/huge_memory.c              | 28 ++++++++++++++++++++++++++++
> >  mm/vmstat.c                   |  1 +
> >  3 files changed, 30 insertions(+)
> >
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index ae0dd1948c2b..829eeac84094 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -99,6 +99,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >                 THP_SPLIT_PUD,
> >  #endif
> > +               THP_SPLIT_FREE,
> >                 THP_ZERO_PAGE_ALLOC,
> >                 THP_ZERO_PAGE_ALLOC_FAILED,
> >                 THP_SWPOUT,
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d8b655856e79..5120478bca41 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2432,6 +2432,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >         struct address_space *swap_cache = NULL;
> >         unsigned long offset = 0;
> >         unsigned int nr = thp_nr_pages(head);
> > +       LIST_HEAD(pages_to_free);
> > +       int nr_pages_to_free = 0;
> >         int i;
> >
> >         VM_BUG_ON_PAGE(list && PageLRU(head), head);
> > @@ -2506,6 +2508,25 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >                         continue;
> >                 unlock_page(subpage);
> >
> > +               /*
> > +                * If a tail page has only two references left, one inherited
> > +                * from the isolation of its head and the other from
> > +                * lru_add_page_tail() which we are about to drop, it means this
> > +                * tail page was concurrently zapped. Then we can safely free it
> > +                * and save page reclaim or migration the trouble of trying it.
> > +                */
> > +               if (list && page_ref_freeze(subpage, 2)) {
> > +                       VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
> > +                       VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
> > +                       VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
> > +
> > +                       ClearPageActive(subpage);
> > +                       ClearPageUnevictable(subpage);
> > +                       list_move(&subpage->lru, &pages_to_free);
> > +                       nr_pages_to_free++;
> > +                       continue;
> > +               }
>
> Yes, such page could be freed instead of swapping out. But I'm
> wondering if we could have some simpler implementation. Since such
> pages will be re-added to page list, so we should be able to check
> their refcount in shrink_page_list(). If the refcount is 1, the
> refcount inc'ed by lru_add_page_tail() has been put by later
> put_page(), we know it is freed under us since the only refcount comes
> from isolation, we could just jump to "keep" (the label in
> shrink_page_list()), then such page will be freed later by
> shrink_inactive_list().
>
> For MADV_PAGEOUT, I think we could add some logic to handle such page
> after shrink_page_list(), just like what shrink_inactive_list() does.
>
> Migration already handles refcount == 1 page, so should not need any change.
>
> Is this idea feasible?

Yes, but then we would have to loop over the tail pages twice, here
and in shrink_page_list(), right?

In addition, if we try to freeze the refcount of a page in
shrink_page_list(), we couldn't be certain whether this page used to
be a tail page. So we would have to test every page. If a page wasn't
a tail page, it's unlikely for its refcount to drop unless there is a
race. But this patch isn't really intended to optimize such a race.
It's mainly for the next, i.e., we know there is a good chance to drop
tail pages (~10% on our systems). Sounds reasonable? Thanks.

> > +
> >                 /*
> >                  * Subpages may be freed if there wasn't any mapping
> >                  * like if add_to_swap() is running on a lru page that
> > @@ -2515,6 +2536,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
> >                  */
> >                 put_page(subpage);
> >         }
> > +
> > +       if (!nr_pages_to_free)
> > +               return;
> > +
> > +       mem_cgroup_uncharge_list(&pages_to_free);
> > +       free_unref_page_list(&pages_to_free);
> > +       count_vm_events(THP_SPLIT_FREE, nr_pages_to_free);
> >  }
> >
> >  int total_mapcount(struct page *page)
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index b0534e068166..f486e5d98d96 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1300,6 +1300,7 @@ const char * const vmstat_text[] = {
> >  #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >         "thp_split_pud",
> >  #endif
> > +       "thp_split_free",
> >         "thp_zero_page_alloc",
> >         "thp_zero_page_alloc_failed",
> >         "thp_swpout",
> > --
> > 2.32.0.554.ge1b32706d8-goog
> >

  reply	other threads:[~2021-08-08 17:49 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-07-31  6:39 [PATCH 0/3] mm: optimize thp for reclaim and migration Yu Zhao
2021-07-31  6:39 ` Yu Zhao
2021-07-31  6:39 ` [PATCH 1/3] mm: don't take lru lock when splitting isolated thp Yu Zhao
2021-07-31  6:39   ` Yu Zhao
2021-07-31  6:39 ` [PATCH 2/3] mm: free zapped tail pages " Yu Zhao
2021-07-31  6:39   ` Yu Zhao
2021-08-04 14:22   ` Kirill A. Shutemov
2021-08-08 17:28     ` Yu Zhao
2021-08-08 17:28       ` Yu Zhao
2021-08-05  0:13   ` Yang Shi
2021-08-05  0:13     ` Yang Shi
2021-08-08 17:49     ` Yu Zhao [this message]
2021-08-08 17:49       ` Yu Zhao
2021-08-11 22:25       ` Yang Shi
2021-08-11 22:25         ` Yang Shi
2021-08-11 23:12         ` Yu Zhao
2021-08-11 23:12           ` Yu Zhao
2021-08-13 23:24           ` Yang Shi
2021-08-13 23:24             ` Yang Shi
2021-08-13 23:56             ` Yu Zhao
2021-08-13 23:56               ` Yu Zhao
2021-08-14  0:30               ` Yang Shi
2021-08-14  0:30                 ` Yang Shi
2021-08-14  1:49                 ` Yu Zhao
2021-08-14  1:49                   ` Yu Zhao
2021-08-14  2:34                   ` Yang Shi
2021-08-14  2:34                     ` Yang Shi
2021-07-31  6:39 ` [PATCH 3/3] mm: don't remap clean subpages " Yu Zhao
2021-07-31  6:39   ` Yu Zhao
2021-07-31  9:53   ` kernel test robot
2021-07-31  9:53     ` kernel test robot
2021-07-31 15:45   ` kernel test robot
2021-07-31 15:45     ` kernel test robot
2021-08-03 11:25   ` Matthew Wilcox
2021-08-03 11:36   ` Matthew Wilcox
2021-08-08 17:21     ` Yu Zhao
2021-08-08 17:21       ` Yu Zhao
2021-08-04 14:27   ` Kirill A. Shutemov
2021-08-08 17:20     ` Yu Zhao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOUHufZdefs-QQnKb_8M_KCrUHB4qurB6ULGOy3vc8A_R3gFPA@mail.gmail.com \
    --to=yuzhao@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hughd@google.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=shy828301@gmail.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=zhais@google.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.