linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	NeilBrown <neilb@suse.de>, "Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	"Darrick J . Wong" <djwong@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Dave Chinner <david@fromorbit.com>,
	Rik van Riel <riel@surriel.com>, Vlastimil Babka <vbabka@suse.cz>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>, Linux-MM <linux-mm@kvack.org>,
	Linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated
Date: Tue, 19 Oct 2021 10:12:08 -0700	[thread overview]
Message-ID: <CAHbLzkrpuB=YTHVXDdVUhqh11AiU2W53f8QsYetAEigEiA+m+w@mail.gmail.com> (raw)
In-Reply-To: <20211019090108.25501-3-mgorman@techsingularity.net>

On Tue, Oct 19, 2021 at 2:01 AM Mel Gorman <mgorman@techsingularity.net> wrote:
>
> Page reclaim throttles on congestion if too many parallel reclaim instances
> have isolated too many pages. This makes no sense, excessive parallelisation
> has nothing to do with writeback or congestion.
>
> This patch creates an additional workqueue to sleep on when too many
> pages are isolated. The throttled tasks are woken when the number
> of isolated pages is reduced or a timeout occurs. There may be
> some false positive wakeups for GFP_NOIO/GFP_NOFS callers but
> the tasks will throttle again if necessary.
>
> [shy828301@gmail.com: Wake up from compaction context]

Reviewed-by: Yang Shi <shy828301@gmail.com>

> [vbabka@suse.cz: Account number of throttled tasks only for writeback]
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  include/linux/mmzone.h        |  6 ++++--
>  include/trace/events/vmscan.h |  4 +++-
>  mm/compaction.c               | 10 ++++++++--
>  mm/internal.h                 | 13 ++++++++++++-
>  mm/page_alloc.c               |  6 +++++-
>  mm/vmscan.c                   | 28 +++++++++++++++++++---------
>  6 files changed, 51 insertions(+), 16 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index ef0a63ebd21d..58a25d42c31c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -275,6 +275,8 @@ enum lru_list {
>
>  enum vmscan_throttle_state {
>         VMSCAN_THROTTLE_WRITEBACK,
> +       VMSCAN_THROTTLE_ISOLATED,
> +       NR_VMSCAN_THROTTLE,
>  };
>
>  #define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
> @@ -846,8 +848,8 @@ typedef struct pglist_data {
>         int node_id;
>         wait_queue_head_t kswapd_wait;
>         wait_queue_head_t pfmemalloc_wait;
> -       wait_queue_head_t reclaim_wait; /* wq for throttling reclaim */
> -       atomic_t nr_reclaim_throttled;  /* nr of throtted tasks */
> +       wait_queue_head_t reclaim_wait[NR_VMSCAN_THROTTLE];
> +       atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */
>         unsigned long nr_reclaim_start; /* nr pages written while throttled
>                                          * when throttling started. */
>         struct task_struct *kswapd;     /* Protected by
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index c317f9fe0d17..d4905bd9e9c4 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -28,10 +28,12 @@
>                 ) : "RECLAIM_WB_NONE"
>
>  #define _VMSCAN_THROTTLE_WRITEBACK     (1 << VMSCAN_THROTTLE_WRITEBACK)
> +#define _VMSCAN_THROTTLE_ISOLATED      (1 << VMSCAN_THROTTLE_ISOLATED)
>
>  #define show_throttle_flags(flags)                                             \
>         (flags) ? __print_flags(flags, "|",                                     \
> -               {_VMSCAN_THROTTLE_WRITEBACK,    "VMSCAN_THROTTLE_WRITEBACK"}    \
> +               {_VMSCAN_THROTTLE_WRITEBACK,    "VMSCAN_THROTTLE_WRITEBACK"},   \
> +               {_VMSCAN_THROTTLE_ISOLATED,     "VMSCAN_THROTTLE_ISOLATED"}     \
>                 ) : "VMSCAN_THROTTLE_NONE"
>
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index bfc93da1c2c7..7359093d8ac0 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -761,6 +761,8 @@ isolate_freepages_range(struct compact_control *cc,
>  /* Similar to reclaim, but different enough that they don't share logic */
>  static bool too_many_isolated(pg_data_t *pgdat)
>  {
> +       bool too_many;
> +
>         unsigned long active, inactive, isolated;
>
>         inactive = node_page_state(pgdat, NR_INACTIVE_FILE) +
> @@ -770,7 +772,11 @@ static bool too_many_isolated(pg_data_t *pgdat)
>         isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
>                         node_page_state(pgdat, NR_ISOLATED_ANON);
>
> -       return isolated > (inactive + active) / 2;
> +       too_many = isolated > (inactive + active) / 2;
> +       if (!too_many)
> +               wake_throttle_isolated(pgdat);
> +
> +       return too_many;
>  }
>
>  /**
> @@ -822,7 +828,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>                 if (cc->mode == MIGRATE_ASYNC)
>                         return -EAGAIN;
>
> -               congestion_wait(BLK_RW_ASYNC, HZ/10);
> +               reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
>
>                 if (fatal_signal_pending(current))
>                         return -EINTR;
> diff --git a/mm/internal.h b/mm/internal.h
> index 90764d646e02..3461a1055975 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -39,12 +39,21 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>  static inline void acct_reclaim_writeback(struct page *page)
>  {
>         pg_data_t *pgdat = page_pgdat(page);
> -       int nr_throttled = atomic_read(&pgdat->nr_reclaim_throttled);
> +       int nr_throttled = atomic_read(&pgdat->nr_writeback_throttled);
>
>         if (nr_throttled)
>                 __acct_reclaim_writeback(pgdat, page, nr_throttled);
>  }
>
> +static inline void wake_throttle_isolated(pg_data_t *pgdat)
> +{
> +       wait_queue_head_t *wqh;
> +
> +       wqh = &pgdat->reclaim_wait[VMSCAN_THROTTLE_ISOLATED];
> +       if (waitqueue_active(wqh))
> +               wake_up_all(wqh);
> +}
> +
>  vm_fault_t do_swap_page(struct vm_fault *vmf);
>
>  void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
> @@ -120,6 +129,8 @@ extern unsigned long highest_memmap_pfn;
>   */
>  extern int isolate_lru_page(struct page *page);
>  extern void putback_lru_page(struct page *page);
> +extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
> +                                                               long timeout);
>
>  /*
>   * in mm/rmap.c:
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d849ddfc1e51..78e538067651 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7389,6 +7389,8 @@ static void pgdat_init_kcompactd(struct pglist_data *pgdat) {}
>
>  static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>  {
> +       int i;
> +
>         pgdat_resize_init(pgdat);
>
>         pgdat_init_split_queue(pgdat);
> @@ -7396,7 +7398,9 @@ static void __meminit pgdat_init_internals(struct pglist_data *pgdat)
>
>         init_waitqueue_head(&pgdat->kswapd_wait);
>         init_waitqueue_head(&pgdat->pfmemalloc_wait);
> -       init_waitqueue_head(&pgdat->reclaim_wait);
> +
> +       for (i = 0; i < NR_VMSCAN_THROTTLE; i++)
> +               init_waitqueue_head(&pgdat->reclaim_wait[i]);
>
>         pgdat_page_ext_init(pgdat);
>         lruvec_init(&pgdat->__lruvec);
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 735b1f2b5d9e..29434d4fc1c7 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1006,12 +1006,12 @@ static void handle_write_error(struct address_space *mapping,
>         unlock_page(page);
>  }
>
> -static void
> -reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
> +void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>                                                         long timeout)
>  {
> -       wait_queue_head_t *wqh = &pgdat->reclaim_wait;
> +       wait_queue_head_t *wqh = &pgdat->reclaim_wait[reason];
>         long ret;
> +       bool acct_writeback = (reason == VMSCAN_THROTTLE_WRITEBACK);
>         DEFINE_WAIT(wait);
>
>         /*
> @@ -1023,7 +1023,8 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>             current->flags & (PF_IO_WORKER|PF_KTHREAD))
>                 return;
>
> -       if (atomic_inc_return(&pgdat->nr_reclaim_throttled) == 1) {
> +       if (acct_writeback &&
> +           atomic_inc_return(&pgdat->nr_writeback_throttled) == 1) {
>                 WRITE_ONCE(pgdat->nr_reclaim_start,
>                         node_page_state(pgdat, NR_THROTTLED_WRITTEN));
>         }
> @@ -1031,7 +1032,9 @@ reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason,
>         prepare_to_wait(wqh, &wait, TASK_UNINTERRUPTIBLE);
>         ret = schedule_timeout(timeout);
>         finish_wait(wqh, &wait);
> -       atomic_dec(&pgdat->nr_reclaim_throttled);
> +
> +       if (acct_writeback)
> +               atomic_dec(&pgdat->nr_writeback_throttled);
>
>         trace_mm_vmscan_throttled(pgdat->node_id, jiffies_to_usecs(timeout),
>                                 jiffies_to_usecs(timeout - ret),
> @@ -1061,7 +1064,7 @@ void __acct_reclaim_writeback(pg_data_t *pgdat, struct page *page,
>                 READ_ONCE(pgdat->nr_reclaim_start);
>
>         if (nr_written > SWAP_CLUSTER_MAX * nr_throttled)
> -               wake_up_all(&pgdat->reclaim_wait);
> +               wake_up_all(&pgdat->reclaim_wait[VMSCAN_THROTTLE_WRITEBACK]);
>  }
>
>  /* possible outcome of pageout() */
> @@ -2176,6 +2179,7 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
>                 struct scan_control *sc)
>  {
>         unsigned long inactive, isolated;
> +       bool too_many;
>
>         if (current_is_kswapd())
>                 return 0;
> @@ -2199,7 +2203,13 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
>         if ((sc->gfp_mask & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS))
>                 inactive >>= 3;
>
> -       return isolated > inactive;
> +       too_many = isolated > inactive;
> +
> +       /* Wake up tasks throttled due to too_many_isolated. */
> +       if (!too_many)
> +               wake_throttle_isolated(pgdat);
> +
> +       return too_many;
>  }
>
>  /*
> @@ -2308,8 +2318,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>                         return 0;
>
>                 /* wait a bit for the reclaimer. */
> -               msleep(100);
>                 stalled = true;
> +               reclaim_throttle(pgdat, VMSCAN_THROTTLE_ISOLATED, HZ/10);
>
>                 /* We are about to die and free our memory. Return now. */
>                 if (fatal_signal_pending(current))
> @@ -4343,7 +4353,7 @@ static int kswapd(void *p)
>
>         WRITE_ONCE(pgdat->kswapd_order, 0);
>         WRITE_ONCE(pgdat->kswapd_highest_zoneidx, MAX_NR_ZONES);
> -       atomic_set(&pgdat->nr_reclaim_throttled, 0);
> +       atomic_set(&pgdat->nr_writeback_throttled, 0);
>         for ( ; ; ) {
>                 bool ret;
>
> --
> 2.31.1
>
>

  reply	other threads:[~2021-10-19 17:14 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-19  9:01 [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-19  9:01 ` [PATCH 1/8] mm/vmscan: Throttle reclaim until some writeback completes if congested Mel Gorman
2021-10-19  9:01 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-19 17:12   ` Yang Shi [this message]
2021-10-19  9:01 ` [PATCH 3/8] mm/vmscan: Throttle reclaim when no progress is being made Mel Gorman
2021-10-19  9:01 ` [PATCH 4/8] mm/writeback: Throttle based on page writeback instead of congestion Mel Gorman
2021-10-19  9:01 ` [PATCH 5/8] mm/page_alloc: Remove the throttling logic from the page allocator Mel Gorman
2021-10-19  9:20   ` Vlastimil Babka
2021-10-19  9:01 ` [PATCH 6/8] mm/vmscan: Centralise timeout values for reclaim_throttle Mel Gorman
2021-10-22  1:06   ` NeilBrown
2021-10-22  8:12     ` Mel Gorman
2021-10-19  9:01 ` [PATCH 7/8] mm/vmscan: Increase the timeout if page reclaim is not making progress Mel Gorman
2021-10-22  1:07   ` NeilBrown
2021-10-22  8:14     ` Mel Gorman
2021-10-19  9:01 ` [PATCH 8/8] mm/vmscan: Delay waking of tasks throttled on NOPROGRESS Mel Gorman
2021-10-19 22:00 ` [PATCH v4 0/8] Remove dependency on congestion_wait in mm/ Andrew Morton
2021-10-20  8:44   ` Mel Gorman
2021-10-22  1:15 ` NeilBrown
2021-10-22  8:39   ` Mel Gorman
2021-10-22 11:26     ` NeilBrown
2021-10-22 13:17       ` Mel Gorman
2021-10-27  0:43         ` NeilBrown
2021-10-27 10:13           ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2021-10-22 14:46 [PATCH v5 " Mel Gorman
2021-10-22 14:46 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-08 13:53 [PATCH v3 0/8] Remove dependency on congestion_wait in mm/ Mel Gorman
2021-10-08 13:53 ` [PATCH 2/8] mm/vmscan: Throttle reclaim and compaction when too may pages are isolated Mel Gorman
2021-10-14  8:06   ` Vlastimil Babka
2021-10-14 11:56     ` Mel Gorman
2021-10-14 15:44       ` Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHbLzkrpuB=YTHVXDdVUhqh11AiU2W53f8QsYetAEigEiA+m+w@mail.gmail.com' \
    --to=shy828301@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=neilb@suse.de \
    --cc=riel@surriel.com \
    --cc=tytso@mit.edu \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).