From: Shakeel Butt
Date: Wed, 6 Nov 2019 18:51:22 -0800
Subject: Re: [PATCH 05/11] mm: vmscan: replace shrink_node() loop with a retry jump
To: Johannes Weiner
Cc: Andrew Morton, Andrey Ryabinin, Suren Baghdasaryan, Michal Hocko,
 Linux MM, Cgroups, LKML, Kernel Team
In-Reply-To: <20190603210746.15800-6-hannes@cmpxchg.org>
References: <20190603210746.15800-1-hannes@cmpxchg.org> <20190603210746.15800-6-hannes@cmpxchg.org>

On Mon, Jun 3, 2019 at 3:05 PM Johannes Weiner wrote:
>
> Most of the function body is inside a loop, which imposes an
> additional indentation and scoping level that makes the code a bit
> hard to follow and modify.
>
> The looping only happens in case of reclaim-compaction, which isn't
> the common case. So rather than adding yet another function level to
> the reclaim path and have every reclaim invocation go through a level
> that only exists for one specific cornercase, use a retry goto.
>
> Signed-off-by: Johannes Weiner

Reviewed-by: Shakeel Butt

> ---
>  mm/vmscan.c | 266 ++++++++++++++++++++++++++--------------------------
>  1 file changed, 133 insertions(+), 133 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index afd5e2432a8e..304974481146 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2672,164 +2672,164 @@ static bool pgdat_memcg_congested(pg_data_t *pgdat, struct mem_cgroup *memcg)
>  static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  {
>  	struct reclaim_state *reclaim_state = current->reclaim_state;
> +	struct mem_cgroup *root = sc->target_mem_cgroup;
> +	struct mem_cgroup_reclaim_cookie reclaim = {
> +		.pgdat = pgdat,
> +		.priority = sc->priority,
> +	};
>  	unsigned long nr_reclaimed, nr_scanned;
>  	bool reclaimable = false;
> +	struct mem_cgroup *memcg;
>
> -	do {
> -		struct mem_cgroup *root = sc->target_mem_cgroup;
> -		struct mem_cgroup_reclaim_cookie reclaim = {
> -			.pgdat = pgdat,
> -			.priority = sc->priority,
> -		};
> -		struct mem_cgroup *memcg;
> -
> -		memset(&sc->nr, 0, sizeof(sc->nr));
> +again:
> +	memset(&sc->nr, 0, sizeof(sc->nr));
>
> -		nr_reclaimed = sc->nr_reclaimed;
> -		nr_scanned = sc->nr_scanned;
> +	nr_reclaimed = sc->nr_reclaimed;
> +	nr_scanned = sc->nr_scanned;
>
> -		memcg = mem_cgroup_iter(root, NULL, &reclaim);
> -		do {
> -			unsigned long reclaimed;
> -			unsigned long scanned;
> +	memcg = mem_cgroup_iter(root, NULL, &reclaim);
> +	do {
> +		unsigned long reclaimed;
> +		unsigned long scanned;
>
> -			switch (mem_cgroup_protected(root, memcg)) {
> -			case MEMCG_PROT_MIN:
> -				/*
> -				 * Hard protection.
> -				 * If there is no reclaimable memory, OOM.
> -				 */
> +		switch (mem_cgroup_protected(root, memcg)) {
> +		case MEMCG_PROT_MIN:
> +			/*
> +			 * Hard protection.
> +			 * If there is no reclaimable memory, OOM.
> +			 */
> +			continue;
> +		case MEMCG_PROT_LOW:
> +			/*
> +			 * Soft protection.
> +			 * Respect the protection only as long as
> +			 * there is an unprotected supply
> +			 * of reclaimable memory from other cgroups.
> +			 */
> +			if (!sc->memcg_low_reclaim) {
> +				sc->memcg_low_skipped = 1;
>  				continue;
> -			case MEMCG_PROT_LOW:
> -				/*
> -				 * Soft protection.
> -				 * Respect the protection only as long as
> -				 * there is an unprotected supply
> -				 * of reclaimable memory from other cgroups.
> -				 */
> -				if (!sc->memcg_low_reclaim) {
> -					sc->memcg_low_skipped = 1;
> -					continue;
> -				}
> -				memcg_memory_event(memcg, MEMCG_LOW);
> -				break;
> -			case MEMCG_PROT_NONE:
> -				/*
> -				 * All protection thresholds breached. We may
> -				 * still choose to vary the scan pressure
> -				 * applied based on by how much the cgroup in
> -				 * question has exceeded its protection
> -				 * thresholds (see get_scan_count).
> -				 */
> -				break;
>  			}
> -
> -			reclaimed = sc->nr_reclaimed;
> -			scanned = sc->nr_scanned;
> -			shrink_node_memcg(pgdat, memcg, sc);
> -
> -			if (sc->may_shrinkslab) {
> -				shrink_slab(sc->gfp_mask, pgdat->node_id,
> -					    memcg, sc->priority);
> -			}
> -
> -			/* Record the group's reclaim efficiency */
> -			vmpressure(sc->gfp_mask, memcg, false,
> -				   sc->nr_scanned - scanned,
> -				   sc->nr_reclaimed - reclaimed);
> -
> +			memcg_memory_event(memcg, MEMCG_LOW);
> +			break;
> +		case MEMCG_PROT_NONE:
>  			/*
> -			 * Kswapd have to scan all memory cgroups to fulfill
> -			 * the overall scan target for the node.
> -			 *
> -			 * Limit reclaim, on the other hand, only cares about
> -			 * nr_to_reclaim pages to be reclaimed and it will
> -			 * retry with decreasing priority if one round over the
> -			 * whole hierarchy is not sufficient.
> +			 * All protection thresholds breached. We may
> +			 * still choose to vary the scan pressure
> +			 * applied based on by how much the cgroup in
> +			 * question has exceeded its protection
> +			 * thresholds (see get_scan_count).
>  			 */
> -			if (!current_is_kswapd() &&
> -			    sc->nr_reclaimed >= sc->nr_to_reclaim) {
> -				mem_cgroup_iter_break(root, memcg);
> -				break;
> -			}
> -		} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
> +			break;
> +		}
> +
> +		reclaimed = sc->nr_reclaimed;
> +		scanned = sc->nr_scanned;
> +		shrink_node_memcg(pgdat, memcg, sc);
>
> -		if (reclaim_state) {
> -			sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> -			reclaim_state->reclaimed_slab = 0;
> +		if (sc->may_shrinkslab) {
> +			shrink_slab(sc->gfp_mask, pgdat->node_id,
> +				    memcg, sc->priority);
>  		}
>
> -		/* Record the subtree's reclaim efficiency */
> -		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> -			   sc->nr_scanned - nr_scanned,
> -			   sc->nr_reclaimed - nr_reclaimed);
> +		/* Record the group's reclaim efficiency */
> +		vmpressure(sc->gfp_mask, memcg, false,
> +			   sc->nr_scanned - scanned,
> +			   sc->nr_reclaimed - reclaimed);
>
> -		if (sc->nr_reclaimed - nr_reclaimed)
> -			reclaimable = true;
> +		/*
> +		 * Kswapd have to scan all memory cgroups to fulfill
> +		 * the overall scan target for the node.
> +		 *
> +		 * Limit reclaim, on the other hand, only cares about
> +		 * nr_to_reclaim pages to be reclaimed and it will
> +		 * retry with decreasing priority if one round over the
> +		 * whole hierarchy is not sufficient.
> +		 */
> +		if (!current_is_kswapd() &&
> +		    sc->nr_reclaimed >= sc->nr_to_reclaim) {
> +			mem_cgroup_iter_break(root, memcg);
> +			break;
> +		}
> +	} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));
>
> -		if (current_is_kswapd()) {
> -			/*
> -			 * If reclaim is isolating dirty pages under writeback,
> -			 * it implies that the long-lived page allocation rate
> -			 * is exceeding the page laundering rate. Either the
> -			 * global limits are not being effective at throttling
> -			 * processes due to the page distribution throughout
> -			 * zones or there is heavy usage of a slow backing
> -			 * device. The only option is to throttle from reclaim
> -			 * context which is not ideal as there is no guarantee
> -			 * the dirtying process is throttled in the same way
> -			 * balance_dirty_pages() manages.
> -			 *
> -			 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
> -			 * count the number of pages under pages flagged for
> -			 * immediate reclaim and stall if any are encountered
> -			 * in the nr_immediate check below.
> -			 */
> -			if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
> -				set_bit(PGDAT_WRITEBACK, &pgdat->flags);
> +	if (reclaim_state) {
> +		sc->nr_reclaimed += reclaim_state->reclaimed_slab;
> +		reclaim_state->reclaimed_slab = 0;
> +	}
>
> -			/*
> -			 * Tag a node as congested if all the dirty pages
> -			 * scanned were backed by a congested BDI and
> -			 * wait_iff_congested will stall.
> -			 */
> -			if (sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> -				set_bit(PGDAT_CONGESTED, &pgdat->flags);
> +	/* Record the subtree's reclaim efficiency */
> +	vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
> +		   sc->nr_scanned - nr_scanned,
> +		   sc->nr_reclaimed - nr_reclaimed);
>
> -			/* Allow kswapd to start writing pages during reclaim.*/
> -			if (sc->nr.unqueued_dirty == sc->nr.file_taken)
> -				set_bit(PGDAT_DIRTY, &pgdat->flags);
> +	if (sc->nr_reclaimed - nr_reclaimed)
> +		reclaimable = true;
>
> -			/*
> -			 * If kswapd scans pages marked marked for immediate
> -			 * reclaim and under writeback (nr_immediate), it
> -			 * implies that pages are cycling through the LRU
> -			 * faster than they are written so also forcibly stall.
> -			 */
> -			if (sc->nr.immediate)
> -				congestion_wait(BLK_RW_ASYNC, HZ/10);
> -		}
> +	if (current_is_kswapd()) {
> +		/*
> +		 * If reclaim is isolating dirty pages under writeback,
> +		 * it implies that the long-lived page allocation rate
> +		 * is exceeding the page laundering rate. Either the
> +		 * global limits are not being effective at throttling
> +		 * processes due to the page distribution throughout
> +		 * zones or there is heavy usage of a slow backing
> +		 * device. The only option is to throttle from reclaim
> +		 * context which is not ideal as there is no guarantee
> +		 * the dirtying process is throttled in the same way
> +		 * balance_dirty_pages() manages.
> +		 *
> +		 * Once a node is flagged PGDAT_WRITEBACK, kswapd will
> +		 * count the number of pages under pages flagged for
> +		 * immediate reclaim and stall if any are encountered
> +		 * in the nr_immediate check below.
> +		 */
> +		if (sc->nr.writeback && sc->nr.writeback == sc->nr.taken)
> +			set_bit(PGDAT_WRITEBACK, &pgdat->flags);
>
>  		/*
> -		 * Legacy memcg will stall in page writeback so avoid forcibly
> -		 * stalling in wait_iff_congested().
> +		 * Tag a node as congested if all the dirty pages
> +		 * scanned were backed by a congested BDI and
> +		 * wait_iff_congested will stall.
>  		 */
> -		if (cgroup_reclaim(sc) && writeback_working(sc) &&
> -		    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> -			set_memcg_congestion(pgdat, root, true);
> +		if (sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> +			set_bit(PGDAT_CONGESTED, &pgdat->flags);
> +
> +		/* Allow kswapd to start writing pages during reclaim.*/
> +		if (sc->nr.unqueued_dirty == sc->nr.file_taken)
> +			set_bit(PGDAT_DIRTY, &pgdat->flags);
>
>  		/*
> -		 * Stall direct reclaim for IO completions if underlying BDIs
> -		 * and node is congested. Allow kswapd to continue until it
> -		 * starts encountering unqueued dirty pages or cycling through
> -		 * the LRU too quickly.
> +		 * If kswapd scans pages marked marked for immediate
> +		 * reclaim and under writeback (nr_immediate), it
> +		 * implies that pages are cycling through the LRU
> +		 * faster than they are written so also forcibly stall.
>  		 */
> -		if (!sc->hibernation_mode && !current_is_kswapd() &&
> -		    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> -			wait_iff_congested(BLK_RW_ASYNC, HZ/10);
> +		if (sc->nr.immediate)
> +			congestion_wait(BLK_RW_ASYNC, HZ/10);
> +	}
> +
> +	/*
> +	 * Legacy memcg will stall in page writeback so avoid forcibly
> +	 * stalling in wait_iff_congested().
> +	 */
> +	if (cgroup_reclaim(sc) && writeback_working(sc) &&
> +	    sc->nr.dirty && sc->nr.dirty == sc->nr.congested)
> +		set_memcg_congestion(pgdat, root, true);
> +
> +	/*
> +	 * Stall direct reclaim for IO completions if underlying BDIs
> +	 * and node is congested. Allow kswapd to continue until it
> +	 * starts encountering unqueued dirty pages or cycling through
> +	 * the LRU too quickly.
> +	 */
> +	if (!sc->hibernation_mode && !current_is_kswapd() &&
> +	    current_may_throttle() && pgdat_memcg_congested(pgdat, root))
> +		wait_iff_congested(BLK_RW_ASYNC, HZ/10);
>
> -	} while (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> -					 sc->nr_scanned - nr_scanned, sc));
> +	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed,
> +				    sc->nr_scanned - nr_scanned, sc))
> +		goto again;
>
>  	/*
>  	 * Kswapd gives up on balancing particular nodes after too
> --
> 2.21.0
>
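
The shape of the conversion is easier to see in miniature than in a
266-line diff. Below is a minimal sketch of the same do/while-to-goto
transformation; shrink_before(), shrink_after() and the integer budget
are made-up stand-ins for illustration, not code from the patch:

/* Before: the whole body sits one indent level deep inside the loop. */
void shrink_before(int budget)
{
	do {
		budget--;		/* every statement pays the extra indent */
	} while (budget > 0);		/* only the uncommon case ever loops */
}

/* After: the body stays flat; the rare retry case jumps back up. */
void shrink_after(int budget)
{
again:
	budget--;

	if (budget > 0)			/* uncommon corner case: retry */
		goto again;
}

int main(void)
{
	shrink_before(3);
	shrink_after(3);		/* both drain the budget the same way */
	return 0;
}

Note that the patch also hoists the locals that used to be declared inside
the old loop body (root, the reclaim cookie, memcg) up to function scope,
so they are set up once ahead of the "again:" label instead of on every
pass through the retry path.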