From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20200501135806.4eebf0b92f84ab60bba3e1e7@linux-foundation.org>
 <20200519075213.GF32497@dhcp22.suse.cz>
 <20200519084535.GG32497@dhcp22.suse.cz>
 <20200520190906.GA558281@chrisdown.name>
 <20200521095515.GK6462@dhcp22.suse.cz>
 <20200521163450.GV6462@dhcp22.suse.cz>
In-Reply-To: <20200521163450.GV6462@dhcp22.suse.cz>
From: Naresh Kamboju
Date: Wed, 17 Jun 2020 19:07:20 +0530
Subject: Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page
To: Michal Hocko, Chris Down, Yafang Shao
Cc: Anders Roxell, "Linux F2FS DEV, Mailing List", linux-ext4, linux-block,
 Andrew Morton, open list, Linux-Next Mailing List, linux-mm, Arnd Bergmann,
 Andreas Dilger, Jaegeuk Kim, "Theodore Ts'o", Chao Yu, Hugh Dickins,
 Andrea Arcangeli, Matthew Wilcox, Chao Yu, lkft-triage@lists.linaro.org,
 Johannes Weiner, Roman Gushchin, Cgroups
Content-Type: text/plain;
 charset="UTF-8"
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

On Thu, 21 May 2020 at 22:04, Michal Hocko wrote:
>
> On Thu 21-05-20 11:55:16, Michal Hocko wrote:
> > On Wed 20-05-20 20:09:06, Chris Down wrote:
> > > Hi Naresh,
> > >
> > > Naresh Kamboju writes:
> > > > As a part of investigation on this issue LKFT teammate Anders Roxell
> > > > git bisected the problem and found bad commit(s) which caused this problem.
> > > >
> > > > The following two patches have been reverted on next-20200519 and retested the
> > > > reproducible steps and confirmed the test case mkfs -t ext4 got PASS.
> > > > ( invoked oom-killer is gone now)
> > > >
> > > > Revert "mm, memcg: avoid stale protection values when cgroup is above
> > > > protection"
> > > >     This reverts commit 23a53e1c02006120f89383270d46cbd040a70bc6.
> > > >
> > > > Revert "mm, memcg: decouple e{low,min} state mutations from protection
> > > > checks"
> > > >     This reverts commit 7b88906ab7399b58bb088c28befe50bcce076d82.
> > >
> > > Thanks Anders and Naresh for tracking this down and reverting.
> > >
> > > I'll take a look tomorrow. I don't see anything immediately obviously wrong
> > > in either of those commits from a (very) cursory glance, but they should
> > > only be taking effect if protections are set.
> >
> > Agreed. If memory.{low,min} is not used then the patch should be
> > effectively a nop.
>
> I was staring into the code and do not see anything. Could you give the
> following debugging patch a try and see whether it triggers?
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc555903a332..df2e8df0eb71 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2404,6 +2404,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  			 * sc->priority further than desirable.
>  			 */
>  			scan = max(scan, SWAP_CLUSTER_MAX);
> +
> +			trace_printk("scan:%lu protection:%lu\n", scan, protection);
>  		} else {
>  			scan = lruvec_size;
>  		}
> @@ -2648,6 +2650,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>
>  		if (mem_cgroup_below_min(memcg)) {
> +			trace_printk("under min:%lu emin:%lu\n", memcg->memory.min, memcg->memory.emin);
>  			/*
>  			 * Hard protection.
>  			 * If there is no reclaimable memory, OOM.
> @@ -2660,6 +2663,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  			 * there is an unprotected supply
>  			 * of reclaimable memory from other cgroups.
>  			 */
> +			trace_printk("under low:%lu elow:%lu\n", memcg->memory.low, memcg->memory.elow);
>  			if (!sc->memcg_low_reclaim) {
>  				sc->memcg_low_skipped = 1;
>  				continue;

As you suggested for debugging this problem, trace_printk was replaced
with printk in your patch, and the patch was applied on top of the
problematic kernel. Here is the test output, followed by a link to the
full log.

mkfs -t ext4 /dev/disk/by-id/ata-TOSHIBA_MG04ACA100N_Y8RQK14KF6XF
mke2fs 1.43.8 (1-Jan-2018)
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 7c380766-0ed8-41ba-a0de-3c08e78f1891
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848

Allocating group tables:    0/7453 done
Writing inode tables:    0/7453 done
Creating journal (262144 blocks): [   51.544525] under min:0 emin:0
[   51.845304] under min:0 emin:0
[   51.848738] under min:0 emin:0
[   51.858147] under min:0 emin:0
[   51.861333] under min:0 emin:0
[   51.862034] under min:0 emin:0
[   51.862442] under min:0 emin:0
[   51.862763] under min:0 emin:0

Full test log link,
https://lkft.validation.linaro.org/scheduler/job/1497412#L1451

- Naresh
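
For anyone going through the full log, a small script along these lines could tally the debug lines. This is only a sketch: the regular expression is an assumption derived from the printk format strings in the debugging patch above, and `summarize` is a hypothetical helper, not part of any existing LKFT tooling.

```python
import re
from collections import Counter

# Matches lines such as "[   51.544525] under min:0 emin:0".
# The pattern is an assumption based on the format strings in the patch.
LINE_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+under (min|low):(\d+) e(?:min|low):(\d+)"
)

def summarize(log_text):
    """Count "under min"/"under low" hits and how many had both values at 0."""
    hits = Counter()
    zero_hits = Counter()
    for m in LINE_RE.finditer(log_text):
        kind = m.group(2)
        hits[kind] += 1
        if int(m.group(3)) == 0 and int(m.group(4)) == 0:
            zero_hits[kind] += 1
    return hits, zero_hits

# Sample copied from the console output above.
sample = """\
Creating journal (262144 blocks): [   51.544525] under min:0 emin:0
[   51.845304] under min:0 emin:0
[   51.848738] under min:0 emin:0
"""

hits, zero_hits = summarize(sample)
print(dict(hits), dict(zero_hits))
```

If every "under min" hit also lands in `zero_hits`, the hard-protection branch is being taken even though no memory.min protection is configured, which is what the output above appears to show.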