From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20200501135806.4eebf0b92f84ab60bba3e1e7@linux-foundation.org>
 <20200519075213.GF32497@dhcp22.suse.cz>
 <20200519084535.GG32497@dhcp22.suse.cz>
 <20200520190906.GA558281@chrisdown.name>
 <20200521095515.GK6462@dhcp22.suse.cz>
 <20200521163450.GV6462@dhcp22.suse.cz>
In-Reply-To: <20200521163450.GV6462@dhcp22.suse.cz>
From: Naresh Kamboju
Date: Wed, 17 Jun 2020 19:07:20 +0530
Subject: Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page
To: Michal Hocko, Chris Down, Yafang Shao
Cc: Anders Roxell, "Linux F2FS DEV, Mailing List", linux-ext4, linux-block,
 Andrew Morton, open list, Linux-Next Mailing List, linux-mm, Arnd Bergmann,
 Andreas Dilger, Jaegeuk Kim, "Theodore Ts'o", Chao Yu, Hugh Dickins,
 Andrea Arcangeli, Matthew Wilcox, Chao Yu, lkft-triage@lists.linaro.org,
 Johannes Weiner, Roman Gushchin, Cgroups
Content-Type: text/plain; charset="UTF-8"
Sender: linux-block-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-block@vger.kernel.org

On Thu, 21 May 2020 at 22:04, Michal Hocko wrote:
>
> On Thu 21-05-20 11:55:16, Michal Hocko wrote:
> > On Wed 20-05-20 20:09:06, Chris Down wrote:
> > > Hi Naresh,
> > >
> > > Naresh Kamboju writes:
> > > > As a part of investigation on this issue LKFT teammate Anders Roxell
> > > > git bisected the problem and found bad commit(s) which caused this problem.
> > > >
> > > > The following two patches have been reverted on next-20200519 and retested the
> > > > reproducible steps and confirmed the test case mkfs -t ext4 got PASS.
> > > > ( invoked oom-killer is gone now)
> > > >
> > > > Revert "mm, memcg: avoid stale protection values when cgroup is above
> > > > protection"
> > > >     This reverts commit 23a53e1c02006120f89383270d46cbd040a70bc6.
> > > >
> > > > Revert "mm, memcg: decouple e{low,min} state mutations from protection
> > > > checks"
> > > >     This reverts commit 7b88906ab7399b58bb088c28befe50bcce076d82.
> > >
> > > Thanks Anders and Naresh for tracking this down and reverting.
> > >
> > > I'll take a look tomorrow. I don't see anything immediately obviously wrong
> > > in either of those commits from a (very) cursory glance, but they should
> > > only be taking effect if protections are set.
> >
> > Agreed. If memory.{low,min} is not used then the patch should be
> > effectively a nop.
>
> I was staring into the code and do not see anything. Could you give the
> following debugging patch a try and see whether it triggers?
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc555903a332..df2e8df0eb71 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2404,6 +2404,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  			 * sc->priority further than desirable.
>  			 */
>  			scan = max(scan, SWAP_CLUSTER_MAX);
> +
> +			trace_printk("scan:%lu protection:%lu\n", scan, protection);
>  		} else {
>  			scan = lruvec_size;
>  		}
> @@ -2648,6 +2650,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>
>  		if (mem_cgroup_below_min(memcg)) {
> +			trace_printk("under min:%lu emin:%lu\n", memcg->memory.min, memcg->memory.emin);
>  			/*
>  			 * Hard protection.
>  			 * If there is no reclaimable memory, OOM.
> @@ -2660,6 +2663,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  			 * there is an unprotected supply
>  			 * of reclaimable memory from other cgroups.
>  			 */
> +			trace_printk("under low:%lu elow:%lu\n", memcg->memory.low, memcg->memory.elow);
>  			if (!sc->memcg_low_reclaim) {
>  				sc->memcg_low_skipped = 1;
>  				continue;

As you suggested for debugging this problem, trace_printk was replaced with
printk in your patch, and the patch was applied on top of the problematic
kernel. Here is the test output and a link to the full log.

mkfs -t ext4 /dev/disk/by-id/ata-TOSHIBA_MG04ACA100N_Y8RQK14KF6XF
mke2fs 1.43.8 (1-Jan-2018)
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 7c380766-0ed8-41ba-a0de-3c08e78f1891
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000, 214990848

Allocating group tables: 0/7453 done
Writing inode tables: 0/7453 done
Creating journal (262144 blocks): [ 51.544525] under min:0 emin:0
[ 51.845304] under min:0 emin:0
[ 51.848738] under min:0 emin:0
[ 51.858147] under min:0 emin:0
[ 51.861333] under min:0 emin:0
[ 51.862034] under min:0 emin:0
[ 51.862442] under min:0 emin:0
[ 51.862763] under min:0 emin:0

Full test log link,
https://lkft.validation.linaro.org/scheduler/job/1497412#L1451

- Naresh
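
For anyone going through the full log, a rough way to tally which of the three debug messages fired is sketched below. It is only an illustration and not part of the patch or of the LKFT tooling: the names `LOG_RE` and `summarize` are made up here, and the regular expression assumes the messages appear in the kernel log exactly as the patch's format strings print them, prefixed by the usual bracketed timestamp.

```python
import re
from collections import Counter

# Assumed line shape, taken from the format strings in the debugging patch
# once trace_printk is swapped for printk, e.g.:
#   [ 51.544525] under min:0 emin:0
LOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<kind>under min|under low|scan):(?P<first>\d+)\s+"
    r"(?P<label>emin|elow|protection):(?P<second>\d+)"
)


def summarize(log_text):
    """Count debug messages by kind, and by (kind, first value, second value)."""
    kinds = Counter()
    values = Counter()
    for match in LOG_RE.finditer(log_text):
        kind = match.group("kind")
        kinds[kind] += 1
        values[(kind, int(match.group("first")), int(match.group("second")))] += 1
    return kinds, values


# The eight messages quoted in this mail.
sample = """
Creating journal (262144 blocks): [ 51.544525] under min:0 emin:0
[ 51.845304] under min:0 emin:0
[ 51.848738] under min:0 emin:0
[ 51.858147] under min:0 emin:0
[ 51.861333] under min:0 emin:0
[ 51.862034] under min:0 emin:0
[ 51.862442] under min:0 emin:0
[ 51.862763] under min:0 emin:0
"""

kinds, values = summarize(sample)
print(kinds)
print(values)
```

On the excerpt quoted in this mail, the tally is eight "under min" messages, all with min and emin equal to 0, and no "under low" or "scan" messages. The full log at the link may contain more, so the counts for the excerpt should not be read as the complete picture.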