From: Shakeel Butt
Date: Fri, 12 Mar 2021 15:18:38 -0800
Subject: Re: [External] Re: [PATCH v3 2/4] mm: memcontrol: make page_memcg{_rcu} only applicable for non-kmem page
To: Johannes Weiner
Cc: Muchun Song, Roman Gushchin, Michal Hocko, Andrew Morton, Vladimir Davydov, LKML, Linux Memory Management List, Xiongchun duan
References: <20210309100717.253-1-songmuchun@bytedance.com> <20210309100717.253-3-songmuchun@bytedance.com>
List-ID: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"

On Fri, Mar 12, 2021 at 3:07 PM Johannes Weiner wrote:
>
> On Fri, Mar 12, 2021 at 02:42:45PM -0800, Shakeel Butt wrote:
> > Hi Johannes,
> >
> > On Fri, Mar 12, 2021 at 11:23 AM Johannes Weiner wrote:
> > >
> > > [...]
> > >
> > > Longer term we most likely need it there anyway. The issue you are
> > > describing in the cover letter - allocations pinning memcgs for a long
> > > time - it exists at a larger scale and is causing recurring problems
> > > in the real world: page cache doesn't get reclaimed for a long time,
> > > or is used by the second, third, fourth, ... instance of the same job
> > > that was restarted into a new cgroup every time. Unreclaimable dying
> > > cgroups pile up, waste memory, and make page reclaim very inefficient.
> > >
> >
> > For the scenario described above, do we really want to reparent the
> > page cache pages? Shouldn't we recharge the pages to the second,
> > third, fourth and so on, memcgs? My concern is that we will see a big
> > chunk of page cache pages charged to root and will only get reclaimed
> > on global pressure.
>
> Sorry, I'm proposing to reparent to the ancestor, not root. It's an
> optimization, not a change in user-visible behavior.
>
> As far as the user can tell, the pages already belong to the parent
> after deletion: they'll show up in the parent's stats, naturally, and
> they will get reclaimed as part of the parent being reclaimed.
>
> The dead cgroup doesn't even have its own limit anymore after
> .css_reset() has run. And we already physically reparent slab objects
> in memcg_reparent_objcgs() and memcg_drain_all_list_lrus().
>
> I'm just saying we should do the same thing for LRU pages.

I understand the proposal, and I agree it makes total sense when a job
is recycling sub-jobs/sub-containers. I was talking about the recycling
of top-level cgroups. Though for that to be an issue, I suppose the
file system has to be shared between the jobs on the system.

I was wondering: if a page cache page reaches the root memcg after
multiple reparentings, should the next access cause that page to be
charged to the accessor?
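
[Editorial note: the "physically reparent LRU pages" step Johannes refers
to above would, at a very high level, look something like the sketch
below. This is an illustrative sketch only, not code from this patch
series or from memcg_reparent_objcgs(): the function name
reparent_lru_pages() is hypothetical, and a real implementation would
also need to hold the destination lruvec's lock, retarget each page's
memcg/objcg pointer, and update the per-lruvec LRU size counters, all of
which are omitted here. parent_mem_cgroup(), mem_cgroup_lruvec(),
for_each_lru() and list_splice_init() are existing kernel interfaces;
the placement of lru_lock inside struct lruvec assumes a kernel with
per-memcg LRU locking.]

#include <linux/memcontrol.h>
#include <linux/mm_inline.h>
#include <linux/mmzone.h>

/*
 * Hypothetical sketch: splice a dying memcg's LRU lists onto its
 * parent's lruvecs, analogous to what memcg_reparent_objcgs() does
 * for slab objects.  Omitted for brevity: taking the destination
 * lruvec's lock, retargeting each page's memcg/objcg pointer, and
 * fixing up the per-lruvec LRU size counters.
 */
static void reparent_lru_pages(struct mem_cgroup *memcg)
{
	struct mem_cgroup *parent = parent_mem_cgroup(memcg);
	enum lru_list lru;
	int nid;

	if (!parent)
		parent = root_mem_cgroup;

	for_each_node(nid) {
		struct lruvec *src = mem_cgroup_lruvec(memcg, NODE_DATA(nid));
		struct lruvec *dst = mem_cgroup_lruvec(parent, NODE_DATA(nid));

		/* Per-lruvec lru_lock; older kernels keep this lock in pgdat. */
		spin_lock_irq(&src->lru_lock);
		for_each_lru(lru)
			list_splice_init(&src->lists[lru], &dst->lists[lru]);
		spin_unlock_irq(&src->lru_lock);
	}
}

[Splicing whole lists, rather than walking every page, is what keeps
reparenting cheap; the per-page charge retargeting is the part this
series' objcg indirection is meant to make inexpensive.]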