From: Waiman Long <longman@redhat.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Petr Mladek <pmladek@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org,
	linux-mm@kvack.org,
	Ira Weiny <ira.weiny@intel.com>,
	Mike Rapoport <rppt@kernel.org>,
	David Rientjes <rientjes@google.com>,
	Roman Gushchin <guro@fb.com>,
	Rafael Aquini <aquini@redhat.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH v5 0/4] mm/page_owner: Extend page_owner to show memcg information
Date: Mon,  7 Feb 2022 19:05:28 -0500
Message-Id: <20220208000532.1054311-1-longman@redhat.com>
List-ID: <linux-mm.kvack.org>

 v5:
  - Apply the following changes to patch 3
    1) Make cgroup_name() write directly into kbuf without using an
       intermediate buffer.
    2) Change the terminology from "offline memcg" to "dying memcg" to align
       better with similar terms used elsewhere in the kernel.

 v4:
  - Take rcu_read_lock() when memcg is being accessed as suggested by
    Michal.
  - Make print_page_owner_memcg() return the new offset into the buffer
    and put CONFIG_MEMCG block inside as suggested by Mike.
  - Directly use TASK_COMM_LEN as length of name buffer as suggested by
    Roman.

 v3:
  - Add unlikely() to patch 1 and clarify that -1 will not be returned.
  - Use a helper function to print out memcg information in patch 3.
  - Add a new patch 4 to store task command name in page_owner
    structure.

While debugging the constant increase in percpu memory consumption on
a system that spawned a large number of containers, it was found that
many dying mem_cgroup structures remained in place without being
freed. Further investigation indicated that those mem_cgroup structures
were pinned by some pages.

In order to find out what those pages are, the existing page_owner
debugging tool is extended to show memory cgroup information and whether
those memcgs are dying or not. With the enhanced page_owner tool,
the following is a typical page that pinned the mem_cgroup structure
in my test case:

Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 70984 (podman), ts 5421278969115 ns, free_ts 5420935666638 ns
PFN 3205061 type Movable Block 6259 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
 prep_new_page+0x8e/0xb0
 get_page_from_freelist+0xc4d/0xe50
 __alloc_pages+0x172/0x320
 alloc_pages_vma+0x84/0x230
 shmem_alloc_page+0x3f/0x90
 shmem_alloc_and_acct_page+0x76/0x1c0
 shmem_getpage_gfp+0x48d/0x890
 shmem_write_begin+0x36/0xc0
 generic_perform_write+0xed/0x1d0
 __generic_file_write_iter+0xdc/0x1b0
 generic_file_write_iter+0x5d/0xb0
 new_sync_write+0x11f/0x1b0
 vfs_write+0x1ba/0x2a0
 ksys_write+0x59/0xd0
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Charged to dying memcg libpod-conmon-fbc62060b5377479a7371cc16c5c596002945f2aa00d3d6d73a0cd0d148b6637.scope

So the page was not freed because it was part of a shmem segment. That
is useful information that can help users to diagnose similar problems.
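
As a rough illustration of how such output could be post-processed, the
sketch below counts page_owner records per dying memcg. It is only a
sketch under stated assumptions: records are assumed to be separated by
blank lines, the "Charged to dying memcg <name>" wording is taken from
the sample record above, and the helper names (split_records,
dying_memcg_pages) are made up for this example.

```python
import re
from collections import Counter

# Wording copied from the sample record above; treat it as an assumption
# about the output format rather than a stable interface.
DYING_RE = re.compile(r"^Charged to dying memcg (\S+)$", re.MULTILINE)


def split_records(dump: str) -> list[str]:
    """Split a page_owner dump into records (assumed blank-line separated)."""
    return [r for r in re.split(r"\n\s*\n", dump) if r.strip()]


def dying_memcg_pages(dump: str) -> Counter:
    """Return a Counter mapping dying memcg name -> number of records."""
    counts: Counter = Counter()
    for record in split_records(dump):
        match = DYING_RE.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With page_owner enabled, the dump text would typically come from
/sys/kernel/debug/page_owner. That file can be very large, so saving it
to a file first and processing it offline is probably the safer route.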

With cgroup v1, /proc/cgroups can be read to find out the total number
of memory cgroups (online + dying). With cgroup v2, the cgroup.stat
of the root cgroup can be read to find the number of dying cgroups
(most likely pinned by dying memcgs).
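
The two counters mentioned above can be extracted with a few lines of
parsing. The sketch below assumes the usual layouts: /proc/cgroups with
whitespace-separated columns subsys_name, hierarchy, num_cgroups,
enabled, and cgroup.stat with "key value" lines including
nr_dying_descendants. The function names are made up for this example.

```python
def memory_cgroup_count(proc_cgroups_text: str) -> int:
    """Return num_cgroups for the memory controller from /proc/cgroups text."""
    for line in proc_cgroups_text.splitlines():
        if line.startswith("#"):
            continue  # header line
        fields = line.split()
        # Columns: subsys_name hierarchy num_cgroups enabled
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    raise ValueError("no memory controller line found")


def dying_descendants(cgroup_stat_text: str) -> int:
    """Return nr_dying_descendants from the text of a cgroup.stat file."""
    for line in cgroup_stat_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "nr_dying_descendants":
            return int(value)
    raise ValueError("nr_dying_descendants not found")
```

On a cgroup v2 system the root cgroup.stat is usually found at
/sys/fs/cgroup/cgroup.stat, but that mount point is an assumption.
Sampling either value periodically should show whether the number of
dying cgroups keeps growing.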

The page_owner feature is not supposed to be enabled on production
systems due to its memory overhead. However, if it is suspected that
dying memcgs are increasing over time, a test environment with page_owner
enabled can then be set up with an appropriate workload for further
analysis of what may be causing the increasing number of dying memcgs.

Waiman Long (4):
  lib/vsprintf: Avoid redundant work with 0 size
  mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
  mm/page_owner: Print memcg information
  mm/page_owner: Record task command name

 lib/vsprintf.c  |  8 +++---
 mm/page_owner.c | 72 ++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 62 insertions(+), 18 deletions(-)

-- 
2.27.0