From: Waiman Long
To: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton,
    Petr Mladek, Steven Rostedt, Sergey Senozhatsky, Andy Shevchenko,
    Rasmus Villemoes
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org, Ira Weiny, Mike Rapoport, David Rientjes,
    Roman Gushchin, Rafael Aquini, Waiman Long
Subject: [PATCH v5 0/4] mm/page_owner: Extend page_owner to show memcg information
Date: Mon, 7 Feb 2022 19:05:28 -0500
Message-Id: <20220208000532.1054311-1-longman@redhat.com>

 v5:
  - Apply the following changes to patch 3
    1) Make cgroup_name() write directly into kbuf without using an
       intermediate buffer.
    2) Change the terminology from "offline memcg" to "dying memcg" to
       align better with similar terms used elsewhere in the kernel.

 v4:
  - Take rcu_read_lock() when memcg is being accessed as suggested by
    Michal.
  - Make print_page_owner_memcg() return the new offset into the buffer
    and put CONFIG_MEMCG block inside as suggested by Mike.
  - Directly use TASK_COMM_LEN as length of name buffer as suggested by
    Roman.

 v3:
  - Add unlikely() to patch 1 and clarify that -1 will not be returned.
  - Use a helper function to print out memcg information in patch 3.
  - Add a new patch 4 to store task command name in page_owner
    structure.

While debugging the constant increase in percpu memory consumption on
a system that spawned a large number of containers, it was found that
a lot of dying mem_cgroup structures remained in place without being
freed. Further investigation indicated that those mem_cgroup structures
were pinned by some pages.

In order to find out what those pages are, the existing page_owner
debugging tool is extended to show memory cgroup information and
whether those memcgs are dying or not.
With the enhanced page_owner tool, the following is a typical page
that pinned the mem_cgroup structure in my test case:

Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 70984 (podman), ts 5421278969115 ns, free_ts 5420935666638 ns
PFN 3205061 type Movable Block 6259 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
 prep_new_page+0x8e/0xb0
 get_page_from_freelist+0xc4d/0xe50
 __alloc_pages+0x172/0x320
 alloc_pages_vma+0x84/0x230
 shmem_alloc_page+0x3f/0x90
 shmem_alloc_and_acct_page+0x76/0x1c0
 shmem_getpage_gfp+0x48d/0x890
 shmem_write_begin+0x36/0xc0
 generic_perform_write+0xed/0x1d0
 __generic_file_write_iter+0xdc/0x1b0
 generic_file_write_iter+0x5d/0xb0
 new_sync_write+0x11f/0x1b0
 vfs_write+0x1ba/0x2a0
 ksys_write+0x59/0xd0
 do_syscall_64+0x37/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
Charged to dying memcg libpod-conmon-fbc62060b5377479a7371cc16c5c596002945f2aa00d3d6d73a0cd0d148b6637.scope

So the page was not freed because it was part of a shmem segment. That
is useful information that can help users to diagnose similar problems.

With cgroup v1, /proc/cgroups can be read to find out the total number
of memory cgroups (online + dying). With cgroup v2, the cgroup.stat of
the root cgroup can be read to find the number of dying cgroups (most
likely pinned by dying memcgs).

The page_owner feature is not supposed to be enabled on production
systems due to its memory overhead. However, if it is suspected that
the number of dying memcgs is increasing over time, a test environment
with page_owner enabled can be set up with an appropriate workload to
analyze what may be causing the increase.
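The checks described above can be scripted from user space. The
following is only a rough sketch in Python and is not part of this
patch series. All function names are hypothetical. It assumes that
cgroup.stat exposes an nr_dying_descendants counter, that the third
column of /proc/cgroups is num_cgroups, and that page_owner records
carry a "Charged to dying memcg <name>" line exactly as in the sample
output above. Each helper takes the file content as a string, so the
caller decides where the data is read from.

```python
import re
from collections import Counter


def parse_cgroup_stat(text):
    """Parse cgroup v2 cgroup.stat content into a dict of integer counters.

    Assumes "key value" pairs, one per line (e.g. nr_dying_descendants).
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def memory_cgroup_count(text):
    """Return num_cgroups of the memory controller from /proc/cgroups content.

    Assumes the cgroup v1 column layout:
    subsys_name hierarchy num_cgroups enabled
    Returns None if no memory controller line is found.
    """
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip the header line
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "memory":
            return int(fields[2])
    return None


# Line format assumed from the sample record shown in this message.
DYING_MEMCG_RE = re.compile(r"^\s*Charged to dying memcg (\S+)", re.MULTILINE)


def dying_memcg_page_counts(page_owner_text):
    """Count page_owner records per dying memcg name."""
    return Counter(DYING_MEMCG_RE.findall(page_owner_text))
```

On a test system, the inputs would typically come from the cgroup.stat
file of the root cgroup (often mounted under /sys/fs/cgroup, which is
an assumption about the local setup), from /proc/cgroups, and from a
saved copy of the page_owner debugfs output. Sampling the counters
periodically shows whether the number of dying memcgs keeps growing,
and the per-memcg page counts point at the cgroups worth inspecting.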
Waiman Long (4):
  lib/vsprintf: Avoid redundant work with 0 size
  mm/page_owner: Use scnprintf() to avoid excessive buffer overrun check
  mm/page_owner: Print memcg information
  mm/page_owner: Record task command name

 lib/vsprintf.c  |  8 +++---
 mm/page_owner.c | 72 ++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 62 insertions(+), 18 deletions(-)

-- 
2.27.0