From mboxrd@z Thu Jan 1 00:00:00 1970
From: Shakeel Butt
To: Andrew Morton
Cc: Michal Hocko, Johannes Weiner, Vladimir Davydov, Jan Kara,
 Greg Thelen, Amir Goldstein, Roman Gushchin, Alexander Viro,
 linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Shakeel Butt, Jan Kara
Subject: [PATCH 2/2] fs, mm: account buffer_head to kmemcg
Date: Wed, 27 Jun 2018 12:12:50 -0700
Message-Id: <20180627191250.209150-3-shakeelb@google.com>
X-Mailer: git-send-email 2.18.0.rc2.346.g013aa6912e-goog
In-Reply-To: <20180627191250.209150-1-shakeelb@google.com>
References:
 <20180627191250.209150-1-shakeelb@google.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

buffer_heads can consume a significant amount of system memory, and
their number is directly related to the amount of page cache. In our
production environment we have observed many machines spending a
significant amount of memory on buffer_heads; that memory cannot be left
as system memory overhead.

Charging buffer_head is not as simple as adding __GFP_ACCOUNT to the
allocation. The buffer_heads can be allocated in the context of a memcg
different from the memcg of the page for which they are being allocated.
One concrete example is memory reclaim: reclaim can trigger I/O on pages
of any memcg on the system. So, the right way to charge buffer_head is
to extract the memcg from the page for which the buffer_heads are being
allocated and then use the targeted memcg charging API.

Signed-off-by: Shakeel Butt
Cc: Michal Hocko
Cc: Jan Kara
Cc: Amir Goldstein
Cc: Greg Thelen
Cc: Johannes Weiner
Cc: Vladimir Davydov
Cc: Roman Gushchin
Cc: Andrew Morton
Cc: Alexander Viro
---
Changelog since v2:
- get_mem_cgroup_from_page() returns root_mem_cgroup if page->memcg is
  either NULL or css_tryget_online fails.

Changelog since v1:
- simple code cleanups

 fs/buffer.c                | 10 +++++++++-
 include/linux/memcontrol.h |  7 +++++++
 mm/memcontrol.c            | 22 ++++++++++++++++++++++
 3 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 8194e3049fc5..235826333936 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 #include
 
 static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);
@@ -815,10 +816,14 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 	struct buffer_head *bh, *head;
 	gfp_t gfp = GFP_NOFS;
 	long offset;
+	struct mem_cgroup *memcg;
 
 	if (retry)
 		gfp |= __GFP_NOFAIL;
 
+	memcg = get_mem_cgroup_from_page(page);
+	memalloc_use_memcg(memcg);
+
 	head = NULL;
 	offset = PAGE_SIZE;
 	while ((offset -= size) >= 0) {
@@ -835,6 +840,9 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		/* Link the buffer to its page */
 		set_bh_page(bh, page, offset);
 	}
+out:
+	memalloc_unuse_memcg();
+	mem_cgroup_put(memcg);
 	return head;
 /*
  * In case anything failed, we just free everything we got.
@@ -848,7 +856,7 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
 		} while (head);
 	}
 
-	return NULL;
+	goto out;
 }
 EXPORT_SYMBOL_GPL(alloc_page_buffers);
 
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index cb04b382c8d2..919b98ddda45 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -380,6 +380,8 @@ struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p);
 
 struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm);
 
+struct mem_cgroup *get_mem_cgroup_from_page(struct page *page);
+
 static inline
 struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css){
 	return css ? container_of(css, struct mem_cgroup, css) : NULL;
@@ -865,6 +867,11 @@ static inline struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 	return NULL;
 }
 
+static inline struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
+{
+	return NULL;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b25ca5c13196..21a7c2fb8097 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -713,6 +713,28 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
+/**
+ * get_mem_cgroup_from_page: Obtain a reference on given page's memcg.
+ * @page: page from which memcg should be extracted.
+ *
+ * Obtain a reference on page->memcg and returns it if successful. Otherwise
+ * root_mem_cgroup is returned.
+ */
+struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
+{
+	struct mem_cgroup *memcg = page->mem_cgroup;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	rcu_read_lock();
+	if (!memcg || !css_tryget_online(&memcg->css))
+		memcg = root_mem_cgroup;
+	rcu_read_unlock();
+	return memcg;
+}
+EXPORT_SYMBOL(get_mem_cgroup_from_page);
+
 /**
  * If current->active_memcg is non-NULL, do not fallback to current->mm->memcg.
  */
-- 
2.18.0.rc2.346.g013aa6912e-goog
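
For readers who want to see the charging flow in isolation, the following is a
rough userspace sketch of the pattern the patch applies in
alloc_page_buffers(): take a reference on the page's memcg, make it the active
charging target, allocate, then undo both steps. Every type and function here
(struct memcg, get_memcg_from_page(), use_memcg(), charge_alloc(),
alloc_buffers_for_page(), and so on) is a hypothetical stand-in, not a kernel
API. The model also takes a counted reference on the root group for symmetry,
which is a simplification and not a claim about how the kernel handles
root_mem_cgroup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's types. */
struct memcg {
	int refs;       /* reference count */
	int online;     /* 0 once the group is being torn down */
	size_t charged; /* bytes charged to this group */
};

struct page {
	struct memcg *memcg; /* owner of the page, may be NULL */
};

static struct memcg root_memcg = { 1, 1, 0 };
static struct memcg *active_memcg; /* loosely models current->active_memcg */

/* Take a reference on the page's group; fall back to root if it is
 * missing or no longer online. */
static struct memcg *get_memcg_from_page(struct page *page)
{
	struct memcg *memcg = page->memcg;

	if (!memcg || !memcg->online)
		memcg = &root_memcg;
	memcg->refs++;
	return memcg;
}

static void memcg_put(struct memcg *memcg)
{
	assert(memcg->refs > 0);
	memcg->refs--;
}

/* Open and close a scope in which allocations target a specific group. */
static void use_memcg(struct memcg *memcg)
{
	assert(active_memcg == NULL);
	active_memcg = memcg;
}

static void unuse_memcg(void)
{
	active_memcg = NULL;
}

/* Charge to the active group if one is set, else to the caller's group. */
static void charge_alloc(struct memcg *caller, size_t size)
{
	struct memcg *target = active_memcg ? active_memcg : caller;

	target->charged += size;
}

/* Models the patched allocation path: nr buffers of the given size are
 * charged to the page's group, whoever the caller happens to be. */
static void alloc_buffers_for_page(struct page *page, struct memcg *caller,
				   int nr, size_t size)
{
	struct memcg *memcg = get_memcg_from_page(page);

	use_memcg(memcg);
	for (int i = 0; i < nr; i++)
		charge_alloc(caller, size);
	unuse_memcg();
	memcg_put(memcg);
}
```

In this model, a reclaimer running in group B that allocates buffers for a
page owned by group A ends up charging A, and a page with no owner charges the
root group. The single exit that clears the active target and drops the
reference plays the same role as the "out:" label in the patch, which both the
success and failure paths go through.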