From: Yafang Shao
To: hannes@cmpxchg.org, peterz@infradead.org, akpm@linux-foundation.org, mhocko@kernel.org, axboe@kernel.dk, mgorman@suse.de, rostedt@goodmis.org, mingo@redhat.com
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao
Subject: [PATCH 2/2] psi, tracepoint: introduce tracepoints for psi_memstall_{enter, leave}
Date: Thu, 26 Mar 2020 07:12:07 -0400
Message-Id: <1585221127-11458-3-git-send-email-laoar.shao@gmail.com>
In-Reply-To: <1585221127-11458-1-git-send-email-laoar.shao@gmail.com>
References: <1585221127-11458-1-git-send-email-laoar.shao@gmail.com>

With the new parameter introduced in psi_memstall_{enter, leave}, we can
get the specific type of a memstall. To make this information easier to
use, introduce tracepoints for these two functions.

With these two tracepoints in place, tools such as eBPF programs or
shell scripts can easily collect and analyze the memstall data.

Here's an example using bpftrace to measure the memstall latency of
applications:

tracepoint:sched:psi_memstall_enter
{
	@start[tid, args->type] = nsecs;
}

tracepoint:sched:psi_memstall_leave
/@start[tid, args->type]/	// skip a leave whose enter was not seen
{
	@time[comm, args->type] = hist(nsecs - @start[tid, args->type]);
	delete(@start[tid, args->type]);
}

Below is part of the output after producing some memory pressure:
@time[objdump, 7]:
[256K, 512K)           1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objdump, 6]:
[8K, 16K)              2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[objcopy, 7]:
[16K, 32K)             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[ld, 7]:
[4M, 8M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8M, 16M)              1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[khugepaged, 5]:
[4K, 8K)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[8K, 16K)              0 |                                                    |
[16K, 32K)             0 |                                                    |
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               0 |                                                    |
[8M, 16M)              0 |                                                    |
[16M, 32M)             0 |                                                    |
[32M, 64M)             1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

@time[kswapd0, 0]:
[16K, 32K)             1 |@@@@@                                               |
[32K, 64K)             0 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               0 |                                                    |
[8M, 16M)              1 |@@@@@                                               |
[16M, 32M)            10 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32M, 64M)             9 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      |
[64M, 128M)            2 |@@@@@@@@@@                                          |
[128M, 256M)           2 |@@@@@@@@@@                                          |
[256M, 512M)           3 |@@@@@@@@@@@@@@@                                     |
[512M, 1G)             1 |@@@@@                                               |

@time[kswapd1, 0]:
[1M, 2M)               1 |@@@@                                                |
[2M, 4M)               2 |@@@@@@@@                                            |
[4M, 8M)               0 |                                                    |
[8M, 16M)             12 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[16M, 32M)             7 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                      |
[32M, 64M)             5 |@@@@@@@@@@@@@@@@@@@@@                               |
[64M, 128M)            5 |@@@@@@@@@@@@@@@@@@@@@                               |
[128M, 256M)           3 |@@@@@@@@@@@@@                                       |
[256M, 512M)           1 |@@@@                                                |

@time[khugepaged, 1]:
[2M, 4M)               1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

With bpftrace's builtin variable 'cgroup', we can also filter a memcg and
its descendants.
Signed-off-by: Yafang Shao
---
 include/trace/events/sched.h | 41 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/psi.c           |  8 ++++++++
 2 files changed, 49 insertions(+)

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 420e80e..6aca996 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -7,8 +7,20 @@
 
 #include
 #include
+#include
 #include
 
+#define show_psi_memstall_type(type) __print_symbolic(type, \
+	{MEMSTALL_KSWAPD, "MEMSTALL_KSWAPD"}, \
+	{MEMSTALL_RECLAIM_DIRECT, "MEMSTALL_RECLAIM_DIRECT"}, \
+	{MEMSTALL_RECLAIM_MEMCG, "MEMSTALL_RECLAIM_MEMCG"}, \
+	{MEMSTALL_RECLAIM_HIGH, "MEMSTALL_RECLAIM_HIGH"}, \
+	{MEMSTALL_KCOMPACTD, "MEMSTALL_KCOMPACTD"}, \
+	{MEMSTALL_COMPACT, "MEMSTALL_COMPACT"}, \
+	{MEMSTALL_WORKINGSET, "MEMSTALL_WORKINGSET"}, \
+	{MEMSTALL_PGLOCK, "MEMSTALL_PGLOCK"}, \
+	{MEMSTALL_MEMDELAY, "MEMSTALL_MEMDELAY"}, \
+	{MEMSTALL_SWAP, "MEMSTALL_SWAP"})
 /*
  * Tracepoint for calling kthread_stop, performed to end a kthread:
  */
@@ -625,6 +637,35 @@ static inline long __trace_sched_switch_state(bool preempt, struct task_struct *
 	TP_PROTO(struct root_domain *rd, bool overutilized),
 	TP_ARGS(rd, overutilized));
 
+DECLARE_EVENT_CLASS(psi_memstall_template,
+
+	TP_PROTO(int type),
+
+	TP_ARGS(type),
+
+	TP_STRUCT__entry(
+		__field(int, type)
+	),
+
+	TP_fast_assign(
+		__entry->type = type;
+	),
+
+	TP_printk("type=%s",
+		show_psi_memstall_type(__entry->type))
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_enter,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+DEFINE_EVENT(psi_memstall_template, psi_memstall_leave,
+	TP_PROTO(int type),
+	TP_ARGS(type)
+);
+
+
 #endif /* _TRACE_SCHED_H */
 
 /* This part must be outside protection */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 460f084..4c5a402 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -142,6 +142,8 @@
 #include
 #include "sched.h"
 
+#include
+
 static int psi_bug __read_mostly;
 
 DEFINE_STATIC_KEY_FALSE(psi_disabled);
@@ -822,6 +824,9 @@ void psi_memstall_enter(unsigned long *flags, enum memstall_types type)
 	*flags = current->flags & PF_MEMSTALL;
 	if (*flags)
 		return;
+
+	trace_psi_memstall_enter(type);
+
 	/*
 	 * PF_MEMSTALL setting & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we can
@@ -852,6 +857,9 @@ void psi_memstall_leave(unsigned long *flags, enum memstall_types type)
 
 	if (*flags)
 		return;
+
+	trace_psi_memstall_leave(type);
+
 	/*
 	 * PF_MEMSTALL clearing & accounting needs to be atomic wrt
 	 * changes to the task's scheduling state, otherwise we could
-- 
1.8.3.1