Date: Sat, 8 Apr 2023 17:28:35 +0800 (CST)
Message-ID: <202304081728353557233@zte.com.cn>
Subject: [PATCH linux-next] delayacct: track delays from IRQ/SOFTIRQ
Content-Type: text/plain; charset="UTF-8"
Sender: owner-linux-mm@kvack.org

From: Yang Yang

Delay accounting does not track the delay caused by IRQ/SOFTIRQ handling,
even though IRQ/SOFTIRQ can noticeably hurt the performance of some
workloads, for example when they run on a system that is busy handling
network IRQ/SOFTIRQ.

Knowing the IRQ/SOFTIRQ delay helps users reduce it, for example by
setting interrupt affinity or task affinity, or by using kernel threads
for NAPI.

This is inspired by "sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ
pressure"[1].

Also fix some indentation problems in older code.
And update tools/accounting/getdelays.c:

    / # ./getdelays -p 156 -di
    print delayacct stats ON
    printing IO accounting
    PID     156


    CPU             count     real total  virtual total    delay total  delay average
                       15       15836008       16218149      275700790         18.380ms
    IO              count    delay total  delay average
                        0              0          0.000ms
    SWAP            count    delay total  delay average
                        0              0          0.000ms
    RECLAIM         count    delay total  delay average
                        0              0          0.000ms
    THRASHING       count    delay total  delay average
                        0              0          0.000ms
    COMPACT         count    delay total  delay average
                        0              0          0.000ms
    WPCOPY          count    delay total  delay average
                       36        7586118          0.211ms
    IRQ             count    delay total  delay average
                       42         929161          0.022ms

[1] commit 52b1364ba0b1("sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure")

Signed-off-by: Yang Yang
Cc: Jiang Xuexin
Cc: wangyong
Cc: junhua huang
---
 Documentation/accounting/delay-accounting.rst |  7 +++--
 include/linux/delayacct.h                     | 15 ++++++++++
 include/uapi/linux/taskstats.h                |  6 +++-
 kernel/delayacct.c                            | 14 +++++++++
 kernel/sched/core.c                           |  1 +
 tools/accounting/getdelays.c                  | 30 +++++++++++--------
 6 files changed, 58 insertions(+), 15 deletions(-)

diff --git a/Documentation/accounting/delay-accounting.rst b/Documentation/accounting/delay-accounting.rst
index 79f537c9f160..f61c01fc376e 100644
--- a/Documentation/accounting/delay-accounting.rst
+++ b/Documentation/accounting/delay-accounting.rst
@@ -16,6 +16,7 @@ d) memory reclaim
 e) thrashing
 f) direct compact
 g) write-protect copy
+h) IRQ/SOFTIRQ
 
 and makes these statistics available to userspace through
 the taskstats interface.
@@ -49,7 +50,7 @@ this structure. See
 for a description of the fields pertaining to delay accounting.
 It will generally be in the form of counters returning the cumulative
 delay seen for cpu, sync block I/O, swapin, memory reclaim, thrash page
-cache, direct compact, write-protect copy etc.
+cache, direct compact, write-protect copy, IRQ/SOFTIRQ etc.
 
 Taking the difference of two successive readings of a given
 counter (say cpu_delay_total) for a task will give the delay
@@ -118,7 +119,9 @@ Get sum of delays, since system boot, for all pids with tgid 5::
 	                    0              0          0.000ms
 	COMPACT         count    delay total  delay average
 	                    0              0          0.000ms
-	WPCOPY          count    delay total  delay average
+	WPCOPY          count    delay total  delay average
+	                    0              0          0.000ms
+	IRQ             count    delay total  delay average
 	                    0              0          0.000ms
 
 Get IO accounting for pid 1, it works only with -p::
diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h
index 0da97dba9ef8..6639f48dac36 100644
--- a/include/linux/delayacct.h
+++ b/include/linux/delayacct.h
@@ -48,10 +48,13 @@ struct task_delay_info {
 	u64 wpcopy_start;
 	u64 wpcopy_delay;	/* wait for write-protect copy */
 
+	u64 irq_delay;	/* wait for IRQ/SOFTIRQ */
+
 	u32 freepages_count;	/* total count of memory reclaim */
 	u32 thrashing_count;	/* total count of thrash waits */
 	u32 compact_count;	/* total count of memory compact */
 	u32 wpcopy_count;	/* total count of write-protect copy */
+	u32 irq_count;	/* total count of IRQ/SOFTIRQ */
 };
 #endif
 
@@ -81,6 +84,7 @@ extern void __delayacct_compact_start(void);
 extern void __delayacct_compact_end(void);
 extern void __delayacct_wpcopy_start(void);
 extern void __delayacct_wpcopy_end(void);
+extern void __delayacct_irq(struct task_struct *task, u32 delta);
 
 static inline void delayacct_tsk_init(struct task_struct *tsk)
 {
@@ -215,6 +219,15 @@ static inline void delayacct_wpcopy_end(void)
 		__delayacct_wpcopy_end();
 }
 
+static inline void delayacct_irq(struct task_struct *task, u32 delta)
+{
+	if (!static_branch_unlikely(&delayacct_key))
+		return;
+
+	if (task->delays)
+		__delayacct_irq(task, delta);
+}
+
 #else
 static inline void delayacct_init(void)
 {}
@@ -253,6 +266,8 @@ static inline void delayacct_wpcopy_start(void)
 {}
 static inline void delayacct_wpcopy_end(void)
 {}
+static inline void delayacct_irq(struct task_struct *task, u32 delta)
+{}
 
 #endif /* CONFIG_TASK_DELAY_ACCT */
 
diff --git a/include/uapi/linux/taskstats.h b/include/uapi/linux/taskstats.h
index a7f5b11a8f1b..b50b2eb257a0 100644
--- a/include/uapi/linux/taskstats.h
+++ b/include/uapi/linux/taskstats.h
@@ -34,7 +34,7 @@
  */
 
 
-#define TASKSTATS_VERSION	13
+#define TASKSTATS_VERSION	14
 #define TS_COMM_LEN		32	/* should be >= TASK_COMM_LEN
 					 * in linux/sched.h */
 
@@ -198,6 +198,10 @@ struct taskstats {
 	/* v13: Delay waiting for write-protect copy */
 	__u64	wpcopy_count;
 	__u64	wpcopy_delay_total;
+
+	/* v14: Delay waiting for IRQ/SOFTIRQ */
+	__u64	irq_count;
+	__u64	irq_delay_total;
 };
 
 
diff --git a/kernel/delayacct.c b/kernel/delayacct.c
index e39cb696cfbd..6f0c358e73d8 100644
--- a/kernel/delayacct.c
+++ b/kernel/delayacct.c
@@ -179,12 +179,15 @@ int delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
 	d->compact_delay_total = (tmp < d->compact_delay_total) ? 0 : tmp;
 	tmp = d->wpcopy_delay_total + tsk->delays->wpcopy_delay;
 	d->wpcopy_delay_total = (tmp < d->wpcopy_delay_total) ? 0 : tmp;
+	tmp = d->irq_delay_total + tsk->delays->irq_delay;
+	d->irq_delay_total = (tmp < d->irq_delay_total) ? 0 : tmp;
 	d->blkio_count += tsk->delays->blkio_count;
 	d->swapin_count += tsk->delays->swapin_count;
 	d->freepages_count += tsk->delays->freepages_count;
 	d->thrashing_count += tsk->delays->thrashing_count;
 	d->compact_count += tsk->delays->compact_count;
 	d->wpcopy_count += tsk->delays->wpcopy_count;
+	d->irq_count += tsk->delays->irq_count;
 	raw_spin_unlock_irqrestore(&tsk->delays->lock, flags);
 
 	return 0;
@@ -274,3 +277,14 @@ void __delayacct_wpcopy_end(void)
 		      &current->delays->wpcopy_delay,
 		      &current->delays->wpcopy_count);
 }
+
+void __delayacct_irq(struct task_struct *task, u32 delta)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&task->delays->lock, flags);
+	task->delays->irq_delay += delta;
+	task->delays->irq_count++;
+	raw_spin_unlock_irqrestore(&task->delays->lock, flags);
+}
+
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a380f34789a2..8127fa8dfde7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -704,6 +704,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	rq->prev_irq_time += irq_delta;
 	delta -= irq_delta;
 	psi_account_irqtime(rq->curr, irq_delta);
+	delayacct_irq(rq->curr, irq_delta);
 #endif
 #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
 	if (static_key_false((&paravirt_steal_rq_enabled))) {
diff --git a/tools/accounting/getdelays.c b/tools/accounting/getdelays.c
index 23a15d8f2bf4..1334214546d7 100644
--- a/tools/accounting/getdelays.c
+++ b/tools/accounting/getdelays.c
@@ -198,17 +198,19 @@ static void print_delayacct(struct taskstats *t)
 	printf("\n\nCPU   %15s%15s%15s%15s%15s\n"
 	       "      %15llu%15llu%15llu%15llu%15.3fms\n"
 	       "IO    %15s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
+	       "      %15llu%15llu%15.3fms\n"
 	       "SWAP  %15s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
+	       "      %15llu%15llu%15.3fms\n"
 	       "RECLAIM  %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
+	       "      %15llu%15llu%15.3fms\n"
 	       "THRASHING%12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
+	       "      %15llu%15llu%15.3fms\n"
 	       "COMPACT  %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n"
+	       "      %15llu%15llu%15.3fms\n"
 	       "WPCOPY   %12s%15s%15s\n"
-	       "      %15llu%15llu%15.3fms\n",
+	       "      %15llu%15llu%15.3fms\n"
+	       "IRQ   %15s%15s%15s\n"
+	       "      %15llu%15llu%15.3fms\n",
 	       "count", "real total", "virtual total",
 	       "delay total", "delay average",
 	       (unsigned long long)t->cpu_count,
@@ -219,27 +221,31 @@ static void print_delayacct(struct taskstats *t)
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->blkio_count,
 	       (unsigned long long)t->blkio_delay_total,
-	       average_ms((double)t->blkio_delay_total, t->blkio_count),
+	       average_ms((double)t->blkio_delay_total, t->blkio_count),
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->swapin_count,
 	       (unsigned long long)t->swapin_delay_total,
-	       average_ms((double)t->swapin_delay_total, t->swapin_count),
+	       average_ms((double)t->swapin_delay_total, t->swapin_count),
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->freepages_count,
 	       (unsigned long long)t->freepages_delay_total,
-	       average_ms((double)t->freepages_delay_total, t->freepages_count),
+	       average_ms((double)t->freepages_delay_total, t->freepages_count),
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->thrashing_count,
 	       (unsigned long long)t->thrashing_delay_total,
-	       average_ms((double)t->thrashing_delay_total, t->thrashing_count),
+	       average_ms((double)t->thrashing_delay_total, t->thrashing_count),
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->compact_count,
 	       (unsigned long long)t->compact_delay_total,
-	       average_ms((double)t->compact_delay_total, t->compact_count),
+	       average_ms((double)t->compact_delay_total, t->compact_count),
 	       "count", "delay total", "delay average",
 	       (unsigned long long)t->wpcopy_count,
 	       (unsigned long long)t->wpcopy_delay_total,
-	       average_ms((double)t->wpcopy_delay_total, t->wpcopy_count));
+	       average_ms((double)t->wpcopy_delay_total, t->wpcopy_count),
+	       "count", "delay total", "delay average",
+	       (unsigned long long)t->irq_count,
+	       (unsigned long long)t->irq_delay_total,
+	       average_ms((double)t->irq_delay_total, t->irq_count));
 }
 
 static void task_context_switch_counts(struct taskstats *t)
-- 
2.25.1