From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jiebin Sun
To: akpm@linux-foundation.org, vasily.averin@linux.dev, shakeelb@google.com,
	dennis@kernel.org, tj@kernel.org, cl@linux.com, ebiederm@xmission.com,
	legion@kernel.org, manfred@colorfullife.com,
	alexander.mikhalitsyn@virtuozzo.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Cc: tim.c.chen@intel.com, feng.tang@intel.com, ying.huang@intel.com,
	tianyou.li@intel.com, wangyang.guo@intel.com, jiebin.sun@intel.com
Subject: [PATCH v4] ipc/msg: mitigate the lock contention with percpu counter
Date: Thu, 8 Sep 2022 01:25:16 +0800
Message-Id: <20220907172516.1210842-1-jiebin.sun@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

The msg_bytes and msg_hdrs atomic counters are updated frequently when
the IPC msg queue is in heavy use, causing heavy cache bouncing and
overhead. Changing them to percpu_counter greatly improves the
performance. Since there is one percpu struct per namespace, the
additional memory cost is minimal. The counts are read only in the
msgctl call, which is infrequent, so the need to sum up the per-CPU
counts is also infrequent.

Apply the patch and test with pts/stress-ng-1.4.0 -- system v message
passing (160 threads).

Score gain: 3.17x

CPU: ICX 8380 x 2 sockets
Core number: 40 x 2 physical cores
Benchmark: pts/stress-ng-1.4.0 -- system v message passing (160 threads)

Signed-off-by: Jiebin Sun
---
 include/linux/ipc_namespace.h |  5 ++--
 ipc/msg.c                     | 47 ++++++++++++++++++++++++-----------
 ipc/namespace.c               |  5 +++-
 ipc/util.h                    |  4 +--
 4 files changed, 42 insertions(+), 19 deletions(-)

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h
index e3e8c8662b49..e8240cf2611a 100644
--- a/include/linux/ipc_namespace.h
+++ b/include/linux/ipc_namespace.h
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 
 struct user_namespace;
 
@@ -36,8 +37,8 @@ struct ipc_namespace {
 	unsigned int msg_ctlmax;
 	unsigned int msg_ctlmnb;
 	unsigned int msg_ctlmni;
-	atomic_t msg_bytes;
-	atomic_t msg_hdrs;
+	struct percpu_counter percpu_msg_bytes;
+	struct percpu_counter percpu_msg_hdrs;
 
 	size_t shm_ctlmax;
 	size_t shm_ctlall;
diff --git a/ipc/msg.c b/ipc/msg.c
index a0d05775af2c..040cfc93d7ef 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -39,11 +39,15 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
 #include "util.h"
 
+/* large batch size could reduce the times to sum up percpu counter */
+#define MSG_PERCPU_COUNTER_BATCH 1024
+
 /* one msq_queue structure for each present queue on the system */
 struct msg_queue {
 	struct kern_ipc_perm q_perm;
@@ -285,10 +289,10 @@ static void freeque(struct ipc_namespace *ns, struct kern_ipc_perm *ipcp)
 	rcu_read_unlock();
 
 	list_for_each_entry_safe(msg, t, &msq->q_messages, m_list) {
-		atomic_dec(&ns->msg_hdrs);
+		percpu_counter_add_batch(&ns->percpu_msg_hdrs, -1, MSG_PERCPU_COUNTER_BATCH);
 		free_msg(msg);
 	}
-	atomic_sub(msq->q_cbytes, &ns->msg_bytes);
+	percpu_counter_add_batch(&ns->percpu_msg_bytes, -(msq->q_cbytes), MSG_PERCPU_COUNTER_BATCH);
 	ipc_update_pid(&msq->q_lspid, NULL);
 	ipc_update_pid(&msq->q_lrpid, NULL);
 	ipc_rcu_putref(&msq->q_perm, msg_rcu_free);
@@ -495,17 +499,18 @@ static int msgctl_info(struct ipc_namespace *ns, int msqid,
 	msginfo->msgssz = MSGSSZ;
 	msginfo->msgseg = MSGSEG;
 	down_read(&msg_ids(ns).rwsem);
-	if (cmd == MSG_INFO) {
+	if (cmd == MSG_INFO)
 		msginfo->msgpool = msg_ids(ns).in_use;
-		msginfo->msgmap = atomic_read(&ns->msg_hdrs);
-		msginfo->msgtql = atomic_read(&ns->msg_bytes);
+	max_idx = ipc_get_maxidx(&msg_ids(ns));
+	up_read(&msg_ids(ns).rwsem);
+	if (cmd == MSG_INFO) {
+		msginfo->msgmap = percpu_counter_sum(&ns->percpu_msg_hdrs);
+		msginfo->msgtql = percpu_counter_sum(&ns->percpu_msg_bytes);
 	} else {
 		msginfo->msgmap = MSGMAP;
 		msginfo->msgpool = MSGPOOL;
 		msginfo->msgtql = MSGTQL;
 	}
-	max_idx = ipc_get_maxidx(&msg_ids(ns));
-	up_read(&msg_ids(ns).rwsem);
 	return (max_idx < 0) ? 0 : max_idx;
 }
 
@@ -935,8 +940,8 @@ static long do_msgsnd(int msqid, long mtype, void __user *mtext,
 		list_add_tail(&msg->m_list, &msq->q_messages);
 		msq->q_cbytes += msgsz;
 		msq->q_qnum++;
-		atomic_add(msgsz, &ns->msg_bytes);
-		atomic_inc(&ns->msg_hdrs);
+		percpu_counter_add_batch(&ns->percpu_msg_bytes, msgsz, MSG_PERCPU_COUNTER_BATCH);
+		percpu_counter_add_batch(&ns->percpu_msg_hdrs, 1, MSG_PERCPU_COUNTER_BATCH);
 	}
 
 	err = 0;
@@ -1159,8 +1164,8 @@ static long do_msgrcv(int msqid, void __user *buf, size_t bufsz, long msgtyp, in
 			msq->q_rtime = ktime_get_real_seconds();
 			ipc_update_pid(&msq->q_lrpid, task_tgid(current));
 			msq->q_cbytes -= msg->m_ts;
-			atomic_sub(msg->m_ts, &ns->msg_bytes);
-			atomic_dec(&ns->msg_hdrs);
+			percpu_counter_add_batch(&ns->percpu_msg_bytes, -(msg->m_ts), MSG_PERCPU_COUNTER_BATCH);
+			percpu_counter_add_batch(&ns->percpu_msg_hdrs, -1, MSG_PERCPU_COUNTER_BATCH);
 			ss_wakeup(msq, &wake_q, false);
 
 			goto out_unlock0;
@@ -1297,20 +1302,34 @@ COMPAT_SYSCALL_DEFINE5(msgrcv, int, msqid, compat_uptr_t, msgp,
 }
 #endif
 
-void msg_init_ns(struct ipc_namespace *ns)
+int msg_init_ns(struct ipc_namespace *ns)
 {
+	int ret;
+
 	ns->msg_ctlmax = MSGMAX;
 	ns->msg_ctlmnb = MSGMNB;
 	ns->msg_ctlmni = MSGMNI;
 
-	atomic_set(&ns->msg_bytes, 0);
-	atomic_set(&ns->msg_hdrs, 0);
+	ret = percpu_counter_init(&ns->percpu_msg_bytes, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_bytes;
+	ret = percpu_counter_init(&ns->percpu_msg_hdrs, 0, GFP_KERNEL);
+	if (ret)
+		goto fail_msg_hdrs;
 	ipc_init_ids(&ns->ids[IPC_MSG_IDS]);
+	return 0;
+
+	fail_msg_hdrs:
+		percpu_counter_destroy(&ns->percpu_msg_bytes);
+	fail_msg_bytes:
+		return ret;
 }
 
 #ifdef CONFIG_IPC_NS
 void msg_exit_ns(struct ipc_namespace *ns)
 {
+	percpu_counter_destroy(&ns->percpu_msg_bytes);
+	percpu_counter_destroy(&ns->percpu_msg_hdrs);
 	free_ipcs(ns, &msg_ids(ns), freeque);
 	idr_destroy(&ns->ids[IPC_MSG_IDS].ipcs_idr);
 	rhashtable_destroy(&ns->ids[IPC_MSG_IDS].key_ht);
diff --git a/ipc/namespace.c b/ipc/namespace.c
index e1fcaedba4fa..8316ea585733 100644
--- a/ipc/namespace.c
+++ b/ipc/namespace.c
@@ -66,8 +66,11 @@ static struct ipc_namespace *create_ipc_ns(struct user_namespace *user_ns,
 	if (!setup_ipc_sysctls(ns))
 		goto fail_mq;
 
+	err = msg_init_ns(ns);
+	if (err)
+		goto fail_put;
+
 	sem_init_ns(ns);
-	msg_init_ns(ns);
 	shm_init_ns(ns);
 
 	return ns;
diff --git a/ipc/util.h b/ipc/util.h
index 2dd7ce0416d8..1b0086c6346f 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -64,7 +64,7 @@ static inline void mq_put_mnt(struct ipc_namespace *ns) { }
 
 #ifdef CONFIG_SYSVIPC
 void sem_init_ns(struct ipc_namespace *ns);
-void msg_init_ns(struct ipc_namespace *ns);
+int msg_init_ns(struct ipc_namespace *ns);
 void shm_init_ns(struct ipc_namespace *ns);
 
 void sem_exit_ns(struct ipc_namespace *ns);
@@ -72,7 +72,7 @@ void msg_exit_ns(struct ipc_namespace *ns);
 void shm_exit_ns(struct ipc_namespace *ns);
 #else
 static inline void sem_init_ns(struct ipc_namespace *ns) { }
-static inline void msg_init_ns(struct ipc_namespace *ns) { }
+static inline int msg_init_ns(struct ipc_namespace *ns) { return 0;}
 static inline void shm_init_ns(struct ipc_namespace *ns) { }
 
 static inline void sem_exit_ns(struct ipc_namespace *ns) { }
-- 
2.31.1
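
For readers less familiar with percpu_counter, the batching behaviour the
commit message relies on can be modelled in user space. This is only a
simplified, single-threaded sketch and not the kernel implementation:
model_counter, model_add_batch, model_read, model_sum, MODEL_NR_CPUS and
MODEL_BATCH are hypothetical names chosen for illustration, and the real
code uses true per-CPU storage plus a lock around the shared count.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4      /* hypothetical CPU count for the model */
#define MODEL_BATCH   1024   /* same value as MSG_PERCPU_COUNTER_BATCH */

struct model_counter {
	int64_t count;                 /* shared total (the contended word) */
	int32_t pcpu[MODEL_NR_CPUS];   /* per-CPU deltas not yet folded in */
};

/*
 * Add amount on behalf of cpu. The shared total is touched only when
 * the local delta reaches the batch size in either direction.
 */
static void model_add_batch(struct model_counter *c, int cpu,
			    int64_t amount, int32_t batch)
{
	int64_t local;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	local = c->pcpu[cpu] + amount;
	if (local >= batch || local <= -batch) {
		c->count += local;     /* the kernel takes a lock here */
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = (int32_t)local;
	}
}

/* Cheap, approximate read: ignores deltas still held per CPU. */
static int64_t model_read(const struct model_counter *c)
{
	return c->count;
}

/* Exact read, the kind msgctl_info() wants: walk every CPU's delta. */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t sum = c->count;
	int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->pcpu[cpu];
	return sum;
}
```

In this model, small updates in the msgsnd/msgrcv paths stay in the
per-CPU slot, and only the infrequent exact read pays for walking all
CPUs. A larger batch means fewer writes to the shared total, at the cost
of a less accurate approximate read.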