From: Jay Patel
To: linux-mm@kvack.org
Cc: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org, vbabka@suse.cz, aneesh.kumar@linux.ibm.com, tsahu@linux.ibm.com, piyushs@linux.ibm.com, jaypatel@linux.ibm.com
Subject: [PATCH] [RFC PATCH v2]mm/slub: Optimize slub memory usage
Date: Wed, 28 Jun 2023 15:27:40 +0530
Message-Id: <20230628095740.589893-1-jaypatel@linux.ibm.com>

The previous version [1] reduced SLUB memory wastage, but it also increased total slab memory usage. To address this, the patch has been modified as follows:

1) If min_objects * object_size is larger than PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER, calculate_order() returns PAGE_ALLOC_COSTLY_ORDER.

2) If min_objects * object_size fits within PAGE_SIZE, calculate_order() returns slub_min_order.

3) The default slub_max_order is lowered from PAGE_ALLOC_COSTLY_ORDER to 2 (it stays 1 with CONFIG_SLUB_TINY). There is no specific reason for choosing 2, except that it gave the best performance in my tests without any noticeable impact.

4) calc_slab_order() no longer takes a fract_leftover argument. Instead of stopping at the first order whose leftover is below a fraction of the slab size, it now picks the order with the smallest leftover, as long as that order does not exceed slub_max_order.

[1] https://lore.kernel.org/linux-mm/20230612085535.275206-1-jaypatel@linux.ibm.com/

I tested on systems with 160 CPUs and 16 CPUs, using both 4K and 64K page sizes. The results show that the patch reduces both total slab memory and slab memory wastage, without any noticeable performance degradation in hackbench. In the hackbench tables, Amean is the mean run time and the last column is the relative change; positive values mean the patched kernel was faster.

Test results are as follows:

1) On 160 CPUs with 4K page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |    2090 Kb     |    3204 Kb     |
| With Patch     |    1825 Kb     |    3088 Kb     |
| Wastage reduce |      ~12%      |      ~4%       |
+----------------+----------------+----------------+

+----------------+----------------+----------------+
|               Total slub memory                  |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     500572     |     713568     |
| With Patch     |     482036     |     688312     |
| Memory reduce  |      ~4%       |      ~3%       |
+----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|       |     |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean |   1 |   1.3237 |   1.2737 | (  3.78%) |
| Amean |   4 |   1.5923 |   1.6023 | ( -0.63%) |
| Amean |   7 |   2.3727 |   2.4260 | ( -2.25%) |
| Amean |  12 |   3.9813 |   4.1290 | ( -3.71%) |
| Amean |  21 |   6.9680 |   7.0630 | ( -1.36%) |
| Amean |  30 |  10.1480 |  10.2170 | ( -0.68%) |
| Amean |  48 |  16.7793 |  16.8780 | ( -0.59%) |
| Amean |  79 |  28.9537 |  28.8187 | (  0.47%) |
| Amean | 110 |  39.5507 |  40.0157 | ( -1.18%) |
| Amean | 141 |  51.5670 |  51.8200 | ( -0.49%) |
| Amean | 172 |  62.8710 |  63.2540 | ( -0.61%) |
| Amean | 203 |  74.6417 |  75.2520 | ( -0.82%) |
| Amean | 234 |  86.0853 |  86.5653 | ( -0.56%) |
| Amean | 265 |  97.9203 |  98.4617 | ( -0.55%) |
| Amean | 296 | 108.6243 | 109.8770 | ( -1.15%) |
+-------+-----+----------+----------+-----------+

2) On 160 CPUs with 64K page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     919 Kb     |    1880 Kb     |
| With Patch     |     807 Kb     |    1684 Kb     |
| Wastage reduce |      ~12%      |      ~10%      |
+----------------+----------------+----------------+

+----------------+----------------+----------------+
|               Total slub memory                  |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |    1862592     |    3023744     |
| With Patch     |    1644416     |    2675776     |
| Memory reduce  |      ~12%      |      ~11%      |
+----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|       |     |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean |   1 |   1.2547 |   1.2677 | ( -1.04%) |
| Amean |   4 |   1.5523 |   1.5783 | ( -1.67%) |
| Amean |   7 |   2.4157 |   2.3883 | (  1.13%) |
| Amean |  12 |   3.9807 |   3.9793 | (  0.03%) |
| Amean |  21 |   6.9687 |   6.9703 | ( -0.02%) |
| Amean |  30 |  10.1403 |  10.1297 | (  0.11%) |
| Amean |  48 |  16.7477 |  16.6893 | (  0.35%) |
| Amean |  79 |  27.9510 |  28.0463 | ( -0.34%) |
| Amean | 110 |  39.6833 |  39.5687 | (  0.29%) |
| Amean | 141 |  51.5673 |  51.4477 | (  0.23%) |
| Amean | 172 |  62.9643 |  63.1647 | ( -0.32%) |
| Amean | 203 |  74.6220 |  73.7900 | (  1.11%) |
| Amean | 234 |  85.1783 |  85.3420 | ( -0.19%) |
| Amean | 265 |  96.6627 |  96.7903 | ( -0.13%) |
| Amean | 296 | 108.2543 | 108.2253 | (  0.03%) |
+-------+-----+----------+----------+-----------+

3) On 16 CPUs with 4K page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     491 Kb     |     727 Kb     |
| With Patch     |     483 Kb     |     670 Kb     |
| Wastage reduce |      ~1%       |      ~8%       |
+----------------+----------------+----------------+

+----------------+----------------+----------------+
|               Total slub memory                  |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     105340     |     153116     |
| With Patch     |     103620     |     147412     |
| Memory reduce  |      ~1.6%     |      ~4%       |
+----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|       |     |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean |   1 |   1.0963 |   1.1070 | ( -0.97%) |
| Amean |   4 |   3.7963 |   3.7957 | (  0.02%) |
| Amean |   7 |   6.5947 |   6.6017 | ( -0.11%) |
| Amean |  12 |  11.1993 |  11.1730 | (  0.24%) |
| Amean |  21 |  19.4097 |  19.3647 | (  0.23%) |
| Amean |  30 |  27.7023 |  27.6040 | (  0.35%) |
| Amean |  48 |  44.1287 |  43.9630 | (  0.38%) |
| Amean |  64 |  58.8147 |  58.5753 | (  0.41%) |
+-------+-----+----------+----------+-----------+

4) On 16 CPUs with 64K page size

+----------------+----------------+----------------+
|          Total wastage in slub memory            |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     194 Kb     |     349 Kb     |
| With Patch     |     191 Kb     |     344 Kb     |
| Wastage reduce |      ~1%       |      ~1%       |
+----------------+----------------+----------------+

+----------------+----------------+----------------+
|               Total slub memory                  |
+----------------+----------------+----------------+
|                |   After Boot   | After Hackbench|
| Normal         |     330304     |     472960     |
| With Patch     |     319808     |     458944     |
| Memory reduce  |      ~3%       |      ~3%       |
+----------------+----------------+----------------+

hackbench-process-sockets
+-------+-----+----------+----------+-----------+
|       |     |  Normal  |With Patch|           |
+-------+-----+----------+----------+-----------+
| Amean |   1 |   1.9030 |   1.8967 | (  0.33%) |
| Amean |   4 |   7.2117 |   7.1283 | (  1.16%) |
| Amean |   7 |  12.5247 |  12.3460 | (  1.43%) |
| Amean |  12 |  21.7157 |  21.4753 | (  1.11%) |
| Amean |  21 |  38.2693 |  37.6670 | (  1.57%) |
| Amean |  30 |  54.5930 |  53.8657 | (  1.33%) |
| Amean |  48 |  87.6700 |  86.3690 | (  1.48%) |
| Amean |  64 | 117.1227 | 115.4893 | (  1.39%) |
+-------+-----+----------+----------+-----------+

Signed-off-by: Jay Patel
---
 mm/slub.c | 52 +++++++++++++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index c87628cd8a9a..0a1090c528da 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4058,7 +4058,7 @@ EXPORT_SYMBOL(kmem_cache_alloc_bulk);
  */
 static unsigned int slub_min_order;
 static unsigned int slub_max_order =
-	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER;
+	IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : 2;
 static unsigned int slub_min_objects;
 
 /*
@@ -4087,11 +4087,10 @@ static unsigned int slub_min_objects;
  * the smallest order which will fit the object.
  */
 static inline unsigned int calc_slab_order(unsigned int size,
-		unsigned int min_objects, unsigned int max_order,
-		unsigned int fract_leftover)
+		unsigned int min_objects, unsigned int max_order)
 {
 	unsigned int min_order = slub_min_order;
-	unsigned int order;
+	unsigned int order, min_wastage = size, min_wastage_order = MAX_ORDER+1;
 
 	if (order_objects(min_order, size) > MAX_OBJS_PER_PAGE)
 		return get_order(size * MAX_OBJS_PER_PAGE) - 1;
@@ -4104,11 +4103,17 @@ static inline unsigned int calc_slab_order(unsigned int size,
 
 		rem = slab_size % size;
 
-		if (rem <= slab_size / fract_leftover)
-			break;
+		if (rem < min_wastage) {
+			min_wastage = rem;
+			min_wastage_order = order;
+		}
 	}
 
-	return order;
+	if (min_wastage_order <= slub_max_order)
+		return min_wastage_order;
+	else
+		return order;
+
 }
 
 static inline int calculate_order(unsigned int size)
@@ -4142,35 +4147,28 @@ static inline int calculate_order(unsigned int size)
 			nr_cpus = nr_cpu_ids;
 		min_objects = 4 * (fls(nr_cpus) + 1);
 	}
+
+	if ((min_objects * size) > (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
+		return PAGE_ALLOC_COSTLY_ORDER;
+
+	if ((min_objects * size) <= PAGE_SIZE)
+		return slub_min_order;
+
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
-	while (min_objects > 1) {
-		unsigned int fraction;
-
-		fraction = 16;
-		while (fraction >= 4) {
-			order = calc_slab_order(size, min_objects,
-					slub_max_order, fraction);
-			if (order <= slub_max_order)
-				return order;
-			fraction /= 2;
-		}
+	while (min_objects >= 1) {
+		order = calc_slab_order(size, min_objects,
+					slub_max_order);
+		if (order <= slub_max_order)
+			return order;
 		min_objects--;
 	}
 
-	/*
-	 * We were unable to place multiple objects in a slab. Now
-	 * lets see if we can place a single object there.
-	 */
-	order = calc_slab_order(size, 1, slub_max_order, 1);
-	if (order <= slub_max_order)
-		return order;
-
 	/*
 	 * Doh this slab cannot be placed using slub_max_order.
*/ - order = calc_slab_order(size, 1, MAX_ORDER, 1); + order = calc_slab_order(size, 1, MAX_ORDER); if (order <= MAX_ORDER) return order; return -ENOSYS; -- 2.39.1