Date: Fri, 9 Apr 2021 12:35:39 -0400
From: Masayoshi Mizuma
To: Shakeel Butt
Cc: Roman Gushchin, Johannes Weiner, Michal Hocko, Vladimir Davydov, Cgroups, Linux MM
Subject: Re: memcg: performance degradation since v5.9
Message-ID: <20210409163539.5374pde3u6gkbg4a@gabell>
References: <20210408193948.vfktg3azh2wrt56t@gabell>

On Thu, Apr 08, 2021 at 02:08:13PM -0700, Shakeel Butt wrote:
> On Thu, Apr 8, 2021 at 1:54 PM Roman Gushchin wrote:
> >
> > On Thu, Apr 08, 2021 at 03:39:48PM -0400, Masayoshi Mizuma wrote:
> > > Hello,
> > >
> > > I detected a performance degradation issue in a benchmark of PostgreSQL [1],
> > > and the issue seems to be related to object-level memory cgroup accounting [2].
> > > I would appreciate it if you could give me some ideas to solve it.
> > >
> > > The benchmark measures transactions per second (tps), and the tps for v5.9
> > > and later kernels is about 10%-20% lower than for v5.8.
> > >
> > > The benchmark does sendto() and recvfrom() system calls repeatedly,
> > > and the duration of those system calls gets longer than on v5.8.
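(For reference, the call pattern the benchmark exercises can be reproduced from user space with a minimal Python sketch. This is just an illustration of a tight sendto()/recvfrom() loop over a local UDP socket, not the PostgreSQL benchmark itself; iteration count and payload size are arbitrary.)

```python
import socket
import time

def time_udp_roundtrips(iterations=10000):
    """Time a tight sendto()/recvfrom() loop over a loopback UDP socket."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))          # let the kernel pick a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    addr = rx.getsockname()
    payload = b"x" * 64

    start = time.perf_counter()
    for _ in range(iterations):
        tx.sendto(payload, addr)       # each iteration is one syscall pair
        rx.recvfrom(128)
    elapsed = time.perf_counter() - start

    tx.close()
    rx.close()
    return elapsed

print(f"{time_udp_roundtrips():.3f} s")
```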
> > > The result of perf trace of the benchmark is as follows:
> > >
> > > - v5.8
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699574        0  2595.220     0.001     0.004     0.462    0.03%
> > >   recvfrom         1391089   694427  2163.458     0.001     0.002     0.442    0.04%
> > >
> > > - v5.9
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699187        0  3316.948     0.002     0.005     0.044    0.02%
> > >   recvfrom         1397042   698828  2464.995     0.001     0.002     0.025    0.04%
> > >
> > > - v5.12-rc6
> > >
> > >   syscall            calls   errors     total       min       avg       max   stddev
> > >                                         (msec)    (msec)    (msec)    (msec)      (%)
> > >   --------------- -------- -------- --------- --------- --------- --------- --------
> > >   sendto            699445        0  3015.642     0.002     0.004     0.027    0.02%
> > >   recvfrom         1395929   697909  2338.783     0.001     0.002     0.024    0.03%
>
> Can you please explain how to read these numbers? Or at least put a %
> regression.

Let me summarize them here. The total duration (the 'total' column above) of
each system call is as follows, with v5.8 taken as 100%:

- sendto:
  - v5.8       100%
  - v5.9       128%
  - v5.12-rc6  116%

- recvfrom:
  - v5.8       100%
  - v5.9       114%
  - v5.12-rc6  108%

> > > I bisected the kernel patches, and I found that the patch series which
> > > adds object-level memory cgroup support causes the degradation.
> > >
> > > I confirmed the delay with a kernel module which just runs
> > > kmem_cache_alloc()/kmem_cache_free() as follows. The duration is about
> > > 2-3 times longer than on v5.8.
> > >
> > >     dummy_cache = KMEM_CACHE(dummy, SLAB_ACCOUNT);
> > >     for (i = 0; i < 100000000; i++)
> > >     {
> > >             p = kmem_cache_alloc(dummy_cache, GFP_KERNEL);
> > >             kmem_cache_free(dummy_cache, p);
> > >     }
> > >
> > > It seems that the object accounting work in slab_pre_alloc_hook() and
> > > slab_post_alloc_hook() is the overhead.
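(The percentages in the summary above can be cross-checked directly from the
'total' columns of the perf trace tables; this is just the arithmetic, spelled
out:)

```python
# Relative slowdown of each syscall, computed from the 'total' (msec) column
# of the perf trace tables, with v5.8 as the 100% baseline.
totals = {
    "sendto":   {"v5.8": 2595.220, "v5.9": 3316.948, "v5.12-rc6": 3015.642},
    "recvfrom": {"v5.8": 2163.458, "v5.9": 2464.995, "v5.12-rc6": 2338.783},
}

for syscall, t in totals.items():
    for version in ("v5.9", "v5.12-rc6"):
        pct = round(100 * t[version] / t["v5.8"])
        print(f"{syscall:8s} {version}: {pct}% of v5.8")
# sendto:   v5.9 -> 128%, v5.12-rc6 -> 116%
# recvfrom: v5.9 -> 114%, v5.12-rc6 -> 108%
```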
> > > The cgroup.nokmem kernel parameter doesn't work for my case because it
> > > disables all of the kmem accounting.
>
> The patch is somewhat doing that, i.e. disabling memcg accounting for slab.
>
> > > The degradation is gone when I apply a patch (at the bottom of this
> > > email) that adds a kernel parameter which falls back to the page-level
> > > accounting; however, I'm not sure it's a good approach...
> >
> > Hello Masayoshi!
> >
> > Thank you for the report!
> >
> > It's not a secret that per-object accounting is more expensive than
> > per-page allocation. I had micro-benchmark results similar to yours:
> > accounted allocations are about 2x slower. But in general it tends not to
> > affect real workloads, because the cost of allocations is still low and
> > tends to be only a small fraction of the whole cpu load. And because it
> > brings significant benefits: 40%+ slab memory savings, less fragmentation,
> > a more stable workingset, etc., real workloads tend to perform on par or
> > better.
> >
> > So my first question is whether you see the regression in any real
> > workload, or only in the benchmark?
> >
> > Second, I'll try to take a look into the benchmark to figure out why it's
> > affected so badly, but I'm not sure we can easily fix it. If you have any
> > ideas about what kind of objects the benchmark is allocating in big
> > numbers, please let me know.
>
> One idea would be to increase MEMCG_CHARGE_BATCH.

Thank you for the idea! It's hard-coded as 32 now, so I'm wondering whether it
may be a good idea to make MEMCG_CHARGE_BATCH tunable via a kernel parameter
or something.

Thanks!
Masa
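(Why a larger batch helps can be sketched in user space: each CPU draws
charges from a local pre-charged stock and only updates the shared atomic
counter when the stock runs out, so the number of contended updates scales
inversely with the batch size. The model below is a deliberate simplification
of the kernel's per-CPU stock logic, not the actual implementation:)

```python
def shared_counter_updates(num_charges, batch):
    """Count refills of the shared counter for `num_charges` single-page
    charges, with a per-CPU stock refilled `batch` pages at a time."""
    stock = 0
    updates = 0
    for _ in range(num_charges):
        if stock == 0:
            stock = batch   # refill: one update of the shared atomic counter
            updates += 1
        stock -= 1          # charge one page from the cheap local stock
    return updates

print(shared_counter_updates(100000, 32))   # current MEMCG_CHARGE_BATCH
print(shared_counter_updates(100000, 128))  # a hypothetical larger batch
```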