From: Arne Welzel <arne.welzel@corelight.com>
To: dm-devel@redhat.com, dm-crypt@saout.de
Cc: mpatocka@redhat.com, snitzer@redhat.com, agk@redhat.com, DJ Gregor
Date: Sun, 8 Aug 2021 15:42:05 +0200
Message-Id: <20210808134205.1981531-1-arne.welzel@corelight.com>
Subject: [dm-crypt] [PATCH] dm crypt: Avoid percpu_counter spinlock contention in crypt_page_alloc()

On many-core systems using dm-crypt, heavy spinlock contention in
percpu_counter_compare() can be observed when the dm-crypt page allocation
limit for a given device is reached or close to being reached. This is
because percpu_counter_compare() takes a spinlock to compute an exact
result on potentially many CPUs at the same time.

Switch to a non-exact comparison of allocated and allowed pages by using
the value returned by percpu_counter_read_positive(). This may over- or
underestimate the actual number of allocated pages by at most
(batch - 1) * num_online_cpus() (assuming my understanding of the
percpu_counter logic is correct). Currently, batch is bounded by 32.

The system on which this issue was first observed has 256 CPUs and 512GB
of RAM. With a 4k page size, this change may over- or underestimate by
31MB.
With ~10G (2%) allowed for dm-crypt allocations, this seems an acceptable
error. It is certainly preferable to running into the spinlock contention.

This behavior was separately/artificially reproduced on an EC2 c5.24xlarge
instance with 96 CPUs and 192GB RAM as follows, but can be provoked on
systems with fewer CPUs.

* Disable swap
* Tune vm settings to promote regular writeback
    $ echo 50 > /proc/sys/vm/dirty_expire_centisecs
    $ echo 25 > /proc/sys/vm/dirty_writeback_centisecs
    $ echo $((128 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes
* Create 8 dm-crypt devices based on files on a tmpfs
* Create and mount an ext4 filesystem on each crypt device
* Run stress-ng --hdd 8 within one of the above filesystems

Total %system usage shown via sysstat goes to ~35%, and write throughput
on the underlying loop device is ~2GB/s. perf profiling an individual
kworker kcryptd thread shows the following, indicating heavy spinlock
contention in percpu_counter_compare():

    99.98%     0.00%  kworker/u193:46  [kernel.kallsyms]  [k] ret_from_fork
            |
            ---ret_from_fork
               kthread
               worker_thread
               |
                --99.92%--process_one_work
                          |
                          |--80.52%--kcryptd_crypt
                          |          |
                          |          |--62.58%--mempool_alloc
                          |          |          |
                          |          |           --62.24%--crypt_page_alloc
                          |          |                      |
                          |          |                       --61.51%--__percpu_counter_compare
                          |          |                                  |
                          |          |                                   --61.34%--__percpu_counter_sum
                          |          |                                              |
                          |          |                                              |--58.68%--_raw_spin_lock_irqsave
                          |          |                                              |          |
                          |          |                                              |           --58.30%--native_queued_spin_lock_slowpath
                          |          |                                              |
                          |          |                                               --0.69%--cpumask_next
                          |          |                                                          |
                          |          |                                                           --0.51%--_find_next_bit
                          |          |
                          |          |--10.61%--crypt_convert
                          |          |          |
                          |          |          |--6.05%--xts_crypt
                          ...

After applying this change, %system usage is lowered to ~7% and write
throughput on the loop device increases to ~2.7GB/s. In the profile,
mempool_alloc() accounts for ~8% rather than ~62% and no longer hits
the percpu_counter spinlock:
                          |--8.15%--mempool_alloc
                          |          |
                          |          |--3.93%--crypt_page_alloc
                          |          |          |
                          |          |           --3.75%--__alloc_pages
                          |          |                      |
                          |          |                       --3.62%--get_page_from_freelist
                          |          |                                  |
                          |          |                                   --3.22%--rmqueue_bulk
                          |          |                                              |
                          |          |                                               --2.59%--_raw_spin_lock
                          |          |                                                          |
                          |          |                                                           --2.57%--native_queued_spin_lock_slowpath
                          |          |
                          |           --3.05%--_raw_spin_lock_irqsave
                          |                      |
                          |                       --2.49%--native_queued_spin_lock_slowpath

Suggested-by: DJ Gregor
Signed-off-by: Arne Welzel <arne.welzel@corelight.com>
---
 drivers/md/dm-crypt.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 50f4cbd600d5..2ae481610f12 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -2661,7 +2661,12 @@ static void *crypt_page_alloc(gfp_t gfp_mask, void *pool_data)
 	struct crypt_config *cc = pool_data;
 	struct page *page;
 
-	if (unlikely(percpu_counter_compare(&cc->n_allocated_pages, dm_crypt_pages_per_client) >= 0) &&
+	/*
+	 * Note, percpu_counter_read_positive() may over (and under) estimate
+	 * the current usage by at most (batch - 1) * num_online_cpus() pages,
+	 * but avoids potential spinlock contention of an exact result.
+	 */
+	if (unlikely(percpu_counter_read_positive(&cc->n_allocated_pages) > dm_crypt_pages_per_client) &&
 	    likely(gfp_mask & __GFP_NORETRY))
 		return NULL;
-- 
2.20.1