McKenney" To: cl@linux.com, penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com, akpm@linux-foundation.org Cc: hannes@cmpxchg.org, willy@infradead.org, urezki@gmail.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Raw spinlocks and memory allocation Message-ID: <20200730231205.GA11265@paulmck-ThinkPad-P72> Reply-To: paulmck@kernel.org MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.9.4 (2018-02-28) X-Rspamd-Queue-Id: F2DFF180C07AF X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hello! We have an interesting issue involving interactions between RCU, memory allocation, and "raw atomic" contexts. The most attractive solution to this issue requires adding a new GFP_ flag. Perhaps this is a big ask, but on the other hand, the benefit is a large reduction in linked-list-induced cache misses when invoking RCU callbacks. For more details, please read on! Examples of raw atomic contexts include disabled hardware interrupts (that is, a hardware irq handler rather than a threaded irq handler), code holding a raw_spinlock_t, and code with preemption disabled (but only in cases where -rt cannot safely map it to disabled migration). It turns out that call_rcu() is already invoked from raw atomic contexts, and we therefore anticipate that kfree_rcu() will also be at some point. This matters due to recent work to fix a weakness in both call_rcu() and kfree_rcu() that was pointed out long ago by Christoph Lameter, among others. The weakness is that RCU traverses linked callback lists when invoking those callbacks. Because the just-ended grace period will have rendered these lists cache-cold, this results in an expensive cache miss on each and every callback invocation. Uladzislau Rezki (CCed) has recently produced patches for kfree_rcu() that instead store pointers to callbacks in arrays, so that callback invocation can step through the array using the kfree_bulk() interface. This greatly reducing the number of cache misses. The benefits are not subtle: https://lore.kernel.org/lkml/20191231122241.5702-1-urezki@gmail.com/ Of course, the arrays have to come from somewhere, and that somewhere is the memory allocator. Yes, memory allocation can fail, but in that rare case, kfree_rcu() just falls back to the old approach, taking a few extra cache misses, but making good (if expensive) forward progress. This works well until someone invokes kfree_rcu() with a raw spinlock held. Even that works fine unless the memory allocator has exhausted its caches, at which point it will acquire a normal spinlock. In kernels built with CONFIG_PROVE_RAW_LOCK_NESTING=y this will result in a lockdep splat. Worse yet, in -rt kernels, this can result in scheduling while atomic. So, may we add a GFP_ flag that will cause kmalloc() and friends to return NULL when they would otherwise need to acquire their non-raw spinlock? This avoids adding any overhead to the slab-allocator fastpaths, but allows callback invocation to reduce cache misses without having to restructure some existing callers of call_rcu() and potential future callers of kfree_rcu(). Thoughts? Thanx, Paul PS. Other avenues investigated: o Just don't invoke kmalloc() when kfree_rcu() is invoked from raw atomic contexts. 
Of course, the arrays have to come from somewhere, and that somewhere
is the memory allocator.  Yes, memory allocation can fail, but in that
rare case, kfree_rcu() just falls back to the old approach, taking a
few extra cache misses, but making good (if expensive) forward
progress.

This works well until someone invokes kfree_rcu() with a raw spinlock
held.  Even that works fine unless the memory allocator has exhausted
its caches, at which point it will acquire a normal spinlock.  In
kernels built with CONFIG_PROVE_RAW_LOCK_NESTING=y this will result in
a lockdep splat.  Worse yet, in -rt kernels, this can result in
scheduling while atomic.

So, may we add a GFP_ flag that will cause kmalloc() and friends to
return NULL when they would otherwise need to acquire their non-raw
spinlock?  This avoids adding any overhead to the slab-allocator
fastpaths, but allows callback invocation to reduce cache misses
without having to restructure some existing callers of call_rcu() and
potential future callers of kfree_rcu().  (A rough sketch of the
intended usage appears in the PPS below.)

Thoughts?

							Thanx, Paul

PS.  Other avenues investigated:

o	Just don't invoke kmalloc() when kfree_rcu() is invoked from
	raw atomic contexts.  The problem with this is that there is
	no way to detect raw atomic contexts in production kernels
	built with CONFIG_PREEMPT=n.  Adding means to detect this
	would increase overhead on numerous fastpaths.

o	Just say "no" to invoking call_rcu() and kfree_rcu() from raw
	atomic contexts.  This would require that the affected
	call_rcu() and kfree_rcu() invocations be deferred.  This is
	in theory simple, but can get quite messy, and often requires
	fallbacks such as timers that can degrade energy efficiency
	and realtime response.

o	Provide different non-allocating APIs such as kfree_rcu_raw()
	and call_rcu_raw() to be used from raw atomic contexts and
	also on memory-allocation failure from kfree_rcu() and
	call_rcu().  This results in unconditional callback-invocation
	cache misses for calls from raw contexts, including for common
	code that is only occasionally invoked from raw atomic
	contexts.  This approach also forces developers to worry about
	two more RCU API members.

o	Move the memory allocator's spinlocks to raw_spinlock_t.  This
	would be bad for realtime response, and would likely require
	even more conversions when the allocator invokes other
	subsystems that also use non-raw spinlocks.
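PPS.  To make the GFP_ flag proposal concrete, here is a rough sketch
of the intended usage in kfree_rcu()'s array-allocation slow path.
The flag name __GFP_NO_LOCKS is a placeholder, not an existing kernel
flag; GFP_NOWAIT, __GFP_NOWARN, and __get_free_page() are existing
APIs, and kfree_rcu_bulk_data refers to the illustrative structure
sketched earlier in this message.

	/*
	 * Sketch: try to grab a fresh pointer-array page without the
	 * allocator ever taking a non-raw spinlock.  The hypothetical
	 * __GFP_NO_LOCKS flag would make the allocator return NULL
	 * rather than acquire its non-raw locks when the lockless
	 * percpu caches are empty.
	 */
	static struct kfree_rcu_bulk_data *get_bulk_block_raw(void)
	{
		return (struct kfree_rcu_bulk_data *)
			__get_free_page(GFP_NOWAIT | __GFP_NOWARN | __GFP_NO_LOCKS);
	}

	/* Caller, possibly holding a raw_spinlock_t: */
	bnode = get_bulk_block_raw();
	if (!bnode) {
		/* Fall back to the old cache-cold linked-list path. */
	}

The key point is that the allocator fastpaths would be untouched; only
the slowpath that would otherwise acquire a non-raw spinlock would
check the new flag and bail out.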