Subject: Re: [PATCH V2 7/7] x86,rcu: use percpu rcu_preempt_depth
To: Sebastian Andrzej Siewior
Cc: linux-kernel@vger.kernel.org, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", x86@kernel.org, "Paul E. McKenney", Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Joel Fernandes, Andi Kleen, Andy Lutomirski, Fenghua Yu, Kees Cook, "Rafael J. Wysocki", Dave Hansen, Babu Moger, Rik van Riel, "Chang S. Bae", Jann Horn, David Windsor, Elena Reshetova, Yuyang Du, Anshuman Khandual, Richard Guy Briggs, Andrew Morton, Christian Brauner, Michal Hocko, Andrea Arcangeli, Al Viro, "Dmitry V. Levin", rcu@vger.kernel.org
References: <20191102124559.1135-1-laijs@linux.alibaba.com> <20191102124559.1135-8-laijs@linux.alibaba.com> <20191104092519.nukaz5qmgiskzafi@linutronix.de>
From: Lai Jiangshan
Message-ID: <4878ccfd-7a4e-4f84-9bc3-1d477e077587@linux.alibaba.com>
Date: Mon, 4 Nov 2019 19:41:20 +0800
In-Reply-To: <20191104092519.nukaz5qmgiskzafi@linutronix.de>
X-Mailing-List: rcu@vger.kernel.org

On 2019/11/4 5:25 PM, Sebastian Andrzej Siewior wrote:
> On 2019-11-02 12:45:59 [+0000], Lai Jiangshan wrote:
>> Convert x86 to use a per-cpu rcu_preempt_depth. The reason for doing so
>> is that accessing per-cpu variables is a lot cheaper than accessing
>> task_struct or thread_info variables.
>
> Is there a benchmark saying how much we gain from this?

Hello

Maybe I can write a tight loop for testing, but I don't think anyone
will be interested in it.

I'm also trying to find some good real-world tests. I need some
suggestions here.

>
>> We need to save/restore the actual rcu_preempt_depth when switch.
>> We also place the per-cpu rcu_preempt_depth close to __preempt_count
>> and current_task variable.
>>
>> Using the idea of per-cpu __preempt_count.
>>
>> No function call when using rcu_read_[un]lock().
>> Single instruction for rcu_read_lock().
>> 2 instructions for fast path of rcu_read_unlock().
>
> I think these were not inlined due to the header requirements.
objdump -D -S kernel/workqueue.o shows (selected excerpts):

        raw_cpu_add_4(__rcu_preempt_depth, 1);
     d8f:   65 ff 05 00 00 00 00    incl   %gs:0x0(%rip)        # d96
......
        return GEN_UNARY_RMWcc("decl", __rcu_preempt_depth, e, __percpu_arg([var]));
     dd8:   65 ff 0d 00 00 00 00    decl   %gs:0x0(%rip)        # ddf
        if (unlikely(rcu_preempt_depth_dec_and_test()))
     ddf:   74 26                   je     e07
......
                rcu_read_unlock_special();
     e07:   e8 00 00 00 00          callq  e0c

>
> Boris pointed one thing, there is also DEFINE_PERCPU_RCU_PREEMP_DEPTH.
>

Thanks for pointing it out.

Best regards
Lai