Date: Tue, 22 Aug 2017 19:55:12 +0200
From: Peter Zijlstra
To: Jesper Dangaard Brouer
Cc: Arnaldo Carvalho de Melo, linux-kernel@vger.kernel.org, Jiri Olsa,
	Ingo Molnar
Subject: Re: [PATCH] trace: adjust code layout in get_recursion_context
Message-ID: <20170822175512.GJ32112@worktop.programming.kicks-ass.net>
In-Reply-To: <20170822190039.519c25bc@redhat.com>

On Tue, Aug 22, 2017 at 07:00:39PM +0200, Jesper Dangaard Brouer wrote:

> >  static inline int get_recursion_context(int *recursion)
> >  {
> > +	unsigned int pc = preempt_count();
> >  	int rctx;
> >
> > -	if (in_nmi())
> > +	if (pc & NMI_MASK)
> >  		rctx = 3;
> > -	else if (in_irq())
> > +	else if (pc & HARDIRQ_MASK)
> >  		rctx = 2;
> > -	else if (in_softirq())
> > +	else if (pc & SOFTIRQ_OFFSET)
>
> Hmmm... shouldn't this be SOFTIRQ_MASK?

No, that was actually broken; this is correct. See the comment near
__local_bh_disable_ip(): we use the low SOFTIRQ bit to indicate that
we're serving a softirq, and the remaining bits to count softirq
'disable' (local_bh_disable()) nesting. The code should've been using
in_serving_softirq().

> perf_swevent_get_recursion_context  /proc/kcore
>        │
>        │
>        │    Disassembly of section load0:
>        │
>        │    ffffffff811465c0 <perf_swevent_get_recursion_context>:
>  13.32 │      push   %rbp
>   1.43 │      mov    $0x14d20,%rax
>   5.12 │      mov    %rsp,%rbp
>   6.56 │      add    %gs:0x7eec3b5d(%rip),%rax
>   0.72 │      lea    0x34(%rax),%rdx
>   0.31 │      mov    %gs:0x7eec5db2(%rip),%eax
>   2.46 │      mov    %eax,%ecx
>   6.86 │      and    $0x7fffffff,%ecx
>   0.72 │      test   $0x100000,%eax
>        │    ↓ jne    40
>        │      test   $0xf0000,%eax
>   0.41 │    ↓ je     5b
>        │      mov    $0x8,%ecx
>        │      mov    $0x2,%eax
>        │    ↓ jmp    4a
>        │40:   mov    $0xc,%ecx
>        │      mov    $0x3,%eax
>   2.05 │4a:   add    %rcx,%rdx
>  16.60 │      mov    (%rdx),%ecx
>   2.66 │      test   %ecx,%ecx
>        │    ↓ jne    6d
>   1.33 │      movl   $0x1,(%rdx)
>   1.54 │      pop    %rbp
>   4.51 │    ← retq
>   3.89 │5b:   shr    $0x8,%ecx
>   9.53 │      and    $0x1,%ecx
>   0.61 │      movzbl %cl,%eax
>   0.92 │      movzbl %cl,%ecx
>   4.30 │      shl    $0x2,%rcx
>  14.14 │    ↑ jmp    4a
>        │6d:   mov    $0xffffffff,%eax
>        │      pop    %rbp
>        │    ← retq
>        │      xchg   %ax,%ax

Bah, that's fairly disgusting code, but I'm failing to come up with
anything significantly better under the current rules.

However, if we change the rules on what *_{enter,exit} do, such that a
higher context always has all the lower-context bits set as well, we
can do better. The below patch (100% untested) gives me the following
asm, which looks nicer (of course, given how things go today, it'll
both explode and be slower once you get it to work).

Also, it's not strictly correct: while we _should_ not have nested
hardirqs, they _could_ happen and push the SOFTIRQ count out of the
2-bit 'serving' range.

In any case, something to play with...
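To make the SOFTIRQ_OFFSET vs. SOFTIRQ_MASK point above concrete, here
is a minimal userspace sketch (not kernel code; the constants mirror
include/linux/preempt.h under the current rule, where
SOFTIRQ_DISABLE_OFFSET == 2 * SOFTIRQ_OFFSET):

  #include <assert.h>

  #define SOFTIRQ_OFFSET         0x00000100U  /* serving a softirq   */
  #define SOFTIRQ_MASK           0x0000ff00U  /* whole softirq field */
  #define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)

  int main(void)
  {
          /* task context inside a local_bh_disable() section */
          unsigned int pc = SOFTIRQ_DISABLE_OFFSET;

          assert(pc & SOFTIRQ_MASK);       /* in_softirq(): true, misleading */
          assert(!(pc & SOFTIRQ_OFFSET));  /* in_serving_softirq(): false    */

          /* actually executing a softirq handler (__do_softirq()) */
          pc = SOFTIRQ_OFFSET;
          assert(pc & SOFTIRQ_OFFSET);     /* only the serving bit is set    */
          return 0;
  }

Testing SOFTIRQ_MASK would classify a task that merely has bottom
halves disabled as softirq context; the low bit matches only while a
softirq is actually being served, which is what the recursion
bookkeeping wants.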
$ objdump -dr defconfig-build/kernel/events/core.o | awk '/<[^>]*>:$/ {p=0} /<perf_swevent_get_recursion_context>:/ {p=1} {if (p) print $0}'

0000000000004450 <perf_swevent_get_recursion_context>:
    4450:	55                   	push   %rbp
    4451:	48 c7 c6 00 00 00 00 	mov    $0x0,%rsi
			4454: R_X86_64_32S	.data..percpu+0x20
    4458:	48 89 e5             	mov    %rsp,%rbp
    445b:	65 48 03 35 00 00 00 	add    %gs:0x0(%rip),%rsi        # 4463 <perf_swevent_get_recursion_context+0x13>
    4462:	00
			445f: R_X86_64_PC32	this_cpu_off-0x4
    4463:	65 8b 15 00 00 00 00 	mov    %gs:0x0(%rip),%edx        # 446a <perf_swevent_get_recursion_context+0x1a>
			4466: R_X86_64_PC32	__preempt_count-0x4
    446a:	89 d0                	mov    %edx,%eax
    446c:	c1 e8 08             	shr    $0x8,%eax
    446f:	83 e0 01             	and    $0x1,%eax
    4472:	89 c1                	mov    %eax,%ecx
    4474:	83 c0 01             	add    $0x1,%eax
    4477:	f7 c2 00 00 0f 00    	test   $0xf0000,%edx
    447d:	0f 44 c1             	cmove  %ecx,%eax
    4480:	81 e2 00 00 10 00    	and    $0x100000,%edx
    4486:	83 fa 01             	cmp    $0x1,%edx
    4489:	83 d8 ff             	sbb    $0xffffffff,%eax
    448c:	48 63 d0             	movslq %eax,%rdx
    448f:	48 8d 54 96 2c       	lea    0x2c(%rsi,%rdx,4),%rdx
    4494:	8b 0a                	mov    (%rdx),%ecx
    4496:	85 c9                	test   %ecx,%ecx
    4498:	75 08                	jne    44a2 <perf_swevent_get_recursion_context+0x52>
    449a:	c7 02 01 00 00 00    	movl   $0x1,(%rdx)
    44a0:	5d                   	pop    %rbp
    44a1:	c3                   	retq
    44a2:	b8 ff ff ff ff       	mov    $0xffffffff,%eax
    44a7:	5d                   	pop    %rbp
    44a8:	c3                   	retq
    44a9:	0f 1f 80 00 00 00 00 	nopl   0x0(%rax)

---
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index c683996110b1..480babbbc2a5 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -35,7 +35,7 @@ static inline void rcu_nmi_exit(void)
 #define __irq_enter()					\
 	do {						\
 		account_irq_enter_time(current);	\
-		preempt_count_add(HARDIRQ_OFFSET);	\
+		preempt_count_add(HARDIRQ_OFFSET+SOFTIRQ_OFFSET); \
 		trace_hardirq_enter();			\
 	} while (0)
 
@@ -51,7 +51,7 @@ static inline void rcu_nmi_exit(void)
 	do {						\
 		trace_hardirq_exit();			\
 		account_irq_exit_time(current);		\
-		preempt_count_sub(HARDIRQ_OFFSET);	\
+		preempt_count_sub(HARDIRQ_OFFSET+SOFTIRQ_OFFSET); \
 	} while (0)
 
 /*
@@ -65,7 +65,7 @@ static inline void rcu_nmi_exit(void)
 		lockdep_off();					\
 		ftrace_nmi_enter();				\
 		BUG_ON(in_nmi());				\
-		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET + SOFTIRQ_OFFSET); \
 		rcu_nmi_enter();				\
 		trace_hardirq_enter();				\
 	} while (0)
@@ -75,7 +75,7 @@ static inline void rcu_nmi_exit(void)
 		trace_hardirq_exit();			\
 		rcu_nmi_exit();				\
 		BUG_ON(!in_nmi());			\
-		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET);	\
+		preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET + SOFTIRQ_OFFSET); \
 		ftrace_nmi_exit();			\
 		lockdep_on();				\
 		printk_nmi_exit();			\
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index cae461224948..e8e7b759b196 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -50,7 +50,7 @@
 #define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
 #define NMI_OFFSET	(1UL << NMI_SHIFT)
 
-#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)
+#define SOFTIRQ_DISABLE_OFFSET	(4 * SOFTIRQ_OFFSET)
 
 /* We use the MSB mostly because its available */
 #define PREEMPT_NEED_RESCHED	0x80000000
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 486fd78eb8d5..981e4163e16c 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -206,16 +206,15 @@ static inline unsigned long perf_aux_size(struct ring_buffer *rb)
 
 static inline int get_recursion_context(int *recursion)
 {
-	int rctx;
-
-	if (in_nmi())
-		rctx = 3;
-	else if (in_irq())
-		rctx = 2;
-	else if (in_softirq())
-		rctx = 1;
-	else
-		rctx = 0;
+	unsigned int pc = preempt_count();
+	int rctx = 0;
+
+	if (pc & SOFTIRQ_OFFSET)
+		rctx++;
+	if (pc & HARDIRQ_MASK)
+		rctx++;
+	if (pc & NMI_MASK)
+		rctx++;
 
 	if (recursion[rctx])
 		return -1;
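As a sanity check on the proposed rules, a small userspace model of the
patched get_recursion_context() (again only a sketch: the offsets are
the include/linux/preempt.h values, and each *_enter() is simulated by
adding the offsets the patch uses):

  #include <assert.h>

  #define SOFTIRQ_OFFSET 0x00000100U
  #define HARDIRQ_OFFSET 0x00010000U
  #define HARDIRQ_MASK   0x000f0000U
  #define NMI_OFFSET     0x00100000U
  #define NMI_MASK       0x00100000U

  /* the patched logic: count the context fields that are non-zero */
  static int rctx(unsigned int pc)
  {
          int r = 0;

          if (pc & SOFTIRQ_OFFSET)
                  r++;
          if (pc & HARDIRQ_MASK)
                  r++;
          if (pc & NMI_MASK)
                  r++;
          return r;
  }

  int main(void)
  {
          /* each context entered from task context */
          assert(rctx(0) == 0);                                /* task    */
          assert(rctx(SOFTIRQ_OFFSET) == 1);                   /* softirq */
          assert(rctx(HARDIRQ_OFFSET + SOFTIRQ_OFFSET) == 2);  /* hardirq */
          assert(rctx(NMI_OFFSET + HARDIRQ_OFFSET +
                      SOFTIRQ_OFFSET) == 3);                   /* NMI     */

          /*
           * The caveat above: contexts stack their SOFTIRQ_OFFSET
           * contributions, so a hardirq interrupting a serving softirq
           * leaves a serving count of 2 (0x200), which the low-bit test
           * no longer sees; nested hardirqs can push the count out of
           * the 2-bit range entirely.
           */
          return 0;
  }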