From: David Laight <David.Laight@ACULAB.COM>
To: 'Peter Zijlstra' <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Jesper Dangaard Brouer <brouer@redhat.com>,
"mingo@kernel.org" <mingo@kernel.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kan.liang@linux.intel.com" <kan.liang@linux.intel.com>,
"acme@kernel.org" <acme@kernel.org>,
"mark.rutland@arm.com" <mark.rutland@arm.com>,
"alexander.shishkin@linux.intel.com"
<alexander.shishkin@linux.intel.com>,
"jolsa@redhat.com" <jolsa@redhat.com>,
"namhyung@kernel.org" <namhyung@kernel.org>,
"ak@linux.intel.com" <ak@linux.intel.com>,
"eranian@google.com" <eranian@google.com>
Subject: RE: [PATCH 4/6] perf: Optimize get_recursion_context()
Date: Mon, 9 Nov 2020 14:14:43 +0000
Message-ID: <262e5838b89f4776a1830bc218a6d9a6@AcuMS.aculab.com>
In-Reply-To: <20201109121237.GJ2594@hirez.programming.kicks-ass.net>
> -----Original Message-----
> From: Peter Zijlstra <peterz@infradead.org>
> Sent: 09 November 2020 12:13
> To: David Laight <David.Laight@ACULAB.COM>
> Cc: Steven Rostedt <rostedt@goodmis.org>; Jesper Dangaard Brouer <brouer@redhat.com>;
> mingo@kernel.org; tglx@linutronix.de; linux-kernel@vger.kernel.org; kan.liang@linux.intel.com;
> acme@kernel.org; mark.rutland@arm.com; alexander.shishkin@linux.intel.com; jolsa@redhat.com;
> namhyung@kernel.org; ak@linux.intel.com; eranian@google.com
> Subject: Re: [PATCH 4/6] perf: Optimize get_recursion_context()
>
> On Sat, Oct 31, 2020 at 12:11:42PM +0000, David Laight wrote:
> > The gcc 7.5.0 I have handy probably generates the best code for:
> >
> > unsigned char q_2(unsigned int pc)
> > {
> > 	unsigned char rctx = 0;
> >
> > 	rctx += !!(pc & (NMI_MASK));
> > 	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
> > 	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));
> >
> > 	return rctx;
> > }
> >
> > 0000000000000000 <q_2>:
> >    0:   f7 c7 00 00 f0 00    test   $0xf00000,%edi    # clock 0
> >    6:   0f 95 c0             setne  %al               # clock 1
> >    9:   f7 c7 00 00 ff 00    test   $0xff0000,%edi    # clock 0
> >    f:   0f 95 c2             setne  %dl               # clock 1
> >   12:   01 c2                add    %eax,%edx         # clock 2
> >   14:   81 e7 00 01 ff 00    and    $0xff0100,%edi
> >   1a:   0f 95 c0             setne  %al
> >   1d:   01 d0                add    %edx,%eax         # clock 3
> >   1f:   c3                   retq
> >
> > I doubt that is beatable.
> >
> > I've annotated the register dependency chain.
> > Likely to be 3 (or maybe 4) clocks.
> > The other versions are a lot worse (7 or 8) without allowing
> > for 'sbb' taking 2 clocks on a lot of Intel cpus.
>
> https://godbolt.org/z/EfnG8E
>
> Recent GCC just doesn't want to do that. Still, using u8 makes sense, so
> I've kept that.

u8 helps x86 because its 'setne' only affects the low 8 bits.
I guess that seemed a good idea when it was added (386).
It doesn't seem to make the other architectures much worse.

gcc 10.x can be persuaded to generate the above code.
https://godbolt.org/z/6GoT94
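
For anyone who wants to feed the quoted q_2() to a compiler outside the kernel tree, a minimal standalone sketch follows. The three mask values are assumptions read off the immediates in the disassembly above (0xf00000, 0xff0000, 0xff0100); the kernel's own definitions live in its preempt headers. This is not necessarily the source behind the godbolt link.

```c
/*
 * Standalone copy of the quoted q_2() for experimenting with compilers.
 * The mask values below are assumptions taken from the immediates in the
 * quoted disassembly, not copied from the kernel headers.
 */
#define NMI_MASK	0x00f00000u
#define HARDIRQ_MASK	0x000f0000u
#define SOFTIRQ_OFFSET	0x00000100u

/*
 * Count how many of the three nested masks have a bit set in pc.
 * The masks nest, so the result is 0 (task), 1 (softirq),
 * 2 (hardirq) or 3 (NMI).
 */
unsigned char q_2(unsigned int pc)
{
	unsigned char rctx = 0;

	rctx += !!(pc & (NMI_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK));
	rctx += !!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET));

	return rctx;
}
```

With these values q_2(0) is 0, q_2(0x100) is 1, q_2(0x10000) is 2 and q_2(0x100000) is 3. Building with -O2 and disassembling the object file should allow a comparison with the listing above, though the output will vary with compiler version.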

It sometimes seems to me that every new version of gcc is
larger, slower and generates worse code than the previous one.

	David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)