linux-kernel.vger.kernel.org archive mirror
From: Daniel Thompson <daniel.thompson@linaro.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>,
	Russell King <linux@arm.linux.org.uk>,
	Will Deacon <will.deacon@arm.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Stephen Boyd <sboyd@codeaurora.org>,
	John Stultz <john.stultz@linaro.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, patches@linaro.org,
	linaro-kernel@lists.linaro.org,
	Sumit Semwal <sumit.semwal@linaro.org>,
	Dirk Behme <dirk.behme@de.bosch.com>,
	Daniel Drake <drake@endlessm.com>,
	Dmitry Pervushin <dpervushin@gmail.com>,
	Tim Sander <tim@krieglstein.org>
Subject: Re: [PATCH 3.19-rc2 v15 4/8] sched_clock: Avoid deadlock during read from NMI
Date: Mon, 26 Jan 2015 20:28:51 +0000
Message-ID: <54C6A383.6060409@linaro.org>
In-Reply-To: <alpine.DEB.2.11.1501242248170.5526@nanos>

On 24/01/15 22:40, Thomas Gleixner wrote:
> On Fri, 23 Jan 2015, Daniel Thompson wrote:
>> This patch fixes that problem by providing banked clock data in a
>> similar manner to Thomas Gleixner's 4396e058c52e("timekeeping: Provide
>> fast and NMI safe access to CLOCK_MONOTONIC").
> 
> By some definition of similar.

Fair point, I copied only the NMI-safety concept.

Anyhow, thanks very much for the review.


>> -struct clock_data {
>> -	ktime_t wrap_kt;
>> +struct clock_data_banked {
>>  	u64 epoch_ns;
>>  	u64 epoch_cyc;
>> -	seqcount_t seq;
>> -	unsigned long rate;
>> +	u64 (*read_sched_clock)(void);
>> +	u64 sched_clock_mask;
>>  	u32 mult;
>>  	u32 shift;
>>  	bool suspended;
>>  };
>>  
>> +struct clock_data {
>> +	ktime_t wrap_kt;
>> +	seqcount_t seq;
>> +	unsigned long rate;
>> +	struct clock_data_banked bank[2];
>> +};
> 
> ....
> 
>> -static u64 __read_mostly (*read_sched_clock)(void) = jiffy_sched_clock_read;
>> +static struct clock_data cd = {
>> +	.bank = {
>> +		[0] = {
>> +			.mult	= NSEC_PER_SEC / HZ,
>> +			.read_sched_clock = jiffy_sched_clock_read,
>> +		},
>> +	},
>> +};
> 
> If you had carefully studied the changes which made it possible to do
> the nmi safe clock monotonic accessor then you'd had noticed that I
> went a great way to optimize the cache foot print first and then add
> this new fangled thing.
> 
> So in the first place 'cd' lacks ____cacheline_aligned. It should have
> been there before, but that's a different issue. You should have
> noticed.
> 
> Secondly, I don't see any hint that you actually thought about the
> cache foot print of the result struct clock_data.

I did think about the cache footprint, but only to the point of believing
my patch was unlikely to regress performance. As it happens, it was the
absence of __cacheline_aligned on cd in the current code that made me
believe absence of regression would be enough (once I'd managed that, I
ordered the members within the structure to get the best locality of
reference within the *patch* in order to make code review easier).

I guess I did two things wrong here: inadequately documenting what work
I did and possessing insufficient ambition to improve!

I'll work on both of these.


> struct clock_data {
> 	ktime_t wrap_kt;
> 	seqcount_t seq;
> 	unsigned long rate;
> 	struct clock_data_banked bank[2];
> };
> 
> wrap_kt and rate are completely irrelevant for the hotpath. The whole
> thing up to the last member of bank[0] still fits into 64 byte on both
> 32 and 64bit, but that's not by design and not documented so anyone
> who is aware of cache foot print issues will go WTF when the first
> member of a hot path data structure is completely irrelevant.

Agreed.

It looks like I also put the function pointer in the wrong place within
clock_data_banked. It should occupy the space between the 64-bit and
32-bit members, shouldn't it?
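
To make the layout question concrete, here is a minimal userspace sketch of
the reordering discussed above. It is not the kernel code: the fixed-width
types stand in for u64/u32, a plain unsigned int stands in for seqcount_t,
int64_t stands in for ktime_t, alignas() stands in for
____cacheline_aligned, and the 64-byte line size and the helper
hot_path_bytes() are assumptions made for illustration. Placing the
pointer after the 64-bit members means it needs no padding before it
whether it is 4 or 8 bytes wide.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 64	/* assumption: 64-byte cache lines */

/* Hot-path data: 64-bit members, then the pointer, then 32-bit members. */
struct clock_data_banked {
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
	uint64_t sched_clock_mask;
	uint64_t (*read_sched_clock)(void);
	uint32_t mult;
	uint32_t shift;
	bool suspended;
};

/* Hot members first; wrap_kt and rate are only needed on the update path. */
struct clock_data {
	alignas(CACHELINE_BYTES) unsigned int seq;	/* seqcount_t stand-in */
	struct clock_data_banked bank[2];
	int64_t wrap_kt;				/* ktime_t stand-in */
	unsigned long rate;
};

/* Bytes from the start of the struct to the end of bank[0]. */
static size_t hot_path_bytes(void)
{
	return offsetof(struct clock_data, bank) +
	       sizeof(struct clock_data_banked);
}

/* Document the design intent: seq plus bank[0] share one cache line. */
static_assert(offsetof(struct clock_data, bank) +
	      sizeof(struct clock_data_banked) <= CACHELINE_BYTES,
	      "seq and bank[0] must fit in a single cache line");
```

The static assertion is the point of the sketch: it turns "fits into 64
bytes" from an accident into a documented, compile-time-checked property.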


>>  static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
>>  {
>> @@ -58,50 +65,82 @@ static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
>>  
>>  unsigned long long notrace sched_clock(void)
>>  {
>> -	u64 epoch_ns;
>> -	u64 epoch_cyc;
>>  	u64 cyc;
>>  	unsigned long seq;
>> -
>> -	if (cd.suspended)
>> -		return cd.epoch_ns;
>> +	struct clock_data_banked *b;
>> +	u64 res;
> 
> So we now have
> 
>   	u64 cyc;
>   	unsigned long seq;
> 	struct clock_data_banked *b;
> 	u64 res;
> 
> Let me try a different version of that:
> 
> 	struct clock_data_banked *b;
>   	unsigned long seq;
> 	u64 res, cyc;
> 
> Can you spot the difference in the reading experience?

Will fix.

>  
>>  	do {
>> -		seq = raw_read_seqcount_begin(&cd.seq);
>> -		epoch_cyc = cd.epoch_cyc;
>> -		epoch_ns = cd.epoch_ns;
>> +		seq = raw_read_seqcount(&cd.seq);
>> +		b = cd.bank + (seq & 1);
>> +		if (b->suspended) {
>> +			res = b->epoch_ns;
> 
> So now we have read_sched_clock as a pointer in the bank. Why do you
> still need b->suspended?
> 
> What's wrong with setting b->read_sched_clock to NULL at suspend and
> restore the proper pointer on resume and use that as a conditional?
>  
> It would allow the compiler to generate better code, but that's
> obviously not the goal here. Darn, this is hot path code and not some
> random driver.

The update code probably won't be as easy to read but, as you say, this
is hot path code.

>> +		} else {
>> +			cyc = b->read_sched_clock();
>> +			cyc = (cyc - b->epoch_cyc) & b->sched_clock_mask;
>> +			res = b->epoch_ns + cyc_to_ns(cyc, b->mult, b->shift);
> 
> It would allow the following optimization as well:
> 
>    	 res = b->epoch_ns;
> 	 if (b->read_sched_clock) {
> 	    	...
> 	 }
> 
> If you think that compilers are smart enough to figure all that out
> for you, you might get surprised. The more clear your code is the
> better is the chance that the compiler gets it right. We have seen the
> opposite of that as well, but that's clearly a compiler bug.

Good idea and, in this case, there is a call through a function pointer
with unknown side effects, so a compiler would never be able to make that
optimization.
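
A minimal userspace sketch of the suggested read path, assuming the
suspended flag is dropped and a NULL read_sched_clock marks the suspended
state. The seqcount retry loop is left out to keep the focus on the
conditional; bank_to_ns(), fake_counter_read() and bank_to_ns_example()
are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct clock_data_banked {
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
	uint64_t sched_clock_mask;
	uint64_t (*read_sched_clock)(void);	/* NULL while suspended */
	uint32_t mult;
	uint32_t shift;
};

static inline uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	return (cyc * mult) >> shift;
}

/* Convert one bank to a timestamp; start from epoch_ns, add a delta if live. */
static uint64_t bank_to_ns(const struct clock_data_banked *b)
{
	uint64_t res = b->epoch_ns;
	uint64_t cyc;

	if (b->read_sched_clock) {
		cyc = b->read_sched_clock();
		cyc = (cyc - b->epoch_cyc) & b->sched_clock_mask;
		res += cyc_to_ns(cyc, b->mult, b->shift);
	}

	return res;
}

/* Hypothetical counter for illustration: a fixed reading of 1500 cycles. */
static uint64_t fake_counter_read(void)
{
	return 1500;
}

/* Usage: a live counter adds a delta; a NULL pointer freezes the clock. */
static void bank_to_ns_example(void)
{
	struct clock_data_banked b = {
		.epoch_ns = 1000,
		.epoch_cyc = 500,
		.sched_clock_mask = UINT64_MAX,
		.read_sched_clock = fake_counter_read,
		.mult = 2,
		.shift = 0,
	};

	assert(bank_to_ns(&b) == 3000);	/* 1000 + (((1500 - 500) * 2) >> 0) */

	b.read_sched_clock = NULL;	/* "suspend" */
	assert(bank_to_ns(&b) == 1000);	/* frozen at epoch_ns */
}
```

With this shape the suspended case costs one pointer test that the live
case needs anyway, and the epoch_ns load is shared by both branches.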


>> +/*
>> + * Start updating the banked clock data.
>> + *
>> + * sched_clock will never observe mis-matched data even if called from
>> + * an NMI. We do this by maintaining an odd/even copy of the data and
>> + * steering sched_clock to one or the other using a sequence counter.
>> + * In order to preserve the data cache profile of sched_clock as much
>> + * as possible the system reverts back to the even copy when the update
>> + * completes; the odd copy is used *only* during an update.
>> + *
>> + * The caller is responsible for avoiding simultaneous updates.
>> + */
>> +static struct clock_data_banked *update_bank_begin(void)
>> +{
>> +	/* update the backup (odd) bank and steer readers towards it */
>> +	memcpy(cd.bank + 1, cd.bank, sizeof(struct clock_data_banked));
>> +	raw_write_seqcount_latch(&cd.seq);
>> +
>> +	return cd.bank;
>> +}
>> +
>> +/*
>> + * Finalize update of banked clock data.
>> + *
>> + * This is just a trivial switch back to the primary (even) copy.
>> + */
>> +static void update_bank_end(void)
>> +{
>> +	raw_write_seqcount_latch(&cd.seq);
>>  }
> 
> What's wrong with having a master struct
> 
> struct master_data {
> 	struct clock_data_banked master_data;
> 	ktime_t wrap_kt;
> 	unsigned long rate;
> 	u64 (*real_read_sched_clock)(void);
> };
> 
> Then you only have to care about the serialization of the master_data
> update and then the hotpath data update would be the same simple
> function as update_fast_timekeeper(). And it would have the same
> ordering scheme and aside of that the resulting code would be simpler,
> more intuitive to read and I'm pretty sure faster.

Sorry. I don't quite understand this.

Is the intent to have a single function to update the hotpath data used
by both update_sched_clock() and sched_clock_register() to replace the
pairing of update_bank_begin/end()?

If so, I started out doing that but eventually concluded that
update_sched_clock() didn't really benefit from having to make a third
copy of the values it consumes rather than updates.

However if that's an unconvincing reason I'm happy to switch to having
an update structure.
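
For reference, one possible reading of the suggestion is sketched below:
the caller prepares a complete copy of the hot-path data, and a single
function publishes it using the same ordering as update_fast_timekeeper().
This is a userspace approximation under stated assumptions, not kernel
code: C11 atomics and fences stand in for seqcount_t and smp_wmb(),
write_seqcount_latch(), update_clock_data() and read_epoch_ns() are
hypothetical names, and the struct is trimmed to a few members. The
plain memcpy() next to concurrent readers relies on kernel-style memory
ordering rather than strict C11 rules, so treat the sketch as
single-threaded illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

struct clock_data_banked {
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
	uint32_t mult;
	uint32_t shift;
};

struct clock_data {
	atomic_uint seq;
	struct clock_data_banked bank[2];
};

static struct clock_data cd;

/* Stand-in for raw_write_seqcount_latch(): bump seq between write barriers. */
static void write_seqcount_latch(atomic_uint *seq)
{
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

/* Publish a fully prepared copy; callers must serialize updates themselves. */
static void update_clock_data(const struct clock_data_banked *master)
{
	/* Steer readers to bank[1], which still holds the old values. */
	write_seqcount_latch(&cd.seq);
	memcpy(&cd.bank[0], master, sizeof(cd.bank[0]));

	/* Steer readers back to bank[0], then bring bank[1] up to date. */
	write_seqcount_latch(&cd.seq);
	memcpy(&cd.bank[1], &cd.bank[0], sizeof(cd.bank[1]));
}

/* Reader: pick the bank from the low bit of seq, retry if seq moved. */
static uint64_t read_epoch_ns(void)
{
	unsigned int seq;
	uint64_t res;

	do {
		seq = atomic_load_explicit(&cd.seq, memory_order_acquire);
		res = cd.bank[seq & 1].epoch_ns;
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&cd.seq, memory_order_relaxed) != seq);

	return res;
}
```

Compared with the update_bank_begin()/update_bank_end() pairing, the
caller here never modifies a live bank in place; the cost is the extra
prepared copy mentioned above.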


Daniel.

