All of lore.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Tony Luck <tony.luck@intel.com>,
	Wanpeng Li <wanpeng.li@hotmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Paul Mackerras <paulus@samba.org>, Ingo Molnar <mingo@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>, Rik van Riel <riel@redhat.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [PATCH 08/37] cputime: Convert task/group cputime to nsecs
Date: Mon, 30 Jan 2017 16:38:08 +0100	[thread overview]
Message-ID: <20170130153807.GB16945@lerouge> (raw)
In-Reply-To: <20170130135451.GB2669@redhat.com>

On Mon, Jan 30, 2017 at 02:54:51PM +0100, Stanislaw Gruszka wrote:
> On Sat, Jan 28, 2017 at 04:28:13PM +0100, Frederic Weisbecker wrote:
> > On Sat, Jan 28, 2017 at 12:57:40PM +0100, Stanislaw Gruszka wrote:
> > > On 32 bit architectures 64bit store/load is not atomic and if not
> > > protected - 64bit variables can be mangled. I do not see any protection
> > > (lock) between utime/stime store and load in the patch and seems that
> > > {u/s}time store/load can be performed at the same time. Though problem
> > > is very very improbable it still can happen at least theoretically when
> > > lower and upper 32 bits are changed at the same time i.e. process
> > > {u,s}time become near to multiple of 2**32 nsec (aprox: 4sec) and
> > > 64bit {u,s}time is stored and loaded at the same time on different
> > > cpus. As said this is very improbable situation, but eventually could
> > > be possible on long lived processes.
> > 
> > "Improbable situation" doesn't appply to Linux. With millions (billion?)
> > of machines using it, a rare issue in the core turns into likely to happen
> > somewhere in the planet every second.
> > 
> > So it's definetly a race we want to consider. Note it goes beyond the scope
> > of this patchset as the issue was already there before since cputime_t can already
> > map to u64 on 32 bits systems upstream. But this patchset definetly extends
> > the issue on all 32 bits configs.
> > 
> > kcpustat has the same issue upstream. It's is made of u64 on all configs.
> 
> I would like to add what are possible consequences if value will be
> mangled. For sum_exec_runtime, utime and stime we could get wrong values
> on cpu-clock related syscalls like clock_gettime() or clock_nanosleep()
> and cpu-clock timers like timer_create(CLOCK_PROCESS_CPUTIME_ID) can be
> triggered before or long after expected. For kcpustat this seems to be
> wrong values read by procfs and 3 drivers (cpufreq, appldata, macintosh).

Yep, all agreed.

> 
> > > I considering fixing problem of sum_exec_runtime possible mangling
> > > by using prev_sum_exec_runtime:
> > > 
> > > u64 read_sum_exec_runtime(struct task_struct *t)
> > > {
> > >        u64 ns, prev_ns;
> > >  
> > >        do {
> > >                prev_ns = READ_ONCE(t->se.prev_sum_exec_runtime);
> > >                ns = READ_ONCE(t->se.sum_exec_runtime);
> > >        } while (ns < prev_ns || ns > (prev_ns + U32_MAX));
> > >  
> > >        return ns;
> > > }
> > > 
> > > This should work based on fact that prev_sum_exec_runtime and
> > > sum_exec_runtime are not modified and stored at the same time, so only
> > > one of those variabled can be mangled. Though I need to think about 
> > > correctnes of that a bit more.
> > 
> > I'm not sure that would be enough. READ_ONCE prevents from reordering by the
> > compiler but not by the CPU. You'd need memory barriers between reads and
> > writes of prev_ns and ns.
> 
> It will not be enough, this _suppose_ to work based on that sum_exec_runtime
> and prev_sum_exec_runtime are not written at the same time. i.e. only
> one variable can be mangled as another one sits already in the memory.
> However "not written at the same time" is weak part of reasoning. Even
> if those variables are stored at different part of code (sum_exec_runtime
> on update_curr() and prev_sum_exec_runtime on set_next_entity()) we can
> not assume store of one variable is finished before another one starts.

Right.

> 
> >    WRITE ns                READ prev_ns
> >    smp_wmb()               smp_rmb()
> >    WRITE prev_ns           READ ns
> >    smp_wmb()               smp_rmb()
> >
> > It seems to be the only way to make sure that at least one of the reads
> > (prev_ns or ns) is correct.

Well reading that again, I'm not 100% sure about the correctness guarantee.
But it might work.

> 
> I think you have right, but seems on much code paths we have scenario:
> 
> 	WRITE ns		READ prev_ns
> 	smp_wmb()		smp_rmb()
> 	WRITE prev_ns		READ ns
> 
> and we have already smp_wmb() after write of sum_exec_runtime on
> update_min_vruntime().

You still need a second barrier after the second write and read (or
before the first write and read, it's the same) to ensure that if you read
a mangled version of ns, prev_ns is ok.

Still I think u64_stats_sync is less trouble and more reliable.

Thanks.

  reply	other threads:[~2017-01-30 15:39 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-22 18:19 [PATCH 00/37] cputime: Convert core use of cputime_t to nsecs v3 Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 01/37] jiffies: Reuse TICK_NSEC instead of NSEC_PER_JIFFY Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 02/37] time: Introduce jiffies64_to_nsecs() Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 03/37] sched: Remove unused INIT_CPUTIME macro Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 04/37] cputime: Convert kcpustat to nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 05/37] macintosh/rack-meter: Remove cputime_t internal use Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 06/37] cputime: Convert guest time accounting to nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 07/37] cputime: Special API to return old-typed cputime Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 08/37] cputime: Convert task/group cputime to nsecs Frederic Weisbecker
2017-01-28 11:57   ` Stanislaw Gruszka
2017-01-28 15:28     ` Frederic Weisbecker
2017-01-30 13:54       ` Stanislaw Gruszka
2017-01-30 15:38         ` Frederic Weisbecker [this message]
2017-01-22 18:19 ` [PATCH 09/37] alpha: Convert obsolete cputime_t " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 10/37] x86: Convert obsolete cputime type " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 11/37] isdn: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 12/37] binfmt: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 13/37] acct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 14/37] delaycct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 15/37] tsacct: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 16/37] signal: " Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 17/37] cputime: Increment kcpustat directly on irqtime account Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 18/37] posix-timers: Use TICK_NSEC instead of a dynamically ad-hoc calculated version Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 19/37] posix-timers: Convert internals to use nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 20/37] itimer: Convert internal cputime_t units to nsec Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 21/37] sched: Remove temporary cputime_t accessors Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 22/37] cputime: Push time to account_user_time() in nsecs Frederic Weisbecker
2017-01-22 18:19 ` [PATCH 23/37] cputime: Push time to account_steal_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 24/37] cputime: Push time to account_idle_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 25/37] cputime: Push time to account_system_time() " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 26/37] cputime: Complete nsec conversion of tick based accounting Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 27/37] vtime: Return nsecs instead of cputime_t to account Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 28/37] cputime: Remove jiffies based cputime Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 29/37] ia64: Move nsecs based cputime headers to the last arch using it Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 30/37] ia64: Convert vtime to use nsec units directly Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 31/37] ia64: Remove unused cputime definitions Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 32/37] s390: Make arch_cpu_idle_time() to return nsecs Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 33/37] powerpc: Remove unused cputime definitions Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 34/37] s390: " Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 35/37] cputime: Remove unused nsec_to_cputime Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 36/37] cputime: Remove asm generic headers Frederic Weisbecker
2017-01-22 18:20 ` [PATCH 37/37] s390: Prevent from cputime leaks Frederic Weisbecker
2017-01-23  9:44   ` Martin Schwidefsky
2017-01-25 15:25     ` Frederic Weisbecker
2017-01-25 15:40       ` Martin Schwidefsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170130153807.GB16945@lerouge \
    --to=fweisbec@gmail.com \
    --cc=benh@kernel.crashing.org \
    --cc=fenghua.yu@intel.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=schwidefsky@de.ibm.com \
    --cc=sgruszka@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=wanpeng.li@hotmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.