On Mon, 2016-06-27 at 14:25 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 22, 2016 at 10:25:47PM -0400, riel@redhat.com wrote:
> > 
> > From: Rik van Riel
> > 
> > Currently, if there was any irq or softirq time during 'ticks'
> > jiffies, the entire period will be accounted as irq or softirq
> > time.
> > 
> > This is inaccurate if only a subset of 'ticks' jiffies was
> > actually spent handling irqs, and could conceivably mis-count
> > all of the ticks during a period as irq time, when there was
> > some irq and some softirq time.
> 
> Good catch!
> 
> Many comments following.
> 
> > This can actually happen when irqtime_account_process_tick
> > is called from account_idle_ticks, which can pass a larger
> > number of ticks down all at once.
> > 
> > Fix this by changing irqtime_account_hi_update and
> > irqtime_account_si_update to round elapsed irq and softirq
> > time to jiffies, and return the number of jiffies spent in
> > each mode, similar to how steal time is handled.
> > 
> > Additionally, have irqtime_account_process_tick take into
> > account how much time was spent in each of steal, irq,
> > and softirq time.
> > 
> > The latter could help improve the accuracy of timekeeping
> 
> Maybe you meant cputime? Timekeeping is rather about jiffies and
> GTOD.
> 
> > when returning from idle on a NO_HZ_IDLE CPU.
> > 
> > Properly accounting how much time was spent in hardirq and
> > softirq time will also allow the NO_HZ_FULL code to re-use
> > these same functions for hardirq and softirq accounting.
> > 
> > Signed-off-by: Rik van Riel
> > 
> >  	local_irq_save(flags);
> > -	latest_ns = this_cpu_read(cpu_hardirq_time);
> > -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_IRQ])
> > -		ret = 1;
> > +	irq = this_cpu_read(cpu_hardirq_time) - cpustat[CPUTIME_IRQ];
> 
> cpu_hardirq_time is made of nsecs whereas cpustat is of cputime_t (in
> fact even cputime64_t). So you need to convert cpu_hardirq_time before
> doing the subtraction.

Doh. Good catch!
> > -static int irqtime_account_si_update(void)
> > +static unsigned long irqtime_account_si_update(unsigned long max_jiffies)
> >  {
> >  	u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +	unsigned long si_jiffies = 0;
> >  	unsigned long flags;
> > -	u64 latest_ns;
> > -	int ret = 0;
> > +	u64 softirq;
> >  
> >  	local_irq_save(flags);
> > -	latest_ns = this_cpu_read(cpu_softirq_time);
> > -	if (nsecs_to_cputime64(latest_ns) > cpustat[CPUTIME_SOFTIRQ])
> > -		ret = 1;
> > +	softirq = this_cpu_read(cpu_softirq_time) - cpustat[CPUTIME_SOFTIRQ];
> > +	if (softirq > cputime_one_jiffy) {
> > +		si_jiffies = min(max_jiffies, cputime_to_jiffies(softirq));
> > +		cpustat[CPUTIME_SOFTIRQ] += jiffies_to_cputime(si_jiffies);
> > +	}
> >  	local_irq_restore(flags);
> > -	return ret;
> > +	return si_jiffies;
> 
> So same comments apply here.
> 
> [...]
> 
> >   * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
> >   * tasks (sum on group iteration) belonging to @tsk's group.
> >   */
> > @@ -344,19 +378,24 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
> >  {
> >  	cputime_t scaled = cputime_to_scaled(cputime_one_jiffy);
> >  	u64 cputime = (__force u64) cputime_one_jiffy;
> > -	u64 *cpustat = kcpustat_this_cpu->cpustat;
> > +	unsigned long other;
> >  
> > -	if (steal_account_process_tick(ULONG_MAX))
> > +	/*
> > +	 * When returning from idle, many ticks can get accounted at
> > +	 * once, including some ticks of steal, irq, and softirq time.
> > +	 * Subtract those ticks from the amount of time accounted to
> > +	 * idle, or potentially user or system time. Due to rounding,
> > +	 * other time can exceed ticks occasionally.
> > +	 */
> > +	other = account_other_ticks(ticks);
> > +	if (other >= ticks)
> >  		return;
> > +	ticks -= other;
> >  
> >  	cputime *= ticks;
> >  	scaled *= ticks;
> 
> So instead of dealing with ticks here, I think you should rather use
> the above cputime as both the limit and the remaining time to account
> after steal/irqs.
> 
> This should avoid some middle conversions and improve precision when
> cputime_t == nsecs granularity.
> 
> If we account 2 ticks to idle (let's say HZ=100) and the irq time to
> account is 15 ms: 2 ticks = 20 ms, so we have 5 ms left to account to
> idle. With the jiffies granularity in this patch, we would account one
> tick to irqtime (1 tick = 10 ms, leaving 5 ms to account back later)
> and one tick to idle time, whereas if you deal with cputime_t, you are
> going to account the correct amount of idle time.

Ahhh, so you want irqtime_account_process_tick to work with and account
fractional ticks when calling account_system_time, account_user_time,
account_idle_time, etc?

I guess that should work fine, since we already pass cputime values in
anyway. I suppose we can do the same for get_vtime_delta, too. They can
both work with the actual remaining time (in cputime_t), after the
other time has been subtracted.

I can rework the series to do that.

-- 
All Rights Reversed.