linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [Question] Reading /proc/stat has a time backward issue
@ 2022-07-27  4:02 Lihua (lihua, ran)
  2022-08-04  7:44 ` Lihua (lihua, ran)
  0 siblings, 1 reply; 5+ messages in thread
From: Lihua (lihua, ran) @ 2022-07-27  4:02 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, open list:SCHEDULER

Hi all,

I found a problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
first:
cat /proc/stat |  grep cpu1
cpu1    319    0    496    41665    0    0    0    0    0    0
then:
cat /proc/stat |  grep cpu1
cpu1    318    0    497    41674    0    0    0    0    0    0

Time goes back, which is counterintuitive.

After debug this, I found that the problem is caused by the implementation of kcpustat_cpu_fetch_vtime. As follows:

                               CPU0                                                                          CPU1
First:
show_stat():
     ->kcpustat_cpu_fetch()
         ->kcpustat_cpu_fetch_vtime()
             ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;              rq->curr is in user mod
              ---> When CPU1 rq->curr running on userspace, need add utime and delta
                                                                                              --->  rq->curr->vtime->utime is less than 1 tick
Then:
show_stat():
     ->kcpustat_cpu_fetch()
         ->kcpustat_cpu_fetch_vtime()
             ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);                                     rq->curr is in kernel mod
             ---> When CPU1 rq->curr running on kernel space, just got kcpustat

Because the values ​​of utime、 stime and delta are temporarily written to cpustat. Therefore, there are two problems  read from /proc/stat:
1. There may be a regression phenomenon;
2. When there are many tasks, the statistics are not accurate enough when utime and stime do not exceed one TICK.
The time goes back is counterintuitive, and I want to discuss whether there is a good solution without compromising performance.

Thanks a lot.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Question] Reading /proc/stat has a time backward issue
  2022-07-27  4:02 [Question] Reading /proc/stat has a time backward issue Lihua (lihua, ran)
@ 2022-08-04  7:44 ` Lihua (lihua, ran)
  2022-08-08 12:23   ` Lihua (lihua, ran)
  0 siblings, 1 reply; 5+ messages in thread
From: Lihua (lihua, ran) @ 2022-08-04  7:44 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, open list:SCHEDULER

ping...

Any good suggestions?

thanks all.

在 2022/7/27 12:02, Lihua (lihua, ran) 写道:
> Hi all,
> 
> I found a problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
> first:
> cat /proc/stat |  grep cpu1
> cpu1    319    0    496    41665    0    0    0    0    0    0
> then:
> cat /proc/stat |  grep cpu1
> cpu1    318    0    497    41674    0    0    0    0    0    0
> 
> Time goes back, which is counterintuitive.
> 
> After debug this, I found that the problem is caused by the implementation of kcpustat_cpu_fetch_vtime. As follows:
> 
>                                CPU0                                                                          CPU1
> First:
> show_stat():
>      ->kcpustat_cpu_fetch()
>          ->kcpustat_cpu_fetch_vtime()
>              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;              rq->curr is in user mod
>               ---> When CPU1 rq->curr running on userspace, need add utime and delta
>                                                                                               --->  rq->curr->vtime->utime is less than 1 tick
> Then:
> show_stat():
>      ->kcpustat_cpu_fetch()
>          ->kcpustat_cpu_fetch_vtime()
>              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);                                     rq->curr is in kernel mod
>              ---> When CPU1 rq->curr running on kernel space, just got kcpustat
> 
> Because the values ​​of utime、 stime and delta are temporarily written to cpustat. Therefore, there are two problems  read from /proc/stat:
> 1. There may be a regression phenomenon;
> 2. When there are many tasks, the statistics are not accurate enough when utime and stime do not exceed one TICK.
> The time goes back is counterintuitive, and I want to discuss whether there is a good solution without compromising performance.
> 
> Thanks a lot.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Question] Reading /proc/stat has a time backward issue
  2022-08-04  7:44 ` Lihua (lihua, ran)
@ 2022-08-08 12:23   ` Lihua (lihua, ran)
  2022-08-11 12:49     ` Frederic Weisbecker
  0 siblings, 1 reply; 5+ messages in thread
From: Lihua (lihua, ran) @ 2022-08-08 12:23 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, open list:SCHEDULER, frederic

ping...

Your suggestions are valuable, I don't have a good way to fix this.

thanks all.

在 2022/8/4 15:44, Lihua (lihua, ran) 写道:
> ping...
> 
> Any good suggestions?
> 
> thanks all.
> 
> 在 2022/7/27 12:02, Lihua (lihua, ran) 写道:
>> Hi all,
>>
>> I found a problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
>> first:
>> cat /proc/stat |  grep cpu1
>> cpu1    319    0    496    41665    0    0    0    0    0    0
>> then:
>> cat /proc/stat |  grep cpu1
>> cpu1    318    0    497    41674    0    0    0    0    0    0
>>
>> Time goes back, which is counterintuitive.
>>
>> After debug this, I found that the problem is caused by the implementation of kcpustat_cpu_fetch_vtime. As follows:
>>
>>                                CPU0                                                                          CPU1
>> First:
>> show_stat():
>>      ->kcpustat_cpu_fetch()
>>          ->kcpustat_cpu_fetch_vtime()
>>              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;              rq->curr is in user mod
>>               ---> When CPU1 rq->curr running on userspace, need add utime and delta
>>                                                                                               --->  rq->curr->vtime->utime is less than 1 tick
>> Then:
>> show_stat():
>>      ->kcpustat_cpu_fetch()
>>          ->kcpustat_cpu_fetch_vtime()
>>              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);                                     rq->curr is in kernel mod
>>              ---> When CPU1 rq->curr running on kernel space, just got kcpustat
>>
>> Because the values ​​of utime、 stime and delta are temporarily written to cpustat. Therefore, there are two problems  read from /proc/stat:
>> 1. There may be a regression phenomenon;
>> 2. When there are many tasks, the statistics are not accurate enough when utime and stime do not exceed one TICK.
>> The time goes back is counterintuitive, and I want to discuss whether there is a good solution without compromising performance.
>>
>> Thanks a lot.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Question] Reading /proc/stat has a time backward issue
  2022-08-08 12:23   ` Lihua (lihua, ran)
@ 2022-08-11 12:49     ` Frederic Weisbecker
  2022-08-13 18:58       ` Li Hua
  0 siblings, 1 reply; 5+ messages in thread
From: Frederic Weisbecker @ 2022-08-11 12:49 UTC (permalink / raw)
  To: Lihua (lihua, ran)
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, open list:SCHEDULER

On Mon, Aug 08, 2022 at 08:23:46PM +0800, Lihua (lihua, ran) wrote:
> ping...
> 
> Your suggestions are valuable, I don't have a good way to fix this.
> 
> thanks all.
> 
> 在 2022/8/4 15:44, Lihua (lihua, ran) 写道:
> > ping...
> > 
> > Any good suggestions?
> > 
> > thanks all.
> > 
> > 在 2022/7/27 12:02, Lihua (lihua, ran) 写道:
> > > Hi all,
> > > 
> > > I found a problem that the statistical time goes backward, the value read first is 319, and the value read again is 318. As follows:
> > > first:
> > > cat /proc/stat |  grep cpu1
> > > cpu1    319    0    496    41665    0    0    0    0    0    0
> > > then:
> > > cat /proc/stat |  grep cpu1
> > > cpu1    318    0    497    41674    0    0    0    0    0    0
> > > 
> > > Time goes back, which is counterintuitive.
> > > 
> > > After debug this, I found that the problem is caused by the implementation of kcpustat_cpu_fetch_vtime. As follows:
> > > 
> > >                                CPU0                                                                          CPU1
> > > First:
> > > show_stat():
> > >      ->kcpustat_cpu_fetch()
> > >          ->kcpustat_cpu_fetch_vtime()
> > >              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;              rq->curr is in user mod
> > >               ---> When CPU1 rq->curr running on userspace, need add utime and delta
> > >                                                                                               --->  rq->curr->vtime->utime is less than 1 tick
> > > Then:
> > > show_stat():
> > >      ->kcpustat_cpu_fetch()
> > >          ->kcpustat_cpu_fetch_vtime()
> > >              ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);                                     rq->curr is in kernel mod
> > >              ---> When CPU1 rq->curr running on kernel space, just got kcpustat
> > > 
> > > Because the values ​​of utime、 stime and delta are temporarily written to cpustat. Therefore, there are two problems  read from /proc/stat:
> > > 1. There may be a regression phenomenon;
> > > 2. When there are many tasks, the statistics are not accurate enough when utime and stime do not exceed one TICK.
> > > The time goes back is counterintuitive, and I want to discuss whether there is a good solution without compromising performance.

I see...

Does the following help? (only built tested)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 78a233d43757..410e35e178ac 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -802,6 +802,16 @@ void vtime_init_idle(struct task_struct *t, int cpu)
 	local_irq_restore(flags);
 }
 
+u64 static vtime_delta_filtered(u64 val, struct vtime *vtime)
+{
+	u64 delta = val + vtime_delta(vtime);
+
+	if (delta >= TICK_NSEC)
+		return delta;
+	else
+		return 0;
+}
+
 u64 task_gtime(struct task_struct *t)
 {
 	struct vtime *vtime = &t->vtime;
@@ -816,7 +826,7 @@ u64 task_gtime(struct task_struct *t)
 
 		gtime = t->gtime;
 		if (vtime->state == VTIME_GUEST)
-			gtime += vtime->gtime + vtime_delta(vtime);
+			gtime += vtime_delta_filtered(vtime->gtime, vtime);
 
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
@@ -832,7 +842,6 @@ bool task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 {
 	struct vtime *vtime = &t->vtime;
 	unsigned int seq;
-	u64 delta;
 	int ret;
 
 	if (!vtime_accounting_enabled()) {
@@ -853,16 +862,15 @@ bool task_cputime(struct task_struct *t, u64 *utime, u64 *stime)
 			continue;
 
 		ret = true;
-		delta = vtime_delta(vtime);
 
 		/*
 		 * Task runs either in user (including guest) or kernel space,
 		 * add pending nohz time to the right place.
 		 */
 		if (vtime->state == VTIME_SYS)
-			*stime += vtime->stime + delta;
+			*stime += vtime_delta_filtered(vtime->stime, vtime);
 		else
-			*utime += vtime->utime + delta;
+			*utime += vtime_delta_filtered(vtime->utime, vtime);
 	} while (read_seqcount_retry(&vtime->seqcount, seq));
 
 	return ret;
@@ -897,9 +905,9 @@ static int vtime_state_fetch(struct vtime *vtime, int cpu)
 static u64 kcpustat_user_vtime(struct vtime *vtime)
 {
 	if (vtime->state == VTIME_USER)
-		return vtime->utime + vtime_delta(vtime);
+		return vtime_delta_filtered(vtime->utime, vtime);
 	else if (vtime->state == VTIME_GUEST)
-		return vtime->gtime + vtime_delta(vtime);
+		return vtime_delta_filtered(vtime->gtime, vtime);
 	return 0;
 }
 
@@ -932,7 +940,7 @@ static int kcpustat_field_vtime(u64 *cpustat,
 		switch (usage) {
 		case CPUTIME_SYSTEM:
 			if (state == VTIME_SYS)
-				*val += vtime->stime + vtime_delta(vtime);
+				*val += vtime_delta_filtered(vtime->stime, vtime);
 			break;
 		case CPUTIME_USER:
 			if (task_nice(tsk) <= 0)
@@ -944,11 +952,11 @@ static int kcpustat_field_vtime(u64 *cpustat,
 			break;
 		case CPUTIME_GUEST:
 			if (state == VTIME_GUEST && task_nice(tsk) <= 0)
-				*val += vtime->gtime + vtime_delta(vtime);
+				*val += vtime_delta_filtered(vtime->gtime, vtime);
 			break;
 		case CPUTIME_GUEST_NICE:
 			if (state == VTIME_GUEST && task_nice(tsk) > 0)
-				*val += vtime->gtime + vtime_delta(vtime);
+				*val += vtime_delta_filtered(vtime->gtime, vtime);
 			break;
 		default:
 			break;
@@ -1001,7 +1009,6 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 
 	do {
 		u64 *cpustat;
-		u64 delta;
 		int state;
 
 		seq = read_seqcount_begin(&vtime->seqcount);
@@ -1017,27 +1024,25 @@ static int kcpustat_cpu_fetch_vtime(struct kernel_cpustat *dst,
 		if (state < VTIME_SYS)
 			continue;
 
-		delta = vtime_delta(vtime);
-
 		/*
 		 * Task runs either in user (including guest) or kernel space,
 		 * add pending nohz time to the right place.
 		 */
 		if (state == VTIME_SYS) {
-			cpustat[CPUTIME_SYSTEM] += vtime->stime + delta;
+			cpustat[CPUTIME_SYSTEM] += vtime_delta_filtered(vtime->stime, vtime);
 		} else if (state == VTIME_USER) {
 			if (task_nice(tsk) > 0)
-				cpustat[CPUTIME_NICE] += vtime->utime + delta;
+				cpustat[CPUTIME_NICE] += vtime_delta_filtered(vtime->utime, vtime);
 			else
-				cpustat[CPUTIME_USER] += vtime->utime + delta;
+				cpustat[CPUTIME_USER] += vtime_delta_filtered(vtime->utime, vtime);
 		} else {
 			WARN_ON_ONCE(state != VTIME_GUEST);
 			if (task_nice(tsk) > 0) {
-				cpustat[CPUTIME_GUEST_NICE] += vtime->gtime + delta;
-				cpustat[CPUTIME_NICE] += vtime->gtime + delta;
+				cpustat[CPUTIME_GUEST_NICE] += vtime_delta_filtered(vtime->gtime, vtime);
+				cpustat[CPUTIME_NICE] += vtime_delta_filtered(vtime->gtime, vtime);
 			} else {
-				cpustat[CPUTIME_GUEST] += vtime->gtime + delta;
-				cpustat[CPUTIME_USER] += vtime->gtime + delta;
+				cpustat[CPUTIME_GUEST] += vtime_delta_filtered(vtime->gtime, vtime);
+				cpustat[CPUTIME_USER] += vtime_delta_filtered(vtime->gtime, vtime);
 			}
 		}
 	} while (read_seqcount_retry(&vtime->seqcount, seq));


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Question] Reading /proc/stat has a time backward issue
  2022-08-11 12:49     ` Frederic Weisbecker
@ 2022-08-13 18:58       ` Li Hua
  0 siblings, 0 replies; 5+ messages in thread
From: Li Hua @ 2022-08-13 18:58 UTC (permalink / raw)
  To: frederic
  Cc: bristot, bsegall, dietmar.eggemann, hucool.lihua, juri.lelli,
	linux-kernel, mgorman, mingo, peterz, rostedt, vincent.guittot

Sorry for not seeing your reply in time

Your patch doesn't seem to fix the problem.  The root cause of the problem is that
the "vtime->utime" and "delta" are temporarily added to the stack and show to the user.

I tried to submit a patch to avoid presenting time backwards to the user. as bellow:
https://lkml.org/lkml/2022/8/12/261

Hope you have better suggestions, thank you for your reply.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-08-13  5:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-27  4:02 [Question] Reading /proc/stat has a time backward issue Lihua (lihua, ran)
2022-08-04  7:44 ` Lihua (lihua, ran)
2022-08-08 12:23   ` Lihua (lihua, ran)
2022-08-11 12:49     ` Frederic Weisbecker
2022-08-13 18:58       ` Li Hua

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).