linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] de_thread() should update ->real_start_time
@ 2013-06-10 18:33 Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 1/3] de_thread: mt-exec " Oleg Nesterov
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-10 18:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, John Stultz, Tomas Janousek, Tomas Smetana,
	linux-kernel

1/3 is the obvious bugfix, 2 and 3 are minor cleanups

I am wondering, can't we kill task->real_start_time? What if we
simply change copy_process

	-	do_posix_clock_monotonic_gettime(&p->start_time);
	+	get_monotonic_boottime(&p->start_time);

?

Afaics, this will only affect do_acct_process() and bacct_add_tsk(),
but do we really want to exclude the suspended time in this case?


Another user of ->start_time is cgroup.c and it looks wrong... But
the change above should not make any difference.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] de_thread: mt-exec should update ->real_start_time
  2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
@ 2013-06-10 18:33 ` Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 2/3] uptime_proc_show: use get_monotonic_boottime() Oleg Nesterov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-10 18:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, John Stultz, Tomas Janousek, Tomas Smetana,
	linux-kernel

924b42d5 "Use boot based time for process start time and boot
time in /proc" updated copy_process/do_task_stat but forgot
about de_thread(). This breaks "ps axOT" if a sub-thread execs.

Note: I think that task->start_time should die.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/exec.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/exec.c b/fs/exec.c
index 00eaba7..aeace12 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -930,6 +930,7 @@ static int de_thread(struct task_struct *tsk)
 		 * also take its birthdate (always earlier than our own).
 		 */
 		tsk->start_time = leader->start_time;
+		tsk->real_start_time = leader->real_start_time;
 
 		BUG_ON(!same_thread_group(leader, tsk));
 		BUG_ON(has_group_leader_pid(tsk));
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] uptime_proc_show: use get_monotonic_boottime()
  2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 1/3] de_thread: mt-exec " Oleg Nesterov
@ 2013-06-10 18:33 ` Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 3/3] do_sysinfo: " Oleg Nesterov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-10 18:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, John Stultz, Tomas Janousek, Tomas Smetana,
	linux-kernel

Change uptime_proc_show() to use get_monotonic_boottime() instead
of do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/proc/uptime.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/proc/uptime.c b/fs/proc/uptime.c
index 9610ac7..0618946 100644
--- a/fs/proc/uptime.c
+++ b/fs/proc/uptime.c
@@ -20,8 +20,7 @@ static int uptime_proc_show(struct seq_file *m, void *v)
 	for_each_possible_cpu(i)
 		idletime += (__force u64) kcpustat_cpu(i).cpustat[CPUTIME_IDLE];
 
-	do_posix_clock_monotonic_gettime(&uptime);
-	monotonic_to_bootbased(&uptime);
+	get_monotonic_boottime(&uptime);
 	nsec = cputime64_to_jiffies64(idletime) * TICK_NSEC;
 	idle.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
 	idle.tv_nsec = rem;
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] do_sysinfo: use get_monotonic_boottime()
  2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 1/3] de_thread: mt-exec " Oleg Nesterov
  2013-06-10 18:33 ` [PATCH 2/3] uptime_proc_show: use get_monotonic_boottime() Oleg Nesterov
@ 2013-06-10 18:33 ` Oleg Nesterov
  2013-06-10 19:48 ` [PATCH 0/3] de_thread() should update ->real_start_time John Stultz
  2013-06-10 20:18 ` John Stultz
  4 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-10 18:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Eric W. Biederman, John Stultz, Tomas Janousek, Tomas Smetana,
	linux-kernel

Change do_sysinfo() to use get_monotonic_boottime() instead of
do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/sys.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/kernel/sys.c b/kernel/sys.c
index 50f8677..746575d 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2343,8 +2343,7 @@ static int do_sysinfo(struct sysinfo *info)
 
 	memset(info, 0, sizeof(struct sysinfo));
 
-	ktime_get_ts(&tp);
-	monotonic_to_bootbased(&tp);
+	get_monotonic_boottime(&tp);
 	info->uptime = tp.tv_sec + (tp.tv_nsec ? 1 : 0);
 
 	get_avenrun(info->loads, 0, SI_LOAD_SHIFT - FSHIFT);
-- 
1.5.5.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] de_thread() should update ->real_start_time
  2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
                   ` (2 preceding siblings ...)
  2013-06-10 18:33 ` [PATCH 3/3] do_sysinfo: " Oleg Nesterov
@ 2013-06-10 19:48 ` John Stultz
  2013-06-11 17:13   ` Oleg Nesterov
  2013-06-10 20:18 ` John Stultz
  4 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2013-06-10 19:48 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Tomas Janousek, Tomas Smetana,
	Linux Kernel Mailing List

On Mon, Jun 10, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 1/3 is the obvious bugfix, 2 and 3 are minor cleanups

Acked-by: John Stultz <john.stultz@linaro.org> for these three patches.



> I am wondering, can't we kill task->real_start_time? What if we

Yea this sounds like a good idea.  Additionally the name
real_start_time is also confusing, since it naively  seems it might be
the CLOCK_REALTIME start time, which is incorrect.

> simply change copy_process
>
>         -       do_posix_clock_monotonic_gettime(&p->start_time);
>         +       get_monotonic_boottime(&p->start_time);
>
> ?
>
> Afaics, this will only affect do_acct_process() and bacct_add_tsk(),
> but do we really want to exclude the suspended time in this case?

So bacct_add_tsk seems easy to change, since its just:
    do_posix_clock_monotonic_gettime(&uptime);
    ts = timespec_sub(uptime, tsk->start_time);

So grabbing the monotonic boot time for uptime would provide the same
relative delta.

It looks like the same is true in do_acct_process().

> Another user of ->start_time is cgroup.c and it looks wrong... But
> the change above should not make any difference.

The cgroup usage I'm unfamiliar with. Though from the comments in the
code, it seems like using boottime would be ok for this.

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] de_thread() should update ->real_start_time
  2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
                   ` (3 preceding siblings ...)
  2013-06-10 19:48 ` [PATCH 0/3] de_thread() should update ->real_start_time John Stultz
@ 2013-06-10 20:18 ` John Stultz
  4 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2013-06-10 20:18 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Tomas Smetana,
	Linux Kernel Mailing List

On Mon, Jun 10, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> 1/3 is the obvious bugfix, 2 and 3 are minor cleanups
>
> I am wondering, can't we kill task->real_start_time? What if we
> simply change copy_process
>
>         -       do_posix_clock_monotonic_gettime(&p->start_time);
>         +       get_monotonic_boottime(&p->start_time);


An additional thought: task_struct covers quite a bit of stuff, so I
don't know if this would be too useful, but we could further change
start_time to be a ktime_t, allowing possible additional space savings
in the task_struct on 64bit systems.

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] de_thread() should update ->real_start_time
  2013-06-10 19:48 ` [PATCH 0/3] de_thread() should update ->real_start_time John Stultz
@ 2013-06-11 17:13   ` Oleg Nesterov
  2013-06-11 18:14     ` John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-11 17:13 UTC (permalink / raw)
  To: John Stultz
  Cc: Andrew Morton, Eric W. Biederman, Tomas Janousek, Tomas Smetana,
	Linux Kernel Mailing List

On 06/10, John Stultz wrote:
>
> On Mon, Jun 10, 2013 at 11:33 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> > 1/3 is the obvious bugfix, 2 and 3 are minor cleanups
>
> Acked-by: John Stultz <john.stultz@linaro.org> for these three patches.

Thanks!

> > simply change copy_process
> >
> >         -       do_posix_clock_monotonic_gettime(&p->start_time);
> >         +       get_monotonic_boottime(&p->start_time);
> >
> > ?
> >
> > Afaics, this will only affect do_acct_process() and bacct_add_tsk(),
> > but do we really want to exclude the suspended time in this case?
>
> So bacct_add_tsk seems easy to change, since its just:
>     do_posix_clock_monotonic_gettime(&uptime);
>     ts = timespec_sub(uptime, tsk->start_time);
>
> So grabbing the monotonic boot time for uptime would provide the same
> relative delta.

Not really, or I misunderstood monotonic/boottime interaction.

IIUC, monotonic doesn't grow during suspend, so the delta can grow if
we use get_monotonic_boottime() in copy_process() and bacct_add_tsk()
and the system was suspended in between. Right?

But perhaps this is fine and even more correct?

> > Another user of ->start_time is cgroup.c and it looks wrong... But
> > the change above should not make any difference.
>
> The cgroup usage I'm unfamiliar with. Though from the comments in the
> code, it seems like using boottime would be ok for this.

Yes, this change can't make any difference. But this code looks wrong
although I'll try to recheck later.  We can't trust started_after()
exactly because ->start_time can be changed by mt-exec. But this is
offtopic.


> An additional thought: task_struct covers quite a bit of stuff, so I
> don't know if this would be too useful, but we could further change
> start_time to be a ktime_t, allowing possible additional space savings
> in the task_struct on 64bit systems.

Agreed.

Oleg.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] de_thread() should update ->real_start_time
  2013-06-11 17:13   ` Oleg Nesterov
@ 2013-06-11 18:14     ` John Stultz
  2013-06-11 20:06       ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2013-06-11 18:14 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, Eric W. Biederman, Tomas Janousek, Tomas Smetana,
	Linux Kernel Mailing List

On 06/11/2013 10:13 AM, Oleg Nesterov wrote:
> On 06/10, John Stultz wrote:
>
>>> simply change copy_process
>>>
>>>          -       do_posix_clock_monotonic_gettime(&p->start_time);
>>>          +       get_monotonic_boottime(&p->start_time);
>>>
>>> ?
>>>
>>> Afaics, this will only affect do_acct_process() and bacct_add_tsk(),
>>> but do we really want to exclude the suspended time in this case?
>> So bacct_add_tsk seems easy to change, since its just:
>>      do_posix_clock_monotonic_gettime(&uptime);
>>      ts = timespec_sub(uptime, tsk->start_time);
>>
>> So grabbing the monotonic boot time for uptime would provide the same
>> relative delta.
> Not really, or I misunderstood monotonic/boottime interaction.
>
> IIUC, monotonic doesn't grow during suspend, so the delta can grow if
> we use get_monotonic_boottime() in copy_process() and bacct_add_tsk()
> and the system was suspended in between. Right?

Oh right. Good point. The suspend time may not be constant between the 
calculations.


> But perhaps this is fine and even more correct?

Hrmm.. Looking closer at what the calculations are used for, I worry 
changing to counting suspend time in elapsed run time would be a 
userland visible behaivor change that might be problematic.

That said, elapsed run time as it exists now is not really a useful 
measurement, since you get different results depending on the situation: 
ie, a VM that doesn't get much cpu time vs a system that suspends 
frequently. In one case, you seem to have been running for a long time, 
but not getting much cpu runtime, where as the other you might appear to 
get quite a bit of the possible execution time.

This all goes back to issues around what suspend-state really is. Where 
in previously it was viewed to be user-controlled and considered closer 
to the system temporarily being off - thus the intent was to make 
suspend invisible/hidden to the system itself, but more recently, with 
systems suspending quite frequently suspend state is more 
invisible/hidden to the end user, and is closer to a deep idle state.

Back in the day folks were up in arms that someone could "cheat" their 
way to large uptime values by leaving their system suspended. But, if I 
could do it again, I'd probably push for CLOCK_MONOTONIC (as exported to 
userland) and all of these user-visible metrics to include suspend time.

So I think it probably *makes more sense* to include suspend_time in the 
elapsed runtime value being exported via bacct_add_tsk() and 
do_acct_process(), but I unfortunately worry now any such change would 
risk breaking userland expectations.

The *actual* risk may be quite minor, so this could be one of those: 
"Let the tree fall and if no one is there to hear it, fine" interface 
breaks, but I'm not sure I'm eager enough to be the one proposing it. :)

thanks
-john

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] de_thread() should update ->real_start_time
  2013-06-11 18:14     ` John Stultz
@ 2013-06-11 20:06       ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2013-06-11 20:06 UTC (permalink / raw)
  To: John Stultz
  Cc: Andrew Morton, Eric W. Biederman, Tomas Janousek, Tomas Smetana,
	Linux Kernel Mailing List

On 06/11, John Stultz wrote:
>
> On 06/11/2013 10:13 AM, Oleg Nesterov wrote:
>
>> But perhaps this is fine and even more correct?
>
> So I think it probably *makes more sense* to include suspend_time in the
> elapsed runtime value being exported via bacct_add_tsk() and
> do_acct_process(), but I unfortunately worry now any such change would
> risk breaking userland expectations.
>
> The *actual* risk may be quite minor, so this could be one of those:
> "Let the tree fall and if no one is there to hear it, fine" interface
> breaks, but I'm not sure I'm eager enough to be the one proposing it. :)

Yes, same thoughts here ;)

Still it is ugly imho to keep task->start_time just for taskstats,
and _I think_ nobody really cares. Perhaps I'll try to send the patch
later...

And look. It seems that ->ac_btime (Process Creation Time) in
bacct_add_tsk() is obviously wrong anyway? So perhaps we can fix
this and in this case we can also change the meaning of start_time.

Oleg.


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2013-06-11 20:10 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-10 18:33 [PATCH 0/3] de_thread() should update ->real_start_time Oleg Nesterov
2013-06-10 18:33 ` [PATCH 1/3] de_thread: mt-exec " Oleg Nesterov
2013-06-10 18:33 ` [PATCH 2/3] uptime_proc_show: use get_monotonic_boottime() Oleg Nesterov
2013-06-10 18:33 ` [PATCH 3/3] do_sysinfo: " Oleg Nesterov
2013-06-10 19:48 ` [PATCH 0/3] de_thread() should update ->real_start_time John Stultz
2013-06-11 17:13   ` Oleg Nesterov
2013-06-11 18:14     ` John Stultz
2013-06-11 20:06       ` Oleg Nesterov
2013-06-10 20:18 ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).