All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] proc: make high precision system boot time available
@ 2013-06-29 13:14 Sami Kerola
  2013-06-29 14:23 ` Eric W. Biederman
  0 siblings, 1 reply; 5+ messages in thread
From: Sami Kerola @ 2013-06-29 13:14 UTC (permalink / raw)
  To: linux-kernel
  Cc: kerolasa, Karel Zak, Lennart Poettering, Kay Sievers, Eric W. Biederman

The kernel does not expose precise start time anywhere.  Precision of
sysinfo() is limited to a second.  The /proc/uptime has precision of two
decimals.  Neither are good enough when klog messages are displayed with
a log time stamp that is converted to human understandable format, such
as ISO-8601.  The problem is that imprecise knowledge about uptime will
lead human readable klog time stamps to wiggle in user commands, such as
dmesg(1).

Additionally the current dmesg and journalctl(1) are disagreeing when
kernel events occurred.  The dmesg is using sysinfo(2) to determine boot
time, while journalctl is using systemd(1)'s early clock_gettime(2) call.
Here is demonstration how different baseline effects on a random laptop.

$ dmesg --ctime | grep 'Initializing cgroup subsys cpuset'
[Fri Jun 28 07:40:12 2013] Initializing cgroup subsys cpuset
$ journalctl -b | grep 'Initializing cgroup subsys cpuset'
Jun 28 07:40:21 kerolasa-home kernel: Initializing cgroup subsys cpuset

To solve the wiggling problem, and to have a boot time coming from source
which everyone can agree is 'good enough' estimation of system start up,
the kernel needs to tell the boot time as precisily as it can.

CC: Karel Zak <kzak@redhat.com>
CC: Lennart Poettering <lennart@poettering.net>
CC: Kay Sievers <kay@vrfy.org>
CC: Eric W. Biederman <ebiederm@xmission.com>
References: http://www.spinics.net/lists/util-linux-ng/msg08092.html
Measurement: https://plus.google.com/u/0/117341944650953092607/posts/SHy5qu1HpSU
Signed-off-by: Sami Kerola <kerolasa@iki.fi>
---
 Documentation/ABI/testing/procfs-boottime | 15 ++++++++++++++
 fs/proc/Makefile                          |  1 +
 fs/proc/boottime.c                        | 33 +++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)
 create mode 100644 Documentation/ABI/testing/procfs-boottime
 create mode 100644 fs/proc/boottime.c

diff --git a/Documentation/ABI/testing/procfs-boottime b/Documentation/ABI/testing/procfs-boottime
new file mode 100644
index 0000000..d5bf8f2
--- /dev/null
+++ b/Documentation/ABI/testing/procfs-boottime
@@ -0,0 +1,15 @@
+What:		/proc/boottime
+Date:		June 2013
+KernelVersion:	3.11
+Contact:	Sami Kerola <kerolasa@iki.fi>
+
+Description:	Show high precision time stamp when linux kernel thinks
+		the system start up.  The value is suitable to be used as
+		struct timespec.
+
+		struct timespec ts;
+		hpu = fopen("/sys/kernel/boottime", "r");
+		fscanf(hpu, "%ld.%ld", &ts.tv_sec, &ts.tv_nsec);
+
+Users:		At least utilities that print klog entries in
+		times stamps converted for humans.
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index ab30716..5a7ec5a 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -10,6 +10,7 @@ proc-$(CONFIG_MMU)	:= task_mmu.o
 proc-y       += inode.o root.o base.o generic.o array.o \
 		fd.o
 proc-$(CONFIG_TTY)      += proc_tty.o
+proc-y	+= boottime.o
 proc-y	+= cmdline.o
 proc-y	+= consoles.o
 proc-y	+= cpuinfo.o
diff --git a/fs/proc/boottime.c b/fs/proc/boottime.c
new file mode 100644
index 0000000..f56c624
--- /dev/null
+++ b/fs/proc/boottime.c
@@ -0,0 +1,33 @@
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/time.h>
+
+static int boottime_proc_show(struct seq_file *m, void *v)
+{
+	struct timespec ts;
+	getboottime(&ts);
+	seq_printf(m, "%ld.%ld\n", ts.tv_sec, ts.tv_nsec);
+	return 0;
+}
+
+static int boottime_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, boottime_proc_show, NULL);
+}
+
+static const struct file_operations boottime_proc_fops = {
+	.open = boottime_proc_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int __init proc_boottime_init(void)
+{
+	proc_create("boottime", 0, NULL, &boottime_proc_fops);
+	return 0;
+}
+
+module_init(proc_boottime_init);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] proc: make high precision system boot time available
  2013-06-29 13:14 [PATCH] proc: make high precision system boot time available Sami Kerola
@ 2013-06-29 14:23 ` Eric W. Biederman
  2013-06-29 20:53   ` Sami Kerola
  0 siblings, 1 reply; 5+ messages in thread
From: Eric W. Biederman @ 2013-06-29 14:23 UTC (permalink / raw)
  To: Sami Kerola; +Cc: linux-kernel, Karel Zak, Lennart Poettering, Kay Sievers


Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>

The pach does not solve the problem described.

Sami Kerola <kerolasa@iki.fi> writes:

> The kernel does not expose precise start time anywhere.  Precision of
> sysinfo() is limited to a second.  The /proc/uptime has precision of two
> decimals.  Neither are good enough when klog messages are displayed with
> a log time stamp that is converted to human understandable format, such
> as ISO-8601.  The problem is that imprecise knowledge about uptime will
> lead human readable klog time stamps to wiggle in user commands, such as
> dmesg(1).

Using an adjusted idea like boottime will also cause wiggle whenever the
clock is set, or we have a leap second.  So I can't see this solving
anything.

> Additionally the current dmesg and journalctl(1) are disagreeing when
> kernel events occurred.  The dmesg is using sysinfo(2) to determine boot
> time, while journalctl is using systemd(1)'s early clock_gettime(2) call.
> Here is demonstration how different baseline effects on a random laptop.
>
> $ dmesg --ctime | grep 'Initializing cgroup subsys cpuset'
> [Fri Jun 28 07:40:12 2013] Initializing cgroup subsys cpuset
> $ journalctl -b | grep 'Initializing cgroup subsys cpuset'
> Jun 28 07:40:21 kerolasa-home kernel: Initializing cgroup subsys cpuset

Here you are show nearly 10 seconds of difference in time.  Subsecond
accuracy is not likely to help you here.

> To solve the wiggling problem, and to have a boot time coming from source
> which everyone can agree is 'good enough' estimation of system start up,
> the kernel needs to tell the boot time as precisily as it can.

Quite frankly none of this makes a lick of sense to me.  Your deep
complaint is a human pretty printer is not matching the results of
another human pretty printer.

I fail to see how the limitations of pretty printers justifies
maintiaing yet another kernel interface.  Especially for something so
dubious and fluctuating as the time we think the wall clock read when we
started up.

I can't imagine this interface doing anything except generating more
user complaints.  A made up number with more digits after the decimal
point remains a made up number.

Eric


> CC: Karel Zak <kzak@redhat.com>
> CC: Lennart Poettering <lennart@poettering.net>
> CC: Kay Sievers <kay@vrfy.org>
> CC: Eric W. Biederman <ebiederm@xmission.com>
> References: http://www.spinics.net/lists/util-linux-ng/msg08092.html
> Measurement: https://plus.google.com/u/0/117341944650953092607/posts/SHy5qu1HpSU
> Signed-off-by: Sami Kerola <kerolasa@iki.fi>
> ---
>  Documentation/ABI/testing/procfs-boottime | 15 ++++++++++++++
>  fs/proc/Makefile                          |  1 +
>  fs/proc/boottime.c                        | 33 +++++++++++++++++++++++++++++++
>  3 files changed, 49 insertions(+)
>  create mode 100644 Documentation/ABI/testing/procfs-boottime
>  create mode 100644 fs/proc/boottime.c
>
> diff --git a/Documentation/ABI/testing/procfs-boottime b/Documentation/ABI/testing/procfs-boottime
> new file mode 100644
> index 0000000..d5bf8f2
> --- /dev/null
> +++ b/Documentation/ABI/testing/procfs-boottime
> @@ -0,0 +1,15 @@
> +What:		/proc/boottime
> +Date:		June 2013
> +KernelVersion:	3.11
> +Contact:	Sami Kerola <kerolasa@iki.fi>
> +
> +Description:	Show high precision time stamp when linux kernel thinks
> +		the system start up.  The value is suitable to be used as
> +		struct timespec.
> +
> +		struct timespec ts;
> +		hpu = fopen("/sys/kernel/boottime", "r");
> +		fscanf(hpu, "%ld.%ld", &ts.tv_sec, &ts.tv_nsec);
> +
> +Users:		At least utilities that print klog entries in
> +		times stamps converted for humans.
> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
> index ab30716..5a7ec5a 100644
> --- a/fs/proc/Makefile
> +++ b/fs/proc/Makefile
> @@ -10,6 +10,7 @@ proc-$(CONFIG_MMU)	:= task_mmu.o
>  proc-y       += inode.o root.o base.o generic.o array.o \
>  		fd.o
>  proc-$(CONFIG_TTY)      += proc_tty.o
> +proc-y	+= boottime.o
>  proc-y	+= cmdline.o
>  proc-y	+= consoles.o
>  proc-y	+= cpuinfo.o
> diff --git a/fs/proc/boottime.c b/fs/proc/boottime.c
> new file mode 100644
> index 0000000..f56c624
> --- /dev/null
> +++ b/fs/proc/boottime.c
> @@ -0,0 +1,33 @@
> +#include <linux/fs.h>
> +#include <linux/init.h>
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/time.h>
> +
> +static int boottime_proc_show(struct seq_file *m, void *v)
> +{
> +	struct timespec ts;
> +	getboottime(&ts);
> +	seq_printf(m, "%ld.%ld\n", ts.tv_sec, ts.tv_nsec);
> +	return 0;
> +}
> +
> +static int boottime_proc_open(struct inode *inode, struct file *file)
> +{
> +	return single_open(file, boottime_proc_show, NULL);
> +}
> +
> +static const struct file_operations boottime_proc_fops = {
> +	.open = boottime_proc_open,
> +	.read = seq_read,
> +	.llseek = seq_lseek,
> +	.release = single_release,
> +};
> +
> +static int __init proc_boottime_init(void)
> +{
> +	proc_create("boottime", 0, NULL, &boottime_proc_fops);
> +	return 0;
> +}
> +
> +module_init(proc_boottime_init);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] proc: make high precision system boot time available
  2013-06-29 14:23 ` Eric W. Biederman
@ 2013-06-29 20:53   ` Sami Kerola
  2013-06-30 15:10     ` Kay Sievers
  0 siblings, 1 reply; 5+ messages in thread
From: Sami Kerola @ 2013-06-29 20:53 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, Karel Zak, Lennart Poettering, Kay Sievers

On 29 June 2013 15:23, Eric W. Biederman <ebiederm@xmission.com> wrote:
>
> Nacked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> The pach does not solve the problem described.

Which might be true, and I am open to better proposals.

> Sami Kerola <kerolasa@iki.fi> writes:
>
>> The kernel does not expose precise start time anywhere.  Precision of
>> sysinfo() is limited to a second.  The /proc/uptime has precision of two
>> decimals.  Neither are good enough when klog messages are displayed with
>> a log time stamp that is converted to human understandable format, such
>> as ISO-8601.  The problem is that imprecise knowledge about uptime will
>> lead human readable klog time stamps to wiggle in user commands, such as
>> dmesg(1).
>
> Using an adjusted idea like boottime will also cause wiggle whenever the
> clock is set, or we have a leap second.  So I can't see this solving
> anything.

I must disagree. Unless a boot time expressed with at least the same
accuracy as the klog messages the human readable time stamps can be
rounded to differently making for example dmesg output to be unstable.

diff -q <(dmesg --ctime) <(sleep 0.5 ; dmesg --ctime)

>> Additionally the current dmesg and journalctl(1) are disagreeing when
>> kernel events occurred.  The dmesg is using sysinfo(2) to determine boot
>> time, while journalctl is using systemd(1)'s early clock_gettime(2) call.
>> Here is demonstration how different baseline effects on a random laptop.
>>
>> $ dmesg --ctime | grep 'Initializing cgroup subsys cpuset'
>> [Fri Jun 28 07:40:12 2013] Initializing cgroup subsys cpuset
>> $ journalctl -b | grep 'Initializing cgroup subsys cpuset'
>> Jun 28 07:40:21 kerolasa-home kernel: Initializing cgroup subsys cpuset
>
> Here you are show nearly 10 seconds of difference in time.  Subsecond
> accuracy is not likely to help you here.

That is exactly the point, systemd is using different logic to
determine system start time. I am not sure why that was done, but my
guess is that all existing ways to determine boot time lead to
rounding errors so a private estimation was invented.

Of course i could align dmesg with systemd, by reading boot time from
systemd's dbus, but I believe after that uptime(1), top(1), [and so
on], would disagree with these two commands. Essentially it seems to
be either a) wiggling, or b) even more random estimation of boot time
from which I need to choose unless there is a new interface.

>> To solve the wiggling problem, and to have a boot time coming from source
>> which everyone can agree is 'good enough' estimation of system start up,
>> the kernel needs to tell the boot time as precisily as it can.
>
> Quite frankly none of this makes a lick of sense to me.  Your deep
> complaint is a human pretty printer is not matching the results of
> another human pretty printer.
>
> I fail to see how the limitations of pretty printers justifies
> maintiaing yet another kernel interface.  Especially for something so
> dubious and fluctuating as the time we think the wall clock read when we
> started up.

Correct. This is at the end user space issue. As a system
administrator I cannot understand when dmesg events happen, when they
are expressed in their 'native' format. But if I use human readable
time stamps output is unstable, and different tools give different
answers. That seems flawed as well.

BTW having a way to measure effect of suspend/resume could lead to a
way to fix time time distortion.

> I can't imagine this interface doing anything except generating more
> user complaints.  A made up number with more digits after the decimal
> point remains a made up number.

I am not claiming the /proc/boottime would fix every considerable
issue. What it tries to address is 1) stable high accuracy boot time
source, which 2) can be agreed in between utilities to be the
authoritative boot time source. Maybe the leap seconds will break
something, but I consider that better problem than current situation.

Perhaps there is better alternative to fix user space programs.
Unfortunately I do not have either knowledge, or imagination, to come
up with any. Fix hints, ideas, and other proposal are most welcome.


>> CC: Karel Zak <kzak@redhat.com>
>> CC: Lennart Poettering <lennart@poettering.net>
>> CC: Kay Sievers <kay@vrfy.org>
>> CC: Eric W. Biederman <ebiederm@xmission.com>
>> References: http://www.spinics.net/lists/util-linux-ng/msg08092.html
>> Measurement: https://plus.google.com/u/0/117341944650953092607/posts/SHy5qu1HpSU
>> Signed-off-by: Sami Kerola <kerolasa@iki.fi>
>> ---
>>  Documentation/ABI/testing/procfs-boottime | 15 ++++++++++++++
>>  fs/proc/Makefile                          |  1 +
>>  fs/proc/boottime.c                        | 33 +++++++++++++++++++++++++++++++
>>  3 files changed, 49 insertions(+)
>>  create mode 100644 Documentation/ABI/testing/procfs-boottime
>>  create mode 100644 fs/proc/boottime.c
>>
>> diff --git a/Documentation/ABI/testing/procfs-boottime b/Documentation/ABI/testing/procfs-boottime
>> new file mode 100644
>> index 0000000..d5bf8f2
>> --- /dev/null
>> +++ b/Documentation/ABI/testing/procfs-boottime
>> @@ -0,0 +1,15 @@
>> +What:                /proc/boottime
>> +Date:                June 2013
>> +KernelVersion:       3.11
>> +Contact:     Sami Kerola <kerolasa@iki.fi>
>> +
>> +Description: Show high precision time stamp when linux kernel thinks
>> +             the system start up.  The value is suitable to be used as
>> +             struct timespec.
>> +
>> +             struct timespec ts;
>> +             hpu = fopen("/sys/kernel/boottime", "r");
>> +             fscanf(hpu, "%ld.%ld", &ts.tv_sec, &ts.tv_nsec);
>> +
>> +Users:               At least utilities that print klog entries in
>> +             times stamps converted for humans.
>> diff --git a/fs/proc/Makefile b/fs/proc/Makefile
>> index ab30716..5a7ec5a 100644
>> --- a/fs/proc/Makefile
>> +++ b/fs/proc/Makefile
>> @@ -10,6 +10,7 @@ proc-$(CONFIG_MMU)  := task_mmu.o
>>  proc-y       += inode.o root.o base.o generic.o array.o \
>>               fd.o
>>  proc-$(CONFIG_TTY)      += proc_tty.o
>> +proc-y       += boottime.o
>>  proc-y       += cmdline.o
>>  proc-y       += consoles.o
>>  proc-y       += cpuinfo.o
>> diff --git a/fs/proc/boottime.c b/fs/proc/boottime.c
>> new file mode 100644
>> index 0000000..f56c624
>> --- /dev/null
>> +++ b/fs/proc/boottime.c
>> @@ -0,0 +1,33 @@
>> +#include <linux/fs.h>
>> +#include <linux/init.h>
>> +#include <linux/proc_fs.h>
>> +#include <linux/seq_file.h>
>> +#include <linux/time.h>
>> +
>> +static int boottime_proc_show(struct seq_file *m, void *v)
>> +{
>> +     struct timespec ts;
>> +     getboottime(&ts);
>> +     seq_printf(m, "%ld.%ld\n", ts.tv_sec, ts.tv_nsec);
>> +     return 0;
>> +}
>> +
>> +static int boottime_proc_open(struct inode *inode, struct file *file)
>> +{
>> +     return single_open(file, boottime_proc_show, NULL);
>> +}
>> +
>> +static const struct file_operations boottime_proc_fops = {
>> +     .open = boottime_proc_open,
>> +     .read = seq_read,
>> +     .llseek = seq_lseek,
>> +     .release = single_release,
>> +};
>> +
>> +static int __init proc_boottime_init(void)
>> +{
>> +     proc_create("boottime", 0, NULL, &boottime_proc_fops);
>> +     return 0;
>> +}
>> +
>> +module_init(proc_boottime_init);



--
Sami Kerola
http://www.iki.fi/kerolasa/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] proc: make high precision system boot time available
  2013-06-29 20:53   ` Sami Kerola
@ 2013-06-30 15:10     ` Kay Sievers
  2013-06-30 17:46       ` Sami Kerola
  0 siblings, 1 reply; 5+ messages in thread
From: Kay Sievers @ 2013-06-30 15:10 UTC (permalink / raw)
  To: kerolasa; +Cc: Eric W. Biederman, linux-kernel, Karel Zak, Lennart Poettering

On Sat, Jun 29, 2013 at 10:53 PM, Sami Kerola <kerolasa@iki.fi> wrote:

> BTW having a way to measure effect of suspend/resume could lead to a
> way to fix time time distortion.

> Perhaps there is better alternative to fix user space programs.
> Unfortunately I do not have either knowledge, or imagination, to come
> up with any. Fix hints, ideas, and other proposal are most welcome.

Stuff should just use:
  clock_gettime(CLOCK_BOOTTIME, &ts);

to get the time in better resolution, and with suspend time taken into account?

As a general note, the kmsg/printk() time in the raw kernel log is the
time of the CPU it runs on, especially during early init this time
might not be in sync across CPUs and stuff jumps event in the very
same stream on events from the kernel itself:
  http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/clock.c#n329

Kay

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] proc: make high precision system boot time available
  2013-06-30 15:10     ` Kay Sievers
@ 2013-06-30 17:46       ` Sami Kerola
  0 siblings, 0 replies; 5+ messages in thread
From: Sami Kerola @ 2013-06-30 17:46 UTC (permalink / raw)
  To: Kay Sievers
  Cc: Eric W. Biederman, linux-kernel, Karel Zak, Lennart Poettering

On 30 June 2013 16:10, Kay Sievers <kay@vrfy.org> wrote:
> On Sat, Jun 29, 2013 at 10:53 PM, Sami Kerola <kerolasa@iki.fi> wrote:
>
>> BTW having a way to measure effect of suspend/resume could lead to a
>> way to fix time time distortion.
>
>> Perhaps there is better alternative to fix user space programs.
>> Unfortunately I do not have either knowledge, or imagination, to come
>> up with any. Fix hints, ideas, and other proposal are most welcome.
>
> Stuff should just use:
>   clock_gettime(CLOCK_BOOTTIME, &ts);
>
> to get the time in better resolution, and with suspend time taken into account?
>
> As a general note, the kmsg/printk() time in the raw kernel log is the
> time of the CPU it runs on, especially during early init this time
> might not be in sync across CPUs and stuff jumps event in the very
> same stream on events from the kernel itself:
>   http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/kernel/sched/clock.c#n329

Thank you Kay,

The clock_gettime(CLOCK_BOOTTIME, &uptime); seems to be exactly what
dmesg needs. A patch to fix dmesg wigging will be sent to util-linux
maillist shortly. Perhaps the dmesg / journald time stamp difference
issue will be fixed someday, but that discussion does not belong to
linux-kernel mailist, so it is time to declare

end of thread.

--
Sami Kerola
http://www.iki.fi/kerolasa/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-06-30 17:46 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-29 13:14 [PATCH] proc: make high precision system boot time available Sami Kerola
2013-06-29 14:23 ` Eric W. Biederman
2013-06-29 20:53   ` Sami Kerola
2013-06-30 15:10     ` Kay Sievers
2013-06-30 17:46       ` Sami Kerola

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.