All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] time namespace aware system boot time
@ 2020-10-08  5:39 Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 1/4] timens: additional helper function to add boottime in nsec Michael Weiß
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Michael Weiß @ 2020-10-08  5:39 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, Christian Brauner
  Cc: linux-kernel, Michael Weiß, kernel test robot

Time namespaces make it possible to virtualize time inside of
containers, e.g., it is feasible to reset the uptime of a container
to zero by setting the time namespace offset for boottime to the
negated current value of the CLOCK_BOOTTIME.

However, the boot time stamp provided by getboottime64() does not
take care of time namespaces. The resulting boot time stamp 'btime'
provided by /proc/stat does not show a plausible time stamp inside
the time namespace of a container.

We address this by shifting the value returned by getboottime64()
by subtracting the boottime offset of the time namespace.
(A selftest to check the expected /proc/stat 'btime' inside the
namespace is provided.)

Further, to avoid to show processes as time travelers inside of the
time namespace the boottime offset then needs to be added to the
start_bootime provided by the task_struct.

v2 Changes:
Fixed compile errors with TIME_NS not set in config
Reported-by: kernel test robot <lkp@intel.com>

Michael Weiß (4):
  timens: additional helper function to add boottime in nsec
  time: make getboottime64 aware of time namespace
  fs/proc: apply timens offset for start_boottime of processes
  selftests/timens: added selftest for /proc/stat btime

 fs/proc/array.c                         |  6 ++-
 include/linux/time_namespace.h          | 13 ++++++
 kernel/time/timekeeping.c               |  3 ++
 tools/testing/selftests/timens/procfs.c | 58 ++++++++++++++++++++++++-
 4 files changed, 77 insertions(+), 3 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/4] timens: additional helper function to add boottime in nsec
  2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
@ 2020-10-08  5:39 ` Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 2/4] time: make getboottime64 aware of time namespace Michael Weiß
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Weiß @ 2020-10-08  5:39 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, Christian Brauner
  Cc: linux-kernel, Michael Weiß

Provide a helper function to apply the boottime offset to u64 types
in nanoseconds.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 include/linux/time_namespace.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/time_namespace.h b/include/linux/time_namespace.h
index 5b6031385db0..5372181010f9 100644
--- a/include/linux/time_namespace.h
+++ b/include/linux/time_namespace.h
@@ -77,6 +77,13 @@ static inline void timens_add_boottime(struct timespec64 *ts)
 	*ts = timespec64_add(*ts, ns_offsets->boottime);
 }
 
+static inline u64 timens_add_boottime_ns(u64 nsec)
+{
+	struct timens_offsets *ns_offsets = &current->nsproxy->time_ns->offsets;
+
+	return nsec + timespec64_to_ns(&ns_offsets->boottime);
+}
+
 ktime_t do_timens_ktime_to_host(clockid_t clockid, ktime_t tim,
 				struct timens_offsets *offsets);
 
@@ -130,6 +137,12 @@ static inline int timens_on_fork(struct nsproxy *nsproxy,
 
 static inline void timens_add_monotonic(struct timespec64 *ts) { }
 static inline void timens_add_boottime(struct timespec64 *ts) { }
+
+static inline u64 timens_add_boottime_ns(u64 nsec)
+{
+	return nsec;
+}
+
 static inline ktime_t timens_ktime_to_host(clockid_t clockid, ktime_t tim)
 {
 	return tim;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 1/4] timens: additional helper function to add boottime in nsec Michael Weiß
@ 2020-10-08  5:39 ` Michael Weiß
  2020-10-09 13:28   ` Christian Brauner
  2020-10-08  5:39 ` [PATCH v2 3/4] fs/proc: apply timens offset for start_boottime of processes Michael Weiß
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Michael Weiß @ 2020-10-08  5:39 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, Christian Brauner
  Cc: linux-kernel, Michael Weiß

getboottime64() provides the time stamp of system boot. In case of
time namespaces, the offset to the boot time stamp was not applied
earlier. However, getboottime64 is used e.g., in /proc/stat to print
the system boot time to userspace. In container runtimes which utilize
time namespaces to virtualize boottime of a container, this leaks
information about the host system boot time.

Therefore, we make getboottime64() to respect the time namespace offset
for boottime by subtracting the boottime offset.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 kernel/time/timekeeping.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f388a83f..67530cdb389e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -17,6 +17,7 @@
 #include <linux/clocksource.h>
 #include <linux/jiffies.h>
 #include <linux/time.h>
+#include <linux/time_namespace.h>
 #include <linux/tick.h>
 #include <linux/stop_machine.h>
 #include <linux/pvclock_gtod.h>
@@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
+	/* shift boot time stamp according to the timens offset */
+	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
 
 	*ts = ktime_to_timespec64(t);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/4] fs/proc: apply timens offset for start_boottime of processes
  2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 1/4] timens: additional helper function to add boottime in nsec Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 2/4] time: make getboottime64 aware of time namespace Michael Weiß
@ 2020-10-08  5:39 ` Michael Weiß
  2020-10-08  5:39 ` [PATCH v2 4/4] selftests/timens: added selftest for /proc/stat btime Michael Weiß
  2020-10-09 13:21 ` [PATCH v2 0/4] time namespace aware system boot time Christian Brauner
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Weiß @ 2020-10-08  5:39 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, Christian Brauner
  Cc: linux-kernel, Michael Weiß

Since start_boottime of processes are seconds since boottime and the
boottime stamp is now shifted according to the timens offset, the
offset of the time namespace also needs to be applied before the
process stats are given to userspace.

This avoids that processes shown, e.g., by 'ps' appear as time
travelers in the corresponding time namespace.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 fs/proc/array.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/proc/array.c b/fs/proc/array.c
index 65ec2029fa80..277f654f289e 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -56,6 +56,7 @@
 #include <linux/types.h>
 #include <linux/errno.h>
 #include <linux/time.h>
+#include <linux/time_namespace.h>
 #include <linux/kernel.h>
 #include <linux/kernel_stat.h>
 #include <linux/tty.h>
@@ -533,8 +534,9 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
 	priority = task_prio(task);
 	nice = task_nice(task);
 
-	/* convert nsec -> ticks */
-	start_time = nsec_to_clock_t(task->start_boottime);
+	/* apply timens offset for boottime and convert nsec -> ticks */
+	start_time =
+		nsec_to_clock_t(timens_add_boottime_ns(task->start_boottime));
 
 	seq_put_decimal_ull(m, "", pid_nr_ns(pid, ns));
 	seq_puts(m, " (");
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/4] selftests/timens: added selftest for /proc/stat btime
  2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
                   ` (2 preceding siblings ...)
  2020-10-08  5:39 ` [PATCH v2 3/4] fs/proc: apply timens offset for start_boottime of processes Michael Weiß
@ 2020-10-08  5:39 ` Michael Weiß
  2020-10-09 13:21 ` [PATCH v2 0/4] time namespace aware system boot time Christian Brauner
  4 siblings, 0 replies; 17+ messages in thread
From: Michael Weiß @ 2020-10-08  5:39 UTC (permalink / raw)
  To: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, Christian Brauner
  Cc: linux-kernel, Michael Weiß

Test that btime value of /proc/stat is as expected in the time namespace
using a simple parser to get btime from /proc/stat.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
---
 tools/testing/selftests/timens/procfs.c | 58 ++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/timens/procfs.c b/tools/testing/selftests/timens/procfs.c
index 7f14f0fdac84..f2519154208a 100644
--- a/tools/testing/selftests/timens/procfs.c
+++ b/tools/testing/selftests/timens/procfs.c
@@ -93,6 +93,33 @@ static int read_proc_uptime(struct timespec *uptime)
 	return 0;
 }
 
+static int read_proc_stat_btime(unsigned long long *boottime_sec)
+{
+	FILE *proc;
+	char line_buf[2048];
+
+	proc = fopen("/proc/stat", "r");
+	if (proc == NULL) {
+		pr_perror("Unable to open /proc/stat");
+		return -1;
+	}
+
+	while (fgets(line_buf, 2048, proc)) {
+		if (sscanf(line_buf, "btime %llu", boottime_sec) != 1)
+			continue;
+		fclose(proc);
+		return 0;
+	}
+	if (errno) {
+		pr_perror("fscanf");
+		fclose(proc);
+		return -errno;
+	}
+	pr_err("failed to parse /proc/stat");
+	fclose(proc);
+	return -1;
+}
+
 static int check_uptime(void)
 {
 	struct timespec uptime_new, uptime_old;
@@ -123,18 +150,47 @@ static int check_uptime(void)
 	return 0;
 }
 
+static int check_stat_btime(void)
+{
+	unsigned long long btime_new, btime_old;
+	unsigned long long btime_expected;
+
+	if (switch_ns(parent_ns))
+		return pr_err("switch_ns(%d)", parent_ns);
+
+	if (read_proc_stat_btime(&btime_old))
+		return 1;
+
+	if (switch_ns(child_ns))
+		return pr_err("switch_ns(%d)", child_ns);
+
+	if (read_proc_stat_btime(&btime_new))
+		return 1;
+
+	btime_expected = btime_old - TEN_DAYS_IN_SEC;
+	if (btime_new != btime_expected) {
+		pr_fail("btime in /proc/stat: old %llu, new %llu [%llu]",
+			btime_old, btime_new, btime_expected);
+		return 1;
+	}
+
+	ksft_test_result_pass("Passed for /proc/stat btime\n");
+	return 0;
+}
+
 int main(int argc, char *argv[])
 {
 	int ret = 0;
 
 	nscheck();
 
-	ksft_set_plan(1);
+	ksft_set_plan(2);
 
 	if (init_namespaces())
 		return 1;
 
 	ret |= check_uptime();
+	ret |= check_stat_btime();
 
 	if (ret)
 		ksft_exit_fail();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 0/4] time namespace aware system boot time
  2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
                   ` (3 preceding siblings ...)
  2020-10-08  5:39 ` [PATCH v2 4/4] selftests/timens: added selftest for /proc/stat btime Michael Weiß
@ 2020-10-09 13:21 ` Christian Brauner
  4 siblings, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2020-10-09 13:21 UTC (permalink / raw)
  To: Michael Weiß
  Cc: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, linux-kernel,
	kernel test robot

On Thu, Oct 08, 2020 at 07:39:40AM +0200, Michael Weiß wrote:
> Time namespaces make it possible to virtualize time inside of
> containers, e.g., it is feasible to reset the uptime of a container
> to zero by setting the time namespace offset for boottime to the
> negated current value of the CLOCK_BOOTTIME.
> 
> However, the boot time stamp provided by getboottime64() does not
> take care of time namespaces. The resulting boot time stamp 'btime'
> provided by /proc/stat does not show a plausible time stamp inside
> the time namespace of a container.
> 
> We address this by shifting the value returned by getboottime64()
> by subtracting the boottime offset of the time namespace.
> (A selftest to check the expected /proc/stat 'btime' inside the
> namespace is provided.)
> 
> Further, to avoid to show processes as time travelers inside of the
> time namespace the boottime offset then needs to be added to the
> start_bootime provided by the task_struct.
> 
> v2 Changes:
> Fixed compile errors with TIME_NS not set in config
> Reported-by: kernel test robot <lkp@intel.com>

Hey Michael,

Thanks for the patches. This looks like a good idea to me. Since
/proc/uptime is now virtualized according to the timens the caller is in
btime has to be virtualized too.

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-08  5:39 ` [PATCH v2 2/4] time: make getboottime64 aware of time namespace Michael Weiß
@ 2020-10-09 13:28   ` Christian Brauner
  2020-10-09 13:55     ` J. Bruce Fields
  2020-10-10  7:19     ` Andrei Vagin
  0 siblings, 2 replies; 17+ messages in thread
From: Christian Brauner @ 2020-10-09 13:28 UTC (permalink / raw)
  To: Michael Weiß
  Cc: Thomas Gleixner, Andrei Vagin, Dmitry Safonov, linux-kernel,
	J. Bruce Fields, Chuck Lever, Trond Myklebust, Anna Schumaker

On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> getboottime64() provides the time stamp of system boot. In case of
> time namespaces, the offset to the boot time stamp was not applied
> earlier. However, getboottime64 is used e.g., in /proc/stat to print
> the system boot time to userspace. In container runtimes which utilize
> time namespaces to virtualize boottime of a container, this leaks
> information about the host system boot time.
> 
> Therefore, we make getboottime64() to respect the time namespace offset
> for boottime by subtracting the boottime offset.
> 
> Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
> ---
>  kernel/time/timekeeping.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 4c47f388a83f..67530cdb389e 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -17,6 +17,7 @@
>  #include <linux/clocksource.h>
>  #include <linux/jiffies.h>
>  #include <linux/time.h>
> +#include <linux/time_namespace.h>
>  #include <linux/tick.h>
>  #include <linux/stop_machine.h>
>  #include <linux/pvclock_gtod.h>
> @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
>  {
>  	struct timekeeper *tk = &tk_core.timekeeper;
>  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> +	/* shift boot time stamp according to the timens offset */
> +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);

Note that getbootime64() is mostly used in net/sunrpc and I don't know
if this change has any security implications for them.

Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
to the time namespace of the caller, i.e. a container can e.g. reset
it's boottime when started. This is already possible. The series here
fixes a bug where /proc/stat's btime field is not virtualized but since
this changes getboottime64() this would also apply to sunrpc's
timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
i.e. the time in the initial time namespace?

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-09 13:28   ` Christian Brauner
@ 2020-10-09 13:55     ` J. Bruce Fields
  2020-10-09 20:08       ` Thomas Gleixner
  2020-10-10  7:19     ` Andrei Vagin
  1 sibling, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2020-10-09 13:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Michael Weiß,
	Thomas Gleixner, Andrei Vagin, Dmitry Safonov, linux-kernel,
	Chuck Lever, Trond Myklebust, Anna Schumaker

On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > getboottime64() provides the time stamp of system boot. In case of
> > time namespaces,

Huh, I didn't know there were time namespaces.

> > the offset to the boot time stamp was not applied
> > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > the system boot time to userspace. In container runtimes which utilize
> > time namespaces to virtualize boottime of a container, this leaks
> > information about the host system boot time.
> > 
> > Therefore, we make getboottime64() to respect the time namespace offset
> > for boottime by subtracting the boottime offset.
> > 
> > Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
> > ---
> >  kernel/time/timekeeping.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4c47f388a83f..67530cdb389e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/clocksource.h>
> >  #include <linux/jiffies.h>
> >  #include <linux/time.h>
> > +#include <linux/time_namespace.h>
> >  #include <linux/tick.h>
> >  #include <linux/stop_machine.h>
> >  #include <linux/pvclock_gtod.h>
> > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> >  {
> >  	struct timekeeper *tk = &tk_core.timekeeper;
> >  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > +	/* shift boot time stamp according to the timens offset */
> > +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> 
> Note that getbootime64() is mostly used in net/sunrpc and I don't know
> if this change has any security implications for them.
> 
> Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
> to the time namespace of the caller, i.e. a container can e.g. reset
> it's boottime when started. This is already possible. The series here
> fixes a bug where /proc/stat's btime field is not virtualized but since
> this changes getboottime64() this would also apply to sunrpc's
> timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
> i.e. the time in the initial time namespace?

Looking at how it's used in net/sunrpc/cache.c....  All it's doing is
comparing times which have all been calculated relative to the time
returned by getboottime64().  So it doesn't really matter what
getboottime64() is, as long as it's always the same.

So, I don't think this should change behavior of the sunrpc code at all.

--b.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-09 13:55     ` J. Bruce Fields
@ 2020-10-09 20:08       ` Thomas Gleixner
  2020-10-12 21:13         ` J. Bruce Fields
  0 siblings, 1 reply; 17+ messages in thread
From: Thomas Gleixner @ 2020-10-09 20:08 UTC (permalink / raw)
  To: J. Bruce Fields, Christian Brauner
  Cc: Michael Weiß,
	Andrei Vagin, Dmitry Safonov, linux-kernel, Chuck Lever,
	Trond Myklebust, Anna Schumaker, Arnd Bergmann

On Fri, Oct 09 2020 at 09:55, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
>> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
>> > getboottime64() provides the time stamp of system boot. In case of
>> > time namespaces,
>
> Huh, I didn't know there were time namespaces.

There are enough people who live in their own universe, so there is a
clear need for them to have their own time zones :)

>> >  	struct timekeeper *tk = &tk_core.timekeeper;
>> >  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
>> > +	/* shift boot time stamp according to the timens offset */
>> > +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
>> 
>> Note that getbootime64() is mostly used in net/sunrpc and I don't know
>> if this change has any security implications for them.
>> 
>> Hey, Trond, Anna, Bruce, and Chuck this virtualizes boottime according
>> to the time namespace of the caller, i.e. a container can e.g. reset
>> it's boottime when started. This is already possible. The series here
>> fixes a bug where /proc/stat's btime field is not virtualized but since
>> this changes getboottime64() this would also apply to sunrpc's
>> timekeeping. Is that ok or does sunrpc rely on the hosts's boot time,
>> i.e. the time in the initial time namespace?
>
> Looking at how it's used in net/sunrpc/cache.c....  All it's doing is
> comparing times which have all been calculated relative to the time
> returned by getboottime64().  So it doesn't really matter what
> getboottime64() is, as long as it's always the same.
>
> So, I don't think this should change behavior of the sunrpc code at all.

You wish. That's clearly wrong because that code is not guaranteed to
always run in a context which belongs to the root time namespace.

AFAICT, this stuff can run in softirq context which is context stealing
and the interrupted task can belong to a different time name space.

So, no. All time namespace functions are seperate and the conversion to
and from the root name space happens at the system call boundaries.
Michael, this needs an explicit wrapper which can be used in those
places.

But that made me look at the usage of that in sunrpc.

static inline time64_t seconds_since_boot(void)
{
        struct timespec64 boot;
        getboottime64(&boot);
        return ktime_get_real_seconds() - boot.tv_sec;
}

Yikes.

static inline time64_t convert_to_wallclock(time64_t sinceboot)
{
        struct timespec64 boot;
        getboottime64(&boot);
        return boot.tv_sec + sinceboot;
}

The comment above seconds_since_boot() is just hillarious. How is the
above so much faster than ktime_get_boottime_seconds()? Arnd?

seconds_since_boot()
  getboottime64() {
     ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);

     *ts = ktime_to_timespec64(t) {
         ....
         ts.tv_sec = div_u64_rem(nsec, NSEC_PER_SEC, &rem);
     }
   }
   ktime_get_real_seconds()
     return tk->xtime_sec;

ktime_get_bootime_seconds() contains a division as well to convert
nanoseconds to seconds and I seriously doubt that the difference is
observable at all. So that comment was pulled out of thin air.

The real reason to have this clumsiness is that the expiry time is read
out of a buffer. That time is seconds since the epoch, aka. wall
time. There is also the conversion the other way round which is used for
procfs. All that can't be changed of course.

Up to 2010 this used get_seconds() and that was changed in commit
c5b29f885afe ("sunrpc: use seconds since boot in expiry cache"):

    This protects us from confusion when the wallclock time changes.
    
    We convert to and from wallclock when  setting or reading expiry
    times.

So this stuff has been hacked up as is because there are no functions to
convert wall time to boot time and vice versa. I know it's always better
to hack something up than to talk to people.

In fact the whole thing can be simplified. You can just use time in
nanoseconds retrieved via ktime_get_coarse_boottime() which does not
read the clocksource and advances once per tick and does not contain a
divison and is definitely faster than seconds_since_boot()

The expiry time is initialized via get_expiry() which does:

   getboottime64(&boot);
   return rv - boot.tv_sec; 

The expiry value 'rv' which is read out of the buffer is wall time in
seconds. So all you need is are function which convert real to boottime
and vice versa. That's trivial to implement and again faster than
getboottime64(). Something like this:

ktime_t ktime_real_to_boot(ktime_t real)
{
        struct timekeeper *tk = &tk_core.timekeeper;
        ktime_t mono = ktime_sub(real, tk->offs_real);
              
        return ktime_add(mono, tk->offs_boot);
}

so the above becomes:

   return ktime_real_to_boot(rv * NSEC_PER_SEC);

which is still faster than a division.

The nanoseconds value after converting back to wall clock will need a
division to get seconds since the epoch, but that's not an issue because
that backward conversion already has one today.

You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
add's '1' to the expiry value and some janitoring of function names and
variable types, but no real big surgery AFAICT.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-09 13:28   ` Christian Brauner
  2020-10-09 13:55     ` J. Bruce Fields
@ 2020-10-10  7:19     ` Andrei Vagin
  2020-10-10 11:50       ` Michael Weiß
  2020-10-13 20:05       ` Christian Brauner
  1 sibling, 2 replies; 17+ messages in thread
From: Andrei Vagin @ 2020-10-10  7:19 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Michael Weiß,
	Thomas Gleixner, Dmitry Safonov, linux-kernel, J. Bruce Fields,
	Chuck Lever, Trond Myklebust, Anna Schumaker

On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > getboottime64() provides the time stamp of system boot. In case of
> > time namespaces, the offset to the boot time stamp was not applied
> > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > the system boot time to userspace. In container runtimes which utilize
> > time namespaces to virtualize boottime of a container, this leaks
> > information about the host system boot time.
> > 
> > Therefore, we make getboottime64() to respect the time namespace offset
> > for boottime by subtracting the boottime offset.
> > 
> > Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
> > ---
> >  kernel/time/timekeeping.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > index 4c47f388a83f..67530cdb389e 100644
> > --- a/kernel/time/timekeeping.c
> > +++ b/kernel/time/timekeeping.c
> > @@ -17,6 +17,7 @@
> >  #include <linux/clocksource.h>
> >  #include <linux/jiffies.h>
> >  #include <linux/time.h>
> > +#include <linux/time_namespace.h>
> >  #include <linux/tick.h>
> >  #include <linux/stop_machine.h>
> >  #include <linux/pvclock_gtod.h>
> > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> >  {
> >  	struct timekeeper *tk = &tk_core.timekeeper;
> >  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > +	/* shift boot time stamp according to the timens offset */
> > +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> 
> Note that getbootime64() is mostly used in net/sunrpc and I don't know
> if this change has any security implications for them.

I would prefer to not patch kernel internal functions if they are used
not only to expose time to the userspace.

I think when kernel developers sees the getboottime64 function, they
will expect that it returns the real time of kernel boot. They will
not expect that it is aware of time namespaces and a returned time will
depend on a task in which context it will be called.

IMHO, as a minimum, we need to update the documentation for this function or
even adjust a function name.

And I think we need to consider an option to not change getbootime64 and
apply a timens offset right in the show_stat(fs/proc/stat.c) function.

Thanks,
Andrei

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-10  7:19     ` Andrei Vagin
@ 2020-10-10 11:50       ` Michael Weiß
  2020-10-10 16:37         ` Thomas Gleixner
  2020-10-13 20:05       ` Christian Brauner
  1 sibling, 1 reply; 17+ messages in thread
From: Michael Weiß @ 2020-10-10 11:50 UTC (permalink / raw)
  To: Andrei Vagin, Christian Brauner, Thomas Gleixner
  Cc: Dmitry Safonov, linux-kernel, J. Bruce Fields, Chuck Lever,
	Trond Myklebust, Anna Schumaker


On 10.10.20 09:19, Andrei Vagin wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
>> On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
>>> getboottime64() provides the time stamp of system boot. In case of
>>> time namespaces, the offset to the boot time stamp was not applied
>>> earlier. However, getboottime64 is used e.g., in /proc/stat to print
>>> the system boot time to userspace. In container runtimes which utilize
>>> time namespaces to virtualize boottime of a container, this leaks
>>> information about the host system boot time.
>>>
>>> Therefore, we make getboottime64() to respect the time namespace offset
>>> for boottime by subtracting the boottime offset.
>>>
>>> Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
>>> ---
>>>  kernel/time/timekeeping.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
>>> index 4c47f388a83f..67530cdb389e 100644
>>> --- a/kernel/time/timekeeping.c
>>> +++ b/kernel/time/timekeeping.c
>>> @@ -17,6 +17,7 @@
>>>  #include <linux/clocksource.h>
>>>  #include <linux/jiffies.h>
>>>  #include <linux/time.h>
>>> +#include <linux/time_namespace.h>
>>>  #include <linux/tick.h>
>>>  #include <linux/stop_machine.h>
>>>  #include <linux/pvclock_gtod.h>
>>> @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
>>>  {
>>>  	struct timekeeper *tk = &tk_core.timekeeper;
>>>  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
>>> +	/* shift boot time stamp according to the timens offset */
>>> +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
>> Note that getbootime64() is mostly used in net/sunrpc and I don't know
>> if this change has any security implications for them.
> I would prefer to not patch kernel internal functions if they are used
> not only to expose time to the userspace.
>
> I think when kernel developers sees the getboottime64 function, they
> will expect that it returns the real time of kernel boot. They will
> not expect that it is aware of time namespaces and a returned time will
> depend on a task in which context it will be called.
>
> IMHO, as a minimum, we need to update the documentation for this function or
> even adjust a function name.
>
> And I think we need to consider an option to not change getbootime64 and
> apply a timens offset right in the show_stat(fs/proc/stat.c) function.
>
> Thanks,
> Andrei

Since the problems in softirq context mentioned from Thomas,
I would agree to Andrei's option to just patch proc/stat.c and leave
getboottime64 unchanged.

Digging around in the kernel tree, I just found /proc/stat as the only
place where boottime is exposed to userspace, thus it seems a valid
option.

What do you think? If you agree I'll come up with an updated patch-set.

Cheers,
Michael




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-10 11:50       ` Michael Weiß
@ 2020-10-10 16:37         ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-10-10 16:37 UTC (permalink / raw)
  To: Michael Weiß, Andrei Vagin, Christian Brauner
  Cc: Dmitry Safonov, linux-kernel, J. Bruce Fields, Chuck Lever,
	Trond Myklebust, Anna Schumaker

Michael,

On Sat, Oct 10 2020 at 13:50, Michael Weiß wrote:
> On 10.10.20 09:19, Andrei Vagin wrote:
>> And I think we need to consider an option to not change getbootime64 and
>> apply a timens offset right in the show_stat(fs/proc/stat.c)
>> function.

That's what I meant and failed to express correctly.

> Since the problems in softirq context mentioned from Thomas,
> I would agree to Andrei's option to just patch proc/stat.c and leave
> getboottime64 unchanged.
>
> Digging around in the kernel tree, I just found /proc/stat as the only
> place where boottime is exposed to userspace, thus it seems a valid
> option.

Yes, I thought about a wrapper function which adds the namespace offset,
but as it is the only place, open coding is fine.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-09 20:08       ` Thomas Gleixner
@ 2020-10-12 21:13         ` J. Bruce Fields
  2020-10-12 21:36           ` Thomas Gleixner
  2020-10-14 22:05           ` J. Bruce Fields
  0 siblings, 2 replies; 17+ messages in thread
From: J. Bruce Fields @ 2020-10-12 21:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christian Brauner, Michael Weiß,
	Andrei Vagin, Dmitry Safonov, linux-kernel, Chuck Lever,
	Trond Myklebust, Anna Schumaker, Arnd Bergmann

On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> On Fri, Oct 09 2020 at 09:55, J. Bruce Fields wrote:
> > Looking at how it's used in net/sunrpc/cache.c....  All it's doing is
> > comparing times which have all been calculated relative to the time
> > returned by getboottime64().  So it doesn't really matter what
> > getboottime64() is, as long as it's always the same.
> >
> > So, I don't think this should change behavior of the sunrpc code at all.
> 
> You wish. That's clearly wrong because that code is not guaranteed to
> always run in a context which belongs to the root time namespace.

Argh, right, thanks.

> AFAICT, this stuff can run in softirq context which is context stealing
> and the interrupted task can belong to a different time name space.

Some of it runs in the context of a process doing IO to proc, some from
kthreads.  So, anyway, yes, it's not consistent in the way we'd need.

> In fact the whole thing can be simplified. You can just use time in
> nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> read the clocksource and advances once per tick and does not contain a
> divison and is definitely faster than seconds_since_boot()
> 
> The expiry time is initialized via get_expiry() which does:
> 
>    getboottime64(&boot);
>    return rv - boot.tv_sec; 
> 
> The expiry value 'rv' which is read out of the buffer is wall time in
> seconds. So all you need is are function which convert real to boottime
> and vice versa. That's trivial to implement and again faster than
> getboottime64(). Something like this:
> 
> ktime_t ktime_real_to_boot(ktime_t real)
> {
>         struct timekeeper *tk = &tk_core.timekeeper;
>         ktime_t mono = ktime_sub(real, tk->offs_real);
>               
>         return ktime_add(mono, tk->offs_boot);
> }
> 
> so the above becomes:
> 
>    return ktime_real_to_boot(rv * NSEC_PER_SEC);
> 
> which is still faster than a division.
> 
> The nanoseconds value after converting back to wall clock will need a
> division to get seconds since the epoch, but that's not an issue because
> that backward conversion already has one today.
> 
> You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> add's '1' to the expiry value and some janitoring of function names and
> variable types, but no real big surgery AFAICT.

I'll give it a shot.

Thanks so much for taking a careful look at this.

--b.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-12 21:13         ` J. Bruce Fields
@ 2020-10-12 21:36           ` Thomas Gleixner
  2020-10-14 22:05           ` J. Bruce Fields
  1 sibling, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-10-12 21:36 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christian Brauner, Michael Weiß,
	Andrei Vagin, Dmitry Safonov, linux-kernel, Chuck Lever,
	Trond Myklebust, Anna Schumaker, Arnd Bergmann

On Mon, Oct 12 2020 at 17:13, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
>> You wish. That's clearly wrong because that code is not guaranteed to
>> always run in a context which belongs to the root time namespace.
>
> Argh, right, thanks.
>
>> AFAICT, this stuff can run in softirq context which is context stealing
>> and the interrupted task can belong to a different time name space.
>
> Some of it runs in the context of a process doing IO to proc, some from
> kthreads.  So, anyway, yes, it's not consistent in the way we'd need.

Yes, that'll do it. If the process is in a time namespace then it's
definitely incorrect vs. the kthread which is in the root namespace.

>> You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
>> add's '1' to the expiry value and some janitoring of function names and
>> variable types, but no real big surgery AFAICT.
>
> I'll give it a shot.
>
> Thanks so much for taking a careful look at this.

Welcome. I was just looking at the use case. The code and especially
Arnds comment were odd enogh to make me look deeper. Such constructs are
usually showing shortcomings of the core interfaces. Seventh sense which
I gained over the past decades. :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-10  7:19     ` Andrei Vagin
  2020-10-10 11:50       ` Michael Weiß
@ 2020-10-13 20:05       ` Christian Brauner
  1 sibling, 0 replies; 17+ messages in thread
From: Christian Brauner @ 2020-10-13 20:05 UTC (permalink / raw)
  To: Andrei Vagin
  Cc: Michael Weiß,
	Thomas Gleixner, Dmitry Safonov, linux-kernel, J. Bruce Fields,
	Chuck Lever, Trond Myklebust, Anna Schumaker

On Sat, Oct 10, 2020 at 12:19:14AM -0700, Andrei Vagin wrote:
> On Fri, Oct 09, 2020 at 03:28:15PM +0200, Christian Brauner wrote:
> > On Thu, Oct 08, 2020 at 07:39:42AM +0200, Michael Weiß wrote:
> > > getboottime64() provides the time stamp of system boot. In case of
> > > time namespaces, the offset to the boot time stamp was not applied
> > > earlier. However, getboottime64 is used e.g., in /proc/stat to print
> > > the system boot time to userspace. In container runtimes which utilize
> > > time namespaces to virtualize boottime of a container, this leaks
> > > information about the host system boot time.
> > > 
> > > Therefore, we make getboottime64() to respect the time namespace offset
> > > for boottime by subtracting the boottime offset.
> > > 
> > > Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
> > > ---
> > >  kernel/time/timekeeping.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> > > index 4c47f388a83f..67530cdb389e 100644
> > > --- a/kernel/time/timekeeping.c
> > > +++ b/kernel/time/timekeeping.c
> > > @@ -17,6 +17,7 @@
> > >  #include <linux/clocksource.h>
> > >  #include <linux/jiffies.h>
> > >  #include <linux/time.h>
> > > +#include <linux/time_namespace.h>
> > >  #include <linux/tick.h>
> > >  #include <linux/stop_machine.h>
> > >  #include <linux/pvclock_gtod.h>
> > > @@ -2154,6 +2155,8 @@ void getboottime64(struct timespec64 *ts)
> > >  {
> > >  	struct timekeeper *tk = &tk_core.timekeeper;
> > >  	ktime_t t = ktime_sub(tk->offs_real, tk->offs_boot);
> > > +	/* shift boot time stamp according to the timens offset */
> > > +	t = timens_ktime_to_host(CLOCK_BOOTTIME, t);
> > 
> > Note that getbootime64() is mostly used in net/sunrpc and I don't know
> > if this change has any security implications for them.
> 
> I would prefer to not patch kernel internal functions if they are used
> not only to expose time to the userspace.
> 
> I think when kernel developers sees the getboottime64 function, they
> will expect that it returns the real time of kernel boot. They will
> not expect that it is aware of time namespaces and a returned time will
> depend on a task in which context it will be called.
> 
> IMHO, as a minimum, we need to update the documentation for this function or
> even adjust a function name.
> 
> And I think we need to consider an option to not change getbootime64 and
> apply a timens offset right in the show_stat(fs/proc/stat.c) function.

This is why I asked about this since I assumed this would break
someone's use-case. :)

In any case, if I understand correctly then we want the same thing: just
change fs/proc/stat.c i.e. have a a specific helper that applies the
correct offset.

Christian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-12 21:13         ` J. Bruce Fields
  2020-10-12 21:36           ` Thomas Gleixner
@ 2020-10-14 22:05           ` J. Bruce Fields
  2020-10-16 12:37             ` Thomas Gleixner
  1 sibling, 1 reply; 17+ messages in thread
From: J. Bruce Fields @ 2020-10-14 22:05 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Christian Brauner, Michael Weiß,
	Andrei Vagin, Dmitry Safonov, linux-kernel, Chuck Lever,
	Trond Myklebust, Anna Schumaker, Arnd Bergmann

On Mon, Oct 12, 2020 at 05:13:08PM -0400, J. Bruce Fields wrote:
> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
> > In fact the whole thing can be simplified. You can just use time in
> > nanoseconds retrieved via ktime_get_coarse_boottime() which does not
> > read the clocksource and advances once per tick and does not contain a
> > divison and is definitely faster than seconds_since_boot()
> > 
> > The expiry time is initialized via get_expiry() which does:
> > 
> >    getboottime64(&boot);
> >    return rv - boot.tv_sec; 
> > 
> > The expiry value 'rv' which is read out of the buffer is wall time in
> > seconds. So all you need is are function which convert real to boottime
> > and vice versa. That's trivial to implement and again faster than
> > getboottime64(). Something like this:
> > 
> > ktime_t ktime_real_to_boot(ktime_t real)
> > {
> >         struct timekeeper *tk = &tk_core.timekeeper;
> >         ktime_t mono = ktime_sub(real, tk->offs_real);
> >               
> >         return ktime_add(mono, tk->offs_boot);
> > }
> > 
> > so the above becomes:
> > 
> >    return ktime_real_to_boot(rv * NSEC_PER_SEC);
> > 
> > which is still faster than a division.
> > 
> > The nanoseconds value after converting back to wall clock will need a
> > division to get seconds since the epoch, but that's not an issue because
> > that backward conversion already has one today.
> > 
> > You'd obviously need to fixup CACHE_NEW_EXPIRY and the other place which
> > add's '1' to the expiry value and some janitoring of function names and
> > variable types, but no real big surgery AFAICT.
> 
> I'll give it a shot.

I took your code above verbatim, but should I really be following the
example of ktime_mono_to_any()?  (With the seqlock, and maybe also the
more general prototype in case someone needs the other conversions.)--b.

commit eae2005cb432
Author: J. Bruce Fields <bfields@redhat.com>
Date:   Wed Oct 14 10:19:59 2020 -0400

    sunrpc: simplify cache expiry times
    
    We've been rolling our own conversion between wallclock and boot time in
    get_expiry() and convert_to_wallclock().  Thomas Gleixner suggests it
    would be simpler to use nanoseconds since boot stored in time_t
    internally, and create common helpers for the conversion in
    kernel/time/timekeeping.c.
    
    Signed-off-by: J. Bruce Fields <bfields@redhat.com>

diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 10891b70fc7b..10868e74d6e2 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -45,8 +45,8 @@
  */
 struct cache_head {
 	struct hlist_node	cache_list;
-	time64_t	expiry_time;	/* After time time, don't use the data */
-	time64_t	last_refresh;   /* If CACHE_PENDING, this is when upcall was
+	ktime_t		expiry_time;	/* After time time, don't use the data */
+	ktime_t		last_refresh;   /* If CACHE_PENDING, this is when upcall was
 					 * sent, else this is when update was
 					 * received, though it is alway set to
 					 * be *after* ->flush_time.
@@ -59,7 +59,8 @@ struct cache_head {
 #define	CACHE_PENDING	2	/* An upcall has been sent but no reply received yet*/
 #define	CACHE_CLEANED	3	/* Entry has been cleaned from cache */
 
-#define	CACHE_NEW_EXPIRY 120	/* keep new things pending confirmation for 120 seconds */
+#define	CACHE_NEW_EXPIRY (120*NSEC_PER_SEC)
+		/* keep new things pending confirmation for 120 seconds */
 
 struct cache_detail {
 	struct module *		owner;
@@ -102,7 +103,7 @@ struct cache_detail {
 							 * than this.
 							 */
 	struct list_head	others;
-	time64_t		nextcheck;
+	ktime_t			nextcheck;
 	int			entries;
 
 	/* fields for communication over channel */
@@ -159,11 +160,9 @@ static inline time64_t seconds_since_boot(void)
 	return ktime_get_real_seconds() - boot.tv_sec;
 }
 
-static inline time64_t convert_to_wallclock(time64_t sinceboot)
+static inline time64_t convert_to_wallclock(ktime_t sinceboot)
 {
-	struct timespec64 boot;
-	getboottime64(&boot);
-	return boot.tv_sec + sinceboot;
+	return ktime_boot_to_real(sinceboot) / NSEC_PER_SEC;
 }
 
 extern const struct file_operations cache_file_operations_pipefs;
@@ -298,17 +297,15 @@ static inline int get_time(char **bpp, time64_t *time)
 	return 0;
 }
 
-static inline time64_t get_expiry(char **bpp)
+static inline ktime_t get_expiry(char **bpp)
 {
 	time64_t rv;
-	struct timespec64 boot;
 
 	if (get_time(bpp, &rv))
 		return 0;
 	if (rv < 0)
 		return 0;
-	getboottime64(&boot);
-	return rv - boot.tv_sec;
+	return ktime_real_to_boot(rv * NSEC_PER_SEC);
 }
 
 #endif /*  _LINUX_SUNRPC_CACHE_H_ */
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index d5471d6fa778..07f2b8436e48 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -68,6 +68,8 @@ extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
+extern ktime_t ktime_real_to_boot(ktime_t real);
+extern ktime_t ktime_boot_to_real(ktime_t real);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
 
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4c47f388a83f..52d24ffc229f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -851,6 +851,28 @@ ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs)
 }
 EXPORT_SYMBOL_GPL(ktime_mono_to_any);
 
+/**
+ * ktime_real_to_boot() - convert real time to boot time
+ * @real:	time to conver.
+ */
+ktime_t ktime_real_to_boot(ktime_t real)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	ktime_t mono = ktime_sub(real, tk->offs_real);
+
+	return ktime_add(mono, tk->offs_boot);
+}
+EXPORT_SYMBOL_GPL(ktime_real_to_boot);
+
+ktime_t ktime_boot_to_real(ktime_t boot)
+{
+	struct timekeeper *tk = &tk_core.timekeeper;
+	ktime_t mono = ktime_sub(boot, tk->offs_boot);
+
+	return ktime_add(mono, tk->offs_real);
+}
+EXPORT_SYMBOL_GPL(ktime_boot_to_real);
+
 /**
  * ktime_get_raw - Returns the raw monotonic time in ktime_t format
  */
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 2990a7ab9e2a..4041a483d5b2 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -42,7 +42,7 @@ static void cache_revisit_request(struct cache_head *item);
 
 static void cache_init(struct cache_head *h, struct cache_detail *detail)
 {
-	time64_t now = seconds_since_boot();
+	ktime_t now = ktime_get_coarse_boottime();
 	INIT_HLIST_NODE(&h->cache_list);
 	h->flags = 0;
 	kref_init(&h->ref);
@@ -156,10 +156,10 @@ EXPORT_SYMBOL_GPL(sunrpc_cache_lookup_rcu);
 
 static void cache_dequeue(struct cache_detail *detail, struct cache_head *ch);
 
-static void cache_fresh_locked(struct cache_head *head, time64_t expiry,
+static void cache_fresh_locked(struct cache_head *head, ktime_t expiry,
 			       struct cache_detail *detail)
 {
-	time64_t now = seconds_since_boot();
+	ktime_t now = ktime_get_coarse_boottime();
 	if (now <= detail->flush_time)
 		/* ensure it isn't immediately treated as expired */
 		now = detail->flush_time + 1;
@@ -297,7 +297,7 @@ int cache_check(struct cache_detail *detail,
 		    struct cache_head *h, struct cache_req *rqstp)
 {
 	int rv;
-	time64_t refresh_age, age;
+	ktime_t refresh_age, age;
 
 	/* First decide return status as best we can */
 	rv = cache_is_valid(h);
@@ -470,7 +470,8 @@ static int cache_clean(void)
 		head = &current_detail->hash_table[current_index];
 		hlist_for_each_entry_safe(ch, tmp, head, cache_list) {
 			if (current_detail->nextcheck > ch->expiry_time)
-				current_detail->nextcheck = ch->expiry_time+1;
+				current_detail->nextcheck = ch->expiry_time
+					+ NSEC_PER_SEC;
 			if (!cache_is_expired(current_detail, ch))
 				continue;
 
@@ -1515,7 +1516,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 {
 	char tbuf[20];
 	char *ep;
-	time64_t now;
+	ktime_t now;
 
 	if (*ppos || count > sizeof(tbuf)-1)
 		return -EINVAL;
@@ -1530,10 +1531,10 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
 	 * Making use of the number leads to races.
 	 */
 
-	now = seconds_since_boot();
+	now = ktime_get_coarse_boottime();
 	/* Always flush everything, so behave like cache_purge()
 	 * Do this by advancing flush_time to the current time,
-	 * or by one second if it has already reached the current time.
+	 * or by one if it has already reached the current time.
 	 * Newly added cache entries will always have ->last_refresh greater
 	 * that ->flush_time, so they don't get flushed prematurely.
 	 */

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/4] time: make getboottime64 aware of time namespace
  2020-10-14 22:05           ` J. Bruce Fields
@ 2020-10-16 12:37             ` Thomas Gleixner
  0 siblings, 0 replies; 17+ messages in thread
From: Thomas Gleixner @ 2020-10-16 12:37 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christian Brauner, Michael Weiß,
	Andrei Vagin, Dmitry Safonov, linux-kernel, Chuck Lever,
	Trond Myklebust, Anna Schumaker, Arnd Bergmann

Bruce,

On Wed, Oct 14 2020 at 18:05, J. Bruce Fields wrote:
> On Mon, Oct 12, 2020 at 05:13:08PM -0400, J. Bruce Fields wrote:
>> On Fri, Oct 09, 2020 at 10:08:15PM +0200, Thomas Gleixner wrote:
>> I'll give it a shot.
>
> I took your code above verbatim, but should I really be following the
> example of ktime_mono_to_any()?  (With the seqlock, and maybe also the
> more general prototype in case someone needs the other conversions.)--b.

Having ktime_boot/real_to_any() makes sense.

Vs. the sequence count: In ktime_mono_to_any() it's required for 32bit,
but now that I look at it again it's pointless for 64bit.

For ktime_real_to_any() and ktime_boot_to_any() the sequence count is
obviously mandatory for 32bit, but it's also needed for 64bit, because it
needs to apply two offsets.

So
      read();
      sub(offs_real)
      set_timeofday()
      add(offs_boot)
      consume_result()

is not any different from:

      read();
      sub(offs_real)
      add(offs_boot)
      set_timeofday()
      consume_result()

Same thing for suspend_resume() instead of set_timeofday()

The only case which needs the sequence count even on 64bit is:

      read();
      sub(offs_real)
      set_timeofday();
      suspend_resume();
      add(offs_boot)
      consume_result();

No matter what, the conversion is a best effort approach and only
accurate when the offsets did not change between the point where the
input time was acquired and the point of conversion. But that's not any
different to the approach you have so far in sunrpc.

Something like the untested below. Needs to be split in several patches
and gain the wrapper inlines.

Thanks,

        tglx
---
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -58,6 +58,7 @@ extern time64_t ktime_get_real_seconds(v
  */
 
 enum tk_offsets {
+	TK_OFFS_MONO,
 	TK_OFFS_REAL,
 	TK_OFFS_BOOT,
 	TK_OFFS_TAI,
@@ -68,6 +69,8 @@ extern ktime_t ktime_get(void);
 extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
+extern ktime_t ktime_boot_to_any(ktime_t tboot, enum tk_offsets offs);
+extern ktime_t ktime_real_to_any(ktime_t treal, enum tk_offsets offs);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
 
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -855,7 +855,10 @@ u32 ktime_get_resolution_ns(void)
 }
 EXPORT_SYMBOL_GPL(ktime_get_resolution_ns);
 
-static ktime_t *offsets[TK_OFFS_MAX] = {
+static const ktime_t zero_offset;
+
+static const ktime_t * const offsets[TK_OFFS_MAX] = {
+	[TK_OFFS_MONO]	= &zero_offset,
 	[TK_OFFS_REAL]	= &tk_core.timekeeper.offs_real,
 	[TK_OFFS_BOOT]	= &tk_core.timekeeper.offs_boot,
 	[TK_OFFS_TAI]	= &tk_core.timekeeper.offs_tai,
@@ -864,8 +867,9 @@ static ktime_t *offsets[TK_OFFS_MAX] = {
 ktime_t ktime_get_with_offset(enum tk_offsets offs)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
+	const ktime_t *offset = offsets[offs];
 	unsigned int seq;
-	ktime_t base, *offset = offsets[offs];
+	ktime_t base;
 	u64 nsecs;
 
 	WARN_ON(timekeeping_suspended);
@@ -885,8 +889,9 @@ EXPORT_SYMBOL_GPL(ktime_get_with_offset)
 ktime_t ktime_get_coarse_with_offset(enum tk_offsets offs)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
+	const ktime_t *offset = offsets[offs];
 	unsigned int seq;
-	ktime_t base, *offset = offsets[offs];
+	ktime_t base;
 	u64 nsecs;
 
 	WARN_ON(timekeeping_suspended);
@@ -906,13 +911,26 @@ EXPORT_SYMBOL_GPL(ktime_get_coarse_with_
  * ktime_mono_to_any() - convert mononotic time to any other time
  * @tmono:	time to convert.
  * @offs:	which offset to use
+ *
+ * The conversion is a best effort approach and only correct when the
+ * offset of the selected target time has not changed since @tmono was
+ * read.
  */
 ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs)
 {
-	ktime_t *offset = offsets[offs];
+	const ktime_t *offset = offsets[offs];
 	unsigned int seq;
 	ktime_t tconv;
 
+	/*
+	 * 64bit can spare the sequence count. The read is safe
+	 * and checking the sequence count does not make it
+	 * more consistent.
+	 */
+	if (IS_ENABLED(CONFIG_64BIT))
+		return ktime_add(tmono, *offset);
+
+	/* 32bit requires it to access the 64bit offset safely */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		tconv = ktime_add(tmono, *offset);
@@ -922,6 +940,56 @@ ktime_t ktime_mono_to_any(ktime_t tmono,
 }
 EXPORT_SYMBOL_GPL(ktime_mono_to_any);
 
+static ktime_t ktime_to_any(ktime_t tfrom, ktime_t *from_offset,
+			    enum tk_offsets to_offs)
+{
+	const ktime_t *to_offset = offsets[to_offs];
+	unsigned int seq;
+	ktime_t tconv;
+
+	/*
+	 * Use the sequence count even on 64bit to keep at least the two
+	 * offsets consistent.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		tconv = ktime_sub(tfrom, *from_offset);
+		tconv = ktime_add(tconv, *to_offset);
+	} while (read_seqcount_retry(&tk_core.seq, seq));
+
+	return tconv;
+}
+
+/**
+ * ktime_boot_to_any() - convert boot time to any other time
+ * @tboot:	time to convert.
+ * @offs:	which offset to use
+ *
+ * The conversion is a best effort approach and only correct when the boot
+ * time offset and the offset of the selected target time has not changed
+ * since @tboot was read.
+ */
+ktime_t ktime_boot_to_any(ktime_t tboot, enum tk_offsets offs)
+{
+	return ktime_to_any(tboot, &tk_core.timekeeper.offs_boot, offs);
+}
+EXPORT_SYMBOL_GPL(ktime_boot_to_any);
+
+/**
+ * ktime_real_to_any() - convert real time to any other time
+ * @treal:	time to convert.
+ * @offs:	which offset to use
+ *
+ * The conversion is a best effort approach and only correct when the real
+ * time offset and the offset of the selected target time has not changed
+ * since @treal was read.
+ */
+ktime_t ktime_real_to_any(ktime_t treal, enum tk_offsets offs)
+{
+	return ktime_to_any(treal, &tk_core.timekeeper.offs_real, offs);
+}
+EXPORT_SYMBOL_GPL(ktime_real_to_any);
+
 /**
  * ktime_get_raw - Returns the raw monotonic time in ktime_t format
  */

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-10-16 12:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-08  5:39 [PATCH v2 0/4] time namespace aware system boot time Michael Weiß
2020-10-08  5:39 ` [PATCH v2 1/4] timens: additional helper function to add boottime in nsec Michael Weiß
2020-10-08  5:39 ` [PATCH v2 2/4] time: make getboottime64 aware of time namespace Michael Weiß
2020-10-09 13:28   ` Christian Brauner
2020-10-09 13:55     ` J. Bruce Fields
2020-10-09 20:08       ` Thomas Gleixner
2020-10-12 21:13         ` J. Bruce Fields
2020-10-12 21:36           ` Thomas Gleixner
2020-10-14 22:05           ` J. Bruce Fields
2020-10-16 12:37             ` Thomas Gleixner
2020-10-10  7:19     ` Andrei Vagin
2020-10-10 11:50       ` Michael Weiß
2020-10-10 16:37         ` Thomas Gleixner
2020-10-13 20:05       ` Christian Brauner
2020-10-08  5:39 ` [PATCH v2 3/4] fs/proc: apply timens offset for start_boottime of processes Michael Weiß
2020-10-08  5:39 ` [PATCH v2 4/4] selftests/timens: added selftest for /proc/stat btime Michael Weiß
2020-10-09 13:21 ` [PATCH v2 0/4] time namespace aware system boot time Christian Brauner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.