All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] timers/nohz: Fixes and cleanups v3
@ 2023-02-22 14:46 Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

Try to (partially) fix the issue reported in https://lore.kernel.org/lkml/20230128020051.2328465-1-liaoyu15@huawei.com/

Changes since v2:

* Keep the monotonicity check against /proc/uptime first field
* Add monotonicity test between clock_gettime(CLOCK_BOOTTIME) and
  /proc/uptime

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
	timers/core-v2

HEAD: 45034144d493f62516285d7ec8bcd6408653d056

Thanks,
	Frederic
---

Frederic Weisbecker (8):
      timers/nohz: Restructure and reshuffle struct tick_sched
      timers/nohz: Only ever update sleeptime from idle exit
      timers/nohz: Protect idle/iowait sleep time under seqcount
      timers/nohz: Add a comment about broken iowait counter update race
      timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
      MAINTAINERS: Remove stale email address
      selftests/proc: Remove idle time monotonicity assertions
      selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity


 MAINTAINERS                                    |   2 +-
 kernel/time/tick-sched.c                       | 135 ++++++++++++-------------
 kernel/time/tick-sched.h                       |  67 +++++++-----
 tools/testing/selftests/proc/proc-uptime-001.c |  25 +++--
 tools/testing/selftests/proc/proc-uptime-002.c |  27 +++--
 tools/testing/selftests/proc/proc-uptime.h     |  28 ++---
 6 files changed, 159 insertions(+), 125 deletions(-)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 2/8] timers/nohz: Only ever update sleeptime from idle exit Frederic Weisbecker
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

Restructure and group fields by access in order to optimize cache
layout. While at it, also add missing kernel doc for two fields:
@last_jiffies and @idle_expires.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.h | 66 +++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 504649513399..c6663254d17d 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -22,65 +22,81 @@ enum tick_nohz_mode {
 
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer:	hrtimer to schedule the periodic tick in high
- *			resolution mode
- * @check_clocks:	Notification mechanism about clocksource changes
- * @nohz_mode:		Mode - one state of tick_nohz_mode
+ *
  * @inidle:		Indicator that the CPU is in the tick idle mode
  * @tick_stopped:	Indicator that the idle tick has been stopped
  * @idle_active:	Indicator that the CPU is actively in the tick idle mode;
  *			it is reset during irq handling phases.
- * @do_timer_lst:	CPU was the last one doing do_timer before going idle
+ * @do_timer_last:	CPU was the last one doing do_timer before going idle
  * @got_idle_tick:	Tick timer function has run with @inidle set
+ * @stalled_jiffies:	Number of stalled jiffies detected across ticks
+ * @last_tick_jiffies:	Value of jiffies seen on last tick
+ * @sched_timer:	hrtimer to schedule the periodic tick in high
+ *			resolution mode
  * @last_tick:		Store the last tick expiry time when the tick
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
  * @next_tick:		Next tick to be fired when in dynticks mode.
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
+ * @idle_waketime:	Time when the idle was interrupted
+ * @idle_entrytime:	Time when the idle call was entered
+ * @nohz_mode:		Mode - one state of tick_nohz_mode
+ * @last_jiffies:	Base jiffies snapshot when next event was last computed
+ * @timer_expires_base:	Base time clock monotonic for @timer_expires
+ * @timer_expires:	Anticipated timer expiration time (in case sched tick is stopped)
+ * @next_timer:		Expiry time of next expiring timer for debugging purpose only
+ * @idle_expires:	Next tick in idle, for debugging purpose only
  * @idle_calls:		Total number of idle calls
  * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
- * @idle_entrytime:	Time when the idle call was entered
- * @idle_waketime:	Time when the idle was interrupted
  * @idle_exittime:	Time when the idle state was left
  * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @timer_expires:	Anticipated timer expiration time (in case sched tick is stopped)
- * @timer_expires_base:	Base time clock monotonic for @timer_expires
- * @next_timer:		Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask:	Tick dependency mask - is set, if someone needs the tick
- * @last_tick_jiffies:	Value of jiffies seen on last tick
- * @stalled_jiffies:	Number of stalled jiffies detected across ticks
+ * @check_clocks:	Notification mechanism about clocksource changes
  */
 struct tick_sched {
-	struct hrtimer			sched_timer;
-	unsigned long			check_clocks;
-	enum tick_nohz_mode		nohz_mode;
-
+	/* Common flags */
 	unsigned int			inidle		: 1;
 	unsigned int			tick_stopped	: 1;
 	unsigned int			idle_active	: 1;
 	unsigned int			do_timer_last	: 1;
 	unsigned int			got_idle_tick	: 1;
 
+	/* Tick handling: jiffies stall check */
+	unsigned int			stalled_jiffies;
+	unsigned long			last_tick_jiffies;
+
+	/* Tick handling */
+	struct hrtimer			sched_timer;
 	ktime_t				last_tick;
 	ktime_t				next_tick;
 	unsigned long			idle_jiffies;
-	unsigned long			idle_calls;
-	unsigned long			idle_sleeps;
-	ktime_t				idle_entrytime;
 	ktime_t				idle_waketime;
-	ktime_t				idle_exittime;
-	ktime_t				idle_sleeptime;
-	ktime_t				iowait_sleeptime;
+
+	/* Idle entry */
+	ktime_t				idle_entrytime;
+
+	/* Tick stop */
+	enum tick_nohz_mode		nohz_mode;
 	unsigned long			last_jiffies;
-	u64				timer_expires;
 	u64				timer_expires_base;
+	u64				timer_expires;
 	u64				next_timer;
 	ktime_t				idle_expires;
+	unsigned long			idle_calls;
+	unsigned long			idle_sleeps;
+
+	/* Idle exit */
+	ktime_t				idle_exittime;
+	ktime_t				idle_sleeptime;
+	ktime_t				iowait_sleeptime;
+
+	/* Full dynticks handling */
 	atomic_t			tick_dep_mask;
-	unsigned long			last_tick_jiffies;
-	unsigned int			stalled_jiffies;
+
+	/* Clocksource changes */
+	unsigned long			check_clocks;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/8] timers/nohz: Only ever update sleeptime from idle exit
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 3/8] timers/nohz: Protect idle/iowait sleep time under seqcount Frederic Weisbecker
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

The idle and io sleeptime statistics appearing in /proc/stat can be
currently updated from two sites: locally on idle exit and remotely
by cpufreq. However there is no synchronization mechanism protecting
concurrent updates. It is therefore possible to account the sleeptime
twice, among all the possible broken scenarios.

To prevent from breaking the sleeptime accounting source, restrict the
sleeptime updates to the local idle exit site. If there is a delta to
add since the last update, IO/Idle sleep time readers will now only
compute the delta without actually writing it back to the internal idle
statistic fields.

This fixes a writer VS writer race. Note there are still two known
reader VS writer races to handle. A subsequent patch will fix one.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 103 ++++++++++++++++-----------------------
 1 file changed, 41 insertions(+), 62 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c9205946..9058b9eb8bc1 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -637,31 +637,21 @@ static void tick_nohz_update_jiffies(ktime_t now)
 	touch_softlockup_watchdog_sched();
 }
 
-/*
- * Updates the per-CPU time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
-{
-	ktime_t delta;
-
-	if (ts->idle_active) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		if (nr_iowait_cpu(cpu) > 0)
-			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
-		else
-			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		ts->idle_entrytime = now;
-	}
-
-	if (last_update_time)
-		*last_update_time = ktime_to_us(now);
-
-}
-
 static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
-	update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+	ktime_t delta;
+
+	if (WARN_ON_ONCE(!ts->idle_active))
+		return;
+
+	delta = ktime_sub(now, ts->idle_entrytime);
+
+	if (nr_iowait_cpu(smp_processor_id()) > 0)
+		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+	else
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
+
+	ts->idle_entrytime = now;
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event();
@@ -674,6 +664,30 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 	sched_clock_idle_sleep_event();
 }
 
+static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
+				 bool compute_delta, u64 *last_update_time)
+{
+	ktime_t now, idle;
+
+	if (!tick_nohz_active)
+		return -1;
+
+	now = ktime_get();
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
+
+	if (ts->idle_active && compute_delta) {
+		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+		idle = ktime_add(*sleeptime, delta);
+	} else {
+		idle = *sleeptime;
+	}
+
+	return ktime_to_us(idle);
+
+}
+
 /**
  * get_cpu_idle_time_us - get the total idle time of a CPU
  * @cpu: CPU number to query
@@ -691,27 +705,9 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, idle;
-
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		idle = ts->idle_sleeptime;
-	} else {
-		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			idle = ktime_add(ts->idle_sleeptime, delta);
-		} else {
-			idle = ts->idle_sleeptime;
-		}
-	}
-
-	return ktime_to_us(idle);
 
+	return get_cpu_sleep_time_us(ts, &ts->idle_sleeptime,
+				     !nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
@@ -732,26 +728,9 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, iowait;
 
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		iowait = ts->iowait_sleeptime;
-	} else {
-		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			iowait = ktime_add(ts->iowait_sleeptime, delta);
-		} else {
-			iowait = ts->iowait_sleeptime;
-		}
-	}
-
-	return ktime_to_us(iowait);
+	return get_cpu_sleep_time_us(ts, &ts->iowait_sleeptime,
+				     nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/8] timers/nohz: Protect idle/iowait sleep time under seqcount
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 2/8] timers/nohz: Only ever update sleeptime from idle exit Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race Frederic Weisbecker
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

Reading idle/io sleep time (eg: from /proc/stat) can race with idle exit
updates because the state machine handling the stats is not atomic and
requires a coherent read batch.

As a result reading the sleep time may report irrelevant or backward
values.

Fix this with protecting the simple state machine within a seqcount.
This is expected to be cheap enough not to add measurable performance
impact on the idle path.

Note this only fixes reader VS writer condition partitially. A race
remains that involves remote updates of the CPU iowait task counter. It
can hardly be fixed.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 22 ++++++++++++++++------
 kernel/time/tick-sched.h |  1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9058b9eb8bc1..90d9b7b29875 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -646,6 +646,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	delta = ktime_sub(now, ts->idle_entrytime);
 
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 	else
@@ -653,14 +654,18 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	ts->idle_entrytime = now;
 	ts->idle_active = 0;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
 
 	sched_clock_idle_wakeup_event();
 }
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	ts->idle_active = 1;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
+
 	sched_clock_idle_sleep_event();
 }
 
@@ -668,6 +673,7 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 				 bool compute_delta, u64 *last_update_time)
 {
 	ktime_t now, idle;
+	unsigned int seq;
 
 	if (!tick_nohz_active)
 		return -1;
@@ -676,13 +682,17 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 	if (last_update_time)
 		*last_update_time = ktime_to_us(now);
 
-	if (ts->idle_active && compute_delta) {
-		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	do {
+		seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
 
-		idle = ktime_add(*sleeptime, delta);
-	} else {
-		idle = *sleeptime;
-	}
+		if (ts->idle_active && compute_delta) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+			idle = ktime_add(*sleeptime, delta);
+		} else {
+			idle = *sleeptime;
+		}
+	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
 	return ktime_to_us(idle);
 
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index c6663254d17d..5ed5a9d41d5a 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -75,6 +75,7 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 
 	/* Idle entry */
+	seqcount_t			idle_sleeptime_seq;
 	ktime_t				idle_entrytime;
 
 	/* Tick stop */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
                   ` (2 preceding siblings ...)
  2023-02-22 14:46 ` [PATCH 3/8] timers/nohz: Protect idle/iowait sleep time under seqcount Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 5/8] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() Frederic Weisbecker
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken to (and by) another CPU, the counter may
then be decremented remotely. This is the source of a race involving
readers VS writer of idle/iowait sleeptime.

The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
later eventually gets accounted as non-IO.

    CPU 0                       CPU  1                    CPU 2
    -----                       -----                     ------
    //io_schedule() TASK A
    current->in_iowait = 1
    rq(0)->nr_iowait++
    //switch to idle
                        // READ /proc/stat
                        // See nr_iowait_cpu(0) == 1
                        return ts->iowait_sleeptime +
                               ktime_sub(ktime_get(), ts->idle_entrytime)

                                                          //try_to_wake_up(TASK A)
                                                          rq(0)->nr_iowait--
    //idle exit
    // See nr_iowait_cpu(0) == 0
    ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

As a result subsequent reads on /proc/stat may expose backward progress.

This is unfortunately hardly fixable. Just add a comment about that
condition.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 90d9b7b29875..edd6e9f26d16 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -705,7 +705,10 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -728,7 +731,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/8] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
                   ` (3 preceding siblings ...)
  2023-02-22 14:46 ` [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 6/8] MAINTAINERS: Remove stale email address Frederic Weisbecker
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

There is no need for the __tick_nohz_idle_stop_tick() function between
tick_nohz_idle_stop_tick() and its implementation. Remove that
unnecessary step.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/tick-sched.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index edd6e9f26d16..3b53b894ca98 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1079,10 +1079,16 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	return true;
 }
 
-static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
+/**
+ * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ */
+void tick_nohz_idle_stop_tick(void)
 {
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+	int cpu = smp_processor_id();
 	ktime_t expires;
-	int cpu = smp_processor_id();
 
 	/*
 	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
@@ -1114,16 +1120,6 @@ static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 	}
 }
 
-/**
- * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- */
-void tick_nohz_idle_stop_tick(void)
-{
-	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
-}
-
 void tick_nohz_idle_retain_tick(void)
 {
 	tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/8] MAINTAINERS: Remove stale email address
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
                   ` (4 preceding siblings ...)
  2023-02-22 14:46 ` [PATCH 5/8] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 7/8] selftests/proc: Remove idle time monotonicity assertions Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity Frederic Weisbecker
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index fb1471cb5ed3..300ca61fa0bc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14794,7 +14794,7 @@ F:	include/uapi/linux/nitro_enclaves.h
 F:	samples/nitro_enclaves/
 
 NOHZ, DYNTICKS SUPPORT
-M:	Frederic Weisbecker <fweisbec@gmail.com>
+M:	Frederic Weisbecker <frederic@kernel.org>
 M:	Thomas Gleixner <tglx@linutronix.de>
 M:	Ingo Molnar <mingo@kernel.org>
 L:	linux-kernel@vger.kernel.org
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 7/8] selftests/proc: Remove idle time monotonicity assertions
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
                   ` (5 preceding siblings ...)
  2023-02-22 14:46 ` [PATCH 6/8] MAINTAINERS: Remove stale email address Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  2023-02-22 14:46 ` [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity Frederic Weisbecker
  7 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

Due to broken iowait task counting design (cf: comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide
the guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.

Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 tools/testing/selftests/proc/proc-uptime-001.c | 12 ++++++------
 tools/testing/selftests/proc/proc-uptime-002.c | 13 ++++++-------
 tools/testing/selftests/proc/proc-uptime.h     | 16 ++--------------
 3 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/proc/proc-uptime-001.c b/tools/testing/selftests/proc/proc-uptime-001.c
index 781f7a50fc3f..35bddd9dd60b 100644
--- a/tools/testing/selftests/proc/proc-uptime-001.c
+++ b/tools/testing/selftests/proc/proc-uptime-001.c
@@ -13,7 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically.
+// Test that boottime value in /proc/uptime increments monotonically.
+// We don't test idle time monotonicity due to broken iowait task
+// counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <stdint.h>
@@ -25,20 +27,18 @@
 
 int main(void)
 {
-	uint64_t start, u0, u1, i0, i1;
+	uint64_t start, u0, u1;
 	int fd;
 
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
 	start = u0;
 	do {
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
 		assert(u1 >= u0);
-		assert(i1 >= i0);
 		u0 = u1;
-		i0 = i1;
 	} while (u1 - start < 100);
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c
index 7d0aa22bdc12..7ad79d5eaa84 100644
--- a/tools/testing/selftests/proc/proc-uptime-002.c
+++ b/tools/testing/selftests/proc/proc-uptime-002.c
@@ -13,8 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically
-// while shifting across CPUs.
+// Test that boottime value in /proc/uptime increments monotonically
+// while shifting across CPUs. We don't test idle time monotonicity
+// due to broken iowait task counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <errno.h>
@@ -45,7 +46,7 @@ int main(void)
 	unsigned int len;
 	unsigned long *m;
 	unsigned int cpu;
-	uint64_t u0, u1, i0, i1;
+	uint64_t u0, u1;
 	int fd;
 
 	/* find out "nr_cpu_ids" */
@@ -60,7 +61,7 @@ int main(void)
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
 	for (cpu = 0; cpu < len * 8; cpu++) {
 		memset(m, 0, len);
 		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
@@ -68,11 +69,9 @@ int main(void)
 		/* CPU might not exist, ignore error */
 		sys_sched_setaffinity(0, len, m);
 
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
 		assert(u1 >= u0);
-		assert(i1 >= i0);
 		u0 = u1;
-		i0 = i1;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime.h b/tools/testing/selftests/proc/proc-uptime.h
index dc6a42b1d6b0..ca55abeb0ccc 100644
--- a/tools/testing/selftests/proc/proc-uptime.h
+++ b/tools/testing/selftests/proc/proc-uptime.h
@@ -22,7 +22,7 @@
 
 #include "proc.h"
 
-static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
+static uint64_t proc_uptime(int fd)
 {
 	uint64_t val1, val2;
 	char buf[64], *p;
@@ -43,18 +43,6 @@ static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
 	assert(p[3] == ' ');
 
 	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*uptime = val1 * 100 + val2;
 
-	p += 4;
-
-	val1 = xstrtoull(p, &p);
-	assert(p[0] == '.');
-	assert('0' <= p[1] && p[1] <= '9');
-	assert('0' <= p[2] && p[2] <= '9');
-	assert(p[3] == '\n');
-
-	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*idle = val1 * 100 + val2;
-
-	assert(p + 4 == buf + rv);
+	return val1 * 100 + val2;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
                   ` (6 preceding siblings ...)
  2023-02-22 14:46 ` [PATCH 7/8] selftests/proc: Remove idle time monotonicity assertions Frederic Weisbecker
@ 2023-02-22 14:46 ` Frederic Weisbecker
  2023-03-08 15:59   ` Mirsad Todorovac
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  7 siblings, 2 replies; 20+ messages in thread
From: Frederic Weisbecker @ 2023-02-22 14:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Frederic Weisbecker, Alexey Dobriyan, Wei Li,
	Peter Zijlstra, Mirsad Goran Todorovac, Yu Liao, Hillf Danton,
	Ingo Molnar

The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
can also be fetched from clock_gettime() API.

Improve the test coverage while verifying the monotonicity of
CLOCK_BOOTTIME accross both interfaces.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Wei Li <liwei391@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 .../testing/selftests/proc/proc-uptime-001.c  | 21 ++++++++++++++----
 .../testing/selftests/proc/proc-uptime-002.c  | 22 +++++++++++++++----
 tools/testing/selftests/proc/proc-uptime.h    | 12 ++++++++++
 3 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/proc/proc-uptime-001.c b/tools/testing/selftests/proc/proc-uptime-001.c
index 35bddd9dd60b..f335eec5067e 100644
--- a/tools/testing/selftests/proc/proc-uptime-001.c
+++ b/tools/testing/selftests/proc/proc-uptime-001.c
@@ -13,9 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that boottime value in /proc/uptime increments monotonically.
-// We don't test idle time monotonicity due to broken iowait task
-// counting, cf: comment above get_cpu_idle_time_us()
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically. We don't test idle time monotonicity due to broken iowait
+// task counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <stdint.h>
@@ -27,7 +27,7 @@
 
 int main(void)
 {
-	uint64_t start, u0, u1;
+	uint64_t start, u0, u1, c0, c1;
 	int fd;
 
 	fd = open("/proc/uptime", O_RDONLY);
@@ -35,10 +35,23 @@ int main(void)
 
 	u0 = proc_uptime(fd);
 	start = u0;
+	c0 = clock_boottime();
+
 	do {
 		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
+		c0 = c1;
 	} while (u1 - start < 100);
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c
index 7ad79d5eaa84..ae453daa96c1 100644
--- a/tools/testing/selftests/proc/proc-uptime-002.c
+++ b/tools/testing/selftests/proc/proc-uptime-002.c
@@ -13,9 +13,10 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that boottime value in /proc/uptime increments monotonically
-// while shifting across CPUs. We don't test idle time monotonicity
-// due to broken iowait task counting, cf: comment above get_cpu_idle_time_us()
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically while shifting across CPUs. We don't test idle time
+// monotonicity due to broken iowait task counting, cf: comment above
+// get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <errno.h>
@@ -43,10 +44,10 @@ static inline int sys_sched_setaffinity(pid_t pid, unsigned int len, unsigned lo
 
 int main(void)
 {
+	uint64_t u0, u1, c0, c1;
 	unsigned int len;
 	unsigned long *m;
 	unsigned int cpu;
-	uint64_t u0, u1;
 	int fd;
 
 	/* find out "nr_cpu_ids" */
@@ -62,6 +63,8 @@ int main(void)
 	assert(fd >= 0);
 
 	u0 = proc_uptime(fd);
+	c0 = clock_boottime();
+
 	for (cpu = 0; cpu < len * 8; cpu++) {
 		memset(m, 0, len);
 		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
@@ -70,8 +73,19 @@ int main(void)
 		sys_sched_setaffinity(0, len, m);
 
 		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
+		c0 = c1;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime.h b/tools/testing/selftests/proc/proc-uptime.h
index ca55abeb0ccc..730cce4a3d73 100644
--- a/tools/testing/selftests/proc/proc-uptime.h
+++ b/tools/testing/selftests/proc/proc-uptime.h
@@ -19,9 +19,21 @@
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <time.h>
 
 #include "proc.h"
 
+static uint64_t clock_boottime(void)
+{
+	struct timespec ts;
+	int err;
+
+	err = clock_gettime(CLOCK_BOOTTIME, &ts);
+	assert(err >= 0);
+
+	return (ts.tv_sec * 100) + (ts.tv_nsec / 10000000);
+}
+
 static uint64_t proc_uptime(int fd)
 {
 	uint64_t val1, val2;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  2023-02-22 14:46 ` [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity Frederic Weisbecker
@ 2023-03-08 15:59   ` Mirsad Todorovac
  2023-03-21 12:44     ` Frederic Weisbecker
  2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
  1 sibling, 1 reply; 20+ messages in thread
From: Mirsad Todorovac @ 2023-03-08 15:59 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner
  Cc: LKML, Alexey Dobriyan, Wei Li, Peter Zijlstra, Yu Liao,
	Hillf Danton, Ingo Molnar

Hi Frederic,

On 2/22/23 15:46, Frederic Weisbecker wrote:
> The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
> can also be fetched from clock_gettime() API.
> 
> Improve the test coverage while verifying the monotonicity of
> CLOCK_BOOTTIME accross both interfaces.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Yu Liao <liaoyu15@huawei.com>
> Cc: Hillf Danton <hdanton@sina.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Wei Li <liwei391@huawei.com>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Cc: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>   .../testing/selftests/proc/proc-uptime-001.c  | 21 ++++++++++++++----
>   .../testing/selftests/proc/proc-uptime-002.c  | 22 +++++++++++++++----
>   tools/testing/selftests/proc/proc-uptime.h    | 12 ++++++++++
>   3 files changed, 47 insertions(+), 8 deletions(-)
> 
> diff --git a/tools/testing/selftests/proc/proc-uptime-001.c b/tools/testing/selftests/proc/proc-uptime-001.c
> index 35bddd9dd60b..f335eec5067e 100644
> --- a/tools/testing/selftests/proc/proc-uptime-001.c
> +++ b/tools/testing/selftests/proc/proc-uptime-001.c
> @@ -13,9 +13,9 @@
>    * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>    * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>    */
> -// Test that boottime value in /proc/uptime increments monotonically.
> -// We don't test idle time monotonicity due to broken iowait task
> -// counting, cf: comment above get_cpu_idle_time_us()
> +// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
> +// monotonically. We don't test idle time monotonicity due to broken iowait
> +// task counting, cf: comment above get_cpu_idle_time_us()
>   #undef NDEBUG
>   #include <assert.h>
>   #include <stdint.h>
> @@ -27,7 +27,7 @@
>   
>   int main(void)
>   {
> -	uint64_t start, u0, u1;
> +	uint64_t start, u0, u1, c0, c1;
>   	int fd;
>   
>   	fd = open("/proc/uptime", O_RDONLY);
> @@ -35,10 +35,23 @@ int main(void)
>   
>   	u0 = proc_uptime(fd);
>   	start = u0;
> +	c0 = clock_boottime();
> +
>   	do {
>   		u1 = proc_uptime(fd);
> +		c1 = clock_boottime();
> +
> +		/* Is /proc/uptime monotonic ? */
>   		assert(u1 >= u0);
> +
> +		/* Is CLOCK_BOOTTIME monotonic ? */
> +		assert(c1 >= c0);
> +
> +		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
> +		assert(c0 >= u0);
> +
>   		u0 = u1;
> +		c0 = c1;
>   	} while (u1 - start < 100);
>   
>   	return 0;
> diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c
> index 7ad79d5eaa84..ae453daa96c1 100644
> --- a/tools/testing/selftests/proc/proc-uptime-002.c
> +++ b/tools/testing/selftests/proc/proc-uptime-002.c
> @@ -13,9 +13,10 @@
>    * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
>    * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
>    */
> -// Test that boottime value in /proc/uptime increments monotonically
> -// while shifting across CPUs. We don't test idle time monotonicity
> -// due to broken iowait task counting, cf: comment above get_cpu_idle_time_us()
> +// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
> +// monotonically while shifting across CPUs. We don't test idle time
> +// monotonicity due to broken iowait task counting, cf: comment above
> +// get_cpu_idle_time_us()
>   #undef NDEBUG
>   #include <assert.h>
>   #include <errno.h>
> @@ -43,10 +44,10 @@ static inline int sys_sched_setaffinity(pid_t pid, unsigned int len, unsigned lo
>   
>   int main(void)
>   {
> +	uint64_t u0, u1, c0, c1;
>   	unsigned int len;
>   	unsigned long *m;
>   	unsigned int cpu;
> -	uint64_t u0, u1;
>   	int fd;
>   
>   	/* find out "nr_cpu_ids" */
> @@ -62,6 +63,8 @@ int main(void)
>   	assert(fd >= 0);
>   
>   	u0 = proc_uptime(fd);
> +	c0 = clock_boottime();
> +
>   	for (cpu = 0; cpu < len * 8; cpu++) {
>   		memset(m, 0, len);
>   		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
> @@ -70,8 +73,19 @@ int main(void)
>   		sys_sched_setaffinity(0, len, m);
>   
>   		u1 = proc_uptime(fd);
> +		c1 = clock_boottime();
> +
> +		/* Is /proc/uptime monotonic ? */
>   		assert(u1 >= u0);
> +
> +		/* Is CLOCK_BOOTTIME monotonic ? */
> +		assert(c1 >= c0);
> +
> +		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
> +		assert(c0 >= u0);
> +
>   		u0 = u1;
> +		c0 = c1;
>   	}
>   
>   	return 0;
> diff --git a/tools/testing/selftests/proc/proc-uptime.h b/tools/testing/selftests/proc/proc-uptime.h
> index ca55abeb0ccc..730cce4a3d73 100644
> --- a/tools/testing/selftests/proc/proc-uptime.h
> +++ b/tools/testing/selftests/proc/proc-uptime.h
> @@ -19,9 +19,21 @@
>   #include <string.h>
>   #include <stdlib.h>
>   #include <unistd.h>
> +#include <time.h>
>   
>   #include "proc.h"
>   
> +static uint64_t clock_boottime(void)
> +{
> +	struct timespec ts;
> +	int err;
> +
> +	err = clock_gettime(CLOCK_BOOTTIME, &ts);
> +	assert(err >= 0);
> +
> +	return (ts.tv_sec * 100) + (ts.tv_nsec / 10000000);
> +}
> +
>   static uint64_t proc_uptime(int fd)
>   {
>   	uint64_t val1, val2;

 From what I see, you round the CLOCK_BOOTIME time to 1/100ths of a second.

A simple program that queries clock_getres() on system clocks gives this
result:

clock_res [CLOCK_REALTIME] = 0.000000001s
clock_res [CLOCK_REALTIME_COARSE] = 0.004000000s
clock_res [CLOCK_MONOTONIC] = 0.000000001s
clock_res [CLOCK_MONOTONIC_COARSE] = 0.004000000s
clock_res [CLOCK_MONOTONIC_RAW] = 0.000000001s
clock_res [CLOCK_BOOTTIME] = 0.000000001s
clock_res [CLOCK_PROCESS_CPUTIME_ID] = 0.000000001s
clock_res [CLOCK_THREAD_CPUTIME_ID] = 0.000000001s

A number of programs may depend i.e. on CLOCK_REALTIME or CLOCK_BOOTIME to give
different result each nanosecond.

I came across this when generating nonces for HMACs according to recommendations
from RFC 4086 "Randomness Requirements for Security".

If the value of CLOCK_BOOTTIME or CLOCK_REALTIME is incremented not in what
clock_getres() gives, but at best in 1/100th of second instead, that would seriously
weaken our security (for as you know, in many cryptographic uses nonces need not
be random, but MUST NOT ever repeat nor go backwards).

Could we modify the test for this assumption, or is the assumption wrong?

Here the test for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID
increasing monotonically with guaranteed increased value of nanoseconds
would also seem good.

Maybe this is already covered in another test case, but it seems that all
clocks should be guaranteed to be monotonically increasing, and increased
at least by one nanosecond with each syscall, or many algorithms would break.

In other words, CLOCK_BOOTTIME should be tested to increase monotonically in
the resolution given by clock_getres (CLOCK_BOOTTIME, &tv_res), not in 1/100ths
of second (IMHO).

Am I wrong in my assumption?

Thank you,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  2023-03-08 15:59   ` Mirsad Todorovac
@ 2023-03-21 12:44     ` Frederic Weisbecker
  2023-03-26 20:03       ` Mirsad Goran Todorovac
  0 siblings, 1 reply; 20+ messages in thread
From: Frederic Weisbecker @ 2023-03-21 12:44 UTC (permalink / raw)
  To: Mirsad Todorovac
  Cc: Thomas Gleixner, LKML, Alexey Dobriyan, Wei Li, Peter Zijlstra,
	Yu Liao, Hillf Danton, Ingo Molnar

On Wed, Mar 08, 2023 at 04:59:41PM +0100, Mirsad Todorovac wrote:
> On 2/22/23 15:46, Frederic Weisbecker wrote:
> From what I see, you round the CLOCK_BOOTIME time to 1/100ths of a second.
> 
> A simple program that queries clock_getres() on system clocks gives this
> result:
> 
> clock_res [CLOCK_REALTIME] = 0.000000001s
> clock_res [CLOCK_REALTIME_COARSE] = 0.004000000s
> clock_res [CLOCK_MONOTONIC] = 0.000000001s
> clock_res [CLOCK_MONOTONIC_COARSE] = 0.004000000s
> clock_res [CLOCK_MONOTONIC_RAW] = 0.000000001s
> clock_res [CLOCK_BOOTTIME] = 0.000000001s
> clock_res [CLOCK_PROCESS_CPUTIME_ID] = 0.000000001s
> clock_res [CLOCK_THREAD_CPUTIME_ID] = 0.000000001s
> 
> A number of programs may depend i.e. on CLOCK_REALTIME or CLOCK_BOOTIME to give
> different result each nanosecond.
> 
> I came across this when generating nonces for HMACs according to recommendations
> from RFC 4086 "Randomness Requirements for Security".
> 
> If the value of CLOCK_BOOTTIME or CLOCK_REALTIME is incremented not in what
> clock_getres() gives, but at best in 1/100th of second instead, that would seriously
> weaken our security (for as you know, in many cryptographic uses nonces need not
> be random, but MUST NOT ever repeat nor go backwards).
> 
> Could we modify the test for this assumption, or is the assumption wrong?
> 
> Here the test for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID
> increasing monotonically with guaranteed increased value of nanoseconds
> would also seem good.
> 
> Maybe this is already covered in another test case, but it seems that all
> clocks should be guaranteed to be monotonically increasing, and increased
> at least by one nanosecond with each syscall, or many algorithms would break.
> 
> In other words, CLOCK_BOOTTIME should be tested to increase monotonically in
> the resolution given by clock_getres (CLOCK_BOOTTIME, &tv_res), not in 1/100ths
> of second (IMHO).

Maybe but verifying a clock against its own resolution is another testcase. Here the
point is to verify that CLOCK_BOOTTIME is monotonic against /proc/uptime, and
since /proc/uptime has an 1/100 second resolution, rounding clock_gettime(CLOCK_BOOTTIME)
result down to that is the best we can do.

Thanks.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  2023-03-21 12:44     ` Frederic Weisbecker
@ 2023-03-26 20:03       ` Mirsad Goran Todorovac
  0 siblings, 0 replies; 20+ messages in thread
From: Mirsad Goran Todorovac @ 2023-03-26 20:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Thomas Gleixner, LKML, Alexey Dobriyan, Wei Li, Peter Zijlstra,
	Yu Liao, Hillf Danton, Ingo Molnar

On 21. 03. 2023. 13:44, Frederic Weisbecker wrote:
> On Wed, Mar 08, 2023 at 04:59:41PM +0100, Mirsad Todorovac wrote:
>> On 2/22/23 15:46, Frederic Weisbecker wrote:
>> From what I see, you round the CLOCK_BOOTIME time to 1/100ths of a second.
>>
>> A simple program that queries clock_getres() on system clocks gives this
>> result:
>>
>> clock_res [CLOCK_REALTIME] = 0.000000001s
>> clock_res [CLOCK_REALTIME_COARSE] = 0.004000000s
>> clock_res [CLOCK_MONOTONIC] = 0.000000001s
>> clock_res [CLOCK_MONOTONIC_COARSE] = 0.004000000s
>> clock_res [CLOCK_MONOTONIC_RAW] = 0.000000001s
>> clock_res [CLOCK_BOOTTIME] = 0.000000001s
>> clock_res [CLOCK_PROCESS_CPUTIME_ID] = 0.000000001s
>> clock_res [CLOCK_THREAD_CPUTIME_ID] = 0.000000001s
>>
>> A number of programs may depend i.e. on CLOCK_REALTIME or CLOCK_BOOTIME to give
>> different result each nanosecond.
>>
>> I came across this when generating nonces for HMACs according to recommendations
>> from RFC 4086 "Randomness Requirements for Security".
>>
>> If the value of CLOCK_BOOTTIME or CLOCK_REALTIME is incremented not in what
>> clock_getres() gives, but at best in 1/100th of second instead, that would seriously
>> weaken our security (for as you know, in many cryptographic uses nonces need not
>> be random, but MUST NOT ever repeat nor go backwards).
>>
>> Could we modify the test for this assumption, or is the assumption wrong?
>>
>> Here the test for CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID
>> increasing monotonically with guaranteed increased value of nanoseconds
>> would also seem good.
>>
>> Maybe this is already covered in another test case, but it seems that all
>> clocks should be guaranteed to be monotonically increasing, and increased
>> at least by one nanosecond with each syscall, or many algorithms would break.
>>
>> In other words, CLOCK_BOOTTIME should be tested to increase monotonically in
>> the resolution given by clock_getres (CLOCK_BOOTTIME, &tv_res), not in 1/100ths
>> of second (IMHO).
> 
> Maybe but verifying a clock against its own resolution is another testcase. Here the
> point is to verify that CLOCK_BOOTTIME is monotonic against /proc/uptime, and
> since /proc/uptime has an 1/100 second resolution, rounding clock_gettime(CLOCK_BOOTTIME)
> result down to that is the best we can do.
> 
> Thanks.

Hi Frederic,

Thank you for explaining that.

I've read somewhere (forgot the link) that clock_gettime(CLOCK_*) clocks should
be guaranteed to return at least a nanosecond increased value for a PID or TID
from call to call.

Returning the same value would break some algorithms that depend on monotonous
increase of time - for example, some naive implementations of nonce generators.

I believe this is worth assuring in tests, or possibly some naive crypto
would reveal its pre-shared secrets in consecutive calls (Please see RFC 4086,
"Randomness Requirements for Security" for greater detail in explanation.

Best regards,
Mirsad

-- 
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
 
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [tip: timers/core] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  2023-02-22 14:46 ` [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity Frederic Weisbecker
  2023-03-08 15:59   ` Mirsad Todorovac
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Thomas Gleixner, Frederic Weisbecker, x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     263dda24fff0957f6b0a9abde2809122f7f0fad8
Gitweb:        https://git.kernel.org/tip/263dda24fff0957f6b0a9abde2809122f7f0fad8
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:49 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:13 +02:00

selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity

The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
can also be fetched from clock_gettime() API.

Improve the test coverage while verifying the monotonicity of
CLOCK_BOOTTIME accross both interfaces.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-9-frederic@kernel.org

---
 tools/testing/selftests/proc/proc-uptime-001.c | 21 ++++++++++++----
 tools/testing/selftests/proc/proc-uptime-002.c | 22 +++++++++++++----
 tools/testing/selftests/proc/proc-uptime.h     | 12 +++++++++-
 3 files changed, 47 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/proc/proc-uptime-001.c b/tools/testing/selftests/proc/proc-uptime-001.c
index 35bddd9..f335eec 100644
--- a/tools/testing/selftests/proc/proc-uptime-001.c
+++ b/tools/testing/selftests/proc/proc-uptime-001.c
@@ -13,9 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that boottime value in /proc/uptime increments monotonically.
-// We don't test idle time monotonicity due to broken iowait task
-// counting, cf: comment above get_cpu_idle_time_us()
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically. We don't test idle time monotonicity due to broken iowait
+// task counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <stdint.h>
@@ -27,7 +27,7 @@
 
 int main(void)
 {
-	uint64_t start, u0, u1;
+	uint64_t start, u0, u1, c0, c1;
 	int fd;
 
 	fd = open("/proc/uptime", O_RDONLY);
@@ -35,10 +35,23 @@ int main(void)
 
 	u0 = proc_uptime(fd);
 	start = u0;
+	c0 = clock_boottime();
+
 	do {
 		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
+		c0 = c1;
 	} while (u1 - start < 100);
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c
index 7ad79d5..ae453da 100644
--- a/tools/testing/selftests/proc/proc-uptime-002.c
+++ b/tools/testing/selftests/proc/proc-uptime-002.c
@@ -13,9 +13,10 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that boottime value in /proc/uptime increments monotonically
-// while shifting across CPUs. We don't test idle time monotonicity
-// due to broken iowait task counting, cf: comment above get_cpu_idle_time_us()
+// Test that boottime value in /proc/uptime and CLOCK_BOOTTIME increment
+// monotonically while shifting across CPUs. We don't test idle time
+// monotonicity due to broken iowait task counting, cf: comment above
+// get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <errno.h>
@@ -43,10 +44,10 @@ static inline int sys_sched_setaffinity(pid_t pid, unsigned int len, unsigned lo
 
 int main(void)
 {
+	uint64_t u0, u1, c0, c1;
 	unsigned int len;
 	unsigned long *m;
 	unsigned int cpu;
-	uint64_t u0, u1;
 	int fd;
 
 	/* find out "nr_cpu_ids" */
@@ -62,6 +63,8 @@ int main(void)
 	assert(fd >= 0);
 
 	u0 = proc_uptime(fd);
+	c0 = clock_boottime();
+
 	for (cpu = 0; cpu < len * 8; cpu++) {
 		memset(m, 0, len);
 		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
@@ -70,8 +73,19 @@ int main(void)
 		sys_sched_setaffinity(0, len, m);
 
 		u1 = proc_uptime(fd);
+		c1 = clock_boottime();
+
+		/* Is /proc/uptime monotonic ? */
 		assert(u1 >= u0);
+
+		/* Is CLOCK_BOOTTIME monotonic ? */
+		assert(c1 >= c0);
+
+		/* Is CLOCK_BOOTTIME VS /proc/uptime monotonic ? */
+		assert(c0 >= u0);
+
 		u0 = u1;
+		c0 = c1;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime.h b/tools/testing/selftests/proc/proc-uptime.h
index ca55abe..730cce4 100644
--- a/tools/testing/selftests/proc/proc-uptime.h
+++ b/tools/testing/selftests/proc/proc-uptime.h
@@ -19,9 +19,21 @@
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>
+#include <time.h>
 
 #include "proc.h"
 
+static uint64_t clock_boottime(void)
+{
+	struct timespec ts;
+	int err;
+
+	err = clock_gettime(CLOCK_BOOTTIME, &ts);
+	assert(err >= 0);
+
+	return (ts.tv_sec * 100) + (ts.tv_nsec / 10000000);
+}
+
 static uint64_t proc_uptime(int fd)
 {
 	uint64_t val1, val2;

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] selftests/proc: Remove idle time monotonicity assertions
  2023-02-22 14:46 ` [PATCH 7/8] selftests/proc: Remove idle time monotonicity assertions Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yu Liao, Thomas Gleixner, Frederic Weisbecker, x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     270b2a679ea7be1ee2d84ea67751fb1ab15bcb20
Gitweb:        https://git.kernel.org/tip/270b2a679ea7be1ee2d84ea67751fb1ab15bcb20
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:48 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:13 +02:00

selftests/proc: Remove idle time monotonicity assertions

Due to broken iowait task counting design (cf: comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide
the guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.

Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-8-frederic@kernel.org

---
 tools/testing/selftests/proc/proc-uptime-001.c | 12 ++++++------
 tools/testing/selftests/proc/proc-uptime-002.c | 13 ++++++-------
 tools/testing/selftests/proc/proc-uptime.h     | 16 ++--------------
 3 files changed, 14 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/proc/proc-uptime-001.c b/tools/testing/selftests/proc/proc-uptime-001.c
index 781f7a5..35bddd9 100644
--- a/tools/testing/selftests/proc/proc-uptime-001.c
+++ b/tools/testing/selftests/proc/proc-uptime-001.c
@@ -13,7 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically.
+// Test that boottime value in /proc/uptime increments monotonically.
+// We don't test idle time monotonicity due to broken iowait task
+// counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <stdint.h>
@@ -25,20 +27,18 @@
 
 int main(void)
 {
-	uint64_t start, u0, u1, i0, i1;
+	uint64_t start, u0, u1;
 	int fd;
 
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
 	start = u0;
 	do {
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
 		assert(u1 >= u0);
-		assert(i1 >= i0);
 		u0 = u1;
-		i0 = i1;
 	} while (u1 - start < 100);
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime-002.c b/tools/testing/selftests/proc/proc-uptime-002.c
index 7d0aa22..7ad79d5 100644
--- a/tools/testing/selftests/proc/proc-uptime-002.c
+++ b/tools/testing/selftests/proc/proc-uptime-002.c
@@ -13,8 +13,9 @@
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
-// Test that values in /proc/uptime increment monotonically
-// while shifting across CPUs.
+// Test that boottime value in /proc/uptime increments monotonically
+// while shifting across CPUs. We don't test idle time monotonicity
+// due to broken iowait task counting, cf: comment above get_cpu_idle_time_us()
 #undef NDEBUG
 #include <assert.h>
 #include <errno.h>
@@ -45,7 +46,7 @@ int main(void)
 	unsigned int len;
 	unsigned long *m;
 	unsigned int cpu;
-	uint64_t u0, u1, i0, i1;
+	uint64_t u0, u1;
 	int fd;
 
 	/* find out "nr_cpu_ids" */
@@ -60,7 +61,7 @@ int main(void)
 	fd = open("/proc/uptime", O_RDONLY);
 	assert(fd >= 0);
 
-	proc_uptime(fd, &u0, &i0);
+	u0 = proc_uptime(fd);
 	for (cpu = 0; cpu < len * 8; cpu++) {
 		memset(m, 0, len);
 		m[cpu / (8 * sizeof(unsigned long))] |= 1UL << (cpu % (8 * sizeof(unsigned long)));
@@ -68,11 +69,9 @@ int main(void)
 		/* CPU might not exist, ignore error */
 		sys_sched_setaffinity(0, len, m);
 
-		proc_uptime(fd, &u1, &i1);
+		u1 = proc_uptime(fd);
 		assert(u1 >= u0);
-		assert(i1 >= i0);
 		u0 = u1;
-		i0 = i1;
 	}
 
 	return 0;
diff --git a/tools/testing/selftests/proc/proc-uptime.h b/tools/testing/selftests/proc/proc-uptime.h
index dc6a42b..ca55abe 100644
--- a/tools/testing/selftests/proc/proc-uptime.h
+++ b/tools/testing/selftests/proc/proc-uptime.h
@@ -22,7 +22,7 @@
 
 #include "proc.h"
 
-static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
+static uint64_t proc_uptime(int fd)
 {
 	uint64_t val1, val2;
 	char buf[64], *p;
@@ -43,18 +43,6 @@ static void proc_uptime(int fd, uint64_t *uptime, uint64_t *idle)
 	assert(p[3] == ' ');
 
 	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*uptime = val1 * 100 + val2;
 
-	p += 4;
-
-	val1 = xstrtoull(p, &p);
-	assert(p[0] == '.');
-	assert('0' <= p[1] && p[1] <= '9');
-	assert('0' <= p[2] && p[2] <= '9');
-	assert(p[3] == '\n');
-
-	val2 = (p[1] - '0') * 10 + p[2] - '0';
-	*idle = val1 * 100 + val2;
-
-	assert(p + 4 == buf + rv);
+	return val1 * 100 + val2;
 }

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] MAINTAINERS: Remove stale email address
  2023-02-22 14:46 ` [PATCH 6/8] MAINTAINERS: Remove stale email address Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     9a1d4b8a7b80c5c796f05744851c535c5020024b
Gitweb:        https://git.kernel.org/tip/9a1d4b8a7b80c5c796f05744851c535c5020024b
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:47 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

MAINTAINERS: Remove stale email address

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-7-frederic@kernel.org

---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d8ebab5..ee7e011 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -14746,7 +14746,7 @@ F:	include/uapi/linux/nitro_enclaves.h
 F:	samples/nitro_enclaves/
 
 NOHZ, DYNTICKS SUPPORT
-M:	Frederic Weisbecker <fweisbec@gmail.com>
+M:	Frederic Weisbecker <frederic@kernel.org>
 M:	Thomas Gleixner <tglx@linutronix.de>
 M:	Ingo Molnar <mingo@kernel.org>
 L:	linux-kernel@vger.kernel.org

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
  2023-02-22 14:46 ` [PATCH 5/8] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     289dafed3851a7693a47896f5d09420bf6046ef2
Gitweb:        https://git.kernel.org/tip/289dafed3851a7693a47896f5d09420bf6046ef2
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:46 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()

There is no need for the __tick_nohz_idle_stop_tick() function between
tick_nohz_idle_stop_tick() and its implementation. Remove that
unnecessary step.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-6-frederic@kernel.org

---
 kernel/time/tick-sched.c | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index edd6e9f..3b53b89 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1079,10 +1079,16 @@ static bool can_stop_idle_tick(int cpu, struct tick_sched *ts)
 	return true;
 }
 
-static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
+/**
+ * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
+ *
+ * When the next event is more than a tick into the future, stop the idle tick
+ */
+void tick_nohz_idle_stop_tick(void)
 {
-	ktime_t expires;
+	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 	int cpu = smp_processor_id();
+	ktime_t expires;
 
 	/*
 	 * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
@@ -1114,16 +1120,6 @@ static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
 	}
 }
 
-/**
- * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
- *
- * When the next event is more than a tick into the future, stop the idle tick
- */
-void tick_nohz_idle_stop_tick(void)
-{
-	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
-}
-
 void tick_nohz_idle_retain_tick(void)
 {
 	tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] timers/nohz: Add a comment about broken iowait counter update race
  2023-02-22 14:46 ` [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     ead70b75237371c735a481a9843b411cfbb18404
Gitweb:        https://git.kernel.org/tip/ead70b75237371c735a481a9843b411cfbb18404
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:45 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

timers/nohz: Add a comment about broken iowait counter update race

The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken to (and by) another CPU, the counter may
then be decremented remotely. This is the source of a race involving
readers VS writer of idle/iowait sleeptime.

The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
later eventually gets accounted as non-IO.

    CPU 0                       CPU  1                    CPU 2
    -----                       -----                     ------
    //io_schedule() TASK A
    current->in_iowait = 1
    rq(0)->nr_iowait++
    //switch to idle
                        // READ /proc/stat
                        // See nr_iowait_cpu(0) == 1
                        return ts->iowait_sleeptime +
                               ktime_sub(ktime_get(), ts->idle_entrytime)

                                                          //try_to_wake_up(TASK A)
                                                          rq(0)->nr_iowait--
    //idle exit
    // See nr_iowait_cpu(0) == 0
    ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

As a result subsequent reads on /proc/stat may expose backward progress.

This is unfortunately hardly fixable. Just add a comment about that
condition.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-5-frederic@kernel.org

---
 kernel/time/tick-sched.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 90d9b7b..edd6e9f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -705,7 +705,10 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -728,7 +731,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] timers/nohz: Protect idle/iowait sleep time under seqcount
  2023-02-22 14:46 ` [PATCH 3/8] timers/nohz: Protect idle/iowait sleep time under seqcount Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yu Liao, Frederic Weisbecker, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     620a30fa0bd14878891b22bf2261e6ed4587c2b4
Gitweb:        https://git.kernel.org/tip/620a30fa0bd14878891b22bf2261e6ed4587c2b4
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:44 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

timers/nohz: Protect idle/iowait sleep time under seqcount

Reading idle/IO sleep time (eg: from /proc/stat) can race with idle exit
updates because the state machine handling the stats is not atomic and
requires a coherent read batch.

As a result reading the sleep time may report irrelevant or backward
values.

Fix this with protecting the simple state machine within a seqcount.
This is expected to be cheap enough not to add measurable performance
impact on the idle path.

Note this only fixes reader VS writer condition partitially. A race
remains that involves remote updates of the CPU iowait task counter. It
can hardly be fixed.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-4-frederic@kernel.org

---
 kernel/time/tick-sched.c | 22 ++++++++++++++++------
 kernel/time/tick-sched.h |  1 +
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9058b9e..90d9b7b 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -646,6 +646,7 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	delta = ktime_sub(now, ts->idle_entrytime);
 
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	if (nr_iowait_cpu(smp_processor_id()) > 0)
 		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
 	else
@@ -653,14 +654,18 @@ static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 
 	ts->idle_entrytime = now;
 	ts->idle_active = 0;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
 
 	sched_clock_idle_wakeup_event();
 }
 
 static void tick_nohz_start_idle(struct tick_sched *ts)
 {
+	write_seqcount_begin(&ts->idle_sleeptime_seq);
 	ts->idle_entrytime = ktime_get();
 	ts->idle_active = 1;
+	write_seqcount_end(&ts->idle_sleeptime_seq);
+
 	sched_clock_idle_sleep_event();
 }
 
@@ -668,6 +673,7 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 				 bool compute_delta, u64 *last_update_time)
 {
 	ktime_t now, idle;
+	unsigned int seq;
 
 	if (!tick_nohz_active)
 		return -1;
@@ -676,13 +682,17 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
 	if (last_update_time)
 		*last_update_time = ktime_to_us(now);
 
-	if (ts->idle_active && compute_delta) {
-		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+	do {
+		seq = read_seqcount_begin(&ts->idle_sleeptime_seq);
 
-		idle = ktime_add(*sleeptime, delta);
-	} else {
-		idle = *sleeptime;
-	}
+		if (ts->idle_active && compute_delta) {
+			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+			idle = ktime_add(*sleeptime, delta);
+		} else {
+			idle = *sleeptime;
+		}
+	} while (read_seqcount_retry(&ts->idle_sleeptime_seq, seq));
 
 	return ktime_to_us(idle);
 
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index c666325..5ed5a9d 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -75,6 +75,7 @@ struct tick_sched {
 	ktime_t				idle_waketime;
 
 	/* Idle entry */
+	seqcount_t			idle_sleeptime_seq;
 	ktime_t				idle_entrytime;
 
 	/* Tick stop */

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] timers/nohz: Restructure and reshuffle struct tick_sched
  2023-02-22 14:46 ` [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Thomas Gleixner, Frederic Weisbecker, Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     605da849d5982dee0527edb2488b79795f31a150
Gitweb:        https://git.kernel.org/tip/605da849d5982dee0527edb2488b79795f31a150
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:42 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

timers/nohz: Restructure and reshuffle struct tick_sched

Restructure and group fields by access in order to optimize cache
layout. While at it, also add missing kernel doc for two fields:
@last_jiffies and @idle_expires.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-2-frederic@kernel.org

---
 kernel/time/tick-sched.h | 66 ++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 5046495..c666325 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -22,65 +22,81 @@ enum tick_nohz_mode {
 
 /**
  * struct tick_sched - sched tick emulation and no idle tick control/stats
- * @sched_timer:	hrtimer to schedule the periodic tick in high
- *			resolution mode
- * @check_clocks:	Notification mechanism about clocksource changes
- * @nohz_mode:		Mode - one state of tick_nohz_mode
+ *
  * @inidle:		Indicator that the CPU is in the tick idle mode
  * @tick_stopped:	Indicator that the idle tick has been stopped
  * @idle_active:	Indicator that the CPU is actively in the tick idle mode;
  *			it is reset during irq handling phases.
- * @do_timer_lst:	CPU was the last one doing do_timer before going idle
+ * @do_timer_last:	CPU was the last one doing do_timer before going idle
  * @got_idle_tick:	Tick timer function has run with @inidle set
+ * @stalled_jiffies:	Number of stalled jiffies detected across ticks
+ * @last_tick_jiffies:	Value of jiffies seen on last tick
+ * @sched_timer:	hrtimer to schedule the periodic tick in high
+ *			resolution mode
  * @last_tick:		Store the last tick expiry time when the tick
  *			timer is modified for nohz sleeps. This is necessary
  *			to resume the tick timer operation in the timeline
  *			when the CPU returns from nohz sleep.
  * @next_tick:		Next tick to be fired when in dynticks mode.
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
+ * @idle_waketime:	Time when the idle was interrupted
+ * @idle_entrytime:	Time when the idle call was entered
+ * @nohz_mode:		Mode - one state of tick_nohz_mode
+ * @last_jiffies:	Base jiffies snapshot when next event was last computed
+ * @timer_expires_base:	Base time clock monotonic for @timer_expires
+ * @timer_expires:	Anticipated timer expiration time (in case sched tick is stopped)
+ * @next_timer:		Expiry time of next expiring timer for debugging purpose only
+ * @idle_expires:	Next tick in idle, for debugging purpose only
  * @idle_calls:		Total number of idle calls
  * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
- * @idle_entrytime:	Time when the idle call was entered
- * @idle_waketime:	Time when the idle was interrupted
  * @idle_exittime:	Time when the idle state was left
  * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @timer_expires:	Anticipated timer expiration time (in case sched tick is stopped)
- * @timer_expires_base:	Base time clock monotonic for @timer_expires
- * @next_timer:		Expiry time of next expiring timer for debugging purpose only
  * @tick_dep_mask:	Tick dependency mask - is set, if someone needs the tick
- * @last_tick_jiffies:	Value of jiffies seen on last tick
- * @stalled_jiffies:	Number of stalled jiffies detected across ticks
+ * @check_clocks:	Notification mechanism about clocksource changes
  */
 struct tick_sched {
-	struct hrtimer			sched_timer;
-	unsigned long			check_clocks;
-	enum tick_nohz_mode		nohz_mode;
-
+	/* Common flags */
 	unsigned int			inidle		: 1;
 	unsigned int			tick_stopped	: 1;
 	unsigned int			idle_active	: 1;
 	unsigned int			do_timer_last	: 1;
 	unsigned int			got_idle_tick	: 1;
 
+	/* Tick handling: jiffies stall check */
+	unsigned int			stalled_jiffies;
+	unsigned long			last_tick_jiffies;
+
+	/* Tick handling */
+	struct hrtimer			sched_timer;
 	ktime_t				last_tick;
 	ktime_t				next_tick;
 	unsigned long			idle_jiffies;
-	unsigned long			idle_calls;
-	unsigned long			idle_sleeps;
-	ktime_t				idle_entrytime;
 	ktime_t				idle_waketime;
-	ktime_t				idle_exittime;
-	ktime_t				idle_sleeptime;
-	ktime_t				iowait_sleeptime;
+
+	/* Idle entry */
+	ktime_t				idle_entrytime;
+
+	/* Tick stop */
+	enum tick_nohz_mode		nohz_mode;
 	unsigned long			last_jiffies;
-	u64				timer_expires;
 	u64				timer_expires_base;
+	u64				timer_expires;
 	u64				next_timer;
 	ktime_t				idle_expires;
+	unsigned long			idle_calls;
+	unsigned long			idle_sleeps;
+
+	/* Idle exit */
+	ktime_t				idle_exittime;
+	ktime_t				idle_sleeptime;
+	ktime_t				iowait_sleeptime;
+
+	/* Full dynticks handling */
 	atomic_t			tick_dep_mask;
-	unsigned long			last_tick_jiffies;
-	unsigned int			stalled_jiffies;
+
+	/* Clocksource changes */
+	unsigned long			check_clocks;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [tip: timers/core] timers/nohz: Only ever update sleeptime from idle exit
  2023-02-22 14:46 ` [PATCH 2/8] timers/nohz: Only ever update sleeptime from idle exit Frederic Weisbecker
@ 2023-04-18 14:53   ` tip-bot2 for Frederic Weisbecker
  0 siblings, 0 replies; 20+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2023-04-18 14:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Yu Liao, Frederic Weisbecker, Thomas Gleixner,
	Peter Zijlstra (Intel),
	x86, linux-kernel

The following commit has been merged into the timers/core branch of tip:

Commit-ID:     07b65a800b6d5b6afbd6a91487b47038eac97c21
Gitweb:        https://git.kernel.org/tip/07b65a800b6d5b6afbd6a91487b47038eac97c21
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Wed, 22 Feb 2023 15:46:43 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Tue, 18 Apr 2023 16:35:12 +02:00

timers/nohz: Only ever update sleeptime from idle exit

The idle and IO sleeptime statistics appearing in /proc/stat can be
currently updated from two sites: locally on idle exit and remotely
by cpufreq. However there is no synchronization mechanism protecting
concurrent updates. It is therefore possible to account the sleeptime
twice, among all the other possible broken scenarios.

To prevent from breaking the sleeptime accounting source, restrict the
sleeptime updates to the local idle exit site. If there is a delta to
add since the last update, IO/Idle sleep time readers will now only
compute the delta without actually writing it back to the internal idle
statistic fields.

This fixes a writer VS writer race. Note there are still two known
reader VS writer races to handle. A subsequent patch will fix one.

Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-3-frederic@kernel.org

---
 kernel/time/tick-sched.c | 95 +++++++++++++++------------------------
 1 file changed, 37 insertions(+), 58 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c92..9058b9e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -637,31 +637,21 @@ static void tick_nohz_update_jiffies(ktime_t now)
 	touch_softlockup_watchdog_sched();
 }
 
-/*
- * Updates the per-CPU time idle statistics counters
- */
-static void
-update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
 {
 	ktime_t delta;
 
-	if (ts->idle_active) {
-		delta = ktime_sub(now, ts->idle_entrytime);
-		if (nr_iowait_cpu(cpu) > 0)
-			ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
-		else
-			ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
-		ts->idle_entrytime = now;
-	}
+	if (WARN_ON_ONCE(!ts->idle_active))
+		return;
 
-	if (last_update_time)
-		*last_update_time = ktime_to_us(now);
+	delta = ktime_sub(now, ts->idle_entrytime);
 
-}
+	if (nr_iowait_cpu(smp_processor_id()) > 0)
+		ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+	else
+		ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
 
-static void tick_nohz_stop_idle(struct tick_sched *ts, ktime_t now)
-{
-	update_ts_time_stats(smp_processor_id(), ts, now, NULL);
+	ts->idle_entrytime = now;
 	ts->idle_active = 0;
 
 	sched_clock_idle_wakeup_event();
@@ -674,6 +664,30 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 	sched_clock_idle_sleep_event();
 }
 
+static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
+				 bool compute_delta, u64 *last_update_time)
+{
+	ktime_t now, idle;
+
+	if (!tick_nohz_active)
+		return -1;
+
+	now = ktime_get();
+	if (last_update_time)
+		*last_update_time = ktime_to_us(now);
+
+	if (ts->idle_active && compute_delta) {
+		ktime_t delta = ktime_sub(now, ts->idle_entrytime);
+
+		idle = ktime_add(*sleeptime, delta);
+	} else {
+		idle = *sleeptime;
+	}
+
+	return ktime_to_us(idle);
+
+}
+
 /**
  * get_cpu_idle_time_us - get the total idle time of a CPU
  * @cpu: CPU number to query
@@ -691,27 +705,9 @@ static void tick_nohz_start_idle(struct tick_sched *ts)
 u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, idle;
-
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		idle = ts->idle_sleeptime;
-	} else {
-		if (ts->idle_active && !nr_iowait_cpu(cpu)) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			idle = ktime_add(ts->idle_sleeptime, delta);
-		} else {
-			idle = ts->idle_sleeptime;
-		}
-	}
-
-	return ktime_to_us(idle);
 
+	return get_cpu_sleep_time_us(ts, &ts->idle_sleeptime,
+				     !nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 
@@ -732,26 +728,9 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
 u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
 {
 	struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
-	ktime_t now, iowait;
-
-	if (!tick_nohz_active)
-		return -1;
-
-	now = ktime_get();
-	if (last_update_time) {
-		update_ts_time_stats(cpu, ts, now, last_update_time);
-		iowait = ts->iowait_sleeptime;
-	} else {
-		if (ts->idle_active && nr_iowait_cpu(cpu) > 0) {
-			ktime_t delta = ktime_sub(now, ts->idle_entrytime);
-
-			iowait = ktime_add(ts->iowait_sleeptime, delta);
-		} else {
-			iowait = ts->iowait_sleeptime;
-		}
-	}
 
-	return ktime_to_us(iowait);
+	return get_cpu_sleep_time_us(ts, &ts->iowait_sleeptime,
+				     nr_iowait_cpu(cpu), last_update_time);
 }
 EXPORT_SYMBOL_GPL(get_cpu_iowait_time_us);
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-04-18 14:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-22 14:46 [PATCH 0/8] timers/nohz: Fixes and cleanups v3 Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 1/8] timers/nohz: Restructure and reshuffle struct tick_sched Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 2/8] timers/nohz: Only ever update sleeptime from idle exit Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 3/8] timers/nohz: Protect idle/iowait sleep time under seqcount Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 4/8] timers/nohz: Add a comment about broken iowait counter update race Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 5/8] timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick() Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 6/8] MAINTAINERS: Remove stale email address Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 7/8] selftests/proc: Remove idle time monotonicity assertions Frederic Weisbecker
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker
2023-02-22 14:46 ` [PATCH 8/8] selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity Frederic Weisbecker
2023-03-08 15:59   ` Mirsad Todorovac
2023-03-21 12:44     ` Frederic Weisbecker
2023-03-26 20:03       ` Mirsad Goran Todorovac
2023-04-18 14:53   ` [tip: timers/core] " tip-bot2 for Frederic Weisbecker

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.