linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] NOHZ updates for v4.6
@ 2016-03-14 12:32 Ingo Molnar
  2016-03-15  2:44 ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-14 12:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Frédéric Weisbecker, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

Linus,

Please pull the latest timers-nohz-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-nohz-for-linus

   # HEAD: 1f25184656a00a59e3a953189070d42a749f6aee Merge branch 'timers/core-v9' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/nohz

Please pull NOHZ enhancements, by Frederic Weisbecker, which reorganizes/refactors 
the NOHZ 'can the tick be stopped?' infrastructure and related code to be data 
driven, and harmonizes the naming and handling of all the various properties.

Querying it all is now a matter of:

  triton:~/tip> git grep TICK_DEP_ kernel/
  kernel/events/core.c:   tick_dep_clear_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
  kernel/events/core.c:           tick_nohz_dep_clear(TICK_DEP_BIT_PERF_EVENTS);
  kernel/events/core.c:                   tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
  kernel/events/core.c:           tick_nohz_dep_set(TICK_DEP_BIT_PERF_EVENTS);
  kernel/sched/clock.c:   tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
  kernel/sched/clock.c:   tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
  kernel/sched/sched.h:           tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
  kernel/sched/sched.h:           tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
  kernel/time/posix-cpu-timers.c:                 tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
  kernel/time/posix-cpu-timers.c:                 tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
  kernel/time/posix-cpu-timers.c:         tick_dep_clear_task(tsk, TICK_DEP_BIT_POSIX_TIMER);
  kernel/time/posix-cpu-timers.c: tick_dep_clear_signal(sig, TICK_DEP_BIT_POSIX_TIMER);
  kernel/time/posix-cpu-timers.c: tick_dep_set_signal(tsk->signal, TICK_DEP_BIT_POSIX_TIMER);

  kernel/time/tick-sched.c:       if (dep & TICK_DEP_MASK_POSIX_TIMER) {
  kernel/time/tick-sched.c:               trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
  kernel/time/tick-sched.c:       if (dep & TICK_DEP_MASK_PERF_EVENTS) {
  kernel/time/tick-sched.c:               trace_tick_stop(0, TICK_DEP_MASK_PERF_EVENTS);
  kernel/time/tick-sched.c:       if (dep & TICK_DEP_MASK_SCHED) {
  kernel/time/tick-sched.c:               trace_tick_stop(0, TICK_DEP_MASK_SCHED);
  kernel/time/tick-sched.c:       if (dep & TICK_DEP_MASK_CLOCK_UNSTABLE)
  kernel/time/tick-sched.c:               trace_tick_stop(0, TICK_DEP_MASK_CLOCK_UNSTABLE);
  kernel/time/tick-sched.c:               trace_tick_stop(1, TICK_DEP_MASK_NONE);

Thanks,

	Ingo

------------------>
Frederic Weisbecker (9):
      atomic: Export fetch_or()
      nohz: Implement wide kick on top of irq work
      nohz: New tick dependency mask
      nohz: Use enum code for tick stop failure tracing message
      perf: Migrate perf to use new tick dependency mask model
      sched: Account rr tasks
      sched: Migrate sched to use new tick dependency mask model
      posix-cpu-timers: Migrate to use new tick dependency mask model
      sched-clock: Migrate to use new tick dependency mask model


 include/linux/atomic.h         |  21 +++++
 include/linux/perf_event.h     |   6 --
 include/linux/posix-timers.h   |   3 -
 include/linux/sched.h          |  11 ++-
 include/linux/tick.h           |  97 ++++++++++++++++++++++-
 include/trace/events/timer.h   |  36 +++++++--
 kernel/events/core.c           |  65 +++++++++++----
 kernel/sched/clock.c           |   5 ++
 kernel/sched/core.c            |  49 +++++-------
 kernel/sched/rt.c              |  16 ++++
 kernel/sched/sched.h           |  48 +++++++----
 kernel/time/posix-cpu-timers.c |  52 +++---------
 kernel/time/tick-sched.c       | 175 ++++++++++++++++++++++++++++++++---------
 kernel/time/tick-sched.h       |   1 +
 14 files changed, 424 insertions(+), 161 deletions(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 301de78d65f7..6c502cb13c95 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -548,6 +548,27 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 }
 #endif
 
+/**
+ * fetch_or - perform *ptr |= mask and return old value of *ptr
+ * @ptr: pointer to value
+ * @mask: mask to OR on the value
+ *
+ * cmpxchg based fetch_or, macro so it works for different integer types
+ */
+#ifndef fetch_or
+#define fetch_or(ptr, mask)						\
+({	typeof(*(ptr)) __old, __val = *(ptr);				\
+	for (;;) {							\
+		__old = cmpxchg((ptr), __val, __val | (mask));		\
+		if (__old == __val)					\
+			break;						\
+		__val = __old;						\
+	}								\
+	__old;								\
+})
+#endif
+
+
 #ifdef CONFIG_GENERIC_ATOMIC64
 #include <asm-generic/atomic64.h>
 #endif
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5c5a3fa2c81..6e44efc19a6a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1109,12 +1109,6 @@ static inline void perf_event_task_tick(void)				{ }
 static inline int perf_event_release_kernel(struct perf_event *event)	{ return 0; }
 #endif
 
-#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL)
-extern bool perf_event_can_stop_tick(void);
-#else
-static inline bool perf_event_can_stop_tick(void)			{ return true; }
-#endif
-
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
 extern void perf_restore_debug_store(void);
 #else
diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h
index 907f3fd191ac..62d44c176071 100644
--- a/include/linux/posix-timers.h
+++ b/include/linux/posix-timers.h
@@ -128,9 +128,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer);
 void run_posix_cpu_timers(struct task_struct *task);
 void posix_cpu_timers_exit(struct task_struct *task);
 void posix_cpu_timers_exit_group(struct task_struct *task);
-
-bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk);
-
 void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx,
 			   cputime_t *newval, cputime_t *oldval);
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a10494a94cc3..2b10348806d8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -719,6 +719,10 @@ struct signal_struct {
 	/* Earliest-expiration cache. */
 	struct task_cputime cputime_expires;
 
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dep_mask;
+#endif
+
 	struct list_head cpu_timers[3];
 
 	struct pid *tty_old_pgrp;
@@ -1542,6 +1546,10 @@ struct task_struct {
 		VTIME_SYS,
 	} vtime_snap_whence;
 #endif
+
+#ifdef CONFIG_NO_HZ_FULL
+	unsigned long tick_dep_mask;
+#endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	u64 start_time;		/* monotonic time in nsec */
 	u64 real_start_time;	/* boot based time in nsec */
@@ -2356,10 +2364,7 @@ static inline void wake_up_nohz_cpu(int cpu) { }
 #endif
 
 #ifdef CONFIG_NO_HZ_FULL
-extern bool sched_can_stop_tick(void);
 extern u64 scheduler_tick_max_deferment(void);
-#else
-static inline bool sched_can_stop_tick(void) { return false; }
 #endif
 
 #ifdef CONFIG_SCHED_AUTOGROUP
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 97fd4e543846..21f73649a4dc 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -97,6 +97,19 @@ static inline void tick_broadcast_exit(void)
 	tick_broadcast_oneshot_control(TICK_BROADCAST_EXIT);
 }
 
+enum tick_dep_bits {
+	TICK_DEP_BIT_POSIX_TIMER	= 0,
+	TICK_DEP_BIT_PERF_EVENTS	= 1,
+	TICK_DEP_BIT_SCHED		= 2,
+	TICK_DEP_BIT_CLOCK_UNSTABLE	= 3
+};
+
+#define TICK_DEP_MASK_NONE		0
+#define TICK_DEP_MASK_POSIX_TIMER	(1 << TICK_DEP_BIT_POSIX_TIMER)
+#define TICK_DEP_MASK_PERF_EVENTS	(1 << TICK_DEP_BIT_PERF_EVENTS)
+#define TICK_DEP_MASK_SCHED		(1 << TICK_DEP_BIT_SCHED)
+#define TICK_DEP_MASK_CLOCK_UNSTABLE	(1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
+
 #ifdef CONFIG_NO_HZ_COMMON
 extern int tick_nohz_enabled;
 extern int tick_nohz_tick_stopped(void);
@@ -154,9 +167,73 @@ static inline int housekeeping_any_cpu(void)
 	return cpumask_any_and(housekeeping_mask, cpu_online_mask);
 }
 
-extern void tick_nohz_full_kick(void);
+extern void tick_nohz_dep_set(enum tick_dep_bits bit);
+extern void tick_nohz_dep_clear(enum tick_dep_bits bit);
+extern void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit);
+extern void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit);
+extern void tick_nohz_dep_set_task(struct task_struct *tsk,
+				   enum tick_dep_bits bit);
+extern void tick_nohz_dep_clear_task(struct task_struct *tsk,
+				     enum tick_dep_bits bit);
+extern void tick_nohz_dep_set_signal(struct signal_struct *signal,
+				     enum tick_dep_bits bit);
+extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
+				       enum tick_dep_bits bit);
+
+/*
+ * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases
+ * on top of static keys.
+ */
+static inline void tick_dep_set(enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_set(bit);
+}
+
+static inline void tick_dep_clear(enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_clear(bit);
+}
+
+static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_cpu(cpu))
+		tick_nohz_dep_set_cpu(cpu, bit);
+}
+
+static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_cpu(cpu))
+		tick_nohz_dep_clear_cpu(cpu, bit);
+}
+
+static inline void tick_dep_set_task(struct task_struct *tsk,
+				     enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_set_task(tsk, bit);
+}
+static inline void tick_dep_clear_task(struct task_struct *tsk,
+				       enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_clear_task(tsk, bit);
+}
+static inline void tick_dep_set_signal(struct signal_struct *signal,
+				       enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_set_signal(signal, bit);
+}
+static inline void tick_dep_clear_signal(struct signal_struct *signal,
+					 enum tick_dep_bits bit)
+{
+	if (tick_nohz_full_enabled())
+		tick_nohz_dep_clear_signal(signal, bit);
+}
+
 extern void tick_nohz_full_kick_cpu(int cpu);
-extern void tick_nohz_full_kick_all(void);
 extern void __tick_nohz_task_switch(void);
 #else
 static inline int housekeeping_any_cpu(void)
@@ -166,9 +243,21 @@ static inline int housekeeping_any_cpu(void)
 static inline bool tick_nohz_full_enabled(void) { return false; }
 static inline bool tick_nohz_full_cpu(int cpu) { return false; }
 static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+
+static inline void tick_dep_set(enum tick_dep_bits bit) { }
+static inline void tick_dep_clear(enum tick_dep_bits bit) { }
+static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
+static inline void tick_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
+static inline void tick_dep_set_task(struct task_struct *tsk,
+				     enum tick_dep_bits bit) { }
+static inline void tick_dep_clear_task(struct task_struct *tsk,
+				       enum tick_dep_bits bit) { }
+static inline void tick_dep_set_signal(struct signal_struct *signal,
+				       enum tick_dep_bits bit) { }
+static inline void tick_dep_clear_signal(struct signal_struct *signal,
+					 enum tick_dep_bits bit) { }
+
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
-static inline void tick_nohz_full_kick(void) { }
-static inline void tick_nohz_full_kick_all(void) { }
 static inline void __tick_nohz_task_switch(void) { }
 #endif
 
diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 073b9ac245ba..51440131d337 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -328,23 +328,49 @@ TRACE_EVENT(itimer_expire,
 );
 
 #ifdef CONFIG_NO_HZ_COMMON
+
+#define TICK_DEP_NAMES					\
+		tick_dep_name(NONE)			\
+		tick_dep_name(POSIX_TIMER)		\
+		tick_dep_name(PERF_EVENTS)		\
+		tick_dep_name(SCHED)			\
+		tick_dep_name_end(CLOCK_UNSTABLE)
+
+#undef tick_dep_name
+#undef tick_dep_name_end
+
+#define tick_dep_name(sdep) TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
+#define tick_dep_name_end(sdep)  TRACE_DEFINE_ENUM(TICK_DEP_MASK_##sdep);
+
+TICK_DEP_NAMES
+
+#undef tick_dep_name
+#undef tick_dep_name_end
+
+#define tick_dep_name(sdep) { TICK_DEP_MASK_##sdep, #sdep },
+#define tick_dep_name_end(sdep) { TICK_DEP_MASK_##sdep, #sdep }
+
+#define show_tick_dep_name(val)				\
+	__print_symbolic(val, TICK_DEP_NAMES)
+
 TRACE_EVENT(tick_stop,
 
-	TP_PROTO(int success, char *error_msg),
+	TP_PROTO(int success, int dependency),
 
-	TP_ARGS(success, error_msg),
+	TP_ARGS(success, dependency),
 
 	TP_STRUCT__entry(
 		__field( int ,		success	)
-		__string( msg, 		error_msg )
+		__field( int ,		dependency )
 	),
 
 	TP_fast_assign(
 		__entry->success	= success;
-		__assign_str(msg, error_msg);
+		__entry->dependency	= dependency;
 	),
 
-	TP_printk("success=%s msg=%s",  __entry->success ? "yes" : "no", __get_str(msg))
+	TP_printk("success=%d dependency=%s",  __entry->success, \
+			show_tick_dep_name(__entry->dependency))
 );
 #endif
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 614614821f00..effe8d652c1d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3112,17 +3112,6 @@ static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 	return rotate;
 }
 
-#ifdef CONFIG_NO_HZ_FULL
-bool perf_event_can_stop_tick(void)
-{
-	if (atomic_read(&nr_freq_events) ||
-	    __this_cpu_read(perf_throttled_count))
-		return false;
-	else
-		return true;
-}
-#endif
-
 void perf_event_task_tick(void)
 {
 	struct list_head *head = this_cpu_ptr(&active_ctx_list);
@@ -3133,6 +3122,7 @@ void perf_event_task_tick(void)
 
 	__this_cpu_inc(perf_throttled_seq);
 	throttled = __this_cpu_xchg(perf_throttled_count, 0);
+	tick_dep_clear_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
 
 	list_for_each_entry_safe(ctx, tmp, head, active_ctx_list)
 		perf_adjust_freq_unthr_context(ctx, throttled);
@@ -3564,6 +3554,28 @@ static void unaccount_event_cpu(struct perf_event *event, int cpu)
 		atomic_dec(&per_cpu(perf_cgroup_events, cpu));
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static DEFINE_SPINLOCK(nr_freq_lock);
+#endif
+
+static void unaccount_freq_event_nohz(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	spin_lock(&nr_freq_lock);
+	if (atomic_dec_and_test(&nr_freq_events))
+		tick_nohz_dep_clear(TICK_DEP_BIT_PERF_EVENTS);
+	spin_unlock(&nr_freq_lock);
+#endif
+}
+
+static void unaccount_freq_event(void)
+{
+	if (tick_nohz_full_enabled())
+		unaccount_freq_event_nohz();
+	else
+		atomic_dec(&nr_freq_events);
+}
+
 static void unaccount_event(struct perf_event *event)
 {
 	bool dec = false;
@@ -3580,7 +3592,7 @@ static void unaccount_event(struct perf_event *event)
 	if (event->attr.task)
 		atomic_dec(&nr_task_events);
 	if (event->attr.freq)
-		atomic_dec(&nr_freq_events);
+		unaccount_freq_event();
 	if (event->attr.context_switch) {
 		dec = true;
 		atomic_dec(&nr_switch_events);
@@ -6424,9 +6436,9 @@ static int __perf_event_overflow(struct perf_event *event,
 		if (unlikely(throttle
 			     && hwc->interrupts >= max_samples_per_tick)) {
 			__this_cpu_inc(perf_throttled_count);
+			tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_PERF_EVENTS);
 			hwc->interrupts = MAX_INTERRUPTS;
 			perf_log_throttle(event, 0);
-			tick_nohz_full_kick();
 			ret = 1;
 		}
 	}
@@ -7816,6 +7828,27 @@ static void account_event_cpu(struct perf_event *event, int cpu)
 		atomic_inc(&per_cpu(perf_cgroup_events, cpu));
 }
 
+/* Freq events need the tick to stay alive (see perf_event_task_tick). */
+static void account_freq_event_nohz(void)
+{
+#ifdef CONFIG_NO_HZ_FULL
+	/* Lock so we don't race with concurrent unaccount */
+	spin_lock(&nr_freq_lock);
+	if (atomic_inc_return(&nr_freq_events) == 1)
+		tick_nohz_dep_set(TICK_DEP_BIT_PERF_EVENTS);
+	spin_unlock(&nr_freq_lock);
+#endif
+}
+
+static void account_freq_event(void)
+{
+	if (tick_nohz_full_enabled())
+		account_freq_event_nohz();
+	else
+		atomic_inc(&nr_freq_events);
+}
+
+
 static void account_event(struct perf_event *event)
 {
 	bool inc = false;
@@ -7831,10 +7864,8 @@ static void account_event(struct perf_event *event)
 		atomic_inc(&nr_comm_events);
 	if (event->attr.task)
 		atomic_inc(&nr_task_events);
-	if (event->attr.freq) {
-		if (atomic_inc_return(&nr_freq_events) == 1)
-			tick_nohz_full_kick_all();
-	}
+	if (event->attr.freq)
+		account_freq_event();
 	if (event->attr.context_switch) {
 		atomic_inc(&nr_switch_events);
 		inc = true;
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index bc54e84675da..fedb967a9841 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -61,6 +61,7 @@
 #include <linux/static_key.h>
 #include <linux/workqueue.h>
 #include <linux/compiler.h>
+#include <linux/tick.h>
 
 /*
  * Scheduler clock - returns current time in nanosec units.
@@ -89,6 +90,8 @@ static void __set_sched_clock_stable(void)
 {
 	if (!sched_clock_stable())
 		static_key_slow_inc(&__sched_clock_stable);
+
+	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
 }
 
 void set_sched_clock_stable(void)
@@ -108,6 +111,8 @@ static void __clear_sched_clock_stable(struct work_struct *work)
 	/* XXX worry about clock continuity */
 	if (sched_clock_stable())
 		static_key_slow_dec(&__sched_clock_stable);
+
+	tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
 }
 
 static DECLARE_WORK(sched_clock_work, __clear_sched_clock_stable);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9503d590e5ef..1fad82364ffe 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -453,20 +453,6 @@ static inline void init_hrtick(void)
 }
 #endif	/* CONFIG_SCHED_HRTICK */
 
-/*
- * cmpxchg based fetch_or, macro so it works for different integer types
- */
-#define fetch_or(ptr, val)						\
-({	typeof(*(ptr)) __old, __val = *(ptr);				\
- 	for (;;) {							\
- 		__old = cmpxchg((ptr), __val, __val | (val));		\
- 		if (__old == __val)					\
- 			break;						\
- 		__val = __old;						\
- 	}								\
- 	__old;								\
-})
-
 #if defined(CONFIG_SMP) && defined(TIF_POLLING_NRFLAG)
 /*
  * Atomically set TIF_NEED_RESCHED and test for TIF_POLLING_NRFLAG,
@@ -715,31 +701,36 @@ static inline bool got_nohz_idle_kick(void)
 #endif /* CONFIG_NO_HZ_COMMON */
 
 #ifdef CONFIG_NO_HZ_FULL
-bool sched_can_stop_tick(void)
+bool sched_can_stop_tick(struct rq *rq)
 {
+	int fifo_nr_running;
+
+	/* Deadline tasks, even if single, need the tick */
+	if (rq->dl.dl_nr_running)
+		return false;
+
 	/*
-	 * FIFO realtime policy runs the highest priority task. Other runnable
-	 * tasks are of a lower priority. The scheduler tick does nothing.
+	 * FIFO realtime policy runs the highest priority task (after DEADLINE).
+	 * Other runnable tasks are of a lower priority. The scheduler tick
+	 * isn't needed.
 	 */
-	if (current->policy == SCHED_FIFO)
+	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
+	if (fifo_nr_running)
 		return true;
 
 	/*
 	 * Round-robin realtime tasks time slice with other tasks at the same
-	 * realtime priority. Is this task the only one at this priority?
+	 * realtime priority.
 	 */
-	if (current->policy == SCHED_RR) {
-		struct sched_rt_entity *rt_se = &current->rt;
-
-		return list_is_singular(&rt_se->run_list);
+	if (rq->rt.rr_nr_running) {
+		if (rq->rt.rr_nr_running == 1)
+			return true;
+		else
+			return false;
 	}
 
-	/*
-	 * More than one running task need preemption.
-	 * nr_running update is assumed to be visible
-	 * after IPI is sent from wakers.
-	 */
-	if (this_rq()->nr_running > 1)
+	/* Normal multitasking need periodic preemption checks */
+	if (rq->cfs.nr_running > 1)
 		return false;
 
 	return true;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8ec86abe0ea1..3f1fcffbb18f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1142,12 +1142,27 @@ unsigned int rt_se_nr_running(struct sched_rt_entity *rt_se)
 }
 
 static inline
+unsigned int rt_se_rr_nr_running(struct sched_rt_entity *rt_se)
+{
+	struct rt_rq *group_rq = group_rt_rq(rt_se);
+	struct task_struct *tsk;
+
+	if (group_rq)
+		return group_rq->rr_nr_running;
+
+	tsk = rt_task_of(rt_se);
+
+	return (tsk->policy == SCHED_RR) ? 1 : 0;
+}
+
+static inline
 void inc_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 {
 	int prio = rt_se_prio(rt_se);
 
 	WARN_ON(!rt_prio(prio));
 	rt_rq->rt_nr_running += rt_se_nr_running(rt_se);
+	rt_rq->rr_nr_running += rt_se_rr_nr_running(rt_se);
 
 	inc_rt_prio(rt_rq, prio);
 	inc_rt_migration(rt_se, rt_rq);
@@ -1160,6 +1175,7 @@ void dec_rt_tasks(struct sched_rt_entity *rt_se, struct rt_rq *rt_rq)
 	WARN_ON(!rt_prio(rt_se_prio(rt_se)));
 	WARN_ON(!rt_rq->rt_nr_running);
 	rt_rq->rt_nr_running -= rt_se_nr_running(rt_se);
+	rt_rq->rr_nr_running -= rt_se_rr_nr_running(rt_se);
 
 	dec_rt_prio(rt_rq, rt_se_prio(rt_se));
 	dec_rt_migration(rt_se, rt_rq);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10f16374df7f..4f0bca770108 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -450,6 +450,7 @@ static inline int rt_bandwidth_enabled(void)
 struct rt_rq {
 	struct rt_prio_array active;
 	unsigned int rt_nr_running;
+	unsigned int rr_nr_running;
 #if defined CONFIG_SMP || defined CONFIG_RT_GROUP_SCHED
 	struct {
 		int curr; /* highest queued rt task prio */
@@ -1278,6 +1279,35 @@ unsigned long to_ratio(u64 period, u64 runtime);
 
 extern void init_entity_runnable_average(struct sched_entity *se);
 
+#ifdef CONFIG_NO_HZ_FULL
+extern bool sched_can_stop_tick(struct rq *rq);
+
+/*
+ * Tick may be needed by tasks in the runqueue depending on their policy and
+ * requirements. If tick is needed, lets send the target an IPI to kick it out of
+ * nohz mode if necessary.
+ */
+static inline void sched_update_tick_dependency(struct rq *rq)
+{
+	int cpu;
+
+	if (!tick_nohz_full_enabled())
+		return;
+
+	cpu = cpu_of(rq);
+
+	if (!tick_nohz_full_cpu(cpu))
+		return;
+
+	if (sched_can_stop_tick(rq))
+		tick_nohz_dep_clear_cpu(cpu, TICK_DEP_BIT_SCHED);
+	else
+		tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+}
+#else
+static inline void sched_update_tick_dependency(struct rq *rq) { }
+#endif
+
 static inline void add_nr_running(struct rq *rq, unsigned count)
 {
 	unsigned prev_nr = rq->nr_running;
@@ -1289,26 +1319,16 @@ static inline void add_nr_running(struct rq *rq, unsigned count)
 		if (!rq->rd->overload)
 			rq->rd->overload = true;
 #endif
-
-#ifdef CONFIG_NO_HZ_FULL
-		if (tick_nohz_full_cpu(rq->cpu)) {
-			/*
-			 * Tick is needed if more than one task runs on a CPU.
-			 * Send the target an IPI to kick it out of nohz mode.
-			 *
-			 * We assume that IPI implies full memory barrier and the
-			 * new value of rq->nr_running is visible on reception
-			 * from the target.
-			 */
-			tick_nohz_full_kick_cpu(rq->cpu);
-		}
-#endif
 	}
+
+	sched_update_tick_dependency(rq);
 }
 
 static inline void sub_nr_running(struct rq *rq, unsigned count)
 {
 	rq->nr_running -= count;
+	/* Check if we still need preemption */
+	sched_update_tick_dependency(rq);
 }
 
 static inline void rq_last_tick_reset(struct rq *rq)
diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c
index f5e86d282d52..1cafba860b08 100644
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -333,7 +333,6 @@ static int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *tp)
 	return err;
 }
 
-
 /*
  * Validate the clockid_t for a new CPU-clock timer, and initialize the timer.
  * This is called from sys_timer_create() and do_cpu_nanosleep() with the
@@ -517,6 +516,10 @@ static void arm_timer(struct k_itimer *timer)
 				cputime_expires->sched_exp = exp;
 			break;
 		}
+		if (CPUCLOCK_PERTHREAD(timer->it_clock))
+			tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
+		else
+			tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
 	}
 }
 
@@ -582,39 +585,6 @@ static int cpu_timer_sample_group(const clockid_t which_clock,
 	return 0;
 }
 
-#ifdef CONFIG_NO_HZ_FULL
-static void nohz_kick_work_fn(struct work_struct *work)
-{
-	tick_nohz_full_kick_all();
-}
-
-static DECLARE_WORK(nohz_kick_work, nohz_kick_work_fn);
-
-/*
- * We need the IPIs to be sent from sane process context.
- * The posix cpu timers are always set with irqs disabled.
- */
-static void posix_cpu_timer_kick_nohz(void)
-{
-	if (context_tracking_is_enabled())
-		schedule_work(&nohz_kick_work);
-}
-
-bool posix_cpu_timers_can_stop_tick(struct task_struct *tsk)
-{
-	if (!task_cputime_zero(&tsk->cputime_expires))
-		return false;
-
-	/* Check if cputimer is running. This is accessed without locking. */
-	if (READ_ONCE(tsk->signal->cputimer.running))
-		return false;
-
-	return true;
-}
-#else
-static inline void posix_cpu_timer_kick_nohz(void) { }
-#endif
-
 /*
  * Guts of sys_timer_settime for CPU timers.
  * This is called with the timer locked and interrupts disabled.
@@ -761,8 +731,7 @@ static int posix_cpu_timer_set(struct k_itimer *timer, int timer_flags,
 		sample_to_timespec(timer->it_clock,
 				   old_incr, &old->it_interval);
 	}
-	if (!ret)
-		posix_cpu_timer_kick_nohz();
+
 	return ret;
 }
 
@@ -911,6 +880,8 @@ static void check_thread_timers(struct task_struct *tsk,
 			__group_send_sig_info(SIGXCPU, SEND_SIG_PRIV, tsk);
 		}
 	}
+	if (task_cputime_zero(tsk_expires))
+		tick_dep_clear_task(tsk, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 static inline void stop_process_timers(struct signal_struct *sig)
@@ -919,6 +890,7 @@ static inline void stop_process_timers(struct signal_struct *sig)
 
 	/* Turn off cputimer->running. This is done without locking. */
 	WRITE_ONCE(cputimer->running, false);
+	tick_dep_clear_signal(sig, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 static u32 onecputick;
@@ -1095,8 +1067,6 @@ void posix_cpu_timer_schedule(struct k_itimer *timer)
 	arm_timer(timer);
 	unlock_task_sighand(p, &flags);
 
-	/* Kick full dynticks CPUs in case they need to tick on the new timer */
-	posix_cpu_timer_kick_nohz();
 out:
 	timer->it_overrun_last = timer->it_overrun;
 	timer->it_overrun = -1;
@@ -1270,7 +1240,7 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx,
 		}
 
 		if (!*newval)
-			goto out;
+			return;
 		*newval += now;
 	}
 
@@ -1288,8 +1258,8 @@ void set_process_cpu_timer(struct task_struct *tsk, unsigned int clock_idx,
 			tsk->signal->cputime_expires.virt_exp = *newval;
 		break;
 	}
-out:
-	posix_cpu_timer_kick_nohz();
+
+	tick_dep_set_signal(tsk->signal, TICK_DEP_BIT_POSIX_TIMER);
 }
 
 static int do_cpu_nanosleep(const clockid_t which_clock, int flags,
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 0b17424349eb..969e6704c3c9 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -22,7 +22,6 @@
 #include <linux/module.h>
 #include <linux/irq_work.h>
 #include <linux/posix-timers.h>
-#include <linux/perf_event.h>
 #include <linux/context_tracking.h>
 
 #include <asm/irq_regs.h>
@@ -158,54 +157,63 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 cpumask_var_t tick_nohz_full_mask;
 cpumask_var_t housekeeping_mask;
 bool tick_nohz_full_running;
+static unsigned long tick_dep_mask;
 
-static bool can_stop_full_tick(void)
+static void trace_tick_dependency(unsigned long dep)
+{
+	if (dep & TICK_DEP_MASK_POSIX_TIMER) {
+		trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
+		return;
+	}
+
+	if (dep & TICK_DEP_MASK_PERF_EVENTS) {
+		trace_tick_stop(0, TICK_DEP_MASK_PERF_EVENTS);
+		return;
+	}
+
+	if (dep & TICK_DEP_MASK_SCHED) {
+		trace_tick_stop(0, TICK_DEP_MASK_SCHED);
+		return;
+	}
+
+	if (dep & TICK_DEP_MASK_CLOCK_UNSTABLE)
+		trace_tick_stop(0, TICK_DEP_MASK_CLOCK_UNSTABLE);
+}
+
+static bool can_stop_full_tick(struct tick_sched *ts)
 {
 	WARN_ON_ONCE(!irqs_disabled());
 
-	if (!sched_can_stop_tick()) {
-		trace_tick_stop(0, "more than 1 task in runqueue\n");
+	if (tick_dep_mask) {
+		trace_tick_dependency(tick_dep_mask);
 		return false;
 	}
 
-	if (!posix_cpu_timers_can_stop_tick(current)) {
-		trace_tick_stop(0, "posix timers running\n");
+	if (ts->tick_dep_mask) {
+		trace_tick_dependency(ts->tick_dep_mask);
 		return false;
 	}
 
-	if (!perf_event_can_stop_tick()) {
-		trace_tick_stop(0, "perf events running\n");
+	if (current->tick_dep_mask) {
+		trace_tick_dependency(current->tick_dep_mask);
 		return false;
 	}
 
-	/* sched_clock_tick() needs us? */
-#ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
-	/*
-	 * TODO: kick full dynticks CPUs when
-	 * sched_clock_stable is set.
-	 */
-	if (!sched_clock_stable()) {
-		trace_tick_stop(0, "unstable sched clock\n");
-		/*
-		 * Don't allow the user to think they can get
-		 * full NO_HZ with this machine.
-		 */
-		WARN_ONCE(tick_nohz_full_running,
-			  "NO_HZ FULL will not work with unstable sched clock");
+	if (current->signal->tick_dep_mask) {
+		trace_tick_dependency(current->signal->tick_dep_mask);
 		return false;
 	}
-#endif
 
 	return true;
 }
 
-static void nohz_full_kick_work_func(struct irq_work *work)
+static void nohz_full_kick_func(struct irq_work *work)
 {
 	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
 }
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
-	.func = nohz_full_kick_work_func,
+	.func = nohz_full_kick_func,
 };
 
 /*
@@ -214,7 +222,7 @@ static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
  * This kick, unlike tick_nohz_full_kick_cpu() and tick_nohz_full_kick_all(),
  * is NMI safe.
  */
-void tick_nohz_full_kick(void)
+static void tick_nohz_full_kick(void)
 {
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		return;
@@ -234,27 +242,112 @@ void tick_nohz_full_kick_cpu(int cpu)
 	irq_work_queue_on(&per_cpu(nohz_full_kick_work, cpu), cpu);
 }
 
-static void nohz_full_kick_ipi(void *info)
-{
-	/* Empty, the tick restart happens on tick_nohz_irq_exit() */
-}
-
 /*
  * Kick all full dynticks CPUs in order to force these to re-evaluate
  * their dependency on the tick and restart it if necessary.
  */
-void tick_nohz_full_kick_all(void)
+static void tick_nohz_full_kick_all(void)
 {
+	int cpu;
+
 	if (!tick_nohz_full_running)
 		return;
 
 	preempt_disable();
-	smp_call_function_many(tick_nohz_full_mask,
-			       nohz_full_kick_ipi, NULL, false);
-	tick_nohz_full_kick();
+	for_each_cpu_and(cpu, tick_nohz_full_mask, cpu_online_mask)
+		tick_nohz_full_kick_cpu(cpu);
 	preempt_enable();
 }
 
+static void tick_nohz_dep_set_all(unsigned long *dep,
+				  enum tick_dep_bits bit)
+{
+	unsigned long prev;
+
+	prev = fetch_or(dep, BIT_MASK(bit));
+	if (!prev)
+		tick_nohz_full_kick_all();
+}
+
+/*
+ * Set a global tick dependency. Used by perf events that rely on freq and
+ * by unstable clock.
+ */
+void tick_nohz_dep_set(enum tick_dep_bits bit)
+{
+	tick_nohz_dep_set_all(&tick_dep_mask, bit);
+}
+
+void tick_nohz_dep_clear(enum tick_dep_bits bit)
+{
+	clear_bit(bit, &tick_dep_mask);
+}
+
+/*
+ * Set per-CPU tick dependency. Used by scheduler and perf events in order to
+ * manage events throttling.
+ */
+void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
+{
+	unsigned long prev;
+	struct tick_sched *ts;
+
+	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	prev = fetch_or(&ts->tick_dep_mask, BIT_MASK(bit));
+	if (!prev) {
+		preempt_disable();
+		/* Perf needs local kick that is NMI safe */
+		if (cpu == smp_processor_id()) {
+			tick_nohz_full_kick();
+		} else {
+			/* Remote irq work not NMI-safe */
+			if (!WARN_ON_ONCE(in_nmi()))
+				tick_nohz_full_kick_cpu(cpu);
+		}
+		preempt_enable();
+	}
+}
+
+void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
+{
+	struct tick_sched *ts = per_cpu_ptr(&tick_cpu_sched, cpu);
+
+	clear_bit(bit, &ts->tick_dep_mask);
+}
+
+/*
+ * Set a per-task tick dependency. Posix CPU timers need this in order to elapse
+ * per task timers.
+ */
+void tick_nohz_dep_set_task(struct task_struct *tsk, enum tick_dep_bits bit)
+{
+	/*
+	 * We could optimize this with just kicking the target running the task
+	 * if that noise matters for nohz full users.
+	 */
+	tick_nohz_dep_set_all(&tsk->tick_dep_mask, bit);
+}
+
+void tick_nohz_dep_clear_task(struct task_struct *tsk, enum tick_dep_bits bit)
+{
+	clear_bit(bit, &tsk->tick_dep_mask);
+}
+
+/*
+ * Set a per-taskgroup tick dependency. Posix CPU timers need this in order to elapse
+ * per process timers.
+ */
+void tick_nohz_dep_set_signal(struct signal_struct *sig, enum tick_dep_bits bit)
+{
+	tick_nohz_dep_set_all(&sig->tick_dep_mask, bit);
+}
+
+void tick_nohz_dep_clear_signal(struct signal_struct *sig, enum tick_dep_bits bit)
+{
+	clear_bit(bit, &sig->tick_dep_mask);
+}
+
 /*
  * Re-evaluate the need for the tick as we switch the current task.
  * It might need the tick due to per task/process properties:
@@ -263,15 +356,19 @@ void tick_nohz_full_kick_all(void)
 void __tick_nohz_task_switch(void)
 {
 	unsigned long flags;
+	struct tick_sched *ts;
 
 	local_irq_save(flags);
 
 	if (!tick_nohz_full_cpu(smp_processor_id()))
 		goto out;
 
-	if (tick_nohz_tick_stopped() && !can_stop_full_tick())
-		tick_nohz_full_kick();
+	ts = this_cpu_ptr(&tick_cpu_sched);
 
+	if (ts->tick_stopped) {
+		if (current->tick_dep_mask || current->signal->tick_dep_mask)
+			tick_nohz_full_kick();
+	}
 out:
 	local_irq_restore(flags);
 }
@@ -689,7 +786,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
-		trace_tick_stop(1, " ");
+		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
 	/*
@@ -740,7 +837,7 @@ static void tick_nohz_full_update_tick(struct tick_sched *ts)
 	if (!ts->tick_stopped && ts->nohz_mode == NOHZ_MODE_INACTIVE)
 		return;
 
-	if (can_stop_full_tick())
+	if (can_stop_full_tick(ts))
 		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
 	else if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick(ts, ktime_get(), 1);
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index a4a8d4e9baa1..eb4e32566a83 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -60,6 +60,7 @@ struct tick_sched {
 	u64				next_timer;
 	ktime_t				idle_expires;
 	int				do_timer_last;
+	unsigned long			tick_dep_mask;
 };
 
 extern struct tick_sched *tick_get_tick_sched(int cpu);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] NOHZ updates for v4.6
  2016-03-14 12:32 [GIT PULL] NOHZ updates for v4.6 Ingo Molnar
@ 2016-03-15  2:44 ` Linus Torvalds
  2016-03-15  8:42   ` Peter Zijlstra
                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2016-03-15  2:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Mon, Mar 14, 2016 at 5:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
> +/**
> + * fetch_or - perform *ptr |= mask and return old value of *ptr
> + * @ptr: pointer to value
> + * @mask: mask to OR on the value
> + *
> + * cmpxchg based fetch_or, macro so it works for different integer types
> + */
> +#ifndef fetch_or
> +#define fetch_or(ptr, mask)                                            \
> +({     typeof(*(ptr)) __old, __val = *(ptr);                           \
> +       for (;;) {                                                      \
> +               __old = cmpxchg((ptr), __val, __val | (mask));          \
> +               if (__old == __val)                                     \
> +                       break;                                          \
> +               __val = __old;                                          \
> +       }                                                               \
> +       __old;                                                          \
> +})
> +#endif

This is garbage.

This macro re-uses the "mask" argument potentially many many times, so
semantically it's very dubious.

That may have been acceptable in the old situation when this was
internal to sched.c, but now the code was moved to a generic header
file. And this kind of broken macro is *not* acceptable in that
context any more.

It is now in asm-generic/atomic.h, so it should now conform to the
rest of the code there. Try to model it around ATOMIC_OP_RETURN(),
although obviously this fetch_or() function returns the value *before*
the logical 'or' rather than the end result.

It would be lovely if it were piossible to just use an "atomic_t"
type, but it looks like it is used on thread_info->flags. Which
doesn't have a good type, sadly.

As a result, the code then makes a big deal about how this works with
any integer type, but then the new code that uses it uses a stupid
type that isn't appropriate. Why would it be using an "unsigned long",
when

 - it holds a fixed number of bits that don't depend on architecture
and certainly is not 64 (or even close to 32).

 - the structure it is in was just preceded by an "int", so on 64-bit
it generates pointless padding in addition to the pointless 64-bit
field.

The only reason to use a "unsigned long" is because "fetch_or()" would
be hardcoded to that type, which doesn't seem to be true.

Now, in practice, the code looks like it should *work* fine, so I'm
going to pull this, but I do want to lodge a protest on sloppiness and
general and unnecessary uglicity of this code.

So please get this fixed.

                  Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] NOHZ updates for v4.6
  2016-03-15  2:44 ` Linus Torvalds
@ 2016-03-15  8:42   ` Peter Zijlstra
  2016-03-15  9:49     ` Ingo Molnar
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
  2016-03-15  9:53   ` [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int' Ingo Molnar
  2 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15  8:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Mon, Mar 14, 2016 at 07:44:14PM -0700, Linus Torvalds wrote:
> On Mon, Mar 14, 2016 at 5:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > +/**
> > + * fetch_or - perform *ptr |= mask and return old value of *ptr
> > + * @ptr: pointer to value
> > + * @mask: mask to OR on the value
> > + *
> > + * cmpxchg based fetch_or, macro so it works for different integer types
> > + */
> > +#ifndef fetch_or
> > +#define fetch_or(ptr, mask)                                            \
> > +({     typeof(*(ptr)) __old, __val = *(ptr);                           \
> > +       for (;;) {                                                      \
> > +               __old = cmpxchg((ptr), __val, __val | (mask));          \
> > +               if (__old == __val)                                     \
> > +                       break;                                          \
> > +               __val = __old;                                          \
> > +       }                                                               \
> > +       __old;                                                          \
> > +})
> > +#endif
> 
> This is garbage.
> 
> This macro re-uses the "mask" argument potentially many many times, so
> semantically it's very dubious.

So the below cures that; but do we want to maybe pull this back into
sched.c and expose a version that operates on a fixed type instead?

Although with xchg() and cmpxchg() we've already set a precedence for
multi-width operators.

The whole thread_info::flags thing is rather 'special' and I'm not sure
we even want to go clean that up, I can see why 64bit arch would very
much want a 64bit flags word etc. And TIF_flags are very arch specific.

---
 include/linux/atomic.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 6c502cb13c95..114caf672b11 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -558,8 +558,9 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 #ifndef fetch_or
 #define fetch_or(ptr, mask)						\
 ({	typeof(*(ptr)) __old, __val = *(ptr);				\
+ 	typeof(mask) __mask = (mask);					\
 	for (;;) {							\
-		__old = cmpxchg((ptr), __val, __val | (mask));		\
+		__old = cmpxchg((ptr), __val, __val | __mask);		\
 		if (__old == __val)					\
 			break;						\
 		__val = __old;						\

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  2:44 ` Linus Torvalds
  2016-03-15  8:42   ` Peter Zijlstra
@ 2016-03-15  9:32   ` Ingo Molnar
  2016-03-15 10:50     ` Peter Zijlstra
                       ` (5 more replies)
  2016-03-15  9:53   ` [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int' Ingo Molnar
  2 siblings, 6 replies; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15  9:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Mon, Mar 14, 2016 at 5:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > +/**
> > + * fetch_or - perform *ptr |= mask and return old value of *ptr
> > + * @ptr: pointer to value
> > + * @mask: mask to OR on the value
> > + *
> > + * cmpxchg based fetch_or, macro so it works for different integer types
> > + */
> > +#ifndef fetch_or
> > +#define fetch_or(ptr, mask)                                            \
> > +({     typeof(*(ptr)) __old, __val = *(ptr);                           \
> > +       for (;;) {                                                      \
> > +               __old = cmpxchg((ptr), __val, __val | (mask));          \
> > +               if (__old == __val)                                     \
> > +                       break;                                          \
> > +               __val = __old;                                          \
> > +       }                                                               \
> > +       __old;                                                          \
> > +})
> > +#endif
> 
> This is garbage.

Indeed.

> This macro re-uses the "mask" argument potentially many many times, so
> semantically it's very dubious.

Yes.

> That may have been acceptable in the old situation when this was
> internal to sched.c, but now the code was moved to a generic header
> file. And this kind of broken macro is *not* acceptable in that
> context any more.
> 
> It is now in asm-generic/atomic.h, so it should now conform to the
> rest of the code there. Try to model it around ATOMIC_OP_RETURN(),
> although obviously this fetch_or() function returns the value *before*
> the logical 'or' rather than the end result.

Ok.

I can see two other problems with it as well:

 1) 'ptr' may get evaluated multiple times as well, not just 'mask'.

 2) its naming sucks. "fetch_or()" does not really signal that it's a
    fundamentally atomic operation, nor what API family it belongs to.

Given that it's in include/linux/atomic.h, and it has _almost_ xchg() semantics - 
how about we name it xchg_or() and model it after that? From the semantics POV 
it's really an unconditional xchg()-alike API, even though technically it's a 
cmpxchg() loop.

I've attached a (totally untested) patch that fixes the bugs and does the rename.

> It would be lovely if it were piossible to just use an "atomic_t"
> type, but it looks like it is used on thread_info->flags. Which
> doesn't have a good type, sadly.

Indeed, thread_info->flags is 'almost' but not quite 'unsigned long':

 triton:~/tip> grep -E '\<flags\>.*;' $(git grep -l 'thread_info {' arch/) | grep -vE 'WARN|return'
 arch/alpha/include/asm/thread_info.h:   unsigned int            flags;          /* low level flags */
 arch/arc/include/asm/thread_info.h:     unsigned long flags;            /* low level flags */
 arch/arm/include/asm/thread_info.h:     unsigned long           flags;          /* low level flags */
 arch/arm64/include/asm/thread_info.h:   unsigned long           flags;          /* low level flags */
 arch/avr32/include/asm/thread_info.h:   unsigned long           flags;          /* low level flags */
 arch/blackfin/include/asm/thread_info.h:        unsigned long flags;    /* low level flags */
 arch/c6x/include/asm/thread_info.h:     unsigned long           flags;          /* low level flags */
 arch/cris/include/asm/thread_info.h:    unsigned long           flags;          /* low level flags */
 arch/frv/include/asm/thread_info.h:     unsigned long           flags;          /* low level flags */
 arch/h8300/include/asm/thread_info.h:   unsigned long      flags;               /* low level flags */
 arch/hexagon/include/asm/thread_info.h: unsigned long           flags;          /* low level flags */
 arch/ia64/include/asm/thread_info.h:    __u32 flags;                    /* thread_info flags (see TIF_*) */
 arch/m32r/include/asm/thread_info.h:    unsigned long           flags;          /* low level flags */
 arch/m68k/include/asm/thread_info.h:    unsigned long           flags;
 arch/metag/include/asm/thread_info.h:   unsigned long flags;    /* low level flags */
 arch/microblaze/include/asm/thread_info.h:      unsigned long           flags; /* low level flags */
 arch/mips/include/asm/thread_info.h:    unsigned long           flags;          /* low level flags */
 arch/mn10300/include/asm/thread_info.h: unsigned long           flags;          /* low level flags */
 arch/nios2/include/asm/thread_info.h:   unsigned long           flags;          /* low level flags */
 arch/openrisc/include/asm/thread_info.h:        unsigned long           flags;          /* low level flags */
 arch/parisc/include/asm/thread_info.h:  unsigned long flags;            /* thread_info flags (see TIF_*) */
 arch/powerpc/include/asm/thread_info.h: unsigned long   flags ____cacheline_aligned_in_smp;
 arch/s390/include/asm/thread_info.h:    unsigned long           flags;          /* low level flags */
 arch/score/include/asm/thread_info.h:   unsigned long           flags;          /* low level flags */
 arch/sh/include/asm/thread_info.h:      unsigned long           flags;          /* low level flags */
 arch/sparc/include/asm/thread_info_32.h:        unsigned long           flags;          /* low level flags */
 arch/sparc/include/asm/thread_info_64.h:        unsigned long           flags;
 arch/tile/include/asm/thread_info.h:    unsigned long           flags;          /* low level flags */
 arch/um/include/asm/thread_info.h:      unsigned long           flags;          /* low level flags */
 arch/unicore32/include/asm/thread_info.h:       unsigned long           flags;          /* low level flags */
 arch/x86/include/asm/thread_info.h:     __u32                   flags;          /* low level flags */
 arch/xtensa/include/asm/thread_info.h:  unsigned long           flags;          /* low level flags */

alpha, ia64 and x86 are the odd ones out with an int ti::flags.

> As a result, the code then makes a big deal about how this works with
> any integer type, but then the new code that uses it uses a stupid
> type that isn't appropriate. Why would it be using an "unsigned long",
> when
> 
>  - it holds a fixed number of bits that don't depend on architecture
> and certainly is not 64 (or even close to 32).
> 
>  - the structure it is in was just preceded by an "int", so on 64-bit
> it generates pointless padding in addition to the pointless 64-bit
> field.

Yes, its natural type would be 'unsigned int'.

> The only reason to use a "unsigned long" is because "fetch_or()" would
> be hardcoded to that type, which doesn't seem to be true.

Yes, fetch_or() uses cmpxchg() internally, and AFAICS cmpxchg() is guaranteed to 
work on two integer types:

 - 32 bit
 - the natural word type of the architecture. (32 or 64 bit)

Many architectures have wider support for cmpxchg(): 8, 16, 32 and 64 bits, but we 
cannot rely on that in a generic header.

So changing the dependency mask type to 'unsigned int' should be safe. I'll send 
another patch for that.

> Now, in practice, the code looks like it should *work* fine, so I'm
> going to pull this, but I do want to lodge a protest on sloppiness and
> general and unnecessary uglicity of this code.
> 
> So please get this fixed.

Sorry about this, that was quite sloppy on several levels!

Thanks,

	Ingo

===========================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 15 Mar 2016 10:19:10 +0100
Subject: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'

Linus noticed a couple of problems with the fetch_or() implementation introduced 
by 5fd7a09cfb8c ("atomic: Export fetch_or()"):

 - Sloppy macro implementation: 'mask' and 'ptr' is evaluated multiple times, 
   which will break if arguments have side effects.

 - Sloppy semantics: the naming and structure of the macro is ambigious, with
   no well-defined semantics. It's neither an atomic nor a cmpxchg() interface, 
   but still it lives in include/linux/atomic.h.

Solve this by:

 - Extracting the arguments into helper variables once, and never
   using the original arguments from that point on. This makes it
   pretty clear that there can be no unwanted macro evaluation
   side effects.

 - Renaming fetch_or() to xchg_or(), recognizing that the semantics
   are xchg()-alike.

Also propagate the rename to the call sites.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/atomic.h   | 20 +++++++++++++-------
 kernel/sched/core.c      |  2 +-
 kernel/time/tick-sched.c |  4 ++--
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 6c502cb13c95..a884f56b882f 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -549,21 +549,27 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 #endif
 
 /**
- * fetch_or - perform *ptr |= mask and return old value of *ptr
- * @ptr: pointer to value
+ * xchg_or - perform *ptr |= mask atomically and return old value of *ptr
+ * @ptr: pointer to value (cmpxchg() compatible integer pointer type)
  * @mask: mask to OR on the value
  *
- * cmpxchg based fetch_or, macro so it works for different integer types
+ * cmpxchg() based, it's a macro so it works for different integer types.
  */
-#ifndef fetch_or
-#define fetch_or(ptr, mask)						\
-({	typeof(*(ptr)) __old, __val = *(ptr);				\
+#ifndef xchg_or
+# define xchg_or(ptr, mask)						\
+({									\
+	typeof(ptr)  __ptr  = (ptr);					\
+	typeof(mask) __mask = (mask);					\
+									\
+	typeof(*(__ptr)) __old, __val = *__ptr;				\
+									\
 	for (;;) {							\
-		__old = cmpxchg((ptr), __val, __val | (mask));		\
+		__old = cmpxchg(__ptr, __val, __val | __mask);		\
 		if (__old == __val)					\
 			break;						\
 		__val = __old;						\
 	}								\
+									\
 	__old;								\
 })
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1c82ded3e675..de380776d1df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -330,7 +330,7 @@ static inline void init_hrtick(void)
 static bool set_nr_and_not_polling(struct task_struct *p)
 {
 	struct thread_info *ti = task_thread_info(p);
-	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
+	return !(xchg_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 969e6704c3c9..851631899352 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -264,7 +264,7 @@ static void tick_nohz_dep_set_all(unsigned long *dep,
 {
 	unsigned long prev;
 
-	prev = fetch_or(dep, BIT_MASK(bit));
+	prev = xchg_or(dep, BIT_MASK(bit));
 	if (!prev)
 		tick_nohz_full_kick_all();
 }
@@ -294,7 +294,7 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 
 	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
-	prev = fetch_or(&ts->tick_dep_mask, BIT_MASK(bit));
+	prev = xchg_or(&ts->tick_dep_mask, BIT_MASK(bit));
 	if (!prev) {
 		preempt_disable();
 		/* Perf needs local kick that is NMI safe */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [GIT PULL] NOHZ updates for v4.6
  2016-03-15  8:42   ` Peter Zijlstra
@ 2016-03-15  9:49     ` Ingo Molnar
  0 siblings, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15  9:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Mar 14, 2016 at 07:44:14PM -0700, Linus Torvalds wrote:
> > On Mon, Mar 14, 2016 at 5:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
> > > +/**
> > > + * fetch_or - perform *ptr |= mask and return old value of *ptr
> > > + * @ptr: pointer to value
> > > + * @mask: mask to OR on the value
> > > + *
> > > + * cmpxchg based fetch_or, macro so it works for different integer types
> > > + */
> > > +#ifndef fetch_or
> > > +#define fetch_or(ptr, mask)                                            \
> > > +({     typeof(*(ptr)) __old, __val = *(ptr);                           \
> > > +       for (;;) {                                                      \
> > > +               __old = cmpxchg((ptr), __val, __val | (mask));          \
> > > +               if (__old == __val)                                     \
> > > +                       break;                                          \
> > > +               __val = __old;                                          \
> > > +       }                                                               \
> > > +       __old;                                                          \
> > > +})
> > > +#endif
> > 
> > This is garbage.
> > 
> > This macro re-uses the "mask" argument potentially many many times, so
> > semantically it's very dubious.
> 
> So the below cures that; but do we want to maybe pull this back into
> sched.c and expose a version that operates on a fixed type instead?

So there are other problems with the macro - see the patch I just sent that tries 
to fix all of them.

> Although with xchg() and cmpxchg() we've already set a precedence for
> multi-width operators.

Yes, and if we name the new API 'xchg_or()' as I did it in my patch then it all 
nicely fits into the existing scheme.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int'
  2016-03-15  2:44 ` Linus Torvalds
  2016-03-15  8:42   ` Peter Zijlstra
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
@ 2016-03-15  9:53   ` Ingo Molnar
  2016-03-15 12:15     ` Ingo Molnar
  2 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15  9:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


Linus noticed that the new tick_dep_mask types introduced in:

  d027d45d8a17 ("nohz: New tick dependency mask")

... are sloppily defined as 'unsigned long' - which is wasteful
to carry just 4 bits and which may also create suboptimal data
types on 64-bit systems with word alignment padding holes in them.

Fix this by changing the type to the more natural 'unsigned int'.

(The xchg_or() API will work fine with 'unsigned int' as well.)

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h    |  4 ++--
 kernel/time/tick-sched.c | 11 +++++------
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c617ea12c6b7..6d1842bb7abd 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -718,7 +718,7 @@ struct signal_struct {
 	struct task_cputime cputime_expires;
 
 #ifdef CONFIG_NO_HZ_FULL
-	unsigned long tick_dep_mask;
+	unsigned int tick_dep_mask;
 #endif
 
 	struct list_head cpu_timers[3];
@@ -1548,7 +1548,7 @@ struct task_struct {
 #endif
 
 #ifdef CONFIG_NO_HZ_FULL
-	unsigned long tick_dep_mask;
+	unsigned int tick_dep_mask;
 #endif
 	unsigned long nvcsw, nivcsw; /* context switch counts */
 	u64 start_time;		/* monotonic time in nsec */
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 851631899352..fe0f57f3432f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -157,9 +157,9 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 cpumask_var_t tick_nohz_full_mask;
 cpumask_var_t housekeeping_mask;
 bool tick_nohz_full_running;
-static unsigned long tick_dep_mask;
+static unsigned int tick_dep_mask;
 
-static void trace_tick_dependency(unsigned long dep)
+static void trace_tick_dependency(unsigned int dep)
 {
 	if (dep & TICK_DEP_MASK_POSIX_TIMER) {
 		trace_tick_stop(0, TICK_DEP_MASK_POSIX_TIMER);
@@ -259,10 +259,9 @@ static void tick_nohz_full_kick_all(void)
 	preempt_enable();
 }
 
-static void tick_nohz_dep_set_all(unsigned long *dep,
-				  enum tick_dep_bits bit)
+static void tick_nohz_dep_set_all(unsigned int *dep, enum tick_dep_bits bit)
 {
-	unsigned long prev;
+	unsigned int prev;
 
 	prev = xchg_or(dep, BIT_MASK(bit));
 	if (!prev)
@@ -289,7 +288,7 @@ void tick_nohz_dep_clear(enum tick_dep_bits bit)
  */
 void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 {
-	unsigned long prev;
+	unsigned int prev;
 	struct tick_sched *ts;
 
 	ts = per_cpu_ptr(&tick_cpu_sched, cpu);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
@ 2016-03-15 10:50     ` Peter Zijlstra
  2016-03-15 12:08       ` Ingo Molnar
  2016-03-15 11:06     ` Peter Zijlstra
                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 10:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 10:32:45AM +0100, Ingo Molnar wrote:
>  2) its naming sucks. "fetch_or()" does not really signal that it's a
>     fundamentally atomic operation, nor what API family it belongs to.

I disagree there, the fetch-$op naming is widely used for atomic
operations that return the previous value. See for example the C/C++11
atomic ops.

I've even thought about reworking our entire atomic*_t bits to match.
That is, introduce all the fetch_$op primitives, then convert all the
$op_return ones over and finally remove all the $op_return ones.

I've not done so because we're all so very used to $op_return that I'm
sure people (and this would very much include me) would curse me for
changing this.

The reason for fetch_$op is that it also works for irreversible
operations like or. With or_return you simply cannot tell what the
previous state was (with add_return you can do a simple subtraction to
revert to the prior state).

And yes, some people use the xchg-$op naming, but its less widely used
(x86 asm being one). Other also use swap-$op.

In any case, I prefer the name as it was.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
  2016-03-15 10:50     ` Peter Zijlstra
@ 2016-03-15 11:06     ` Peter Zijlstra
  2016-03-15 11:59     ` Peter Zijlstra
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 11:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 10:32:45AM +0100, Ingo Molnar wrote:
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 969e6704c3c9..851631899352 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -264,7 +264,7 @@ static void tick_nohz_dep_set_all(unsigned long *dep,
>  {
>  	unsigned long prev;
>  
> -	prev = fetch_or(dep, BIT_MASK(bit));
> +	prev = xchg_or(dep, BIT_MASK(bit));
>  	if (!prev)
>  		tick_nohz_full_kick_all();
>  }
> @@ -294,7 +294,7 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
>  
>  	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
>  
> -	prev = fetch_or(&ts->tick_dep_mask, BIT_MASK(bit));
> +	prev = xchg_or(&ts->tick_dep_mask, BIT_MASK(bit));
>  	if (!prev) {
>  		preempt_disable();
>  		/* Perf needs local kick that is NMI safe */

Those should probably also use BIT(), BIT_MASK() is specifically for
bitmasks, and should be combined with BIT_WORD().

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
  2016-03-15 10:50     ` Peter Zijlstra
  2016-03-15 11:06     ` Peter Zijlstra
@ 2016-03-15 11:59     ` Peter Zijlstra
  2016-03-15 12:01     ` Ingo Molnar
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 11:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 10:32:45AM +0100, Ingo Molnar wrote:
> +#ifndef xchg_or
> +# define xchg_or(ptr, mask)						\
> +({									\
> +	typeof(ptr)  __ptr  = (ptr);					\
> +	typeof(mask) __mask = (mask);					\
> +									\
> +	typeof(*(__ptr)) __old, __val = *__ptr;				\
> +									\
>  	for (;;) {							\
> +		__old = cmpxchg(__ptr, __val, __val | __mask);		\
>  		if (__old == __val)					\
>  			break;						\
>  		__val = __old;						\
>  	}								\
> +									\
>  	__old;								\
>  })

As reported by you this explodes, and it obvious from the generated asm
why:

     48e1: 89 c2                 mov    %eax,%edx
     48e3: 41 89 d0              mov    %edx,%r8d
     48e6: 31 c9                 xor    %ecx,%ecx
     48e8: 89 d0                 mov    %edx,%eax
     48ea: 41 83 c8 08           or     $0x8,%r8d
     48ee: f0 44 0f b1 01        lock cmpxchg %r8d,(%rcx)
     48f3: 39 c2                 cmp    %eax,%edx
     48f5: 75 ea                 jne    48e1 <resched_curr+0x31>

That's an unconditional NULL deref.

What happens is that __ptr from xchg_or() aliasses with __ptr from
cmpxchg() and weird stuff happens.

If you do: s/__ptr/_ptr/ or similar on the xchg_or() code it all works
again.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
                       ` (2 preceding siblings ...)
  2016-03-15 11:59     ` Peter Zijlstra
@ 2016-03-15 12:01     ` Ingo Molnar
  2016-03-15 12:32       ` Ingo Molnar
  2016-03-15 12:21     ` [PATCH v2] " Ingo Molnar
  2016-03-15 16:18     ` [PATCH] " Linus Torvalds
  5 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> Subject: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
> 
> Linus noticed a couple of problems with the fetch_or() implementation introduced 
> by 5fd7a09cfb8c ("atomic: Export fetch_or()"):
> 
>  - Sloppy macro implementation: 'mask' and 'ptr' is evaluated multiple times, 
>    which will break if arguments have side effects.

So this shiny new patch manages to crash the x86 kernel with a NULL pointer 
dereference:

  [    0.143027] BUG: unable to handle kernel NULL pointer dereference at           (null)
  [    0.144000] IP: [<ffffffff8107c64c>] resched_curr+0x3c/0xc0

GCC manages to turn this:

static bool set_nr_and_not_polling(struct task_struct *p)
{
        struct thread_info *ti = task_thread_info(p);
        return !(xchg_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
}

and this:

>  /**
> + * xchg_or - perform *ptr |= mask atomically and return old value of *ptr
> + * @ptr: pointer to value (cmpxchg() compatible integer pointer type)
>   * @mask: mask to OR on the value
>   *
> + * cmpxchg() based, it's a macro so it works for different integer types.
>   */
> +#ifndef xchg_or
> +# define xchg_or(ptr, mask)						\
> +({									\
> +	typeof(ptr)  __ptr  = (ptr);					\
> +	typeof(mask) __mask = (mask);					\
> +									\
> +	typeof(*(__ptr)) __old, __val = *__ptr;				\
> +									\
>  	for (;;) {							\
> +		__old = cmpxchg(__ptr, __val, __val | __mask);		\
>  		if (__old == __val)					\
>  			break;						\
>  		__val = __old;						\
>  	}								\
> +									\
>  	__old;								\
>  })

into:

    41c1:       89 c2                   mov    %eax,%edx
    41c3:       89 d6                   mov    %edx,%esi
    41c5:       31 c9                   xor    %ecx,%ecx
    41c7:       89 d0                   mov    %edx,%eax
    41c9:       83 ce 08                or     $0x8,%esi
    41cc:       f0 0f b1 31             lock cmpxchg %esi,(%rcx)

note the RCX zeroing via XOR...

The original, working sequence is:

    41c4:       89 c2                   mov    %eax,%edx
    41c6:       89 d6                   mov    %edx,%esi
    41c8:       89 d0                   mov    %edx,%eax
    41ca:       83 ce 08                or     $0x8,%esi
    41cd:       f0 0f b1 31             lock cmpxchg %esi,(%rcx)

The change that makes the difference is the 'ptr' part of:

> +		__old = cmpxchg(__ptr, __val, __val | __mask);		\

This variant works:

> +		__old = cmpxchg((ptr), __val, __val | __mask);		\

After a lot of staring PeterZ realized that __ptr aliases with the x86 cmpxchg() 
macro-jungle's __ptr name!!

So if I do a s/__ptr/_ptr it all works...

But IMHO this really highlights a fundamental weakness of all this macro magic, 
it's all way too fragile.

Why don't we introduce a boring family of APIs:

	cmpxchg_8()
	cmpxchg_16()
	cmpxchg_32()
	cmpxchg_64()

	xchg_or_32()
	xchg_or_64()
	...

... with none of this pesky auto-typing property and none of the 
macro-inside-a-macro crap? We could do clean types and would write them all in 
proper C, not fragile CPP.

It's not like we migrate between the types all that frequently - and even if we 
do, it's trivial.

hm?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 10:50     ` Peter Zijlstra
@ 2016-03-15 12:08       ` Ingo Molnar
  2016-03-15 12:42         ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 15, 2016 at 10:32:45AM +0100, Ingo Molnar wrote:
> >  2) its naming sucks. "fetch_or()" does not really signal that it's a
> >     fundamentally atomic operation, nor what API family it belongs to.
> 
> I disagree there, the fetch-$op naming is widely used for atomic
> operations that return the previous value. See for example the C/C++11
> atomic ops.

The problem I see is that we don't really have the fetch_*() naming in the kernel 
right now, while we do have the xchg_*() naming. The latter is 'obviously' an 
atomic operation - while 'fetch' could be anything.

No strong opinion, but I think fetch_or() is not a particularly good name.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int'
  2016-03-15  9:53   ` [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int' Ingo Molnar
@ 2016-03-15 12:15     ` Ingo Molnar
  2016-03-15 16:30       ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> Linus noticed that the new tick_dep_mask types introduced in:
> 
>   d027d45d8a17 ("nohz: New tick dependency mask")
> 
> ... are sloppily defined as 'unsigned long' - which is wasteful
> to carry just 4 bits and which may also create suboptimal data
> types on 64-bit systems with word alignment padding holes in them.
> 
> Fix this by changing the type to the more natural 'unsigned int'.
> 
> (The xchg_or() API will work fine with 'unsigned int' as well.)
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  include/linux/sched.h    |  4 ++--
>  kernel/time/tick-sched.c | 11 +++++------
>  2 files changed, 7 insertions(+), 8 deletions(-)

Hm, so the problem I did not consider is the following:

 triton:~/tip> git grep tick_dep_mask kernel | grep _bit
 kernel/time/tick-sched.c:       clear_bit(bit, &tick_dep_mask);
 kernel/time/tick-sched.c:       clear_bit(bit, &ts->tick_dep_mask);
 kernel/time/tick-sched.c:       clear_bit(bit, &tsk->tick_dep_mask);
 kernel/time/tick-sched.c:       clear_bit(bit, &sig->tick_dep_mask);

and the bitops natural type (and in fact the only supported bitops type) is 
'unsigned long'.

So it's not that easy to change a bitmask over to unsigned int.

Suggestions?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
                       ` (3 preceding siblings ...)
  2016-03-15 12:01     ` Ingo Molnar
@ 2016-03-15 12:21     ` Ingo Molnar
  2016-03-15 13:26       ` Peter Zijlstra
  2016-03-15 17:08       ` Frederic Weisbecker
  2016-03-15 16:18     ` [PATCH] " Linus Torvalds
  5 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton

Linus noticed a couple of problems with the fetch_or() implementation introduced
by 5fd7a09cfb8c ("atomic: Export fetch_or()"):

 - Sloppy macro implementation: 'mask' and 'ptr' is evaluated multiple times,
   which will break if arguments have side effects. Also, it uses 
   double-underscore temporary variables - which is dangerous as low level asm 
   ones are using those too and they may alias in unexpected ways.

 - Sloppy semantics: the naming and structure of the macro is ambigious, with
   no well-defined semantics. It's neither an atomic nor a cmpxchg() interface,
   but still it lives in include/linux/atomic.h.

Solve this by:

 - Extracting the arguments into helper variables once, and never
   using the original arguments from that point on. This makes it
   pretty clear that there can be no unwanted macro evaluation
   side effects.

 - Using single-underscore temporary variables.

 - Renaming fetch_or() to xchg_or(), recognizing that the semantics
   are xchg()-alike.

Also propagate the rename to the call sites.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: http://lkml.kernel.org/r/20160315093245.GA7943@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/atomic.h   | 26 ++++++++++++++++----------
 kernel/sched/core.c      |  2 +-
 kernel/time/tick-sched.c |  4 ++--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/include/linux/atomic.h b/include/linux/atomic.h
index 6c502cb13c95..7bc5297bcca8 100644
--- a/include/linux/atomic.h
+++ b/include/linux/atomic.h
@@ -549,22 +549,28 @@ static inline int atomic_dec_if_positive(atomic_t *v)
 #endif
 
 /**
- * fetch_or - perform *ptr |= mask and return old value of *ptr
- * @ptr: pointer to value
+ * xchg_or - perform *ptr |= mask atomically and return old value of *ptr
+ * @ptr: pointer to value (cmpxchg() compatible integer pointer type)
  * @mask: mask to OR on the value
  *
- * cmpxchg based fetch_or, macro so it works for different integer types
+ * cmpxchg() based, it's a macro so it works for different integer types.
  */
-#ifndef fetch_or
-#define fetch_or(ptr, mask)						\
-({	typeof(*(ptr)) __old, __val = *(ptr);				\
+#ifndef xchg_or
+# define xchg_or(ptr, mask)						\
+({									\
+	typeof(ptr)  _ptr  = (ptr);					\
+	typeof(mask) _mask = (mask);					\
+									\
+	typeof(*_ptr) _old, _val = *_ptr;				\
+									\
 	for (;;) {							\
-		__old = cmpxchg((ptr), __val, __val | (mask));		\
-		if (__old == __val)					\
+		_old = cmpxchg(_ptr, _val, _val | _mask);		\
+		if (_old == _val)					\
 			break;						\
-		__val = __old;						\
+		_val = _old;						\
 	}								\
-	__old;								\
+									\
+	_old;								\
 })
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e5725b931bee..bbd347c04826 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -329,7 +329,7 @@ static inline void init_hrtick(void)
 static bool set_nr_and_not_polling(struct task_struct *p)
 {
 	struct thread_info *ti = task_thread_info(p);
-	return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
+	return !(xchg_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 969e6704c3c9..851631899352 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -264,7 +264,7 @@ static void tick_nohz_dep_set_all(unsigned long *dep,
 {
 	unsigned long prev;
 
-	prev = fetch_or(dep, BIT_MASK(bit));
+	prev = xchg_or(dep, BIT_MASK(bit));
 	if (!prev)
 		tick_nohz_full_kick_all();
 }
@@ -294,7 +294,7 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
 
 	ts = per_cpu_ptr(&tick_cpu_sched, cpu);
 
-	prev = fetch_or(&ts->tick_dep_mask, BIT_MASK(bit));
+	prev = xchg_or(&ts->tick_dep_mask, BIT_MASK(bit));
 	if (!prev) {
 		preempt_disable();
 		/* Perf needs local kick that is NMI safe */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:01     ` Ingo Molnar
@ 2016-03-15 12:32       ` Ingo Molnar
  2016-03-15 12:37         ` Ingo Molnar
  2016-03-15 13:17         ` Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> But IMHO this really highlights a fundamental weakness of all this macro magic, 
> it's all way too fragile.
> 
> Why don't we introduce a boring family of APIs:
> 
> 	cmpxchg_8()
> 	cmpxchg_16()
> 	cmpxchg_32()
> 	cmpxchg_64()
> 
> 	xchg_or_32()
> 	xchg_or_64()
> 	...
> 
> ... with none of this pesky auto-typing property and none of the 
> macro-inside-a-macro crap? We could do clean types and would write them all in 
> proper C, not fragile CPP.
> 
> It's not like we migrate between the types all that frequently - and even if we 
> do, it's trivial.
> 
> hm?

So if we are still on the same page at this point, we'd have to add a pointer 
variant too I suspect:

	cmpxchg_ptr()
	xchg_ptr()

... whose bitness may differ between architectures(subarches), but it would still 
be a single variant per architecture, i.e. still with pretty clear type 
propagation and with a very clear notion of which architecture supports what.

It looks like a lot of work, but it's all low complexity work AFAICS that could be 
partly automated.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:32       ` Ingo Molnar
@ 2016-03-15 12:37         ` Ingo Molnar
  2016-03-15 13:17         ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Ingo Molnar @ 2016-03-15 12:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton


* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > But IMHO this really highlights a fundamental weakness of all this macro magic, 
> > it's all way too fragile.
> > 
> > Why don't we introduce a boring family of APIs:
> > 
> > 	cmpxchg_8()
> > 	cmpxchg_16()
> > 	cmpxchg_32()
> > 	cmpxchg_64()
> > 
> > 	xchg_or_32()
> > 	xchg_or_64()
> > 	...
> > 
> > ... with none of this pesky auto-typing property and none of the 
> > macro-inside-a-macro crap? We could do clean types and would write them all in 
> > proper C, not fragile CPP.
> > 
> > It's not like we migrate between the types all that frequently - and even if we 
> > do, it's trivial.
> > 
> > hm?
> 
> So if we are still on the same page at this point, we'd have to add a pointer 
> variant too I suspect:
> 
> 	cmpxchg_ptr()
> 	xchg_ptr()
> 
> ... whose bitness may differ between architectures(subarches), but it would still 
> be a single variant per architecture, i.e. still with pretty clear type 
> propagation and with a very clear notion of which architecture supports what.
> 
> It looks like a lot of work, but it's all low complexity work AFAICS that could be 
> partly automated.

Btw., if we do all this, we could still add auto-type API variants, but now they 
would be implemented at the highest level, with none of the auto-type complexity 
pushed down to the architecture level. Architectures just provide their set of 
APIs for a given list of types, and that's it.

I hate to see all the auto-typing complexity pushed down to the arch assembly 
level:

/*
 * Atomic compare and exchange.  Compare OLD with MEM, if identical,
 * store NEW in MEM.  Return the initial value in MEM.  Success is
 * indicated by comparing RETURN with OLD.
 */
#define __raw_cmpxchg(ptr, old, new, size, lock)			\
({									\
	__typeof__(*(ptr)) __ret;					\
	__typeof__(*(ptr)) __old = (old);				\
	__typeof__(*(ptr)) __new = (new);				\
	switch (size) {							\
	case __X86_CASE_B:						\
	{								\
		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
		asm volatile(lock "cmpxchgb %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "q" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_W:						\
	{								\
		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
		asm volatile(lock "cmpxchgw %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_L:						\
	{								\
		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
		asm volatile(lock "cmpxchgl %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_Q:						\
	{								\
		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
		asm volatile(lock "cmpxchgq %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	default:							\
		__cmpxchg_wrong_size();					\
	}								\
	__ret;								\
})

it makes things harder to read, harder to debug and harder to optimize.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:08       ` Ingo Molnar
@ 2016-03-15 12:42         ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 01:08:35PM +0100, Ingo Molnar wrote:
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, Mar 15, 2016 at 10:32:45AM +0100, Ingo Molnar wrote:
> > >  2) its naming sucks. "fetch_or()" does not really signal that it's a
> > >     fundamentally atomic operation, nor what API family it belongs to.
> > 
> > I disagree there, the fetch-$op naming is widely used for atomic
> > operations that return the previous value. See for example the C/C++11
> > atomic ops.
> 
> The problem I see is that we don't really have the fetch_*() naming in the kernel 
> right now, while we do have the xchg_*() naming. The latter is 'obviously' an 
> atomic operation - while 'fetch' could be anything.

We don't have xchg_*() naming, we have xchg() and that's about it. And
yes, people know xchg() is an atomic op. But 'fetch (and) or' is also
atomic, it has to be, it needs to do 2 operations in 1.

Furthermore, the relevant wikipedia page is:

  https://en.wikipedia.org/wiki/Fetch-and-add

So the naming is widely established.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:32       ` Ingo Molnar
  2016-03-15 12:37         ` Ingo Molnar
@ 2016-03-15 13:17         ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 13:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 01:32:53PM +0100, Ingo Molnar wrote:
> So if we are still on the same page at this point, we'd have to add a pointer 
> variant too I suspect:
> 
> 	cmpxchg_ptr()
> 	xchg_ptr()

We typically call those _long(), like atomic_long_t etc.. Under the
assumption that sizeof(long) == sizeof(void *).

Also, _IF_ we're going to do this, this is the perfect opportunity to
also introduce the __atomic address space and load() and store()
operations. This is needed for architectures where:

	cmpxchg()/xchg()

isn't safe against regular stores, like all those who implement atomics
using hashed spinlocked.

Its going to be a horrid lot of work though :/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:21     ` [PATCH v2] " Ingo Molnar
@ 2016-03-15 13:26       ` Peter Zijlstra
  2016-03-16  8:04         ` Ingo Molnar
  2016-03-15 17:08       ` Frederic Weisbecker
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-15 13:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Tue, Mar 15, 2016 at 01:21:45PM +0100, Ingo Molnar wrote:
>  - Renaming fetch_or() to xchg_or(), recognizing that the semantics
>    are xchg()-alike.

Let me add another argument for why I don't like the "exchange/swap
(and) add" naming.

Exchange (and swap) replace one value for another, like:

	old = xchg(ptr, val);

Whatever was there, gets replaced by the independent value in @val.
Straight up replacement.

However with something like xchg_or, you don't do a direct replacement
with an unrelated value. Instead you modify the pre-existing value. So
there really isn't an exchange at all.

So "fetch (and) or" really describes the operation better. You load
(fetch) the value and then modify it, in an indivisible (aka atomic)
fashion.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
                       ` (4 preceding siblings ...)
  2016-03-15 12:21     ` [PATCH v2] " Ingo Molnar
@ 2016-03-15 16:18     ` Linus Torvalds
  5 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2016-03-15 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Tue, Mar 15, 2016 at 2:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> I can see two other problems with it as well:
>
>  1) 'ptr' may get evaluated multiple times as well, not just 'mask'.

Right you are.

>  2) its naming sucks. "fetch_or()" does not really signal that it's a
>     fundamentally atomic operation, nor what API family it belongs to.

Actually, I like the naming, I just don't like the implementation.

It's not just the "evaluated multiple times", it's also the fact that
if we start doing "fetch_or()", then some day we'll want "fetch_and()"
or "fetch_clear()" etc.

Or, in fact, "fetch_add()", which is often closer to what hardware
does than the "atomic_add_return()" we have now (the difference is
that "fetch_add()" returns the original value, while
"atomic_add_return()" returns the end result).

So what I meant with "try to match what we already have in our
existing atomic.h" is that we do the _infrastructure_ so well. We do
it well both by having separate UP and SMP versions of the underlying
helpers, but we do it well by having those helpers that are then used
to implement the different atomic versions.

So I'd like something similar for the "fetch_op" thing.

                   Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int'
  2016-03-15 12:15     ` Ingo Molnar
@ 2016-03-15 16:30       ` Linus Torvalds
  2016-03-15 17:28         ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2016-03-15 16:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linux Kernel Mailing List, Frédéric Weisbecker,
	Thomas Gleixner, Peter Zijlstra, Andrew Morton

On Tue, Mar 15, 2016 at 5:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Hm, so the problem I did not consider is the following:
>
>  triton:~/tip> git grep tick_dep_mask kernel | grep _bit
>  kernel/time/tick-sched.c:       clear_bit(bit, &ts->tick_dep_mask);
>  kernel/time/tick-sched.c:       clear_bit(bit, &tsk->tick_dep_mask);
>  kernel/time/tick-sched.c:       clear_bit(bit, &sig->tick_dep_mask);

Ahh. I missed that too.

Ok, leave the 64-bit field for now, while we think about this. But one
option is to just use "atomic_andnot()" instead of clear_bit().

That would imply using an "atomic_t", which would be fairly natural
(and would be 32-bit).

And wouldn't it be so nice if "thread_info->flags" would just be
atomic_t too. Right now we use a mixture of bit-ops and ACCESS_ONCE()
(and many codepaths then doing neither, and just accessing it
directly, ignoring any races.

Oh well.

                Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 12:21     ` [PATCH v2] " Ingo Molnar
  2016-03-15 13:26       ` Peter Zijlstra
@ 2016-03-15 17:08       ` Frederic Weisbecker
  2016-03-16  8:14         ` Ingo Molnar
  1 sibling, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2016-03-15 17:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton


Thanks a lot for fixing this!
Some comments below:

On Tue, Mar 15, 2016 at 01:21:45PM +0100, Ingo Molnar wrote:
> Linus noticed a couple of problems with the fetch_or() implementation introduced
> by 5fd7a09cfb8c ("atomic: Export fetch_or()"):
> 
>  - Sloppy macro implementation: 'mask' and 'ptr' is evaluated multiple times,
>    which will break if arguments have side effects. Also, it uses 
>    double-underscore temporary variables - which is dangerous as low level asm 
>    ones are using those too and they may alias in unexpected ways.
> 
>  - Sloppy semantics: the naming and structure of the macro is ambigious, with
>    no well-defined semantics. It's neither an atomic nor a cmpxchg() interface,
>    but still it lives in include/linux/atomic.h.
> 
> Solve this by:
> 
>  - Extracting the arguments into helper variables once, and never
>    using the original arguments from that point on. This makes it
>    pretty clear that there can be no unwanted macro evaluation
>    side effects.
> 
>  - Using single-underscore temporary variables.
> 
>  - Renaming fetch_or() to xchg_or(), recognizing that the semantics
>    are xchg()-alike.

I wouldn't mind that much having xchg_or() but I think Peterz has good arguments in favour
of keeping the original name.

> 
> Also propagate the rename to the call sites.
> 
> Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Chris Metcalf <cmetcalf@ezchip.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Frederic Weisbecker <fweisbec@gmail.com>
> Cc: Luiz Capitulino <lcapitulino@redhat.com>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Link: http://lkml.kernel.org/r/20160315093245.GA7943@gmail.com
> Signed-off-by: Ingo Molnar <mingo@kernel.org>
> ---
>  include/linux/atomic.h   | 26 ++++++++++++++++----------
>  kernel/sched/core.c      |  2 +-
>  kernel/time/tick-sched.c |  4 ++--
>  3 files changed, 19 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 6c502cb13c95..7bc5297bcca8 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -549,22 +549,28 @@ static inline int atomic_dec_if_positive(atomic_t *v)
>  #endif
>  
>  /**
> - * fetch_or - perform *ptr |= mask and return old value of *ptr
> - * @ptr: pointer to value
> + * xchg_or - perform *ptr |= mask atomically and return old value of *ptr
> + * @ptr: pointer to value (cmpxchg() compatible integer pointer type)
>   * @mask: mask to OR on the value
>   *
> - * cmpxchg based fetch_or, macro so it works for different integer types
> + * cmpxchg() based, it's a macro so it works for different integer types.
>   */
> -#ifndef fetch_or
> -#define fetch_or(ptr, mask)						\
> -({	typeof(*(ptr)) __old, __val = *(ptr);				\
> +#ifndef xchg_or
> +# define xchg_or(ptr, mask)						\
> +({									\
> +	typeof(ptr)  _ptr  = (ptr);					\

Can we add a comment above to prevent from future mistakes with cmpxchg
variables aliasing?

I'm suprised that GCC doesn't warn about that actually.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int'
  2016-03-15 16:30       ` Linus Torvalds
@ 2016-03-15 17:28         ` Frederic Weisbecker
  2016-03-15 17:36           ` Linus Torvalds
  0 siblings, 1 reply; 27+ messages in thread
From: Frederic Weisbecker @ 2016-03-15 17:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Tue, Mar 15, 2016 at 09:30:49AM -0700, Linus Torvalds wrote:
> On Tue, Mar 15, 2016 at 5:15 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > Hm, so the problem I did not consider is the following:
> >
> >  triton:~/tip> git grep tick_dep_mask kernel | grep _bit
> >  kernel/time/tick-sched.c:       clear_bit(bit, &ts->tick_dep_mask);
> >  kernel/time/tick-sched.c:       clear_bit(bit, &tsk->tick_dep_mask);
> >  kernel/time/tick-sched.c:       clear_bit(bit, &sig->tick_dep_mask);

Ah! Now I remember why I used unsigned long :-s

> 
> Ahh. I missed that too.
> 
> Ok, leave the 64-bit field for now, while we think about this. But one
> option is to just use "atomic_andnot()" instead of clear_bit().
> 
> That would imply using an "atomic_t", which would be fairly natural
> (and would be 32-bit).

I can try that. And then we would end up with atomic_fetch_or() and maybe
the scheduler could keep its ad-hoc version until thread_info types get unified
(if that's possible).

> 
> And wouldn't it be so nice if "thread_info->flags" would just be
> atomic_t too. Right now we use a mixture of bit-ops and ACCESS_ONCE()
> (and many codepaths then doing neither, and just accessing it
> directly, ignoring any races.
> 
> Oh well.

I can try that too while at it :-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int'
  2016-03-15 17:28         ` Frederic Weisbecker
@ 2016-03-15 17:36           ` Linus Torvalds
  0 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2016-03-15 17:36 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Tue, Mar 15, 2016 at 10:28 AM, Frederic Weisbecker
<fweisbec@gmail.com> wrote:
>>
>> And wouldn't it be so nice if "thread_info->flags" would just be
>> atomic_t too. Right now we use a mixture of bit-ops and ACCESS_ONCE()
>> (and many codepaths then doing neither, and just accessing it
>> directly, ignoring any races.
>>
>> Oh well.
>
> I can try that too while at it :-)

The real problem with thread_info->flags is that it's
per-architecture, and people access that thing from assembly code.

So changing that is potentially really *very* painful. The patch
itself is not likely too bad, but having confidence that everything
was caught?

Nasty. It would probably be a very good cleanup, but I'm not sure how
worthwhile it is considering the pain.

(The only good news is that any problem is likely to be *very*
obvious. Switching the field to a 32-bit type on a big-endian machine
and not catching some case will mean that people will check or modify
the wrong flags, and you'd likely get breakage quite quickly)

                 Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 13:26       ` Peter Zijlstra
@ 2016-03-16  8:04         ` Ingo Molnar
  2016-03-16  8:29           ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-16  8:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Mar 15, 2016 at 01:21:45PM +0100, Ingo Molnar wrote:
> >  - Renaming fetch_or() to xchg_or(), recognizing that the semantics
> >    are xchg()-alike.
> 
> Let me add another argument for why I don't like the "exchange/swap
> (and) add" naming.
> 
> Exchange (and swap) replace one value for another, like:
> 
> 	old = xchg(ptr, val);
> 
> Whatever was there, gets replaced by the independent value in @val.
> Straight up replacement.
> 
> However with something like xchg_or, you don't do a direct replacement
> with an unrelated value. Instead you modify the pre-existing value. So
> there really isn't an exchange at all.
> 
> So "fetch (and) or" really describes the operation better. You load
> (fetch) the value and then modify it, in an indivisible (aka atomic)
> fashion.

Ok!

Could we at least somehow sneak the notion of 'atomicity' into it?

	fetch_or()
	fetch_and()
	fetch_not()

vs.

	fetch_atomic_or()
	fetch_atomic_and()
	fetch_atomic_not()

vs.

	atomic_fetch_or()
	atomic_fetch_and()
	atomic_fetch_not()

?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-15 17:08       ` Frederic Weisbecker
@ 2016-03-16  8:14         ` Ingo Molnar
  2016-03-17  0:54           ` Frederic Weisbecker
  0 siblings, 1 reply; 27+ messages in thread
From: Ingo Molnar @ 2016-03-16  8:14 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton


* Frederic Weisbecker <fweisbec@gmail.com> wrote:

> > diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> > index 6c502cb13c95..7bc5297bcca8 100644
> > --- a/include/linux/atomic.h
> > +++ b/include/linux/atomic.h
> > @@ -549,22 +549,28 @@ static inline int atomic_dec_if_positive(atomic_t *v)
> >  #endif
> >  
> >  /**
> > - * fetch_or - perform *ptr |= mask and return old value of *ptr
> > - * @ptr: pointer to value
> > + * xchg_or - perform *ptr |= mask atomically and return old value of *ptr
> > + * @ptr: pointer to value (cmpxchg() compatible integer pointer type)
> >   * @mask: mask to OR on the value
> >   *
> > - * cmpxchg based fetch_or, macro so it works for different integer types
> > + * cmpxchg() based, it's a macro so it works for different integer types.
> >   */
> > -#ifndef fetch_or
> > -#define fetch_or(ptr, mask)						\
> > -({	typeof(*(ptr)) __old, __val = *(ptr);				\
> > +#ifndef xchg_or
> > +# define xchg_or(ptr, mask)						\
> > +({									\
> > +	typeof(ptr)  _ptr  = (ptr);					\
> 
> Can we add a comment above to prevent from future mistakes with cmpxchg
> variables aliasing?
> 
> I'm suprised that GCC doesn't warn about that actually.

Yeah, so in the perf tooling build we do have -Wshadow to catch such mishaps,
but not in the main kernel build.

... and yes, if I add it via the patch below the bug gets warned about:

 include/linux/atomic.h:561:15: note: shadowed declaration is here
   typeof(ptr)  __ptr  = (ptr);     \
               ^
 kernel/sched/core.c:332:11: note: in expansion of macro ‘xchg_or’
   return !(xchg_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLA

... but I also get a ton of other warnings, just when building a single 
kernel/sched/core.o file:

 ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
 ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
 include/linux/jiffies.h:422:60: warning: declaration of ‘jiffies’ shadows a global declaration [-Wshadow]
 ./arch/x86/include/asm/io_apic.h:187:54: warning: declaration of ‘apic’ shadows a global declaration [-Wshadow]
 ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
 include/linux/jiffies.h:422:60: warning: declaration of ‘jiffies’ shadows a global declaration [-Wshadow]
 ./arch/x86/include/asm/io_apic.h:187:54: warning: declaration of ‘apic’ shadows a global declaration [-Wshadow]
 include/linux/kernel.h:750:12: warning: declaration of ‘_min1’ shadows a previous local [-Wshadow]
 include/linux/kernel.h:750:12: warning: declaration of ‘_min1’ shadows a previous local [-Wshadow]
 include/linux/kernel.h:751:12: warning: declaration of ‘_min2’ shadows a previous local [-Wshadow]
 kernel/sched/sched.h:308:43: warning: declaration of ‘down’ shadows a global declaration [-Wshadow]
 kernel/sched/sched.h:308:60: warning: declaration of ‘up’ shadows a global declaration [-Wshadow]
 kernel/sched/auto_group.h:44:55: warning: declaration of ‘init_task’ shadows a global declaration [-Wshadow]
 kernel/sched/core.c:635:20: warning: declaration of ‘down’ shadows a global declaration [-Wshadow]
 kernel/sched/core.c:635:37: warning: declaration of ‘up’ shadows a global declaration [-Wshadow]

and yes, I'd say most of these are signatures of sloppy macros and sloppy variable 
names - but it would be a ton of work to eliminate these warnings.

Thanks,

	Ingo

==============>

 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 7b3ecdcdc6c1..14b0ce82f2b0 100644
--- a/Makefile
+++ b/Makefile
@@ -776,6 +776,9 @@ KBUILD_CFLAGS   += $(call cc-option,-Werror=date-time)
 # use the deterministic mode of AR if available
 KBUILD_ARFLAGS := $(call ar-option,D)
 
+# enforce correct pointer usage
+KBUILD_CFLAGS	+= $(call cc-option,-Wshadow)
+
 # check for 'asm goto'
 ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y)
 	KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-16  8:04         ` Ingo Molnar
@ 2016-03-16  8:29           ` Peter Zijlstra
  0 siblings, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2016-03-16  8:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List,
	Frédéric Weisbecker, Thomas Gleixner, Andrew Morton

On Wed, Mar 16, 2016 at 09:04:37AM +0100, Ingo Molnar wrote:
> 	atomic_fetch_or()
> 	atomic_fetch_and()
> 	atomic_fetch_not()

Sure we can go add those. I have a flight scheduled early next month,
I'll try and have a go at adding all the fetch_$op variants to all
arches.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()'
  2016-03-16  8:14         ` Ingo Molnar
@ 2016-03-17  0:54           ` Frederic Weisbecker
  0 siblings, 0 replies; 27+ messages in thread
From: Frederic Weisbecker @ 2016-03-17  0:54 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner,
	Peter Zijlstra, Andrew Morton

On Wed, Mar 16, 2016 at 09:14:44AM +0100, Ingo Molnar wrote:
> Yeah, so in the perf tooling build we do have -Wshadow to catch such mishaps,
> but not in the main kernel build.
> 
> ... and yes, if I add it via the patch below the bug gets warned about:
> 
>  include/linux/atomic.h:561:15: note: shadowed declaration is here
>    typeof(ptr)  __ptr  = (ptr);     \
>                ^
>  kernel/sched/core.c:332:11: note: in expansion of macro ‘xchg_or’
>    return !(xchg_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLA
> 
> ... but I also get a ton of other warnings, just when building a single 
> kernel/sched/core.o file:
> 
>  ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
>  ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
>  include/linux/jiffies.h:422:60: warning: declaration of ‘jiffies’ shadows a global declaration [-Wshadow]
>  ./arch/x86/include/asm/io_apic.h:187:54: warning: declaration of ‘apic’ shadows a global declaration [-Wshadow]
>  ./arch/x86/include/asm/bitops.h:396:28: warning: declaration of ‘ffs’ shadows a built-in function [-Wshadow]
>  include/linux/jiffies.h:422:60: warning: declaration of ‘jiffies’ shadows a global declaration [-Wshadow]
>  ./arch/x86/include/asm/io_apic.h:187:54: warning: declaration of ‘apic’ shadows a global declaration [-Wshadow]
>  include/linux/kernel.h:750:12: warning: declaration of ‘_min1’ shadows a previous local [-Wshadow]
>  include/linux/kernel.h:750:12: warning: declaration of ‘_min1’ shadows a previous local [-Wshadow]
>  include/linux/kernel.h:751:12: warning: declaration of ‘_min2’ shadows a previous local [-Wshadow]
>  kernel/sched/sched.h:308:43: warning: declaration of ‘down’ shadows a global declaration [-Wshadow]
>  kernel/sched/sched.h:308:60: warning: declaration of ‘up’ shadows a global declaration [-Wshadow]
>  kernel/sched/auto_group.h:44:55: warning: declaration of ‘init_task’ shadows a global declaration [-Wshadow]
>  kernel/sched/core.c:635:20: warning: declaration of ‘down’ shadows a global declaration [-Wshadow]
>  kernel/sched/core.c:635:37: warning: declaration of ‘up’ shadows a global declaration [-Wshadow]

Heh, what else can we expect from global functions named up() and down() in a millions-lines C project :-)

> 
> and yes, I'd say most of these are signatures of sloppy macros and sloppy variable 
> names - but it would be a ton of work to eliminate these warnings.

Yeah that's what I was afraid of.

Thanks for trying it though!

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-03-17  0:54 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-14 12:32 [GIT PULL] NOHZ updates for v4.6 Ingo Molnar
2016-03-15  2:44 ` Linus Torvalds
2016-03-15  8:42   ` Peter Zijlstra
2016-03-15  9:49     ` Ingo Molnar
2016-03-15  9:32   ` [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg_or()' Ingo Molnar
2016-03-15 10:50     ` Peter Zijlstra
2016-03-15 12:08       ` Ingo Molnar
2016-03-15 12:42         ` Peter Zijlstra
2016-03-15 11:06     ` Peter Zijlstra
2016-03-15 11:59     ` Peter Zijlstra
2016-03-15 12:01     ` Ingo Molnar
2016-03-15 12:32       ` Ingo Molnar
2016-03-15 12:37         ` Ingo Molnar
2016-03-15 13:17         ` Peter Zijlstra
2016-03-15 12:21     ` [PATCH v2] " Ingo Molnar
2016-03-15 13:26       ` Peter Zijlstra
2016-03-16  8:04         ` Ingo Molnar
2016-03-16  8:29           ` Peter Zijlstra
2016-03-15 17:08       ` Frederic Weisbecker
2016-03-16  8:14         ` Ingo Molnar
2016-03-17  0:54           ` Frederic Weisbecker
2016-03-15 16:18     ` [PATCH] " Linus Torvalds
2016-03-15  9:53   ` [PATCH] nohz: Change tick_dep_mask from 'unsigned long' to 'unsigned int' Ingo Molnar
2016-03-15 12:15     ` Ingo Molnar
2016-03-15 16:30       ` Linus Torvalds
2016-03-15 17:28         ` Frederic Weisbecker
2016-03-15 17:36           ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).