linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] thermal/intel_powerclamp: Conversion to kthread worker API and new CPU hotplug state
@ 2016-10-17 12:32 Petr Mladek
  2016-10-17 12:32 ` [PATCH 1/3] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Petr Mladek @ 2016-10-17 12:32 UTC (permalink / raw)
  To: Jacob Pan, Zhang Rui, Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Eduardo Valentin, Tejun Heo,
	Peter Zijlstra, linux-pm, linux-kernel, Petr Mladek

The kthread worker API enhancements are in 4.9-rc1. Therefore
we can finally convert the intel_powerclamp kthreads to it.

The API hides the rather tricky code for the sleeping, freezing,
and exiting checks. It should help avoid races and make these
operations easier to maintain.

Sebastian asked me to also send the conversion to the new
CPU hotplug state machine on top of the kthread conversion.

IMPORTANT: 

I have tested this on top of 4.9-rc1, but I needed to add
("sched/fair: Fix sched domains NULL deference in select_idle_sibling()")
from linux-tip, see
https://lkml.kernel.org/r/tip-9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441@git.kernel.org
Otherwise, CPU hotplug failed very often.


Petr Mladek (2):
  thermal/intel_powerclamp: Remove duplicated code that starts the
    kthread
  thermal/intel_powerclamp: Convert the kthread to kthread worker API

Sebastian Andrzej Siewior (1):
  thermal/intel_powerclamp: Convert to CPU hotplug state

 drivers/thermal/intel_powerclamp.c | 374 +++++++++++++++++++++----------------
 1 file changed, 209 insertions(+), 165 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/3] thermal/intel_powerclamp: Remove duplicated code that starts the kthread
  2016-10-17 12:32 [PATCH 0/3] thermal/intel_powerclamp: Conversion to kthread worker API and new CPU hotplug state Petr Mladek
@ 2016-10-17 12:32 ` Petr Mladek
  2016-10-17 12:32 ` [PATCH 2/3] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
  2016-10-17 12:32 ` [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state Petr Mladek
  2 siblings, 0 replies; 17+ messages in thread
From: Petr Mladek @ 2016-10-17 12:32 UTC (permalink / raw)
  To: Jacob Pan, Zhang Rui, Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Eduardo Valentin, Tejun Heo,
	Peter Zijlstra, linux-pm, linux-kernel, Petr Mladek

This patch removes code duplication. It does not modify
the functionality.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: linux-pm@vger.kernel.org
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/thermal/intel_powerclamp.c | 45 +++++++++++++++++---------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 0e4dc0afcfd2..63657d193db5 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -508,10 +508,27 @@ static void poll_pkg_cstate(struct work_struct *dummy)
 		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
+static void start_power_clamp_thread(unsigned long cpu)
+{
+	struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
+	struct task_struct *thread;
+
+	thread = kthread_create_on_node(clamp_thread,
+					(void *) cpu,
+					cpu_to_node(cpu),
+					"kidle_inject/%ld", cpu);
+	if (IS_ERR(thread))
+		return;
+
+	/* bind to cpu here */
+	kthread_bind(thread, cpu);
+	wake_up_process(thread);
+	*p = thread;
+}
+
 static int start_power_clamp(void)
 {
 	unsigned long cpu;
-	struct task_struct *thread;
 
 	set_target_ratio = clamp(set_target_ratio, 0U, MAX_TARGET_RATIO - 1);
 	/* prevent cpu hotplug */
@@ -527,20 +544,7 @@ static int start_power_clamp(void)
 
 	/* start one thread per online cpu */
 	for_each_online_cpu(cpu) {
-		struct task_struct **p =
-			per_cpu_ptr(powerclamp_thread, cpu);
-
-		thread = kthread_create_on_node(clamp_thread,
-						(void *) cpu,
-						cpu_to_node(cpu),
-						"kidle_inject/%ld", cpu);
-		/* bind to cpu here */
-		if (likely(!IS_ERR(thread))) {
-			kthread_bind(thread, cpu);
-			wake_up_process(thread);
-			*p = thread;
-		}
-
+		start_power_clamp_thread(cpu);
 	}
 	put_online_cpus();
 
@@ -572,7 +576,6 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
 	unsigned long cpu = (unsigned long)hcpu;
-	struct task_struct *thread;
 	struct task_struct **percpu_thread =
 		per_cpu_ptr(powerclamp_thread, cpu);
 
@@ -581,15 +584,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 
 	switch (action) {
 	case CPU_ONLINE:
-		thread = kthread_create_on_node(clamp_thread,
-						(void *) cpu,
-						cpu_to_node(cpu),
-						"kidle_inject/%lu", cpu);
-		if (likely(!IS_ERR(thread))) {
-			kthread_bind(thread, cpu);
-			wake_up_process(thread);
-			*percpu_thread = thread;
-		}
+		start_power_clamp_thread(cpu);
 		/* prefer BSP as controlling CPU */
 		if (cpu == 0) {
 			control_cpu = 0;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/3] thermal/intel_powerclamp: Convert the kthread to kthread worker API
  2016-10-17 12:32 [PATCH 0/3] thermal/intel_powerclamp: Conversion to kthread worker API and new CPU hotplug state Petr Mladek
  2016-10-17 12:32 ` [PATCH 1/3] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
@ 2016-10-17 12:32 ` Petr Mladek
  2016-10-17 12:32 ` [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state Petr Mladek
  2 siblings, 0 replies; 17+ messages in thread
From: Petr Mladek @ 2016-10-17 12:32 UTC (permalink / raw)
  To: Jacob Pan, Zhang Rui, Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Eduardo Valentin, Tejun Heo,
	Peter Zijlstra, linux-pm, linux-kernel, Petr Mladek

Kthreads are currently implemented as an infinite loop. Each
has its own variant of checks for terminating, freezing, and
awakening. In many cases it is unclear which state a kthread
is in, and sometimes the checks are done the wrong way.

The plan is to convert kthreads to the kthread_worker or workqueue
API. It allows the functionality to be split into separate operations
and gives the code a better structure. It also defines a clean state
where no locks are taken, no IRQs are blocked, and the kthread might
sleep or even be safely migrated.

The kthread worker API is useful when we want a dedicated
single thread for the work. It helps to make sure that the thread
is available when needed. It also allows better control, e.g.
defining a scheduling priority.

This patch converts the intel_powerclamp kthreads to the kthread
worker API because they need good control over the assigned CPUs.

IMHO, the most natural way is to split one cycle into two works.
The first one does some balancing and lets the CPU work normally
for some time. The second work checks what the CPU has done
in the meantime and puts it into a C-state to reach the required
idle time ratio. The delay between the two works is achieved
by using a delayed kthread work.

The two works have to share some data that used to be local
variables of the single kthread function. This is achieved
by the new per-CPU struct powerclamp_worker_data. It might look
like a complication. On the other hand, the long original kthread
function was not nice either.

The patch tries to avoid extra init and cleanup works. All these
actions can be done outside the thread. They are moved to the
functions that create or destroy the worker. In particular,
I checked that the timers are assigned to the right CPU.

The two works queue each other. This makes it a bit tricky to
break the cycle when we want to stop the worker. We use the global
and per-worker "clamping" variables to make sure that the re-queuing
eventually stops. We also cancel the works to make it stop faster.
Note that the canceling is not reliable because the handling
of the two variables and the queuing are not synchronized via a lock.
But it is not a big deal because it is just an optimization;
the job is stopped faster than before in most cases.

Signed-off-by: Petr Mladek <pmladek@suse.com>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Eduardo Valentin <edubezval@gmail.com>
CC: Jacob Pan <jacob.jun.pan@linux.intel.com>
CC: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
CC: linux-pm@vger.kernel.org
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/thermal/intel_powerclamp.c | 292 +++++++++++++++++++++----------------
 1 file changed, 170 insertions(+), 122 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 63657d193db5..a94f7c849a4e 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -86,11 +86,27 @@
 				  */
 static bool clamping;
 
+static const struct sched_param sparam = {
+	.sched_priority = MAX_USER_RT_PRIO / 2,
+};
+struct powerclamp_worker_data {
+	struct kthread_worker *worker;
+	struct kthread_work balancing_work;
+	struct kthread_delayed_work idle_injection_work;
+	struct timer_list wakeup_timer;
+	unsigned int cpu;
+	unsigned int count;
+	unsigned int guard;
+	unsigned int window_size_now;
+	unsigned int target_ratio;
+	unsigned int duration_jiffies;
+	bool clamping;
+};
 
-static struct task_struct * __percpu *powerclamp_thread;
+static struct powerclamp_worker_data * __percpu worker_data;
 static struct thermal_cooling_device *cooling_dev;
 static unsigned long *cpu_clamping_mask;  /* bit map for tracking per cpu
-					   * clamping thread
+					   * clamping kthread worker
 					   */
 
 static unsigned int duration;
@@ -368,103 +384,104 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio,
 	return set_target_ratio + guard <= current_ratio;
 }
 
-static int clamp_thread(void *arg)
+static void clamp_balancing_func(struct kthread_work *work)
 {
-	int cpunr = (unsigned long)arg;
-	DEFINE_TIMER(wakeup_timer, noop_timer, 0, 0);
-	static const struct sched_param param = {
-		.sched_priority = MAX_USER_RT_PRIO/2,
-	};
-	unsigned int count = 0;
-	unsigned int target_ratio;
+	struct powerclamp_worker_data *w_data;
+	int sleeptime;
+	unsigned long target_jiffies;
+	unsigned int compensated_ratio;
+	int interval; /* jiffies to sleep for each attempt */
 
-	set_bit(cpunr, cpu_clamping_mask);
-	set_freezable();
-	init_timer_on_stack(&wakeup_timer);
-	sched_setscheduler(current, SCHED_FIFO, &param);
-
-	while (true == clamping && !kthread_should_stop() &&
-		cpu_online(cpunr)) {
-		int sleeptime;
-		unsigned long target_jiffies;
-		unsigned int guard;
-		unsigned int compensated_ratio;
-		int interval; /* jiffies to sleep for each attempt */
-		unsigned int duration_jiffies = msecs_to_jiffies(duration);
-		unsigned int window_size_now;
-
-		try_to_freeze();
-		/*
-		 * make sure user selected ratio does not take effect until
-		 * the next round. adjust target_ratio if user has changed
-		 * target such that we can converge quickly.
-		 */
-		target_ratio = set_target_ratio;
-		guard = 1 + target_ratio/20;
-		window_size_now = window_size;
-		count++;
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      balancing_work);
 
-		/*
-		 * systems may have different ability to enter package level
-		 * c-states, thus we need to compensate the injected idle ratio
-		 * to achieve the actual target reported by the HW.
-		 */
-		compensated_ratio = target_ratio +
-			get_compensation(target_ratio);
-		if (compensated_ratio <= 0)
-			compensated_ratio = 1;
-		interval = duration_jiffies * 100 / compensated_ratio;
-
-		/* align idle time */
-		target_jiffies = roundup(jiffies, interval);
-		sleeptime = target_jiffies - jiffies;
-		if (sleeptime <= 0)
-			sleeptime = 1;
-		schedule_timeout_interruptible(sleeptime);
-		/*
-		 * only elected controlling cpu can collect stats and update
-		 * control parameters.
-		 */
-		if (cpunr == control_cpu && !(count%window_size_now)) {
-			should_skip =
-				powerclamp_adjust_controls(target_ratio,
-							guard, window_size_now);
-			smp_mb();
-		}
+	/*
+	 * make sure user selected ratio does not take effect until
+	 * the next round. adjust target_ratio if user has changed
+	 * target such that we can converge quickly.
+	 */
+	w_data->target_ratio = READ_ONCE(set_target_ratio);
+	w_data->guard = 1 + w_data->target_ratio / 20;
+	w_data->window_size_now = window_size;
+	w_data->duration_jiffies = msecs_to_jiffies(duration);
+	w_data->count++;
+
+	/*
+	 * systems may have different ability to enter package level
+	 * c-states, thus we need to compensate the injected idle ratio
+	 * to achieve the actual target reported by the HW.
+	 */
+	compensated_ratio = w_data->target_ratio +
+		get_compensation(w_data->target_ratio);
+	if (compensated_ratio <= 0)
+		compensated_ratio = 1;
+	interval = w_data->duration_jiffies * 100 / compensated_ratio;
+
+	/* align idle time */
+	target_jiffies = roundup(jiffies, interval);
+	sleeptime = target_jiffies - jiffies;
+	if (sleeptime <= 0)
+		sleeptime = 1;
+
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		kthread_queue_delayed_work(w_data->worker,
+					   &w_data->idle_injection_work,
+					   sleeptime);
+}
+
+static void clamp_idle_injection_func(struct kthread_work *work)
+{
+	struct powerclamp_worker_data *w_data;
+	unsigned long target_jiffies;
+
+	w_data = container_of(work, struct powerclamp_worker_data,
+			      idle_injection_work.work);
+
+	/*
+	 * only elected controlling cpu can collect stats and update
+	 * control parameters.
+	 */
+	if (w_data->cpu == control_cpu &&
+	    !(w_data->count % w_data->window_size_now)) {
+		should_skip =
+			powerclamp_adjust_controls(w_data->target_ratio,
+						   w_data->guard,
+						   w_data->window_size_now);
+		smp_mb();
+	}
 
-		if (should_skip)
-			continue;
+	if (should_skip)
+		goto balance;
+
+	target_jiffies = jiffies + w_data->duration_jiffies;
+	mod_timer(&w_data->wakeup_timer, target_jiffies);
+	if (unlikely(local_softirq_pending()))
+		goto balance;
+	/*
+	 * stop tick sched during idle time, interrupts are still
+	 * allowed. thus jiffies are updated properly.
+	 */
+	preempt_disable();
+	/* mwait until target jiffies is reached */
+	while (time_before(jiffies, target_jiffies)) {
+		unsigned long ecx = 1;
+		unsigned long eax = target_mwait;
 
-		target_jiffies = jiffies + duration_jiffies;
-		mod_timer(&wakeup_timer, target_jiffies);
-		if (unlikely(local_softirq_pending()))
-			continue;
 		/*
-		 * stop tick sched during idle time, interrupts are still
-		 * allowed. thus jiffies are updated properly.
+		 * REVISIT: may call enter_idle() to notify drivers who
+		 * can save power during cpu idle. same for exit_idle()
 		 */
-		preempt_disable();
-		/* mwait until target jiffies is reached */
-		while (time_before(jiffies, target_jiffies)) {
-			unsigned long ecx = 1;
-			unsigned long eax = target_mwait;
-
-			/*
-			 * REVISIT: may call enter_idle() to notify drivers who
-			 * can save power during cpu idle. same for exit_idle()
-			 */
-			local_touch_nmi();
-			stop_critical_timings();
-			mwait_idle_with_hints(eax, ecx);
-			start_critical_timings();
-			atomic_inc(&idle_wakeup_counter);
-		}
-		preempt_enable();
+		local_touch_nmi();
+		stop_critical_timings();
+		mwait_idle_with_hints(eax, ecx);
+		start_critical_timings();
+		atomic_inc(&idle_wakeup_counter);
 	}
-	del_timer_sync(&wakeup_timer);
-	clear_bit(cpunr, cpu_clamping_mask);
+	preempt_enable();
 
-	return 0;
+balance:
+	if (clamping && w_data->clamping && cpu_online(w_data->cpu))
+		kthread_queue_work(w_data->worker, &w_data->balancing_work);
 }
 
 /*
@@ -508,22 +525,58 @@ static void poll_pkg_cstate(struct work_struct *dummy)
 		schedule_delayed_work(&poll_pkg_cstate_work, HZ);
 }
 
-static void start_power_clamp_thread(unsigned long cpu)
+static void start_power_clamp_worker(unsigned long cpu)
 {
-	struct task_struct **p = per_cpu_ptr(powerclamp_thread, cpu);
-	struct task_struct *thread;
-
-	thread = kthread_create_on_node(clamp_thread,
-					(void *) cpu,
-					cpu_to_node(cpu),
-					"kidle_inject/%ld", cpu);
-	if (IS_ERR(thread))
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+	struct kthread_worker *worker;
+
+	worker = kthread_create_worker_on_cpu(cpu, KTW_FREEZABLE,
+					      "kidle_inject/%ld", cpu);
+	if (IS_ERR(worker))
 		return;
 
-	/* bind to cpu here */
-	kthread_bind(thread, cpu);
-	wake_up_process(thread);
-	*p = thread;
+	w_data->worker = worker;
+	w_data->count = 0;
+	w_data->cpu = cpu;
+	w_data->clamping = true;
+	set_bit(cpu, cpu_clamping_mask);
+	setup_timer(&w_data->wakeup_timer, noop_timer, 0);
+	sched_setscheduler(worker->task, SCHED_FIFO, &sparam);
+	kthread_init_work(&w_data->balancing_work, clamp_balancing_func);
+	kthread_init_delayed_work(&w_data->idle_injection_work,
+				  clamp_idle_injection_func);
+	kthread_queue_work(w_data->worker, &w_data->balancing_work);
+}
+
+static void stop_power_clamp_worker(unsigned long cpu)
+{
+	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+
+	if (!w_data->worker)
+		return;
+
+	w_data->clamping = false;
+	/*
+	 * Make sure that all works that get queued after this point see
+	 * the clamping disabled. The counterpart is not needed because
+	 * there is an implicit memory barrier when the queued work
+	 * is processed.
+	 */
+	smp_wmb();
+	kthread_cancel_work_sync(&w_data->balancing_work);
+	kthread_cancel_delayed_work_sync(&w_data->idle_injection_work);
+	/*
+	 * The balancing work still might be queued here because
+	 * the handling of the "clamping" variable, cancel, and queue
+	 * operations are not synchronized via a lock. But it is not
+	 * a big deal. The balancing work is fast and destroy kthread
+	 * will wait for it.
+	 */
+	del_timer_sync(&w_data->wakeup_timer);
+	clear_bit(w_data->cpu, cpu_clamping_mask);
+	kthread_destroy_worker(w_data->worker);
+
+	w_data->worker = NULL;
 }
 
 static int start_power_clamp(void)
@@ -542,9 +595,9 @@ static int start_power_clamp(void)
 	clamping = true;
 	schedule_delayed_work(&poll_pkg_cstate_work, 0);
 
-	/* start one thread per online cpu */
+	/* start one kthread worker per online cpu */
 	for_each_online_cpu(cpu) {
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 	}
 	put_online_cpus();
 
@@ -554,20 +607,17 @@ static int start_power_clamp(void)
 static void end_power_clamp(void)
 {
 	int i;
-	struct task_struct *thread;
 
-	clamping = false;
 	/*
-	 * make clamping visible to other cpus and give per cpu clamping threads
-	 * sometime to exit, or gets killed later.
+	 * Block requeuing in all the kthread workers. They will flush and
+	 * stop faster.
 	 */
-	smp_mb();
-	msleep(20);
+	clamping = false;
 	if (bitmap_weight(cpu_clamping_mask, num_possible_cpus())) {
 		for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
-			pr_debug("clamping thread for cpu %d alive, kill\n", i);
-			thread = *per_cpu_ptr(powerclamp_thread, i);
-			kthread_stop(thread);
+			pr_debug("clamping worker for cpu %d alive, destroy\n",
+				 i);
+			stop_power_clamp_worker(i);
 		}
 	}
 }
@@ -576,15 +626,13 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 				unsigned long action, void *hcpu)
 {
 	unsigned long cpu = (unsigned long)hcpu;
-	struct task_struct **percpu_thread =
-		per_cpu_ptr(powerclamp_thread, cpu);
 
 	if (false == clamping)
 		goto exit_ok;
 
 	switch (action) {
 	case CPU_ONLINE:
-		start_power_clamp_thread(cpu);
+		start_power_clamp_worker(cpu);
 		/* prefer BSP as controlling CPU */
 		if (cpu == 0) {
 			control_cpu = 0;
@@ -595,7 +643,7 @@ static int powerclamp_cpu_callback(struct notifier_block *nfb,
 		if (test_bit(cpu, cpu_clamping_mask)) {
 			pr_err("cpu %lu dead but powerclamping thread is not\n",
 				cpu);
-			kthread_stop(*percpu_thread);
+			stop_power_clamp_worker(cpu);
 		}
 		if (cpu == control_cpu) {
 			control_cpu = smp_processor_id();
@@ -759,8 +807,8 @@ static int __init powerclamp_init(void)
 	window_size = 2;
 	register_hotcpu_notifier(&powerclamp_cpu_notifier);
 
-	powerclamp_thread = alloc_percpu(struct task_struct *);
-	if (!powerclamp_thread) {
+	worker_data = alloc_percpu(struct powerclamp_worker_data);
+	if (!worker_data) {
 		retval = -ENOMEM;
 		goto exit_unregister;
 	}
@@ -780,7 +828,7 @@ static int __init powerclamp_init(void)
 	return 0;
 
 exit_free_thread:
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 exit_unregister:
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 exit_free:
@@ -793,7 +841,7 @@ static void __exit powerclamp_exit(void)
 {
 	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
-	free_percpu(powerclamp_thread);
+	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-17 12:32 [PATCH 0/3] thermal/intel_powerclamp: Conversion to kthread worker API and new CPU hotplug state Petr Mladek
  2016-10-17 12:32 ` [PATCH 1/3] thermal/intel_powerclamp: Remove duplicated code that starts the kthread Petr Mladek
  2016-10-17 12:32 ` [PATCH 2/3] thermal/intel_powerclamp: Convert the kthread to kthread worker API Petr Mladek
@ 2016-10-17 12:32 ` Petr Mladek
  2016-10-21 20:21   ` Jacob Pan
  2 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-10-17 12:32 UTC (permalink / raw)
  To: Jacob Pan, Zhang Rui, Thomas Gleixner
  Cc: Sebastian Andrzej Siewior, Eduardo Valentin, Tejun Heo,
	Peter Zijlstra, linux-pm, linux-kernel, Petr Mladek

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

This is a conversion to the new hotplug state machine with
the difference that CPU_DEAD becomes CPU_PREDOWN.

At the same time it makes the handling of the two states symmetrical.
stop_power_clamp_worker() is called unconditionally and the controversial
error message is removed.

Finally, the hotplug state callbacks are removed after the powerclamping
is stopped to avoid a potential race.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[pmladek@suse.com: Fixed the possible race in powerclamp_exit()]
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 drivers/thermal/intel_powerclamp.c | 69 +++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 34 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index a94f7c849a4e..390e50b97324 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -622,43 +622,35 @@ static void end_power_clamp(void)
 	}
 }
 
-static int powerclamp_cpu_callback(struct notifier_block *nfb,
-				unsigned long action, void *hcpu)
+static int powerclamp_cpu_online(unsigned int cpu)
 {
-	unsigned long cpu = (unsigned long)hcpu;
+	if (clamping == false)
+		return 0;
+	start_power_clamp_worker(cpu);
+	/* prefer BSP as controlling CPU */
+	if (cpu == 0) {
+		control_cpu = 0;
+		smp_mb();
+	}
+	return 0;
+}
 
-	if (false == clamping)
-		goto exit_ok;
+static int powerclamp_cpu_predown(unsigned int cpu)
+{
+	if (clamping == false)
+		return 0;
 
-	switch (action) {
-	case CPU_ONLINE:
-		start_power_clamp_worker(cpu);
-		/* prefer BSP as controlling CPU */
-		if (cpu == 0) {
-			control_cpu = 0;
-			smp_mb();
-		}
-		break;
-	case CPU_DEAD:
-		if (test_bit(cpu, cpu_clamping_mask)) {
-			pr_err("cpu %lu dead but powerclamping thread is not\n",
-				cpu);
-			stop_power_clamp_worker(cpu);
-		}
-		if (cpu == control_cpu) {
-			control_cpu = smp_processor_id();
-			smp_mb();
-		}
-	}
+	stop_power_clamp_worker(cpu);
+	if (cpu != control_cpu)
+		return 0;
 
-exit_ok:
-	return NOTIFY_OK;
+	control_cpu = cpumask_first(cpu_online_mask);
+	if (control_cpu == cpu)
+		control_cpu = cpumask_next(cpu, cpu_online_mask);
+	smp_mb();
+	return 0;
 }
 
-static struct notifier_block powerclamp_cpu_notifier = {
-	.notifier_call = powerclamp_cpu_callback,
-};
-
 static int powerclamp_get_max_state(struct thermal_cooling_device *cdev,
 				 unsigned long *state)
 {
@@ -788,6 +780,8 @@ static inline void powerclamp_create_debug_files(void)
 	debugfs_remove_recursive(debug_dir);
 }
 
+static enum cpuhp_state hp_state;
+
 static int __init powerclamp_init(void)
 {
 	int retval;
@@ -805,7 +799,14 @@ static int __init powerclamp_init(void)
 
 	/* set default limit, maybe adjusted during runtime based on feedback */
 	window_size = 2;
-	register_hotcpu_notifier(&powerclamp_cpu_notifier);
+	retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					   "thermal/intel_powerclamp:online",
+					   powerclamp_cpu_online,
+					   powerclamp_cpu_predown);
+	if (retval < 0)
+		goto exit_free;
+
+	hp_state = retval;
 
 	worker_data = alloc_percpu(struct powerclamp_worker_data);
 	if (!worker_data) {
@@ -830,7 +831,7 @@ static int __init powerclamp_init(void)
 exit_free_thread:
 	free_percpu(worker_data);
 exit_unregister:
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
+	cpuhp_remove_state_nocalls(hp_state);
 exit_free:
 	kfree(cpu_clamping_mask);
 	return retval;
@@ -839,8 +840,8 @@ static int __init powerclamp_init(void)
 
 static void __exit powerclamp_exit(void)
 {
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
+	cpuhp_remove_state_nocalls(hp_state);
 	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-17 12:32 ` [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state Petr Mladek
@ 2016-10-21 20:21   ` Jacob Pan
  2016-10-24 15:48     ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2016-10-21 20:21 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Zhang Rui, Thomas Gleixner, Sebastian Andrzej Siewior,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel, jacob.jun.pan

On Mon, 17 Oct 2016 14:32:52 +0200
Petr Mladek <pmladek@suse.com> wrote:

> From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> 
Hi Sebastian,
I applied this patchset on 4.9-rc1 and ran some CPU online/offline
loop tests while injecting idle, e.g. 25%. I got a system hang after
a few cycles. Still looking into the root cause.

Thanks,

Jacob
> This is a conversion to the new hotplug state machine with
> the difference that CPU_DEAD becomes CPU_PREDOWN.
> 
> At the same time it makes the handling of the two states symmetrical.
> stop_power_clamp_worker() is called unconditionally and the
> controversial error message is removed.
> 
> Finally, the hotplug state callbacks are removed after the
> powerclamping is stopped to avoid a potential race.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> [pmladek@suse.com: Fixed the possible race in powerclamp_exit()]
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  drivers/thermal/intel_powerclamp.c | 69
> +++++++++++++++++++------------------- 1 file changed, 35
> insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/thermal/intel_powerclamp.c
> b/drivers/thermal/intel_powerclamp.c index a94f7c849a4e..390e50b97324
> 100644 --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -622,43 +622,35 @@ static void end_power_clamp(void)
>  	}
>  }
>  
> -static int powerclamp_cpu_callback(struct notifier_block *nfb,
> -				unsigned long action, void *hcpu)
> +static int powerclamp_cpu_online(unsigned int cpu)
>  {
> -	unsigned long cpu = (unsigned long)hcpu;
> +	if (clamping == false)
> +		return 0;
> +	start_power_clamp_worker(cpu);
> +	/* prefer BSP as controlling CPU */
> +	if (cpu == 0) {
> +		control_cpu = 0;
> +		smp_mb();
> +	}
> +	return 0;
> +}
>  
> -	if (false == clamping)
> -		goto exit_ok;
> +static int powerclamp_cpu_predown(unsigned int cpu)
> +{
> +	if (clamping == false)
> +		return 0;
>  
> -	switch (action) {
> -	case CPU_ONLINE:
> -		start_power_clamp_worker(cpu);
> -		/* prefer BSP as controlling CPU */
> -		if (cpu == 0) {
> -			control_cpu = 0;
> -			smp_mb();
> -		}
> -		break;
> -	case CPU_DEAD:
> -		if (test_bit(cpu, cpu_clamping_mask)) {
> -			pr_err("cpu %lu dead but powerclamping
> thread is not\n",
> -				cpu);
> -			stop_power_clamp_worker(cpu);
> -		}
> -		if (cpu == control_cpu) {
> -			control_cpu = smp_processor_id();
> -			smp_mb();
> -		}
> -	}
> +	stop_power_clamp_worker(cpu);
> +	if (cpu != control_cpu)
> +		return 0;
>  
> -exit_ok:
> -	return NOTIFY_OK;
> +	control_cpu = cpumask_first(cpu_online_mask);
> +	if (control_cpu == cpu)
> +		control_cpu = cpumask_next(cpu, cpu_online_mask);
> +	smp_mb();
> +	return 0;
>  }
>  
> -static struct notifier_block powerclamp_cpu_notifier = {
> -	.notifier_call = powerclamp_cpu_callback,
> -};
> -
>  static int powerclamp_get_max_state(struct thermal_cooling_device
> *cdev, unsigned long *state)
>  {
> @@ -788,6 +780,8 @@ static inline void
> powerclamp_create_debug_files(void)
> debugfs_remove_recursive(debug_dir); }
>  
> +static enum cpuhp_state hp_state;
> +
>  static int __init powerclamp_init(void)
>  {
>  	int retval;
> @@ -805,7 +799,14 @@ static int __init powerclamp_init(void)
>  
>  	/* set default limit, maybe adjusted during runtime based on
> feedback */ window_size = 2;
> -	register_hotcpu_notifier(&powerclamp_cpu_notifier);
> +	retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +
> "thermal/intel_powerclamp:online",
> +					   powerclamp_cpu_online,
> +					   powerclamp_cpu_predown);
> +	if (retval < 0)
> +		goto exit_free;
> +
> +	hp_state = retval;
>  
>  	worker_data = alloc_percpu(struct powerclamp_worker_data);
>  	if (!worker_data) {
> @@ -830,7 +831,7 @@ static int __init powerclamp_init(void)
>  exit_free_thread:
>  	free_percpu(worker_data);
>  exit_unregister:
> -	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
> +	cpuhp_remove_state_nocalls(hp_state);
>  exit_free:
>  	kfree(cpu_clamping_mask);
>  	return retval;
> @@ -839,8 +840,8 @@ static int __init powerclamp_init(void)
>  
>  static void __exit powerclamp_exit(void)
>  {
> -	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
>  	end_power_clamp();
> +	cpuhp_remove_state_nocalls(hp_state);
>  	free_percpu(worker_data);
>  	thermal_cooling_device_unregister(cooling_dev);
>  	kfree(cpu_clamping_mask);

[Jacob Pan]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-21 20:21   ` Jacob Pan
@ 2016-10-24 15:48     ` Petr Mladek
  2016-10-24 16:55       ` Jacob Pan
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-10-24 15:48 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Zhang Rui, Thomas Gleixner, Sebastian Andrzej Siewior,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Fri 2016-10-21 13:21:18, Jacob Pan wrote:
> On Mon, 17 Oct 2016 14:32:52 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> Hi Sebastian,
> I applied this patchset on 4.9-rc1 and run some cpu online/offline
> loops test while injecting idle, e.g. 25%. I got system hang after a
> few cycles. Still looking into root cause.

You might need the patch
("sched/fair: Fix sched domains NULL deference in select_idle_sibling()")
from linux-tip, see
https://lkml.kernel.org/r/tip-9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441@git.kernel.org

I have mentioned it in the cover letter. I am sorry if it was not
visible enough.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-24 15:48     ` Petr Mladek
@ 2016-10-24 16:55       ` Jacob Pan
  2016-10-27 14:53         ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2016-10-24 16:55 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Zhang Rui, Thomas Gleixner, Sebastian Andrzej Siewior,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel, jacob.jun.pan

On Mon, 24 Oct 2016 17:48:07 +0200
Petr Mladek <pmladek@suse.com> wrote:

> On Fri 2016-10-21 13:21:18, Jacob Pan wrote:
> > On Mon, 17 Oct 2016 14:32:52 +0200
> > Petr Mladek <pmladek@suse.com> wrote:
> > 
> > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > 
> > Hi Sebastian,
> > I applied this patchset on 4.9-rc1 and run some cpu online/offline
> > loops test while injecting idle, e.g. 25%. I got system hang after a
> > few cycles. Still looking into root cause.
> 
> You might need the patch
> ("sched/fair: Fix sched domains NULL deference in
> select_idle_sibling()") from linux-tip, see
> https://lkml.kernel.org/r/tip-9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441@git.kernel.org
> 
> I have mentioned it in the cover letter. I am sorry if it was not
> visible enough.
Thanks for pointing it out again, I missed it.

Now rc2 has this fix, so I tried again; cpuhp seems to work well now.

However, I found that suspend to RAM does not work with this patchset.
Have you tested it with "echo mem > /sys/power/state" while doing idle
injection?
It fails to enter S3 on my system.


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-24 16:55       ` Jacob Pan
@ 2016-10-27 14:53         ` Petr Mladek
  2016-10-27 15:17           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-10-27 14:53 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Zhang Rui, Thomas Gleixner, Sebastian Andrzej Siewior,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Mon 2016-10-24 09:55:29, Jacob Pan wrote:
> On Mon, 24 Oct 2016 17:48:07 +0200
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > On Fri 2016-10-21 13:21:18, Jacob Pan wrote:
> > > On Mon, 17 Oct 2016 14:32:52 +0200
> > > Petr Mladek <pmladek@suse.com> wrote:
> > > 
> > > > From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > > > 
> > > Hi Sebastian,
> > > I applied this patchset on 4.9-rc1 and run some cpu online/offline
> > > loops test while injecting idle, e.g. 25%. I got system hang after a
> > > few cycles. Still looking into root cause.
> > 
> > You might need the patch
> > ("sched/fair: Fix sched domains NULL deference in
> > select_idle_sibling()") from linux-tip, see
> > https://lkml.kernel.org/r/tip-9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441@git.kernel.org
> > 
> > I have mentioned it in the cover letter. I am sorry if it was not
> > visible enough.
> Thanks for point it out again, I missed it.
> 
> Now rc2 has this fix so I tried again cpuhp seems to work well now.
> 
> However, I found suspend to ram does not work with this patchset. Have
> you  tested it with "echo mem > /sys/power/state" while doing idle
> injection?
> It fails to enter s3 on my system.

Hmm, I haven't tested the system suspend. I am sorry but I am snowed
under some other tasks this week and will be on the Plumbers
conference the next week. I hope that I will have some time to look
at it then.

In each case, I wonder if the problem is caused by the conversion
to the kthread worker or by the CPU hotplug state conversion.

Best Regards,
Petr


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-27 14:53         ` Petr Mladek
@ 2016-10-27 15:17           ` Sebastian Andrzej Siewior
  2016-10-27 20:27             ` Jacob Pan
  0 siblings, 1 reply; 17+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-10-27 15:17 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Jacob Pan, Zhang Rui, Thomas Gleixner, Eduardo Valentin,
	Tejun Heo, Peter Zijlstra, linux-pm, linux-kernel

On 2016-10-27 16:53:48 [+0200], Petr Mladek wrote:
> 
> In each case, I wonder if the problem is caused by the conversion
> to the kthread worker or by the CPU hotplug state conversion.

drop the hotplug patch and you will see.

> Best Regards,
> Petr

Sebastian


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-27 15:17           ` Sebastian Andrzej Siewior
@ 2016-10-27 20:27             ` Jacob Pan
  2016-11-11  9:33               ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2016-10-27 20:27 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Petr Mladek, Zhang Rui, Thomas Gleixner, Eduardo Valentin,
	Tejun Heo, Peter Zijlstra, linux-pm, linux-kernel, jacob.jun.pan

On Thu, 27 Oct 2016 17:17:26 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> On 2016-10-27 16:53:48 [+0200], Petr Mladek wrote:
> > 
> > In each case, I wonder if the problem is caused by the conversion
> > to the kthread worker or by the CPU hotplug state conversion.  
> 
> drop the hotplug patch and you will see.
> 
Petr,

I dropped the hp patch and no longer see the hang during suspend to
S3. However, the problem seems to be this line,

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 390e50b..b61da57 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -574,7 +574,7 @@ static void stop_power_clamp_worker(unsigned long cpu)
 	 */
        del_timer_sync(&w_data->wakeup_timer);
        clear_bit(w_data->cpu, cpu_clamping_mask);
-       kthread_destroy_worker(w_data->worker);
+//     kthread_destroy_worker(w_data->worker);
 
        w_data->worker = NULL;
 }

If I do the above, everything works with S3 and CPU HP patch.

Inside kthread_destroy_worker(),
	kthread_flush_worker(worker);
never completes and thus blocks S3 entry!

I will be at LPC next week also, we can chat about that.


Thanks,

Jacob


> > Best Regards,
> > Petr  
> 
> Sebastian

[Jacob Pan]


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-10-27 20:27             ` Jacob Pan
@ 2016-11-11  9:33               ` Petr Mladek
  2016-11-11 10:07                 ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-11-11  9:33 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Sebastian Andrzej Siewior, Zhang Rui, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Thu 2016-10-27 13:27:36, Jacob Pan wrote:
> On Thu, 27 Oct 2016 17:17:26 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> > On 2016-10-27 16:53:48 [+0200], Petr Mladek wrote:
> > > 
> > > In each case, I wonder if the problem is caused by the conversion
> > > to the kthread worker or by the CPU hotplug state conversion.  
> > 
> > drop the hotplug patch and you will see.
> > 
> Petr,
> 
> I dropped hp patch no long see the hang during suspend to s3. However,
> the problem seems to be this line,
> 
> diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
> index 390e50b..b61da57 100644
> --- a/drivers/thermal/intel_powerclamp.c
> +++ b/drivers/thermal/intel_powerclamp.c
> @@ -574,7 +574,7 @@ static void stop_power_clamp_worker(unsigned long cpu)
>  	 */
>         del_timer_sync(&w_data->wakeup_timer);
>         clear_bit(w_data->cpu, cpu_clamping_mask);
> -       kthread_destroy_worker(w_data->worker);
> +//     kthread_destroy_worker(w_data->worker);
>  
>         w_data->worker = NULL;
>  }
> 
> If I do the above, everything works with S3 and CPU HP patch.
> 
> Inside kthread_destroy_worker()
> 	kthread_flush_worker(worker);
> never completes then blocks s3 entry!

The kthread_flush_worker() call was not needed because the queue was
empty in this case (verified at runtime). But it still hangs in

  kthread_destroy_worker()
    kthread_stop(task);


Then I tried to revert the conversion to the kthread worker
API (the 2nd patch from this patchset), see below. It still
hangs during suspend inside

	powerclamp_cpu_predown()
	  kthread_stop(*percpu_thread);


Note that both kthread_flush_worker() and kthread_stop()
wait until the kthread gets scheduled and does some work.
Also note that the kthread is bound to the given CPU.

My guess is that the kthread cannot be scheduled at this stage.
I wonder whether the CPU is already partially down or whether
tasks are frozen so that "normal" tasks are not scheduled at
this point. I am still trying to understand the code related
to suspend, CPU hotplug, and the scheduler.
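To illustrate why both calls can block, here is roughly the shape of
the two code paths (my simplified paraphrase of kernel/kthread.c, not
the exact code):

```c
/* Simplified sketch: both paths end up waiting for the kthread to be
 * scheduled and make progress. If the kthread never runs again (for
 * example, because it is frozen or its CPU is going down), the
 * wait_for_completion() below never returns.
 */
int kthread_stop(struct task_struct *k)
{
	struct kthread *kthread = to_kthread(k);

	set_bit(KTHREAD_SHOULD_STOP, &kthread->flags);
	wake_up_process(k);
	/* hangs here when the kthread cannot be scheduled */
	wait_for_completion(&kthread->exited);
	return k->exit_code;
}

void kthread_flush_worker(struct kthread_worker *worker)
{
	struct kthread_flush_work fwork = { /* ... */ };

	kthread_queue_work(worker, &fwork.work);
	/* done only after the worker thread processes the work item */
	wait_for_completion(&fwork.done);
}
```

In both cases the waiting side depends on the kthread being runnable.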


Just for the record: below is the conversion to the new
CPU hotplug state that can be applied on top
of the 1st patch from this patchset ("thermal/intel_powerclamp:
 Remove duplicated code that starts the kthread").
It allows us to rule out the kthread worker API for the moment.


From 928b066ea07532d82c802912e17b44bd01ec5665 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu, 1 Sep 2016 11:23:57 +0200
Subject: [PATCH] thermal/intel_powerclamp: Convert to CPU hotplug state

This is a conversion to the new hotplug state machine with
the difference that CPU_DEAD becomes CPU_PREDOWN.

At the same time it makes the handling of the two states symmetrical.
stop_power_clamp_worker() is called unconditionally and the controversial
error message is removed.

Finally, the hotplug state callbacks are removed after the powerclamping
is stopped to avoid a potential race.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[pmladek@suse.com: Fixed the possible race in powerclamp_exit()]
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 drivers/thermal/intel_powerclamp.c | 70 ++++++++++++++++++++------------------
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 0244805b7b86..4ebfea3f0f1e 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -572,45 +572,38 @@ static void end_power_clamp(void)
 	}
 }
 
-static int powerclamp_cpu_callback(struct notifier_block *nfb,
-				unsigned long action, void *hcpu)
+static int powerclamp_cpu_online(unsigned int cpu)
+{
+	if (clamping == false)
+		return 0;
+	start_power_clamp_thread(cpu);
+	/* prefer BSP as controlling CPU */
+	if (cpu == 0) {
+		control_cpu = 0;
+		smp_mb();
+	}
+	return 0;
+}
+
+static int powerclamp_cpu_predown(unsigned int cpu)
 {
-	unsigned long cpu = (unsigned long)hcpu;
 	struct task_struct **percpu_thread =
 		per_cpu_ptr(powerclamp_thread, cpu);
 
-	if (false == clamping)
-		goto exit_ok;
+	if (clamping == false)
+		return 0;
 
-	switch (action) {
-	case CPU_ONLINE:
-		start_power_clamp_thread(cpu);
-		/* prefer BSP as controlling CPU */
-		if (cpu == 0) {
-			control_cpu = 0;
-			smp_mb();
-		}
-		break;
-	case CPU_DEAD:
-		if (test_bit(cpu, cpu_clamping_mask)) {
-			pr_err("cpu %lu dead but powerclamping thread is not\n",
-				cpu);
-			kthread_stop(*percpu_thread);
-		}
-		if (cpu == control_cpu) {
-			control_cpu = smp_processor_id();
-			smp_mb();
-		}
-	}
+	kthread_stop(*percpu_thread);
+	if (cpu != control_cpu)
+		return 0;
 
-exit_ok:
-	return NOTIFY_OK;
+	control_cpu = cpumask_first(cpu_online_mask);
+	if (control_cpu == cpu)
+		control_cpu = cpumask_next(cpu, cpu_online_mask);
+	smp_mb();
+	return 0;
 }
 
-static struct notifier_block powerclamp_cpu_notifier = {
-	.notifier_call = powerclamp_cpu_callback,
-};
-
 static int powerclamp_get_max_state(struct thermal_cooling_device *cdev,
 				 unsigned long *state)
 {
@@ -730,6 +723,8 @@ static inline void powerclamp_create_debug_files(void)
 	debugfs_remove_recursive(debug_dir);
 }
 
+static enum cpuhp_state hp_state;
+
 static int __init powerclamp_init(void)
 {
 	int retval;
@@ -747,7 +742,14 @@ static int __init powerclamp_init(void)
 
 	/* set default limit, maybe adjusted during runtime based on feedback */
 	window_size = 2;
-	register_hotcpu_notifier(&powerclamp_cpu_notifier);
+	retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					   "thermal/intel_powerclamp:online",
+					   powerclamp_cpu_online,
+					   powerclamp_cpu_predown);
+	if (retval < 0)
+		goto exit_free;
+
+	hp_state = retval;
 
 	powerclamp_thread = alloc_percpu(struct task_struct *);
 	if (!powerclamp_thread) {
@@ -772,7 +774,7 @@ static int __init powerclamp_init(void)
 exit_free_thread:
 	free_percpu(powerclamp_thread);
 exit_unregister:
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
+	cpuhp_remove_state_nocalls(hp_state);
 exit_free:
 	kfree(cpu_clamping_mask);
 	return retval;
@@ -781,8 +783,8 @@ static int __init powerclamp_init(void)
 
 static void __exit powerclamp_exit(void)
 {
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
+	cpuhp_remove_state_nocalls(hp_state);
 	free_percpu(powerclamp_thread);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
-- 
1.8.5.6


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-11  9:33               ` Petr Mladek
@ 2016-11-11 10:07                 ` Petr Mladek
  2016-11-11 17:34                   ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-11-11 10:07 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Sebastian Andrzej Siewior, Zhang Rui, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Fri 2016-11-11 10:33:30, Petr Mladek wrote:
> Then I tried to revert the conversion to the kthread worker
> API (2nd patch from this patchset), see below. And it still
> hangs during the suspend inside
> 
> 	powerclamp_cpu_predown()
> 	  kthread_stop(*percpu_thread);
> 
> 
> Note that both kthread_flush_worker() and kthread_stop()
> waits until the kthread gets scheduled and do some job.
> Also note that the kthread is bound to the given CPU.
> 
> My guess is that the kthread cannot be scheduled at this stage.
> I wonder if the CPU is already partially down or that tasks
> are freezed so that "normal" tasks are not scheduled at
> this point. I am still trying to understand the code
> related to suspend, cpu hotplug, and scheduler.

And yes, the problem seems to be that the kthread is frozen
so that it cannot run. The suspend works when I disable:

	clamp_thread()
//	  set_freezable();
//	  try_to_freeze();

In fact, we should not need these calls. They are needed only
when we want to stop the kthread at an exact location so that
it does not produce I/O that would block/break suspend.
But this is not the case for intel_powerclamp.

I am going to do some more tests and will send a fix. It should
be enough to remove the KTW_FREEZABLE flag from the
kthread_create_worker_on_cpu() call.
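For the record, this is the ordering during suspend that would explain
the hang, sketched as call chains (my understanding, not a captured
trace):

```c
/* Sketch of the suspected deadlock (assumption based on the
 * observations above):
 *
 *   freeze_processes()
 *     -> kidle_inject/N parks in __refrigerator()   [KTW_FREEZABLE]
 *   disable_nonboot_cpus()
 *     -> powerclamp_cpu_predown(N)
 *        -> stop_power_clamp_worker(N)
 *           -> kthread_destroy_worker()
 *              -> kthread_stop()  /* waits forever: worker is frozen */
 *
 * Creating the worker without the KTW_FREEZABLE flag avoids this:
 */
worker = kthread_create_worker_on_cpu(cpu, 0, "kidle_inject/%ld", cpu);
```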

Best Regards,
Petr


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-11 10:07                 ` Petr Mladek
@ 2016-11-11 17:34                   ` Petr Mladek
  2016-11-14 19:12                     ` Jacob Pan
  0 siblings, 1 reply; 17+ messages in thread
From: Petr Mladek @ 2016-11-11 17:34 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Sebastian Andrzej Siewior, Zhang Rui, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Fri 2016-11-11 11:07:13, Petr Mladek wrote:
> I am going to do some more tests and will send a fix. It should
> be enough to remove the KTW_FREEZABLE flag from the
> kthread_create_worker_on_cpu() call.

Please find below an updated patch that fixes the suspend with
kidle_inject kthreads running. The kthread worker must not
get caught by the freezer.

Please let me know if I should do this as a separate patch
and/or resend the patchset.


From ffdabb3d61d4ac4fe48bc170a47d2be93cf58e48 Mon Sep 17 00:00:00 2001
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Thu, 1 Sep 2016 11:23:57 +0200
Subject: [PATCH] thermal/intel_powerclamp: Convert to CPU hotplug state

This is a conversion to the new hotplug state machine with
the difference that CPU_DEAD becomes CPU_PREDOWN.

At the same time it makes the handling of the two states symmetrical.
stop_power_clamp_worker() is called unconditionally and the controversial
error message is removed.

The kthread worker must not be freezable. Otherwise it could _not_ be
destroyed in the CPU predown callback. In particular, kthread_stop()
or kthread_flush_worker() would hang. These calls wait for the kthread
to do some work, which never happens while it is frozen.

Finally, the hotplug state callbacks are removed after the powerclamping
is stopped to avoid a potential race.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
[pmladek@suse.com: Fixed freezer and a possible race in powerclamp_exit().]
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 drivers/thermal/intel_powerclamp.c | 73 +++++++++++++++++++-------------------
 1 file changed, 36 insertions(+), 37 deletions(-)

diff --git a/drivers/thermal/intel_powerclamp.c b/drivers/thermal/intel_powerclamp.c
index 2b99c0627043..d9b9e0fb4e48 100644
--- a/drivers/thermal/intel_powerclamp.c
+++ b/drivers/thermal/intel_powerclamp.c
@@ -43,7 +43,6 @@
 #include <linux/kernel.h>
 #include <linux/delay.h>
 #include <linux/kthread.h>
-#include <linux/freezer.h>
 #include <linux/cpu.h>
 #include <linux/thermal.h>
 #include <linux/slab.h>
@@ -530,8 +529,7 @@ static void start_power_clamp_worker(unsigned long cpu)
 	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
 	struct kthread_worker *worker;
 
-	worker = kthread_create_worker_on_cpu(cpu, KTW_FREEZABLE,
-					      "kidle_inject/%ld", cpu);
+	worker = kthread_create_worker_on_cpu(cpu, 0, "kidle_inject/%ld", cpu);
 	if (IS_ERR(worker))
 		return;
 
@@ -622,43 +620,35 @@ static void end_power_clamp(void)
 	}
 }
 
-static int powerclamp_cpu_callback(struct notifier_block *nfb,
-				unsigned long action, void *hcpu)
+static int powerclamp_cpu_online(unsigned int cpu)
 {
-	unsigned long cpu = (unsigned long)hcpu;
+	if (clamping == false)
+		return 0;
+	start_power_clamp_worker(cpu);
+	/* prefer BSP as controlling CPU */
+	if (cpu == 0) {
+		control_cpu = 0;
+		smp_mb();
+	}
+	return 0;
+}
 
-	if (false == clamping)
-		goto exit_ok;
+static int powerclamp_cpu_predown(unsigned int cpu)
+{
+	if (clamping == false)
+		return 0;
 
-	switch (action) {
-	case CPU_ONLINE:
-		start_power_clamp_worker(cpu);
-		/* prefer BSP as controlling CPU */
-		if (cpu == 0) {
-			control_cpu = 0;
-			smp_mb();
-		}
-		break;
-	case CPU_DEAD:
-		if (test_bit(cpu, cpu_clamping_mask)) {
-			pr_err("cpu %lu dead but powerclamping thread is not\n",
-				cpu);
-			stop_power_clamp_worker(cpu);
-		}
-		if (cpu == control_cpu) {
-			control_cpu = smp_processor_id();
-			smp_mb();
-		}
-	}
+	stop_power_clamp_worker(cpu);
+	if (cpu != control_cpu)
+		return 0;
 
-exit_ok:
-	return NOTIFY_OK;
+	control_cpu = cpumask_first(cpu_online_mask);
+	if (control_cpu == cpu)
+		control_cpu = cpumask_next(cpu, cpu_online_mask);
+	smp_mb();
+	return 0;
 }
 
-static struct notifier_block powerclamp_cpu_notifier = {
-	.notifier_call = powerclamp_cpu_callback,
-};
-
 static int powerclamp_get_max_state(struct thermal_cooling_device *cdev,
 				 unsigned long *state)
 {
@@ -778,6 +768,8 @@ static inline void powerclamp_create_debug_files(void)
 	debugfs_remove_recursive(debug_dir);
 }
 
+static enum cpuhp_state hp_state;
+
 static int __init powerclamp_init(void)
 {
 	int retval;
@@ -795,7 +787,14 @@ static int __init powerclamp_init(void)
 
 	/* set default limit, maybe adjusted during runtime based on feedback */
 	window_size = 2;
-	register_hotcpu_notifier(&powerclamp_cpu_notifier);
+	retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+					   "thermal/intel_powerclamp:online",
+					   powerclamp_cpu_online,
+					   powerclamp_cpu_predown);
+	if (retval < 0)
+		goto exit_free;
+
+	hp_state = retval;
 
 	worker_data = alloc_percpu(struct powerclamp_worker_data);
 	if (!worker_data) {
@@ -820,7 +819,7 @@ static int __init powerclamp_init(void)
 exit_free_thread:
 	free_percpu(worker_data);
 exit_unregister:
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
+	cpuhp_remove_state_nocalls(hp_state);
 exit_free:
 	kfree(cpu_clamping_mask);
 	return retval;
@@ -829,8 +828,8 @@ static int __init powerclamp_init(void)
 
 static void __exit powerclamp_exit(void)
 {
-	unregister_hotcpu_notifier(&powerclamp_cpu_notifier);
 	end_power_clamp();
+	cpuhp_remove_state_nocalls(hp_state);
 	free_percpu(worker_data);
 	thermal_cooling_device_unregister(cooling_dev);
 	kfree(cpu_clamping_mask);
-- 
1.8.5.6


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-11 17:34                   ` Petr Mladek
@ 2016-11-14 19:12                     ` Jacob Pan
  2016-11-15 11:36                       ` Zhang Rui
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2016-11-14 19:12 UTC (permalink / raw)
  To: Petr Mladek
  Cc: Sebastian Andrzej Siewior, Zhang Rui, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel, jacob.jun.pan, Eduardo Valentin

On Fri, 11 Nov 2016 18:34:10 +0100
Petr Mladek <pmladek@suse.com> wrote:

> Please, let me known if I should do this as a separate patch
> and/or resend the patchset.

Rui/Eduardo,
There are four independently posted changes made to the powerclamp
driver. Should I roll them into one set so that you can easily apply
them?
1. add back module device table https://lkml.org/lkml/2016/11/14/760
2. kworker, this patchset
3. cpu hotplug state, this patch.
4. PF_IDLE https://lkml.org/lkml/2016/11/14/747

Thanks,

Jacob


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-14 19:12                     ` Jacob Pan
@ 2016-11-15 11:36                       ` Zhang Rui
  2016-11-15 16:40                         ` Jacob Pan
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang Rui @ 2016-11-15 11:36 UTC (permalink / raw)
  To: Jacob Pan, Petr Mladek
  Cc: Sebastian Andrzej Siewior, Thomas Gleixner, Eduardo Valentin,
	Tejun Heo, Peter Zijlstra, linux-pm, linux-kernel

On Mon, 2016-11-14 at 11:12 -0800, Jacob Pan wrote:
> On Fri, 11 Nov 2016 18:34:10 +0100
> Petr Mladek <pmladek@suse.com> wrote:
> 
> > 
> > Please, let me known if I should do this as a separate patch
> > and/or resend the patchset.
> Rui/Eduardo,
> There are four independent posted changes made to powerclamp driver:
> Should I roll them into one set such that you can easily apply them?
> 1. add back module device table https://lkml.org/lkml/2016/11/14/760
> 2. kworker, this patchset
> 3. cpu hotplug state, this patch.
> 4. PF_IDLE https://lkml.org/lkml/2016/11/14/747
> 
Hmm, I will apply patch 1 and queue it up for the next -rc and 4.8 stable.

please resend the others that are targeted for 4.10 and later in one
patch set.

thanks,
rui


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-15 11:36                       ` Zhang Rui
@ 2016-11-15 16:40                         ` Jacob Pan
  2016-11-21 11:57                           ` Petr Mladek
  0 siblings, 1 reply; 17+ messages in thread
From: Jacob Pan @ 2016-11-15 16:40 UTC (permalink / raw)
  To: Zhang Rui
  Cc: Petr Mladek, Sebastian Andrzej Siewior, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel, jacob.jun.pan

On Tue, 15 Nov 2016 19:36:26 +0800
Zhang Rui <rui.zhang@intel.com> wrote:

> On Mon, 2016-11-14 at 11:12 -0800, Jacob Pan wrote:
> > On Fri, 11 Nov 2016 18:34:10 +0100
> > Petr Mladek <pmladek@suse.com> wrote:
> >   
> > > 
> > > Please, let me known if I should do this as a separate patch
> > > and/or resend the patchset.  
> > Rui/Eduardo,
> > There are four independent posted changes made to powerclamp driver:
> > Should I roll them into one set such that you can easily apply them?
> > 1. add back module device table https://lkml.org/lkml/2016/11/14/760
> > 2. kworker, this patchset
> > 3. cpu hotplug state, this patch.
> > 4. PF_IDLE https://lkml.org/lkml/2016/11/14/747
> >   
> hmm, I will apply patch 1 and queue up for next -rc and 4.8 stable.
> 
> please resend the others that are targeted for 4.10 and later in one
> patch set.
OK, will do. I guess Petr and Sebastian are OK with me putting their
powerclamp changes in one patchset too?


* Re: [PATCH 3/3] thermal/intel_powerclamp: Convert to CPU hotplug state
  2016-11-15 16:40                         ` Jacob Pan
@ 2016-11-21 11:57                           ` Petr Mladek
  0 siblings, 0 replies; 17+ messages in thread
From: Petr Mladek @ 2016-11-21 11:57 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Zhang Rui, Sebastian Andrzej Siewior, Thomas Gleixner,
	Eduardo Valentin, Tejun Heo, Peter Zijlstra, linux-pm,
	linux-kernel

On Tue 2016-11-15 08:40:57, Jacob Pan wrote:
> On Tue, 15 Nov 2016 19:36:26 +0800
> Zhang Rui <rui.zhang@intel.com> wrote:
> 
> > On Mon, 2016-11-14 at 11:12 -0800, Jacob Pan wrote:
> > > On Fri, 11 Nov 2016 18:34:10 +0100
> > > Petr Mladek <pmladek@suse.com> wrote:
> > >   
> > > > 
> > > > Please, let me known if I should do this as a separate patch
> > > > and/or resend the patchset.  
> > > Rui/Eduardo,
> > > There are four independent posted changes made to powerclamp driver:
> > > Should I roll them into one set such that you can easily apply them?
> > > 1. add back module device table https://lkml.org/lkml/2016/11/14/760
> > > 2. kworker, this patchset
> > > 3. cpu hotplug state, this patch.
> > > 4. PF_IDLE https://lkml.org/lkml/2016/11/14/747
> > >   
> > hmm, I will apply patch 1 and queue up for next -rc and 4.8 stable.
> > 
> > please resend the others that are targeted for 4.10 and later in one
> > patch set.
> OK, will do. I guess Petr and Sebastian are OK with me putting your
> powerclamp changes in one patchset too?

Sure. I am fine with it.

Best Regards,
Petr

