linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 1/2] timer: provide an api for deferrable timeout
@ 2014-07-31  7:59 Chintan Pandya
  2014-07-31  7:59 ` [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread Chintan Pandya
  2014-08-04  9:26 ` [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
  0 siblings, 2 replies; 7+ messages in thread
From: Chintan Pandya @ 2014-07-31  7:59 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Ingo Molnar, Hugh Dickins

schedule_timeout wakes up the CPU from IDLE state. For some use cases it
is not desirable, hence introduce a convenient API
(schedule_timeout_deferrable_interruptible) on similar pattern which uses
a deferrable timer.

Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
---
Changes:

V2-->V3:
	- Big comment moved from static function to exported function
	- Using __setup_timer_on_stack for better readability

V2:
	- this patch has been newly introduced in patch v2

 include/linux/sched.h |  2 ++
 kernel/time/timer.c   | 73 +++++++++++++++++++++++++++++++--------------------
 2 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 89f531e..10b154e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -377,6 +377,8 @@ extern int in_sched_functions(unsigned long addr);
 #define	MAX_SCHEDULE_TIMEOUT	LONG_MAX
 extern signed long schedule_timeout(signed long timeout);
 extern signed long schedule_timeout_interruptible(signed long timeout);
+extern signed long
+schedule_timeout_deferrable_interruptible(signed long timeout);
 extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index aca5dfe..f4c4082 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1431,33 +1431,8 @@ static void process_timeout(unsigned long __data)
 	wake_up_process((struct task_struct *)__data);
 }
 
-/**
- * schedule_timeout - sleep until timeout
- * @timeout: timeout value in jiffies
- *
- * Make the current task sleep until @timeout jiffies have
- * elapsed. The routine will return immediately unless
- * the current task state has been set (see set_current_state()).
- *
- * You can set the task state as follows -
- *
- * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
- * pass before the routine returns. The routine will return 0
- *
- * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
- * delivered to the current task. In this case the remaining time
- * in jiffies will be returned, or 0 if the timer expired in time
- *
- * The current task state is guaranteed to be TASK_RUNNING when this
- * routine returns.
- *
- * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
- * the CPU away without a bound on the timeout. In this case the return
- * value will be %MAX_SCHEDULE_TIMEOUT.
- *
- * In all cases the return value is guaranteed to be non-negative.
- */
-signed long __sched schedule_timeout(signed long timeout)
+static signed long
+__sched __schedule_timeout(signed long timeout, unsigned long flag)
 {
 	struct timer_list timer;
 	unsigned long expire;
@@ -1493,7 +1468,9 @@ signed long __sched schedule_timeout(signed long timeout)
 
 	expire = timeout + jiffies;
 
-	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
+	__setup_timer_on_stack(&timer, process_timeout, (unsigned long)current,
+				flag);
+
 	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
 	schedule();
 	del_singleshot_timer_sync(&timer);
@@ -1506,12 +1483,52 @@ signed long __sched schedule_timeout(signed long timeout)
  out:
 	return timeout < 0 ? 0 : timeout;
 }
+
+/**
+ * schedule_timeout - sleep until timeout
+ * @timeout: timeout value in jiffies
+ *
+ * Make the current task sleep until @timeout jiffies have
+ * elapsed. The routine will return immediately unless
+ * the current task state has been set (see set_current_state()).
+ *
+ * You can set the task state as follows -
+ *
+ * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
+ * pass before the routine returns. The routine will return 0
+ *
+ * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
+ * delivered to the current task. In this case the remaining time
+ * in jiffies will be returned, or 0 if the timer expired in time
+ *
+ * The current task state is guaranteed to be TASK_RUNNING when this
+ * routine returns.
+ *
+ * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
+ * the CPU away without a bound on the timeout. In this case the return
+ * value will be %MAX_SCHEDULE_TIMEOUT.
+ *
+ * In all cases the return value is guaranteed to be non-negative.
+ */
+signed long __sched schedule_timeout(signed long timeout)
+{
+	return __schedule_timeout(timeout, 0);
+}
 EXPORT_SYMBOL(schedule_timeout);
 
 /*
  * We can use __set_current_state() here because schedule_timeout() calls
  * schedule() unconditionally.
  */
+
+signed long
+__sched schedule_timeout_deferrable_interruptible(signed long timeout)
+{
+	__set_current_state(TASK_INTERRUPTIBLE);
+	return __schedule_timeout(timeout, TIMER_DEFERRABLE);
+}
+EXPORT_SYMBOL(schedule_timeout_deferrable_interruptible);
+
 signed long __sched schedule_timeout_interruptible(signed long timeout)
 {
 	__set_current_state(TASK_INTERRUPTIBLE);
-- 
Chintan Pandya

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread
  2014-07-31  7:59 [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
@ 2014-07-31  7:59 ` Chintan Pandya
  2014-08-11 12:18   ` Hugh Dickins
  2014-08-04  9:26 ` [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
  1 sibling, 1 reply; 7+ messages in thread
From: Chintan Pandya @ 2014-07-31  7:59 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Chintan Pandya, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Ingo Molnar, Hugh Dickins

KSM thread to scan pages is scheduled on definite timeout.  That wakes up
CPU from idle state and hence may affect the power consumption.  Provide
an optional support to use deferrable timer which suites low-power
use-cases.

Typically, on our setup we observed, 10% less power consumption with some
use-cases in which CPU goes to power collapse frequently.  For example,
playing audio while typically CPU remains idle.

To enable deferrable timers,
$ echo 1 > /sys/kernel/mm/ksm/deferrable_timer

Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
---
Changes:

V2-->V3:
	- Handled error case properly
	- Corrected indentation in Documentation
	- Fixed build failure
	- Removed left over process_timeout()
V1-->V2:
	- allowing only valid values to be updated as use_deferrable_timer
	- using only 'deferrable' and not 'deferred'
	- moved out schedule_timeout code for deferrable timer into timer.c

 Documentation/vm/ksm.txt |  7 +++++++
 mm/ksm.c                 | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
index f34a8ee..9735c87 100644
--- a/Documentation/vm/ksm.txt
+++ b/Documentation/vm/ksm.txt
@@ -87,6 +87,13 @@ pages_sharing    - how many more sites are sharing them i.e. how much saved
 pages_unshared   - how many pages unique but repeatedly checked for merging
 pages_volatile   - how many pages changing too fast to be placed in a tree
 full_scans       - how many times all mergeable areas have been scanned
+deferrable_timer - whether to use deferrable timers or not
+                   e.g. "echo 1 > /sys/kernel/mm/ksm/deferrable_timer"
+                   Default: 0 (means, we are not using deferrable timers. Users
+		   might want to set deferrable_timer option if they donot want
+		   ksm thread to wakeup CPU to carryout ksm activities thus
+		   gaining on battery while compromising slightly on memory
+		   that could have been saved.)
 
 A high ratio of pages_sharing to pages_shared indicates good sharing, but
 a high ratio of pages_unshared to pages_sharing indicates wasted effort.
diff --git a/mm/ksm.c b/mm/ksm.c
index fb75902..434a50a 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -223,6 +223,9 @@ static unsigned int ksm_thread_pages_to_scan = 100;
 /* Milliseconds ksmd should sleep between batches */
 static unsigned int ksm_thread_sleep_millisecs = 20;
 
+/* Boolean to indicate whether to use deferrable timer or not */
+static bool use_deferrable_timer;
+
 #ifdef CONFIG_NUMA
 /* Zeroed when merging across nodes is not allowed */
 static unsigned int ksm_merge_across_nodes = 1;
@@ -1725,8 +1728,13 @@ static int ksm_scan_thread(void *nothing)
 		try_to_freeze();
 
 		if (ksmd_should_run()) {
-			schedule_timeout_interruptible(
-				msecs_to_jiffies(ksm_thread_sleep_millisecs));
+			signed long to;
+
+			to = msecs_to_jiffies(ksm_thread_sleep_millisecs);
+			if (use_deferrable_timer)
+				schedule_timeout_deferrable_interruptible(to);
+			else
+				schedule_timeout_interruptible(to);
 		} else {
 			wait_event_freezable(ksm_thread_wait,
 				ksmd_should_run() || kthread_should_stop());
@@ -2175,6 +2183,29 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 }
 KSM_ATTR(run);
 
+static ssize_t deferrable_timer_show(struct kobject *kobj,
+				    struct kobj_attribute *attr, char *buf)
+{
+	return snprintf(buf, 8, "%d\n", use_deferrable_timer);
+}
+
+static ssize_t deferrable_timer_store(struct kobject *kobj,
+				     struct kobj_attribute *attr,
+				     const char *buf, size_t count)
+{
+	unsigned long enable;
+	int err;
+
+	err = kstrtoul(buf, 10, &enable);
+	if (err < 0)
+		return err;
+	if (enable >= 1)
+		return -EINVAL;
+	use_deferrable_timer = enable;
+	return count;
+}
+KSM_ATTR(deferrable_timer);
+
 #ifdef CONFIG_NUMA
 static ssize_t merge_across_nodes_show(struct kobject *kobj,
 				struct kobj_attribute *attr, char *buf)
@@ -2287,6 +2318,7 @@ static struct attribute *ksm_attrs[] = {
 	&pages_unshared_attr.attr,
 	&pages_volatile_attr.attr,
 	&full_scans_attr.attr,
+	&deferrable_timer_attr.attr,
 #ifdef CONFIG_NUMA
 	&merge_across_nodes_attr.attr,
 #endif
-- 
Chintan Pandya

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] timer: provide an api for deferrable timeout
  2014-07-31  7:59 [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
  2014-07-31  7:59 ` [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread Chintan Pandya
@ 2014-08-04  9:26 ` Chintan Pandya
  2014-08-04 10:44   ` Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Chintan Pandya @ 2014-08-04  9:26 UTC (permalink / raw)
  To: Chintan Pandya
  Cc: akpm, linux-kernel, linux-mm, Thomas Gleixner, John Stultz,
	Peter Zijlstra, Ingo Molnar, Hugh Dickins

Ping !! Anything open for me to do here ?

On 07/31/2014 01:29 PM, Chintan Pandya wrote:
> schedule_timeout wakes up the CPU from IDLE state. For some use cases it
> is not desirable, hence introduce a convenient API
> (schedule_timeout_deferrable_interruptible) on similar pattern which uses
> a deferrable timer.
>
> Signed-off-by: Chintan Pandya<cpandya@codeaurora.org>
> Cc: Thomas Gleixner<tglx@linutronix.de>
> Cc: John Stultz<john.stultz@linaro.org>
> Cc: Peter Zijlstra<peterz@infradead.org>
> Cc: Ingo Molnar<mingo@redhat.com>
> Cc: Hugh Dickins<hughd@google.com>
> ---
> Changes:
>
> V2-->V3:
> 	- Big comment moved from static function to exported function
> 	- Using __setup_timer_on_stack for better readability
>
> V2:
> 	- this patch has been newly introduced in patch v2
>
>   include/linux/sched.h |  2 ++
>   kernel/time/timer.c   | 73 +++++++++++++++++++++++++++++++--------------------
>   2 files changed, 47 insertions(+), 28 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 89f531e..10b154e 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -377,6 +377,8 @@ extern int in_sched_functions(unsigned long addr);
>   #define	MAX_SCHEDULE_TIMEOUT	LONG_MAX
>   extern signed long schedule_timeout(signed long timeout);
>   extern signed long schedule_timeout_interruptible(signed long timeout);
> +extern signed long
> +schedule_timeout_deferrable_interruptible(signed long timeout);
>   extern signed long schedule_timeout_killable(signed long timeout);
>   extern signed long schedule_timeout_uninterruptible(signed long timeout);
>   asmlinkage void schedule(void);
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index aca5dfe..f4c4082 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1431,33 +1431,8 @@ static void process_timeout(unsigned long __data)
>   	wake_up_process((struct task_struct *)__data);
>   }
>
> -/**
> - * schedule_timeout - sleep until timeout
> - * @timeout: timeout value in jiffies
> - *
> - * Make the current task sleep until @timeout jiffies have
> - * elapsed. The routine will return immediately unless
> - * the current task state has been set (see set_current_state()).
> - *
> - * You can set the task state as follows -
> - *
> - * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
> - * pass before the routine returns. The routine will return 0
> - *
> - * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
> - * delivered to the current task. In this case the remaining time
> - * in jiffies will be returned, or 0 if the timer expired in time
> - *
> - * The current task state is guaranteed to be TASK_RUNNING when this
> - * routine returns.
> - *
> - * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
> - * the CPU away without a bound on the timeout. In this case the return
> - * value will be %MAX_SCHEDULE_TIMEOUT.
> - *
> - * In all cases the return value is guaranteed to be non-negative.
> - */
> -signed long __sched schedule_timeout(signed long timeout)
> +static signed long
> +__sched __schedule_timeout(signed long timeout, unsigned long flag)
>   {
>   	struct timer_list timer;
>   	unsigned long expire;
> @@ -1493,7 +1468,9 @@ signed long __sched schedule_timeout(signed long timeout)
>
>   	expire = timeout + jiffies;
>
> -	setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
> +	__setup_timer_on_stack(&timer, process_timeout, (unsigned long)current,
> +				flag);
> +
>   	__mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
>   	schedule();
>   	del_singleshot_timer_sync(&timer);
> @@ -1506,12 +1483,52 @@ signed long __sched schedule_timeout(signed long timeout)
>    out:
>   	return timeout<  0 ? 0 : timeout;
>   }
> +
> +/**
> + * schedule_timeout - sleep until timeout
> + * @timeout: timeout value in jiffies
> + *
> + * Make the current task sleep until @timeout jiffies have
> + * elapsed. The routine will return immediately unless
> + * the current task state has been set (see set_current_state()).
> + *
> + * You can set the task state as follows -
> + *
> + * %TASK_UNINTERRUPTIBLE - at least @timeout jiffies are guaranteed to
> + * pass before the routine returns. The routine will return 0
> + *
> + * %TASK_INTERRUPTIBLE - the routine may return early if a signal is
> + * delivered to the current task. In this case the remaining time
> + * in jiffies will be returned, or 0 if the timer expired in time
> + *
> + * The current task state is guaranteed to be TASK_RUNNING when this
> + * routine returns.
> + *
> + * Specifying a @timeout value of %MAX_SCHEDULE_TIMEOUT will schedule
> + * the CPU away without a bound on the timeout. In this case the return
> + * value will be %MAX_SCHEDULE_TIMEOUT.
> + *
> + * In all cases the return value is guaranteed to be non-negative.
> + */
> +signed long __sched schedule_timeout(signed long timeout)
> +{
> +	return __schedule_timeout(timeout, 0);
> +}
>   EXPORT_SYMBOL(schedule_timeout);
>
>   /*
>    * We can use __set_current_state() here because schedule_timeout() calls
>    * schedule() unconditionally.
>    */
> +
> +signed long
> +__sched schedule_timeout_deferrable_interruptible(signed long timeout)
> +{
> +	__set_current_state(TASK_INTERRUPTIBLE);
> +	return __schedule_timeout(timeout, TIMER_DEFERRABLE);
> +}
> +EXPORT_SYMBOL(schedule_timeout_deferrable_interruptible);
> +
>   signed long __sched schedule_timeout_interruptible(signed long timeout)
>   {
>   	__set_current_state(TASK_INTERRUPTIBLE);


-- 
Chintan Pandya

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/2] timer: provide an api for deferrable timeout
  2014-08-04  9:26 ` [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
@ 2014-08-04 10:44   ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2014-08-04 10:44 UTC (permalink / raw)
  To: Chintan Pandya
  Cc: akpm, linux-kernel, linux-mm, Thomas Gleixner, John Stultz,
	Ingo Molnar, Hugh Dickins

[-- Attachment #1: Type: text/plain, Size: 257 bytes --]

On Mon, Aug 04, 2014 at 02:56:42PM +0530, Chintan Pandya wrote:
> Ping !! Anything open for me to do here ?

Give Thomas a moment to have a look, he is the maintainer after all, do
consider he's busy, and he extra busy because the merge window just
opened.

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread
  2014-07-31  7:59 ` [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread Chintan Pandya
@ 2014-08-11 12:18   ` Hugh Dickins
  2014-08-12 16:25     ` Chintan Pandya
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2014-08-11 12:18 UTC (permalink / raw)
  To: Chintan Pandya
  Cc: Andrew Morton, linux-kernel, linux-mm, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Ingo Molnar, Hugh Dickins

On Thu, 31 Jul 2014, Chintan Pandya wrote:

> KSM thread to scan pages is scheduled on definite timeout.  That wakes up
> CPU from idle state and hence may affect the power consumption.  Provide
> an optional support to use deferrable timer which suites low-power
> use-cases.

Thanks for drawing attention to this: anything to stop KSM from being
such a CPU hog while it's giving no value, should be welcome.

(I wonder if KSM could draw more feedback from its own success,
and slow down when nothing is happening on VM_MERGEABLE areas;
but I guess that's a slightly different topic from your concern
with not-quite-idle power consumption, I'd better not divert us.)

> 
> Typically, on our setup we observed, 10% less power consumption with some
> use-cases in which CPU goes to power collapse frequently.  For example,
> playing audio while typically CPU remains idle.

I'm probably stupid, but I don't quite get your scenario from that
description: please would you spell it out a little more clearly for me?

Are you thinking of two CPUs, one of them running a process busily
streaming audio (with no VM_MERGEABLE areas to work on), most other
processes sleeping, and ksmd "pinned" to another, otherwise idle CPU?

I'm very inexperienced in scheduler (and audio) matters, but I'd like
to think that the scheduler would migrate ksmd to the mostly busy CPU
in that case - or is it actually 100% busy, with no room for ksmd too?

kernel/sched/core.c shows the CONFIG_NO_HZ_COMMON get_nohz_timer_target(),
which looks like it would migrate it if possible (and CONFIG_NO_HZ_COMMON
appears to be more prevalent than CONFIG_NO_HZ_FULL).

> 
> To enable deferrable timers,
> $ echo 1 > /sys/kernel/mm/ksm/deferrable_timer

I do share Andrew's original reservations: I'd much prefer this if we
can just go ahead and do the deferrable timer without a new tunable
to concern the user, simple though your "deferrable_timer" knob is.

In an earlier mail, you said "We have observed that KSM does maximum
savings when system is idle", as reason why some will prefer a non-
deferrable timer.  I am somewhat suspicious of that observation:
because KSM waits for a page's checksum to stabilize before it saves
it in its "unstable" tree of pages to compare against.  So when the
rest of the system goes idle, KSM is briefly more likely to find
matches; but that may be a short-lived "success" once the system
becomes active again.  So, I'm wondering if your observation just
reflects the mechanics of KSM, and is not actually a reason to
refrain from using a deferrable timer for everyone.

On the other hand, I have a worry about using deferrable timer here.
I think I understand the value of a deferrable timer, in doing a job
which is bound to a particular cpu (mm/slab.c's cache_reap() gives
me a good example of that).  But ksmd is potentially serving every
process, every cpu: we would not want it to be deferred indefinitely,
if other cpus (running processes with VM_MERGEABLE vmas) are active.

Perhaps the likelihood of that scenario is too low; or perhaps it's
a reason why we do need to offer your "deferrable_timer" knob.

Please, I need to understand better before acking this change.

By the way: perhaps KSM is the right place to start, but please take
a look also at THP in mm/huge_memory.c, whose khugepaged was originally
modelled on ksmd (but now seems to be using wait_event_freezable_timeout
rather than schedule_timeout_interruptible - I've not yet researched the
history behind that difference).  I expect it to need the same treatment.

Hugh

> 
> Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Hugh Dickins <hughd@google.com>
> ---
> Changes:
> 
> V2-->V3:
> 	- Handled error case properly
> 	- Corrected indentation in Documentation
> 	- Fixed build failure
> 	- Removed left over process_timeout()
> V1-->V2:
> 	- allowing only valid values to be updated as use_deferrable_timer
> 	- using only 'deferrable' and not 'deferred'
> 	- moved out schedule_timeout code for deferrable timer into timer.c
> 
>  Documentation/vm/ksm.txt |  7 +++++++
>  mm/ksm.c                 | 36 ++++++++++++++++++++++++++++++++++--
>  2 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
> index f34a8ee..9735c87 100644
> --- a/Documentation/vm/ksm.txt
> +++ b/Documentation/vm/ksm.txt
> @@ -87,6 +87,13 @@ pages_sharing    - how many more sites are sharing them i.e. how much saved
>  pages_unshared   - how many pages unique but repeatedly checked for merging
>  pages_volatile   - how many pages changing too fast to be placed in a tree
>  full_scans       - how many times all mergeable areas have been scanned
> +deferrable_timer - whether to use deferrable timers or not
> +                   e.g. "echo 1 > /sys/kernel/mm/ksm/deferrable_timer"
> +                   Default: 0 (means, we are not using deferrable timers. Users
> +		   might want to set deferrable_timer option if they donot want
> +		   ksm thread to wakeup CPU to carryout ksm activities thus
> +		   gaining on battery while compromising slightly on memory
> +		   that could have been saved.)
>  
>  A high ratio of pages_sharing to pages_shared indicates good sharing, but
>  a high ratio of pages_unshared to pages_sharing indicates wasted effort.
> diff --git a/mm/ksm.c b/mm/ksm.c
> index fb75902..434a50a 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -223,6 +223,9 @@ static unsigned int ksm_thread_pages_to_scan = 100;
>  /* Milliseconds ksmd should sleep between batches */
>  static unsigned int ksm_thread_sleep_millisecs = 20;
>  
> +/* Boolean to indicate whether to use deferrable timer or not */
> +static bool use_deferrable_timer;
> +
>  #ifdef CONFIG_NUMA
>  /* Zeroed when merging across nodes is not allowed */
>  static unsigned int ksm_merge_across_nodes = 1;
> @@ -1725,8 +1728,13 @@ static int ksm_scan_thread(void *nothing)
>  		try_to_freeze();
>  
>  		if (ksmd_should_run()) {
> -			schedule_timeout_interruptible(
> -				msecs_to_jiffies(ksm_thread_sleep_millisecs));
> +			signed long to;
> +
> +			to = msecs_to_jiffies(ksm_thread_sleep_millisecs);
> +			if (use_deferrable_timer)
> +				schedule_timeout_deferrable_interruptible(to);
> +			else
> +				schedule_timeout_interruptible(to);
>  		} else {
>  			wait_event_freezable(ksm_thread_wait,
>  				ksmd_should_run() || kthread_should_stop());
> @@ -2175,6 +2183,29 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
>  }
>  KSM_ATTR(run);
>  
> +static ssize_t deferrable_timer_show(struct kobject *kobj,
> +				    struct kobj_attribute *attr, char *buf)
> +{
> +	return snprintf(buf, 8, "%d\n", use_deferrable_timer);
> +}
> +
> +static ssize_t deferrable_timer_store(struct kobject *kobj,
> +				     struct kobj_attribute *attr,
> +				     const char *buf, size_t count)
> +{
> +	unsigned long enable;
> +	int err;
> +
> +	err = kstrtoul(buf, 10, &enable);
> +	if (err < 0)
> +		return err;
> +	if (enable >= 1)
> +		return -EINVAL;

I haven't studied the patch itself, I'm still worrying about the concept.
But this caught my eye just before hitting Send: I don't think we need
a tunable which only accepts the value 0 ;)

> +	use_deferrable_timer = enable;
> +	return count;
> +}
> +KSM_ATTR(deferrable_timer);
> +
>  #ifdef CONFIG_NUMA
>  static ssize_t merge_across_nodes_show(struct kobject *kobj,
>  				struct kobj_attribute *attr, char *buf)
> @@ -2287,6 +2318,7 @@ static struct attribute *ksm_attrs[] = {
>  	&pages_unshared_attr.attr,
>  	&pages_volatile_attr.attr,
>  	&full_scans_attr.attr,
> +	&deferrable_timer_attr.attr,
>  #ifdef CONFIG_NUMA
>  	&merge_across_nodes_attr.attr,
>  #endif
> -- 
> Chintan Pandya
> 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
> member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread
  2014-08-11 12:18   ` Hugh Dickins
@ 2014-08-12 16:25     ` Chintan Pandya
  2014-08-20 11:54       ` Chintan Pandya
  0 siblings, 1 reply; 7+ messages in thread
From: Chintan Pandya @ 2014-08-12 16:25 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, linux-kernel, linux-mm, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Ingo Molnar, linux-arm-msm

Hi Hugh,

>>
>> Typically, on our setup we observed, 10% less power consumption with some
>> use-cases in which CPU goes to power collapse frequently.  For example,
>> playing audio while typically CPU remains idle.
>
> I'm probably stupid, but I don't quite get your scenario from that
> description: please would you spell it out a little more clearly for me?

I think I have missed to share some important details here. Scenario 
here in general is where CPUs stay in Low Power mode and waits for 
interrupt for long time. One such situation could be what we observed in 
our testing. In our SoC based platform, Audio decode happens on a 
separate HW block but during that, use-case demands CPUs to stay in Low 
Power mode (not completely shut down) if not busy. Classic end-user 
scenario is Audio playback while user does nothing else on the 
mobile/watch. At this time, KSM wakes up the CPU at very regular 
interval and affects the power consumption. Another scenario could be 
background email sync where also for very long duration, CPU stays in 
Low Power mode, not busy, not off-line.

>
> Are you thinking of two CPUs, one of them running a process busily
> streaming audio (with no VM_MERGEABLE areas to work on), most other
> processes sleeping, and ksmd "pinned" to another, otherwise idle CPU?
>
> I'm very inexperienced in scheduler (and audio) matters, but I'd like
> to think that the scheduler would migrate ksmd to the mostly busy CPU
> in that case - or is it actually 100% busy, with no room for ksmd too?

IMHO waking up the CPU considering busyness of active CPU is probably 
not scheduler's interest (I am completely naive here so please correct 
me if I am wrong). So, here I believe that ksmd will get scheduled on 
active CPU where its timer will expire and not deferred.

>> To enable deferrable timers,
>> $ echo 1>  /sys/kernel/mm/ksm/deferrable_timer
>
> I do share Andrew's original reservations: I'd much prefer this if we
> can just go ahead and do the deferrable timer without a new tunable
> to concern the user, simple though your "deferrable_timer" knob is.
>
> In an earlier mail, you said "We have observed that KSM does maximum
> savings when system is idle", as reason why some will prefer a non-
> deferrable timer.  I am somewhat suspicious of that observation:
> because KSM waits for a page's checksum to stabilize before it saves
> it in its "unstable" tree of pages to compare against.  So when the
> rest of the system goes idle, KSM is briefly more likely to find
> matches; but that may be a short-lived "success" once the system
> becomes active again.  So, I'm wondering if your observation just
> reflects the mechanics of KSM, and is not actually a reason to
> refrain from using a deferrable timer for everyone.

Sometimes savings in idle time are not always short-lived. For example, 
in 512 MB DDR system running android saturates at somewhat around 25 MB 
of KSM savings. We have observed this saturation achieved quicker when 
phone is in idle. But I think that doesn't disprove your comment above. 
So, I would keep deferrable timer as a default. Next patch.

>
> On the other hand, I have a worry about using deferrable timer here.
> I think I understand the value of a deferrable timer, in doing a job
> which is bound to a particular cpu (mm/slab.c's cache_reap() gives
> me a good example of that).  But ksmd is potentially serving every
> process, every cpu: we would not want it to be deferred indefinitely,
> if other cpus (running processes with VM_MERGEABLE vmas) are active.

I too consider this as undesirable situation. But I think scheduler 
won't schedule ksmd on a non-busy idle CPU if we have active CPUs.

>
> Perhaps the likelihood of that scenario is too low; or perhaps it's
> a reason why we do need to offer your "deferrable_timer" knob.

I didn't thought of this reason for having the knob but I would still 
prefer to have a knob for some futuristic use-cases. Such as,

(1) When power is not constraint (charging mobile ?), we don't want KSM 
to use deferrable timers.

(2) We want maximum savings from KSM at the cost of power to get more 
free memory, even if it is short-lived.

May be above use-cases are not realistic. But providing a knob just 
enables us to implement some logic in userspace. So, what do you think 
of keeping the knob but default value is '1' i.e. use deferrable timers ?

>
> Please, I need to understand better before acking this change.
>
> By the way: perhaps KSM is the right place to start, but please take
> a look also at THP in mm/huge_memory.c, whose khugepaged was originally
> modelled on ksmd (but now seems to be using wait_event_freezable_timeout
> rather than schedule_timeout_interruptible - I've not yet researched the
> history behind that difference).  I expect it to need the same treatment.

I have no idea of THP so far. But would check it in this perspective. 
Thanks for the guide.

>> +	unsigned long enable;
>> +	int err;
>> +
>> +	err = kstrtoul(buf, 10,&enable);
>> +	if (err<  0)
>> +		return err;
>> +	if (enable>= 1)
>> +		return -EINVAL;
>
> I haven't studied the patch itself, I'm still worrying about the concept.
> But this caught my eye just before hitting Send: I don't think we need
> a tunable which only accepts the value 0 ;)

Okay. I can correct this to accept any non-zero value. Is that okay ?

>
>> +	use_deferrable_timer = enable;

-- 
Chintan Pandya

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread
  2014-08-12 16:25     ` Chintan Pandya
@ 2014-08-20 11:54       ` Chintan Pandya
  0 siblings, 0 replies; 7+ messages in thread
From: Chintan Pandya @ 2014-08-20 11:54 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, linux-kernel, linux-mm, Thomas Gleixner,
	John Stultz, Peter Zijlstra, Ingo Molnar, linux-arm-msm

Hi Hugh,

>>> + unsigned long enable;
>>> + int err;
>>> +
>>> + err = kstrtoul(buf, 10,&enable);
>>> + if (err< 0)
>>> + return err;
>>> + if (enable>= 1)
>>> + return -EINVAL;
>>
>> I haven't studied the patch itself, I'm still worrying about the concept.
>> But this caught my eye just before hitting Send: I don't think we need
>> a tunable which only accepts the value 0 ;)
>
> Okay. I can correct this to accept any non-zero value. Is that okay ?

I missed that to reply earlier. This was suggested by Andrew. And I 
think that is okay as displaying any non-zero value to user via this 
knob may not be completely right.

>
>>
>>> + use_deferrable_timer = enable;
>


-- 
Chintan Pandya

QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2014-08-20 11:55 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-31  7:59 [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
2014-07-31  7:59 ` [PATCH v3 2/2] ksm: provide support to use deferrable timers for scanner thread Chintan Pandya
2014-08-11 12:18   ` Hugh Dickins
2014-08-12 16:25     ` Chintan Pandya
2014-08-20 11:54       ` Chintan Pandya
2014-08-04  9:26 ` [PATCH v3 1/2] timer: provide an api for deferrable timeout Chintan Pandya
2014-08-04 10:44   ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).