[1/3] sched: define a function to report the number of context switches on a CPU

Message ID 1566281669-48212-2-git-send-email-longli@linuxonhyperv.com
State New
Series
  • fix interrupt swamp in NVMe

Commit Message

Long Li Aug. 20, 2019, 6:14 a.m. UTC
From: Long Li <longli@microsoft.com>

The number of context switches on a CPU is useful to determine how busy this
CPU is on processing IRQs. Export this information so it can be used by device
drivers.

Signed-off-by: Long Li <longli@microsoft.com>
---
 include/linux/sched.h | 1 +
 kernel/sched/core.c   | 6 ++++++
 2 files changed, 7 insertions(+)

Comments

Peter Zijlstra Aug. 20, 2019, 9:38 a.m. UTC | #1
On Mon, Aug 19, 2019 at 11:14:27PM -0700, longli@linuxonhyperv.com wrote:
> From: Long Li <longli@microsoft.com>
> 
> The number of context switches on a CPU is useful to determine how busy this
> CPU is on processing IRQs. Export this information so it can be used by device
> drivers.

Please do explain that; because I'm not seeing how number of switches
relates to processing IRQs _at_all_!

Peter Zijlstra Aug. 20, 2019, 9:39 a.m. UTC | #2
On Mon, Aug 19, 2019 at 11:14:27PM -0700, longli@linuxonhyperv.com wrote:

> +u64 get_cpu_rq_switches(int cpu)
> +{
> +	return cpu_rq(cpu)->nr_switches;
> +}
> +EXPORT_SYMBOL_GPL(get_cpu_rq_switches);

Also, that is broken on 32bit.
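
The 32-bit problem is presumably that a u64 load is not a single atomic access on 32-bit architectures: the two 32-bit halves are read separately, so a reader racing with the increment can see a torn value. The following user-space sketch models that interleaving deterministically. `struct split_u64`, `split_inc()`, `split_value()` and `torn_read()` are hypothetical names for illustration only and are not kernel APIs.

```c
#include <stdint.h>

/* Hypothetical model of a u64 stored as two 32-bit words (illustration only). */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

/* Writer: increment the low word and carry into the high word on wrap. */
static void split_inc(struct split_u64 *c)
{
	if (++c->lo == 0)
		c->hi++;
}

/* Combine both halves into the full 64-bit value. */
static uint64_t split_value(const struct split_u64 *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/*
 * Reader that is interrupted between its two 32-bit loads. The writer is
 * called inline here to stand in for an increment on another CPU.
 */
static uint64_t torn_read(struct split_u64 *c)
{
	uint32_t lo, hi;

	lo = c->lo;	/* first half of the 64-bit load */
	split_inc(c);	/* the racing increment lands here */
	hi = c->hi;	/* second half sees the carried high word */

	return ((uint64_t)hi << 32) | lo;
}
```

Starting from 0x00000000FFFFFFFF, `torn_read()` returns 0x00000001FFFFFFFF, a value the counter never held, while the counter itself is 0x0000000100000000 afterwards. The kernel has helpers for this class of problem (for example `u64_stats_sync`), but the thread does not discuss whether any of them fits `nr_switches`.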

Long Li Aug. 21, 2019, 8:20 a.m. UTC | #3
>>>Subject: Re: [PATCH 1/3] sched: define a function to report the number of
>>>context switches on a CPU
>>>
>>>On Mon, Aug 19, 2019 at 11:14:27PM -0700, longli@linuxonhyperv.com
>>>wrote:
>>>> From: Long Li <longli@microsoft.com>
>>>>
>>>> The number of context switches on a CPU is useful to determine how
>>>> busy this CPU is on processing IRQs. Export this information so it can
>>>> be used by device drivers.
>>>
>>>Please do explain that; because I'm not seeing how number of switches
>>>relates to processing IRQs _at_all_!

Some kernel components rely on context switches to make progress, for example the watchdog and RCU. A CPU with a reasonable interrupt load continues to make context switches, normally a number of switches per second.

While observing a CPU under heavy interrupt load, I see that it spends all its time in IRQ and softIRQ context and does not get a chance to do a switch (calling __schedule()) for a long time. This makes the system unresponsive at times. The purpose is to find out whether a CPU is in this state and, if so, to apply some throttling mechanism to reduce the number of interrupts. The number of switches is not the most precise way to detect this condition, but it may be good enough.
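
A minimal user-space sketch of the heuristic described above, assuming the caller periodically samples a per-CPU switch counter and an interrupt count. `struct swamp_detector`, `swamp_sample()` and the threshold are hypothetical and are not part of this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical detector state; none of these names come from the patch. */
struct swamp_detector {
	uint64_t last_switches;		/* switch count at the last progress */
	unsigned int stalled_samples;	/* consecutive samples with no switch */
	unsigned int threshold;		/* stalled samples before reporting */
};

/*
 * Feed one sample. Returns true once the switch counter has not moved for
 * 'threshold' consecutive samples while interrupts kept arriving.
 */
static bool swamp_sample(struct swamp_detector *d, uint64_t nr_switches,
			 uint64_t irqs_since_last)
{
	/* Progress was made, or the CPU saw no interrupts: reset. */
	if (nr_switches != d->last_switches || irqs_since_last == 0) {
		d->last_switches = nr_switches;
		d->stalled_samples = 0;
		return false;
	}

	/* Saturate so a long stall cannot overflow the sample count. */
	if (d->stalled_samples < d->threshold)
		d->stalled_samples++;

	return d->stalled_samples >= d->threshold;
}
```

A caller would throttle interrupts while `swamp_sample()` returns true. A counter that stops advancing is also what a CPU running a single task looks like, so this is at best a rough signal.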

I agree this may not be the best way. If you have other ideas for detecting that a CPU is swamped by interrupts, please point me to where to look.

Thanks

Long

Peter Zijlstra Aug. 21, 2019, 10:34 a.m. UTC | #4
On Wed, Aug 21, 2019 at 08:20:48AM +0000, Long Li wrote:
> >>>Subject: Re: [PATCH 1/3] sched: define a function to report the number of
> >>>context switches on a CPU
> >>>
> >>>On Mon, Aug 19, 2019 at 11:14:27PM -0700, longli@linuxonhyperv.com
> >>>wrote:
> >>>> From: Long Li <longli@microsoft.com>
> >>>>
> >>>> The number of context switches on a CPU is useful to determine how
> >>>> busy this CPU is on processing IRQs. Export this information so it can
> >>>> be used by device drivers.
> >>>
> >>>Please do explain that; because I'm not seeing how number of switches
> >>>relates to processing IRQs _at_all_!
> 
> Some kernel components rely on context switches to make progress, for
> example the watchdog and RCU. A CPU with a reasonable interrupt load
> continues to make context switches, normally a number of switches per
> second.

That isn't true; RCU is perfectly fine with a single task always running
and not making context switches, and so is the watchdog.

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9b35aff09f70..575f1ef7b159 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1500,6 +1500,7 @@  current_restore_flags(unsigned long orig_flags, unsigned long flags)
 
 extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial);
 extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed);
+extern u64 get_cpu_rq_switches(int cpu);
 #ifdef CONFIG_SMP
 extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask);
 extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4a8e7207cafa..1a76f0e97c2d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1143,6 +1143,12 @@  int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask)
 }
 EXPORT_SYMBOL_GPL(set_cpus_allowed_ptr);
 
+u64 get_cpu_rq_switches(int cpu)
+{
+	return cpu_rq(cpu)->nr_switches;
+}
+EXPORT_SYMBOL_GPL(get_cpu_rq_switches);
+
 void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
 {
 #ifdef CONFIG_SCHED_DEBUG