linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently.
@ 2020-04-16 14:18 Muchun Song
  2020-04-16 15:35 ` Steven Rostedt
  0 siblings, 1 reply; 5+ messages in thread
From: Muchun Song @ 2020-04-16 14:18 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, mingo
  Cc: linux-kernel, Muchun Song

user_mode(task_pt_regs(tsk)) always returns true for a
user thread and false for a kernel thread. This means that
cpuacct.usage_sys reports the time used by kernel threads,
not the time that a thread spends in kernel mode. We can
use get_irq_regs() instead of task_pt_regs() to fix it.

Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
 kernel/sched/cpuacct.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 6448b0438ffb2..edfc62554648e 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -5,6 +5,7 @@
  * Based on the work by Paul Menage (menage@google.com) and Balbir Singh
  * (balbir@in.ibm.com).
  */
+#include <asm/irq_regs.h>
 #include "sched.h"
 
 /* Time spent by the tasks of the CPU accounting group executing in ... */
@@ -339,7 +340,7 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
 	struct cpuacct *ca;
 	int index = CPUACCT_STAT_SYSTEM;
-	struct pt_regs *regs = task_pt_regs(tsk);
+	struct pt_regs *regs = get_irq_regs();
 
 	if (regs && user_mode(regs))
 		index = CPUACCT_STAT_USER;
-- 
2.11.0



* Re: [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently.
  2020-04-16 14:18 [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently Muchun Song
@ 2020-04-16 15:35 ` Steven Rostedt
  2020-04-16 16:01   ` Steven Rostedt
  2020-04-17  3:07   ` Muchun Song
  0 siblings, 2 replies; 5+ messages in thread
From: Steven Rostedt @ 2020-04-16 15:35 UTC (permalink / raw)
  To: Muchun Song
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, mingo, linux-kernel

On Thu, 16 Apr 2020 22:18:33 +0800
Muchun Song <songmuchun@bytedance.com> wrote:

> user_mode(task_pt_regs(tsk)) always returns true for a
> user thread and false for a kernel thread. This means that
> cpuacct.usage_sys reports the time used by kernel threads,
> not the time that a thread spends in kernel mode. We can
> use get_irq_regs() instead of task_pt_regs() to fix it.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
>  kernel/sched/cpuacct.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index 6448b0438ffb2..edfc62554648e 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -5,6 +5,7 @@
>   * Based on the work by Paul Menage (menage@google.com) and Balbir Singh
>   * (balbir@in.ibm.com).
>   */
> +#include <asm/irq_regs.h>
>  #include "sched.h"
>  
>  /* Time spent by the tasks of the CPU accounting group executing in ... */
> @@ -339,7 +340,7 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>  {
>  	struct cpuacct *ca;
>  	int index = CPUACCT_STAT_SYSTEM;
> -	struct pt_regs *regs = task_pt_regs(tsk);
> +	struct pt_regs *regs = get_irq_regs();

But get_irq_regs() is only available from interrupt context. This will be
NULL most of the time, whereas the original way will have regs existing for
the task.

>  
>  	if (regs && user_mode(regs))
>  		index = CPUACCT_STAT_USER;

To show this, I applied your patch then did the following:

 # echo 'p:cpuacct cpuacct_charge+0x36 regs=%ax' > /sys/kernel/tracing/kprobe_events

Where I found that the test of 'regs' is %rax at offset 0x36.

 # trace-cmd start -p function -l cpuacct_charge -e kprobes
 # trace-cmd show
# tracer: function
#
# entries-in-buffer/entries-written: 70664/70664   #P:8
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           <...>-1720  [002] d..2   306.430302: cpuacct_charge <-update_curr
           <...>-1720  [002] d..3   306.430306: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
           <...>-1720  [002] dN.2   306.430321: cpuacct_charge <-update_curr
           <...>-1720  [002] dN.3   306.430322: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
           <...>-1720  [002] d..2   306.430355: cpuacct_charge <-update_curr
           <...>-1720  [002] d..3   306.430357: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
            bash-1652  [006] d.h2   306.430799: cpuacct_charge <-update_curr
            bash-1652  [006] d.h3   306.430802: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0xffffaf34012abdd8
           <...>-199   [005] d.h2   306.430806: cpuacct_charge <-update_curr
           <...>-199   [005] d.h3   306.430809: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0xffffaf3400347c38
           <...>-16    [001] d..2   306.430873: cpuacct_charge <-update_curr
           <...>-16    [001] d..3   306.430875: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
           <...>-199   [005] d..2   306.430936: cpuacct_charge <-update_curr
           <...>-199   [005] d..3   306.430937: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
            bash-1652  [006] d..2   306.430944: cpuacct_charge <-update_curr
            bash-1652  [006] d..3   306.430946: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
            sshd-1649  [000] d..2   306.430990: cpuacct_charge <-update_curr
            sshd-1649  [000] d..3   306.430992: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.432844: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.432846: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.436848: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.436850: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.440868: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.440871: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.444867: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.444870: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     kworker/2:1-127   [002] d..2   306.446925: cpuacct_charge <-update_curr
     kworker/2:1-127   [002] d..3   306.446928: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.448868: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.448870: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
     rcu_preempt-10    [006] d..2   306.452869: cpuacct_charge <-update_curr
     rcu_preempt-10    [006] d..3   306.452872: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0

The only times regs has content are from the interrupt handler (seen as
the 'h' in the status portion of the trace).

-- Steve
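
For readers following the trace, the four-character status field decodes
column by column according to the header legend above (irqs-off,
need-resched, hardirq/softirq, preempt-depth). A minimal userspace sketch
of that decoding in C follows. struct trace_flags and parse_trace_flags()
are hypothetical names made up for illustration, not part of trace-cmd or
the kernel, and only the characters that appear in this trace are handled:

```c
#include <stdbool.h>
#include <string.h>

/* Decoded form of the four-character status field, e.g. "d.h2". */
struct trace_flags {
	bool irqs_off;      /* column 1: 'd' means interrupts are disabled */
	bool need_resched;  /* column 2: 'N' means need-resched is set */
	bool hardirq;       /* column 3: 'h' means hard interrupt context */
	int preempt_depth;  /* column 4: preempt-depth digit */
};

/*
 * Hypothetical helper: decode one status field. Returns false if the
 * field is not exactly four characters long. Other column-3 letters
 * (such as 's' for softirq) are treated as "not hardirq" here.
 */
static bool parse_trace_flags(const char *field, struct trace_flags *out)
{
	if (strlen(field) != 4)
		return false;

	out->irqs_off = field[0] == 'd';
	out->need_resched = field[1] == 'N';
	out->hardirq = field[2] == 'h';
	out->preempt_depth =
		(field[3] >= '0' && field[3] <= '9') ? field[3] - '0' : 0;
	return true;
}
```

Applied to the output above, the kprobe events that decode with hardirq
set (the d.h3 lines) are the ones where regs is non-NULL.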


* Re: [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently.
  2020-04-16 15:35 ` Steven Rostedt
@ 2020-04-16 16:01   ` Steven Rostedt
  2020-04-17  3:11     ` [External] " Muchun Song
  2020-04-17  3:07   ` Muchun Song
  1 sibling, 1 reply; 5+ messages in thread
From: Steven Rostedt @ 2020-04-16 16:01 UTC (permalink / raw)
  To: Muchun Song
  Cc: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, mingo, linux-kernel

On Thu, 16 Apr 2020 11:35:02 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 16 Apr 2020 22:18:33 +0800
> Muchun Song <songmuchun@bytedance.com> wrote:
> 
> > user_mode(task_pt_regs(tsk)) always returns true for a
> > user thread and false for a kernel thread. This means that
> > cpuacct.usage_sys reports the time used by kernel threads,
> > not the time that a thread spends in kernel mode. We can
> > use get_irq_regs() instead of task_pt_regs() to fix it.
> > 
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  kernel/sched/cpuacct.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> > index 6448b0438ffb2..edfc62554648e 100644
> > --- a/kernel/sched/cpuacct.c
> > +++ b/kernel/sched/cpuacct.c
> > @@ -5,6 +5,7 @@
> >   * Based on the work by Paul Menage (menage@google.com) and Balbir Singh
> >   * (balbir@in.ibm.com).
> >   */
> > +#include <asm/irq_regs.h>
> >  #include "sched.h"
> >  
> >  /* Time spent by the tasks of the CPU accounting group executing in ... */
> > @@ -339,7 +340,7 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> >  {
> >  	struct cpuacct *ca;
> >  	int index = CPUACCT_STAT_SYSTEM;
> > -	struct pt_regs *regs = task_pt_regs(tsk);
> > +	struct pt_regs *regs = get_irq_regs();  
> 
> But get_irq_regs() is only available from interrupt context. This will be
> NULL most of the time, whereas the original way will have regs existing for
> the task.

Perhaps you want:

	regs = get_irq_regs();
	if (!regs)
		regs = task_pt_regs(tsk);

?

-- Steve

> 
> >  
> >  	if (regs && user_mode(regs))
> >  		index = CPUACCT_STAT_USER;  
> 
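
The fallback suggested in this message can be modelled outside the kernel.
The sketch below is a userspace stand-in, not the kernel code: struct
pt_regs is reduced to a single hypothetical from_user flag, irq_regs models
the per-CPU pointer that get_irq_regs() returns, and charge_index() is an
invented helper that mirrors only the index selection in cpuacct_charge().

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; assumptions for illustration only. */
enum cpuacct_stat_index { CPUACCT_STAT_USER, CPUACCT_STAT_SYSTEM };

struct pt_regs {
	bool from_user;  /* hypothetical: real pt_regs holds saved CPU registers */
};

struct task_struct {
	struct pt_regs entry_regs;  /* models the regs saved on kernel entry */
};

/* Models the per-CPU pointer; NULL outside interrupt context. */
static struct pt_regs *irq_regs;

static struct pt_regs *get_irq_regs(void)
{
	return irq_regs;
}

static struct pt_regs *task_pt_regs(struct task_struct *tsk)
{
	return &tsk->entry_regs;
}

static bool user_mode(const struct pt_regs *regs)
{
	return regs->from_user;
}

/* Prefer the interrupted context; fall back to the task's entry regs. */
static enum cpuacct_stat_index charge_index(struct task_struct *tsk)
{
	struct pt_regs *regs = get_irq_regs();

	if (!regs)
		regs = task_pt_regs(tsk);

	if (regs && user_mode(regs))
		return CPUACCT_STAT_USER;
	return CPUACCT_STAT_SYSTEM;
}
```

In this model, an interrupt that arrives while a user task is running
kernel code is charged as system time, while outside interrupt context the
result is the same as with task_pt_regs() alone.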


* Re: [External] Re: [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently.
  2020-04-16 15:35 ` Steven Rostedt
  2020-04-16 16:01   ` Steven Rostedt
@ 2020-04-17  3:07   ` Muchun Song
  1 sibling, 0 replies; 5+ messages in thread
From: Muchun Song @ 2020-04-17  3:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, Peter Zijlstra, juri.lelli, Vincent Guittot,
	dietmar.eggemann, Benjamin Segall, mgorman, mingo, linux-kernel

On Thu, Apr 16, 2020 at 11:35 PM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 16 Apr 2020 22:18:33 +0800
> Muchun Song <songmuchun@bytedance.com> wrote:
>
> > user_mode(task_pt_regs(tsk)) always returns true for a
> > user thread and false for a kernel thread. This means that
> > cpuacct.usage_sys reports the time used by kernel threads,
> > not the time that a thread spends in kernel mode. We can
> > use get_irq_regs() instead of task_pt_regs() to fix it.
> >
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > ---
> >  kernel/sched/cpuacct.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> > index 6448b0438ffb2..edfc62554648e 100644
> > --- a/kernel/sched/cpuacct.c
> > +++ b/kernel/sched/cpuacct.c
> > @@ -5,6 +5,7 @@
> >   * Based on the work by Paul Menage (menage@google.com) and Balbir Singh
> >   * (balbir@in.ibm.com).
> >   */
> > +#include <asm/irq_regs.h>
> >  #include "sched.h"
> >
> >  /* Time spent by the tasks of the CPU accounting group executing in ... */
> > @@ -339,7 +340,7 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> >  {
> >       struct cpuacct *ca;
> >       int index = CPUACCT_STAT_SYSTEM;
> > -     struct pt_regs *regs = task_pt_regs(tsk);
> > +     struct pt_regs *regs = get_irq_regs();
>
> But get_irq_regs() is only available from interrupt context. This will be
> NULL most of the time, whereas the original way will have regs existing for
> the task.
>
> >
> >       if (regs && user_mode(regs))
> >               index = CPUACCT_STAT_USER;
>
> To show this, I applied your patch then did the following:
>
>  # echo 'p:cpuacct cpuacct_charge+0x36 regs=%ax' > /sys/kernel/tracing/kprobe_events
>
> Where I found that the test of 'regs' is %rax at offset 0x36.
>
>  # trace-cmd start -p function -l cpuacct_charge -e kprobes
>  # trace-cmd show
> # tracer: function
> #
> # entries-in-buffer/entries-written: 70664/70664   #P:8
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
>            <...>-1720  [002] d..2   306.430302: cpuacct_charge <-update_curr
>            <...>-1720  [002] d..3   306.430306: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>            <...>-1720  [002] dN.2   306.430321: cpuacct_charge <-update_curr
>            <...>-1720  [002] dN.3   306.430322: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>            <...>-1720  [002] d..2   306.430355: cpuacct_charge <-update_curr
>            <...>-1720  [002] d..3   306.430357: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>             bash-1652  [006] d.h2   306.430799: cpuacct_charge <-update_curr
>             bash-1652  [006] d.h3   306.430802: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0xffffaf34012abdd8
>            <...>-199   [005] d.h2   306.430806: cpuacct_charge <-update_curr
>            <...>-199   [005] d.h3   306.430809: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0xffffaf3400347c38
>            <...>-16    [001] d..2   306.430873: cpuacct_charge <-update_curr
>            <...>-16    [001] d..3   306.430875: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>            <...>-199   [005] d..2   306.430936: cpuacct_charge <-update_curr
>            <...>-199   [005] d..3   306.430937: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>             bash-1652  [006] d..2   306.430944: cpuacct_charge <-update_curr
>             bash-1652  [006] d..3   306.430946: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>             sshd-1649  [000] d..2   306.430990: cpuacct_charge <-update_curr
>             sshd-1649  [000] d..3   306.430992: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.432844: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.432846: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.436848: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.436850: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.440868: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.440871: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.444867: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.444870: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      kworker/2:1-127   [002] d..2   306.446925: cpuacct_charge <-update_curr
>      kworker/2:1-127   [002] d..3   306.446928: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.448868: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.448870: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>      rcu_preempt-10    [006] d..2   306.452869: cpuacct_charge <-update_curr
>      rcu_preempt-10    [006] d..3   306.452872: cpuacct: (cpuacct_charge+0x36/0x1f0) regs=0x0
>
> The only times regs has content are from the interrupt handler (seen as
> the 'h' in the status portion of the trace).
>
> -- Steve

Thanks for your test. You are right.

-- 
Yours,
Muchun


* Re: [External] Re: [PATCH] sched/cpuacct: Fix charge cpuacct.usage_sys incorrently.
  2020-04-16 16:01   ` Steven Rostedt
@ 2020-04-17  3:11     ` Muchun Song
  0 siblings, 0 replies; 5+ messages in thread
From: Muchun Song @ 2020-04-17  3:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: mingo, Peter Zijlstra, juri.lelli, Vincent Guittot,
	dietmar.eggemann, Benjamin Segall, mgorman, mingo, linux-kernel

On Fri, Apr 17, 2020 at 12:01 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 16 Apr 2020 11:35:02 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Thu, 16 Apr 2020 22:18:33 +0800
> > Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > > user_mode(task_pt_regs(tsk)) always returns true for a
> > > user thread and false for a kernel thread. This means that
> > > cpuacct.usage_sys reports the time used by kernel threads,
> > > not the time that a thread spends in kernel mode. We can
> > > use get_irq_regs() instead of task_pt_regs() to fix it.
> > >
> > > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > > ---
> > >  kernel/sched/cpuacct.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> > > index 6448b0438ffb2..edfc62554648e 100644
> > > --- a/kernel/sched/cpuacct.c
> > > +++ b/kernel/sched/cpuacct.c
> > > @@ -5,6 +5,7 @@
> > >   * Based on the work by Paul Menage (menage@google.com) and Balbir Singh
> > >   * (balbir@in.ibm.com).
> > >   */
> > > +#include <asm/irq_regs.h>
> > >  #include "sched.h"
> > >
> > >  /* Time spent by the tasks of the CPU accounting group executing in ... */
> > > @@ -339,7 +340,7 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> > >  {
> > >     struct cpuacct *ca;
> > >     int index = CPUACCT_STAT_SYSTEM;
> > > -   struct pt_regs *regs = task_pt_regs(tsk);
> > > +   struct pt_regs *regs = get_irq_regs();
> >
> > But get_irq_regs() is only available from interrupt context. This will be
> > NULL most of the time, whereas the original way will have regs existing for
> > the task.
>
> Perhaps you want:
>
>         regs = get_irq_regs();
>         if (!regs)
>                 regs = task_pt_regs(tsk);
>
> ?

Yeah, if regs is NULL, we can fall back to task_pt_regs().
Does anyone else have suggestions?

>
> -- Steve
>
> >
> > >
> > >     if (regs && user_mode(regs))
> > >             index = CPUACCT_STAT_USER;
> >

-- 
Yours,
Muchun

