From: Giovanni Gherdovich <ggherdovich@suse.cz>
To: Stanislaw Gruszka <sgruszka@redhat.com>, linux-kernel@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Mike Galbraith <mgalbraith@suse.de>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH] sched/cputime: do not account thread group tasks pending runtime to improve performance
Date: Fri, 26 Aug 2016 17:24:26 +0200
Message-ID: <1472225066.1821.24.camel@suse.cz>
In-Reply-To: <20160817093043.GA25206@redhat.com>

On Wed, 2016-08-17 at 11:30 +0200, Stanislaw Gruszka wrote:
> Commit d670ec13178d0 ("posix-cpu-timers: Cure SMP wobbles") made us
> account thread group tasks' pending runtime in thread_group_cputime().
> Another commit, 6e998916dfe32 ("sched/cputime:
> Fix clock_nanosleep()/clock_gettime() inconsistency"), made us update
> scheduler runtime statistics (call update_curr()) when reading a task's
> pending runtime. Those changes cause bad performance of the times() and
> clock_gettime(CLOCK_PROCESS_CPUTIME_ID) syscalls.
> 
> While we would like to have cpuclock monotonicity kept i.e. have
> problems fixed by above commits stay fixed, we also would like to have
> good performance.
>
>                  [... snip ...]
>
> Reported-and-tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
> ---
>  kernel/sched/cputime.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 1934f65..4fca604 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -301,6 +301,26 @@ static inline cputime_t account_other_time(cputime_t max)
>  	return accounted;
>  }
>  
> +#ifdef CONFIG_64BIT
> +static inline u64 read_sum_exec_runtime(struct task_struct *t)
> +{
> +	return t->se.sum_exec_runtime;
> +}
> +#else
> +static u64 read_sum_exec_runtime(struct task_struct *t)
> +{
> +	u64 ns;
> +	struct rq_flags rf;
> +	struct rq *rq;
> +
> +	rq = task_rq_lock(t, &rf);
> +	ns = t->se.sum_exec_runtime;
> +	task_rq_unlock(rq, t, &rf);
> +
> +	return ns;
> +}
> +#endif
> +
>  /*
>   * Accumulate raw cputime values of dead tasks (sig->[us]time) and live
>   * tasks (sum on group iteration) belonging to @tsk's group.
> @@ -313,6 +333,17 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	unsigned int seq, nextseq;
>  	unsigned long flags;
>  
> +	/*
> +	 * Update current task runtime to account pending time since last
> +	 * scheduler action or thread_group_cputime() call. This thread group
> +	 * might have other running tasks on different CPUs, but updating
> +	 * their runtime can affect syscall performance, so we skip account
> +	 * those pending times and rely only on values updated on tick or
> +	 * other scheduler action.
> +	 */
> +	if (same_thread_group(current, tsk))
> +		(void) task_sched_runtime(current);
> +
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
>  	nextseq = 0;
> @@ -327,7 +358,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  			task_cputime(t, &utime, &stime);
>  			times->utime += utime;
>  			times->stime += stime;
> -			times->sum_exec_runtime += task_sched_runtime(t);
> +			times->sum_exec_runtime += read_sum_exec_runtime(t);
>  		}
>  		/* If lockless access failed, take the lock. */
>  		nextseq = 1;

Hello Stanislaw and all,

I know I'm quite late to the party, as this patch has already been taken into
Ingo's "tip" repo, but I want to chime in anyway and give my positive review
and acknowledgment of the patch.

The patch works as advertised in the commit message. The time accounting
behaviour it introduces matches what happened before d670ec13178d0
("posix-cpu-timers: Cure SMP wobbles"): only the runtime statistics of the
current task are brought up to date, while those of the other threads in the
group are read as last updated by the tick or another scheduler action. As you
say, that's how things used to work, and I'm in favor of this trade-off.
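
To make the trade-off concrete, here is a toy userspace model of the two
accounting strategies. It is only a sketch: struct toy_task and the toy_*
helpers are hypothetical names that do not exist in the kernel, and the model
ignores locking, the seqlock retry loop and dead-task accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a task's runtime accounting. */
struct toy_task {
	uint64_t sum_exec_runtime; /* ns accounted at the last tick/scheduler action */
	uint64_t pending;          /* ns run since then, not yet accounted */
};

/* Fold the pending runtime into the stored sum (the effect update_curr() has). */
static void toy_update_curr(struct toy_task *t)
{
	t->sum_exec_runtime += t->pending;
	t->pending = 0;
}

/* Behaviour before the patch: refresh every thread, then sum. */
static uint64_t toy_group_runtime_all(struct toy_task *tasks, size_t n)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < n; i++) {
		toy_update_curr(&tasks[i]);
		sum += tasks[i].sum_exec_runtime;
	}
	return sum;
}

/* Behaviour with the patch: refresh only the caller, read the rest as stored. */
static uint64_t toy_group_runtime_current_only(struct toy_task *tasks,
					       size_t n, size_t current)
{
	uint64_t sum = 0;

	assert(current < n);
	toy_update_curr(&tasks[current]);
	for (size_t i = 0; i < n; i++)
		sum += tasks[i].sum_exec_runtime;
	return sum;
}
```

In this model the second function can return a smaller value than the first
one, by at most the pending runtime of the other threads, but it touches only
the caller's accounting state.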

You correctly address Mel Gorman's remark ("how do you know that tsk ==
current?") by using the "current" macro when you call task_sched_runtime().
As you note, task_sched_runtime(current) (which in turn calls update_curr() on
that task) is all you need to solve the problem of "the diff of 'process'
should always be >= the diff of 'thread'" that you initially addressed in your
6e998916df "sched/cputime: Fix clock_nanosleep()/clock_gettime()
inconsistency".
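
For reference, the invariant can be checked from userspace along these lines.
This is a minimal sketch and not the test case from the original report: the
helper names cpu_clock_ns() and process_minus_thread_delta_ns() are made up
for illustration, and only clock_gettime() with the two POSIX CPU-time clock
IDs is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX CPU-time clock and return its value in nanoseconds. */
static int64_t cpu_clock_ns(clockid_t id)
{
	struct timespec ts;
	int ret = clock_gettime(id, &ts);

	assert(ret == 0);
	(void)ret;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Burn some CPU on the calling thread and return
 * (process delta) - (thread delta) in nanoseconds.
 *
 * The thread clock is sampled inside the window of the process clock,
 * so a negative result would mean the process clock advanced less than
 * the clock of one of its own threads.
 */
static int64_t process_minus_thread_delta_ns(unsigned long spins)
{
	volatile unsigned long sink = 0;
	int64_t p0, t0, t1, p1;

	p0 = cpu_clock_ns(CLOCK_PROCESS_CPUTIME_ID);
	t0 = cpu_clock_ns(CLOCK_THREAD_CPUTIME_ID);
	for (unsigned long i = 0; i < spins; i++)
		sink += i;
	t1 = cpu_clock_ns(CLOCK_THREAD_CPUTIME_ID);
	p1 = cpu_clock_ns(CLOCK_PROCESS_CPUTIME_ID);

	return (p1 - p0) - (t1 - t0);
}
```

Calling process_minus_thread_delta_ns() from any thread of the process should
never yield a negative value if the invariant holds, because the calling
thread's own pending runtime is accounted in both clocks.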

Acked-by: Giovanni Gherdovich <ggherdovich@suse.cz>


--
Giovanni Gherdovich
SUSE Labs

Thread overview (3+ messages):
2016-08-17  9:30 [PATCH] sched/cputime: do not account thread group tasks pending runtime to improve performance Stanislaw Gruszka
2016-08-18 11:04 ` [tip:sched/core] sched/cputime: Improve scalability by not accounting thread group tasks pending runtime tip-bot for Stanislaw Gruszka
2016-08-26 15:24 ` Giovanni Gherdovich [this message]
