linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Frederic Weisbecker <frederic@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Wanpeng Li <wanpengli@tencent.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Yauheni Kaliuta <yauheni.kaliuta@redhat.com>,
	Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>
Subject: Re: [PATCH 19/25] sched/vite: Handle nice updates under vtime
Date: Tue, 20 Nov 2018 15:17:54 +0100	[thread overview]
Message-ID: <20181120141754.GW2131@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <1542163569-20047-20-git-send-email-frederic@kernel.org>

On Wed, Nov 14, 2018 at 03:46:03AM +0100, Frederic Weisbecker wrote:
> On the vtime level, nice updates are currently handled on context
> switches. When a task's nice value gets updated while it is sleeping,
> the context switch takes into account the new nice value in order to
> later record the vtime delta to the appropriate kcpustat index.

Urgh, so this patch should be folded into the previous one. On their own
neither really makes sense.

> We have yet to handle live updates: when set_user_nice() is called
> while the target is running. We'll handle that on two sides:
> 
> * If the caller of set_user_nice() is the current task, we update the
>   vtime state in place.
> 
> * If the target runs on a different CPU, we interrupt it with an IPI to
>   update the vtime state in place.

*groan*... So what are the rules for vtime updates? Who can do that
when?

So when we change nice, we'll have the respective rq locked and task
effectively unqueued. It cannot schedule at such a point. Can
'concurrent' vtime updates still happen?

> The vtime update in question consists in flushing the pending vtime
> delta to the task/kcpustat and resume the accounting on top of the new
> nice value.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f12225f..e8f0437 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3868,6 +3868,7 @@ void set_user_nice(struct task_struct *p, long nice)
>  	int old_prio, delta;
>  	struct rq_flags rf;
>  	struct rq *rq;
> +	long old_nice;
>  
>  	if (task_nice(p) == nice || nice < MIN_NICE || nice > MAX_NICE)
>  		return;
> @@ -3878,6 +3879,8 @@ void set_user_nice(struct task_struct *p, long nice)
>  	rq = task_rq_lock(p, &rf);
>  	update_rq_clock(rq);
>  
> +	old_nice = task_nice(p);
> +
>  	/*
>  	 * The RT priorities are set via sched_setscheduler(), but we still
>  	 * allow the 'normal' nice value to be set - but as expected
> @@ -3913,6 +3916,7 @@ void set_user_nice(struct task_struct *p, long nice)
>  	if (running)
>  		set_curr_task(rq, p);
>  out_unlock:
> +	vtime_set_nice(rq, p, old_nice);
>  	task_rq_unlock(rq, p, &rf);
>  }

That's not sufficient; I think you want to hook set_load_weight() or
something. Things like sys_sched_setattr() can also change the nice
value.

>  EXPORT_SYMBOL(set_user_nice);
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 07c2e7f..2b35132 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c

> @@ -937,6 +937,33 @@ void vtime_exit_task(struct task_struct *t)
>  	local_irq_restore(flags);
>  }
>  
> +void vtime_set_nice_local(struct task_struct *t)
> +{
> +	struct vtime *vtime = &t->vtime;
> +
> +	write_seqcount_begin(&vtime->seqcount);
> +	if (vtime->state == VTIME_USER)
> +		vtime_account_user(t, vtime, true);
> +	else if (vtime->state == VTIME_GUEST)
> +		vtime_account_guest(t, vtime, true);
> +	vtime->nice = (task_nice(t) > 0) ? 1 : 0;
> +	write_seqcount_end(&vtime->seqcount);
> +}
> +
> +static void vtime_set_nice_func(struct irq_work *work)
> +{
> +	vtime_set_nice_local(current);
> +}
> +
> +static DEFINE_PER_CPU(struct irq_work, vtime_set_nice_work) = {
> +	.func = vtime_set_nice_func,
> +};
> +
> +void vtime_set_nice_remote(int cpu)
> +{
> +	irq_work_queue_on(&per_cpu(vtime_set_nice_work, cpu), cpu);

What happens if you already had one pending? Do we loose updates?

> +}
> +
>  u64 task_gtime(struct task_struct *t)
>  {
>  	struct vtime *vtime = &t->vtime;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 618577f..c7846ca 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1790,6 +1790,45 @@ static inline int sched_tick_offload_init(void) { return 0; }
>  static inline void sched_update_tick_dependency(struct rq *rq) { }
>  #endif
>  
> +static inline void vtime_set_nice(struct rq *rq,
> +				  struct task_struct *p, long old_nice)
> +{
> +#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
> +	long nice;
> +	int cpu;
> +
> +	if (!vtime_accounting_enabled())
> +		return;
> +
> +	cpu = cpu_of(rq);
> +
> +	if (!vtime_accounting_enabled_cpu(cpu))
> +		return;
> +
> +	/*
> +	 * Task not running, nice update will be seen by vtime on its
> +	 * next context switch.
> +	 */
> +	if (!task_current(rq, p))
> +		return;
> +
> +	nice = task_nice(p);
> +
> +	/* Task stays nice, still accounted as nice in kcpustat */
> +	if (old_nice > 0 && nice > 0)
> +		return;
> +
> +	/* Task stays rude, still accounted as non-nice in kcpustat */
> +	if (old_nice <= 0 && nice <= 0)
> +		return;
> +
> +	if (p == current)
> +		vtime_set_nice_local(p);
> +	else
> +		vtime_set_nice_remote(cpu);
> +#endif
> +}

That's _far_ too large for an inline I'm thinking. Also, changing nice
really isn't a fast path or anything.

  reply	other threads:[~2018-11-20 14:18 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-14  2:45 [PATCH 00/25] sched/nohz: Make kcpustat vtime aware (Fix kcpustat on nohz_full) Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 01/25] sched/vtime: Fix guest/system mis-accounting on task switch Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 02/25] sched/vtime: Protect idle accounting under vtime seqcount Frederic Weisbecker
2018-11-20 13:19   ` Peter Zijlstra
2018-11-14  2:45 ` [PATCH 03/25] vtime: Rename vtime_account_system() to vtime_account_kernel() Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 04/25] vtime: Spare a seqcount lock/unlock cycle on context switch Frederic Weisbecker
2018-11-20 13:25   ` Peter Zijlstra
2019-09-25 14:42     ` Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 05/25] sched/vtime: Record CPU under seqcount for kcpustat needs Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 06/25] sched/cputime: Add vtime idle task state Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 07/25] sched/cputime: Add vtime guest " Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 08/25] vtime: Exit vtime before exit_notify() Frederic Weisbecker
2018-11-20 13:54   ` Peter Zijlstra
2018-11-14  2:45 ` [PATCH 09/25] kcpustat: Track running task following vtime sequences Frederic Weisbecker
2018-11-20 13:58   ` Peter Zijlstra
2018-11-14  2:45 ` [PATCH 10/25] context_tracking: Remove context_tracking_active() Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 11/25] context_tracking: s/context_tracking_is_enabled/context_tracking_enabled() Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 12/25] context_tracking: Rename context_tracking_is_cpu_enabled() to context_tracking_enabled_this_cpu() Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 13/25] context_tracking: Introduce context_tracking_enabled_cpu() Frederic Weisbecker
2018-11-20 14:02   ` Peter Zijlstra
2018-11-14  2:45 ` [PATCH 14/25] sched/vtime: Rename vtime_accounting_cpu_enabled() to vtime_accounting_enabled_this_cpu() Frederic Weisbecker
2018-11-14  2:45 ` [PATCH 15/25] sched/vtime: Introduce vtime_accounting_enabled_cpu() Frederic Weisbecker
2018-11-20 14:04   ` Peter Zijlstra
2018-11-14  2:46 ` [PATCH 16/25] sched/cputime: Allow to pass cputime index on user/guest accounting Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 17/25] sched/cputime: Standardize the kcpustat index based accounting functions Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 18/25] vtime: Track nice-ness on top of context switch Frederic Weisbecker
2018-11-20 14:09   ` Peter Zijlstra
2018-11-14  2:46 ` [PATCH 19/25] sched/vite: Handle nice updates under vtime Frederic Weisbecker
2018-11-20 14:17   ` Peter Zijlstra [this message]
2018-11-26 15:53     ` Frederic Weisbecker
2018-11-26 16:11       ` Peter Zijlstra
2018-11-26 18:41         ` Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 20/25] sched/kcpustat: Introduce vtime-aware kcpustat accessor Frederic Weisbecker
2018-11-20 14:23   ` Peter Zijlstra
2018-11-20 22:40     ` Frederic Weisbecker
2018-11-21  8:18       ` Peter Zijlstra
2018-11-21  8:35         ` Peter Zijlstra
2018-11-21 16:33         ` Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 21/25] procfs: Use vtime aware " Frederic Weisbecker
2018-11-20 14:24   ` Peter Zijlstra
2018-11-20 22:31     ` Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 22/25] cpufreq: " Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 23/25] leds: Use vtime aware kcpustat accessors Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 24/25] rackmeter: " Frederic Weisbecker
2018-11-14  2:46 ` [PATCH 25/25] sched/vtime: Clarify vtime_task_switch() argument layout Frederic Weisbecker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181120141754.GW2131@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=frederic@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=wanpengli@tencent.com \
    --cc=yauheni.kaliuta@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).