linux-kernel.vger.kernel.org archive mirror
From: Joel Fernandes <joel@joelfernandes.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Julien Desfossez <jdesfossez@digitalocean.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vineeth Pillai <viremana@linux.microsoft.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Aaron Lu <aaron.lwe@gmail.com>,
	Aubrey Li <aubrey.intel@gmail.com>,
	Dhaval Giani <dhaval.giani@oracle.com>,
	Chris Hyser <chris.hyser@oracle.com>,
	Nishanth Aravamudan <naravamudan@digitalocean.com>,
	mingo@kernel.org, pjt@google.com, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org, fweisbec@gmail.com,
	keescook@chromium.org, kerrnel@google.com,
	Phil Auld <pauld@redhat.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	vineeth@bitbyteword.org, Chen Yu <yu.c.chen@intel.com>,
	Christian Brauner <christian.brauner@ubuntu.com>,
	Agata Gruza <agata.gruza@intel.com>,
	Antonio Gomez Iglesias <antonio.gomez.iglesias@intel.com>,
	graf@amazon.com, konrad.wilk@oracle.com, dfaggioli@suse.com,
	rostedt@goodmis.org, derkling@google.com, benbjiang@tencent.com,
	Aubrey Li <aubrey.li@linux.intel.com>,
	Tim Chen <tim.c.chen@intel.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode
Date: Tue, 1 Sep 2020 12:50:52 -0400	[thread overview]
Message-ID: <20200901165052.GA1662854@google.com> (raw)
In-Reply-To: <87y2lth4qa.fsf@nanos.tec.linutronix.de>

On Tue, Sep 01, 2020 at 05:54:53PM +0200, Thomas Gleixner wrote:
> On Fri, Aug 28 2020 at 15:51, Julien Desfossez wrote:
> > @@ -112,59 +113,84 @@ static __always_inline void exit_to_user_mode(void)
> >  /* Workaround to allow gradual conversion of architecture code */
> >  void __weak arch_do_signal(struct pt_regs *regs) { }
> >  
> > -static unsigned long exit_to_user_mode_loop(struct pt_regs *regs,
> > -					    unsigned long ti_work)
> 
> Can the rework of that code please be split out into a seperate patch
> and then adding that unsafe muck on top?

Yeah, good idea. Will do.

> > +static inline bool exit_to_user_mode_work_pending(unsigned long ti_work)
> >  {
> > -	/*
> > -	 * Before returning to user space ensure that all pending work
> > -	 * items have been completed.
> > -	 */
> > -	while (ti_work & EXIT_TO_USER_MODE_WORK) {
> > +	return (ti_work & EXIT_TO_USER_MODE_WORK);
> > +}
> >  
> > -		local_irq_enable_exit_to_user(ti_work);
> > +static inline void exit_to_user_mode_work(struct pt_regs *regs,
> > +					  unsigned long ti_work)
> > +{
> >  
> > -		if (ti_work & _TIF_NEED_RESCHED)
> > -			schedule();
> > +	local_irq_enable_exit_to_user(ti_work);
> >  
> > -		if (ti_work & _TIF_UPROBE)
> > -			uprobe_notify_resume(regs);
> > +	if (ti_work & _TIF_NEED_RESCHED)
> > +		schedule();
> >  
> > -		if (ti_work & _TIF_PATCH_PENDING)
> > -			klp_update_patch_state(current);
> > +	if (ti_work & _TIF_UPROBE)
> > +		uprobe_notify_resume(regs);
> >  
> > -		if (ti_work & _TIF_SIGPENDING)
> > -			arch_do_signal(regs);
> > +	if (ti_work & _TIF_PATCH_PENDING)
> > +		klp_update_patch_state(current);
> >  
> > -		if (ti_work & _TIF_NOTIFY_RESUME) {
> > -			clear_thread_flag(TIF_NOTIFY_RESUME);
> > -			tracehook_notify_resume(regs);
> > -			rseq_handle_notify_resume(NULL, regs);
> > -		}
> > +	if (ti_work & _TIF_SIGPENDING)
> > +		arch_do_signal(regs);
> > +
> > +	if (ti_work & _TIF_NOTIFY_RESUME) {
> > +		clear_thread_flag(TIF_NOTIFY_RESUME);
> > +		tracehook_notify_resume(regs);
> > +		rseq_handle_notify_resume(NULL, regs);
> > +	}
> > +
> > +	/* Architecture specific TIF work */
> > +	arch_exit_to_user_mode_work(regs, ti_work);
> > +
> > +	local_irq_disable_exit_to_user();
> > +}
> >  
> > -		/* Architecture specific TIF work */
> > -		arch_exit_to_user_mode_work(regs, ti_work);
> > +static unsigned long exit_to_user_mode_loop(struct pt_regs *regs)
> > +{
> > +	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > +
> > +	/*
> > +	 * Before returning to user space ensure that all pending work
> > +	 * items have been completed.
> > +	 */
> > +	while (1) {
> > +		/* Both interrupts and preemption could be enabled here. */
> 
>    What? Preemption is enabled here, but interrupts are disabled.

Sorry, I meant what happens within exit_to_user_mode_work(); that is what
the comment was referring to. Agreed, I will make the comment clearer next
time.

> > +		if (exit_to_user_mode_work_pending(ti_work))
> > +		    exit_to_user_mode_work(regs, ti_work);
> > +
> > +		/* Interrupts may be reenabled with preemption disabled. */
> > +		sched_core_unsafe_exit_wait(EXIT_TO_USER_MODE_WORK);
> > +
> >  		/*
> > -		 * Disable interrupts and reevaluate the work flags as they
> > -		 * might have changed while interrupts and preemption was
> > -		 * enabled above.
> > +		 * Reevaluate the work flags as they might have changed
> > +		 * while interrupts and preemption were enabled.
> 
> What enables preemption and interrupts? Can you pretty please write
> comments which explain what's going on.

Yes, sorry. So, sched_core_unsafe_exit_wait() will spin with IRQs enabled and
preemption disabled. I did it this way to get past the stop_machine() issue,
where you were unhappy with us spinning in an IRQ-disabled region.

> >  		 */
> > -		local_irq_disable_exit_to_user();
> >  		ti_work = READ_ONCE(current_thread_info()->flags);
> > +
> > +		/*
> > +		 * We may be switching out to another task in kernel mode. That
> > +		 * process is responsible for exiting the "unsafe" kernel mode
> > +		 * when it returns to user or guest.
> > +		 */
> > +		if (exit_to_user_mode_work_pending(ti_work))
> > +			sched_core_unsafe_enter();
> > +		else
> > +			break;
> >  	}
> >  
> > -	/* Return the latest work state for arch_exit_to_user_mode() */
> > -	return ti_work;
> > +    return ti_work;
> >  }
> >  
> >  static void exit_to_user_mode_prepare(struct pt_regs *regs)
> >  {
> > -	unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> > +	unsigned long ti_work;
> >  
> >  	lockdep_assert_irqs_disabled();
> >  
> > -	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
> > -		ti_work = exit_to_user_mode_loop(regs, ti_work);
> > +	ti_work = exit_to_user_mode_loop(regs);
> 
> Why has the above loop to be invoked unconditionally even when that core
> scheduling muck is disabled? Just to make all return to user paths more
> expensive unconditionally, right?

If you look at the loop above, we still call exit_to_user_mode_work()
conditionally, via exit_to_user_mode_work_pending(), which performs the same
check as before.

So the usual exit_to_user_mode work is still done conditionally; it's just
that the 'loop' itself now has to be invoked unconditionally. The reason is
that the loop can switch into another thread, so we have to do unsafe_exit()
for the old thread and unsafe_enter() for the new one while handling the TIF
work properly. AFAICS, we could also get migrated to another CPU within this
loop, so the unsafe_enter() / unsafe_exit() calls could happen on different
CPUs.

I will rework it by splitting it out and adding more elaborate comments.
Thanks,

 - Joel



Thread overview: 66+ messages
2020-08-28 19:51 [RFC PATCH v7 00/23] Core scheduling v7 Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 01/23] sched: Wrap rq::lock access Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 02/23] sched: Introduce sched_class::pick_task() Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 03/23] sched: Core-wide rq->lock Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 04/23] sched/fair: Add a few assertions Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 05/23] sched: Basic tracking of matching tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 06/23] bitops: Introduce find_next_or_bit Julien Desfossez
2020-09-03  5:13   ` Randy Dunlap
2020-08-28 19:51 ` [RFC PATCH v7 07/23] cpumask: Introduce a new iterator for_each_cpu_wrap_or Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 08/23] sched: Add core wide task selection and scheduling Julien Desfossez
2020-08-28 20:51   ` Peter Zijlstra
2020-08-28 22:02     ` Vineeth Pillai
2020-08-28 22:23       ` Joel Fernandes
2020-08-29  7:47       ` peterz
2020-08-31 13:01         ` Vineeth Pillai
2020-08-31 14:24         ` Joel Fernandes
2020-09-01  3:38         ` Joel Fernandes
2020-09-01  5:10         ` Joel Fernandes
2020-09-01 12:34           ` Vineeth Pillai
2020-09-01 17:30             ` Joel Fernandes
2020-09-01 21:23               ` Vineeth Pillai
2020-09-02  1:11                 ` Joel Fernandes
2020-08-28 20:55   ` Peter Zijlstra
2020-08-28 22:15     ` Vineeth Pillai
2020-09-15 20:08   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 09/23] sched/fair: Fix forced idle sibling starvation corner case Julien Desfossez
2020-08-28 21:25   ` Peter Zijlstra
2020-08-28 23:24     ` Vineeth Pillai
2020-08-28 19:51 ` [RFC PATCH v7 10/23] sched/fair: wrapper for cfs_rq->min_vruntime Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 11/23] sched/fair: core wide cfs task priority comparison Julien Desfossez
2020-08-28 21:29   ` Peter Zijlstra
2020-09-17 14:15     ` Vineeth Pillai
2020-09-17 20:39       ` Vineeth Pillai
2020-09-23  1:46     ` Joel Fernandes
2020-09-23  1:52       ` Joel Fernandes
2020-09-25 15:02         ` Joel Fernandes
2020-09-15 21:49   ` chris hyser
     [not found]     ` <81b208ad-b9e6-bfbf-631e-02e9f75d73a2@linux.intel.com>
2020-09-16 14:24       ` chris hyser
2020-09-16 20:53         ` chris hyser
2020-09-17  1:09           ` Li, Aubrey
2020-08-28 19:51 ` [RFC PATCH v7 12/23] sched: Trivial forced-newidle balancer Julien Desfossez
2020-09-02  7:08   ` Pavan Kondeti
2020-08-28 19:51 ` [RFC PATCH v7 13/23] sched: migration changes for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 14/23] irq_work: Add support to detect if work is pending Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 15/23] entry/idle: Add a common function for activites during idle entry/exit Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 16/23] arch/x86: Add a new TIF flag for untrusted tasks Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 17/23] kernel/entry: Add support for core-wide protection of kernel-mode Julien Desfossez
2020-09-01 15:54   ` Thomas Gleixner
2020-09-01 16:50     ` Joel Fernandes [this message]
2020-09-01 20:02       ` Thomas Gleixner
2020-09-02  1:29         ` Joel Fernandes
2020-09-02  7:53           ` Thomas Gleixner
2020-09-02 15:12             ` Joel Fernandes
2020-09-02 16:57             ` Dario Faggioli
2020-09-03  4:34               ` Joel Fernandes
2020-09-03 11:05                 ` Vineeth Pillai
2020-09-03 13:20                 ` Thomas Gleixner
2020-09-03 20:30                   ` Joel Fernandes
2020-09-03 13:43                 ` Dario Faggioli
2020-09-03 20:25                   ` Joel Fernandes
2020-08-28 19:51 ` [RFC PATCH v7 18/23] entry/idle: Enter and exit kernel protection during idle entry and exit Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 19/23] entry/kvm: Protect the kernel when entering from guest Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 20/23] sched/coresched: config option for kernel protection Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 21/23] sched: cgroup tagging interface for core scheduling Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 22/23] Documentation: Add documentation on " Julien Desfossez
2020-08-28 19:51 ` [RFC PATCH v7 23/23] sched: Debug bits Julien Desfossez
