From: Marcelo Tosatti <mtosatti@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: gor@linux.ibm.com, jpoimboe@redhat.com, jikos@kernel.org,
	mbenes@suse.cz, pmladek@suse.com, mingo@kernel.org,
	linux-kernel@vger.kernel.org, joe.lawrence@redhat.com,
	fweisbec@gmail.com, tglx@linutronix.de, hca@linux.ibm.com,
	svens@linux.ibm.com, sumanthk@linux.ibm.com,
	live-patching@vger.kernel.org, paulmck@kernel.org,
	rostedt@goodmis.org, x86@kernel.org
Subject: Re: [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync() vs NOHZ_FULL
Date: Thu, 21 Oct 2021 16:57:09 -0300	[thread overview]
Message-ID: <20211021195709.GA22422@fuller.cnet> (raw)
In-Reply-To: <20211021192543.GV174703@worktop.programming.kicks-ass.net>

On Thu, Oct 21, 2021 at 09:25:43PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 21, 2021 at 03:39:35PM -0300, Marcelo Tosatti wrote:
> > Peter,
> > 
> > static __always_inline void arch_exit_to_user_mode(void)
> > {
> >         mds_user_clear_cpu_buffers();
> > }
> > 
> > /**
> >  * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability
> >  *
> >  * Clear CPU buffers if the corresponding static key is enabled
> >  */
> > static __always_inline void mds_user_clear_cpu_buffers(void)
> > {
> >         if (static_branch_likely(&mds_user_clear))
> >                 mds_clear_cpu_buffers();
> > }
> > 
> > We were discussing how to perform objtool style validation 
> > that no code after the check for 
> 
> I'm not sure what the point of the above is... Were you trying to ask
> for validation that nothing runs after the mds_user_clear_cpu_buffer()?
> 
> That isn't strictly true today, there's lockdep code after it. I can't
> recall why that order is as it is though.
> 
> Pretty much everything in noinstr is magical, we just have to think
> harder there (and possibly start writing more comments there).

mds_user_clear_cpu_buffers() happens after sync_core() in your patchset,
if I am not mistaken.

> > > +             /* NMI happens here and must still do/finish CT_WORK_n */
> > > +             sync_core();
> > 
> > But after the discussion with you, it seems doing the TLB checking 
> > and (also sync_core) checking very late/very early on exit/entry 
> > makes things easier to review.
> 
> I don't know about late, it must happen *very* early in entry. The
> sync_core() must happen before any self-modifying code gets called
> (static_branch, static_call, etc..) with possible exception of the
> context_tracking static_branch.
> 
> The TLBi must also happen super early, possibly while still on the
> entry stack (since the task stack is vmap'ed).

But will it ever be freed/remapped from other CPUs while the task
is running?

> We currently don't run C
> code on the entry stack, that needs quite a bit of careful work to make
> happen.

I was thinking of coding this in asm, as early as possible after the
write that switches to the kernel CR3:

 Kernel entry:
 -------------

       cpu = smp_processor_id();

       if (isolation_enabled(cpu)) {
               reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
               if (reqs & CPU_REQ_FLUSH_TLB)
                       flush_tlb_all();
               if (reqs & CPU_REQ_SYNC_CORE)
                       sync_core();
       }

 Exit to userspace (as close as possible to the write of the user
 pagetable pointer to CR3):
 -----------------

       cpu = smp_processor_id();

       if (isolation_enabled(cpu)) {
               atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
       }

Do you think that is a bad idea (in asm, not C)?
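For reference, this is roughly the per-CPU state the two sketches above
assume; the struct, flag names and values are only my assumption to make
the sketch concrete, nothing taken from your series:

        /*
         * Hypothetical layout of percpudata->user_kernel_state as used in
         * the entry/exit sketches above; on the entry/exit side percpudata
         * would be this_cpu_ptr() of the per-CPU instance.
         */
        #define IN_KERNEL_MODE          0               /* mode and request bits cleared */
        #define IN_USER_MODE            (1 << 0)        /* set on exit to userspace */
        #define CPU_REQ_FLUSH_TLB       (1 << 1)        /* remote CPU wants flush_tlb_all() */
        #define CPU_REQ_SYNC_CORE       (1 << 2)        /* remote CPU wants sync_core() */

        struct percpudata {
                atomic_t user_kernel_state;
        };
        static DEFINE_PER_CPU(struct percpudata, percpudata);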
And the request side can be in C:

 Request side:
 -------------

       int targetcpu;
       struct percpudata *pcpudata = per_cpu_ptr(&percpudata, targetcpu);

       do {
               old_state = atomic_read(&pcpudata->user_kernel_state);

               /* in kernel mode ? */
               if (!(old_state & IN_USER_MODE)) {
                       smp_call_function_single(targetcpu, request_fn, NULL, 1);
                       break;
               }
               new_state = old_state | CPU_REQ_FLUSH_TLB; /* (or CPU_REQ_SYNC_CORE) */
       } while (atomic_cmpxchg(&pcpudata->user_kernel_state, old_state, new_state) != old_state);

(Need logic to protect against the atomic_cmpxchg() always failing, but
that shouldn't be difficult.)
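For example, the loop above could be bounded along these lines; a minimal
sketch, with a made-up retry limit, falling back to the IPI (which is
always correct, just more expensive):

        /* Arbitrary cap on cmpxchg retries before falling back to the IPI. */
        #define REQ_MAX_RETRIES 16

        int retries = 0;

        while (atomic_cmpxchg(&pcpudata->user_kernel_state,
                              old_state, new_state) != old_state) {
                old_state = atomic_read(&pcpudata->user_kernel_state);
                /* Target re-entered the kernel, or we lost too many races. */
                if (!(old_state & IN_USER_MODE) || ++retries > REQ_MAX_RETRIES) {
                        smp_call_function_single(targetcpu, request_fn, NULL, 1);
                        break;
                }
                new_state = old_state | CPU_REQ_FLUSH_TLB;
        }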

> > Can then use a single atomic variable with USER/KERNEL state and cmpxchg
> > loops.
> 
> We're not going to add an atomic to context tracking. There is one, we
> just got to extract/share it with RCU.

Again, to avoid kernel TLB flushes you'd have to ensure:

kernel entry:
	instrA addr1,addr2,addr3
	instrB addr2,addr3,addr4  <--- no address touched here may have had
				       its TLB entries modified and flushed
	instrC addr5,addr6,addr7
	reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
	if (reqs & CPU_REQ_FLUSH_TLB)
		flush_tlb_all();

kernel exit:

	atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
	instrA addr1,addr2,addr3
	instrB addr2,addr3,addr4  <--- no address touched here may have had
				       its TLB entries modified and flushed

This could be made conditional on "task isolated mode" being enabled
(it would be better if it didn't have to be, though).