Linux-Doc Archive on lore.kernel.org
 help / color / Atom feed
From: Ira Weiny <ira.weiny@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, Dan Williams <dan.j.williams@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions
Date: Thu, 23 Jul 2020 15:04:36 -0700
Message-ID: <20200723220435.GI844235@iweiny-DESK2.sc.intel.com> (raw)
In-Reply-To: <87r1t2vwi7.fsf@nanos.tec.linutronix.de>

On Thu, Jul 23, 2020 at 09:53:20PM +0200, Thomas Gleixner wrote:
> Ira,
> 
> ira.weiny@intel.com writes:
> 
> > ...
> > 	// ref == 0
> > 	dev_access_enable()  // ref += 1 ==> disable protection
> > 		irq()
> > 			// enable protection
> > 			// ref = 0
> > 			_handler()
> > 				dev_access_enable()   // ref += 1 ==> disable protection
> > 				dev_access_disable()  // ref -= 1 ==> enable protection
> > 			// WARN_ON(ref != 0)
> > 			// disable protection
> > 	do_pmem_thing()  // all good here
> > 	dev_access_disable() // ref -= 1 ==> 0 ==> enable protection
> 
> ...
> 
> > First I'm not sure if adding this state to idtentry_state and having
> > that state copied is the right way to go.
> 
> Adding the state to idtentry_state is fine at least for most interrupts
> and exceptions. Emphasis on most.
> 
> #PF does not work because #PF can schedule.


Merging with your other response:

<quote: https://lore.kernel.org/lkml/87o8o6vvt0.fsf@nanos.tec.linutronix.de/>
Only from #PF, but after the fault has been resolved and the tasks is
scheduled in again then the task returns through idtentry_exit() to the
place where it took the fault. That's not guaranteed to be on the same
CPU. If schedule is not aware of the fact that the exception turned off
stuff then you surely get into trouble. So you really want to store it
in the task itself then the context switch code can actually see the
state and act accordingly.
</quote>

I think, after fixing my code (see below), using idtentry_state could still
work.  If the per-cpu cache and the MSR is updated in idtentry_exit() that
should carry the state to the new cpu, correct?

The exception may have turned off stuff but it is a bug if it did not also turn
it back on.

Ie the irq should not be doing:

	kmap_atomic()
	return;

... without a kunmap_atomic().

That is why I put the WARN_ON_ONCE() to ensure the ref count has been properly
returned to 0 by the exception handler.

FWIW the fixed code is working for my tests...

> 
> > It seems like we should start passing this by reference instead of
> > value.  But for now this works as an RFC.  Comments?
> 
> Works as in compiles, right?
> 
> static void noinstr idt_save_pkrs(idtentry_state_t state)
> {
>         state.foo = 1;
> }
> 
> How is that supposed to change the caller state? C programming basics.

<sigh>  I am so stupid.  I was not looking at this particular case but you are
100% correct...  I can't believe I did not see this.

In the above statement I was only thinking about the extra overhead I was
adding to idtentry_enter() and the callers of it.

"C programming basics" indeed... Once again sorry...

> 
> > Second, I'm not 100% happy with having to save the reference count in
> > the exception handler.  It seems like a very ugly layering violation but
> > I don't see a way around it at the moment.
> 
> That state is strict per task, right? So why do you want to store it
> anywhere else that in task/thread storage. That solves your problem of
> #PF scheduling nicely.

The problem is with the kmap code.  If an exception handler calls kmap_atomic()
[more importantly if they nest kmap_atomic() calls] the kmap code will need to
keep track of that re-entrant call.  With a separate reference counter, the
kmap code would have to know it is in irq context or not.

Here I'm attempting to make the task 'current->dev_page_access_ref' counter
serve double duty so that the kmap code is agnostic to the context it is in.

> 
> > Third, this patch has gone through a couple of revisions as I've had
> > crashes which just don't make sense to me.  One particular issue I've
> > had is taking a MCE during memcpy_mcsafe causing my WARN_ON() to fire.
> > The code path was a pmem copy and the ref count should have been
> > elevated due to dev_access_enable() but why was
> > idtentry_enter()->idt_save_pkrs() not called I don't know.
> 
> Because #MC does not go through idtentry_enter(). Neither do #NMI, #DB, #BP.

And the above probably would work if I knew how to code in C...  :-/  I'm so
embarrassed.

> 
> > Finally, it looks like the entry/exit code is being refactored into
> > common code.  So likely this is best handled somewhat there.  Because
> > this can be predicated on CONFIG_ARCH_HAS_SUPERVISOR_PKEYS and handled
> > in a generic fashion.  But that is a ways off I think.
> 
> The invocation of save/restore might be placed in generic code at least
> for the common exception and interrupt entries.

I think you are correct.  I'm testing now...

> 
> > +static void noinstr idt_save_pkrs(idtentry_state_t state)
> 
> *state. See above.

Yep...  :-/

> 
> > +#else
> > +/* Define as macros to prevent conflict of inline and noinstr */
> > +#define idt_save_pkrs(state)
> > +#define idt_restore_pkrs(state)
> 
> Empty inlines do not need noinstr because they are optimized out. If you
> want inlines in a noinstr section then use __always_inline

Peter made the same comment so I've changed to __always_inline.

> 
> >  /**
> >   * idtentry_enter - Handle state tracking on ordinary idtentries
> >   * @regs:	Pointer to pt_regs of interrupted context
> > @@ -604,6 +671,8 @@ idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
> >  		return ret;
> >  	}
> >  
> > +	idt_save_pkrs(ret);
> 
> No. This really has no business to be called before the state is
> established. It's not something horribly urgent and write_pkrs() is NOT
> noinstr and invokes wrmsrl() which is subject to tracing.

I don't understand.  I'm not calling it within intrumentation_{begin,end}() so
does that mean I can remove the noinstr?  I think I can actually.

Or do you mean call it at the end of idtentry_enter()?  Like this:

@@ -672,8 +672,6 @@ idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
                return ret;
        }
 
-       idt_save_pkrs(ret);
-
        /*
         * If this entry hit the idle task invoke rcu_irq_enter() whether
         * RCU is watching or not.
@@ -710,7 +708,7 @@ idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
                instrumentation_end();
 
                ret.exit_rcu = true;
-               return ret;
+               goto done;
        }
 
        /*
@@ -725,6 +723,8 @@ idtentry_state_t noinstr idtentry_enter(struct pt_regs *regs)
        trace_hardirqs_off();
        instrumentation_end();
 
+done:
+       idt_save_pkrs(&ret);
        return ret;
 }


> 
> > +
> > +	idt_restore_pkrs(state);
> 
> This one is placed correctly.
> 
> Thanks,

No thank you...  I'm really sorry for wasting every ones time on this one.

I'm still processing all your responses so forgive me if I've missed something.
And thank you so much for pointing out my mistake.  That seems to have fixed
all the problems I have seen thus far.  But I want to think on if there may be
more issues.

Thank you,
Ira

  reply index

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-17  7:20 [PATCH RFC V2 00/17] PKS: Add Protection Keys Supervisor (PKS) support ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 01/17] x86/pkeys: Create pkeys_internal.h ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 02/17] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support ira.weiny
2020-07-17  8:54   ` Peter Zijlstra
2020-07-17 20:52     ` Ira Weiny
2020-07-20  9:14       ` Peter Zijlstra
2020-07-17 22:36     ` Dave Hansen
2020-07-20  9:13       ` Peter Zijlstra
2020-07-17  7:20 ` [PATCH RFC V2 03/17] x86/pks: Enable Protection Keys Supervisor (PKS) ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 04/17] x86/pks: Preserve the PKRS MSR on context switch ira.weiny
2020-07-17  8:31   ` Peter Zijlstra
2020-07-17 21:39     ` Ira Weiny
2020-07-17  8:59   ` Peter Zijlstra
2020-07-17 22:34     ` Ira Weiny
2020-07-20  9:15       ` Peter Zijlstra
2020-07-20 18:35         ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 05/17] x86/pks: Add PKS kernel API ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 06/17] x86/pks: Add a debugfs file for allocated PKS keys ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 07/17] Documentation/pkeys: Update documentation for kernel pkeys ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 08/17] x86/pks: Add PKS Test code ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 09/17] memremap: Convert devmap static branch to {inc,dec} ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 10/17] fs/dax: Remove unused size parameter ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 11/17] drivers/dax: Expand lock scope to cover the use of addresses ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 12/17] memremap: Add zone device access protection ira.weiny
2020-07-17  9:10   ` Peter Zijlstra
2020-07-18  5:06     ` Ira Weiny
2020-07-20  9:16       ` Peter Zijlstra
2020-07-17  9:17   ` Peter Zijlstra
2020-07-18  5:51     ` Ira Weiny
2020-07-17  9:20   ` Peter Zijlstra
2020-07-17  7:20 ` [PATCH RFC V2 13/17] kmap: Add stray write protection for device pages ira.weiny
2020-07-17  9:21   ` Peter Zijlstra
2020-07-19  4:13     ` Ira Weiny
2020-07-20  9:17       ` Peter Zijlstra
2020-07-21 16:31         ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 14/17] dax: Stray write protection for dax_direct_access() ira.weiny
2020-07-17  9:22   ` Peter Zijlstra
2020-07-19  4:41     ` Ira Weiny
2020-07-17  7:20 ` [PATCH RFC V2 15/17] nvdimm/pmem: Stray write protection for pmem->virt_addr ira.weiny
2020-07-17  7:20 ` [PATCH RFC V2 16/17] [dax|pmem]: Enable stray write protection ira.weiny
2020-07-17  9:25   ` Peter Zijlstra
2020-07-17  7:20 ` [PATCH RFC V2 17/17] x86/entry: Preserve PKRS MSR across exceptions ira.weiny
2020-07-17  9:30   ` Peter Zijlstra
2020-07-21 18:01     ` Ira Weiny
2020-07-21 19:11       ` Peter Zijlstra
2020-07-17  9:34   ` Peter Zijlstra
2020-07-17 10:06   ` Peter Zijlstra
2020-07-22  5:27     ` Ira Weiny
2020-07-22  9:48       ` Peter Zijlstra
2020-07-22 21:24         ` Ira Weiny
2020-07-23 20:08       ` Thomas Gleixner
2020-07-23 20:15         ` Thomas Gleixner
2020-07-24 17:23           ` Ira Weiny
2020-07-24 17:29             ` Andy Lutomirski
2020-07-24 19:43               ` Ira Weiny
2020-07-22 16:21   ` Andy Lutomirski
2020-07-23 16:18     ` Fenghua Yu
2020-07-23 16:23       ` Dave Hansen
2020-07-23 16:52         ` Fenghua Yu
2020-07-23 17:08           ` Andy Lutomirski
2020-07-23 17:30             ` Dave Hansen
2020-07-23 20:23               ` Thomas Gleixner
2020-07-23 20:22             ` Thomas Gleixner
2020-07-23 21:30               ` Andy Lutomirski
2020-07-23 22:14                 ` Thomas Gleixner
2020-07-23 19:53   ` Thomas Gleixner
2020-07-23 22:04     ` Ira Weiny [this message]
2020-07-23 23:41       ` Thomas Gleixner
2020-07-24 21:24         ` Thomas Gleixner
2020-07-24 21:31           ` Thomas Gleixner
2020-07-25  0:09           ` Andy Lutomirski
2020-07-27 20:59           ` Ira Weiny
2020-07-24 22:19 ` [PATCH RFC V2 00/17] PKS: Add Protection Keys Supervisor (PKS) support Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200723220435.GI844235@iweiny-DESK2.sc.intel.com \
    --to=ira.weiny@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=fenghua.yu@intel.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vishal.l.verma@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-Doc Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-doc/0 linux-doc/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-doc linux-doc/ https://lore.kernel.org/linux-doc \
		linux-doc@vger.kernel.org
	public-inbox-index linux-doc

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-doc


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git