From: Ira Weiny <ira.weiny@intel.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	"open list:DOCUMENTATION" <linux-doc@vger.kernel.org>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux-MM <linux-mm@kvack.org>,
	"open list:KERNEL SELFTEST FRAMEWORK"
	<linux-kselftest@vger.kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [PATCH V3.1] entry: Pass irqentry_state_t by reference
Date: Tue, 15 Dec 2020 17:32:02 -0800
Message-ID: <20201216013202.GY1563847@iweiny-DESK2.sc.intel.com>
In-Reply-To: <CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com>

On Fri, Dec 11, 2020 at 02:14:28PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 23, 2020 at 10:10 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > Currently struct irqentry_state_t only contains a single bool value
> > which makes passing it by value reasonable.  However, future patches
> > add information to this struct.  This includes the PKRS thread state,
> > included in this series, as well as information to store kmap reference
> > tracking and PKS global state outside this series.  In total, we
> > anticipate 2 new 32 bit fields and an integer field to be added to the
> > struct beyond the existing bool value.
> >
> > Adding information to irqentry_state_t makes passing by value less
> > efficient.  Therefore, change the entry/exit calls to pass irq_state by
> > reference in preparation for the changes which follow.
> >
> > While at it, make the code easier to follow by changing all the usage
> > sites to consistently use the variable name 'irq_state'.
> 
> After contemplating this for a bit, I think this isn't really the
> right approach.  It *works*, but we've mostly just created a bit of an
> unfortunate situation.

First off, please forgive my ignorance of how this code works.

> Our stack, on a (possibly nested) entry looks
> like:
> 
> previous frame (or empty if we came from usermode)
> ---
> SS
> RSP
> FLAGS
> CS
> RIP
> rest of pt_regs
> 
> C frame
> 
> irqentry_state_t (maybe -- the compiler is within its rights to play
> almost arbitrary games here)
> 
> more C stuff
> 
> 
> So what we've accomplished is having two distinct arch register
> regions, one called pt_regs and the other stuck in irqentry_state_t.
> This is annoying because it means that, if we want to access this
> thing without passing a pointer around or access it at all from outer
> frames, we need to do something terrible with the unwinder, and we
> don't want to go there.
> 
> So I propose a somewhat different solution: lay out the stack like this.
> 
> SS
> RSP
> FLAGS
> CS
> RIP
> rest of pt_regs
> PKS
> ^^^^^^^^ extended_pt_regs points here
> 
> C frame
> more C stuff
> ...
> 
> IOW we have:
> 
> struct extended_pt_regs {
>   bool rcu_whatever;
>   other generic fields here;
>   struct arch_extended_pt_regs arch_regs;
>   struct pt_regs regs;
> };
> 
> and arch_extended_pt_regs has unsigned long pks;
> 
> and instead of passing a pointer to irqentry_state_t to the generic
> entry/exit code, we just pass a pt_regs pointer.  And we have a little
> accessor like:
> 
> struct extended_pt_regs *extended_regs(struct pt_regs *) { return
> container_of(...); }
> 
> And we tell eBPF that extended_pt_regs is NOT ABI, and we will change
> it whenever we feel like just to keep you on your toes, thank you very
> much.
> 
> Does this seem reasonable?

Conceptually, yes.  But I'm failing to see how this can be implemented
generically for the generic fields.  The pks fields, assuming they stay x86
specific, would be reasonable to add in PUSH_AND_CLEAR_REGS.  But the
rcu/lockdep field is generic.  Wouldn't we have to modify every architecture to
add space for the rcu/lockdep bool?

If not, is there a generic place where that could be done?  Basically, I'm
missing how the effective stack structure can look like this:

> struct extended_pt_regs {
>   bool rcu_whatever;
>   other generic fields here;
>   struct arch_extended_pt_regs arch_regs;
>   struct pt_regs regs;
> };

It seems more reasonable to make it look like:

#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
struct extended_pt_regs {
	unsigned long pkrs;
	struct pt_regs regs;
};
#endif

And leave the rcu/lockdep bool passed by value as before (still in C).
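
To make the x86-only variant concrete, here is a minimal userspace sketch of
that layout together with the container_of()-style accessor suggested above.
It is only an illustration under stated assumptions: the trimmed struct
pt_regs, the simplified container_of() macro, and the helper
extended_regs_selftest() are hypothetical stand-ins, not the kernel's
definitions, and the PKRS value used is an arbitrary test pattern.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's container_of(); the real macro
 * also performs type checking.  Assumption for illustration only.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-in for the x86 struct pt_regs. */
struct pt_regs {
	unsigned long ip;
	unsigned long cs;
	unsigned long flags;
	unsigned long sp;
	unsigned long ss;
};

/* Arch-only extension: the saved PKRS value sits directly below pt_regs. */
struct extended_pt_regs {
	unsigned long pkrs;
	struct pt_regs regs;
};

/* Recover the enclosing extended_pt_regs from a plain pt_regs pointer. */
static inline struct extended_pt_regs *extended_regs(struct pt_regs *regs)
{
	return container_of(regs, struct extended_pt_regs, regs);
}

/* Hypothetical self-check: returns 0 if the accessor round-trips. */
static int extended_regs_selftest(void)
{
	struct extended_pt_regs ext = { .pkrs = 0x5554UL };
	struct pt_regs *regs = &ext.regs;

	/* pt_regs must start after the extra field for the layout to hold. */
	assert(offsetof(struct extended_pt_regs, regs) >= sizeof(unsigned long));

	if (extended_regs(regs) != &ext)
		return 1;
	if (extended_regs(regs)->pkrs != 0x5554UL)
		return 2;
	return 0;
}
```

With this shape, code that only receives a struct pt_regs pointer can still
reach the saved pkrs value, while the rcu/lockdep bool stays an ordinary C
variable passed by value.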

Is that what you mean?  Or am I missing something with the way pt_regs is set
up?  Which is entirely possible because I'm pretty ignorant about how this code
works...  :-/

Ira
