[RFC,2/6] x86/mm: temporary mm struct
diff mbox series

Message ID 20180829081147.184610-3-namit@vmware.com
State New
Headers show
Series
  • x86: text_poke() fixes
Related show

Commit Message

Nadav Amit Aug. 29, 2018, 8:11 a.m. UTC
From: Andy Lutomirski <luto@kernel.org>

Sometimes we want to set a temporary page-table entries (PTEs) in one of
the cores, without allowing other cores to use - even speculatively -
these mappings. There are two benefits for doing so:

(1) Security: if sensitive PTEs are set, temporary mm prevents their use
in other cores. This hardens the security as it prevents exploding a
dangling pointer to overwrite sensitive data using the sensitive PTE.

(2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
remote page-tables.

To do so a temporary mm_struct can be used. Mappings which are private
for this mm can be set in the userspace part of the address-space.
During the whole time in which the temporary mm is loaded, interrupts
must be disabled.

The first use-case for temporary PTEs, which will follow, is for poking
the kernel text.

[ Commit message was written by Nadav ]

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Comments

Masami Hiramatsu Aug. 29, 2018, 9:49 a.m. UTC | #1
On Wed, 29 Aug 2018 01:11:43 -0700
Nadav Amit <namit@vmware.com> wrote:

> From: Andy Lutomirski <luto@kernel.org>
> 
> Sometimes we want to set a temporary page-table entries (PTEs) in one of
> the cores, without allowing other cores to use - even speculatively -
> these mappings. There are two benefits for doing so:
> 
> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
> in other cores. This hardens the security as it prevents exploding a
> dangling pointer to overwrite sensitive data using the sensitive PTE.
> 
> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
> remote page-tables.
> 
> To do so a temporary mm_struct can be used. Mappings which are private
> for this mm can be set in the userspace part of the address-space.
> During the whole time in which the temporary mm is loaded, interrupts
> must be disabled.
> 
> The first use-case for temporary PTEs, which will follow, is for poking
> the kernel text.
> 
> [ Commit message was written by Nadav ]
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index eeeb9289c764..96afc8c0cf15 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>  	return cr3;
>  }
>  
> +typedef struct {
> +	struct mm_struct *prev;
> +} temporary_mm_state_t;
> +
> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
> +{
> +	temporary_mm_state_t state;
> +
> +	lockdep_assert_irqs_disabled();
> +	state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> +	switch_mm_irqs_off(NULL, mm, current);
> +	return state;
> +}

Hmm, why don't we return mm_struct *prev directly?

Thank you,

> +
> +static inline void unuse_temporary_mm(temporary_mm_state_t prev)
> +{
> +	lockdep_assert_irqs_disabled();
> +	switch_mm_irqs_off(NULL, prev.prev, current);
> +}
> +
>  #endif /* _ASM_X86_MMU_CONTEXT_H */
> -- 
> 2.17.1
>
Andy Lutomirski Aug. 29, 2018, 3:41 p.m. UTC | #2
On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> On Wed, 29 Aug 2018 01:11:43 -0700
> Nadav Amit <namit@vmware.com> wrote:
>
>> From: Andy Lutomirski <luto@kernel.org>
>>
>> Sometimes we want to set a temporary page-table entries (PTEs) in one of
>> the cores, without allowing other cores to use - even speculatively -
>> these mappings. There are two benefits for doing so:
>>
>> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
>> in other cores. This hardens the security as it prevents exploding a
>> dangling pointer to overwrite sensitive data using the sensitive PTE.
>>
>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
>> remote page-tables.
>>
>> To do so a temporary mm_struct can be used. Mappings which are private
>> for this mm can be set in the userspace part of the address-space.
>> During the whole time in which the temporary mm is loaded, interrupts
>> must be disabled.
>>
>> The first use-case for temporary PTEs, which will follow, is for poking
>> the kernel text.
>>
>> [ Commit message was written by Nadav ]
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: Kees Cook <keescook@chromium.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Nadav Amit <namit@vmware.com>
>> ---
>>  arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>> index eeeb9289c764..96afc8c0cf15 100644
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>       return cr3;
>>  }
>>
>> +typedef struct {
>> +     struct mm_struct *prev;
>> +} temporary_mm_state_t;
>> +
>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>> +{
>> +     temporary_mm_state_t state;
>> +
>> +     lockdep_assert_irqs_disabled();
>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>> +     switch_mm_irqs_off(NULL, mm, current);
>> +     return state;
>> +}
>
> Hmm, why don't we return mm_struct *prev directly?

I did it this way to make it easier to add future debugging stuff
later.  Also, when I first wrote this, I stashed the old CR3 instead
of the old mm_struct, and it seemed like callers should be insulated
from details like this.
Andy Lutomirski Aug. 29, 2018, 3:46 p.m. UTC | #3
Rik, this is the patch I was referring to.

On Wed, Aug 29, 2018 at 1:11 AM, Nadav Amit <namit@vmware.com> wrote:
> From: Andy Lutomirski <luto@kernel.org>
>
> Sometimes we want to set a temporary page-table entries (PTEs) in one of
> the cores, without allowing other cores to use - even speculatively -
> these mappings. There are two benefits for doing so:
>
> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
> in other cores. This hardens the security as it prevents exploding a
> dangling pointer to overwrite sensitive data using the sensitive PTE.
>
> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
> remote page-tables.
>
> To do so a temporary mm_struct can be used. Mappings which are private
> for this mm can be set in the userspace part of the address-space.
> During the whole time in which the temporary mm is loaded, interrupts
> must be disabled.
>
> The first use-case for temporary PTEs, which will follow, is for poking
> the kernel text.
>
> [ Commit message was written by Nadav ]
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Nadav Amit <namit@vmware.com>
> ---
>  arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index eeeb9289c764..96afc8c0cf15 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>         return cr3;
>  }
>
> +typedef struct {
> +       struct mm_struct *prev;
> +} temporary_mm_state_t;
> +
> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
> +{
> +       temporary_mm_state_t state;
> +
> +       lockdep_assert_irqs_disabled();
> +       state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> +       switch_mm_irqs_off(NULL, mm, current);
> +       return state;
> +}
> +
> +static inline void unuse_temporary_mm(temporary_mm_state_t prev)
> +{
> +       lockdep_assert_irqs_disabled();
> +       switch_mm_irqs_off(NULL, prev.prev, current);
> +}
> +
>  #endif /* _ASM_X86_MMU_CONTEXT_H */
> --
> 2.17.1
>
Nadav Amit Aug. 29, 2018, 4:54 p.m. UTC | #4
at 8:41 AM, Andy Lutomirski <luto@kernel.org> wrote:

> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>> On Wed, 29 Aug 2018 01:11:43 -0700
>> Nadav Amit <namit@vmware.com> wrote:
>> 
>>> From: Andy Lutomirski <luto@kernel.org>
>>> 
>>> Sometimes we want to set a temporary page-table entries (PTEs) in one of
>>> the cores, without allowing other cores to use - even speculatively -
>>> these mappings. There are two benefits for doing so:
>>> 
>>> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
>>> in other cores. This hardens the security as it prevents exploding a
>>> dangling pointer to overwrite sensitive data using the sensitive PTE.
>>> 
>>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
>>> remote page-tables.
>>> 
>>> To do so a temporary mm_struct can be used. Mappings which are private
>>> for this mm can be set in the userspace part of the address-space.
>>> During the whole time in which the temporary mm is loaded, interrupts
>>> must be disabled.
>>> 
>>> The first use-case for temporary PTEs, which will follow, is for poking
>>> the kernel text.
>>> 
>>> [ Commit message was written by Nadav ]
>>> 
>>> Cc: Andy Lutomirski <luto@kernel.org>
>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>> Cc: Kees Cook <keescook@chromium.org>
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>> ---
>>> arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>>> 1 file changed, 20 insertions(+)
>>> 
>>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>>> index eeeb9289c764..96afc8c0cf15 100644
>>> --- a/arch/x86/include/asm/mmu_context.h
>>> +++ b/arch/x86/include/asm/mmu_context.h
>>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>>      return cr3;
>>> }
>>> 
>>> +typedef struct {
>>> +     struct mm_struct *prev;
>>> +} temporary_mm_state_t;
>>> +
>>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>> +{
>>> +     temporary_mm_state_t state;
>>> +
>>> +     lockdep_assert_irqs_disabled();
>>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>>> +     switch_mm_irqs_off(NULL, mm, current);
>>> +     return state;
>>> +}
>> 
>> Hmm, why don't we return mm_struct *prev directly?
> 
> I did it this way to make it easier to add future debugging stuff
> later.  Also, when I first wrote this, I stashed the old CR3 instead
> of the old mm_struct, and it seemed like callers should be insulated
> from details like this.

Andy, please let me know if you want me to change it somehow, and please
provide your signed-off-by.
Andy Lutomirski Aug. 29, 2018, 9:38 p.m. UTC | #5
On Wed, Aug 29, 2018 at 9:54 AM, Nadav Amit <namit@vmware.com> wrote:
> at 8:41 AM, Andy Lutomirski <luto@kernel.org> wrote:
>
>> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>>> On Wed, 29 Aug 2018 01:11:43 -0700
>>> Nadav Amit <namit@vmware.com> wrote:
>>>
>>>> From: Andy Lutomirski <luto@kernel.org>
>>>>
>>>> Sometimes we want to set a temporary page-table entries (PTEs) in one of
>>>> the cores, without allowing other cores to use - even speculatively -
>>>> these mappings. There are two benefits for doing so:
>>>>
>>>> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
>>>> in other cores. This hardens the security as it prevents exploding a
>>>> dangling pointer to overwrite sensitive data using the sensitive PTE.
>>>>
>>>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
>>>> remote page-tables.
>>>>
>>>> To do so a temporary mm_struct can be used. Mappings which are private
>>>> for this mm can be set in the userspace part of the address-space.
>>>> During the whole time in which the temporary mm is loaded, interrupts
>>>> must be disabled.
>>>>
>>>> The first use-case for temporary PTEs, which will follow, is for poking
>>>> the kernel text.
>>>>
>>>> [ Commit message was written by Nadav ]
>>>>
>>>> Cc: Andy Lutomirski <luto@kernel.org>
>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>>>> 1 file changed, 20 insertions(+)
>>>>
>>>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>>>> index eeeb9289c764..96afc8c0cf15 100644
>>>> --- a/arch/x86/include/asm/mmu_context.h
>>>> +++ b/arch/x86/include/asm/mmu_context.h
>>>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>>>      return cr3;
>>>> }
>>>>
>>>> +typedef struct {
>>>> +     struct mm_struct *prev;
>>>> +} temporary_mm_state_t;
>>>> +
>>>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>>> +{
>>>> +     temporary_mm_state_t state;
>>>> +
>>>> +     lockdep_assert_irqs_disabled();
>>>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>>>> +     switch_mm_irqs_off(NULL, mm, current);
>>>> +     return state;
>>>> +}
>>>
>>> Hmm, why don't we return mm_struct *prev directly?
>>
>> I did it this way to make it easier to add future debugging stuff
>> later.  Also, when I first wrote this, I stashed the old CR3 instead
>> of the old mm_struct, and it seemed like callers should be insulated
>> from details like this.
>
> Andy, please let me know if you want me to change it somehow, and please
> provide your signed-off-by.
>

I'm happy with it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Masami Hiramatsu Aug. 30, 2018, 1:38 a.m. UTC | #6
On Wed, 29 Aug 2018 08:41:00 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > On Wed, 29 Aug 2018 01:11:43 -0700
> > Nadav Amit <namit@vmware.com> wrote:
> >
> >> From: Andy Lutomirski <luto@kernel.org>
> >>
> >> Sometimes we want to set a temporary page-table entries (PTEs) in one of
> >> the cores, without allowing other cores to use - even speculatively -
> >> these mappings. There are two benefits for doing so:
> >>
> >> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
> >> in other cores. This hardens the security as it prevents exploding a
> >> dangling pointer to overwrite sensitive data using the sensitive PTE.
> >>
> >> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
> >> remote page-tables.
> >>
> >> To do so a temporary mm_struct can be used. Mappings which are private
> >> for this mm can be set in the userspace part of the address-space.
> >> During the whole time in which the temporary mm is loaded, interrupts
> >> must be disabled.
> >>
> >> The first use-case for temporary PTEs, which will follow, is for poking
> >> the kernel text.
> >>
> >> [ Commit message was written by Nadav ]
> >>
> >> Cc: Andy Lutomirski <luto@kernel.org>
> >> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> >> Cc: Kees Cook <keescook@chromium.org>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Signed-off-by: Nadav Amit <namit@vmware.com>
> >> ---
> >>  arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> >> index eeeb9289c764..96afc8c0cf15 100644
> >> --- a/arch/x86/include/asm/mmu_context.h
> >> +++ b/arch/x86/include/asm/mmu_context.h
> >> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
> >>       return cr3;
> >>  }
> >>
> >> +typedef struct {
> >> +     struct mm_struct *prev;
> >> +} temporary_mm_state_t;
> >> +
> >> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
> >> +{
> >> +     temporary_mm_state_t state;
> >> +
> >> +     lockdep_assert_irqs_disabled();
> >> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> >> +     switch_mm_irqs_off(NULL, mm, current);
> >> +     return state;
> >> +}
> >
> > Hmm, why don't we return mm_struct *prev directly?
> 
> I did it this way to make it easier to add future debugging stuff
> later. Also, when I first wrote this, I stashed the old CR3 instead
> of the old mm_struct, and it seemed like callers should be insulated
> from details like this.

Hmm, I see. But in that case, we should call it "struct temporary_mm"
and explicitly allocate (and pass) it, since we can not return the
data structure from stack. If we can combine it with new mm, it will
be more encapsulated e.g.

struct temporary_mm {
	struct mm_struct *mm;
	struct mm_struct *prev;
};

static struct temporary_mm poking_tmp_mm;

poking_init()
{
	if (init_temporary_mm(&tmp_mm, &init_mm))
		goto error;
	...
}

text_poke_safe()
{
	...
	use_temporary_mm(&tmp_mm);
	...
	unuse_temporary_mm(&tmp_mm);
}

Any thought?

Thanks,
Andy Lutomirski Aug. 30, 2018, 1:59 a.m. UTC | #7
> On Aug 29, 2018, at 6:38 PM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> 
> On Wed, 29 Aug 2018 08:41:00 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
> 
>>> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
>>> On Wed, 29 Aug 2018 01:11:43 -0700
>>> Nadav Amit <namit@vmware.com> wrote:
>>> 
>>>> From: Andy Lutomirski <luto@kernel.org>
>>>> 
>>>> Sometimes we want to set a temporary page-table entries (PTEs) in one of
>>>> the cores, without allowing other cores to use - even speculatively -
>>>> these mappings. There are two benefits for doing so:
>>>> 
>>>> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
>>>> in other cores. This hardens the security as it prevents exploding a
>>>> dangling pointer to overwrite sensitive data using the sensitive PTE.
>>>> 
>>>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
>>>> remote page-tables.
>>>> 
>>>> To do so a temporary mm_struct can be used. Mappings which are private
>>>> for this mm can be set in the userspace part of the address-space.
>>>> During the whole time in which the temporary mm is loaded, interrupts
>>>> must be disabled.
>>>> 
>>>> The first use-case for temporary PTEs, which will follow, is for poking
>>>> the kernel text.
>>>> 
>>>> [ Commit message was written by Nadav ]
>>>> 
>>>> Cc: Andy Lutomirski <luto@kernel.org>
>>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>>>> Cc: Kees Cook <keescook@chromium.org>
>>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Nadav Amit <namit@vmware.com>
>>>> ---
>>>> arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
>>>> 1 file changed, 20 insertions(+)
>>>> 
>>>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>>>> index eeeb9289c764..96afc8c0cf15 100644
>>>> --- a/arch/x86/include/asm/mmu_context.h
>>>> +++ b/arch/x86/include/asm/mmu_context.h
>>>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>>>      return cr3;
>>>> }
>>>> 
>>>> +typedef struct {
>>>> +     struct mm_struct *prev;
>>>> +} temporary_mm_state_t;
>>>> +
>>>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>>>> +{
>>>> +     temporary_mm_state_t state;
>>>> +
>>>> +     lockdep_assert_irqs_disabled();
>>>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>>>> +     switch_mm_irqs_off(NULL, mm, current);
>>>> +     return state;
>>>> +}
>>> 
>>> Hmm, why don't we return mm_struct *prev directly?
>> 
>> I did it this way to make it easier to add future debugging stuff
>> later. Also, when I first wrote this, I stashed the old CR3 instead
>> of the old mm_struct, and it seemed like callers should be insulated
>> from details like this.
> 
> Hmm, I see. But in that case, we should call it "struct temporary_mm"
> and explicitly allocate (and pass) it, since we can not return the
> data structure from stack.

Why not?

> If we can combine it with new mm, it will
> be more encapsulated e.g.
> 
> struct temporary_mm {
>    struct mm_struct *mm;
>    struct mm_struct *prev;
> };
> 
> static struct temporary_mm poking_tmp_mm;
> 
> poking_init()
> {
>    if (init_temporary_mm(&tmp_mm, &init_mm))
>        goto error;
>    ...
> }
> 
> text_poke_safe()
> {
>    ...
>    use_temporary_mm(&tmp_mm);
>    ...
>    unuse_temporary_mm(&tmp_mm);
> }
> 
> Any thought?

That seems more complicated for not very much gain.
Masami Hiramatsu Aug. 31, 2018, 4:42 a.m. UTC | #8
On Wed, 29 Aug 2018 18:59:52 -0700
Andy Lutomirski <luto@amacapital.net> wrote:

> 
> 
> > On Aug 29, 2018, at 6:38 PM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> > 
> > On Wed, 29 Aug 2018 08:41:00 -0700
> > Andy Lutomirski <luto@kernel.org> wrote:
> > 
> >>> On Wed, Aug 29, 2018 at 2:49 AM, Masami Hiramatsu <mhiramat@kernel.org> wrote:
> >>> On Wed, 29 Aug 2018 01:11:43 -0700
> >>> Nadav Amit <namit@vmware.com> wrote:
> >>> 
> >>>> From: Andy Lutomirski <luto@kernel.org>
> >>>> 
> >>>> Sometimes we want to set a temporary page-table entries (PTEs) in one of
> >>>> the cores, without allowing other cores to use - even speculatively -
> >>>> these mappings. There are two benefits for doing so:
> >>>> 
> >>>> (1) Security: if sensitive PTEs are set, temporary mm prevents their use
> >>>> in other cores. This hardens the security as it prevents exploding a
> >>>> dangling pointer to overwrite sensitive data using the sensitive PTE.
> >>>> 
> >>>> (2) Avoiding TLB shootdowns: the PTEs do not need to be flushed in
> >>>> remote page-tables.
> >>>> 
> >>>> To do so a temporary mm_struct can be used. Mappings which are private
> >>>> for this mm can be set in the userspace part of the address-space.
> >>>> During the whole time in which the temporary mm is loaded, interrupts
> >>>> must be disabled.
> >>>> 
> >>>> The first use-case for temporary PTEs, which will follow, is for poking
> >>>> the kernel text.
> >>>> 
> >>>> [ Commit message was written by Nadav ]
> >>>> 
> >>>> Cc: Andy Lutomirski <luto@kernel.org>
> >>>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> >>>> Cc: Kees Cook <keescook@chromium.org>
> >>>> Cc: Peter Zijlstra <peterz@infradead.org>
> >>>> Signed-off-by: Nadav Amit <namit@vmware.com>
> >>>> ---
> >>>> arch/x86/include/asm/mmu_context.h | 20 ++++++++++++++++++++
> >>>> 1 file changed, 20 insertions(+)
> >>>> 
> >>>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> >>>> index eeeb9289c764..96afc8c0cf15 100644
> >>>> --- a/arch/x86/include/asm/mmu_context.h
> >>>> +++ b/arch/x86/include/asm/mmu_context.h
> >>>> @@ -338,4 +338,24 @@ static inline unsigned long __get_current_cr3_fast(void)
> >>>>      return cr3;
> >>>> }
> >>>> 
> >>>> +typedef struct {
> >>>> +     struct mm_struct *prev;
> >>>> +} temporary_mm_state_t;
> >>>> +
> >>>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
> >>>> +{
> >>>> +     temporary_mm_state_t state;
> >>>> +
> >>>> +     lockdep_assert_irqs_disabled();
> >>>> +     state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> >>>> +     switch_mm_irqs_off(NULL, mm, current);
> >>>> +     return state;
> >>>> +}
> >>> 
> >>> Hmm, why don't we return mm_struct *prev directly?
> >> 
> >> I did it this way to make it easier to add future debugging stuff
> >> later. Also, when I first wrote this, I stashed the old CR3 instead
> >> of the old mm_struct, and it seemed like callers should be insulated
> >> from details like this.
> > 
> > Hmm, I see. But in that case, we should call it "struct temporary_mm"
> > and explicitly allocate (and pass) it, since we can not return the
> > data structure from stack.
> 
> Why not?

Ah, ok as far as it returns a data structure as immediate value.
(I don't recommend it because it hides a copy..)

> 
> > If we can combine it with new mm, it will
> > be more encapsulated e.g.
> > 
> > struct temporary_mm {
> >    struct mm_struct *mm;
> >    struct mm_struct *prev;
> > };
> > 
> > static struct temporary_mm poking_tmp_mm;
> > 
> > poking_init()
> > {
> >    if (init_temporary_mm(&tmp_mm, &init_mm))
> >        goto error;
> >    ...
> > }
> > 
> > text_poke_safe()
> > {
> >    ...
> >    use_temporary_mm(&tmp_mm);
> >    ...
> >    unuse_temporary_mm(&tmp_mm);
> > }
> > 
> > Any thought?
> 
> That seems more complicated for not very much gain.

Hmm, OK. anyway that is just a style note. The code itself looks good for me.

Thank you,

>

Patch
diff mbox series

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index eeeb9289c764..96afc8c0cf15 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -338,4 +338,24 @@  static inline unsigned long __get_current_cr3_fast(void)
 	return cr3;
 }
 
+typedef struct {
+	struct mm_struct *prev;
+} temporary_mm_state_t;
+
+static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
+{
+	temporary_mm_state_t state;
+
+	lockdep_assert_irqs_disabled();
+	state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, mm, current);
+	return state;
+}
+
+static inline void unuse_temporary_mm(temporary_mm_state_t prev)
+{
+	lockdep_assert_irqs_disabled();
+	switch_mm_irqs_off(NULL, prev.prev, current);
+}
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */