From: "Christopher M. Riedl" <cmr@linux.ibm.com>
To: "Nicholas Piggin" <npiggin@gmail.com>, <linuxppc-dev@lists.ozlabs.org>
Cc: <tglx@linutronix.de>, <x86@kernel.org>, <keescook@chromium.org>,
	<linux-hardening@vger.kernel.org>
Subject: Re: [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching
Date: Fri, 09 Jul 2021 00:03:17 -0500	[thread overview]
Message-ID: <CCOCHBK9FCOF.1KLFA4IL2AJNE@oc8246131445.ibm.com> (raw)
In-Reply-To: <1625125043.re5l9zg4kg.astroid@bobo.none>

On Thu Jul 1, 2021 at 2:51 AM CDT, Nicholas Piggin wrote:
> Excerpts from Christopher M. Riedl's message of July 1, 2021 5:02 pm:
> > On Thu Jul 1, 2021 at 1:12 AM CDT, Nicholas Piggin wrote:
> >> Excerpts from Christopher M. Riedl's message of May 6, 2021 2:34 pm:
> >> > When code patching a STRICT_KERNEL_RWX kernel the page containing the
> >> > address to be patched is temporarily mapped as writeable. Currently, a
> >> > per-cpu vmalloc patch area is used for this purpose. While the patch
> >> > area is per-cpu, the temporary page mapping is inserted into the kernel
> >> > page tables for the duration of patching. The mapping is exposed to CPUs
> >> > other than the patching CPU - this is undesirable from a hardening
> >> > perspective. Use a temporary mm instead which keeps the mapping local to
> >> > the CPU doing the patching.
> >> > 
> >> > Use the `poking_init` init hook to prepare a temporary mm and patching
> >> > address. Initialize the temporary mm by copying the init mm. Choose a
> >> > randomized patching address inside the temporary mm userspace address
> >> > space. The patching address is randomized between PAGE_SIZE and
> >> > DEFAULT_MAP_WINDOW-PAGE_SIZE. The upper limit is necessary due to how
> >> > the Book3s64 Hash MMU operates - by default the space above
> >> > DEFAULT_MAP_WINDOW is not available. For now, the patching address for
> >> > all platforms/MMUs is randomized inside this range.  The number of
> >> > possible random addresses is dependent on PAGE_SIZE and limited by
> >> > DEFAULT_MAP_WINDOW.
> >> > 
> >> > Bits of entropy with 64K page size on BOOK3S_64:
> >> > 
> >> >         bits of entropy = log2(DEFAULT_MAP_WINDOW_USER64 / PAGE_SIZE)
> >> > 
> >> >         PAGE_SIZE=64K, DEFAULT_MAP_WINDOW_USER64=128TB
> >> >         bits of entropy = log2(128TB / 64K)
> >> >         bits of entropy = 31
> >> > 
> >> > Randomization occurs only once during initialization at boot.
> >> > 
> >> > Introduce two new functions, map_patch() and unmap_patch(), to
> >> > respectively create and remove the temporary mapping with write
> >> > permissions at patching_addr. The Hash MMU on Book3s64 requires mapping
> >> > the page for patching with PAGE_SHARED since the kernel cannot access
> >> > userspace pages with the PAGE_PRIVILEGED (PAGE_KERNEL) bit set.
> >> > 
> >> > Also introduce hash_prefault_mapping() to preload the SLB entry and HPTE
> >> > for the patching_addr when using the Hash MMU on Book3s64 to avoid
> >> > taking an SLB and Hash fault during patching.
> >>
> >> What prevents the SLBE or HPTE from being removed before the last
> >> access?
> > 
> > This code runs with local IRQs disabled - we also don't access anything
> > else in userspace so I'm not sure what else could cause the entries to
> > be removed TBH.
> > 
> >>
> >>
> >> > +#ifdef CONFIG_PPC_BOOK3S_64
> >> > +
> >> > +static inline int hash_prefault_mapping(pgprot_t pgprot)
> >> >  {
> >> > -	struct vm_struct *area;
> >> > +	int err;
> >> >  
> >> > -	area = get_vm_area(PAGE_SIZE, VM_ALLOC);
> >> > -	if (!area) {
> >> > -		WARN_ONCE(1, "Failed to create text area for cpu %d\n",
> >> > -			cpu);
> >> > -		return -1;
> >> > -	}
> >> > -	this_cpu_write(text_poke_area, area);
> >> > +	if (radix_enabled())
> >> > +		return 0;
> >> >  
> >> > -	return 0;
> >> > -}
> >> > +	err = slb_allocate_user(patching_mm, patching_addr);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to allocate slb entry\n");
> >> >  
> >> > -static int text_area_cpu_down(unsigned int cpu)
> >> > -{
> >> > -	free_vm_area(this_cpu_read(text_poke_area));
> >> > -	return 0;
> >> > +	err = hash_page_mm(patching_mm, patching_addr, pgprot_val(pgprot), 0,
> >> > +			   HPTE_USE_KERNEL_KEY);
> >> > +	if (err)
> >> > +		pr_warn("map patch: failed to insert hashed page\n");
> >> > +
> >> > +	/* See comment in switch_slb() in mm/book3s64/slb.c */
> >> > +	isync();
> >>
> >> I'm not sure if this is enough. Could we context switch here? You've
> >> got the PTL, so no with a normal kernel, but maybe yes with an RT kernel.
> >> How about taking a machine check that clears the SLB? Could the HPTE
> >> get removed by something else here?
> > 
> > All of this happens after a local_irq_save() which should at least
> > prevent context switches IIUC.
>
> Ah yeah I didn't look that far back. A machine check can take out SLB
> entries.
>
> > I am not sure what else could cause the
> > HPTE to get removed here.
>
> Other CPUs?
>

Right, because the HPTEs are "global": the hash page table is shared by all
CPUs, so another CPU could evict the entry.

> >> You want to prevent faults because you might be patching a fault
> >> handler?
> > 
> > In a more general sense: I don't think we want to take page faults every
> > time we patch an instruction with a STRICT_RWX kernel. The Hash MMU page
> > fault handler codepath also checks `current->mm` in some places which
> > won't match the temporary mm. Also `current->mm` can be NULL which
> > caused problems in my earlier revisions of this series.
>
> Hmm, that's a bit of a hack then. Maybe doing an actual mm switch and
> setting current->mm properly would explode too much. Maybe that's
> okayish.
> But I can't see how the HPT code is up to the job of this in general
> (even if that current->mm issue was fixed).
>
> To do it without holes you would either have to get the SLB MCE handler
> to restore that particular SLB if it flushed it, or restart the patch
> code from a fixup location if it took an MCE after installing the SLB.
> And bolt a hash table entry.

We discussed this a bit off list and decided that it's not worth the
trouble of implementing per-cpu temporary mm support for Hash at this time.
Instead, I will post a new version of this series where we drop into
realmode to patch under the Hash MMU. This avoids the W+X mapping
altogether and so doesn't expose anything to other CPUs during patching.
We will keep the per-cpu temporary mm for Radix since 1) it doesn't
require the hacks that Hash does and 2) it's overall preferable to dropping
into realmode.
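
To make the Radix direction concrete, here is a very rough sketch of the
kind of flow I mean. This is not the actual patch: use_temporary_mm() and
unuse_temporary_mm() are placeholder names, the instruction is a plain u32
instead of the series' instruction type, and exec_addr is assumed to be in
the linear map (module/vmalloc text would need vmalloc_to_page()):

    #include <linux/mm.h>
    #include <linux/uaccess.h>
    #include <asm/tlbflush.h>

    /* Rough sketch only; helper names and pte plumbing are illustrative. */
    static int patch_via_temp_mm(u32 *exec_addr, u32 insn)
    {
            unsigned long flags;
            spinlock_t *ptl;
            pte_t *ptep;
            u32 __user *patch_addr;
            int err;

            /* No IRQs or preemption while the writable alias exists */
            local_irq_save(flags);

            ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
            if (!ptep) {
                    local_irq_restore(flags);
                    return -ENOMEM;
            }

            /* Map the page containing exec_addr at the randomized patching_addr */
            set_pte_at(patching_mm, patching_addr, ptep,
                       pfn_pte(page_to_pfn(virt_to_page(exec_addr)), PAGE_KERNEL));

            /* Placeholder: switch this CPU onto the temporary mm */
            use_temporary_mm(patching_mm);

            patch_addr = (u32 __user *)(patching_addr | offset_in_page(exec_addr));
            err = __put_user(insn, patch_addr);

            /* Placeholder: switch back to the previous mm */
            unuse_temporary_mm();

            pte_clear(patching_mm, patching_addr, ptep);
            /* The alias never left this CPU, so a local flush is enough on Radix */
            local_flush_tlb_mm(patching_mm);
            pte_unmap_unlock(ptep, ptl);

            local_irq_restore(flags);

            /* Real code would also flush the icache for exec_addr here */
            return err;
    }

The point is that the writable alias only ever exists in patching_mm and is
torn down with a CPU-local flush before IRQs are re-enabled, so nothing is
exposed to other CPUs.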

>
> Thanks,
> Nick


Thread overview: 45+ messages

2021-05-06  4:34 [RESEND PATCH v4 00/11] Use per-CPU temporary mappings for patching Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 01/11] powerpc: Add LKDTM accessor for patching addr Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 02/11] lkdtm/powerpc: Add test to hijack a patch mapping Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 03/11] x86_64: Add LKDTM accessor for patching addr Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 04/11] lkdtm/x86_64: Add test to hijack a patch mapping Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 05/11] powerpc/64s: Add ability to skip SLB preload Christopher M. Riedl
2021-06-21  3:13   ` Daniel Axtens
2021-07-01  3:48     ` Christopher M. Riedl
2021-07-01  4:15       ` Nicholas Piggin
2021-07-01  5:28         ` Christopher M. Riedl
2021-07-01  6:04           ` Nicholas Piggin
2021-07-01  6:53             ` Christopher M. Riedl
2021-07-01  7:37               ` Nicholas Piggin
2021-07-01 11:30                 ` Nicholas Piggin
2021-07-09  4:55                 ` Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 06/11] powerpc: Introduce temporary mm Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 07/11] powerpc/64s: Make slb_allocate_user() non-static Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 08/11] powerpc: Initialize and use a temporary mm for patching Christopher M. Riedl
2021-06-21  3:19   ` Daniel Axtens
2021-07-01  5:11     ` Christopher M. Riedl
2021-07-01  6:12   ` Nicholas Piggin
2021-07-01  7:02     ` Christopher M. Riedl
2021-07-01  7:51       ` Nicholas Piggin
2021-07-09  5:03         ` Christopher M. Riedl [this message]
2021-05-06  4:34 ` [RESEND PATCH v4 09/11] lkdtm/powerpc: Fix code patching hijack test Christopher M. Riedl
2021-05-06  4:34 ` [RESEND PATCH v4 10/11] powerpc: Protect patching_mm with a lock Christopher M. Riedl
2021-05-06 10:51   ` Peter Zijlstra
2021-05-07 20:03     ` Christopher M. Riedl
2021-05-07 22:26       ` Peter Zijlstra
2021-05-06  4:34 ` [RESEND PATCH v4 11/11] powerpc: Use patch_instruction_unlocked() in loops Christopher M. Riedl
