[1/4,tip:x86/mm] Correcting improper large page preservation
diff mbox series

Message ID v2o817ecb6f1003311859nb14bab83y80f743a0accc5028@mail.gmail.com
State New, archived
Headers show
Series
  • [1/4,tip:x86/mm] Correcting improper large page preservation
Related show

Commit Message

tip-bot for Siarhei Liakh April 1, 2010, 1:59 a.m. UTC
This patch fixes a bug in try_preserve_large_page() which may result
in improper large page preservation and improper application of page
attributes to the memory area outside of the original change request.
More specifically, the problem manifests itself when set_memory_*() is
called for several pages at the beginning of the large page and
try_preserve_large_page() erroneously concludes that the change can be
applied to whole large page.
The fix consists of 3 parts:
1. addition of "required" protection attributes in
static_protections(), so .data and .bss can be guaranteed to stay "RW"
2. static_protections() is now called for every small page within
large page to determine compatibility of new protection attributes
(instead of just small pages within the requested range).
3. large page can be preserved only if attribute change is
large-page-aligned and covers whole large page.

V1:  try_preserve_large_page() patch for Linux 2.6.34-rc2

Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com>
Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu>
---

 	 * The BIOS area between 640k and 1Mb needs to be executable for
@@ -278,6 +279,15 @@ static inline pgprot_t
static_protections(pgprot_t prot, unsigned long address,
 	if (within(pfn, __pa((unsigned long)__start_rodata) >> PAGE_SHIFT,
 		   __pa((unsigned long)__end_rodata) >> PAGE_SHIFT))
 		pgprot_val(forbidden) |= _PAGE_RW;
+	/*
+	 * .data and .bss should always be writable.
+	 */
+	if ((within(pfn, __pa((unsigned long)_sdata) >> PAGE_SHIFT,
+		   __pa((unsigned long)_edata) >> PAGE_SHIFT)) ||
+	    (within(pfn, __pa((unsigned long)__bss_start) >> PAGE_SHIFT,
+		   __pa((unsigned long)__bss_stop) >> PAGE_SHIFT))) {
+		pgprot_val(required) |= _PAGE_RW;
+	}

 #if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
 	/*
@@ -317,6 +327,7 @@ static inline pgprot_t static_protections(pgprot_t
prot, unsigned long address,
 #endif

 	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
+	prot = __pgprot(pgprot_val(prot) | pgprot_val(required));

 	return prot;
 }
@@ -393,7 +404,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 {
 	unsigned long nextpage_addr, numpages, pmask, psize, flags, addr, pfn;
 	pte_t new_pte, old_pte, *tmp;
-	pgprot_t old_prot, new_prot;
+	pgprot_t old_prot, new_prot, req_prot;
 	int i, do_split = 1;
 	unsigned int level;

@@ -438,10 +449,10 @@ try_preserve_large_page(pte_t *kpte, unsigned
long address,
 	 * We are safe now. Check whether the new pgprot is the same:
 	 */
 	old_pte = *kpte;
-	old_prot = new_prot = pte_pgprot(old_pte);
+	old_prot = new_prot = req_prot = pte_pgprot(old_pte);

-	pgprot_val(new_prot) &= ~pgprot_val(cpa->mask_clr);
-	pgprot_val(new_prot) |= pgprot_val(cpa->mask_set);
+	pgprot_val(req_prot) &= ~pgprot_val(cpa->mask_clr);
+	pgprot_val(req_prot) |= pgprot_val(cpa->mask_set);

 	/*
 	 * old_pte points to the large page base address. So we need
@@ -450,17 +461,17 @@ try_preserve_large_page(pte_t *kpte, unsigned
long address,
 	pfn = pte_pfn(old_pte) + ((address & (psize - 1)) >> PAGE_SHIFT);
 	cpa->pfn = pfn;

-	new_prot = static_protections(new_prot, address, pfn);
+	new_prot = static_protections(req_prot, address, pfn);

 	/*
 	 * We need to check the full range, whether
 	 * static_protection() requires a different pgprot for one of
 	 * the pages in the range we try to preserve:
 	 */
-	addr = address + PAGE_SIZE;
-	pfn++;
-	for (i = 1; i < cpa->numpages; i++, addr += PAGE_SIZE, pfn++) {
-		pgprot_t chk_prot = static_protections(new_prot, addr, pfn);
+	addr = address & pmask;
+	pfn = pte_pfn(old_pte);
+	for (i = 0; i < (psize >> PAGE_SHIFT); i++, addr += PAGE_SIZE, pfn++) {
+		pgprot_t chk_prot = static_protections(req_prot, addr, pfn);

 		if (pgprot_val(chk_prot) != pgprot_val(new_prot))
 			goto out_unlock;
@@ -483,7 +494,8 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
 	 * that we limited the number of possible pages already to
 	 * the number of pages in the large page.
 	 */
-	if (address == (nextpage_addr - psize) && cpa->numpages == numpages) {
+	if (address == (address & pmask) &&
+		cpa->numpages == (psize >> PAGE_SHIFT)) {
 		/*
 		 * The address is aligned and the number of pages
 		 * covers the full page.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

tip-bot for Suresh Siddha April 3, 2010, 6:43 a.m. UTC | #1
On Wed, 2010-03-31 at 18:59 -0700, Siarhei Liakh wrote:
> +	/*
> +	 * .data and .bss should always be writable.
> +	 */
> +	if ((within(pfn, __pa((unsigned long)_sdata) >> PAGE_SHIFT,
> +		   __pa((unsigned long)_edata) >> PAGE_SHIFT)) ||
> +	    (within(pfn, __pa((unsigned long)__bss_start) >> PAGE_SHIFT,
> +		   __pa((unsigned long)__bss_stop) >> PAGE_SHIFT))) {
> +		pgprot_val(required) |= _PAGE_RW;
> +	}

I have reviewed this patch and the only comment I have is:

On 64bit kernels, kernel text/data mapping and kernel identity mappings
are different virtual addresses mapping to same pfn ranges. For the
data/bss pages, does it help (in identifying certain data corruptions
more easily) in making the kernel identity mapping to be set to
read-only and enforce the need of RW only for the kernel data mappings.

Or is there some obscure code that uses something like
__va(__pa(data_symbol)) and writes to it?

If not, we can remove the __pa() constructs above and use the addr for
comparisons.

Otherwise this patch looks good to me.

Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
tip-bot for Siarhei Liakh April 6, 2010, 2:51 p.m. UTC | #2
On Sat, Apr 3, 2010 at 2:43 AM, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> On Wed, 2010-03-31 at 18:59 -0700, Siarhei Liakh wrote:
>> +     /*
>> +      * .data and .bss should always be writable.
>> +      */
>> +     if ((within(pfn, __pa((unsigned long)_sdata) >> PAGE_SHIFT,
>> +                __pa((unsigned long)_edata) >> PAGE_SHIFT)) ||
>> +         (within(pfn, __pa((unsigned long)__bss_start) >> PAGE_SHIFT,
>> +                __pa((unsigned long)__bss_stop) >> PAGE_SHIFT))) {
>> +             pgprot_val(required) |= _PAGE_RW;
>> +     }
>
> I have reviewed this patch and the only comment I have is:
>
> On 64bit kernels, kernel text/data mapping and kernel identity mappings
> are different virtual addresses mapping to same pfn ranges. For the
> data/bss pages, does it help (in identifying certain data corruptions
> more easily) in making the kernel identity mapping to be set to
> read-only and enforce the need of RW only for the kernel data mappings.
>
> Or is there some obscure code that uses something like
> __va(__pa(data_symbol)) and writes to it?
>
> If not, we can remove the __pa() constructs above and use the addr for
> comparisons.

Done.
Patch V2 have been posted.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
tip-bot for Siarhei Liakh May 5, 2010, 6:14 p.m. UTC | #3
On Tue, Apr 6, 2010 at 10:51 AM, Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> On Sat, Apr 3, 2010 at 2:43 AM, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>> On Wed, 2010-03-31 at 18:59 -0700, Siarhei Liakh wrote:
>>> +     /*
>>> +      * .data and .bss should always be writable.
>>> +      */
>>> +     if ((within(pfn, __pa((unsigned long)_sdata) >> PAGE_SHIFT,
>>> +                __pa((unsigned long)_edata) >> PAGE_SHIFT)) ||
>>> +         (within(pfn, __pa((unsigned long)__bss_start) >> PAGE_SHIFT,
>>> +                __pa((unsigned long)__bss_stop) >> PAGE_SHIFT))) {
>>> +             pgprot_val(required) |= _PAGE_RW;
>>> +     }
>>
>> I have reviewed this patch and the only comment I have is:
>>
>> On 64bit kernels, kernel text/data mapping and kernel identity mappings
>> are different virtual addresses mapping to same pfn ranges. For the
>> data/bss pages, does it help (in identifying certain data corruptions
>> more easily) in making the kernel identity mapping to be set to
>> read-only and enforce the need of RW only for the kernel data mappings.
>>
>> Or is there some obscure code that uses something like
>> __va(__pa(data_symbol)) and writes to it?
>>
>> If not, we can remove the __pa() constructs above and use the addr for
>> comparisons.
>
> Done.
> Patch V2 have been posted.

Does anyone have  any feedback on the whole kernel RO/NX patch set?
Or should I re-post all 4 patches one more time?

Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
tip-bot for Ingo Molnar May 6, 2010, 6:19 a.m. UTC | #4
* Siarhei Liakh <sliakh.lkml@gmail.com> wrote:

> On Tue, Apr 6, 2010 at 10:51 AM, Siarhei Liakh <sliakh.lkml@gmail.com> wrote:
> > On Sat, Apr 3, 2010 at 2:43 AM, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> >> On Wed, 2010-03-31 at 18:59 -0700, Siarhei Liakh wrote:
> >>> + ? ? /*
> >>> + ? ? ?* .data and .bss should always be writable.
> >>> + ? ? ?*/
> >>> + ? ? if ((within(pfn, __pa((unsigned long)_sdata) >> PAGE_SHIFT,
> >>> + ? ? ? ? ? ? ? ?__pa((unsigned long)_edata) >> PAGE_SHIFT)) ||
> >>> + ? ? ? ? (within(pfn, __pa((unsigned long)__bss_start) >> PAGE_SHIFT,
> >>> + ? ? ? ? ? ? ? ?__pa((unsigned long)__bss_stop) >> PAGE_SHIFT))) {
> >>> + ? ? ? ? ? ? pgprot_val(required) |= _PAGE_RW;
> >>> + ? ? }
> >>
> >> I have reviewed this patch and the only comment I have is:
> >>
> >> On 64bit kernels, kernel text/data mapping and kernel identity mappings
> >> are different virtual addresses mapping to same pfn ranges. For the
> >> data/bss pages, does it help (in identifying certain data corruptions
> >> more easily) in making the kernel identity mapping to be set to
> >> read-only and enforce the need of RW only for the kernel data mappings.
> >>
> >> Or is there some obscure code that uses something like
> >> __va(__pa(data_symbol)) and writes to it?
> >>
> >> If not, we can remove the __pa() constructs above and use the addr for
> >> comparisons.
> >
> > Done.
> > Patch V2 have been posted.
> 
> Does anyone have any feedback on the whole kernel RO/NX patch set? Or should 
> I re-post all 4 patches one more time?
> 
> Thank you.

Please do - i havent seen any other review feedback.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch
diff mbox series

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index cf07c26..6844675 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -255,6 +255,7 @@  static inline pgprot_t static_protections(pgprot_t
prot, unsigned long address,
 				   unsigned long pfn)
 {
 	pgprot_t forbidden = __pgprot(0);
+	pgprot_t required = __pgprot(0);

 	/*