From mboxrd@z Thu Jan 1 00:00:00 1970
From: Christoph Hellwig
Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO)
Date: Fri, 4 Nov 2016 07:50:40 -0700
Message-ID: <20161104145040.GA24930@infradead.org>
References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com>
Sender: owner-linux-mm@kvack.org
To: Juerg Haefliger
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu, Tejun Heo, linux-ide@vger.kernel.org
List-Id: linux-ide@vger.kernel.org

The libata parts here really need to be split out, and the proper list and maintainer need to be Cc'ed.

> diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c
> index 051b6158d1b7..58af734be25d 100644
> --- a/drivers/ata/libata-sff.c
> +++ b/drivers/ata/libata-sff.c
> @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc)
>
> DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
> 		unsigned long flags;
>
> 		/* FIXME: use a bounce buffer */
> @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes)
>
> DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read");
>
> -	if (PageHighMem(page)) {
> +	if (PageHighMem(page) || xpfo_page_is_unmapped(page)) {
> 		unsigned long flags;
>
> 		/* FIXME: use bounce buffer */
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h

This is just piling one nasty hack on top of another. libata should just use the highmem case unconditionally, as it is the correct thing to do for all cases.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754010AbcBZOVO (ORCPT ); Fri, 26 Feb 2016 09:21:14 -0500
Received: from g1t6225.austin.hp.com ([15.73.96.126]:52211 "EHLO g1t6225.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752174AbcBZOVM (ORCPT ); Fri, 26 Feb 2016 09:21:12 -0500
From: Juerg Haefliger
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: vpk@cs.brown.edu, juerg.haefliger@hpe.com
Subject: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO)
Date: Fri, 26 Feb 2016 15:21:07 +0100
Message-Id: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com>
X-Mailer: git-send-email 2.1.4
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userland, unless explicitly requested by the kernel. Whenever a page destined for userland is allocated, it is unmapped from physmap. When such a page is reclaimed from userland, it is mapped back to physmap.

Mapping/unmapping from physmap is accomplished by modifying the PTE permission bits to allow/disallow access to the page.

Additional fields are added to the page struct for XPFO housekeeping.
Specifically a flags field to distinguish user vs. kernel pages, a reference counter to track physmap map/unmap operations and a lock to protect the XPFO fields. Known issues/limitations: - Only supported on x86-64. - Only supports 4k pages. - Adds additional data to the page struct. - There are most likely some additional and legitimate uses cases where the kernel needs to access userspace. Those need to be identified and made XPFO-aware. - There's a performance impact if XPFO is turned on. Per the paper referenced below it's in the 1-3% ballpark. More performance testing wouldn't hurt. What tests to run though? Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 2 +- arch/x86/Kconfig.debug | 17 +++++ arch/x86/mm/Makefile | 2 + arch/x86/mm/init.c | 3 +- arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ block/blk-map.c | 7 +- include/linux/highmem.h | 23 +++++-- include/linux/mm_types.h | 4 ++ include/linux/xpfo.h | 88 ++++++++++++++++++++++++ lib/swiotlb.c | 3 +- mm/page_alloc.c | 7 +- 11 files changed, 323 insertions(+), 9 deletions(-) create mode 100644 arch/x86/mm/xpfo.c create mode 100644 include/linux/xpfo.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c46662f..9d32b4a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 9b18ed9..1331da5 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT source "lib/Kconfig.debug" +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL + depends on X86_64 + select DEBUG_TLBFLUSH + ---help--- + This option offers protection against 'ret2dir' (kernel) attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, whenever page frames are freed/reclaimed, they + are mapped back to physmap. Special care is taken to minimize the + impact on performance by reducing TLB shootdowns and unnecessary page + zero fills. + + If in doubt, say "N". + config X86_VERBOSE_BOOTUP bool "Enable verbose x86 bootup info messages" default y diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index f9d38a4..8bf52b6 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_X86_INTEL_MPX) += mpx.o + +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 493f541..27fc8a6 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -150,7 +150,8 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ + !defined(CONFIG_XPFO) /* * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. 
* This will simplify cpa(), which otherwise needs to support splitting diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c new file mode 100644 index 0000000..6bc24d3 --- /dev/null +++ b/arch/x86/mm/xpfo.c @@ -0,0 +1,176 @@ +/* + * Copyright (C) 2016 Brown University. All rights reserved. + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * + * Authors: + * Vasileios P. Kemerlis + * Juerg Haefliger + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include + +#include +#include + +#define TEST_XPFO_FLAG(flag, page) \ + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define SET_XPFO_FLAG(flag, page) \ + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define CLEAR_XPFO_FLAG(flag, page) \ + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +inline void xpfo_clear_zap(struct page *page, int order) +{ + int i; + + for (i = 0; i < (1 << order); i++) + CLEAR_XPFO_FLAG(zap, page + i); +} + +inline int xpfo_test_and_clear_zap(struct page *page) +{ + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); +} + +inline int xpfo_test_kernel(struct page *page) +{ + return TEST_XPFO_FLAG(kernel, page); +} + +inline int xpfo_test_user(struct page *page) +{ + return TEST_XPFO_FLAG(user, page); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, tlb_shoot = 0; + unsigned long kaddr; + + for (i = 0; i < (1 << order); i++) { + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || + TEST_XPFO_FLAG(user, page + i)); + + if (gfp & GFP_HIGHUSER) { + /* Initialize the xpfo lock and map counter */ + spin_lock_init(&(page + i)->xpfo.lock); + atomic_set(&(page + i)->xpfo.mapcount, 0); + + /* Mark it as a user page */ + SET_XPFO_FLAG(user_fp, page + i); + + /* + * Shoot the TLB if the page was previously allocated + * to kernel space + */ + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) + tlb_shoot = 1; + } else { + /* Mark it as a kernel page */ + SET_XPFO_FLAG(kernel, page + i); + } + } + + if (tlb_shoot) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + unsigned long kaddr; + + for (i = 0; i < (1 << order); i++) { + + /* The page frame was previously allocated to user space */ + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { + kaddr = (unsigned long)page_address(page + i); + + /* Clear the page and mark it accordingly */ + clear_page((void *)kaddr); + SET_XPFO_FLAG(zap, page + i); + + /* Map it back to kernel space */ + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + + /* No TLB update */ + } + + /* Clear the xpfo fast-path flag */ + CLEAR_XPFO_FLAG(user_fp, page + i); + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + unsigned long flags; + + /* The page is allocated to kernel space, so nothing to do */ + if (TEST_XPFO_FLAG(kernel, page)) + return; + + spin_lock_irqsave(&page->xpfo.lock, 
flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB update required. + */ + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && + TEST_XPFO_FLAG(user, page)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page->xpfo.lock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + unsigned long flags; + + /* The page is allocated to kernel space, so nothing to do */ + if (TEST_XPFO_FLAG(kernel, page)) + return; + + spin_lock_irqsave(&page->xpfo.lock, flags); + + /* + * The page frame is to be allocated back to user space. So unmap it + * from the kernel, update the TLB and mark it as a user page. + */ + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + SET_XPFO_FLAG(user, page); + } + + spin_unlock_irqrestore(&page->xpfo.lock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); diff --git a/block/blk-map.c b/block/blk-map.c index f565e11..b7b8302 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, prv.iov_len = iov.iov_len; } - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) + /* + * juergh: Temporary hack to force the use of a bounce buffer if XPFO + * is enabled. Results in an XPFO page fault otherwise. + */ + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || + IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f329..0ca9130 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); + pagefault_enable(); preempt_enable(); } @@ -133,7 +146,8 @@ do { \ static inline void clear_user_highpage(struct page *page, unsigned long vaddr) { void *addr = kmap_atomic(page); - clear_user_page(addr, vaddr, page); + if (!xpfo_test_and_clear_zap(page)) + clear_user_page(addr, vaddr, page); kunmap_atomic(addr); } #endif @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, static inline void clear_highpage(struct page *page) { void *kaddr = kmap_atomic(page); - clear_page(kaddr); + if (!xpfo_test_and_clear_zap(page)) + clear_page(kaddr); kunmap_atomic(kaddr); } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 624b78b..71c95aa 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -215,6 +216,9 @@ struct page { #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS int _last_cpupid; #endif +#ifdef CONFIG_XPFO + 
struct xpfo_info xpfo; +#endif } /* * The struct page can be forced to be double word aligned so that atomic ops diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 0000000..c4f0871 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,88 @@ +/* + * Copyright (C) 2016 Brown University. All rights reserved. + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * + * Authors: + * Vasileios P. Kemerlis + * Juerg Haefliger + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +/* + * XPFO page flags: + * + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag + * is used in the fast path, where the page is marked accordingly but *not* + * unmapped from the kernel. In most cases, the kernel will need access to the + * page immediately after its acquisition so an unnecessary mapping operation + * is avoided. + * + * PG_XPFO_user denotes that the page is destined for user space. This flag is + * used in the slow path, where the page needs to be mapped/unmapped when the + * kernel wants to access it. If a page is deallocated and this flag is set, + * the page is cleared and mapped back into the kernel. + * + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used + * for identifying pages that are first assigned to kernel space and then freed + * and mapped to user space. In such cases, an expensive TLB shootdown is + * necessary. Pages allocated to user space, freed, and subsequently allocated + * to user space again, require only local TLB invalidation. + * + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to + * avoid zapping pages multiple times. Whenever a page is freed and was + * previously mapped to user space, it needs to be zapped before mapped back + * in to the kernel. + */ + +enum xpfo_pageflags { + PG_XPFO_user_fp, + PG_XPFO_user, + PG_XPFO_kernel, + PG_XPFO_zap, +}; + +struct xpfo_info { + unsigned long flags; /* Flags for tracking the page's XPFO state */ + atomic_t mapcount; /* Counter for balancing page map/unmap + * requests. Only the first map request maps + * the page back to kernel space. Likewise, + * only the last unmap request unmaps the page. + */ + spinlock_t lock; /* Lock to serialize concurrent map/unmap + * requests. 
+ */ +}; + +extern void xpfo_clear_zap(struct page *page, int order); +extern int xpfo_test_and_clear_zap(struct page *page); +extern int xpfo_test_kernel(struct page *page); +extern int xpfo_test_user(struct page *page); + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +#else /* ifdef CONFIG_XPFO */ + +static inline void xpfo_clear_zap(struct page *page, int order) { } +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } +static inline int xpfo_test_kernel(struct page *page) { return 0; } +static inline int xpfo_test_user(struct page *page) { return 0; } + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +#endif /* ifdef CONFIG_XPFO */ + +#endif /* ifndef _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 76f29ec..cf57ee9 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_test_user(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 838ca8bb..47b42a3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) } arch_free_page(page, order); kernel_map_pages(page, 1 << order, 0); + xpfo_free_page(page, order); return true; } @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, arch_alloc_page(page, order); kernel_map_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); if (gfp_flags & __GFP_ZERO) for (i = 0; i < (1 << order); i++) clear_highpage(page + i); + else + xpfo_clear_zap(page, order); if (order && (gfp_flags & __GFP_COMP)) prep_compound_page(page, order); @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + if (!cold && !xpfo_test_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); + pcp->count++; if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); -- 2.1.4 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751103AbcCABbL (ORCPT ); Mon, 29 Feb 2016 20:31:11 -0500 Received: from mx1.redhat.com ([209.132.183.28]:41897 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750725AbcCABbH (ORCPT ); Mon, 29 Feb 2016 20:31:07 -0500 Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Cc: vpk@cs.brown.edu, Kees Cook From: Laura Abbott Message-ID: <56D4F0D6.2060308@redhat.com> Date: Mon, 29 
Feb 2016 17:31:02 -0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 02/26/2016 06:21 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userland, unless explicitly requested by the > kernel. Whenever a page destined for userland is allocated, it is > unmapped from physmap. When such a page is reclaimed from userland, it is > mapped back to physmap. > > Mapping/unmapping from physmap is accomplished by modifying the PTE > permission bits to allow/disallow access to the page. > > Additional fields are added to the page struct for XPFO housekeeping. > Specifically a flags field to distinguish user vs. kernel pages, a > reference counter to track physmap map/unmap operations and a lock to > protect the XPFO fields. > > Known issues/limitations: > - Only supported on x86-64. > - Only supports 4k pages. > - Adds additional data to the page struct. > - There are most likely some additional and legitimate uses cases where > the kernel needs to access userspace. Those need to be identified and > made XPFO-aware. > - There's a performance impact if XPFO is turned on. Per the paper > referenced below it's in the 1-3% ballpark. More performance testing > wouldn't hurt. What tests to run though? > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > General note: Make sure to cc the x86 maintainers on the next version of the patch. I'd also recommend ccing the kernel hardening list (see the wiki page http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project for details) If you can find a way to break this up into x86 specific vs. generic patches that would be better. Perhaps move the Kconfig for XPFO to the generic Kconfig layer and make it depend on ARCH_HAS_XPFO? x86 can then select ARCH_HAS_XPFO as the last option. There also isn't much that's actually x86 specific here except for some of the page table manipulation functions and even those can probably be abstracted away. It would be good to get more of this out of x86 to let other arches take advantage of it. The arm64 implementation would look pretty similar if you save the old kernel mapping and restore it on free. > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 2 +- > arch/x86/Kconfig.debug | 17 +++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/init.c | 3 +- > arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ > block/blk-map.c | 7 +- > include/linux/highmem.h | 23 +++++-- > include/linux/mm_types.h | 4 ++ > include/linux/xpfo.h | 88 ++++++++++++++++++++++++ > lib/swiotlb.c | 3 +- > mm/page_alloc.c | 7 +- > 11 files changed, 323 insertions(+), 9 deletions(-) > create mode 100644 arch/x86/mm/xpfo.c > create mode 100644 include/linux/xpfo.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index c46662f..9d32b4a 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug > index 9b18ed9..1331da5 100644 > --- a/arch/x86/Kconfig.debug > +++ b/arch/x86/Kconfig.debug > @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT > > source "lib/Kconfig.debug" > > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on DEBUG_KERNEL > + depends on X86_64 > + select DEBUG_TLBFLUSH > + ---help--- > + This option offers protection against 'ret2dir' (kernel) attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, whenever page frames are freed/reclaimed, they > + are mapped back to physmap. Special care is taken to minimize the > + impact on performance by reducing TLB shootdowns and unnecessary page > + zero fills. > + > + If in doubt, say "N". > + > config X86_VERBOSE_BOOTUP > bool "Enable verbose x86 bootup info messages" > default y > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index f9d38a4..8bf52b6 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > + > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 493f541..27fc8a6 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -150,7 +150,8 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ > + !defined(CONFIG_XPFO) > /* > * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. > * This will simplify cpa(), which otherwise needs to support splitting > diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c > new file mode 100644 > index 0000000..6bc24d3 > --- /dev/null > +++ b/arch/x86/mm/xpfo.c > @@ -0,0 +1,176 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#include > +#include > + > +#include > +#include > + > +#define TEST_XPFO_FLAG(flag, page) \ > + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define SET_XPFO_FLAG(flag, page) \ > + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define CLEAR_XPFO_FLAG(flag, page) \ > + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ > + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +inline void xpfo_clear_zap(struct page *page, int order) > +{ > + int i; > + > + for (i = 0; i < (1 << order); i++) > + CLEAR_XPFO_FLAG(zap, page + i); > +} > + > +inline int xpfo_test_and_clear_zap(struct page *page) > +{ > + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); > +} > + > +inline int xpfo_test_kernel(struct page *page) > +{ > + return TEST_XPFO_FLAG(kernel, page); > +} > + > +inline int xpfo_test_user(struct page *page) > +{ > + return TEST_XPFO_FLAG(user, page); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, tlb_shoot = 0; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || > + TEST_XPFO_FLAG(user, page + i)); > + > + if (gfp & GFP_HIGHUSER) { This check doesn't seem right. If the GFP flags have _any_ in common with GFP_HIGHUSER it will be marked as a user page so GFP_KERNEL will be marked as well. > + /* Initialize the xpfo lock and map counter */ > + spin_lock_init(&(page + i)->xpfo.lock); This is initializing the spin_lock every time. That's not really necessary. > + atomic_set(&(page + i)->xpfo.mapcount, 0); > + > + /* Mark it as a user page */ > + SET_XPFO_FLAG(user_fp, page + i); > + > + /* > + * Shoot the TLB if the page was previously allocated > + * to kernel space > + */ > + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) > + tlb_shoot = 1; > + } else { > + /* Mark it as a kernel page */ > + SET_XPFO_FLAG(kernel, page + i); > + } > + } > + > + if (tlb_shoot) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + > + /* The page frame was previously allocated to user space */ > + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { > + kaddr = (unsigned long)page_address(page + i); > + > + /* Clear the page and mark it accordingly */ > + clear_page((void *)kaddr); Clearing the page isn't related to XPFO. There's other work ongoing to do clearing of the page on free. 
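A minimal sketch of the spin_lock_init() point above: the lock (and the rest of the per-page XPFO metadata) only needs to be set up once per page frame, not on every allocation to user space. The helper name and call site below are hypothetical, purely for illustration:

#include <linux/atomic.h>
#include <linux/mm_types.h>
#include <linux/spinlock.h>

/*
 * Hypothetical one-time setup, e.g. invoked from wherever the rest of the
 * struct page fields are first initialized. xpfo_alloc_page() would then
 * only need to reset the map counter and flags on each user-space
 * allocation instead of re-running spin_lock_init().
 */
static inline void xpfo_page_init(struct page *page)
{
	page->xpfo.flags = 0;
	atomic_set(&page->xpfo.mapcount, 0);
	spin_lock_init(&page->xpfo.lock);
}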
> + SET_XPFO_FLAG(zap, page + i); > + > + /* Map it back to kernel space */ > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + > + /* No TLB update */ > + } > + > + /* Clear the xpfo fast-path flag */ > + CLEAR_XPFO_FLAG(user_fp, page + i); > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB update required. > + */ > + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && > + TEST_XPFO_FLAG(user, page)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page frame is to be allocated back to user space. So unmap it > + * from the kernel, update the TLB and mark it as a user page. > + */ > + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && > + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + SET_XPFO_FLAG(user, page); > + } > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); I'm confused by the checks in kmap/kunmap here. It looks like once the page is allocated there is no changing of flags between user and kernel mode so the checks for if the page is user seem redundant. > diff --git a/block/blk-map.c b/block/blk-map.c > index f565e11..b7b8302 100644 > --- a/block/blk-map.c > +++ b/block/blk-map.c > @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, > prv.iov_len = iov.iov_len; > } > > - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) > + /* > + * juergh: Temporary hack to force the use of a bounce buffer if XPFO > + * is enabled. Results in an XPFO page fault otherwise. 
> + */ > + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || > + IS_ENABLED(CONFIG_XPFO)) > bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); > else > bio = bio_map_user_iov(q, iter, gfp_mask); > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f329..0ca9130 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > + > pagefault_enable(); > preempt_enable(); > } > @@ -133,7 +146,8 @@ do { \ > static inline void clear_user_highpage(struct page *page, unsigned long vaddr) > { > void *addr = kmap_atomic(page); > - clear_user_page(addr, vaddr, page); > + if (!xpfo_test_and_clear_zap(page)) > + clear_user_page(addr, vaddr, page); > kunmap_atomic(addr); > } > #endif > @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, > static inline void clear_highpage(struct page *page) > { > void *kaddr = kmap_atomic(page); > - clear_page(kaddr); > + if (!xpfo_test_and_clear_zap(page)) > + clear_page(kaddr); > kunmap_atomic(kaddr); > } > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 624b78b..71c95aa 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -215,6 +216,9 @@ struct page { > #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS > int _last_cpupid; > #endif > +#ifdef CONFIG_XPFO > + struct xpfo_info xpfo; > +#endif > } > /* > * The struct page can be forced to be double word aligned so that atomic ops > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 0000000..c4f0871 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,88 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +/* > + * XPFO page flags: > + * > + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag > + * is used in the fast path, where the page is marked accordingly but *not* > + * unmapped from the kernel. In most cases, the kernel will need access to the > + * page immediately after its acquisition so an unnecessary mapping operation > + * is avoided. > + * > + * PG_XPFO_user denotes that the page is destined for user space. 
This flag is > + * used in the slow path, where the page needs to be mapped/unmapped when the > + * kernel wants to access it. If a page is deallocated and this flag is set, > + * the page is cleared and mapped back into the kernel. > + * > + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used > + * for identifying pages that are first assigned to kernel space and then freed > + * and mapped to user space. In such cases, an expensive TLB shootdown is > + * necessary. Pages allocated to user space, freed, and subsequently allocated > + * to user space again, require only local TLB invalidation. > + * > + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to > + * avoid zapping pages multiple times. Whenever a page is freed and was > + * previously mapped to user space, it needs to be zapped before mapped back > + * in to the kernel. > + */ 'zap' doesn't really indicate what is actually happening with the page. Can you be a bit more descriptive about what this actually does? > + > +enum xpfo_pageflags { > + PG_XPFO_user_fp, > + PG_XPFO_user, > + PG_XPFO_kernel, > + PG_XPFO_zap, > +}; > + > +struct xpfo_info { > + unsigned long flags; /* Flags for tracking the page's XPFO state */ > + atomic_t mapcount; /* Counter for balancing page map/unmap > + * requests. Only the first map request maps > + * the page back to kernel space. Likewise, > + * only the last unmap request unmaps the page. > + */ > + spinlock_t lock; /* Lock to serialize concurrent map/unmap > + * requests. > + */ > +}; Can you change this to use the page_ext implementation? See what mm/page_owner.c does. This might lessen the impact of the extra page metadata. This metadata still feels like a copy of what mm/highmem.c is trying to do though. > + > +extern void xpfo_clear_zap(struct page *page, int order); > +extern int xpfo_test_and_clear_zap(struct page *page); > +extern int xpfo_test_kernel(struct page *page); > +extern int xpfo_test_user(struct page *page); > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +#else /* ifdef CONFIG_XPFO */ > + > +static inline void xpfo_clear_zap(struct page *page, int order) { } > +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } > +static inline int xpfo_test_kernel(struct page *page) { return 0; } > +static inline int xpfo_test_user(struct page *page) { return 0; } > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +#endif /* ifdef CONFIG_XPFO */ > + > +#endif /* ifndef _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 76f29ec..cf57ee9 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_test_user(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 838ca8bb..47b42a3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) > } > arch_free_page(page, order); > kernel_map_pages(page, 1 << order, 0); > + xpfo_free_page(page, order); > > return true; > } > @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, > arch_alloc_page(page, order); > kernel_map_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > > if (gfp_flags & __GFP_ZERO) > for (i = 0; i < (1 << order); i++) > clear_highpage(page + i); > + else > + xpfo_clear_zap(page, order); > > if (order && (gfp_flags & __GFP_COMP)) > prep_compound_page(page, order); > @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) > } > > pcp = &this_cpu_ptr(zone->pageset)->pcp; > - if (!cold) > + if (!cold && !xpfo_test_kernel(page)) > list_add(&page->lru, &pcp->lists[migratetype]); > else > list_add_tail(&page->lru, &pcp->lists[migratetype]); > + What's the advantage of this? > pcp->count++; > if (pcp->count >= pcp->high) { > unsigned long batch = READ_ONCE(pcp->batch); > Thanks, Laura From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751496AbcCACKh (ORCPT ); Mon, 29 Feb 2016 21:10:37 -0500 Received: from mail-pa0-f54.google.com ([209.85.220.54]:34640 "EHLO mail-pa0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750998AbcCACKf (ORCPT ); Mon, 29 Feb 2016 21:10:35 -0500 Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Cc: vpk@cs.brown.edu From: Balbir Singh Message-ID: <56D4FA15.9060700@gmail.com> Date: Tue, 1 Mar 2016 13:10:29 +1100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 MIME-Version: 1.0 In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 27/02/16 01:21, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userland, unless explicitly requested by the > kernel. Whenever a page destined for userland is allocated, it is > unmapped from physmap. When such a page is reclaimed from userland, it is > mapped back to physmap. physmap == xen physmap? Please clarify > Mapping/unmapping from physmap is accomplished by modifying the PTE > permission bits to allow/disallow access to the page. > > Additional fields are added to the page struct for XPFO housekeeping. > Specifically a flags field to distinguish user vs. kernel pages, a > reference counter to track physmap map/unmap operations and a lock to > protect the XPFO fields. > > Known issues/limitations: > - Only supported on x86-64. Is it due to lack of porting or a design limitation? > - Only supports 4k pages. > - Adds additional data to the page struct. 
> - There are most likely some additional and legitimate uses cases where > the kernel needs to access userspace. Those need to be identified and > made XPFO-aware. Why not build an audit mode for it? > - There's a performance impact if XPFO is turned on. Per the paper > referenced below it's in the 1-3% ballpark. More performance testing > wouldn't hurt. What tests to run though? > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. Kemerlis > Signed-off-by: Juerg Haefliger This patch needs to be broken down into smaller patches - a series > --- > arch/x86/Kconfig | 2 +- > arch/x86/Kconfig.debug | 17 +++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/init.c | 3 +- > arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ > block/blk-map.c | 7 +- > include/linux/highmem.h | 23 +++++-- > include/linux/mm_types.h | 4 ++ > include/linux/xpfo.h | 88 ++++++++++++++++++++++++ > lib/swiotlb.c | 3 +- > mm/page_alloc.c | 7 +- > 11 files changed, 323 insertions(+), 9 deletions(-) > create mode 100644 arch/x86/mm/xpfo.c > create mode 100644 include/linux/xpfo.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index c46662f..9d32b4a 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug > index 9b18ed9..1331da5 100644 > --- a/arch/x86/Kconfig.debug > +++ b/arch/x86/Kconfig.debug > @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT > > source "lib/Kconfig.debug" > > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on DEBUG_KERNEL > + depends on X86_64 > + select DEBUG_TLBFLUSH > + ---help--- > + This option offers protection against 'ret2dir' (kernel) attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, whenever page frames are freed/reclaimed, they > + are mapped back to physmap. Special care is taken to minimize the > + impact on performance by reducing TLB shootdowns and unnecessary page > + zero fills. > + > + If in doubt, say "N". > + > config X86_VERBOSE_BOOTUP > bool "Enable verbose x86 bootup info messages" > default y > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index f9d38a4..8bf52b6 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > + > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 493f541..27fc8a6 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -150,7 +150,8 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ > + !defined(CONFIG_XPFO) > /* > * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. 
> * This will simplify cpa(), which otherwise needs to support splitting > diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c > new file mode 100644 > index 0000000..6bc24d3 > --- /dev/null > +++ b/arch/x86/mm/xpfo.c > @@ -0,0 +1,176 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > + > +#include > +#include > + > +#define TEST_XPFO_FLAG(flag, page) \ > + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define SET_XPFO_FLAG(flag, page) \ > + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define CLEAR_XPFO_FLAG(flag, page) \ > + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ > + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +inline void xpfo_clear_zap(struct page *page, int order) > +{ > + int i; > + > + for (i = 0; i < (1 << order); i++) > + CLEAR_XPFO_FLAG(zap, page + i); > +} > + > +inline int xpfo_test_and_clear_zap(struct page *page) > +{ > + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); > +} > + > +inline int xpfo_test_kernel(struct page *page) > +{ > + return TEST_XPFO_FLAG(kernel, page); > +} > + > +inline int xpfo_test_user(struct page *page) > +{ > + return TEST_XPFO_FLAG(user, page); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, tlb_shoot = 0; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || > + TEST_XPFO_FLAG(user, page + i)); > + > + if (gfp & GFP_HIGHUSER) { Why GFP_HIGHUSER? 
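To illustrate what that test actually matches (a sketch only, not part of the patch; the helper name is made up): GFP_HIGHUSER is GFP_USER | __GFP_HIGHMEM, and its reclaim/IO/FS bits are shared with GFP_KERNEL, so the patch's any-bit test also fires for kernel allocations, which is the concern Laura raised earlier in the thread. Comparing against the full mask is the narrower check:

#include <linux/gfp.h>
#include <linux/types.h>

/*
 * Sketch only. "gfp & GFP_HIGHUSER" is non-zero for GFP_KERNEL too,
 * because the two masks overlap; requiring every GFP_HIGHUSER bit limits
 * the match to highmem user-space allocations.
 */
static inline bool xpfo_gfp_is_user(gfp_t gfp)
{
	return (gfp & GFP_HIGHUSER) == GFP_HIGHUSER;
}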
> + /* Initialize the xpfo lock and map counter */ > + spin_lock_init(&(page + i)->xpfo.lock); > + atomic_set(&(page + i)->xpfo.mapcount, 0); > + > + /* Mark it as a user page */ > + SET_XPFO_FLAG(user_fp, page + i); > + > + /* > + * Shoot the TLB if the page was previously allocated > + * to kernel space > + */ > + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) > + tlb_shoot = 1; > + } else { > + /* Mark it as a kernel page */ > + SET_XPFO_FLAG(kernel, page + i); > + } > + } > + > + if (tlb_shoot) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + > + /* The page frame was previously allocated to user space */ > + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { > + kaddr = (unsigned long)page_address(page + i); > + > + /* Clear the page and mark it accordingly */ > + clear_page((void *)kaddr); > + SET_XPFO_FLAG(zap, page + i); > + > + /* Map it back to kernel space */ > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + > + /* No TLB update */ > + } > + > + /* Clear the xpfo fast-path flag */ > + CLEAR_XPFO_FLAG(user_fp, page + i); > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB update required. > + */ > + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && > + TEST_XPFO_FLAG(user, page)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page frame is to be allocated back to user space. So unmap it > + * from the kernel, update the TLB and mark it as a user page. > + */ > + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && > + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + SET_XPFO_FLAG(user, page); > + } > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > diff --git a/block/blk-map.c b/block/blk-map.c > index f565e11..b7b8302 100644 > --- a/block/blk-map.c > +++ b/block/blk-map.c > @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, > prv.iov_len = iov.iov_len; > } > > - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) > + /* > + * juergh: Temporary hack to force the use of a bounce buffer if XPFO > + * is enabled. Results in an XPFO page fault otherwise. 
> + */ This does look like it might add a bunch of overhead > + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || > + IS_ENABLED(CONFIG_XPFO)) > bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); > else > bio = bio_map_user_iov(q, iter, gfp_mask); > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f329..0ca9130 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > + > pagefault_enable(); > preempt_enable(); > } > @@ -133,7 +146,8 @@ do { \ > static inline void clear_user_highpage(struct page *page, unsigned long vaddr) > { > void *addr = kmap_atomic(page); > - clear_user_page(addr, vaddr, page); > + if (!xpfo_test_and_clear_zap(page)) > + clear_user_page(addr, vaddr, page); > kunmap_atomic(addr); > } > #endif > @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, > static inline void clear_highpage(struct page *page) > { > void *kaddr = kmap_atomic(page); > - clear_page(kaddr); > + if (!xpfo_test_and_clear_zap(page)) > + clear_page(kaddr); > kunmap_atomic(kaddr); > } > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 624b78b..71c95aa 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -215,6 +216,9 @@ struct page { > #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS > int _last_cpupid; > #endif > +#ifdef CONFIG_XPFO > + struct xpfo_info xpfo; > +#endif > } > /* > * The struct page can be forced to be double word aligned so that atomic ops > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 0000000..c4f0871 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,88 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +/* > + * XPFO page flags: > + * > + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag > + * is used in the fast path, where the page is marked accordingly but *not* > + * unmapped from the kernel. In most cases, the kernel will need access to the > + * page immediately after its acquisition so an unnecessary mapping operation > + * is avoided. > + * > + * PG_XPFO_user denotes that the page is destined for user space. 
This flag is > + * used in the slow path, where the page needs to be mapped/unmapped when the > + * kernel wants to access it. If a page is deallocated and this flag is set, > + * the page is cleared and mapped back into the kernel. > + * > + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used > + * for identifying pages that are first assigned to kernel space and then freed > + * and mapped to user space. In such cases, an expensive TLB shootdown is > + * necessary. Pages allocated to user space, freed, and subsequently allocated > + * to user space again, require only local TLB invalidation. > + * > + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to > + * avoid zapping pages multiple times. Whenever a page is freed and was > + * previously mapped to user space, it needs to be zapped before mapped back > + * in to the kernel. > + */ > + > +enum xpfo_pageflags { > + PG_XPFO_user_fp, > + PG_XPFO_user, > + PG_XPFO_kernel, > + PG_XPFO_zap, > +}; > + > +struct xpfo_info { > + unsigned long flags; /* Flags for tracking the page's XPFO state */ > + atomic_t mapcount; /* Counter for balancing page map/unmap > + * requests. Only the first map request maps > + * the page back to kernel space. Likewise, > + * only the last unmap request unmaps the page. > + */ > + spinlock_t lock; /* Lock to serialize concurrent map/unmap > + * requests. > + */ > +}; > + > +extern void xpfo_clear_zap(struct page *page, int order); > +extern int xpfo_test_and_clear_zap(struct page *page); > +extern int xpfo_test_kernel(struct page *page); > +extern int xpfo_test_user(struct page *page); > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +#else /* ifdef CONFIG_XPFO */ > + > +static inline void xpfo_clear_zap(struct page *page, int order) { } > +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } > +static inline int xpfo_test_kernel(struct page *page) { return 0; } > +static inline int xpfo_test_user(struct page *page) { return 0; } > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +#endif /* ifdef CONFIG_XPFO */ > + > +#endif /* ifndef _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 76f29ec..cf57ee9 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_test_user(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 838ca8bb..47b42a3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) > } > arch_free_page(page, order); > kernel_map_pages(page, 1 << order, 0); > + xpfo_free_page(page, order); > > return true; > } > @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, > arch_alloc_page(page, order); > kernel_map_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > > if (gfp_flags & __GFP_ZERO) > for (i = 0; i < (1 << order); i++) > clear_highpage(page + i); > + else > + xpfo_clear_zap(page, order); > > if (order && (gfp_flags & __GFP_COMP)) > prep_compound_page(page, order); > @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) > } > > pcp = &this_cpu_ptr(zone->pageset)->pcp; > - if (!cold) > + if (!cold && !xpfo_test_kernel(page)) > list_add(&page->lru, &pcp->lists[migratetype]); > else > list_add_tail(&page->lru, &pcp->lists[migratetype]); > + > pcp->count++; > if (pcp->count >= pcp->high) { > unsigned long batch = READ_ONCE(pcp->batch); From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752700AbcCUIiK (ORCPT ); Mon, 21 Mar 2016 04:38:10 -0400 Received: from g1t6213.austin.hp.com ([15.73.96.121]:58625 "EHLO g1t6213.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751371AbcCUIiH (ORCPT ); Mon, 21 Mar 2016 04:38:07 -0400 From: Juerg Haefliger Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) To: Laura Abbott , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4F0D6.2060308@redhat.com> Cc: vpk@cs.brown.edu, Kees Cook Message-ID: <56EFB2DB.3090602@hpe.com> Date: Mon, 21 Mar 2016 09:37:47 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <56D4F0D6.2060308@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Laura, Sorry for the late reply. I was on FTO and then traveling for the past couple of days. On 03/01/2016 02:31 AM, Laura Abbott wrote: > On 02/26/2016 06:21 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userland, unless explicitly requested by the >> kernel. Whenever a page destined for userland is allocated, it is >> unmapped from physmap. When such a page is reclaimed from userland, it is >> mapped back to physmap. >> >> Mapping/unmapping from physmap is accomplished by modifying the PTE >> permission bits to allow/disallow access to the page. >> >> Additional fields are added to the page struct for XPFO housekeeping. >> Specifically a flags field to distinguish user vs. kernel pages, a >> reference counter to track physmap map/unmap operations and a lock to >> protect the XPFO fields. >> >> Known issues/limitations: >> - Only supported on x86-64. >> - Only supports 4k pages. >> - Adds additional data to the page struct. 
>> - There are most likely some additional and legitimate uses cases where >> the kernel needs to access userspace. Those need to be identified and >> made XPFO-aware. >> - There's a performance impact if XPFO is turned on. Per the paper >> referenced below it's in the 1-3% ballpark. More performance testing >> wouldn't hurt. What tests to run though? >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >> > > General note: Make sure to cc the x86 maintainers on the next version of > the patch. I'd also recommend ccing the kernel hardening list (see the wiki > page http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project for > details) Good idea. Thanks for the suggestion. > If you can find a way to break this up into x86 specific vs. generic patches > that would be better. Perhaps move the Kconfig for XPFO to the generic > Kconfig layer and make it depend on ARCH_HAS_XPFO? x86 can then select > ARCH_HAS_XPFO as the last option. Good idea. > There also isn't much that's actually x86 specific here except for > some of the page table manipulation functions and even those can probably > be abstracted away. It would be good to get more of this out of x86 to > let other arches take advantage of it. The arm64 implementation would > look pretty similar if you save the old kernel mapping and restore > it on free. OK. I need to familiarize myself with ARM to figure out which pieces can move out of the arch subdir. > >> Suggested-by: Vasileios P. Kemerlis >> Signed-off-by: Juerg Haefliger >> --- >> arch/x86/Kconfig | 2 +- >> arch/x86/Kconfig.debug | 17 +++++ >> arch/x86/mm/Makefile | 2 + >> arch/x86/mm/init.c | 3 +- >> arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ >> block/blk-map.c | 7 +- >> include/linux/highmem.h | 23 +++++-- >> include/linux/mm_types.h | 4 ++ >> include/linux/xpfo.h | 88 ++++++++++++++++++++++++ >> lib/swiotlb.c | 3 +- >> mm/page_alloc.c | 7 +- >> 11 files changed, 323 insertions(+), 9 deletions(-) >> create mode 100644 arch/x86/mm/xpfo.c >> create mode 100644 include/linux/xpfo.h >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index c46662f..9d32b4a 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT >> >> config X86_DIRECT_GBPAGES >> def_bool y >> - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK >> + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO >> ---help--- >> Certain kernel features effectively disable kernel >> linear 1 GB mappings (even if the CPU otherwise >> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug >> index 9b18ed9..1331da5 100644 >> --- a/arch/x86/Kconfig.debug >> +++ b/arch/x86/Kconfig.debug >> @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT >> >> source "lib/Kconfig.debug" >> >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on DEBUG_KERNEL >> + depends on X86_64 >> + select DEBUG_TLBFLUSH >> + ---help--- >> + This option offers protection against 'ret2dir' (kernel) attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). Similarly, whenever page frames are freed/reclaimed, they >> + are mapped back to physmap. Special care is taken to minimize the >> + impact on performance by reducing TLB shootdowns and unnecessary page >> + zero fills. >> + >> + If in doubt, say "N". 
>> + >> config X86_VERBOSE_BOOTUP >> bool "Enable verbose x86 bootup info messages" >> default y >> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile >> index f9d38a4..8bf52b6 100644 >> --- a/arch/x86/mm/Makefile >> +++ b/arch/x86/mm/Makefile >> @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o >> obj-$(CONFIG_NUMA_EMU) += numa_emulation.o >> >> obj-$(CONFIG_X86_INTEL_MPX) += mpx.o >> + >> +obj-$(CONFIG_XPFO) += xpfo.o >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c >> index 493f541..27fc8a6 100644 >> --- a/arch/x86/mm/init.c >> +++ b/arch/x86/mm/init.c >> @@ -150,7 +150,8 @@ static int page_size_mask; >> >> static void __init probe_page_size_mask(void) >> { >> -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) >> +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ >> + !defined(CONFIG_XPFO) >> /* >> * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. >> * This will simplify cpa(), which otherwise needs to support splitting >> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c >> new file mode 100644 >> index 0000000..6bc24d3 >> --- /dev/null >> +++ b/arch/x86/mm/xpfo.c >> @@ -0,0 +1,176 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. >> + */ >> + >> +#include >> +#include >> + >> +#include >> +#include >> + >> +#define TEST_XPFO_FLAG(flag, page) \ >> + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define SET_XPFO_FLAG(flag, page) \ >> + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define CLEAR_XPFO_FLAG(flag, page) \ >> + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ >> + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte = lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level != PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); >> +} >> + >> +inline void xpfo_clear_zap(struct page *page, int order) >> +{ >> + int i; >> + >> + for (i = 0; i < (1 << order); i++) >> + CLEAR_XPFO_FLAG(zap, page + i); >> +} >> + >> +inline int xpfo_test_and_clear_zap(struct page *page) >> +{ >> + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); >> +} >> + >> +inline int xpfo_test_kernel(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(kernel, page); >> +} >> + >> +inline int xpfo_test_user(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(user, page); >> +} >> + >> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) >> +{ >> + int i, tlb_shoot = 0; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || >> + TEST_XPFO_FLAG(user, page + i)); >> + >> + if (gfp & GFP_HIGHUSER) { > > This check doesn't seem right. If the GFP flags have _any_ in common with > GFP_HIGHUSER it will be marked as a user page so GFP_KERNEL will be marked > as well. Duh. You're right. I broke this when I cleaned up the original patch. 
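To illustrate why the plain AND misfires (a sketch only; the exact __GFP_* composition below is from memory and can vary by kernel version, but GFP_HIGHUSER is a composite mask that shares bits with GFP_KERNEL):

	#include <linux/gfp.h>

	/*
	 * Illustration only -- roughly:
	 *
	 *   GFP_KERNEL   = __GFP_RECLAIM | __GFP_IO | __GFP_FS
	 *   GFP_HIGHUSER = those same bits | __GFP_HARDWALL | __GFP_HIGHMEM
	 *
	 * so (gfp & GFP_HIGHUSER) is non-zero even for a GFP_KERNEL
	 * allocation and such a page would wrongly be marked as a user
	 * page.  Requiring all bits matches only genuine highmem-user
	 * allocations.
	 */
	static inline bool xpfo_gfp_is_user(gfp_t gfp)	/* illustrative helper */
	{
		return (gfp & GFP_HIGHUSER) == GFP_HIGHUSER;
	}
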
It should be: (gfp & GFP_HIGHUSER) == GFP_HIGHUSER >> + /* Initialize the xpfo lock and map counter */ >> + spin_lock_init(&(page + i)->xpfo.lock); > > This is initializing the spin_lock every time. That's not really necessary. Correct. The initialization should probably be done when the page struct is first allocated. But I haven't been able to find that piece of code quickly. Will look again. >> + atomic_set(&(page + i)->xpfo.mapcount, 0); >> + >> + /* Mark it as a user page */ >> + SET_XPFO_FLAG(user_fp, page + i); >> + >> + /* >> + * Shoot the TLB if the page was previously allocated >> + * to kernel space >> + */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) >> + tlb_shoot = 1; >> + } else { >> + /* Mark it as a kernel page */ >> + SET_XPFO_FLAG(kernel, page + i); >> + } >> + } >> + >> + if (tlb_shoot) { >> + kaddr = (unsigned long)page_address(page); >> + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * >> + PAGE_SIZE); >> + } >> +} >> + >> +void xpfo_free_page(struct page *page, int order) >> +{ >> + int i; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + >> + /* The page frame was previously allocated to user space */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >> + kaddr = (unsigned long)page_address(page + i); >> + >> + /* Clear the page and mark it accordingly */ >> + clear_page((void *)kaddr); > > Clearing the page isn't related to XPFO. There's other work ongoing to > do clearing of the page on free. It's not strictly related to XPFO but adds another layer of security. Do you happen to have a pointer to the ongoing work that you mentioned? >> + SET_XPFO_FLAG(zap, page + i); >> + >> + /* Map it back to kernel space */ >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + /* No TLB update */ >> + } >> + >> + /* Clear the xpfo fast-path flag */ >> + CLEAR_XPFO_FLAG(user_fp, page + i); >> + } >> +} >> + >> +void xpfo_kmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page was previously allocated to user space, so map it back >> + * into the kernel. No TLB update required. >> + */ >> + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && >> + TEST_XPFO_FLAG(user, page)) >> + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kmap); >> + >> +void xpfo_kunmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page frame is to be allocated back to user space. So unmap it >> + * from the kernel, update the TLB and mark it as a user page. >> + */ >> + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && >> + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); >> + SET_XPFO_FLAG(user, page); >> + } >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kunmap); > > I'm confused by the checks in kmap/kunmap here. 
It looks like once the > page is allocated there is no changing of flags between user and > kernel mode so the checks for if the page is user seem redundant. Hmm... I think you're partially right. In xpfo_kmap we need to distinguish between user and user_fp, so the check for 'user' is necessary. However, in kunmap we can drop the check for 'user' || 'user_fp'. >> diff --git a/block/blk-map.c b/block/blk-map.c >> index f565e11..b7b8302 100644 >> --- a/block/blk-map.c >> +++ b/block/blk-map.c >> @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct >> request *rq, >> prv.iov_len = iov.iov_len; >> } >> >> - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) >> + /* >> + * juergh: Temporary hack to force the use of a bounce buffer if XPFO >> + * is enabled. Results in an XPFO page fault otherwise. >> + */ >> + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || >> + IS_ENABLED(CONFIG_XPFO)) >> bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); >> else >> bio = bio_map_user_iov(q, iter, gfp_mask); >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h >> index bb3f329..0ca9130 100644 >> --- a/include/linux/highmem.h >> +++ b/include/linux/highmem.h >> @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) >> #ifndef ARCH_HAS_KMAP >> static inline void *kmap(struct page *page) >> { >> + void *kaddr; >> + >> might_sleep(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> >> static inline void kunmap(struct page *page) >> { >> + xpfo_kunmap(page_address(page), page); >> } >> >> static inline void *kmap_atomic(struct page *page) >> { >> + void *kaddr; >> + >> preempt_disable(); >> pagefault_disable(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> #define kmap_atomic_prot(page, prot) kmap_atomic(page) >> >> static inline void __kunmap_atomic(void *addr) >> { >> + xpfo_kunmap(addr, virt_to_page(addr)); >> + >> pagefault_enable(); >> preempt_enable(); >> } >> @@ -133,7 +146,8 @@ do >> { \ >> static inline void clear_user_highpage(struct page *page, unsigned long vaddr) >> { >> void *addr = kmap_atomic(page); >> - clear_user_page(addr, vaddr, page); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_user_page(addr, vaddr, page); >> kunmap_atomic(addr); >> } >> #endif >> @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct >> *vma, >> static inline void clear_highpage(struct page *page) >> { >> void *kaddr = kmap_atomic(page); >> - clear_page(kaddr); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_page(kaddr); >> kunmap_atomic(kaddr); >> } >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index 624b78b..71c95aa 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> >> @@ -215,6 +216,9 @@ struct page { >> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS >> int _last_cpupid; >> #endif >> +#ifdef CONFIG_XPFO >> + struct xpfo_info xpfo; >> +#endif >> } >> /* >> * The struct page can be forced to be double word aligned so that atomic ops >> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h >> new file mode 100644 >> index 0000000..c4f0871 >> --- /dev/null >> +++ b/include/linux/xpfo.h >> @@ -0,0 +1,88 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. 
>> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. >> + */ >> + >> +#ifndef _LINUX_XPFO_H >> +#define _LINUX_XPFO_H >> + >> +#ifdef CONFIG_XPFO >> + >> +/* >> + * XPFO page flags: >> + * >> + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag >> + * is used in the fast path, where the page is marked accordingly but *not* >> + * unmapped from the kernel. In most cases, the kernel will need access to the >> + * page immediately after its acquisition so an unnecessary mapping operation >> + * is avoided. >> + * >> + * PG_XPFO_user denotes that the page is destined for user space. This flag is >> + * used in the slow path, where the page needs to be mapped/unmapped when the >> + * kernel wants to access it. If a page is deallocated and this flag is set, >> + * the page is cleared and mapped back into the kernel. >> + * >> + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used >> + * for identifying pages that are first assigned to kernel space and then freed >> + * and mapped to user space. In such cases, an expensive TLB shootdown is >> + * necessary. Pages allocated to user space, freed, and subsequently allocated >> + * to user space again, require only local TLB invalidation. >> + * >> + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to >> + * avoid zapping pages multiple times. Whenever a page is freed and was >> + * previously mapped to user space, it needs to be zapped before mapped back >> + * in to the kernel. >> + */ > > 'zap' doesn't really indicate what is actually happening with the page. Can you > be a bit more descriptive about what this actually does? It means that the page has been cleared at the time it was released back to the free pool. To prevent multiple expensive cleaning operations. But this might go away because of the ongoing work of sanitizing pages that you mentioned. >> + >> +enum xpfo_pageflags { >> + PG_XPFO_user_fp, >> + PG_XPFO_user, >> + PG_XPFO_kernel, >> + PG_XPFO_zap, >> +}; >> + >> +struct xpfo_info { >> + unsigned long flags; /* Flags for tracking the page's XPFO state */ >> + atomic_t mapcount; /* Counter for balancing page map/unmap >> + * requests. Only the first map request maps >> + * the page back to kernel space. Likewise, >> + * only the last unmap request unmaps the page. >> + */ >> + spinlock_t lock; /* Lock to serialize concurrent map/unmap >> + * requests. >> + */ >> +}; > > Can you change this to use the page_ext implementation? See what > mm/page_owner.c does. This might lessen the impact of the extra > page metadata. This metadata still feels like a copy of what > mm/highmem.c is trying to do though. I'll look into that, thanks for the pointer. 
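For reference, the page_ext hook that mm/page_owner.c uses looks roughly like this (a minimal sketch, not the actual conversion; the callback and variable names are illustrative):

	#include <linux/page_ext.h>
	#include <linux/printk.h>

	/* Illustrative page_ext client, modelled on mm/page_owner.c */
	static bool need_xpfo(void)
	{
		/* request the extra per-page space unconditionally */
		return true;
	}

	static void init_xpfo(void)
	{
		pr_info("XPFO enabled\n");
	}

	struct page_ext_operations page_xpfo_ops = {
		.need = need_xpfo,
		.init = init_xpfo,
	};

The per-page XPFO state would then live in struct page_ext and be reached via lookup_page_ext(page) instead of growing struct page itself.
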
>> + >> +extern void xpfo_clear_zap(struct page *page, int order); >> +extern int xpfo_test_and_clear_zap(struct page *page); >> +extern int xpfo_test_kernel(struct page *page); >> +extern int xpfo_test_user(struct page *page); >> + >> +extern void xpfo_kmap(void *kaddr, struct page *page); >> +extern void xpfo_kunmap(void *kaddr, struct page *page); >> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); >> +extern void xpfo_free_page(struct page *page, int order); >> + >> +#else /* ifdef CONFIG_XPFO */ >> + >> +static inline void xpfo_clear_zap(struct page *page, int order) { } >> +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } >> +static inline int xpfo_test_kernel(struct page *page) { return 0; } >> +static inline int xpfo_test_user(struct page *page) { return 0; } >> + >> +static inline void xpfo_kmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } >> +static inline void xpfo_free_page(struct page *page, int order) { } >> + >> +#endif /* ifdef CONFIG_XPFO */ >> + >> +#endif /* ifndef _LINUX_XPFO_H */ >> diff --git a/lib/swiotlb.c b/lib/swiotlb.c >> index 76f29ec..cf57ee9 100644 >> --- a/lib/swiotlb.c >> +++ b/lib/swiotlb.c >> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, >> phys_addr_t tlb_addr, >> { >> unsigned long pfn = PFN_DOWN(orig_addr); >> unsigned char *vaddr = phys_to_virt(tlb_addr); >> + struct page *page = pfn_to_page(pfn); >> >> - if (PageHighMem(pfn_to_page(pfn))) { >> + if (PageHighMem(page) || xpfo_test_user(page)) { >> /* The buffer does not have a mapping. Map it in and copy */ >> unsigned int offset = orig_addr & ~PAGE_MASK; >> char *buffer; >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 838ca8bb..47b42a3 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, >> unsigned int order) >> } >> arch_free_page(page, order); >> kernel_map_pages(page, 1 << order, 0); >> + xpfo_free_page(page, order); >> >> return true; >> } >> @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned >> int order, gfp_t gfp_flags, >> arch_alloc_page(page, order); >> kernel_map_pages(page, 1 << order, 1); >> kasan_alloc_pages(page, order); >> + xpfo_alloc_page(page, order, gfp_flags); >> >> if (gfp_flags & __GFP_ZERO) >> for (i = 0; i < (1 << order); i++) >> clear_highpage(page + i); >> + else >> + xpfo_clear_zap(page, order); >> >> if (order && (gfp_flags & __GFP_COMP)) >> prep_compound_page(page, order); >> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >> } >> >> pcp = &this_cpu_ptr(zone->pageset)->pcp; >> - if (!cold) >> + if (!cold && !xpfo_test_kernel(page)) >> list_add(&page->lru, &pcp->lists[migratetype]); >> else >> list_add_tail(&page->lru, &pcp->lists[migratetype]); >> + > > What's the advantage of this? Allocating a page to userspace that was previously allocated to kernel space requires an expensive TLB shootdown. The above will put previously kernel-allocated pages in the cold page cache to postpone their allocation as long as possible to minimize TLB shootdowns. >> pcp->count++; >> if (pcp->count >= pcp->high) { >> unsigned long batch = READ_ONCE(pcp->batch); >> Thanks for the review and comments! It's highly appreciated. 
...Juerg > Thanks, > Laura From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753296AbcCUIpF (ORCPT ); Mon, 21 Mar 2016 04:45:05 -0400 Received: from g1t6225.austin.hp.com ([15.73.96.126]:45906 "EHLO g1t6225.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751067AbcCUIo7 (ORCPT ); Mon, 21 Mar 2016 04:44:59 -0400 Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) To: Balbir Singh , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4FA15.9060700@gmail.com> Cc: vpk@cs.brown.edu From: Juerg Haefliger Message-ID: <56EFB486.2090501@hpe.com> Date: Mon, 21 Mar 2016 09:44:54 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <56D4FA15.9060700@gmail.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Balbir, Apologies for the slow reply. On 03/01/2016 03:10 AM, Balbir Singh wrote: > > > On 27/02/16 01:21, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userland, unless explicitly requested by the >> kernel. Whenever a page destined for userland is allocated, it is >> unmapped from physmap. When such a page is reclaimed from userland, it is >> mapped back to physmap. > physmap == xen physmap? Please clarify No, it's not XEN related. I might have the terminology wrong. Physmap is what the original authors used for describing a large, contiguous virtual memory region inside kernel address space that contains a direct mapping of part or all (depending on the architecture) physical memory. >> Mapping/unmapping from physmap is accomplished by modifying the PTE >> permission bits to allow/disallow access to the page. >> >> Additional fields are added to the page struct for XPFO housekeeping. >> Specifically a flags field to distinguish user vs. kernel pages, a >> reference counter to track physmap map/unmap operations and a lock to >> protect the XPFO fields. >> >> Known issues/limitations: >> - Only supported on x86-64. > Is it due to lack of porting or a design limitation? Lack of porting. Support for other architectures will come later. >> - Only supports 4k pages. >> - Adds additional data to the page struct. >> - There are most likely some additional and legitimate uses cases where >> the kernel needs to access userspace. Those need to be identified and >> made XPFO-aware. > Why not build an audit mode for it? Can you elaborate what you mean by this? >> - There's a performance impact if XPFO is turned on. Per the paper >> referenced below it's in the 1-3% ballpark. More performance testing >> wouldn't hurt. What tests to run though? >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >> >> Suggested-by: Vasileios P. Kemerlis >> Signed-off-by: Juerg Haefliger > This patch needs to be broken down into smaller patches - a series Agreed. 
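To make the physmap point above a bit more concrete (a rough sketch, x86-64 only; the helpers are the standard kernel ones, nothing XPFO-specific):

	#include <linux/mm.h>
	#include <linux/gfp.h>

	/* Illustration of the direct-map ("physmap") alias on x86-64 */
	static void *physmap_alias_demo(void)
	{
		struct page *page = alloc_page(GFP_HIGHUSER);

		/*
		 * Even though this page is destined for user space, it is
		 * also reachable through its kernel direct-map address:
		 *
		 *   page_address(page) == __va(page_to_pfn(page) << PAGE_SHIFT)
		 *
		 * ret2dir abuses exactly this alias; XPFO removes it while
		 * the page is owned by user space.
		 */
		return page ? page_address(page) : NULL;
	}
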
>> --- >> arch/x86/Kconfig | 2 +- >> arch/x86/Kconfig.debug | 17 +++++ >> arch/x86/mm/Makefile | 2 + >> arch/x86/mm/init.c | 3 +- >> arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ >> block/blk-map.c | 7 +- >> include/linux/highmem.h | 23 +++++-- >> include/linux/mm_types.h | 4 ++ >> include/linux/xpfo.h | 88 ++++++++++++++++++++++++ >> lib/swiotlb.c | 3 +- >> mm/page_alloc.c | 7 +- >> 11 files changed, 323 insertions(+), 9 deletions(-) >> create mode 100644 arch/x86/mm/xpfo.c >> create mode 100644 include/linux/xpfo.h >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index c46662f..9d32b4a 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT >> >> config X86_DIRECT_GBPAGES >> def_bool y >> - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK >> + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO >> ---help--- >> Certain kernel features effectively disable kernel >> linear 1 GB mappings (even if the CPU otherwise >> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug >> index 9b18ed9..1331da5 100644 >> --- a/arch/x86/Kconfig.debug >> +++ b/arch/x86/Kconfig.debug >> @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT >> >> source "lib/Kconfig.debug" >> >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on DEBUG_KERNEL >> + depends on X86_64 >> + select DEBUG_TLBFLUSH >> + ---help--- >> + This option offers protection against 'ret2dir' (kernel) attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). Similarly, whenever page frames are freed/reclaimed, they >> + are mapped back to physmap. Special care is taken to minimize the >> + impact on performance by reducing TLB shootdowns and unnecessary page >> + zero fills. >> + >> + If in doubt, say "N". >> + >> config X86_VERBOSE_BOOTUP >> bool "Enable verbose x86 bootup info messages" >> default y >> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile >> index f9d38a4..8bf52b6 100644 >> --- a/arch/x86/mm/Makefile >> +++ b/arch/x86/mm/Makefile >> @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o >> obj-$(CONFIG_NUMA_EMU) += numa_emulation.o >> >> obj-$(CONFIG_X86_INTEL_MPX) += mpx.o >> + >> +obj-$(CONFIG_XPFO) += xpfo.o >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c >> index 493f541..27fc8a6 100644 >> --- a/arch/x86/mm/init.c >> +++ b/arch/x86/mm/init.c >> @@ -150,7 +150,8 @@ static int page_size_mask; >> >> static void __init probe_page_size_mask(void) >> { >> -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) >> +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ >> + !defined(CONFIG_XPFO) >> /* >> * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. >> * This will simplify cpa(), which otherwise needs to support splitting >> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c >> new file mode 100644 >> index 0000000..6bc24d3 >> --- /dev/null >> +++ b/arch/x86/mm/xpfo.c >> @@ -0,0 +1,176 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. 
>> + */ >> + >> +#include >> +#include >> + >> +#include >> +#include >> + >> +#define TEST_XPFO_FLAG(flag, page) \ >> + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define SET_XPFO_FLAG(flag, page) \ >> + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define CLEAR_XPFO_FLAG(flag, page) \ >> + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ >> + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte = lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level != PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); >> +} >> + >> +inline void xpfo_clear_zap(struct page *page, int order) >> +{ >> + int i; >> + >> + for (i = 0; i < (1 << order); i++) >> + CLEAR_XPFO_FLAG(zap, page + i); >> +} >> + >> +inline int xpfo_test_and_clear_zap(struct page *page) >> +{ >> + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); >> +} >> + >> +inline int xpfo_test_kernel(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(kernel, page); >> +} >> + >> +inline int xpfo_test_user(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(user, page); >> +} >> + >> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) >> +{ >> + int i, tlb_shoot = 0; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || >> + TEST_XPFO_FLAG(user, page + i)); >> + >> + if (gfp & GFP_HIGHUSER) { > Why GFP_HIGHUSER? The check is wrong. It should be ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER). Thanks ...Juerg >> + /* Initialize the xpfo lock and map counter */ >> + spin_lock_init(&(page + i)->xpfo.lock); >> + atomic_set(&(page + i)->xpfo.mapcount, 0); >> + >> + /* Mark it as a user page */ >> + SET_XPFO_FLAG(user_fp, page + i); >> + >> + /* >> + * Shoot the TLB if the page was previously allocated >> + * to kernel space >> + */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) >> + tlb_shoot = 1; >> + } else { >> + /* Mark it as a kernel page */ >> + SET_XPFO_FLAG(kernel, page + i); >> + } >> + } >> + >> + if (tlb_shoot) { >> + kaddr = (unsigned long)page_address(page); >> + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * >> + PAGE_SIZE); >> + } >> +} >> + >> +void xpfo_free_page(struct page *page, int order) >> +{ >> + int i; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + >> + /* The page frame was previously allocated to user space */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >> + kaddr = (unsigned long)page_address(page + i); >> + >> + /* Clear the page and mark it accordingly */ >> + clear_page((void *)kaddr); >> + SET_XPFO_FLAG(zap, page + i); >> + >> + /* Map it back to kernel space */ >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + /* No TLB update */ >> + } >> + >> + /* Clear the xpfo fast-path flag */ >> + CLEAR_XPFO_FLAG(user_fp, page + i); >> + } >> +} >> + >> +void xpfo_kmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page was previously allocated to user space, so map it back >> + * into the kernel. 
No TLB update required. >> + */ >> + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && >> + TEST_XPFO_FLAG(user, page)) >> + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kmap); >> + >> +void xpfo_kunmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page frame is to be allocated back to user space. So unmap it >> + * from the kernel, update the TLB and mark it as a user page. >> + */ >> + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && >> + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); >> + SET_XPFO_FLAG(user, page); >> + } >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kunmap); >> diff --git a/block/blk-map.c b/block/blk-map.c >> index f565e11..b7b8302 100644 >> --- a/block/blk-map.c >> +++ b/block/blk-map.c >> @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, >> prv.iov_len = iov.iov_len; >> } >> >> - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) >> + /* >> + * juergh: Temporary hack to force the use of a bounce buffer if XPFO >> + * is enabled. Results in an XPFO page fault otherwise. >> + */ > This does look like it might add a bunch of overhead >> + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || >> + IS_ENABLED(CONFIG_XPFO)) >> bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); >> else >> bio = bio_map_user_iov(q, iter, gfp_mask); >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h >> index bb3f329..0ca9130 100644 >> --- a/include/linux/highmem.h >> +++ b/include/linux/highmem.h >> @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) >> #ifndef ARCH_HAS_KMAP >> static inline void *kmap(struct page *page) >> { >> + void *kaddr; >> + >> might_sleep(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> >> static inline void kunmap(struct page *page) >> { >> + xpfo_kunmap(page_address(page), page); >> } >> >> static inline void *kmap_atomic(struct page *page) >> { >> + void *kaddr; >> + >> preempt_disable(); >> pagefault_disable(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> #define kmap_atomic_prot(page, prot) kmap_atomic(page) >> >> static inline void __kunmap_atomic(void *addr) >> { >> + xpfo_kunmap(addr, virt_to_page(addr)); >> + >> pagefault_enable(); >> preempt_enable(); >> } >> @@ -133,7 +146,8 @@ do { \ >> static inline void clear_user_highpage(struct page *page, unsigned long vaddr) >> { >> void *addr = kmap_atomic(page); >> - clear_user_page(addr, vaddr, page); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_user_page(addr, vaddr, page); >> kunmap_atomic(addr); >> } >> #endif >> @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, >> static inline void clear_highpage(struct page *page) >> { >> void *kaddr = kmap_atomic(page); >> - clear_page(kaddr); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_page(kaddr); >> kunmap_atomic(kaddr); >> } >> >> diff --git a/include/linux/mm_types.h 
b/include/linux/mm_types.h >> index 624b78b..71c95aa 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> >> @@ -215,6 +216,9 @@ struct page { >> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS >> int _last_cpupid; >> #endif >> +#ifdef CONFIG_XPFO >> + struct xpfo_info xpfo; >> +#endif >> } >> /* >> * The struct page can be forced to be double word aligned so that atomic ops >> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h >> new file mode 100644 >> index 0000000..c4f0871 >> --- /dev/null >> +++ b/include/linux/xpfo.h >> @@ -0,0 +1,88 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. >> + */ >> + >> +#ifndef _LINUX_XPFO_H >> +#define _LINUX_XPFO_H >> + >> +#ifdef CONFIG_XPFO >> + >> +/* >> + * XPFO page flags: >> + * >> + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag >> + * is used in the fast path, where the page is marked accordingly but *not* >> + * unmapped from the kernel. In most cases, the kernel will need access to the >> + * page immediately after its acquisition so an unnecessary mapping operation >> + * is avoided. >> + * >> + * PG_XPFO_user denotes that the page is destined for user space. This flag is >> + * used in the slow path, where the page needs to be mapped/unmapped when the >> + * kernel wants to access it. If a page is deallocated and this flag is set, >> + * the page is cleared and mapped back into the kernel. >> + * >> + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used >> + * for identifying pages that are first assigned to kernel space and then freed >> + * and mapped to user space. In such cases, an expensive TLB shootdown is >> + * necessary. Pages allocated to user space, freed, and subsequently allocated >> + * to user space again, require only local TLB invalidation. >> + * >> + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to >> + * avoid zapping pages multiple times. Whenever a page is freed and was >> + * previously mapped to user space, it needs to be zapped before mapped back >> + * in to the kernel. >> + */ >> + >> +enum xpfo_pageflags { >> + PG_XPFO_user_fp, >> + PG_XPFO_user, >> + PG_XPFO_kernel, >> + PG_XPFO_zap, >> +}; >> + >> +struct xpfo_info { >> + unsigned long flags; /* Flags for tracking the page's XPFO state */ >> + atomic_t mapcount; /* Counter for balancing page map/unmap >> + * requests. Only the first map request maps >> + * the page back to kernel space. Likewise, >> + * only the last unmap request unmaps the page. >> + */ >> + spinlock_t lock; /* Lock to serialize concurrent map/unmap >> + * requests. 
>> + */ >> +}; >> + >> +extern void xpfo_clear_zap(struct page *page, int order); >> +extern int xpfo_test_and_clear_zap(struct page *page); >> +extern int xpfo_test_kernel(struct page *page); >> +extern int xpfo_test_user(struct page *page); >> + >> +extern void xpfo_kmap(void *kaddr, struct page *page); >> +extern void xpfo_kunmap(void *kaddr, struct page *page); >> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); >> +extern void xpfo_free_page(struct page *page, int order); >> + >> +#else /* ifdef CONFIG_XPFO */ >> + >> +static inline void xpfo_clear_zap(struct page *page, int order) { } >> +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } >> +static inline int xpfo_test_kernel(struct page *page) { return 0; } >> +static inline int xpfo_test_user(struct page *page) { return 0; } >> + >> +static inline void xpfo_kmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } >> +static inline void xpfo_free_page(struct page *page, int order) { } >> + >> +#endif /* ifdef CONFIG_XPFO */ >> + >> +#endif /* ifndef _LINUX_XPFO_H */ >> diff --git a/lib/swiotlb.c b/lib/swiotlb.c >> index 76f29ec..cf57ee9 100644 >> --- a/lib/swiotlb.c >> +++ b/lib/swiotlb.c >> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, >> { >> unsigned long pfn = PFN_DOWN(orig_addr); >> unsigned char *vaddr = phys_to_virt(tlb_addr); >> + struct page *page = pfn_to_page(pfn); >> >> - if (PageHighMem(pfn_to_page(pfn))) { >> + if (PageHighMem(page) || xpfo_test_user(page)) { >> /* The buffer does not have a mapping. Map it in and copy */ >> unsigned int offset = orig_addr & ~PAGE_MASK; >> char *buffer; >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 838ca8bb..47b42a3 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) >> } >> arch_free_page(page, order); >> kernel_map_pages(page, 1 << order, 0); >> + xpfo_free_page(page, order); >> >> return true; >> } >> @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, >> arch_alloc_page(page, order); >> kernel_map_pages(page, 1 << order, 1); >> kasan_alloc_pages(page, order); >> + xpfo_alloc_page(page, order, gfp_flags); >> >> if (gfp_flags & __GFP_ZERO) >> for (i = 0; i < (1 << order); i++) >> clear_highpage(page + i); >> + else >> + xpfo_clear_zap(page, order); >> >> if (order && (gfp_flags & __GFP_COMP)) >> prep_compound_page(page, order); >> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >> } >> >> pcp = &this_cpu_ptr(zone->pageset)->pcp; >> - if (!cold) >> + if (!cold && !xpfo_test_kernel(page)) >> list_add(&page->lru, &pcp->lists[migratetype]); >> else >> list_add_tail(&page->lru, &pcp->lists[migratetype]); >> + >> pcp->count++; >> if (pcp->count >= pcp->high) { >> unsigned long batch = READ_ONCE(pcp->batch); > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755094AbcC1TaF (ORCPT ); Mon, 28 Mar 2016 15:30:05 -0400 Received: from mail-pf0-f170.google.com ([209.85.192.170]:34035 "EHLO mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753901AbcC1TaC (ORCPT ); Mon, 28 Mar 2016 15:30:02 -0400 Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership 
(XPFO) To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4F0D6.2060308@redhat.com> <56EFB2DB.3090602@hpe.com> Cc: vpk@cs.brown.edu, Kees Cook From: Laura Abbott Message-ID: <56F98637.4070705@redhat.com> Date: Mon, 28 Mar 2016 12:29:59 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1 MIME-Version: 1.0 In-Reply-To: <56EFB2DB.3090602@hpe.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/21/2016 01:37 AM, Juerg Haefliger wrote: ... >>> +void xpfo_free_page(struct page *page, int order) >>> +{ >>> + int i; >>> + unsigned long kaddr; >>> + >>> + for (i = 0; i < (1 << order); i++) { >>> + >>> + /* The page frame was previously allocated to user space */ >>> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >>> + kaddr = (unsigned long)page_address(page + i); >>> + >>> + /* Clear the page and mark it accordingly */ >>> + clear_page((void *)kaddr); >> >> Clearing the page isn't related to XPFO. There's other work ongoing to >> do clearing of the page on free. > > It's not strictly related to XPFO but adds another layer of security. Do you > happen to have a pointer to the ongoing work that you mentioned? > > The work was merged for the 4.6 merge window https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8823b1dbc05fab1a8bec275eeae4709257c2661d This is a separate option to clear the page. ... >>> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >>> } >>> >>> pcp = &this_cpu_ptr(zone->pageset)->pcp; >>> - if (!cold) >>> + if (!cold && !xpfo_test_kernel(page)) >>> list_add(&page->lru, &pcp->lists[migratetype]); >>> else >>> list_add_tail(&page->lru, &pcp->lists[migratetype]); >>> + >> >> What's the advantage of this? > > Allocating a page to userspace that was previously allocated to kernel space > requires an expensive TLB shootdown. The above will put previously > kernel-allocated pages in the cold page cache to postpone their allocation as > long as possible to minimize TLB shootdowns. > > That makes sense. You probably want to make this a separate commmit with this explanation as the commit text. >>> pcp->count++; >>> if (pcp->count >= pcp->high) { >>> unsigned long batch = READ_ONCE(pcp->batch); >>> > > Thanks for the review and comments! It's highly appreciated. 
> > ...Juerg > > >> Thanks, >> Laura From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757524AbcDAAVN (ORCPT ); Thu, 31 Mar 2016 20:21:13 -0400 Received: from mail-yw0-f180.google.com ([209.85.161.180]:35059 "EHLO mail-yw0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752160AbcDAAVM (ORCPT ); Thu, 31 Mar 2016 20:21:12 -0400 MIME-Version: 1.0 In-Reply-To: <56EFB486.2090501@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4FA15.9060700@gmail.com> <56EFB486.2090501@hpe.com> Date: Fri, 1 Apr 2016 11:21:11 +1100 Message-ID: Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) From: Balbir Singh To: Juerg Haefliger Cc: "linux-kernel@vger.kernel.org" , linux-mm , vpk@cs.brown.edu Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Mar 21, 2016 at 7:44 PM, Juerg Haefliger wrote: > Hi Balbir, > > Apologies for the slow reply. > No problem, I lost this in my inbox as well due to the reply latency. > > On 03/01/2016 03:10 AM, Balbir Singh wrote: >> >> >> On 27/02/16 01:21, Juerg Haefliger wrote: >>> This patch adds support for XPFO which protects against 'ret2dir' kernel >>> attacks. The basic idea is to enforce exclusive ownership of page frames >>> by either the kernel or userland, unless explicitly requested by the >>> kernel. Whenever a page destined for userland is allocated, it is >>> unmapped from physmap. When such a page is reclaimed from userland, it is >>> mapped back to physmap. >> physmap == xen physmap? Please clarify > > No, it's not XEN related. I might have the terminology wrong. Physmap is what > the original authors used for describing a large, contiguous virtual > memory region inside kernel address space that contains a direct mapping of part > or all (depending on the architecture) physical memory. > Thanks for clarifying > >>> Mapping/unmapping from physmap is accomplished by modifying the PTE >>> permission bits to allow/disallow access to the page. >>> >>> Additional fields are added to the page struct for XPFO housekeeping. >>> Specifically a flags field to distinguish user vs. kernel pages, a >>> reference counter to track physmap map/unmap operations and a lock to >>> protect the XPFO fields. >>> >>> Known issues/limitations: >>> - Only supported on x86-64. >> Is it due to lack of porting or a design limitation? > > Lack of porting. Support for other architectures will come later. > OK > >>> - Only supports 4k pages. >>> - Adds additional data to the page struct. >>> - There are most likely some additional and legitimate uses cases where >>> the kernel needs to access userspace. Those need to be identified and >>> made XPFO-aware. >> Why not build an audit mode for it? > > Can you elaborate what you mean by this? > What I meant is when the kernel needs to access userspace and XPFO is not aware of it and is going to block it, write to a log/trace buffer so that it can be audited for correctness > >>> - There's a performance impact if XPFO is turned on. Per the paper >>> referenced below it's in the 1-3% ballpark. More performance testing >>> wouldn't hurt. What tests to run though? >>> >>> Reference paper by the original patch authors: >>> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >>> >>> Suggested-by: Vasileios P. 
Kemerlis >>> Signed-off-by: Juerg Haefliger >> This patch needs to be broken down into smaller patches - a series > > Agreed. > I think it will be good to describe what is XPFO aware 1. How are device mmap'd shared between kernel/user covered? 2. How is copy_from/to_user covered? 3. How is vdso covered? 4. More... Balbir Singh. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752708AbcIBLjb (ORCPT ); Fri, 2 Sep 2016 07:39:31 -0400 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:43168 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751836AbcIBLj2 (ORCPT ); Fri, 2 Sep 2016 07:39:28 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 2 Sep 2016 13:39:06 +0200 Message-Id: <20160902113909.32631-1-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. 
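To sketch how the kmap/kunmap balancing is intended to behave (a simplified sketch of the logic only, not the actual patch code; locking and the kernel-page fast path are omitted, and set_kpte(), the mapcount field and the PAGE_EXT_XPFO_* flags are the ones introduced by this series):

	#include <linux/mm.h>
	#include <linux/page_ext.h>
	#include <asm/tlbflush.h>

	/* set_kpte() is the helper added by this series in mm/xpfo.c */

	void xpfo_kmap(void *kaddr, struct page *page)
	{
		struct page_ext *ext = lookup_page_ext(page);

		/* only the first of possibly nested mappers restores the
		 * direct-map PTE of an unmapped user page */
		if (atomic_inc_return(&ext->mapcount) == 1 &&
		    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &ext->flags))
			set_kpte(page, (unsigned long)kaddr,
				 __pgprot(__PAGE_KERNEL));
	}

	void xpfo_kunmap(void *kaddr, struct page *page)
	{
		struct page_ext *ext = lookup_page_ext(page);

		/* only the last unmapper tears the mapping down again and
		 * flushes the local TLB entry */
		if (atomic_dec_return(&ext->mapcount) == 0) {
			set_kpte(page, (unsigned long)kaddr, __pgprot(0));
			__flush_tlb_one((unsigned long)kaddr);
			set_bit(PAGE_EXT_XPFO_UNMAPPED, &ext->flags);
		}
	}
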
Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752974AbcIBLju (ORCPT ); Fri, 2 Sep 2016 07:39:50 -0400 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:43184 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752541AbcIBLjr (ORCPT ); Fri, 2 Sep 2016 07:39:47 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 2 Sep 2016 13:39:07 +0200 Message-Id: <20160902113909.32631-2-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 
100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. + */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. 
+ */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753578AbcIBLkE (ORCPT ); Fri, 2 Sep 2016 07:40:04 -0400 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:37228 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752987AbcIBLkB (ORCPT ); Fri, 2 Sep 2016 07:40:01 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Date: Fri, 2 Sep 2016 13:39:09 +0200 Message-Id: <20160902113909.32631-4-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. 
Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753698AbcIBLk2 (ORCPT ); Fri, 2 Sep 2016 07:40:28 -0400 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:37226 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752994AbcIBLj7 (ORCPT ); Fri, 2 Sep 2016 07:39:59 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Date: Fri, 2 Sep 2016 13:39:08 +0200 Message-Id: <20160902113909.32631-3-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. 
+ */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753464AbcIBUj1 (ORCPT ); Fri, 2 Sep 2016 16:39:27 -0400 Received: from mga03.intel.com ([134.134.136.65]:44471 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752275AbcIBUjY (ORCPT ); Fri, 2 Sep 2016 16:39:24 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.30,272,1470726000"; d="scan'208";a="1045048724" Subject: Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> Cc: vpk@cs.columbia.edu From: Dave Hansen Message-ID: <57C9E37A.9070805@intel.com> Date: Fri, 2 Sep 2016 13:39:22 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <20160902113909.32631-3-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/02/2016 04:39 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755546AbcIELzG (ORCPT ); Mon, 5 Sep 2016 07:55:06 -0400 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:11172 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755465AbcIELy6 (ORCPT ); Mon, 5 Sep 2016 07:54:58 -0400 Subject: Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Dave Hansen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> <57C9E37A.9070805@intel.com> Cc: vpk@cs.columbia.edu From: Juerg Haefliger Message-ID: Date: Mon, 5 Sep 2016 13:54:47 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <57C9E37A.9070805@intel.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t Content-Type: multipart/mixed; boundary="HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta"; protected-headers="v1" From: Juerg Haefliger To: Dave Hansen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> <57C9E37A.9070805@intel.com> In-Reply-To: <57C9E37A.9070805@intel.com> --HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 09/02/2016 10:39 PM, Dave Hansen wrote: > On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Allocating a page to userspace that was previously allocated to the >> kernel requires an expensive TLB shootdown. To minimize this, we only >> put non-kernel pages into the hot cache to favor their allocation. >=20 > But kernel allocations do allocate from these pools, right? Yes. > Does this > just mean that kernel allocations usually have to pay the penalty to > convert a page? Only pages that are allocated for userspace (gfp & GFP_HIGHUSER =3D=3D GF= P_HIGHUSER) which were previously allocated for the kernel (gfp & GFP_HIGHUSER !=3D GFP_HIGHUSER= ) have to pay the penalty. > So, what's the logic here? You're assuming that order-0 kernel > allocations are more rare than allocations for userspace? The logic is to put reclaimed kernel pages into the cold cache to postpon= e their allocation as long as possible to minimize (potential) TLB flushes. 
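Concretely, the check that decides whether an allocation pays that penalty is the GFP test in xpfo_alloc_page() from patch 1/3. A trimmed excerpt (the per-page loop, locking and the BUG_ON sanity checks are omitted here):

	if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
		/*
		 * Page is destined for userspace: flush the TLB only if the
		 * page was previously allocated to the kernel (the expensive
		 * case discussed above).
		 */
		if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags))
			flush_tlb = 1;
	} else {
		/* Kernel allocation: just tag the page, no flush needed here. */
		set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
	}
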
=2E..Juerg --HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta-- --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJXzV0HAAoJEHVMOpb5+LSM304QAINSMlOQKNAIRon29Uy318Sf J9Vfv3p2L/WIxrL6kKaHYkDqj+b0XnSVlWvnxNp1MX1qAOqeUSipfymvwNaYGuIV IJQeahOcJccupJMw1ILF+H1Rhxn+gBOc9I745omwO/CtlqYaYaXfCeIxI/R1Q9LQ yCtBPnbL4v1St7FnjDhZd3FdgiP+F98MAz8040FYq1cO+qWVDTyIRcpq4rPaAJNi 8zcpLB+A34qjA2i3ZFV/ZNls2L4Buw4pYW1ZGnHxNTKKmbrYkZhBuxYuCNpfnyhB M00AnBKJQ7fqHKxCa64eo59rRTpYQ0Zd8KaKvVaZfZfbBaAg8Ir2UWNoBcPvE8ox D8TMhKlORMhHfnAE73DIlkENt1wYt2gGGScIJ+bL8nulJpqvNo5lPyTT3NHhrNZa prre5DzDQFvyv2SLx2P3MDqtyJ658hKx5own+82N99K5GuhC2++Xaq3/BpOC4rQI rEONoXhm0j63g2udCmkc1BIRSb+ZTaqzC1fxWoYH75nYEiIhGcgTQVJWXMx5DB/Y gvJJn/okC97zSXGk8zQtYIO2aDhUzRowYoy5bslzlR20hoNTWL9ctySE2OobqIdM WmWm/Hyq59cAMneimMv68+/RiWtxL2s5Q+8lci8uf18ollN1g/zp6H/1qOOMWvAr vqBZxyugEM72PbSEFv8Y =Jva/ -----END PGP SIGNATURE----- --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759478AbcINHTR (ORCPT ); Wed, 14 Sep 2016 03:19:17 -0400 Received: from g4t3425.houston.hpe.com ([15.241.140.78]:25169 "EHLO g4t3425.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755204AbcINHTP (ORCPT ); Wed, 14 Sep 2016 03:19:15 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Wed, 14 Sep 2016 09:18:58 +0200 Message-Id: <20160914071901.8127-1-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. 
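In code terms, the housekeeping state described above boils down to the following additions (a condensed sketch of what patch 1/3 adds to enum page_ext_flags and struct page_ext; the names match the patch, existing members are elided):

enum page_ext_flags {
	/* ... existing flags ... */
	PAGE_EXT_XPFO_KERNEL,	/* Page is a kernel page */
	PAGE_EXT_XPFO_UNMAPPED,	/* Page is unmapped */
};

struct page_ext {
	unsigned long flags;	/* existing member, holds the bits above */
	/* ... existing members ... */
#ifdef CONFIG_XPFO
	int inited;		/* Map counter and lock initialized */
	atomic_t mapcount;	/* Counter for balancing map/unmap requests */
	spinlock_t maplock;	/* Lock to serialize map/unmap requests */
#endif
};
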
Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759942AbcINHT0 (ORCPT ); Wed, 14 Sep 2016 03:19:26 -0400 Received: from g4t3425.houston.hpe.com ([15.241.140.78]:25195 "EHLO g4t3425.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755452AbcINHTW (ORCPT ); Wed, 14 Sep 2016 03:19:22 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Wed, 14 Sep 2016 09:18:59 +0200 Message-Id: <20160914071901.8127-2-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 
100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. + */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. 
+ */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760094AbcINHTh (ORCPT ); Wed, 14 Sep 2016 03:19:37 -0400 Received: from g4t3425.houston.hpe.com ([15.241.140.78]:25226 "EHLO g4t3425.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759948AbcINHTc (ORCPT ); Wed, 14 Sep 2016 03:19:32 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Date: Wed, 14 Sep 2016 09:19:01 +0200 Message-Id: <20160914071901.8127-4-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. 
Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760016AbcINHTf (ORCPT ); Wed, 14 Sep 2016 03:19:35 -0400 Received: from g4t3425.houston.hpe.com ([15.241.140.78]:25216 "EHLO g4t3425.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1759944AbcINHT3 (ORCPT ); Wed, 14 Sep 2016 03:19:29 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Date: Wed, 14 Sep 2016 09:19:00 +0200 Message-Id: <20160914071901.8127-3-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.9.3 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. 
+ */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1760003AbcINHYE (ORCPT ); Wed, 14 Sep 2016 03:24:04 -0400 Received: from g4t3428.houston.hpe.com ([15.241.140.76]:21469 "EHLO g4t3428.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756421AbcINHYD (ORCPT ); Wed, 14 Sep 2016 03:24:03 -0400 Subject: Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Cc: vpk@cs.columbia.edu From: Juerg Haefliger Message-ID: Date: Wed, 14 Sep 2016 09:23:58 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA Content-Type: multipart/mixed; boundary="rOu3TDjgvxEHAGHDHwIoR03apojltv8SM"; protected-headers="v1" From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> --rOu3TDjgvxEHAGHDHwIoR03apojltv8SM Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Resending to include the kernel-hardening list. Sorry, I wasn't subscribe= d with the correct email address when I sent this the first time. =2E..Juerg On 09/14/2016 09:18 AM, Juerg Haefliger wrote: > Changes from: > v1 -> v2: > - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) > arch-agnostic. > - Moved the config to the generic layer and added ARCH_SUPPORTS_XPF= O > for x86. > - Use page_ext for the additional per-page data. > - Removed the clearing of pages. This can be accomplished by using > PAGE_POISONING. > - Split up the patch into multiple patches. > - Fixed additional issues identified by reviewers. >=20 > This patch series adds support for XPFO which protects against 'ret2dir= ' > kernel attacks. The basic idea is to enforce exclusive ownership of pag= e > frames by either the kernel or userspace, unless explicitly requested b= y > the kernel. 
Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. >=20 > Additional fields in the page_ext struct are used for XPFO housekeeping= =2E > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operation= s > and a lock to serialize access to the XPFO fields. >=20 > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel n= eeds > to access userspace which need to be made XPFO-aware > - Performance penalty >=20 > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >=20 > Juerg Haefliger (3): > Add support for eXclusive Page Frame Ownership (XPFO) > xpfo: Only put previous userspace pages into the hot cache > block: Always use a bounce buffer when XPFO is enabled >=20 > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > block/blk-map.c | 2 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 41 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 10 ++- > mm/page_ext.c | 4 + > mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++= ++++++++ > security/Kconfig | 20 +++++ > 12 files changed, 314 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c >=20 --=20 Juerg Haefliger Hewlett Packard Enterprise --rOu3TDjgvxEHAGHDHwIoR03apojltv8SM-- --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX2PsOAAoJEHVMOpb5+LSMsA0QAIHMfTGCGDrTmqVL6Bi7/947 CYxgUUi235iAvh9DX+c50Oci+IrhRGKOiXOw7D4lj6MFYnhoBFKFlT/hioJoB+KU iv+Kb9HfN8Ab0BITVhmFKOJ8vsELLI/gbOTBceDimVoEndlkYXeP1AnVL56Y1Dmr 17k5Yhy4pdKLvOt4NYTprKEnc+td1XtbZ/biRZhrCRrhLFgaQDB2gOYZmu0kny7X Plp04Ts/fhsh8nh86ej1BeU4yg0XPexi9I+O8TSrzsG8LUSj3Ev1g/56rETzYeze +QOzUhuMOEZLju+5Cix9tjPG7RPPQJ+k1SqNhE4q+YHwwOhx+Qa5RJRL92/hyoDk cMBhDb5Mk/G2Y0CzvGYurfJxFny6h324NTjvUhNquTV5hXwy61e2qAkd0bOh2W3o 8RfwVp/xYoYeqbkcNcq+tyPSx6rC4MUC07jm28pn9McAyaLIBN63tuyAX9Hm8lAh euxdnSG0EcFMA2PpFVAvIoTY+a7l3gEViQPYdjmDgVY3Sbq7cBZJv4mBcgsnI7oY S2Jbd0y9oE7zv2lJL2xbt1Ylu3wR5+BHUWcg6nUrEHrNNJ/C3QRtoMEKA7MqP+l7 DgSIbUQZ5IFDZ5nNHUmGI6pz49PG+4k8aef2hoH3tzFJ1Az1u/qCSB5H0obSez9Q 2WwUHG3aQkkJwO5Rd0DS =Wq/n -----END PGP SIGNATURE----- --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932613AbcINHdr (ORCPT ); Wed, 14 Sep 2016 03:33:47 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:58894 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932522AbcINHdm (ORCPT ); Wed, 14 Sep 2016 03:33:42 -0400 Date: Wed, 14 Sep 2016 00:33:40 -0700 From: Christoph Hellwig To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Subject: Re: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Message-ID: <20160914073340.GA28090@infradead.org> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> 
<20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-4-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914071901.8127-4-juerg.haefliger@hpe.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 14, 2016 at 09:19:01AM +0200, Juerg Haefliger wrote: > This is a temporary hack to prevent the use of bio_map_user_iov() > which causes XPFO page faults. > > Signed-off-by: Juerg Haefliger Sorry, but if your scheme doesn't support get_user_pages access to user memory is't a steaming pile of crap and entirely unacceptable. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759582AbcINJgy (ORCPT ); Wed, 14 Sep 2016 05:36:54 -0400 Received: from foss.arm.com ([217.140.101.70]:36222 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753197AbcINJgw (ORCPT ); Wed, 14 Sep 2016 05:36:52 -0400 Date: Wed, 14 Sep 2016 10:36:34 +0100 From: Mark Rutland To: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org, juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20160914093634.GB13121@leverpostej> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > This patch series adds support for XPFO which protects against 'ret2dir' > kernel attacks. The basic idea is to enforce exclusive ownership of page > frames by either the kernel or userspace, unless explicitly requested by > the kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Just to check, doesn't DEBUG_RODATA ensure that the linear mapping is non-executable on x86_64 (as it does for arm64)? For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). Assuming that implies a lack of execute permission for x86_64, that should provide a similar level of protection against erroneously branching to addresses in the linear map, without the complexity and overhead of mapping/unmapping pages. So to me it looks like this approach may only be useful for architectures without page-granular execute permission controls. Is this also intended to protect against erroneous *data* accesses to the linear map? Am I missing something? Thanks, Mark. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758882AbcINJt0 (ORCPT ); Wed, 14 Sep 2016 05:49:26 -0400 Received: from foss.arm.com ([217.140.101.70]:36372 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755834AbcINJtZ (ORCPT ); Wed, 14 Sep 2016 05:49:25 -0400 Date: Wed, 14 Sep 2016 10:49:02 +0100 From: Mark Rutland To: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org, juerg.haefliger@hpe.com, vpk@cs.columbia.edu Subject: Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20160914094902.GA14330@leverpostej> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914093634.GB13121@leverpostej> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914093634.GB13121@leverpostej> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Sep 14, 2016 at 10:36:34AM +0100, Mark Rutland wrote: > On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > > This patch series adds support for XPFO which protects against 'ret2dir' > > kernel attacks. The basic idea is to enforce exclusive ownership of page > > frames by either the kernel or userspace, unless explicitly requested by > > the kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). > Assuming that implies a lack of execute permission for x86_64, that > should provide a similar level of protection against erroneously > branching to addresses in the linear map, without the complexity and > overhead of mapping/unmapping pages. > > So to me it looks like this approach may only be useful for > architectures without page-granular execute permission controls. > > Is this also intended to protect against erroneous *data* accesses to > the linear map? Now that I read the paper more carefully, I can see that this is the case, and this does catch issues which DEBUG_RODATA cannot. Apologies for the noise. Thanks, Mark. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764095AbcINOdN (ORCPT ); Wed, 14 Sep 2016 10:33:13 -0400 Received: from mga07.intel.com ([134.134.136.100]:4839 "EHLO mga07.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756090AbcINOdJ (ORCPT ); Wed, 14 Sep 2016 10:33:09 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.30,334,1470726000"; d="scan'208";a="168086710" Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu From: Dave Hansen Message-ID: <57D95FA3.3030103@intel.com> Date: Wed, 14 Sep 2016 07:33:07 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <20160914071901.8127-3-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 09/14/2016 12:19 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. Hi, I had some questions about this the last time you posted it. Maybe you want to address them now. -- But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764303AbcINOlB (ORCPT ); Wed, 14 Sep 2016 10:41:01 -0400 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:30480 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1764284AbcINOkz (ORCPT ); Wed, 14 Sep 2016 10:40:55 -0400 Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Dave Hansen , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> Cc: vpk@cs.columbia.edu From: Juerg Haefliger Message-ID: <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> Date: Wed, 14 Sep 2016 16:40:49 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <57D95FA3.3030103@intel.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MvDRGgCgUqFIL4RSGlCO7Xs24fkENiwV2" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MvDRGgCgUqFIL4RSGlCO7Xs24fkENiwV2 Content-Type: multipart/mixed; boundary="O1CtIAVv6aLLWs1NIc0rmG57UUaBW7E7P"; protected-headers="v1" From: Juerg Haefliger To: Dave Hansen , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> In-Reply-To: <57D95FA3.3030103@intel.com> --O1CtIAVv6aLLWs1NIc0rmG57UUaBW7E7P Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Hi Dave, On 09/14/2016 04:33 PM, Dave Hansen wrote: > On 09/14/2016 12:19 AM, Juerg Haefliger wrote: >> Allocating a page to userspace that was previously allocated to the >> kernel requires an expensive TLB shootdown. To minimize this, we only >> put non-kernel pages into the hot cache to favor their allocation. >=20 > Hi, I had some questions about this the last time you posted it. Maybe= > you want to address them now. I did reply: https://lkml.org/lkml/2016/9/5/249 =2E..Juerg > -- >=20 > But kernel allocations do allocate from these pools, right? Does this > just mean that kernel allocations usually have to pay the penalty to > convert a page? >=20 > So, what's the logic here? You're assuming that order-0 kernel > allocations are more rare than allocations for userspace? 
>=20 --O1CtIAVv6aLLWs1NIc0rmG57UUaBW7E7P-- --MvDRGgCgUqFIL4RSGlCO7Xs24fkENiwV2 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX2WFxAAoJEHVMOpb5+LSMrQoP/jtdf3Fbc1bSXJXL93ntuPf7 i3r1RKdrhfJgT8ScBPMs4PUahWaZMA/ktMfteu53+HXIIThGZNc5N5+T7VIrt7Xn rj594vhVfeXftedOqjlRIO2r2LRubX6qD+7oRHf5yLHhdr7W1wgFv7cZModu6Nuj h61J5RZoKGzogN0/Tn5hOoe6s6MS3qdcBaPNDFBTs0ZJz4s0X+i70pz9ysIH2jgS kgxuQJ1bWnvZztlkzGJZLiK1CNBezHHy58zFyFx7HJTNR4G5FW+2nNyqVUUXIRvi GddECUJaPPIIuIyNfeuiFqiJ5oYjuzhWKDbMfZXTaRfZr1mCtmUcwSjuiv/NlOJI ynuq+A+GnTxSiXqQtenuARSzgz8QCkBi9ZuDzHU6wlSY3/GVO3XVf0yhqpYwEbQF pHV3Rjrwkx+rOa35wTYkdR+JCEZZQoxQj9tXyGpctxrbZStc5eOuIRHzJu2mZo7P eiGy4UsdNR+RtClauD+4QMLb4udStmFMG3RMwA78ioJYZE4j93QHGJN7+HBq/e7d FuEkuWcAk/UsNVJdfuBjHyATNin+wjxrllEcPA4OPzHEbMevsXO4AE8ucmDSNGvD hWa50Ahf8x/Zx/FvyBUSD7+g5c5CK9wNLdE70WwWUkphFGEhbJFhIm7wrTv0700g KGlgI1DsXBwxzIzvnv/8 =T/wt -----END PGP SIGNATURE----- --MvDRGgCgUqFIL4RSGlCO7Xs24fkENiwV2-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1763688AbcINOs3 (ORCPT ); Wed, 14 Sep 2016 10:48:29 -0400 Received: from mga06.intel.com ([134.134.136.31]:40266 "EHLO mga06.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756090AbcINOs1 (ORCPT ); Wed, 14 Sep 2016 10:48:27 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.30,334,1470726000"; d="scan'208";a="1050190312" Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Juerg Haefliger , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> Cc: vpk@cs.columbia.edu From: Dave Hansen Message-ID: <57D9633A.2010702@intel.com> Date: Wed, 14 Sep 2016 07:48:26 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0 MIME-Version: 1.0 In-Reply-To: <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > On 09/02/2016 10:39 PM, Dave Hansen wrote: >> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Does this >> just mean that kernel allocations usually have to pay the penalty to >> convert a page? > > Only pages that are allocated for userspace (gfp & GFP_HIGHUSER == GFP_HIGHUSER) which were > previously allocated for the kernel (gfp & GFP_HIGHUSER != GFP_HIGHUSER) have to pay the penalty. > >> So, what's the logic here? You're assuming that order-0 kernel >> allocations are more rare than allocations for userspace? > > The logic is to put reclaimed kernel pages into the cold cache to > postpone their allocation as long as possible to minimize (potential) > TLB flushes. OK, but if we put them in the cold area but kernel allocations pull them from the hot cache, aren't we virtually guaranteeing that kernel allocations will have to to TLB shootdown to convert a page? It seems like you also need to convert all kernel allocations to pull from the cold area. 
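The code being debated here is small. Condensed from the free_hot_cold_page() hunk in the v3 2/2 patch that appears later in this thread, the only behavioural change is which end of the per-cpu free list a freed page lands on (the variables are the ones from that function; this is an excerpt, not standalone code):

	/*
	 * Condensed from free_hot_cold_page() with patch 2/2 applied:
	 * pages last owned by the kernel are parked on the cold end, so
	 * userspace allocations (served hot-end first) are less likely
	 * to pick a page that needs the kernel->user conversion.
	 */
	if (!cold && !xpfo_page_is_kernel(page))
		list_add(&page->lru, &pcp->lists[migratetype]);		/* hot end */
	else
		list_add_tail(&page->lru, &pcp->lists[migratetype]);	/* cold end */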
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754557AbcIUFci (ORCPT ); Wed, 21 Sep 2016 01:32:38 -0400 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:37508 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751190AbcIUFcg (ORCPT ); Wed, 21 Sep 2016 01:32:36 -0400 From: Juerg Haefliger Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Dave Hansen , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> <57D9633A.2010702@intel.com> Cc: vpk@cs.columbia.edu Message-ID: <09d3ac8c-1111-b7aa-4720-b7a7b7c7798b@hpe.com> Date: Wed, 21 Sep 2016 07:32:09 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 MIME-Version: 1.0 In-Reply-To: <57D9633A.2010702@intel.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="c2csM3GwGuAP6k4UUq4uQ3pOeC0oH3gwe" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --c2csM3GwGuAP6k4UUq4uQ3pOeC0oH3gwe Content-Type: multipart/mixed; boundary="cp3KG1pRWNVkg2cUrE2FeqpfXPuIvKHFp"; protected-headers="v1" From: Juerg Haefliger To: Dave Hansen , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: <09d3ac8c-1111-b7aa-4720-b7a7b7c7798b@hpe.com> Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> <57D9633A.2010702@intel.com> In-Reply-To: <57D9633A.2010702@intel.com> --cp3KG1pRWNVkg2cUrE2FeqpfXPuIvKHFp Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 09/14/2016 04:48 PM, Dave Hansen wrote: >> On 09/02/2016 10:39 PM, Dave Hansen wrote: >>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >>> Does this >>> just mean that kernel allocations usually have to pay the penalty to >>> convert a page? >> >> Only pages that are allocated for userspace (gfp & GFP_HIGHUSER =3D=3D= GFP_HIGHUSER) which were >> previously allocated for the kernel (gfp & GFP_HIGHUSER !=3D GFP_HIGHU= SER) have to pay the penalty. >> >>> So, what's the logic here? You're assuming that order-0 kernel >>> allocations are more rare than allocations for userspace? >> >> The logic is to put reclaimed kernel pages into the cold cache to >> postpone their allocation as long as possible to minimize (potential) >> TLB flushes. >=20 > OK, but if we put them in the cold area but kernel allocations pull the= m > from the hot cache, aren't we virtually guaranteeing that kernel > allocations will have to to TLB shootdown to convert a page? No. Allocations for the kernel never require a TLB shootdown. Only alloca= tions for userspace (and only if the page was previously a kernel page). 
> It seems like you also need to convert all kernel allocations to pull > from the cold area. Kernel allocations can continue to pull from the hot cache. Maybe introdu= ce another cache for the userspace pages? But I'm not sure what other implications this might have= =2E =2E..Juerg --cp3KG1pRWNVkg2cUrE2FeqpfXPuIvKHFp-- --c2csM3GwGuAP6k4UUq4uQ3pOeC0oH3gwe Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX4httAAoJEHVMOpb5+LSMZ70P+wUBIgtKwmFUXkR1gRWJRVMx qaebjDcW32Edkxrg0579JSX3QHbpdQ9FU/oA/2gAGWpBi+w1WzJM/RzRHPxEG+ef e9vimmquTKzWdJSwEy1AqJSwF3QE39o0aJmsGBycvrc9mKQKb8rSSjpthSlOPCtb S8a9IRL3zghpOAeQSkKuiWYhYTHcfmYQpSBcBrzP3cCuX17LNKNHIGeRb4uFdbMA MMtSUCnXt8mkk5HgTXkAGv0UN+ox+bBIQ1hzHbdyTSahKzi6pIDlcufMaz9JSxHH CVVdll/vzPBB2hAQ7nTa+4bOULe091bHMDM4ibI9O3e70E1HIdGwjyKHZFdTMQeT Ft8GC4+LRQYphOlZi/UrhzLPaZMfew4PZAslo6EuPyixxZ+9B/hMy70O35p8y1zL 3RYb+oAPF8WGUQXQ71DHj3eOVaHxPwTxFXSF2rdRoGeicQEN1UP8J3X6ztbrUXR6 rPVoArezDHp1vaj7IzeLAK/HDUKGQvhMi4sur9AHgdu6GK0zKg59OYZEgUHcilOU TvatAm5stL1IA127BzN1LBE4TZgIx78PxIqtTHXX5SR6n0x0WstcEgXuV5gR1YF+ vxHNnBNBlLWTI5ztKRu5fD5pJRlRdqdua9WZEJVjj94vR+WgkE0tD5yMr2YPGoDZ 9F1XnrIX6PCxw9lxJ8mq =MRdI -----END PGP SIGNATURE----- --c2csM3GwGuAP6k4UUq4uQ3pOeC0oH3gwe-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935107AbcKDOpt (ORCPT ); Fri, 4 Nov 2016 10:45:49 -0400 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:53066 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934919AbcKDOpr (ORCPT ); Fri, 4 Nov 2016 10:45:47 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com Subject: [RFC PATCH v3 0/2] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 4 Nov 2016 15:45:32 +0100 Message-Id: <20161104144534.14790-1-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.10.1 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Changes from: v2 -> v3: - Removed 'depends on DEBUG_KERNEL' and 'select DEBUG_TLBFLUSH'. These are left-overs from the original patch and are not required. - Make libata XPFO-aware, i.e., properly handle pages that were unmapped by XPFO. This takes care of the temporary hack in v2 that forced the use of a bounce buffer in block/blk-map.c. v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (removed from the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. 
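At the page-table level, the map/unmap described above amounts to rewriting a single 4k PTE in the direct map. The sketch below consolidates what set_kpte(), xpfo_kmap() and xpfo_kunmap() in the patch do; the wrapper name is made up, while lookup_address(), set_pte_atomic(), canon_pgprot(), __flush_tlb_one() and __PAGE_KERNEL are the x86 primitives the patch itself uses.

#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Illustrative wrapper, not part of the patch. */
static void xpfo_set_user_owned(struct page *page, bool user_owned)
{
	unsigned long kaddr = (unsigned long)page_address(page);
	unsigned int level;
	pte_t *kpte = lookup_address(kaddr, &level);	/* 4k direct-map PTE */

	if (user_owned) {
		/* Frame goes to userspace: kill the physmap alias. */
		set_pte_atomic(kpte, pfn_pte(page_to_pfn(page),
					     canon_pgprot(__pgprot(0))));
		__flush_tlb_one(kaddr);
	} else {
		/* Frame comes back to the kernel: restore the mapping. */
		set_pte_atomic(kpte, pfn_pte(page_to_pfn(page),
					     canon_pgprot(__pgprot(__PAGE_KERNEL))));
	}
}

Note that only the kernel-to-user direction pays for a TLB flush; re-establishing the kernel mapping in the kmap path needs none, per the comments in mm/xpfo.c below.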
Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (2): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 315 insertions(+), 8 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935124AbcKDOp5 (ORCPT ); Fri, 4 Nov 2016 10:45:57 -0400 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:53068 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934888AbcKDOpy (ORCPT ); Fri, 4 Nov 2016 10:45:54 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com Subject: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 4 Nov 2016 15:45:33 +0100 Message-Id: <20161104144534.14790-2-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.10.1 In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 298 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index bada636d1065..38b334f8fde5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 22af912d66d2..a6fafbae02bb 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 051b6158d1b7..58af734be25d 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use a bounce buffer */ @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use bounce buffer */ diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 9298c393ddaa..0e451a42e5a3 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -29,6 +29,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -44,6 +46,11 @@ enum page_ext_flags { */ struct page_ext { unsigned long flags; +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 295bd7a9f76b..175680f516aa 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fd42aa7c4bd..100e80e008e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 121dcffc4ec1..ba6dbcacc2db 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..8e3a6a694b6a --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,206 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} +EXPORT_SYMBOL(xpfo_page_is_unmapped); diff --git a/security/Kconfig b/security/Kconfig index 118f4549404e..4502e15c8419 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,25 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on ARCH_SUPPORTS_XPFO + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". 
+ config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935153AbcKDOqE (ORCPT ); Fri, 4 Nov 2016 10:46:04 -0400 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:53073 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S934104AbcKDOqC (ORCPT ); Fri, 4 Nov 2016 10:46:02 -0400 From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com Subject: [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Date: Fri, 4 Nov 2016 15:45:34 +0100 Message-Id: <20161104144534.14790-3-juerg.haefliger@hpe.com> X-Mailer: git-send-email 2.10.1 In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 100e80e008e2..09ef4f7cfd14 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2440,7 +2440,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. 
+ */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index 8e3a6a694b6a..0e447e38008a 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -204,3 +204,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } EXPORT_SYMBOL(xpfo_page_is_unmapped); + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935215AbcKDOuo (ORCPT ); Fri, 4 Nov 2016 10:50:44 -0400 Received: from bombadil.infradead.org ([198.137.202.9]:44227 "EHLO bombadil.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S935113AbcKDOum (ORCPT ); Fri, 4 Nov 2016 10:50:42 -0400 Date: Fri, 4 Nov 2016 07:50:40 -0700 From: Christoph Hellwig To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu, Tejun Heo , linux-ide@vger.kernel.org Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20161104145040.GA24930@infradead.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> User-Agent: Mutt/1.6.1 (2016-04-27) X-SRS-Rewrite: SMTP reverse-path rewritten from by bombadil.infradead.org. See http://www.infradead.org/rpr.html Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org The libata parts here really need to be split out and the proper list and maintainer need to be Cc'ed. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h This is just piling one nasty hack on top of another. libata should just use the highmem case unconditionally, as it is the correct thing to do for all cases. 
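For concreteness, one way to read that suggestion (an untested sketch, not code posted in this thread): drop the PageHighMem()/xpfo_page_is_unmapped() check in ata_pio_sector() and __atapi_pio_bytes() and always take what is currently the highmem branch, since kmap_atomic() of an already-mapped lowmem page is nearly free. The transfer call is elided because the branch bodies are not quoted here; and since this series already hooks kmap_atomic() via xpfo_kmap(), that path would handle XPFO-unmapped pages transparently.

	/* Untested sketch of "use the highmem case unconditionally". */
	unsigned long flags;
	unsigned char *buf;

	local_irq_save(flags);
	buf = kmap_atomic(page);

	/*
	 * ... issue the PIO transfer through ap->ops->sff_data_xfer(),
	 * exactly as the existing PageHighMem() branch does ...
	 */

	kunmap_atomic(buf);
	local_irq_restore(flags);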
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752312AbcKJFyT (ORCPT ); Thu, 10 Nov 2016 00:54:19 -0500 Received: from szxga01-in.huawei.com ([58.251.152.64]:25621 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751400AbcKJFyR (ORCPT ); Thu, 10 Nov 2016 00:54:17 -0500 Subject: Re: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: , , , References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> CC: , From: "ZhaoJunmin Zhao(Junmin)" Message-ID: <58240B46.7080108@huawei.com> Date: Thu, 10 Nov 2016 13:53:10 +0800 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Content-Type: text/plain; charset="gbk"; format=flowed Content-Transfer-Encoding: 7bit X-Originating-IP: [10.111.57.210] X-CFilter-Loop: Reflected Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > When a physical page is assigned to a process in user space, it should be unmaped from kernel physmap. From the code, I can see the patch only handle the page in high memory zone. if the kernel use the high memory zone, it will call the kmap. So I would like to know if the physical page is coming from normal zone,how to handle it. 
Thanks Zhaojunmin From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964977AbcKJTLj (ORCPT ); Thu, 10 Nov 2016 14:11:39 -0500 Received: from mail-wm0-f41.google.com ([74.125.82.41]:36892 "EHLO mail-wm0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S964779AbcKJTLh (ORCPT ); Thu, 10 Nov 2016 14:11:37 -0500 MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:11:34 -0800 X-Google-Sender-Auth: l_RFGWczxEoptaTPP6Gz7CpNqe0 Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. Thanks for keeping on this! I'd really like to see it land and then get more architectures to support it. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty In the Kconfig you say "slight", but I'm curious what kinds of benchmarks you've done and if there's a more specific cost we can declare, just to give people more of an idea what the hit looks like? (What workloads would trigger a lot of XPFO unmapping, for example?) Thanks! 
-Kees -- Kees Cook Nexus Security From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965189AbcKJTYv (ORCPT ); Thu, 10 Nov 2016 14:24:51 -0500 Received: from mail-wm0-f45.google.com ([74.125.82.45]:38415 "EHLO mail-wm0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932259AbcKJTYt (ORCPT ); Thu, 10 Nov 2016 14:24:49 -0500 MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:24:46 -0800 X-Google-Sender-Auth: Qnk0qdxR-CCbMzmat9R-4Zkfh6I Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Would it be possible to create an lkdtm test that can exercise this protection? > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool Can you include a "help" section here to describe what requirements an architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and HAVE_ARCH_VMAP_STACK or some examples. > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > I've added these patches to my kspp tree on kernel.org, so it should get some 0-day testing now... Thanks! 
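A rough sketch of what such an lkdtm test could look like under this patch's semantics, where the physmap PTE of a user page is only cleared once a kmap/kunmap pair has balanced out. The function name and the lkdtm hookup are assumptions for illustration, not existing lkdtm code:

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/printk.h>

/* Hypothetical lkdtm-style crash test -- illustration only. */
static void lkdtm_xpfo_read_user_page(void)
{
        struct page *page;
        unsigned long *kaddr;

        /* GFP_HIGHUSER tags the allocation as a user page for XPFO. */
        page = alloc_page(GFP_HIGHUSER);
        if (!page)
                return;

        /* Balance one map/unmap so xpfo_kunmap() clears the physmap PTE. */
        kaddr = kmap(page);
        kunmap(page);

        pr_info("attempting bad read of user page at %p\n", kaddr);
        /* With XPFO enabled this dereference should fault. */
        pr_info("read back: %lx\n", *kaddr);

        __free_page(page);
}

The expected result with CONFIG_XPFO=y is an oops on the final dereference; with XPFO disabled the read simply succeeds.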
-Kees -- Kees Cook Nexus Security From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752090AbcKOLPX (ORCPT ); Tue, 15 Nov 2016 06:15:23 -0500 Received: from g2t1383g.austin.hpe.com ([15.233.16.89]:6211 "EHLO g2t1383g.austin.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750772AbcKOLPU (ORCPT ); Tue, 15 Nov 2016 06:15:20 -0500 Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Kees Cook References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu From: Juerg Haefliger Message-ID: Date: Tue, 15 Nov 2016 12:15:14 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE Content-Type: multipart/mixed; boundary="Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR"; protected-headers="v1" From: Juerg Haefliger To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: --Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Sorry for the late reply, I just found your email in my cluttered inbox. On 11/10/2016 08:11 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kern= el >> attacks. The basic idea is to enforce exclusive ownership of page fram= es >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeepin= g. >> Specifically two flags to distinguish user vs. kernel pages and to tag= >> unmapped pages and a reference counter to balance kmap/kunmap operatio= ns >> and a lock to serialize access to the XPFO fields. >=20 > Thanks for keeping on this! I'd really like to see it land and then > get more architectures to support it. Good to hear :-) >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel = needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >=20 > In the Kconfig you say "slight", but I'm curious what kinds of > benchmarks you've done and if there's a more specific cost we can > declare, just to give people more of an idea what the hit looks like? > (What workloads would trigger a lot of XPFO unmapping, for example?) 
That 'slight' wording is based on the performance numbers published in th= e referenced paper. So far I've only run kernel compilation tests. For that workload, the big= performance hit comes from disabling >4k page sizes (around 10%). Adding XPFO on top causes 'only' a= nother 0.5% performance penalty. I'm currently looking into adding support for larger page sizes = to see what the real impact is and then generate some more relevant numbers. =2E..Juerg > Thanks! >=20 > -Kees >=20 --Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR-- --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCAAGBQJYKu5DAAoJEHVMOpb5+LSMzQ8P+wWBd+Sen2m8U4Q7HjsdGCoB 9fHq5r8x/bt+WvqF2i8vMR5Txrfn/EoOAkxkOu8tYiq7ECnHSnETAR8NVR2ckp0M cizhmBdOiiMcOUiLSnPGxEx9390Qdx5li0ODwqQS5dSa9qCkBbbv6qf7ri5CzDFH VO+OIAHI/kChTi4baKENq3UNHh0+8s/M0dykDwStIjrDG4Nh+IcEWOeDvOBWZ5HG qxZQEg20reipzZTcba7paJ/pJQZBuKg/AFdQW/RFBFK3O0JngWKp67ZmxSU7PHw+ xr9qpKy+N9Yk3q5id7q2f2zA7eq3a3uYTNC+8d7zc6KQJIofnCLX/3dtuIEwS9rR QSxQIPtk2sFmPLy/kXpU2RihdIJijJtx7RmbW7KEiuUMwUO+dDjjwJul9SNxlYWg gYjUxPAGP6jxfGL443YKNbss2e5KfIh6LXlJpbtnD0WEfYiI7Ef2Y2qRrXpCkcw/ Z2kBLojOJOn8HagkHJiiw8lTwgDm2+YNcUWQoDgaTK9xOoAfMssETJfFaiGt6hsG 7VJot9jHg33kSZDyiTVBV6nwmCkOqtgXINYj8Q82iRmWUKPq2VEQEWWlvg31N9eu S1L7EFIaAzZvt+6qc/GCrjjQzgOz+En/UyfmPoojJ+A6dx8/gM6oWkOOZDsG614J 9rFANUbutWyZav73fc/L =Wzyi -----END PGP SIGNATURE----- --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752724AbcKOLSU (ORCPT ); Tue, 15 Nov 2016 06:18:20 -0500 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:27478 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751127AbcKOLSQ (ORCPT ); Tue, 15 Nov 2016 06:18:16 -0500 Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Kees Cook References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu From: Juerg Haefliger Message-ID: <9c558dfc-112a-bb52-88c5-206f5ca4fc42@hpe.com> Date: Tue, 15 Nov 2016 12:18:10 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF Content-Type: multipart/mixed; boundary="rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA"; protected-headers="v1" From: Juerg Haefliger To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Message-ID: <9c558dfc-112a-bb52-88c5-206f5ca4fc42@hpe.com> Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: --rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA Content-Type: text/plain; charset=utf-8 
Content-Transfer-Encoding: quoted-printable On 11/10/2016 08:24 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kern= el >> attacks. The basic idea is to enforce exclusive ownership of page fram= es >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeepin= g. >> Specifically two flags to distinguish user vs. kernel pages and to tag= >> unmapped pages and a reference counter to balance kmap/kunmap operatio= ns >> and a lock to serialize access to the XPFO fields. >> >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel = needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >=20 > Would it be possible to create an lkdtm test that can exercise this pro= tection? I'll look into it. >> diff --git a/security/Kconfig b/security/Kconfig >> index 118f4549404e..4502e15c8419 100644 >> --- a/security/Kconfig >> +++ b/security/Kconfig >> @@ -6,6 +6,25 @@ menu "Security options" >> >> source security/keys/Kconfig >> >> +config ARCH_SUPPORTS_XPFO >> + bool >=20 > Can you include a "help" section here to describe what requirements an > architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and > HAVE_ARCH_VMAP_STACK or some examples. Will do. >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on ARCH_SUPPORTS_XPFO >> + select PAGE_EXTENSION >> + help >> + This option offers protection against 'ret2dir' kernel attac= ks. >> + When enabled, every time a page frame is allocated to user s= pace, it >> + is unmapped from the direct mapped RAM region in kernel spac= e >> + (physmap). Similarly, when a page frame is freed/reclaimed, = it is >> + mapped back to physmap. >> + >> + There is a slight performance impact when this option is ena= bled. >> + >> + If in doubt, say "N". >> + >> config SECURITY_DMESG_RESTRICT >> bool "Restrict unprivileged access to the kernel syslog" >> default n >=20 > I've added these patches to my kspp tree on kernel.org, so it should > get some 0-day testing now... Very good. Thanks! > Thanks! Appreciate the feedback. 
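For that help text, the architecture requirements boil down to: the direct map must be usable at 4k granularity for XPFO pages, and the architecture must provide a way to look up and atomically rewrite a single kernel direct-map PTE plus a single-page kernel TLB flush. A sketch of the arch interface that would need to be factored out of the current x86-only set_kpte()/__flush_tlb_one() usage; the hook names are hypothetical, the x86 bodies follow the quoted patch:

/*
 * Hypothetical per-arch hooks an ARCH_SUPPORTS_XPFO port would provide.
 * On x86-64 they would just wrap what mm/xpfo.c already does today.
 */
static inline void xpfo_arch_set_kpte(struct page *page, unsigned long kaddr,
                                      pgprot_t prot)
{
        unsigned int level;
        pte_t *kpte = lookup_address(kaddr, &level);

        /* XPFO currently requires the direct map to use 4k pages. */
        BUG_ON(!kpte || level != PG_LEVEL_4K);
        set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)));
}

static inline void xpfo_arch_flush_kernel_page(unsigned long kaddr)
{
        __flush_tlb_one(kaddr);
}

An arm64 port would supply its own page-table walker for the first hook and use flush_tlb_kernel_range() or an equivalent primitive in the second.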
=2E..Juerg > -Kees >=20 --rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA-- --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCAAGBQJYKu7zAAoJEHVMOpb5+LSMThkP/1ZSAODxbIB2ebdrvax2absi nJwtgo56pBL7g7OJu/OrxUXvMHi9LGfahZOUTUmRCiQIL60EdjCJvQB9wcASVr3i 7AO1ztMGZxmGl/UlobukQs0xTlFU9FcYJFxTqKQPHA8PFnzQZe5jqG1JwTjhw4Z7 ANULiFZGG0G0vSXAagWwiwdzZJyt4HCSamfoESBKSBTK8TywvIFDqy/qsHHlmpjd EExwax4E/VB+Yl8Tg2RvgHHI1kQpTB1dPBfAQvXOTjujdHVGxVZSZBss+3HXL5vi BbNA0Gez+aNvVp2tTTeyWce9y11nIAZgU4rcjxkBqGoU73S+I2ltlIN7MCbKOYR3 /wGxXpCeOCWRVcFxm4yxnQcWOXWMa7aIVHMf7uHU53oKOqGtglFQcMR6V4bcmNG9 n+jLQZr/ADR9PJ2Rsb1vVyOlNiy+uQ+JCA5lBfEe+ckPW2MSc5GedzeETGYQgdUS u9ZzGrbtW9++PXXjgm6YBoaij0vjhVH2/Q1WU3wwdzBDGIaRpy1Bh0zShDdQ7S8y G83c8dHH4Yc1CIljCA0+Ipur3nvuoJKdc6Kxy+j1JK86t6dK8sktXS/1SnBIGM7T L30CH60pgfyvpDEWbSXoQXjdyuYMaQALBYX258KXuH8e9+vjPrO/UC8prgJqK/C1 rbWnk9S8v1HGxMfThiYi =Vrrl -----END PGP SIGNATURE----- --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S940134AbcKXK5Q (ORCPT ); Thu, 24 Nov 2016 05:57:16 -0500 Received: from mail-pg0-f41.google.com ([74.125.83.41]:36288 "EHLO mail-pg0-f41.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S938974AbcKXK5A (ORCPT ); Thu, 24 Nov 2016 05:57:00 -0500 Date: Thu, 24 Nov 2016 19:56:30 +0900 From: AKASHI Takahiro To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20161124105629.GA23034@linaro.org> Mail-Followup-To: AKASHI Takahiro , Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, I'm trying to give it a spin on arm64, but ... On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. 
> > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, would it be better to put the whole definition into arch-specific part? > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); Why not PAGE_KERNEL? > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); Again __flush_tlb_one() is x86-specific. flush_tlb_kernel_range() instead? Thanks, -Takahiro AKASHI > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". 
> + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932843AbcK1LPZ (ORCPT ); Mon, 28 Nov 2016 06:15:25 -0500 Received: from g9t1613g.houston.hpe.com ([15.241.32.99]:44622 "EHLO g9t1613g.houston.hpe.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932283AbcK1LPQ (ORCPT ); Mon, 28 Nov 2016 06:15:16 -0500 Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: AKASHI Takahiro , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> From: Juerg Haefliger Message-ID: <795a34a6-ed04-dea3-73f5-d23e48f69de6@hpe.com> Date: Mon, 28 Nov 2016 12:15:10 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161124105629.GA23034@linaro.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="xK83457Tl0b0VsjtiEgrgA8md3HirLXVv" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --xK83457Tl0b0VsjtiEgrgA8md3HirLXVv Content-Type: multipart/mixed; boundary="7nt0D3PUfp44460FfNLT0gAFB4oxfXvll"; protected-headers="v1" From: Juerg Haefliger To: AKASHI Takahiro , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Message-ID: <795a34a6-ed04-dea3-73f5-d23e48f69de6@hpe.com> Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> In-Reply-To: <20161124105629.GA23034@linaro.org> --7nt0D3PUfp44460FfNLT0gAFB4oxfXvll Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 11/24/2016 11:56 AM, AKASHI Takahiro wrote: > Hi, >=20 > I'm trying to give it a spin on arm64, but ... Thanks for trying this. >> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte =3D lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level !=3D PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot)))= ; >> +} >=20 > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-sp= ecific, > would it be better to put the whole definition into arch-specific part?= Well yes but I haven't really looked into splitting up the arch specific = stuff. >> + /* >> + * Map the page back into the kernel if it was previously >> + * allocated to user space. >> + */ >> + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, >> + &page_ext->flags)) { >> + kaddr =3D (unsigned long)page_address(page + i); >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); >=20 > Why not PAGE_KERNEL? Good catch, thanks! 
>> + /* >> + * The page is to be allocated back to user space, so unmap it from = the >> + * kernel, flush the TLB and tag it as a user page. >> + */ >> + if (atomic_dec_return(&page_ext->mapcount) =3D=3D 0) { >> + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); >> + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); >=20 > Again __flush_tlb_one() is x86-specific. > flush_tlb_kernel_range() instead? I'll take a look. If you can tell me what the relevant arm64 equivalents = are for the arch-specific functions, that would help tremendously. Thanks for the comments! =2E..Juerg > Thanks, > -Takahiro AKASHI --=20 Juerg Haefliger Hewlett Packard Enterprise --7nt0D3PUfp44460FfNLT0gAFB4oxfXvll-- --xK83457Tl0b0VsjtiEgrgA8md3HirLXVv Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCAAGBQJYPBG+AAoJEHVMOpb5+LSMYaUP/ivlQhGWbPz1scInxJxIBSSL dHPcug/WEH2XjLIfm1BEhWVNMBYSUrVN/eWWcWE7BjYh7O+/makinUSIESNcbTPw uuA5NiMtsBEBgjgReq+hWC/yLJg0P3HFxFIdlg6nl8QnbGe3xT31UUm3/KxowaEb QcCvONwXl46FxpCMoQxq8Y4+2oSJm7Skaxp3lP3zPPuLClOvucxtbWOFM77nompO 1GagLX+kssFGKYNlUdkNlEK487hbLNkOx4Ipz9IqoPLvRNiYSJCjVlelFYkV6dfz UzBPbchD/HHiGIs8jPZFucGeFgMr9SMRNhJ6yMDfHjNXGsw1PycW93MVU3h2wIUH y+jW1IXmMiOI8q89sHPIAJtBYxRxDIStYmmd6XpdFhEmdhQwTJpR0uObwigDxcHz qvy88HvWepH8OnT/XkKfNNT7/HuVkg/jYbmraiLYP+ALWQBJg+iStaQ5bsRGtosh eQ17odAAs1438iWIaqSr84KtffSsKO+bNARWXAOhd2RPOoJAsWudpl/EkNQ+fyWd Lm0X2UfLQJ9MPRIdfXhFL0LkHGOYHfzut/8yG9KKTglV/sSoxDjtkbsWIm9TgyYT wpVs1zRAU9JUOfMkPeb+ih0oYZy7KZ1dJNSYPuBcfsQhHEeAAWYu539L51kmbPyu sB/zTqnSlUBfM71Ha3fV =GxFB -----END PGP SIGNATURE----- --xK83457Tl0b0VsjtiEgrgA8md3HirLXVv-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932679AbcLIJCx (ORCPT ); Fri, 9 Dec 2016 04:02:53 -0500 Received: from mail-pg0-f50.google.com ([74.125.83.50]:34103 "EHLO mail-pg0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932347AbcLIJCq (ORCPT ); Fri, 9 Dec 2016 04:02:46 -0500 Date: Fri, 9 Dec 2016 18:02:53 +0900 From: AKASHI Takahiro To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20161209090251.GF23034@linaro.org> Mail-Followup-To: AKASHI Takahiro , Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161124105629.GA23034@linaro.org> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Nov 24, 2016 at 07:56:30PM +0900, AKASHI Takahiro wrote: > Hi, > > I'm trying to give it a spin on arm64, but ... 
In my experiment on hikey, the kernel boot failed, catching a page fault around cache operations, (a) __clean_dcache_area_pou() on 4KB-page kernel, (b) __inval_cache_range() on 64KB-page kernel, (See more details for backtrace below.) This is because, on arm64, cache operations are by VA (in particular, of direct/linear mapping of physical memory). So I think that naively unmapping a page from physmap in xpfo_kunmap() won't work well on arm64. -Takahiro AKASHI case (a) -------- Unable to handle kernel paging request at virtual address ffff800000cba000 pgd = ffff80003ba8c000 *pgd=0000000000000000 task: ffff80003be38000 task.stack: ffff80003be40000 PC is at __clean_dcache_area_pou+0x20/0x38 LR is at sync_icache_aliases+0x2c/0x40 ... Call trace: ... __clean_dcache_area_pou+0x20/0x38 __sync_icache_dcache+0x6c/0xa8 alloc_set_pte+0x33c/0x588 filemap_map_pages+0x3a8/0x3b8 handle_mm_fault+0x910/0x1080 do_page_fault+0x2b0/0x358 do_mem_abort+0x44/0xa0 el0_ia+0x18/0x1c case (b) -------- Unable to handle kernel paging request at virtual address ffff80002aed0000 pgd = ffff000008f40000 , *pud=000000003dfc0003 , *pmd=000000003dfa0003 , *pte=000000002aed0000 task: ffff800028711900 task.stack: ffff800029020000 PC is at __inval_cache_range+0x3c/0x60 LR is at __swiotlb_map_sg_attrs+0x6c/0x98 ... Call trace: ... __inval_cache_range+0x3c/0x60 dw_mci_pre_dma_transfer.isra.7+0xfc/0x190 dw_mci_pre_req+0x50/0x60 mmc_start_req+0x4c/0x420 mmc_blk_issue_rw_rq+0xb0/0x9b8 mmc_blk_issue_rq+0x154/0x518 mmc_queue_thread+0xac/0x158 kthread+0xd0/0xe8 ret_from_fork+0x10/0x20 > > On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > > This patch adds support for XPFO which protects against 'ret2dir' kernel > > attacks. The basic idea is to enforce exclusive ownership of page frames > > by either the kernel or userspace, unless explicitly requested by the > > kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > > > Additional fields in the page_ext struct are used for XPFO housekeeping. > > Specifically two flags to distinguish user vs. kernel pages and to tag > > unmapped pages and a reference counter to balance kmap/kunmap operations > > and a lock to serialize access to the XPFO fields. > > > > Known issues/limitations: > > - Only supports x86-64 (for now) > > - Only supports 4k pages (for now) > > - There are most likely some legitimate uses cases where the kernel needs > > to access userspace which need to be made XPFO-aware > > - Performance penalty > > > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > > > Suggested-by: Vasileios P. 
Kemerlis > > Signed-off-by: Juerg Haefliger > > --- > > arch/x86/Kconfig | 3 +- > > arch/x86/mm/init.c | 2 +- > > drivers/ata/libata-sff.c | 4 +- > > include/linux/highmem.h | 15 +++- > > include/linux/page_ext.h | 7 ++ > > include/linux/xpfo.h | 39 +++++++++ > > lib/swiotlb.c | 3 +- > > mm/Makefile | 1 + > > mm/page_alloc.c | 2 + > > mm/page_ext.c | 4 + > > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > > security/Kconfig | 19 +++++ > > 12 files changed, 298 insertions(+), 7 deletions(-) > > create mode 100644 include/linux/xpfo.h > > create mode 100644 mm/xpfo.c > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index bada636d1065..38b334f8fde5 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -165,6 +165,7 @@ config X86 > > select HAVE_STACK_VALIDATION if X86_64 > > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > > + select ARCH_SUPPORTS_XPFO if X86_64 > > > > config INSTRUCTION_DECODER > > def_bool y > > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > > > config X86_DIRECT_GBPAGES > > def_bool y > > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > > ---help--- > > Certain kernel features effectively disable kernel > > linear 1 GB mappings (even if the CPU otherwise > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > > index 22af912d66d2..a6fafbae02bb 100644 > > --- a/arch/x86/mm/init.c > > +++ b/arch/x86/mm/init.c > > @@ -161,7 +161,7 @@ static int page_size_mask; > > > > static void __init probe_page_size_mask(void) > > { > > -#if !defined(CONFIG_KMEMCHECK) > > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > > /* > > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > > * use small pages. > > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > > index 051b6158d1b7..58af734be25d 100644 > > --- a/drivers/ata/libata-sff.c > > +++ b/drivers/ata/libata-sff.c > > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use a bounce buffer */ > > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use bounce buffer */ > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > > index bb3f3297062a..7a17c166532f 100644 > > --- a/include/linux/highmem.h > > +++ b/include/linux/highmem.h > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > > #ifndef ARCH_HAS_KMAP > > static inline void *kmap(struct page *page) > > { > > + void *kaddr; > > + > > might_sleep(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > > > static inline void kunmap(struct page *page) > > { > > + xpfo_kunmap(page_address(page), page); > > } > > > > static inline void *kmap_atomic(struct page *page) > > { > > + void *kaddr; > > + > > preempt_disable(); > > pagefault_disable(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > > > static inline void __kunmap_atomic(void *addr) > > { > > + xpfo_kunmap(addr, virt_to_page(addr)); > > pagefault_enable(); > > preempt_enable(); > > } > > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > > index 9298c393ddaa..0e451a42e5a3 100644 > > --- a/include/linux/page_ext.h > > +++ b/include/linux/page_ext.h > > @@ -29,6 +29,8 @@ enum page_ext_flags { > > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > > PAGE_EXT_DEBUG_GUARD, > > PAGE_EXT_OWNER, > > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > PAGE_EXT_YOUNG, > > PAGE_EXT_IDLE, > > @@ -44,6 +46,11 @@ enum page_ext_flags { > > */ > > struct page_ext { > > unsigned long flags; > > +#ifdef CONFIG_XPFO > > + int inited; /* Map counter and lock initialized */ > > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > > +#endif > > }; > > > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > > new file mode 100644 > > index 000000000000..77187578ca33 > > --- /dev/null > > +++ b/include/linux/xpfo.h > > @@ -0,0 +1,39 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. 
> > + */ > > + > > +#ifndef _LINUX_XPFO_H > > +#define _LINUX_XPFO_H > > + > > +#ifdef CONFIG_XPFO > > + > > +extern struct page_ext_operations page_xpfo_ops; > > + > > +extern void xpfo_kmap(void *kaddr, struct page *page); > > +extern void xpfo_kunmap(void *kaddr, struct page *page); > > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > > +extern void xpfo_free_page(struct page *page, int order); > > + > > +extern bool xpfo_page_is_unmapped(struct page *page); > > + > > +#else /* !CONFIG_XPFO */ > > + > > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > > +static inline void xpfo_free_page(struct page *page, int order) { } > > + > > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > > + > > +#endif /* CONFIG_XPFO */ > > + > > +#endif /* _LINUX_XPFO_H */ > > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > > index 22e13a0e19d7..455eff44604e 100644 > > --- a/lib/swiotlb.c > > +++ b/lib/swiotlb.c > > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > > { > > unsigned long pfn = PFN_DOWN(orig_addr); > > unsigned char *vaddr = phys_to_virt(tlb_addr); > > + struct page *page = pfn_to_page(pfn); > > > > - if (PageHighMem(pfn_to_page(pfn))) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > /* The buffer does not have a mapping. Map it in and copy */ > > unsigned int offset = orig_addr & ~PAGE_MASK; > > char *buffer; > > diff --git a/mm/Makefile b/mm/Makefile > > index 295bd7a9f76b..175680f516aa 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > > +obj-$(CONFIG_XPFO) += xpfo.o > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 8fd42aa7c4bd..100e80e008e2 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > > kernel_poison_pages(page, 1 << order, 0); > > kernel_map_pages(page, 1 << order, 0); > > kasan_free_pages(page, order); > > + xpfo_free_page(page, order); > > > > return true; > > } > > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > > kernel_map_pages(page, 1 << order, 1); > > kernel_poison_pages(page, 1 << order, 1); > > kasan_alloc_pages(page, order); > > + xpfo_alloc_page(page, order, gfp_flags); > > set_page_owner(page, order, gfp_flags); > > } > > > > diff --git a/mm/page_ext.c b/mm/page_ext.c > > index 121dcffc4ec1..ba6dbcacc2db 100644 > > --- a/mm/page_ext.c > > +++ b/mm/page_ext.c > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > > * struct page extension > > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > &page_idle_ops, > > #endif > > +#ifdef CONFIG_XPFO > > + &page_xpfo_ops, > > +#endif > > }; > > > > static unsigned long total_usage; > > diff --git a/mm/xpfo.c b/mm/xpfo.c > > new file mode 100644 > > index 000000000000..8e3a6a694b6a > > --- /dev/null > > +++ b/mm/xpfo.c > > @@ -0,0 +1,206 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. 
> > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > + > > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > > + > > +static bool need_xpfo(void) > > +{ > > + return true; > > +} > > + > > +static void init_xpfo(void) > > +{ > > + printk(KERN_INFO "XPFO enabled\n"); > > + static_branch_enable(&xpfo_inited); > > +} > > + > > +struct page_ext_operations page_xpfo_ops = { > > + .need = need_xpfo, > > + .init = init_xpfo, > > +}; > > + > > +/* > > + * Update a single kernel page table entry > > + */ > > +static inline void set_kpte(struct page *page, unsigned long kaddr, > > + pgprot_t prot) { > > + unsigned int level; > > + pte_t *kpte = lookup_address(kaddr, &level); > > + > > + /* We only support 4k pages for now */ > > + BUG_ON(!kpte || level != PG_LEVEL_4K); > > + > > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > > +} > > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, > would it be better to put the whole definition into arch-specific part? > > > + > > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > > +{ > > + int i, flush_tlb = 0; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + > > + /* Initialize the map lock and map counter */ > > + if (!page_ext->inited) { > > + spin_lock_init(&page_ext->maplock); > > + atomic_set(&page_ext->mapcount, 0); > > + page_ext->inited = 1; > > + } > > + BUG_ON(atomic_read(&page_ext->mapcount)); > > + > > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > > + /* > > + * Flush the TLB if the page was previously allocated > > + * to the kernel. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > > + &page_ext->flags)) > > + flush_tlb = 1; > > + } else { > > + /* Tag the page as a kernel page */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + } > > + } > > + > > + if (flush_tlb) { > > + kaddr = (unsigned long)page_address(page); > > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > > + PAGE_SIZE); > > + } > > +} > > + > > +void xpfo_free_page(struct page *page, int order) > > +{ > > + int i; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + if (!page_ext->inited) { > > + /* > > + * The page was allocated before page_ext was > > + * initialized, so it is a kernel page and it needs to > > + * be tagged accordingly. > > + */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + continue; > > + } > > + > > + /* > > + * Map the page back into the kernel if it was previously > > + * allocated to user space. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > > + &page_ext->flags)) { > > + kaddr = (unsigned long)page_address(page + i); > > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > > Why not PAGE_KERNEL? 
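(For illustration -- presumably the suggestion is simply to use the existing macro rather than re-wrapping the raw bits, i.e. something like

	set_kpte(page + i, kaddr, PAGE_KERNEL);

since PAGE_KERNEL is already a pgprot_t and set_kpte() passes whatever it is given through canon_pgprot() anyway.)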
> > > + } > > + } > > +} > > + > > +void xpfo_kmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page was previously allocated to user space, so map it back > > + * into the kernel. No TLB flush required. > > + */ > > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kmap); > > + > > +void xpfo_kunmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page is to be allocated back to user space, so unmap it from the > > + * kernel, flush the TLB and tag it as a user page. > > + */ > > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > > + __flush_tlb_one((unsigned long)kaddr); > > Again __flush_tlb_one() is x86-specific. > flush_tlb_kernel_range() instead? > > Thanks, > -Takahiro AKASHI > > > + } > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kunmap); > > + > > +inline bool xpfo_page_is_unmapped(struct page *page) > > +{ > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return false; > > + > > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > > +} > > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > > diff --git a/security/Kconfig b/security/Kconfig > > index 118f4549404e..4502e15c8419 100644 > > --- a/security/Kconfig > > +++ b/security/Kconfig > > @@ -6,6 +6,25 @@ menu "Security options" > > > > source security/keys/Kconfig > > > > +config ARCH_SUPPORTS_XPFO > > + bool > > + > > +config XPFO > > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > > + default n > > + depends on ARCH_SUPPORTS_XPFO > > + select PAGE_EXTENSION > > + help > > + This option offers protection against 'ret2dir' kernel attacks. > > + When enabled, every time a page frame is allocated to user space, it > > + is unmapped from the direct mapped RAM region in kernel space > > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > > + mapped back to physmap. > > + > > + There is a slight performance impact when this option is enabled. > > + > > + If in doubt, say "N". 
> > + > > config SECURITY_DMESG_RESTRICT > > bool "Restrict unprivileged access to the kernel syslog" > > default n > > -- > > 2.10.1 > > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ob0-f180.google.com (mail-ob0-f180.google.com [209.85.214.180]) by kanga.kvack.org (Postfix) with ESMTP id 6978F6B0009 for ; Fri, 26 Feb 2016 09:21:12 -0500 (EST) Received: by mail-ob0-f180.google.com with SMTP id ts10so79191711obc.1 for ; Fri, 26 Feb 2016 06:21:12 -0800 (PST) Received: from g2t4622.austin.hp.com (g2t4622.austin.hp.com. [15.73.212.79]) by mx.google.com with ESMTPS id e125si2877654oia.6.2016.02.26.06.21.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 26 Feb 2016 06:21:11 -0800 (PST) From: Juerg Haefliger Subject: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 26 Feb 2016 15:21:07 +0100 Message-Id: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu, juerg.haefliger@hpe.com This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userland, unless explicitly requested by the kernel. Whenever a page destined for userland is allocated, it is unmapped from physmap. When such a page is reclaimed from userland, it is mapped back to physmap. Mapping/unmapping from physmap is accomplished by modifying the PTE permission bits to allow/disallow access to the page. Additional fields are added to the page struct for XPFO housekeeping. Specifically a flags field to distinguish user vs. kernel pages, a reference counter to track physmap map/unmap operations and a lock to protect the XPFO fields. Known issues/limitations: - Only supported on x86-64. - Only supports 4k pages. - Adds additional data to the page struct. - There are most likely some additional and legitimate uses cases where the kernel needs to access userspace. Those need to be identified and made XPFO-aware. - There's a performance impact if XPFO is turned on. Per the paper referenced below it's in the 1-3% ballpark. More performance testing wouldn't hurt. What tests to run though? Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 2 +- arch/x86/Kconfig.debug | 17 +++++ arch/x86/mm/Makefile | 2 + arch/x86/mm/init.c | 3 +- arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ block/blk-map.c | 7 +- include/linux/highmem.h | 23 +++++-- include/linux/mm_types.h | 4 ++ include/linux/xpfo.h | 88 ++++++++++++++++++++++++ lib/swiotlb.c | 3 +- mm/page_alloc.c | 7 +- 11 files changed, 323 insertions(+), 9 deletions(-) create mode 100644 arch/x86/mm/xpfo.c create mode 100644 include/linux/xpfo.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c46662f..9d32b4a 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 9b18ed9..1331da5 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT source "lib/Kconfig.debug" +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL + depends on X86_64 + select DEBUG_TLBFLUSH + ---help--- + This option offers protection against 'ret2dir' (kernel) attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, whenever page frames are freed/reclaimed, they + are mapped back to physmap. Special care is taken to minimize the + impact on performance by reducing TLB shootdowns and unnecessary page + zero fills. + + If in doubt, say "N". + config X86_VERBOSE_BOOTUP bool "Enable verbose x86 bootup info messages" default y diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index f9d38a4..8bf52b6 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_X86_INTEL_MPX) += mpx.o + +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 493f541..27fc8a6 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -150,7 +150,8 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ + !defined(CONFIG_XPFO) /* * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. * This will simplify cpa(), which otherwise needs to support splitting diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c new file mode 100644 index 0000000..6bc24d3 --- /dev/null +++ b/arch/x86/mm/xpfo.c @@ -0,0 +1,176 @@ +/* + * Copyright (C) 2016 Brown University. All rights reserved. + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * + * Authors: + * Vasileios P. Kemerlis + * Juerg Haefliger + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include + +#include +#include + +#define TEST_XPFO_FLAG(flag, page) \ + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define SET_XPFO_FLAG(flag, page) \ + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define CLEAR_XPFO_FLAG(flag, page) \ + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +inline void xpfo_clear_zap(struct page *page, int order) +{ + int i; + + for (i = 0; i < (1 << order); i++) + CLEAR_XPFO_FLAG(zap, page + i); +} + +inline int xpfo_test_and_clear_zap(struct page *page) +{ + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); +} + +inline int xpfo_test_kernel(struct page *page) +{ + return TEST_XPFO_FLAG(kernel, page); +} + +inline int xpfo_test_user(struct page *page) +{ + return TEST_XPFO_FLAG(user, page); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, tlb_shoot = 0; + unsigned long kaddr; + + for (i = 0; i < (1 << order); i++) { + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || + TEST_XPFO_FLAG(user, page + i)); + + if (gfp & GFP_HIGHUSER) { + /* Initialize the xpfo lock and map counter */ + spin_lock_init(&(page + i)->xpfo.lock); + atomic_set(&(page + i)->xpfo.mapcount, 0); + + /* Mark it as a user page */ + SET_XPFO_FLAG(user_fp, page + i); + + /* + * Shoot the TLB if the page was previously allocated + * to kernel space + */ + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) + tlb_shoot = 1; + } else { + /* Mark it as a kernel page */ + SET_XPFO_FLAG(kernel, page + i); + } + } + + if (tlb_shoot) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + unsigned long kaddr; + + for (i = 0; i < (1 << order); i++) { + + /* The page frame was previously allocated to user space */ + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { + kaddr = (unsigned long)page_address(page + i); + + /* Clear the page and mark it accordingly */ + clear_page((void *)kaddr); + SET_XPFO_FLAG(zap, page + i); + + /* Map it back to kernel space */ + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + + /* No TLB update */ + } + + /* Clear the xpfo fast-path flag */ + CLEAR_XPFO_FLAG(user_fp, page + i); + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + unsigned long flags; + + /* The page is allocated to kernel space, so nothing to do */ + if (TEST_XPFO_FLAG(kernel, page)) + return; + + spin_lock_irqsave(&page->xpfo.lock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB update required. 
+ */ + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && + TEST_XPFO_FLAG(user, page)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page->xpfo.lock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + unsigned long flags; + + /* The page is allocated to kernel space, so nothing to do */ + if (TEST_XPFO_FLAG(kernel, page)) + return; + + spin_lock_irqsave(&page->xpfo.lock, flags); + + /* + * The page frame is to be allocated back to user space. So unmap it + * from the kernel, update the TLB and mark it as a user page. + */ + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + SET_XPFO_FLAG(user, page); + } + + spin_unlock_irqrestore(&page->xpfo.lock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); diff --git a/block/blk-map.c b/block/blk-map.c index f565e11..b7b8302 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, prv.iov_len = iov.iov_len; } - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) + /* + * juergh: Temporary hack to force the use of a bounce buffer if XPFO + * is enabled. Results in an XPFO page fault otherwise. + */ + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || + IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f329..0ca9130 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); + pagefault_enable(); preempt_enable(); } @@ -133,7 +146,8 @@ do { \ static inline void clear_user_highpage(struct page *page, unsigned long vaddr) { void *addr = kmap_atomic(page); - clear_user_page(addr, vaddr, page); + if (!xpfo_test_and_clear_zap(page)) + clear_user_page(addr, vaddr, page); kunmap_atomic(addr); } #endif @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, static inline void clear_highpage(struct page *page) { void *kaddr = kmap_atomic(page); - clear_page(kaddr); + if (!xpfo_test_and_clear_zap(page)) + clear_page(kaddr); kunmap_atomic(kaddr); } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 624b78b..71c95aa 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -215,6 +216,9 @@ struct page { #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS int _last_cpupid; #endif +#ifdef CONFIG_XPFO + struct xpfo_info xpfo; +#endif } /* * The struct page can be forced to be double word aligned so that atomic ops diff --git 
a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 0000000..c4f0871 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,88 @@ +/* + * Copyright (C) 2016 Brown University. All rights reserved. + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * + * Authors: + * Vasileios P. Kemerlis + * Juerg Haefliger + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +/* + * XPFO page flags: + * + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag + * is used in the fast path, where the page is marked accordingly but *not* + * unmapped from the kernel. In most cases, the kernel will need access to the + * page immediately after its acquisition so an unnecessary mapping operation + * is avoided. + * + * PG_XPFO_user denotes that the page is destined for user space. This flag is + * used in the slow path, where the page needs to be mapped/unmapped when the + * kernel wants to access it. If a page is deallocated and this flag is set, + * the page is cleared and mapped back into the kernel. + * + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used + * for identifying pages that are first assigned to kernel space and then freed + * and mapped to user space. In such cases, an expensive TLB shootdown is + * necessary. Pages allocated to user space, freed, and subsequently allocated + * to user space again, require only local TLB invalidation. + * + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to + * avoid zapping pages multiple times. Whenever a page is freed and was + * previously mapped to user space, it needs to be zapped before mapped back + * in to the kernel. + */ + +enum xpfo_pageflags { + PG_XPFO_user_fp, + PG_XPFO_user, + PG_XPFO_kernel, + PG_XPFO_zap, +}; + +struct xpfo_info { + unsigned long flags; /* Flags for tracking the page's XPFO state */ + atomic_t mapcount; /* Counter for balancing page map/unmap + * requests. Only the first map request maps + * the page back to kernel space. Likewise, + * only the last unmap request unmaps the page. + */ + spinlock_t lock; /* Lock to serialize concurrent map/unmap + * requests. 
+ */ +}; + +extern void xpfo_clear_zap(struct page *page, int order); +extern int xpfo_test_and_clear_zap(struct page *page); +extern int xpfo_test_kernel(struct page *page); +extern int xpfo_test_user(struct page *page); + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +#else /* ifdef CONFIG_XPFO */ + +static inline void xpfo_clear_zap(struct page *page, int order) { } +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } +static inline int xpfo_test_kernel(struct page *page) { return 0; } +static inline int xpfo_test_user(struct page *page) { return 0; } + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +#endif /* ifdef CONFIG_XPFO */ + +#endif /* ifndef _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 76f29ec..cf57ee9 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_test_user(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 838ca8bb..47b42a3 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) } arch_free_page(page, order); kernel_map_pages(page, 1 << order, 0); + xpfo_free_page(page, order); return true; } @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, arch_alloc_page(page, order); kernel_map_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); if (gfp_flags & __GFP_ZERO) for (i = 0; i < (1 << order); i++) clear_highpage(page + i); + else + xpfo_clear_zap(page, order); if (order && (gfp_flags & __GFP_COMP)) prep_compound_page(page, order); @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + if (!cold && !xpfo_test_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); + pcp->count++; if (pcp->count >= pcp->high) { unsigned long batch = READ_ONCE(pcp->batch); -- 2.1.4 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f180.google.com (mail-qk0-f180.google.com [209.85.220.180]) by kanga.kvack.org (Postfix) with ESMTP id 1E2286B0005 for ; Mon, 29 Feb 2016 20:31:07 -0500 (EST) Received: by mail-qk0-f180.google.com with SMTP id s68so63907620qkh.3 for ; Mon, 29 Feb 2016 17:31:07 -0800 (PST) Received: from mx1.redhat.com (mx1.redhat.com. 
[209.132.183.28]) by mx.google.com with ESMTPS id o206si1764256qho.27.2016.02.29.17.31.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Feb 2016 17:31:06 -0800 (PST) Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> From: Laura Abbott Message-ID: <56D4F0D6.2060308@redhat.com> Date: Mon, 29 Feb 2016 17:31:02 -0800 MIME-Version: 1.0 In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu, Kees Cook On 02/26/2016 06:21 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userland, unless explicitly requested by the > kernel. Whenever a page destined for userland is allocated, it is > unmapped from physmap. When such a page is reclaimed from userland, it is > mapped back to physmap. > > Mapping/unmapping from physmap is accomplished by modifying the PTE > permission bits to allow/disallow access to the page. > > Additional fields are added to the page struct for XPFO housekeeping. > Specifically a flags field to distinguish user vs. kernel pages, a > reference counter to track physmap map/unmap operations and a lock to > protect the XPFO fields. > > Known issues/limitations: > - Only supported on x86-64. > - Only supports 4k pages. > - Adds additional data to the page struct. > - There are most likely some additional and legitimate uses cases where > the kernel needs to access userspace. Those need to be identified and > made XPFO-aware. > - There's a performance impact if XPFO is turned on. Per the paper > referenced below it's in the 1-3% ballpark. More performance testing > wouldn't hurt. What tests to run though? > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > General note: Make sure to cc the x86 maintainers on the next version of the patch. I'd also recommend ccing the kernel hardening list (see the wiki page http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project for details) If you can find a way to break this up into x86 specific vs. generic patches that would be better. Perhaps move the Kconfig for XPFO to the generic Kconfig layer and make it depend on ARCH_HAS_XPFO? x86 can then select ARCH_HAS_XPFO as the last option. There also isn't much that's actually x86 specific here except for some of the page table manipulation functions and even those can probably be abstracted away. It would be good to get more of this out of x86 to let other arches take advantage of it. The arm64 implementation would look pretty similar if you save the old kernel mapping and restore it on free. > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 2 +- > arch/x86/Kconfig.debug | 17 +++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/init.c | 3 +- > arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ > block/blk-map.c | 7 +- > include/linux/highmem.h | 23 +++++-- > include/linux/mm_types.h | 4 ++ > include/linux/xpfo.h | 88 ++++++++++++++++++++++++ > lib/swiotlb.c | 3 +- > mm/page_alloc.c | 7 +- > 11 files changed, 323 insertions(+), 9 deletions(-) > create mode 100644 arch/x86/mm/xpfo.c > create mode 100644 include/linux/xpfo.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index c46662f..9d32b4a 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug > index 9b18ed9..1331da5 100644 > --- a/arch/x86/Kconfig.debug > +++ b/arch/x86/Kconfig.debug > @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT > > source "lib/Kconfig.debug" > > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on DEBUG_KERNEL > + depends on X86_64 > + select DEBUG_TLBFLUSH > + ---help--- > + This option offers protection against 'ret2dir' (kernel) attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, whenever page frames are freed/reclaimed, they > + are mapped back to physmap. Special care is taken to minimize the > + impact on performance by reducing TLB shootdowns and unnecessary page > + zero fills. > + > + If in doubt, say "N". > + > config X86_VERBOSE_BOOTUP > bool "Enable verbose x86 bootup info messages" > default y > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index f9d38a4..8bf52b6 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > + > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 493f541..27fc8a6 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -150,7 +150,8 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ > + !defined(CONFIG_XPFO) > /* > * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. > * This will simplify cpa(), which otherwise needs to support splitting > diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c > new file mode 100644 > index 0000000..6bc24d3 > --- /dev/null > +++ b/arch/x86/mm/xpfo.c > @@ -0,0 +1,176 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#include > +#include > + > +#include > +#include > + > +#define TEST_XPFO_FLAG(flag, page) \ > + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define SET_XPFO_FLAG(flag, page) \ > + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define CLEAR_XPFO_FLAG(flag, page) \ > + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ > + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +inline void xpfo_clear_zap(struct page *page, int order) > +{ > + int i; > + > + for (i = 0; i < (1 << order); i++) > + CLEAR_XPFO_FLAG(zap, page + i); > +} > + > +inline int xpfo_test_and_clear_zap(struct page *page) > +{ > + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); > +} > + > +inline int xpfo_test_kernel(struct page *page) > +{ > + return TEST_XPFO_FLAG(kernel, page); > +} > + > +inline int xpfo_test_user(struct page *page) > +{ > + return TEST_XPFO_FLAG(user, page); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, tlb_shoot = 0; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || > + TEST_XPFO_FLAG(user, page + i)); > + > + if (gfp & GFP_HIGHUSER) { This check doesn't seem right. If the GFP flags have _any_ in common with GFP_HIGHUSER it will be marked as a user page so GFP_KERNEL will be marked as well. > + /* Initialize the xpfo lock and map counter */ > + spin_lock_init(&(page + i)->xpfo.lock); This is initializing the spin_lock every time. That's not really necessary. > + atomic_set(&(page + i)->xpfo.mapcount, 0); > + > + /* Mark it as a user page */ > + SET_XPFO_FLAG(user_fp, page + i); > + > + /* > + * Shoot the TLB if the page was previously allocated > + * to kernel space > + */ > + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) > + tlb_shoot = 1; > + } else { > + /* Mark it as a kernel page */ > + SET_XPFO_FLAG(kernel, page + i); > + } > + } > + > + if (tlb_shoot) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + > + /* The page frame was previously allocated to user space */ > + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { > + kaddr = (unsigned long)page_address(page + i); > + > + /* Clear the page and mark it accordingly */ > + clear_page((void *)kaddr); Clearing the page isn't related to XPFO. There's other work ongoing to do clearing of the page on free. 
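(For comparison, the later v3 revision quoted earlier in this archive drops the clear_page()/zap handling from the free path altogether; its xpfo_free_page() only restores the kernel mapping for pages that had been handed to user space, roughly:

	if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) {
		kaddr = (unsigned long)page_address(page + i);
		set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL));
	}

and leaves the zeroing to the generic page allocator paths.)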
> + SET_XPFO_FLAG(zap, page + i); > + > + /* Map it back to kernel space */ > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + > + /* No TLB update */ > + } > + > + /* Clear the xpfo fast-path flag */ > + CLEAR_XPFO_FLAG(user_fp, page + i); > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB update required. > + */ > + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && > + TEST_XPFO_FLAG(user, page)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page frame is to be allocated back to user space. So unmap it > + * from the kernel, update the TLB and mark it as a user page. > + */ > + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && > + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + SET_XPFO_FLAG(user, page); > + } > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); I'm confused by the checks in kmap/kunmap here. It looks like once the page is allocated there is no changing of flags between user and kernel mode so the checks for if the page is user seem redundant. > diff --git a/block/blk-map.c b/block/blk-map.c > index f565e11..b7b8302 100644 > --- a/block/blk-map.c > +++ b/block/blk-map.c > @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, > prv.iov_len = iov.iov_len; > } > > - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) > + /* > + * juergh: Temporary hack to force the use of a bounce buffer if XPFO > + * is enabled. Results in an XPFO page fault otherwise. 
> + */ > + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || > + IS_ENABLED(CONFIG_XPFO)) > bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); > else > bio = bio_map_user_iov(q, iter, gfp_mask); > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f329..0ca9130 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > + > pagefault_enable(); > preempt_enable(); > } > @@ -133,7 +146,8 @@ do { \ > static inline void clear_user_highpage(struct page *page, unsigned long vaddr) > { > void *addr = kmap_atomic(page); > - clear_user_page(addr, vaddr, page); > + if (!xpfo_test_and_clear_zap(page)) > + clear_user_page(addr, vaddr, page); > kunmap_atomic(addr); > } > #endif > @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, > static inline void clear_highpage(struct page *page) > { > void *kaddr = kmap_atomic(page); > - clear_page(kaddr); > + if (!xpfo_test_and_clear_zap(page)) > + clear_page(kaddr); > kunmap_atomic(kaddr); > } > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 624b78b..71c95aa 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -215,6 +216,9 @@ struct page { > #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS > int _last_cpupid; > #endif > +#ifdef CONFIG_XPFO > + struct xpfo_info xpfo; > +#endif > } > /* > * The struct page can be forced to be double word aligned so that atomic ops > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 0000000..c4f0871 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,88 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +/* > + * XPFO page flags: > + * > + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag > + * is used in the fast path, where the page is marked accordingly but *not* > + * unmapped from the kernel. In most cases, the kernel will need access to the > + * page immediately after its acquisition so an unnecessary mapping operation > + * is avoided. > + * > + * PG_XPFO_user denotes that the page is destined for user space. 
This flag is > + * used in the slow path, where the page needs to be mapped/unmapped when the > + * kernel wants to access it. If a page is deallocated and this flag is set, > + * the page is cleared and mapped back into the kernel. > + * > + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used > + * for identifying pages that are first assigned to kernel space and then freed > + * and mapped to user space. In such cases, an expensive TLB shootdown is > + * necessary. Pages allocated to user space, freed, and subsequently allocated > + * to user space again, require only local TLB invalidation. > + * > + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to > + * avoid zapping pages multiple times. Whenever a page is freed and was > + * previously mapped to user space, it needs to be zapped before mapped back > + * in to the kernel. > + */ 'zap' doesn't really indicate what is actually happening with the page. Can you be a bit more descriptive about what this actually does? > + > +enum xpfo_pageflags { > + PG_XPFO_user_fp, > + PG_XPFO_user, > + PG_XPFO_kernel, > + PG_XPFO_zap, > +}; > + > +struct xpfo_info { > + unsigned long flags; /* Flags for tracking the page's XPFO state */ > + atomic_t mapcount; /* Counter for balancing page map/unmap > + * requests. Only the first map request maps > + * the page back to kernel space. Likewise, > + * only the last unmap request unmaps the page. > + */ > + spinlock_t lock; /* Lock to serialize concurrent map/unmap > + * requests. > + */ > +}; Can you change this to use the page_ext implementation? See what mm/page_owner.c does. This might lessen the impact of the extra page metadata. This metadata still feels like a copy of what mm/highmem.c is trying to do though. > + > +extern void xpfo_clear_zap(struct page *page, int order); > +extern int xpfo_test_and_clear_zap(struct page *page); > +extern int xpfo_test_kernel(struct page *page); > +extern int xpfo_test_user(struct page *page); > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +#else /* ifdef CONFIG_XPFO */ > + > +static inline void xpfo_clear_zap(struct page *page, int order) { } > +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } > +static inline int xpfo_test_kernel(struct page *page) { return 0; } > +static inline int xpfo_test_user(struct page *page) { return 0; } > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +#endif /* ifdef CONFIG_XPFO */ > + > +#endif /* ifndef _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 76f29ec..cf57ee9 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_test_user(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 838ca8bb..47b42a3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) > } > arch_free_page(page, order); > kernel_map_pages(page, 1 << order, 0); > + xpfo_free_page(page, order); > > return true; > } > @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, > arch_alloc_page(page, order); > kernel_map_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > > if (gfp_flags & __GFP_ZERO) > for (i = 0; i < (1 << order); i++) > clear_highpage(page + i); > + else > + xpfo_clear_zap(page, order); > > if (order && (gfp_flags & __GFP_COMP)) > prep_compound_page(page, order); > @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) > } > > pcp = &this_cpu_ptr(zone->pageset)->pcp; > - if (!cold) > + if (!cold && !xpfo_test_kernel(page)) > list_add(&page->lru, &pcp->lists[migratetype]); > else > list_add_tail(&page->lru, &pcp->lists[migratetype]); > + What's the advantage of this? > pcp->count++; > if (pcp->count >= pcp->high) { > unsigned long batch = READ_ONCE(pcp->batch); > Thanks, Laura -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f172.google.com (mail-pf0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id BEC446B0005 for ; Mon, 29 Feb 2016 21:10:35 -0500 (EST) Received: by mail-pf0-f172.google.com with SMTP id w128so58015977pfb.2 for ; Mon, 29 Feb 2016 18:10:35 -0800 (PST) Received: from mail-pa0-x234.google.com (mail-pa0-x234.google.com. [2607:f8b0:400e:c03::234]) by mx.google.com with ESMTPS id 72si31535215pfi.35.2016.02.29.18.10.34 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 29 Feb 2016 18:10:34 -0800 (PST) Received: by mail-pa0-x234.google.com with SMTP id fy10so102238510pac.1 for ; Mon, 29 Feb 2016 18:10:34 -0800 (PST) Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> From: Balbir Singh Message-ID: <56D4FA15.9060700@gmail.com> Date: Tue, 1 Mar 2016 13:10:29 +1100 MIME-Version: 1.0 In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu On 27/02/16 01:21, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userland, unless explicitly requested by the > kernel. Whenever a page destined for userland is allocated, it is > unmapped from physmap. When such a page is reclaimed from userland, it is > mapped back to physmap. physmap == xen physmap? Please clarify > Mapping/unmapping from physmap is accomplished by modifying the PTE > permission bits to allow/disallow access to the page. > > Additional fields are added to the page struct for XPFO housekeeping. 
> Specifically a flags field to distinguish user vs. kernel pages, a > reference counter to track physmap map/unmap operations and a lock to > protect the XPFO fields. > > Known issues/limitations: > - Only supported on x86-64. Is it due to lack of porting or a design limitation? > - Only supports 4k pages. > - Adds additional data to the page struct. > - There are most likely some additional and legitimate uses cases where > the kernel needs to access userspace. Those need to be identified and > made XPFO-aware. Why not build an audit mode for it? > - There's a performance impact if XPFO is turned on. Per the paper > referenced below it's in the 1-3% ballpark. More performance testing > wouldn't hurt. What tests to run though? > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. Kemerlis > Signed-off-by: Juerg Haefliger This patch needs to be broken down into smaller patches - a series > --- > arch/x86/Kconfig | 2 +- > arch/x86/Kconfig.debug | 17 +++++ > arch/x86/mm/Makefile | 2 + > arch/x86/mm/init.c | 3 +- > arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ > block/blk-map.c | 7 +- > include/linux/highmem.h | 23 +++++-- > include/linux/mm_types.h | 4 ++ > include/linux/xpfo.h | 88 ++++++++++++++++++++++++ > lib/swiotlb.c | 3 +- > mm/page_alloc.c | 7 +- > 11 files changed, 323 insertions(+), 9 deletions(-) > create mode 100644 arch/x86/mm/xpfo.c > create mode 100644 include/linux/xpfo.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index c46662f..9d32b4a 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug > index 9b18ed9..1331da5 100644 > --- a/arch/x86/Kconfig.debug > +++ b/arch/x86/Kconfig.debug > @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT > > source "lib/Kconfig.debug" > > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on DEBUG_KERNEL > + depends on X86_64 > + select DEBUG_TLBFLUSH > + ---help--- > + This option offers protection against 'ret2dir' (kernel) attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, whenever page frames are freed/reclaimed, they > + are mapped back to physmap. Special care is taken to minimize the > + impact on performance by reducing TLB shootdowns and unnecessary page > + zero fills. > + > + If in doubt, say "N". 
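(Side note: the later v3 revision quoted earlier in this archive moves this option out of arch/x86/Kconfig.debug into security/Kconfig, along the lines of

	config ARCH_SUPPORTS_XPFO
		bool

	config XPFO
		bool "Enable eXclusive Page Frame Ownership (XPFO)"
		default n
		depends on ARCH_SUPPORTS_XPFO
		select PAGE_EXTENSION

with x86 selecting ARCH_SUPPORTS_XPFO if X86_64, which matches the earlier suggestion in this thread to keep the generic option separate from the arch-specific parts.)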
> + > config X86_VERBOSE_BOOTUP > bool "Enable verbose x86 bootup info messages" > default y > diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile > index f9d38a4..8bf52b6 100644 > --- a/arch/x86/mm/Makefile > +++ b/arch/x86/mm/Makefile > @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o > obj-$(CONFIG_NUMA_EMU) += numa_emulation.o > > obj-$(CONFIG_X86_INTEL_MPX) += mpx.o > + > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 493f541..27fc8a6 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -150,7 +150,8 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ > + !defined(CONFIG_XPFO) > /* > * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. > * This will simplify cpa(), which otherwise needs to support splitting > diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c > new file mode 100644 > index 0000000..6bc24d3 > --- /dev/null > +++ b/arch/x86/mm/xpfo.c > @@ -0,0 +1,176 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > + > +#include > +#include > + > +#define TEST_XPFO_FLAG(flag, page) \ > + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define SET_XPFO_FLAG(flag, page) \ > + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define CLEAR_XPFO_FLAG(flag, page) \ > + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ > + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +inline void xpfo_clear_zap(struct page *page, int order) > +{ > + int i; > + > + for (i = 0; i < (1 << order); i++) > + CLEAR_XPFO_FLAG(zap, page + i); > +} > + > +inline int xpfo_test_and_clear_zap(struct page *page) > +{ > + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); > +} > + > +inline int xpfo_test_kernel(struct page *page) > +{ > + return TEST_XPFO_FLAG(kernel, page); > +} > + > +inline int xpfo_test_user(struct page *page) > +{ > + return TEST_XPFO_FLAG(user, page); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, tlb_shoot = 0; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || > + TEST_XPFO_FLAG(user, page + i)); > + > + if (gfp & GFP_HIGHUSER) { Why GFP_HIGHUSER? 
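(GFP_HIGHUSER is a composite mask, GFP_USER | __GFP_HIGHMEM, so "gfp & GFP_HIGHUSER" is non-zero for nearly every allocation, including plain GFP_KERNEL ones, as was also pointed out in the earlier review. The later v3 revision quoted earlier in this archive tightens this to an exact match:

	if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) {
		/*
		 * Flush the TLB if the page was previously allocated
		 * to the kernel.
		 */
		if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL,
				       &page_ext->flags))
			flush_tlb = 1;
	} else {
		/* Tag the page as a kernel page */
		set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags);
	}

so only allocations that carry all of the GFP_HIGHUSER bits are treated as user pages.)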
> + /* Initialize the xpfo lock and map counter */ > + spin_lock_init(&(page + i)->xpfo.lock); > + atomic_set(&(page + i)->xpfo.mapcount, 0); > + > + /* Mark it as a user page */ > + SET_XPFO_FLAG(user_fp, page + i); > + > + /* > + * Shoot the TLB if the page was previously allocated > + * to kernel space > + */ > + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) > + tlb_shoot = 1; > + } else { > + /* Mark it as a kernel page */ > + SET_XPFO_FLAG(kernel, page + i); > + } > + } > + > + if (tlb_shoot) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + unsigned long kaddr; > + > + for (i = 0; i < (1 << order); i++) { > + > + /* The page frame was previously allocated to user space */ > + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { > + kaddr = (unsigned long)page_address(page + i); > + > + /* Clear the page and mark it accordingly */ > + clear_page((void *)kaddr); > + SET_XPFO_FLAG(zap, page + i); > + > + /* Map it back to kernel space */ > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + > + /* No TLB update */ > + } > + > + /* Clear the xpfo fast-path flag */ > + CLEAR_XPFO_FLAG(user_fp, page + i); > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB update required. > + */ > + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && > + TEST_XPFO_FLAG(user, page)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + unsigned long flags; > + > + /* The page is allocated to kernel space, so nothing to do */ > + if (TEST_XPFO_FLAG(kernel, page)) > + return; > + > + spin_lock_irqsave(&page->xpfo.lock, flags); > + > + /* > + * The page frame is to be allocated back to user space. So unmap it > + * from the kernel, update the TLB and mark it as a user page. > + */ > + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && > + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + SET_XPFO_FLAG(user, page); > + } > + > + spin_unlock_irqrestore(&page->xpfo.lock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > diff --git a/block/blk-map.c b/block/blk-map.c > index f565e11..b7b8302 100644 > --- a/block/blk-map.c > +++ b/block/blk-map.c > @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, > prv.iov_len = iov.iov_len; > } > > - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) > + /* > + * juergh: Temporary hack to force the use of a bounce buffer if XPFO > + * is enabled. Results in an XPFO page fault otherwise. 
> + */ This does look like it might add a bunch of overhead > + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || > + IS_ENABLED(CONFIG_XPFO)) > bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); > else > bio = bio_map_user_iov(q, iter, gfp_mask); > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f329..0ca9130 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > + > pagefault_enable(); > preempt_enable(); > } > @@ -133,7 +146,8 @@ do { \ > static inline void clear_user_highpage(struct page *page, unsigned long vaddr) > { > void *addr = kmap_atomic(page); > - clear_user_page(addr, vaddr, page); > + if (!xpfo_test_and_clear_zap(page)) > + clear_user_page(addr, vaddr, page); > kunmap_atomic(addr); > } > #endif > @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, > static inline void clear_highpage(struct page *page) > { > void *kaddr = kmap_atomic(page); > - clear_page(kaddr); > + if (!xpfo_test_and_clear_zap(page)) > + clear_page(kaddr); > kunmap_atomic(kaddr); > } > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h > index 624b78b..71c95aa 100644 > --- a/include/linux/mm_types.h > +++ b/include/linux/mm_types.h > @@ -12,6 +12,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -215,6 +216,9 @@ struct page { > #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS > int _last_cpupid; > #endif > +#ifdef CONFIG_XPFO > + struct xpfo_info xpfo; > +#endif > } > /* > * The struct page can be forced to be double word aligned so that atomic ops > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 0000000..c4f0871 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,88 @@ > +/* > + * Copyright (C) 2016 Brown University. All rights reserved. > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * > + * Authors: > + * Vasileios P. Kemerlis > + * Juerg Haefliger > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +/* > + * XPFO page flags: > + * > + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag > + * is used in the fast path, where the page is marked accordingly but *not* > + * unmapped from the kernel. In most cases, the kernel will need access to the > + * page immediately after its acquisition so an unnecessary mapping operation > + * is avoided. > + * > + * PG_XPFO_user denotes that the page is destined for user space. 
This flag is > + * used in the slow path, where the page needs to be mapped/unmapped when the > + * kernel wants to access it. If a page is deallocated and this flag is set, > + * the page is cleared and mapped back into the kernel. > + * > + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used > + * for identifying pages that are first assigned to kernel space and then freed > + * and mapped to user space. In such cases, an expensive TLB shootdown is > + * necessary. Pages allocated to user space, freed, and subsequently allocated > + * to user space again, require only local TLB invalidation. > + * > + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to > + * avoid zapping pages multiple times. Whenever a page is freed and was > + * previously mapped to user space, it needs to be zapped before mapped back > + * in to the kernel. > + */ > + > +enum xpfo_pageflags { > + PG_XPFO_user_fp, > + PG_XPFO_user, > + PG_XPFO_kernel, > + PG_XPFO_zap, > +}; > + > +struct xpfo_info { > + unsigned long flags; /* Flags for tracking the page's XPFO state */ > + atomic_t mapcount; /* Counter for balancing page map/unmap > + * requests. Only the first map request maps > + * the page back to kernel space. Likewise, > + * only the last unmap request unmaps the page. > + */ > + spinlock_t lock; /* Lock to serialize concurrent map/unmap > + * requests. > + */ > +}; > + > +extern void xpfo_clear_zap(struct page *page, int order); > +extern int xpfo_test_and_clear_zap(struct page *page); > +extern int xpfo_test_kernel(struct page *page); > +extern int xpfo_test_user(struct page *page); > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +#else /* ifdef CONFIG_XPFO */ > + > +static inline void xpfo_clear_zap(struct page *page, int order) { } > +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } > +static inline int xpfo_test_kernel(struct page *page) { return 0; } > +static inline int xpfo_test_user(struct page *page) { return 0; } > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +#endif /* ifdef CONFIG_XPFO */ > + > +#endif /* ifndef _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 76f29ec..cf57ee9 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_test_user(page)) { > /* The buffer does not have a mapping. 
Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 838ca8bb..47b42a3 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) > } > arch_free_page(page, order); > kernel_map_pages(page, 1 << order, 0); > + xpfo_free_page(page, order); > > return true; > } > @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, > arch_alloc_page(page, order); > kernel_map_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > > if (gfp_flags & __GFP_ZERO) > for (i = 0; i < (1 << order); i++) > clear_highpage(page + i); > + else > + xpfo_clear_zap(page, order); > > if (order && (gfp_flags & __GFP_COMP)) > prep_compound_page(page, order); > @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) > } > > pcp = &this_cpu_ptr(zone->pageset)->pcp; > - if (!cold) > + if (!cold && !xpfo_test_kernel(page)) > list_add(&page->lru, &pcp->lists[migratetype]); > else > list_add_tail(&page->lru, &pcp->lists[migratetype]); > + > pcp->count++; > if (pcp->count >= pcp->high) { > unsigned long batch = READ_ONCE(pcp->batch); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f172.google.com (mail-pf0-f172.google.com [209.85.192.172]) by kanga.kvack.org (Postfix) with ESMTP id 0615B6B0005 for ; Mon, 21 Mar 2016 04:38:10 -0400 (EDT) Received: by mail-pf0-f172.google.com with SMTP id n5so256838813pfn.2 for ; Mon, 21 Mar 2016 01:38:09 -0700 (PDT) Received: from g1t6213.austin.hp.com (g1t6213.austin.hp.com. [15.73.96.121]) by mx.google.com with ESMTPS id n69si12069371pfi.104.2016.03.21.01.38.08 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Mar 2016 01:38:08 -0700 (PDT) From: Juerg Haefliger Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4F0D6.2060308@redhat.com> Message-ID: <56EFB2DB.3090602@hpe.com> Date: Mon, 21 Mar 2016 09:37:47 +0100 MIME-Version: 1.0 In-Reply-To: <56D4F0D6.2060308@redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Laura Abbott , linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu, Kees Cook Hi Laura, Sorry for the late reply. I was on FTO and then traveling for the past couple of days. On 03/01/2016 02:31 AM, Laura Abbott wrote: > On 02/26/2016 06:21 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userland, unless explicitly requested by the >> kernel. Whenever a page destined for userland is allocated, it is >> unmapped from physmap. When such a page is reclaimed from userland, it is >> mapped back to physmap. >> >> Mapping/unmapping from physmap is accomplished by modifying the PTE >> permission bits to allow/disallow access to the page. >> >> Additional fields are added to the page struct for XPFO housekeeping. >> Specifically a flags field to distinguish user vs. 
kernel pages, a >> reference counter to track physmap map/unmap operations and a lock to >> protect the XPFO fields. >> >> Known issues/limitations: >> - Only supported on x86-64. >> - Only supports 4k pages. >> - Adds additional data to the page struct. >> - There are most likely some additional and legitimate uses cases where >> the kernel needs to access userspace. Those need to be identified and >> made XPFO-aware. >> - There's a performance impact if XPFO is turned on. Per the paper >> referenced below it's in the 1-3% ballpark. More performance testing >> wouldn't hurt. What tests to run though? >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >> > > General note: Make sure to cc the x86 maintainers on the next version of > the patch. I'd also recommend ccing the kernel hardening list (see the wiki > page http://kernsec.org/wiki/index.php/Kernel_Self_Protection_Project for > details) Good idea. Thanks for the suggestion. > If you can find a way to break this up into x86 specific vs. generic patches > that would be better. Perhaps move the Kconfig for XPFO to the generic > Kconfig layer and make it depend on ARCH_HAS_XPFO? x86 can then select > ARCH_HAS_XPFO as the last option. Good idea. > There also isn't much that's actually x86 specific here except for > some of the page table manipulation functions and even those can probably > be abstracted away. It would be good to get more of this out of x86 to > let other arches take advantage of it. The arm64 implementation would > look pretty similar if you save the old kernel mapping and restore > it on free. OK. I need to familiarize myself with ARM to figure out which pieces can move out of the arch subdir. > >> Suggested-by: Vasileios P. Kemerlis >> Signed-off-by: Juerg Haefliger >> --- >> arch/x86/Kconfig | 2 +- >> arch/x86/Kconfig.debug | 17 +++++ >> arch/x86/mm/Makefile | 2 + >> arch/x86/mm/init.c | 3 +- >> arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ >> block/blk-map.c | 7 +- >> include/linux/highmem.h | 23 +++++-- >> include/linux/mm_types.h | 4 ++ >> include/linux/xpfo.h | 88 ++++++++++++++++++++++++ >> lib/swiotlb.c | 3 +- >> mm/page_alloc.c | 7 +- >> 11 files changed, 323 insertions(+), 9 deletions(-) >> create mode 100644 arch/x86/mm/xpfo.c >> create mode 100644 include/linux/xpfo.h >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index c46662f..9d32b4a 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT >> >> config X86_DIRECT_GBPAGES >> def_bool y >> - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK >> + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO >> ---help--- >> Certain kernel features effectively disable kernel >> linear 1 GB mappings (even if the CPU otherwise >> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug >> index 9b18ed9..1331da5 100644 >> --- a/arch/x86/Kconfig.debug >> +++ b/arch/x86/Kconfig.debug >> @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT >> >> source "lib/Kconfig.debug" >> >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on DEBUG_KERNEL >> + depends on X86_64 >> + select DEBUG_TLBFLUSH >> + ---help--- >> + This option offers protection against 'ret2dir' (kernel) attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). 
Similarly, whenever page frames are freed/reclaimed, they >> + are mapped back to physmap. Special care is taken to minimize the >> + impact on performance by reducing TLB shootdowns and unnecessary page >> + zero fills. >> + >> + If in doubt, say "N". >> + >> config X86_VERBOSE_BOOTUP >> bool "Enable verbose x86 bootup info messages" >> default y >> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile >> index f9d38a4..8bf52b6 100644 >> --- a/arch/x86/mm/Makefile >> +++ b/arch/x86/mm/Makefile >> @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o >> obj-$(CONFIG_NUMA_EMU) += numa_emulation.o >> >> obj-$(CONFIG_X86_INTEL_MPX) += mpx.o >> + >> +obj-$(CONFIG_XPFO) += xpfo.o >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c >> index 493f541..27fc8a6 100644 >> --- a/arch/x86/mm/init.c >> +++ b/arch/x86/mm/init.c >> @@ -150,7 +150,8 @@ static int page_size_mask; >> >> static void __init probe_page_size_mask(void) >> { >> -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) >> +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ >> + !defined(CONFIG_XPFO) >> /* >> * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. >> * This will simplify cpa(), which otherwise needs to support splitting >> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c >> new file mode 100644 >> index 0000000..6bc24d3 >> --- /dev/null >> +++ b/arch/x86/mm/xpfo.c >> @@ -0,0 +1,176 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. 
>> + */ >> + >> +#include >> +#include >> + >> +#include >> +#include >> + >> +#define TEST_XPFO_FLAG(flag, page) \ >> + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define SET_XPFO_FLAG(flag, page) \ >> + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define CLEAR_XPFO_FLAG(flag, page) \ >> + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ >> + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte = lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level != PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); >> +} >> + >> +inline void xpfo_clear_zap(struct page *page, int order) >> +{ >> + int i; >> + >> + for (i = 0; i < (1 << order); i++) >> + CLEAR_XPFO_FLAG(zap, page + i); >> +} >> + >> +inline int xpfo_test_and_clear_zap(struct page *page) >> +{ >> + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); >> +} >> + >> +inline int xpfo_test_kernel(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(kernel, page); >> +} >> + >> +inline int xpfo_test_user(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(user, page); >> +} >> + >> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) >> +{ >> + int i, tlb_shoot = 0; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || >> + TEST_XPFO_FLAG(user, page + i)); >> + >> + if (gfp & GFP_HIGHUSER) { > > This check doesn't seem right. If the GFP flags have _any_ in common with > GFP_HIGHUSER it will be marked as a user page so GFP_KERNEL will be marked > as well. Duh. You're right. I broke this when I cleaned up the original patch. It should be: (gfp & GFP_HIGHUSER) == GFP_HIGHUSER >> + /* Initialize the xpfo lock and map counter */ >> + spin_lock_init(&(page + i)->xpfo.lock); > > This is initializing the spin_lock every time. That's not really necessary. Correct. The initialization should probably be done when the page struct is first allocated. But I haven't been able to find that piece of code quickly. Will look again. >> + atomic_set(&(page + i)->xpfo.mapcount, 0); >> + >> + /* Mark it as a user page */ >> + SET_XPFO_FLAG(user_fp, page + i); >> + >> + /* >> + * Shoot the TLB if the page was previously allocated >> + * to kernel space >> + */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) >> + tlb_shoot = 1; >> + } else { >> + /* Mark it as a kernel page */ >> + SET_XPFO_FLAG(kernel, page + i); >> + } >> + } >> + >> + if (tlb_shoot) { >> + kaddr = (unsigned long)page_address(page); >> + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * >> + PAGE_SIZE); >> + } >> +} >> + >> +void xpfo_free_page(struct page *page, int order) >> +{ >> + int i; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + >> + /* The page frame was previously allocated to user space */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >> + kaddr = (unsigned long)page_address(page + i); >> + >> + /* Clear the page and mark it accordingly */ >> + clear_page((void *)kaddr); > > Clearing the page isn't related to XPFO. There's other work ongoing to > do clearing of the page on free. It's not strictly related to XPFO but adds another layer of security. 
Do you happen to have a pointer to the ongoing work that you mentioned? >> + SET_XPFO_FLAG(zap, page + i); >> + >> + /* Map it back to kernel space */ >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + /* No TLB update */ >> + } >> + >> + /* Clear the xpfo fast-path flag */ >> + CLEAR_XPFO_FLAG(user_fp, page + i); >> + } >> +} >> + >> +void xpfo_kmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page was previously allocated to user space, so map it back >> + * into the kernel. No TLB update required. >> + */ >> + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && >> + TEST_XPFO_FLAG(user, page)) >> + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kmap); >> + >> +void xpfo_kunmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page frame is to be allocated back to user space. So unmap it >> + * from the kernel, update the TLB and mark it as a user page. >> + */ >> + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && >> + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); >> + SET_XPFO_FLAG(user, page); >> + } >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kunmap); > > I'm confused by the checks in kmap/kunmap here. It looks like once the > page is allocated there is no changing of flags between user and > kernel mode so the checks for if the page is user seem redundant. Hmm... I think you're partially right. In xpfo_kmap we need to distinguish between user and user_fp, so the check for 'user' is necessary. However, in kunmap we can drop the check for 'user' || 'user_fp'. >> diff --git a/block/blk-map.c b/block/blk-map.c >> index f565e11..b7b8302 100644 >> --- a/block/blk-map.c >> +++ b/block/blk-map.c >> @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct >> request *rq, >> prv.iov_len = iov.iov_len; >> } >> >> - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) >> + /* >> + * juergh: Temporary hack to force the use of a bounce buffer if XPFO >> + * is enabled. Results in an XPFO page fault otherwise. 
>> + */ >> + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || >> + IS_ENABLED(CONFIG_XPFO)) >> bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); >> else >> bio = bio_map_user_iov(q, iter, gfp_mask); >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h >> index bb3f329..0ca9130 100644 >> --- a/include/linux/highmem.h >> +++ b/include/linux/highmem.h >> @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) >> #ifndef ARCH_HAS_KMAP >> static inline void *kmap(struct page *page) >> { >> + void *kaddr; >> + >> might_sleep(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> >> static inline void kunmap(struct page *page) >> { >> + xpfo_kunmap(page_address(page), page); >> } >> >> static inline void *kmap_atomic(struct page *page) >> { >> + void *kaddr; >> + >> preempt_disable(); >> pagefault_disable(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> #define kmap_atomic_prot(page, prot) kmap_atomic(page) >> >> static inline void __kunmap_atomic(void *addr) >> { >> + xpfo_kunmap(addr, virt_to_page(addr)); >> + >> pagefault_enable(); >> preempt_enable(); >> } >> @@ -133,7 +146,8 @@ do >> { \ >> static inline void clear_user_highpage(struct page *page, unsigned long vaddr) >> { >> void *addr = kmap_atomic(page); >> - clear_user_page(addr, vaddr, page); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_user_page(addr, vaddr, page); >> kunmap_atomic(addr); >> } >> #endif >> @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct >> *vma, >> static inline void clear_highpage(struct page *page) >> { >> void *kaddr = kmap_atomic(page); >> - clear_page(kaddr); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_page(kaddr); >> kunmap_atomic(kaddr); >> } >> >> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h >> index 624b78b..71c95aa 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> >> @@ -215,6 +216,9 @@ struct page { >> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS >> int _last_cpupid; >> #endif >> +#ifdef CONFIG_XPFO >> + struct xpfo_info xpfo; >> +#endif >> } >> /* >> * The struct page can be forced to be double word aligned so that atomic ops >> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h >> new file mode 100644 >> index 0000000..c4f0871 >> --- /dev/null >> +++ b/include/linux/xpfo.h >> @@ -0,0 +1,88 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. >> + */ >> + >> +#ifndef _LINUX_XPFO_H >> +#define _LINUX_XPFO_H >> + >> +#ifdef CONFIG_XPFO >> + >> +/* >> + * XPFO page flags: >> + * >> + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag >> + * is used in the fast path, where the page is marked accordingly but *not* >> + * unmapped from the kernel. In most cases, the kernel will need access to the >> + * page immediately after its acquisition so an unnecessary mapping operation >> + * is avoided. 
>> + * >> + * PG_XPFO_user denotes that the page is destined for user space. This flag is >> + * used in the slow path, where the page needs to be mapped/unmapped when the >> + * kernel wants to access it. If a page is deallocated and this flag is set, >> + * the page is cleared and mapped back into the kernel. >> + * >> + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used >> + * for identifying pages that are first assigned to kernel space and then freed >> + * and mapped to user space. In such cases, an expensive TLB shootdown is >> + * necessary. Pages allocated to user space, freed, and subsequently allocated >> + * to user space again, require only local TLB invalidation. >> + * >> + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to >> + * avoid zapping pages multiple times. Whenever a page is freed and was >> + * previously mapped to user space, it needs to be zapped before mapped back >> + * in to the kernel. >> + */ > > 'zap' doesn't really indicate what is actually happening with the page. Can you > be a bit more descriptive about what this actually does? It means that the page has been cleared at the time it was released back to the free pool. To prevent multiple expensive cleaning operations. But this might go away because of the ongoing work of sanitizing pages that you mentioned. >> + >> +enum xpfo_pageflags { >> + PG_XPFO_user_fp, >> + PG_XPFO_user, >> + PG_XPFO_kernel, >> + PG_XPFO_zap, >> +}; >> + >> +struct xpfo_info { >> + unsigned long flags; /* Flags for tracking the page's XPFO state */ >> + atomic_t mapcount; /* Counter for balancing page map/unmap >> + * requests. Only the first map request maps >> + * the page back to kernel space. Likewise, >> + * only the last unmap request unmaps the page. >> + */ >> + spinlock_t lock; /* Lock to serialize concurrent map/unmap >> + * requests. >> + */ >> +}; > > Can you change this to use the page_ext implementation? See what > mm/page_owner.c does. This might lessen the impact of the extra > page metadata. This metadata still feels like a copy of what > mm/highmem.c is trying to do though. I'll look into that, thanks for the pointer. 
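As a rough sketch of what that could look like (the field names below are purely illustrative, not an actual implementation): the per-page XPFO state would move out of struct page and into struct page_ext, with the state bits reusing the existing page_ext->flags word:

	/* Hypothetical extra fields in struct page_ext, under CONFIG_XPFO */
	#ifdef CONFIG_XPFO
		atomic_t xpfo_mapcount;		/* balances physmap map/unmap requests */
		spinlock_t xpfo_lock;		/* serializes concurrent map/unmap */
	#endif

	/* looked up wherever the current code dereferences page->xpfo, e.g. */
	struct page_ext *ext = lookup_page_ext(page);
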
>> + >> +extern void xpfo_clear_zap(struct page *page, int order); >> +extern int xpfo_test_and_clear_zap(struct page *page); >> +extern int xpfo_test_kernel(struct page *page); >> +extern int xpfo_test_user(struct page *page); >> + >> +extern void xpfo_kmap(void *kaddr, struct page *page); >> +extern void xpfo_kunmap(void *kaddr, struct page *page); >> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); >> +extern void xpfo_free_page(struct page *page, int order); >> + >> +#else /* ifdef CONFIG_XPFO */ >> + >> +static inline void xpfo_clear_zap(struct page *page, int order) { } >> +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } >> +static inline int xpfo_test_kernel(struct page *page) { return 0; } >> +static inline int xpfo_test_user(struct page *page) { return 0; } >> + >> +static inline void xpfo_kmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } >> +static inline void xpfo_free_page(struct page *page, int order) { } >> + >> +#endif /* ifdef CONFIG_XPFO */ >> + >> +#endif /* ifndef _LINUX_XPFO_H */ >> diff --git a/lib/swiotlb.c b/lib/swiotlb.c >> index 76f29ec..cf57ee9 100644 >> --- a/lib/swiotlb.c >> +++ b/lib/swiotlb.c >> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, >> phys_addr_t tlb_addr, >> { >> unsigned long pfn = PFN_DOWN(orig_addr); >> unsigned char *vaddr = phys_to_virt(tlb_addr); >> + struct page *page = pfn_to_page(pfn); >> >> - if (PageHighMem(pfn_to_page(pfn))) { >> + if (PageHighMem(page) || xpfo_test_user(page)) { >> /* The buffer does not have a mapping. Map it in and copy */ >> unsigned int offset = orig_addr & ~PAGE_MASK; >> char *buffer; >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 838ca8bb..47b42a3 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, >> unsigned int order) >> } >> arch_free_page(page, order); >> kernel_map_pages(page, 1 << order, 0); >> + xpfo_free_page(page, order); >> >> return true; >> } >> @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned >> int order, gfp_t gfp_flags, >> arch_alloc_page(page, order); >> kernel_map_pages(page, 1 << order, 1); >> kasan_alloc_pages(page, order); >> + xpfo_alloc_page(page, order, gfp_flags); >> >> if (gfp_flags & __GFP_ZERO) >> for (i = 0; i < (1 << order); i++) >> clear_highpage(page + i); >> + else >> + xpfo_clear_zap(page, order); >> >> if (order && (gfp_flags & __GFP_COMP)) >> prep_compound_page(page, order); >> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >> } >> >> pcp = &this_cpu_ptr(zone->pageset)->pcp; >> - if (!cold) >> + if (!cold && !xpfo_test_kernel(page)) >> list_add(&page->lru, &pcp->lists[migratetype]); >> else >> list_add_tail(&page->lru, &pcp->lists[migratetype]); >> + > > What's the advantage of this? Allocating a page to userspace that was previously allocated to kernel space requires an expensive TLB shootdown. The above will put previously kernel-allocated pages in the cold page cache to postpone their allocation as long as possible to minimize TLB shootdowns. >> pcp->count++; >> if (pcp->count >= pcp->high) { >> unsigned long batch = READ_ONCE(pcp->batch); >> Thanks for the review and comments! It's highly appreciated. 
...Juerg > Thanks, > Laura -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-io0-f175.google.com (mail-io0-f175.google.com [209.85.223.175]) by kanga.kvack.org (Postfix) with ESMTP id B11136B0253 for ; Mon, 21 Mar 2016 04:44:58 -0400 (EDT) Received: by mail-io0-f175.google.com with SMTP id c63so22817303iof.0 for ; Mon, 21 Mar 2016 01:44:58 -0700 (PDT) Received: from g2t4623.austin.hp.com (g2t4623.austin.hp.com. [15.73.212.78]) by mx.google.com with ESMTPS id t19si8890665igr.59.2016.03.21.01.44.57 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Mar 2016 01:44:57 -0700 (PDT) Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4FA15.9060700@gmail.com> From: Juerg Haefliger Message-ID: <56EFB486.2090501@hpe.com> Date: Mon, 21 Mar 2016 09:44:54 +0100 MIME-Version: 1.0 In-Reply-To: <56D4FA15.9060700@gmail.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org List-ID: To: Balbir Singh , linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu Hi Balbir, Apologies for the slow reply. On 03/01/2016 03:10 AM, Balbir Singh wrote: > > > On 27/02/16 01:21, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userland, unless explicitly requested by the >> kernel. Whenever a page destined for userland is allocated, it is >> unmapped from physmap. When such a page is reclaimed from userland, it is >> mapped back to physmap. > physmap == xen physmap? Please clarify No, it's not XEN related. I might have the terminology wrong. Physmap is what the original authors used for describing a large, contiguous virtual memory region inside kernel address space that contains a direct mapping of part or all (depending on the architecture) physical memory. >> Mapping/unmapping from physmap is accomplished by modifying the PTE >> permission bits to allow/disallow access to the page. >> >> Additional fields are added to the page struct for XPFO housekeeping. >> Specifically a flags field to distinguish user vs. kernel pages, a >> reference counter to track physmap map/unmap operations and a lock to >> protect the XPFO fields. >> >> Known issues/limitations: >> - Only supported on x86-64. > Is it due to lack of porting or a design limitation? Lack of porting. Support for other architectures will come later. >> - Only supports 4k pages. >> - Adds additional data to the page struct. >> - There are most likely some additional and legitimate uses cases where >> the kernel needs to access userspace. Those need to be identified and >> made XPFO-aware. > Why not build an audit mode for it? Can you elaborate what you mean by this? >> - There's a performance impact if XPFO is turned on. Per the paper >> referenced below it's in the 1-3% ballpark. More performance testing >> wouldn't hurt. What tests to run though? >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >> >> Suggested-by: Vasileios P. Kemerlis >> Signed-off-by: Juerg Haefliger > This patch needs to be broken down into smaller patches - a series Agreed. 
>> --- >> arch/x86/Kconfig | 2 +- >> arch/x86/Kconfig.debug | 17 +++++ >> arch/x86/mm/Makefile | 2 + >> arch/x86/mm/init.c | 3 +- >> arch/x86/mm/xpfo.c | 176 +++++++++++++++++++++++++++++++++++++++++++++++ >> block/blk-map.c | 7 +- >> include/linux/highmem.h | 23 +++++-- >> include/linux/mm_types.h | 4 ++ >> include/linux/xpfo.h | 88 ++++++++++++++++++++++++ >> lib/swiotlb.c | 3 +- >> mm/page_alloc.c | 7 +- >> 11 files changed, 323 insertions(+), 9 deletions(-) >> create mode 100644 arch/x86/mm/xpfo.c >> create mode 100644 include/linux/xpfo.h >> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig >> index c46662f..9d32b4a 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -1343,7 +1343,7 @@ config ARCH_DMA_ADDR_T_64BIT >> >> config X86_DIRECT_GBPAGES >> def_bool y >> - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK >> + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO >> ---help--- >> Certain kernel features effectively disable kernel >> linear 1 GB mappings (even if the CPU otherwise >> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug >> index 9b18ed9..1331da5 100644 >> --- a/arch/x86/Kconfig.debug >> +++ b/arch/x86/Kconfig.debug >> @@ -5,6 +5,23 @@ config TRACE_IRQFLAGS_SUPPORT >> >> source "lib/Kconfig.debug" >> >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on DEBUG_KERNEL >> + depends on X86_64 >> + select DEBUG_TLBFLUSH >> + ---help--- >> + This option offers protection against 'ret2dir' (kernel) attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). Similarly, whenever page frames are freed/reclaimed, they >> + are mapped back to physmap. Special care is taken to minimize the >> + impact on performance by reducing TLB shootdowns and unnecessary page >> + zero fills. >> + >> + If in doubt, say "N". >> + >> config X86_VERBOSE_BOOTUP >> bool "Enable verbose x86 bootup info messages" >> default y >> diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile >> index f9d38a4..8bf52b6 100644 >> --- a/arch/x86/mm/Makefile >> +++ b/arch/x86/mm/Makefile >> @@ -34,3 +34,5 @@ obj-$(CONFIG_ACPI_NUMA) += srat.o >> obj-$(CONFIG_NUMA_EMU) += numa_emulation.o >> >> obj-$(CONFIG_X86_INTEL_MPX) += mpx.o >> + >> +obj-$(CONFIG_XPFO) += xpfo.o >> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c >> index 493f541..27fc8a6 100644 >> --- a/arch/x86/mm/init.c >> +++ b/arch/x86/mm/init.c >> @@ -150,7 +150,8 @@ static int page_size_mask; >> >> static void __init probe_page_size_mask(void) >> { >> -#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) >> +#if !defined(CONFIG_DEBUG_PAGEALLOC) && !defined(CONFIG_KMEMCHECK) && \ >> + !defined(CONFIG_XPFO) >> /* >> * For CONFIG_DEBUG_PAGEALLOC, identity mapping will use small pages. >> * This will simplify cpa(), which otherwise needs to support splitting >> diff --git a/arch/x86/mm/xpfo.c b/arch/x86/mm/xpfo.c >> new file mode 100644 >> index 0000000..6bc24d3 >> --- /dev/null >> +++ b/arch/x86/mm/xpfo.c >> @@ -0,0 +1,176 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. 
>> + */ >> + >> +#include >> +#include >> + >> +#include >> +#include >> + >> +#define TEST_XPFO_FLAG(flag, page) \ >> + test_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define SET_XPFO_FLAG(flag, page) \ >> + __set_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define CLEAR_XPFO_FLAG(flag, page) \ >> + __clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +#define TEST_AND_CLEAR_XPFO_FLAG(flag, page) \ >> + __test_and_clear_bit(PG_XPFO_##flag, &(page)->xpfo.flags) >> + >> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte = lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level != PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); >> +} >> + >> +inline void xpfo_clear_zap(struct page *page, int order) >> +{ >> + int i; >> + >> + for (i = 0; i < (1 << order); i++) >> + CLEAR_XPFO_FLAG(zap, page + i); >> +} >> + >> +inline int xpfo_test_and_clear_zap(struct page *page) >> +{ >> + return TEST_AND_CLEAR_XPFO_FLAG(zap, page); >> +} >> + >> +inline int xpfo_test_kernel(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(kernel, page); >> +} >> + >> +inline int xpfo_test_user(struct page *page) >> +{ >> + return TEST_XPFO_FLAG(user, page); >> +} >> + >> +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) >> +{ >> + int i, tlb_shoot = 0; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + WARN_ON(TEST_XPFO_FLAG(user_fp, page + i) || >> + TEST_XPFO_FLAG(user, page + i)); >> + >> + if (gfp & GFP_HIGHUSER) { > Why GFP_HIGHUSER? The check is wrong. It should be ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER). Thanks ...Juerg >> + /* Initialize the xpfo lock and map counter */ >> + spin_lock_init(&(page + i)->xpfo.lock); >> + atomic_set(&(page + i)->xpfo.mapcount, 0); >> + >> + /* Mark it as a user page */ >> + SET_XPFO_FLAG(user_fp, page + i); >> + >> + /* >> + * Shoot the TLB if the page was previously allocated >> + * to kernel space >> + */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(kernel, page + i)) >> + tlb_shoot = 1; >> + } else { >> + /* Mark it as a kernel page */ >> + SET_XPFO_FLAG(kernel, page + i); >> + } >> + } >> + >> + if (tlb_shoot) { >> + kaddr = (unsigned long)page_address(page); >> + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * >> + PAGE_SIZE); >> + } >> +} >> + >> +void xpfo_free_page(struct page *page, int order) >> +{ >> + int i; >> + unsigned long kaddr; >> + >> + for (i = 0; i < (1 << order); i++) { >> + >> + /* The page frame was previously allocated to user space */ >> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >> + kaddr = (unsigned long)page_address(page + i); >> + >> + /* Clear the page and mark it accordingly */ >> + clear_page((void *)kaddr); >> + SET_XPFO_FLAG(zap, page + i); >> + >> + /* Map it back to kernel space */ >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + /* No TLB update */ >> + } >> + >> + /* Clear the xpfo fast-path flag */ >> + CLEAR_XPFO_FLAG(user_fp, page + i); >> + } >> +} >> + >> +void xpfo_kmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page was previously allocated to user space, so map it back >> + * into the kernel. 
No TLB update required. >> + */ >> + if ((atomic_inc_return(&page->xpfo.mapcount) == 1) && >> + TEST_XPFO_FLAG(user, page)) >> + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kmap); >> + >> +void xpfo_kunmap(void *kaddr, struct page *page) >> +{ >> + unsigned long flags; >> + >> + /* The page is allocated to kernel space, so nothing to do */ >> + if (TEST_XPFO_FLAG(kernel, page)) >> + return; >> + >> + spin_lock_irqsave(&page->xpfo.lock, flags); >> + >> + /* >> + * The page frame is to be allocated back to user space. So unmap it >> + * from the kernel, update the TLB and mark it as a user page. >> + */ >> + if ((atomic_dec_return(&page->xpfo.mapcount) == 0) && >> + (TEST_XPFO_FLAG(user_fp, page) || TEST_XPFO_FLAG(user, page))) { >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); >> + SET_XPFO_FLAG(user, page); >> + } >> + >> + spin_unlock_irqrestore(&page->xpfo.lock, flags); >> +} >> +EXPORT_SYMBOL(xpfo_kunmap); >> diff --git a/block/blk-map.c b/block/blk-map.c >> index f565e11..b7b8302 100644 >> --- a/block/blk-map.c >> +++ b/block/blk-map.c >> @@ -107,7 +107,12 @@ int blk_rq_map_user_iov(struct request_queue *q, struct request *rq, >> prv.iov_len = iov.iov_len; >> } >> >> - if (unaligned || (q->dma_pad_mask & iter->count) || map_data) >> + /* >> + * juergh: Temporary hack to force the use of a bounce buffer if XPFO >> + * is enabled. Results in an XPFO page fault otherwise. >> + */ > This does look like it might add a bunch of overhead >> + if (unaligned || (q->dma_pad_mask & iter->count) || map_data || >> + IS_ENABLED(CONFIG_XPFO)) >> bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); >> else >> bio = bio_map_user_iov(q, iter, gfp_mask); >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h >> index bb3f329..0ca9130 100644 >> --- a/include/linux/highmem.h >> +++ b/include/linux/highmem.h >> @@ -55,24 +55,37 @@ static inline struct page *kmap_to_page(void *addr) >> #ifndef ARCH_HAS_KMAP >> static inline void *kmap(struct page *page) >> { >> + void *kaddr; >> + >> might_sleep(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> >> static inline void kunmap(struct page *page) >> { >> + xpfo_kunmap(page_address(page), page); >> } >> >> static inline void *kmap_atomic(struct page *page) >> { >> + void *kaddr; >> + >> preempt_disable(); >> pagefault_disable(); >> - return page_address(page); >> + >> + kaddr = page_address(page); >> + xpfo_kmap(kaddr, page); >> + return kaddr; >> } >> #define kmap_atomic_prot(page, prot) kmap_atomic(page) >> >> static inline void __kunmap_atomic(void *addr) >> { >> + xpfo_kunmap(addr, virt_to_page(addr)); >> + >> pagefault_enable(); >> preempt_enable(); >> } >> @@ -133,7 +146,8 @@ do { \ >> static inline void clear_user_highpage(struct page *page, unsigned long vaddr) >> { >> void *addr = kmap_atomic(page); >> - clear_user_page(addr, vaddr, page); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_user_page(addr, vaddr, page); >> kunmap_atomic(addr); >> } >> #endif >> @@ -186,7 +200,8 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma, >> static inline void clear_highpage(struct page *page) >> { >> void *kaddr = kmap_atomic(page); >> - clear_page(kaddr); >> + if (!xpfo_test_and_clear_zap(page)) >> + clear_page(kaddr); >> kunmap_atomic(kaddr); >> } >> >> diff --git a/include/linux/mm_types.h 
b/include/linux/mm_types.h >> index 624b78b..71c95aa 100644 >> --- a/include/linux/mm_types.h >> +++ b/include/linux/mm_types.h >> @@ -12,6 +12,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> >> @@ -215,6 +216,9 @@ struct page { >> #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS >> int _last_cpupid; >> #endif >> +#ifdef CONFIG_XPFO >> + struct xpfo_info xpfo; >> +#endif >> } >> /* >> * The struct page can be forced to be double word aligned so that atomic ops >> diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h >> new file mode 100644 >> index 0000000..c4f0871 >> --- /dev/null >> +++ b/include/linux/xpfo.h >> @@ -0,0 +1,88 @@ >> +/* >> + * Copyright (C) 2016 Brown University. All rights reserved. >> + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. >> + * >> + * Authors: >> + * Vasileios P. Kemerlis >> + * Juerg Haefliger >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms of the GNU General Public License version 2 as published by >> + * the Free Software Foundation. >> + */ >> + >> +#ifndef _LINUX_XPFO_H >> +#define _LINUX_XPFO_H >> + >> +#ifdef CONFIG_XPFO >> + >> +/* >> + * XPFO page flags: >> + * >> + * PG_XPFO_user_fp denotes that the page is allocated to user space. This flag >> + * is used in the fast path, where the page is marked accordingly but *not* >> + * unmapped from the kernel. In most cases, the kernel will need access to the >> + * page immediately after its acquisition so an unnecessary mapping operation >> + * is avoided. >> + * >> + * PG_XPFO_user denotes that the page is destined for user space. This flag is >> + * used in the slow path, where the page needs to be mapped/unmapped when the >> + * kernel wants to access it. If a page is deallocated and this flag is set, >> + * the page is cleared and mapped back into the kernel. >> + * >> + * PG_XPFO_kernel denotes a page that is destined to kernel space. This is used >> + * for identifying pages that are first assigned to kernel space and then freed >> + * and mapped to user space. In such cases, an expensive TLB shootdown is >> + * necessary. Pages allocated to user space, freed, and subsequently allocated >> + * to user space again, require only local TLB invalidation. >> + * >> + * PG_XPFO_zap indicates that the page has been zapped. This flag is used to >> + * avoid zapping pages multiple times. Whenever a page is freed and was >> + * previously mapped to user space, it needs to be zapped before mapped back >> + * in to the kernel. >> + */ >> + >> +enum xpfo_pageflags { >> + PG_XPFO_user_fp, >> + PG_XPFO_user, >> + PG_XPFO_kernel, >> + PG_XPFO_zap, >> +}; >> + >> +struct xpfo_info { >> + unsigned long flags; /* Flags for tracking the page's XPFO state */ >> + atomic_t mapcount; /* Counter for balancing page map/unmap >> + * requests. Only the first map request maps >> + * the page back to kernel space. Likewise, >> + * only the last unmap request unmaps the page. >> + */ >> + spinlock_t lock; /* Lock to serialize concurrent map/unmap >> + * requests. 
>> + */ >> +}; >> + >> +extern void xpfo_clear_zap(struct page *page, int order); >> +extern int xpfo_test_and_clear_zap(struct page *page); >> +extern int xpfo_test_kernel(struct page *page); >> +extern int xpfo_test_user(struct page *page); >> + >> +extern void xpfo_kmap(void *kaddr, struct page *page); >> +extern void xpfo_kunmap(void *kaddr, struct page *page); >> +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); >> +extern void xpfo_free_page(struct page *page, int order); >> + >> +#else /* ifdef CONFIG_XPFO */ >> + >> +static inline void xpfo_clear_zap(struct page *page, int order) { } >> +static inline int xpfo_test_and_clear_zap(struct page *page) { return 0; } >> +static inline int xpfo_test_kernel(struct page *page) { return 0; } >> +static inline int xpfo_test_user(struct page *page) { return 0; } >> + >> +static inline void xpfo_kmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } >> +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } >> +static inline void xpfo_free_page(struct page *page, int order) { } >> + >> +#endif /* ifdef CONFIG_XPFO */ >> + >> +#endif /* ifndef _LINUX_XPFO_H */ >> diff --git a/lib/swiotlb.c b/lib/swiotlb.c >> index 76f29ec..cf57ee9 100644 >> --- a/lib/swiotlb.c >> +++ b/lib/swiotlb.c >> @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, >> { >> unsigned long pfn = PFN_DOWN(orig_addr); >> unsigned char *vaddr = phys_to_virt(tlb_addr); >> + struct page *page = pfn_to_page(pfn); >> >> - if (PageHighMem(pfn_to_page(pfn))) { >> + if (PageHighMem(page) || xpfo_test_user(page)) { >> /* The buffer does not have a mapping. Map it in and copy */ >> unsigned int offset = orig_addr & ~PAGE_MASK; >> char *buffer; >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 838ca8bb..47b42a3 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -1003,6 +1003,7 @@ static bool free_pages_prepare(struct page *page, unsigned int order) >> } >> arch_free_page(page, order); >> kernel_map_pages(page, 1 << order, 0); >> + xpfo_free_page(page, order); >> >> return true; >> } >> @@ -1398,10 +1399,13 @@ static int prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags, >> arch_alloc_page(page, order); >> kernel_map_pages(page, 1 << order, 1); >> kasan_alloc_pages(page, order); >> + xpfo_alloc_page(page, order, gfp_flags); >> >> if (gfp_flags & __GFP_ZERO) >> for (i = 0; i < (1 << order); i++) >> clear_highpage(page + i); >> + else >> + xpfo_clear_zap(page, order); >> >> if (order && (gfp_flags & __GFP_COMP)) >> prep_compound_page(page, order); >> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >> } >> >> pcp = &this_cpu_ptr(zone->pageset)->pcp; >> - if (!cold) >> + if (!cold && !xpfo_test_kernel(page)) >> list_add(&page->lru, &pcp->lists[migratetype]); >> else >> list_add_tail(&page->lru, &pcp->lists[migratetype]); >> + >> pcp->count++; >> if (pcp->count >= pcp->high) { >> unsigned long batch = READ_ONCE(pcp->batch); > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f42.google.com (mail-pa0-f42.google.com [209.85.220.42]) by kanga.kvack.org (Postfix) with ESMTP id BB1166B007E for ; Mon, 28 Mar 2016 15:30:02 -0400 (EDT) Received: by mail-pa0-f42.google.com with SMTP id zm5so19380092pac.0 for ; Mon, 28 Mar 2016 12:30:02 -0700 (PDT) Received: from mail-pf0-f181.google.com (mail-pf0-f181.google.com. [209.85.192.181]) by mx.google.com with ESMTPS id p28si4713165pfi.167.2016.03.28.12.30.01 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 28 Mar 2016 12:30:01 -0700 (PDT) Received: by mail-pf0-f181.google.com with SMTP id x3so144571857pfb.1 for ; Mon, 28 Mar 2016 12:30:01 -0700 (PDT) Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4F0D6.2060308@redhat.com> <56EFB2DB.3090602@hpe.com> From: Laura Abbott Message-ID: <56F98637.4070705@redhat.com> Date: Mon, 28 Mar 2016 12:29:59 -0700 MIME-Version: 1.0 In-Reply-To: <56EFB2DB.3090602@hpe.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: vpk@cs.brown.edu, Kees Cook On 03/21/2016 01:37 AM, Juerg Haefliger wrote: ... >>> +void xpfo_free_page(struct page *page, int order) >>> +{ >>> + int i; >>> + unsigned long kaddr; >>> + >>> + for (i = 0; i < (1 << order); i++) { >>> + >>> + /* The page frame was previously allocated to user space */ >>> + if (TEST_AND_CLEAR_XPFO_FLAG(user, page + i)) { >>> + kaddr = (unsigned long)page_address(page + i); >>> + >>> + /* Clear the page and mark it accordingly */ >>> + clear_page((void *)kaddr); >> >> Clearing the page isn't related to XPFO. There's other work ongoing to >> do clearing of the page on free. > > It's not strictly related to XPFO but adds another layer of security. Do you > happen to have a pointer to the ongoing work that you mentioned? > > The work was merged for the 4.6 merge window https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8823b1dbc05fab1a8bec275eeae4709257c2661d This is a separate option to clear the page. ... >>> @@ -2072,10 +2076,11 @@ void free_hot_cold_page(struct page *page, bool cold) >>> } >>> >>> pcp = &this_cpu_ptr(zone->pageset)->pcp; >>> - if (!cold) >>> + if (!cold && !xpfo_test_kernel(page)) >>> list_add(&page->lru, &pcp->lists[migratetype]); >>> else >>> list_add_tail(&page->lru, &pcp->lists[migratetype]); >>> + >> >> What's the advantage of this? > > Allocating a page to userspace that was previously allocated to kernel space > requires an expensive TLB shootdown. The above will put previously > kernel-allocated pages in the cold page cache to postpone their allocation as > long as possible to minimize TLB shootdowns. > > That makes sense. You probably want to make this a separate commmit with this explanation as the commit text. >>> pcp->count++; >>> if (pcp->count >= pcp->high) { >>> unsigned long batch = READ_ONCE(pcp->batch); >>> > > Thanks for the review and comments! It's highly appreciated. > > ...Juerg > > >> Thanks, >> Laura -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yw0-f172.google.com (mail-yw0-f172.google.com [209.85.161.172]) by kanga.kvack.org (Postfix) with ESMTP id 620976B007E for ; Thu, 31 Mar 2016 20:21:12 -0400 (EDT) Received: by mail-yw0-f172.google.com with SMTP id g3so122830598ywa.3 for ; Thu, 31 Mar 2016 17:21:12 -0700 (PDT) Received: from mail-yw0-x232.google.com (mail-yw0-x232.google.com. [2607:f8b0:4002:c05::232]) by mx.google.com with ESMTPS id b20si3229011ywe.369.2016.03.31.17.21.11 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 31 Mar 2016 17:21:11 -0700 (PDT) Received: by mail-yw0-x232.google.com with SMTP id g3so122829995ywa.3 for ; Thu, 31 Mar 2016 17:21:11 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <56EFB486.2090501@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <56D4FA15.9060700@gmail.com> <56EFB486.2090501@hpe.com> Date: Fri, 1 Apr 2016 11:21:11 +1100 Message-ID: Subject: Re: [RFC PATCH] Add support for eXclusive Page Frame Ownership (XPFO) From: Balbir Singh Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger Cc: "linux-kernel@vger.kernel.org" , linux-mm , vpk@cs.brown.edu On Mon, Mar 21, 2016 at 7:44 PM, Juerg Haefliger wrote: > Hi Balbir, > > Apologies for the slow reply. > No problem, I lost this in my inbox as well due to the reply latency. > > On 03/01/2016 03:10 AM, Balbir Singh wrote: >> >> >> On 27/02/16 01:21, Juerg Haefliger wrote: >>> This patch adds support for XPFO which protects against 'ret2dir' kernel >>> attacks. The basic idea is to enforce exclusive ownership of page frames >>> by either the kernel or userland, unless explicitly requested by the >>> kernel. Whenever a page destined for userland is allocated, it is >>> unmapped from physmap. When such a page is reclaimed from userland, it is >>> mapped back to physmap. >> physmap == xen physmap? Please clarify > > No, it's not XEN related. I might have the terminology wrong. Physmap is what > the original authors used for describing a large, contiguous virtual > memory region inside kernel address space that contains a direct mapping of part > or all (depending on the architecture) physical memory. > Thanks for clarifying > >>> Mapping/unmapping from physmap is accomplished by modifying the PTE >>> permission bits to allow/disallow access to the page. >>> >>> Additional fields are added to the page struct for XPFO housekeeping. >>> Specifically a flags field to distinguish user vs. kernel pages, a >>> reference counter to track physmap map/unmap operations and a lock to >>> protect the XPFO fields. >>> >>> Known issues/limitations: >>> - Only supported on x86-64. >> Is it due to lack of porting or a design limitation? > > Lack of porting. Support for other architectures will come later. > OK > >>> - Only supports 4k pages. >>> - Adds additional data to the page struct. >>> - There are most likely some additional and legitimate uses cases where >>> the kernel needs to access userspace. Those need to be identified and >>> made XPFO-aware. >> Why not build an audit mode for it? > > Can you elaborate what you mean by this? > What I meant is when the kernel needs to access userspace and XPFO is not aware of it and is going to block it, write to a log/trace buffer so that it can be audited for correctness > >>> - There's a performance impact if XPFO is turned on. Per the paper >>> referenced below it's in the 1-3% ballpark. 
More performance testing >>> wouldn't hurt. What tests to run though? >>> >>> Reference paper by the original patch authors: >>> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >>> >>> Suggested-by: Vasileios P. Kemerlis >>> Signed-off-by: Juerg Haefliger >> This patch needs to be broken down into smaller patches - a series > > Agreed. > I think it will be good to describe what is XPFO aware 1. How are device mmap'd shared between kernel/user covered? 2. How is copy_from/to_user covered? 3. How is vdso covered? 4. More... Balbir Singh. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id 18E5D6B0038 for ; Fri, 2 Sep 2016 07:39:27 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id e124so32297431ith.0 for ; Fri, 02 Sep 2016 04:39:27 -0700 (PDT) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. [15.241.48.73]) by mx.google.com with ESMTPS id b7si2565606otc.231.2016.09.02.04.39.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 02 Sep 2016 04:39:26 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 2 Sep 2016 13:39:06 +0200 Message-Id: <20160902113909.32631-1-juerg.haefliger@hpe.com> In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. 
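Condensed into code, the housekeeping described above amounts to the following (boiled down from xpfo_kmap()/xpfo_kunmap() in patch 1/3 below; the kernel-page fast path and the page_ext initialization checks are omitted here for brevity):

void xpfo_kmap(void *kaddr, struct page *page)
{
	struct page_ext *ext = lookup_page_ext(page);
	unsigned long flags;

	spin_lock_irqsave(&ext->maplock, flags);
	/* First mapper of a user-owned frame puts it back into physmap. */
	if (atomic_inc_return(&ext->mapcount) == 1 &&
	    test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &ext->flags))
		set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL));
	spin_unlock_irqrestore(&ext->maplock, flags);
}

void xpfo_kunmap(void *kaddr, struct page *page)
{
	struct page_ext *ext = lookup_page_ext(page);
	unsigned long flags;

	spin_lock_irqsave(&ext->maplock, flags);
	/* Last unmapper removes the physmap PTE and flushes the local TLB. */
	if (atomic_dec_return(&ext->mapcount) == 0) {
		set_bit(PAGE_EXT_XPFO_UNMAPPED, &ext->flags);
		set_kpte(page, (unsigned long)kaddr, __pgprot(0));
		__flush_tlb_one((unsigned long)kaddr);
	}
	spin_unlock_irqrestore(&ext->maplock, flags);
}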
Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id A64346B0253 for ; Fri, 2 Sep 2016 07:39:46 -0400 (EDT) Received: by mail-it0-f69.google.com with SMTP id g185so32118814ith.2 for ; Fri, 02 Sep 2016 04:39:46 -0700 (PDT) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. [15.241.48.73]) by mx.google.com with ESMTPS id r128si12610889oib.70.2016.09.02.04.39.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 02 Sep 2016 04:39:45 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 2 Sep 2016 13:39:07 +0200 Message-Id: <20160902113909.32631-2-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 
100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. + */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. 
+ */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f69.google.com (mail-it0-f69.google.com [209.85.214.69]) by kanga.kvack.org (Postfix) with ESMTP id 2A3E76B025E for ; Fri, 2 Sep 2016 07:39:57 -0400 (EDT) Received: by mail-it0-f69.google.com with SMTP id 192so32569423itm.1 for ; Fri, 02 Sep 2016 04:39:57 -0700 (PDT) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. [15.241.48.73]) by mx.google.com with ESMTPS id a34si12604232otc.76.2016.09.02.04.39.56 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 02 Sep 2016 04:39:56 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Date: Fri, 2 Sep 2016 13:39:08 +0200 Message-Id: <20160902113909.32631-3-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. 
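For context, the expensive operation referred to here is the flush issued by xpfo_alloc_page() (patch 1/3) when a formerly kernel-owned frame is handed to userspace; roughly condensed (the helper name is mine and the original per-page loop is folded away):

static void xpfo_flush_on_user_alloc(struct page *page, int order, gfp_t gfp)
{
	struct page_ext *ext = lookup_page_ext(page);
	unsigned long kaddr;

	/* Frame goes to userspace and was last owned by the kernel? */
	if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER &&
	    test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, &ext->flags)) {
		kaddr = (unsigned long)page_address(page);
		/* Cross-CPU shootdown of the stale physmap translations. */
		flush_tlb_kernel_range(kaddr,
				       kaddr + ((1UL << order) * PAGE_SIZE));
	}
}

Keeping kernel-owned frames at the cold end of the per-cpu list makes it more likely that a userspace allocation is satisfied by a frame that was already user-owned, so the path above fires less often.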
Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id 68A136B0260 for ; Fri, 2 Sep 2016 07:40:00 -0400 (EDT) Received: by mail-oi0-f72.google.com with SMTP id i4so114789164oih.1 for ; Fri, 02 Sep 2016 04:40:00 -0700 (PDT) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. [15.241.48.73]) by mx.google.com with ESMTPS id y6si12534563ota.280.2016.09.02.04.39.59 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 02 Sep 2016 04:39:59 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Date: Fri, 2 Sep 2016 13:39:09 +0200 Message-Id: <20160902113909.32631-4-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. 
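The change itself is the one-liner in the diff below. The reasoning: bio_map_user_iov() pins the user pages and services the request directly from those frames, and with XPFO those frames have no physmap mapping, so kernel-side accesses to them through the direct map fault; bio_copy_user_iov() instead bounces the data through freshly allocated, kernel-owned pages. Annotated form of the hunk (the code is unchanged):

	if (copy || IS_ENABLED(CONFIG_XPFO))
		/* Bounce buffer: kernel touches only kernel-owned pages. */
		bio = bio_copy_user_iov(q, map_data, iter, gfp_mask);
	else
		/* Zero-copy path: would access XPFO-unmapped user frames. */
		bio = bio_map_user_iov(q, iter, gfp_mask);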
Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id 3F7166B0069 for ; Fri, 2 Sep 2016 16:39:24 -0400 (EDT) Received: by mail-pf0-f200.google.com with SMTP id g202so145375305pfb.3 for ; Fri, 02 Sep 2016 13:39:24 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. [134.134.136.20]) by mx.google.com with ESMTPS id b64si13196217pfa.51.2016.09.02.13.39.23 for (version=TLS1 cipher=AES128-SHA bits=128/128); Fri, 02 Sep 2016 13:39:23 -0700 (PDT) Subject: Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> From: Dave Hansen Message-ID: <57C9E37A.9070805@intel.com> Date: Fri, 2 Sep 2016 13:39:22 -0700 MIME-Version: 1.0 In-Reply-To: <20160902113909.32631-3-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu On 09/02/2016 04:39 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id 54CA96B0069 for ; Wed, 14 Sep 2016 03:19:28 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id e1so28498216itb.3 for ; Wed, 14 Sep 2016 00:19:28 -0700 (PDT) Received: from g4t3425.houston.hpe.com (g4t3425.houston.hpe.com. 
[15.241.140.78]) by mx.google.com with ESMTPS id p37si17357627otd.69.2016.09.14.00.19.14 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Sep 2016 00:19:14 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Wed, 14 Sep 2016 09:18:58 +0200 Message-Id: <20160914071901.8127-1-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id CA47E6B0253 for ; Wed, 14 Sep 2016 03:19:36 -0400 (EDT) Received: by mail-it0-f72.google.com with SMTP id e20so27777581itc.0 for ; Wed, 14 Sep 2016 00:19:36 -0700 (PDT) Received: from g4t3425.houston.hpe.com (g4t3425.houston.hpe.com. 
[15.241.140.78]) by mx.google.com with ESMTPS id k19si17131344ote.192.2016.09.14.00.19.21 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Sep 2016 00:19:21 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) Date: Wed, 14 Sep 2016 09:18:59 +0200 Message-Id: <20160914071901.8127-2-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. 
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f71.google.com (mail-it0-f71.google.com [209.85.214.71]) by kanga.kvack.org (Postfix) with ESMTP id E1DE96B025E for ; Wed, 14 Sep 2016 03:19:42 -0400 (EDT) Received: by mail-it0-f71.google.com with SMTP id 192so26975444itm.1 for ; Wed, 14 Sep 2016 00:19:42 -0700 (PDT) Received: from g4t3425.houston.hpe.com (g4t3425.houston.hpe.com. 
[15.241.140.78]) by mx.google.com with ESMTPS id d16si14640875oig.162.2016.09.14.00.19.28 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Sep 2016 00:19:28 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache Date: Wed, 14 Sep 2016 09:19:00 +0200 Message-Id: <20160914071901.8127-3-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id E4FA46B0260 for ; Wed, 14 Sep 2016 03:19:45 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id q188so19844667oia.1 for ; Wed, 14 Sep 2016 00:19:45 -0700 (PDT) Received: from g4t3425.houston.hpe.com (g4t3425.houston.hpe.com. [15.241.140.78]) by mx.google.com with ESMTPS id f30si17366820otd.156.2016.09.14.00.19.31 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Sep 2016 00:19:31 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Date: Wed, 14 Sep 2016 09:19:01 +0200 Message-Id: <20160914071901.8127-4-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f70.google.com (mail-pa0-f70.google.com [209.85.220.70]) by kanga.kvack.org (Postfix) with ESMTP id 8F12E6B0069 for ; Wed, 14 Sep 2016 03:33:43 -0400 (EDT) Received: by mail-pa0-f70.google.com with SMTP id fu12so12049529pac.1 for ; Wed, 14 Sep 2016 00:33:43 -0700 (PDT) Received: from bombadil.infradead.org (bombadil.infradead.org. [2001:1868:205::9]) by mx.google.com with ESMTPS id d8si32118857pfd.256.2016.09.14.00.33.42 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Sep 2016 00:33:42 -0700 (PDT) Date: Wed, 14 Sep 2016 00:33:40 -0700 From: Christoph Hellwig Subject: Re: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled Message-ID: <20160914073340.GA28090@infradead.org> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-4-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914071901.8127-4-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu On Wed, Sep 14, 2016 at 09:19:01AM +0200, Juerg Haefliger wrote: > This is a temporary hack to prevent the use of bio_map_user_iov() > which causes XPFO page faults. 
> > Signed-off-by: Juerg Haefliger Sorry, but if your scheme doesn't support get_user_pages access to user memory is't a steaming pile of crap and entirely unacceptable. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f71.google.com (mail-pa0-f71.google.com [209.85.220.71]) by kanga.kvack.org (Postfix) with ESMTP id 01B4F6B0253 for ; Wed, 14 Sep 2016 05:36:53 -0400 (EDT) Received: by mail-pa0-f71.google.com with SMTP id ex14so16923378pac.0 for ; Wed, 14 Sep 2016 02:36:52 -0700 (PDT) Received: from foss.arm.com (foss.arm.com. [217.140.101.70]) by mx.google.com with ESMTP id s4si5799850pfi.286.2016.09.14.02.36.52 for ; Wed, 14 Sep 2016 02:36:52 -0700 (PDT) Date: Wed, 14 Sep 2016 10:36:34 +0100 From: Mark Rutland Subject: Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20160914093634.GB13121@leverpostej> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org, juerg.haefliger@hpe.com, vpk@cs.columbia.edu Hi, On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > This patch series adds support for XPFO which protects against 'ret2dir' > kernel attacks. The basic idea is to enforce exclusive ownership of page > frames by either the kernel or userspace, unless explicitly requested by > the kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Just to check, doesn't DEBUG_RODATA ensure that the linear mapping is non-executable on x86_64 (as it does for arm64)? For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). Assuming that implies a lack of execute permission for x86_64, that should provide a similar level of protection against erroneously branching to addresses in the linear map, without the complexity and overhead of mapping/unmapping pages. So to me it looks like this approach may only be useful for architectures without page-granular execute permission controls. Is this also intended to protect against erroneous *data* accesses to the linear map? Am I missing something? Thanks, Mark. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f197.google.com (mail-pf0-f197.google.com [209.85.192.197]) by kanga.kvack.org (Postfix) with ESMTP id 5929B6B0069 for ; Wed, 14 Sep 2016 05:49:26 -0400 (EDT) Received: by mail-pf0-f197.google.com with SMTP id g202so17972076pfb.3 for ; Wed, 14 Sep 2016 02:49:26 -0700 (PDT) Received: from foss.arm.com (foss.arm.com. [217.140.101.70]) by mx.google.com with ESMTP id m8si14985507pfi.128.2016.09.14.02.49.24 for ; Wed, 14 Sep 2016 02:49:24 -0700 (PDT) Date: Wed, 14 Sep 2016 10:49:02 +0100 From: Mark Rutland Subject: Re: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20160914094902.GA14330@leverpostej> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914093634.GB13121@leverpostej> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914093634.GB13121@leverpostej> Sender: owner-linux-mm@kvack.org List-ID: To: kernel-hardening@lists.openwall.com Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org, juerg.haefliger@hpe.com, vpk@cs.columbia.edu On Wed, Sep 14, 2016 at 10:36:34AM +0100, Mark Rutland wrote: > On Wed, Sep 14, 2016 at 09:18:58AM +0200, Juerg Haefliger wrote: > > This patch series adds support for XPFO which protects against 'ret2dir' > > kernel attacks. The basic idea is to enforce exclusive ownership of page > > frames by either the kernel or userspace, unless explicitly requested by > > the kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > For both arm64 and x86_64, DEBUG_RODATA is mandatory (or soon to be so). > Assuming that implies a lack of execute permission for x86_64, that > should provide a similar level of protection against erroneously > branching to addresses in the linear map, without the complexity and > overhead of mapping/unmapping pages. > > So to me it looks like this approach may only be useful for > architectures without page-granular execute permission controls. > > Is this also intended to protect against erroneous *data* accesses to > the linear map? Now that I read the paper more carefully, I can see that this is the case, and this does catch issues which DEBUG_RODATA cannot. Apologies for the noise. Thanks, Mark. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f72.google.com (mail-pa0-f72.google.com [209.85.220.72]) by kanga.kvack.org (Postfix) with ESMTP id C8D6F6B0253 for ; Wed, 14 Sep 2016 10:33:09 -0400 (EDT) Received: by mail-pa0-f72.google.com with SMTP id mi5so30148494pab.2 for ; Wed, 14 Sep 2016 07:33:09 -0700 (PDT) Received: from mga02.intel.com (mga02.intel.com. 
[134.134.136.20]) by mx.google.com with ESMTPS id bx7si5029664pac.110.2016.09.14.07.33.09 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 14 Sep 2016 07:33:09 -0700 (PDT) Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> From: Dave Hansen Message-ID: <57D95FA3.3030103@intel.com> Date: Wed, 14 Sep 2016 07:33:07 -0700 MIME-Version: 1.0 In-Reply-To: <20160914071901.8127-3-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu On 09/14/2016 12:19 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. Hi, I had some questions about this the last time you posted it. Maybe you want to address them now. -- But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f72.google.com (mail-pa0-f72.google.com [209.85.220.72]) by kanga.kvack.org (Postfix) with ESMTP id C3B6C6B0038 for ; Wed, 14 Sep 2016 10:48:27 -0400 (EDT) Received: by mail-pa0-f72.google.com with SMTP id mi5so30859523pab.2 for ; Wed, 14 Sep 2016 07:48:27 -0700 (PDT) Received: from mga14.intel.com (mga14.intel.com. [192.55.52.115]) by mx.google.com with ESMTPS id s4si5118038pan.6.2016.09.14.07.48.26 for (version=TLS1 cipher=AES128-SHA bits=128/128); Wed, 14 Sep 2016 07:48:26 -0700 (PDT) Subject: Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-3-juerg.haefliger@hpe.com> <57D95FA3.3030103@intel.com> <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> From: Dave Hansen Message-ID: <57D9633A.2010702@intel.com> Date: Wed, 14 Sep 2016 07:48:26 -0700 MIME-Version: 1.0 In-Reply-To: <7badeb6c-e343-4327-29ed-f9c9c0b6654b@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu > On 09/02/2016 10:39 PM, Dave Hansen wrote: >> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Does this >> just mean that kernel allocations usually have to pay the penalty to >> convert a page? > > Only pages that are allocated for userspace (gfp & GFP_HIGHUSER == GFP_HIGHUSER) which were > previously allocated for the kernel (gfp & GFP_HIGHUSER != GFP_HIGHUSER) have to pay the penalty. > >> So, what's the logic here? 
You're assuming that order-0 kernel >> allocations are more rare than allocations for userspace? > > The logic is to put reclaimed kernel pages into the cold cache to > postpone their allocation as long as possible to minimize (potential) > TLB flushes. OK, but if we put them in the cold area but kernel allocations pull them from the hot cache, aren't we virtually guaranteeing that kernel allocations will have to do a TLB shootdown to convert a page? It seems like you also need to convert all kernel allocations to pull from the cold area. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f72.google.com (mail-oi0-f72.google.com [209.85.218.72]) by kanga.kvack.org (Postfix) with ESMTP id B83C56B0319 for ; Fri, 4 Nov 2016 10:45:45 -0400 (EDT) Received: by mail-oi0-f72.google.com with SMTP id 128so124493511oih.1 for ; Fri, 04 Nov 2016 07:45:45 -0700 (PDT) Received: from g9t5008.houston.hpe.com (g9t5008.houston.hpe.com. [15.241.48.72]) by mx.google.com with ESMTPS id y187si8879957oig.271.2016.11.04.07.45.45 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 04 Nov 2016 07:45:45 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v3 0/2] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 4 Nov 2016 15:45:32 +0100 Message-Id: <20161104144534.14790-1-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com Changes from: v2 -> v3: - Removed 'depends on DEBUG_KERNEL' and 'select DEBUG_TLBFLUSH'. These are left-overs from the original patch and are not required. - Make libata XPFO-aware, i.e., properly handle pages that were unmapped by XPFO. This takes care of the temporary hack in v2 that forced the use of a bounce buffer in block/blk-map.c. v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (removed from the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields.
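To make the housekeeping described above concrete, here is a minimal usage sketch (assumed driver-style code, not taken from the series) of how the reference counter balances nested kmap/kunmap calls around a kernel access to a user-owned page:

#include <linux/highmem.h>
#include <linux/string.h>

/* Hypothetical helper: copy up to PAGE_SIZE bytes into a page that may
 * currently be unmapped from physmap because it is owned by user space. */
static void fill_user_owned_page(struct page *page, const void *src, size_t len)
{
        void *vaddr = kmap(page);       /* xpfo_kmap(): mapcount 0 -> 1, PTE restored */

        memcpy(vaddr, src, len);        /* safe: the direct-map alias is present again */

        kunmap(page);                   /* xpfo_kunmap(): mapcount 1 -> 0, PTE cleared, TLB entry flushed */
}

A nested kmap() of the same page only increments the counter; the unmap and TLB flush happen when the last kunmap() drops it back to zero.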
Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (2): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 315 insertions(+), 8 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.10.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f70.google.com (mail-oi0-f70.google.com [209.85.218.70]) by kanga.kvack.org (Postfix) with ESMTP id 414456B031B for ; Fri, 4 Nov 2016 10:45:53 -0400 (EDT) Received: by mail-oi0-f70.google.com with SMTP id j198so124631423oih.5 for ; Fri, 04 Nov 2016 07:45:53 -0700 (PDT) Received: from g9t5008.houston.hpe.com (g9t5008.houston.hpe.com. [15.241.48.72]) by mx.google.com with ESMTPS id c131si8990124oig.5.2016.11.04.07.45.52 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 04 Nov 2016 07:45:52 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Date: Fri, 4 Nov 2016 15:45:33 +0100 Message-Id: <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 298 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index bada636d1065..38b334f8fde5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 22af912d66d2..a6fafbae02bb 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 051b6158d1b7..58af734be25d 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use a bounce buffer */ @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use bounce buffer */ diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 9298c393ddaa..0e451a42e5a3 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -29,6 +29,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -44,6 +46,11 @@ enum page_ext_flags { */ struct page_ext { unsigned long flags; +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 295bd7a9f76b..175680f516aa 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fd42aa7c4bd..100e80e008e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 121dcffc4ec1..ba6dbcacc2db 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..8e3a6a694b6a --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,206 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} +EXPORT_SYMBOL(xpfo_page_is_unmapped); diff --git a/security/Kconfig b/security/Kconfig index 118f4549404e..4502e15c8419 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,25 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on ARCH_SUPPORTS_XPFO + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.10.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi0-f69.google.com (mail-oi0-f69.google.com [209.85.218.69]) by kanga.kvack.org (Postfix) with ESMTP id 492E16B031D for ; Fri, 4 Nov 2016 10:45:56 -0400 (EDT) Received: by mail-oi0-f69.google.com with SMTP id 128so124503633oih.1 for ; Fri, 04 Nov 2016 07:45:56 -0700 (PDT) Received: from g9t5008.houston.hpe.com (g9t5008.houston.hpe.com. 
[15.241.48.72]) by mx.google.com with ESMTPS id s56si8380632otd.221.2016.11.04.07.45.55 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 04 Nov 2016 07:45:55 -0700 (PDT) From: Juerg Haefliger Subject: [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache Date: Fri, 4 Nov 2016 15:45:34 +0100 Message-Id: <20161104144534.14790-3-juerg.haefliger@hpe.com> In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 100e80e008e2..09ef4f7cfd14 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2440,7 +2440,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index 8e3a6a694b6a..0e447e38008a 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -204,3 +204,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } EXPORT_SYMBOL(xpfo_page_is_unmapped); + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.10.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
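A condensed sketch (illustrative only, not a drop-in replacement for the hunk above; the helper name is made up) of the free-path decision this patch adds:

#include <linux/mm.h>
#include <linux/list.h>
#include <linux/xpfo.h>

/* Pages last owned by the kernel go to the cold (tail) end of the per-cpu
 * list so that user-space allocations preferentially pick up pages that are
 * already user-owned and therefore need no unmap plus TLB shootdown. */
static void xpfo_place_on_pcp_list(struct page *page, struct list_head *list,
                                   bool cold)
{
        if (!cold && !xpfo_page_is_kernel(page))
                list_add(&page->lru, list);             /* hot: reused soon */
        else
                list_add_tail(&page->lru, list);        /* cold: delay reuse */
}

Whether kernel allocations should likewise be steered to pull from the cold end, as Dave Hansen suggests earlier in the thread, is left open in this version.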
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f72.google.com (mail-it0-f72.google.com [209.85.214.72]) by kanga.kvack.org (Postfix) with ESMTP id F06896B0253 for ; Thu, 10 Nov 2016 00:56:47 -0500 (EST) Received: by mail-it0-f72.google.com with SMTP id q124so17929805itd.2 for ; Wed, 09 Nov 2016 21:56:47 -0800 (PST) Received: from szxga01-in.huawei.com (szxga01-in.huawei.com. [58.251.152.64]) by mx.google.com with ESMTP id 23si849545otv.256.2016.11.09.21.56.44 for ; Wed, 09 Nov 2016 21:56:46 -0800 (PST) Subject: Re: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: "ZhaoJunmin Zhao(Junmin)" Message-ID: <58240B46.7080108@huawei.com> Date: Thu, 10 Nov 2016 13:53:10 +0800 MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Content-Type: text/plain; charset="gbk"; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > When a physical page is assigned to a process in user space, it should be unmaped from kernel physmap. From the code, I can see the patch only handle the page in high memory zone. if the kernel use the high memory zone, it will call the kmap. So I would like to know if the physical page is coming from normal zone,how to handle it. Thanks Zhaojunmin -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
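On the ZONE_NORMAL question above: x86-64 has no highmem, so every page, ZONE_NORMAL pages included, goes through the generic kmap()/kunmap() path that the patch instruments. Taken, lightly trimmed, from the include/linux/highmem.h hunk quoted above:

static inline void *kmap(struct page *page)
{
        void *kaddr;

        might_sleep();
        kaddr = page_address(page);     /* direct-map address, valid for every zone here */
        xpfo_kmap(kaddr, page);         /* map the PTE back in if the page is user-owned */
        return kaddr;
}

static inline void kunmap(struct page *page)
{
        xpfo_kunmap(page_address(page), page);  /* may clear the PTE again and flush the TLB entry */
}

Kernel code that touches such a page without going through kmap()/kmap_atomic() faults on the missing PTE, which is why the series also has to touch libata and swiotlb.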
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f70.google.com (mail-wm0-f70.google.com [74.125.82.70]) by kanga.kvack.org (Postfix) with ESMTP id E0662280253 for ; Thu, 10 Nov 2016 14:11:37 -0500 (EST) Received: by mail-wm0-f70.google.com with SMTP id u144so14292552wmu.1 for ; Thu, 10 Nov 2016 11:11:37 -0800 (PST) Received: from mail-wm0-x22f.google.com (mail-wm0-x22f.google.com. [2a00:1450:400c:c09::22f]) by mx.google.com with ESMTPS id g82si29480015wmc.54.2016.11.10.11.11.36 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 Nov 2016 11:11:36 -0800 (PST) Received: by mail-wm0-x22f.google.com with SMTP id t79so51386576wmt.0 for ; Thu, 10 Nov 2016 11:11:36 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:11:34 -0800 Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. Thanks for keeping on this! I'd really like to see it land and then get more architectures to support it. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty In the Kconfig you say "slight", but I'm curious what kinds of benchmarks you've done and if there's a more specific cost we can declare, just to give people more of an idea what the hit looks like? (What workloads would trigger a lot of XPFO unmapping, for example?) Thanks! -Kees -- Kees Cook Nexus Security -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id DB9CF280278 for ; Thu, 10 Nov 2016 14:24:49 -0500 (EST) Received: by mail-wm0-f69.google.com with SMTP id g23so14527111wme.4 for ; Thu, 10 Nov 2016 11:24:49 -0800 (PST) Received: from mail-wm0-x235.google.com (mail-wm0-x235.google.com. 
[2a00:1450:400c:c09::235]) by mx.google.com with ESMTPS id e200si17684258wma.2.2016.11.10.11.24.48 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 10 Nov 2016 11:24:48 -0800 (PST) Received: by mail-wm0-x235.google.com with SMTP id f82so52062680wmf.1 for ; Thu, 10 Nov 2016 11:24:48 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:24:46 -0800 Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Would it be possible to create an lkdtm test that can exercise this protection? > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool Can you include a "help" section here to describe what requirements an architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and HAVE_ARCH_VMAP_STACK or some examples. > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > I've added these patches to my kspp tree on kernel.org, so it should get some 0-day testing now... Thanks! -Kees -- Kees Cook Nexus Security -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f70.google.com (mail-it0-f70.google.com [209.85.214.70]) by kanga.kvack.org (Postfix) with ESMTP id C36946B0273 for ; Tue, 15 Nov 2016 06:15:18 -0500 (EST) Received: by mail-it0-f70.google.com with SMTP id w132so7293650ita.1 for ; Tue, 15 Nov 2016 03:15:18 -0800 (PST) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. [15.241.48.73]) by mx.google.com with ESMTPS id k69si2384448oib.175.2016.11.15.03.15.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Nov 2016 03:15:17 -0800 (PST) Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Juerg Haefliger Message-ID: Date: Tue, 15 Nov 2016 12:15:14 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE" Sender: owner-linux-mm@kvack.org List-ID: To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE Content-Type: multipart/mixed; boundary="Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR"; protected-headers="v1" From: Juerg Haefliger To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: --Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Sorry for the late reply, I just found your email in my cluttered inbox. On 11/10/2016 08:11 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kern= el >> attacks. The basic idea is to enforce exclusive ownership of page fram= es >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeepin= g. >> Specifically two flags to distinguish user vs. kernel pages and to tag= >> unmapped pages and a reference counter to balance kmap/kunmap operatio= ns >> and a lock to serialize access to the XPFO fields. >=20 > Thanks for keeping on this! I'd really like to see it land and then > get more architectures to support it. Good to hear :-) >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel = needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >=20 > In the Kconfig you say "slight", but I'm curious what kinds of > benchmarks you've done and if there's a more specific cost we can > declare, just to give people more of an idea what the hit looks like? 
> (What workloads would trigger a lot of XPFO unmapping, for example?) That 'slight' wording is based on the performance numbers published in th= e referenced paper. So far I've only run kernel compilation tests. For that workload, the big= performance hit comes from disabling >4k page sizes (around 10%). Adding XPFO on top causes 'only' a= nother 0.5% performance penalty. I'm currently looking into adding support for larger page sizes = to see what the real impact is and then generate some more relevant numbers. =2E..Juerg > Thanks! >=20 > -Kees >=20 --Hdqn216xR43vntA6rDQsbXD8mI2B2GFMR-- --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCAAGBQJYKu5DAAoJEHVMOpb5+LSMzQ8P+wWBd+Sen2m8U4Q7HjsdGCoB 9fHq5r8x/bt+WvqF2i8vMR5Txrfn/EoOAkxkOu8tYiq7ECnHSnETAR8NVR2ckp0M cizhmBdOiiMcOUiLSnPGxEx9390Qdx5li0ODwqQS5dSa9qCkBbbv6qf7ri5CzDFH VO+OIAHI/kChTi4baKENq3UNHh0+8s/M0dykDwStIjrDG4Nh+IcEWOeDvOBWZ5HG qxZQEg20reipzZTcba7paJ/pJQZBuKg/AFdQW/RFBFK3O0JngWKp67ZmxSU7PHw+ xr9qpKy+N9Yk3q5id7q2f2zA7eq3a3uYTNC+8d7zc6KQJIofnCLX/3dtuIEwS9rR QSxQIPtk2sFmPLy/kXpU2RihdIJijJtx7RmbW7KEiuUMwUO+dDjjwJul9SNxlYWg gYjUxPAGP6jxfGL443YKNbss2e5KfIh6LXlJpbtnD0WEfYiI7Ef2Y2qRrXpCkcw/ Z2kBLojOJOn8HagkHJiiw8lTwgDm2+YNcUWQoDgaTK9xOoAfMssETJfFaiGt6hsG 7VJot9jHg33kSZDyiTVBV6nwmCkOqtgXINYj8Q82iRmWUKPq2VEQEWWlvg31N9eu S1L7EFIaAzZvt+6qc/GCrjjQzgOz+En/UyfmPoojJ+A6dx8/gM6oWkOOZDsG614J 9rFANUbutWyZav73fc/L =Wzyi -----END PGP SIGNATURE----- --MMA5n0WALLx3TQ9tiMHKXcQ7909wPUkXE-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-yb0-f198.google.com (mail-yb0-f198.google.com [209.85.213.198]) by kanga.kvack.org (Postfix) with ESMTP id 17FC56B0275 for ; Tue, 15 Nov 2016 06:18:15 -0500 (EST) Received: by mail-yb0-f198.google.com with SMTP id d128so174261961ybh.6 for ; Tue, 15 Nov 2016 03:18:15 -0800 (PST) Received: from g9t5009.houston.hpe.com (g9t5009.houston.hpe.com. 
[15.241.48.73]) by mx.google.com with ESMTPS id a83si11553626oif.108.2016.11.15.03.18.14 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Nov 2016 03:18:14 -0800 (PST) Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Juerg Haefliger Message-ID: <9c558dfc-112a-bb52-88c5-206f5ca4fc42@hpe.com> Date: Tue, 15 Nov 2016 12:18:10 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF" Sender: owner-linux-mm@kvack.org List-ID: To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF Content-Type: multipart/mixed; boundary="rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA"; protected-headers="v1" From: Juerg Haefliger To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Message-ID: <9c558dfc-112a-bb52-88c5-206f5ca4fc42@hpe.com> Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: --rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 11/10/2016 08:24 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kern= el >> attacks. The basic idea is to enforce exclusive ownership of page fram= es >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeepin= g. >> Specifically two flags to distinguish user vs. kernel pages and to tag= >> unmapped pages and a reference counter to balance kmap/kunmap operatio= ns >> and a lock to serialize access to the XPFO fields. >> >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel = needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >=20 > Would it be possible to create an lkdtm test that can exercise this pro= tection? I'll look into it. >> diff --git a/security/Kconfig b/security/Kconfig >> index 118f4549404e..4502e15c8419 100644 >> --- a/security/Kconfig >> +++ b/security/Kconfig >> @@ -6,6 +6,25 @@ menu "Security options" >> >> source security/keys/Kconfig >> >> +config ARCH_SUPPORTS_XPFO >> + bool >=20 > Can you include a "help" section here to describe what requirements an > architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and > HAVE_ARCH_VMAP_STACK or some examples. Will do. 
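A possible shape for that help text, sketched from what the current implementation needs from an architecture (the wording is illustrative only, not final):

config ARCH_SUPPORTS_XPFO
	bool
	help
	  An architecture should select this if it can map the kernel's
	  direct (linear) RAM mapping at 4k granularity, change the
	  protection bits of an individual kernel page table entry at
	  runtime (what set_kpte() does in this series), and flush the
	  corresponding kernel TLB entries afterwards.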
>> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on ARCH_SUPPORTS_XPFO >> + select PAGE_EXTENSION >> + help >> + This option offers protection against 'ret2dir' kernel attac= ks. >> + When enabled, every time a page frame is allocated to user s= pace, it >> + is unmapped from the direct mapped RAM region in kernel spac= e >> + (physmap). Similarly, when a page frame is freed/reclaimed, = it is >> + mapped back to physmap. >> + >> + There is a slight performance impact when this option is ena= bled. >> + >> + If in doubt, say "N". >> + >> config SECURITY_DMESG_RESTRICT >> bool "Restrict unprivileged access to the kernel syslog" >> default n >=20 > I've added these patches to my kspp tree on kernel.org, so it should > get some 0-day testing now... Very good. Thanks! > Thanks! Appreciate the feedback. =2E..Juerg > -Kees >=20 --rclb8TCNqwCQ5eCEnVAGCpLPAekUixeEA-- --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- iQIcBAEBCAAGBQJYKu7zAAoJEHVMOpb5+LSMThkP/1ZSAODxbIB2ebdrvax2absi nJwtgo56pBL7g7OJu/OrxUXvMHi9LGfahZOUTUmRCiQIL60EdjCJvQB9wcASVr3i 7AO1ztMGZxmGl/UlobukQs0xTlFU9FcYJFxTqKQPHA8PFnzQZe5jqG1JwTjhw4Z7 ANULiFZGG0G0vSXAagWwiwdzZJyt4HCSamfoESBKSBTK8TywvIFDqy/qsHHlmpjd EExwax4E/VB+Yl8Tg2RvgHHI1kQpTB1dPBfAQvXOTjujdHVGxVZSZBss+3HXL5vi BbNA0Gez+aNvVp2tTTeyWce9y11nIAZgU4rcjxkBqGoU73S+I2ltlIN7MCbKOYR3 /wGxXpCeOCWRVcFxm4yxnQcWOXWMa7aIVHMf7uHU53oKOqGtglFQcMR6V4bcmNG9 n+jLQZr/ADR9PJ2Rsb1vVyOlNiy+uQ+JCA5lBfEe+ckPW2MSc5GedzeETGYQgdUS u9ZzGrbtW9++PXXjgm6YBoaij0vjhVH2/Q1WU3wwdzBDGIaRpy1Bh0zShDdQ7S8y G83c8dHH4Yc1CIljCA0+Ipur3nvuoJKdc6Kxy+j1JK86t6dK8sktXS/1SnBIGM7T L30CH60pgfyvpDEWbSXoQXjdyuYMaQALBYX258KXuH8e9+vjPrO/UC8prgJqK/C1 rbWnk9S8v1HGxMfThiYi =Vrrl -----END PGP SIGNATURE----- --tIVC2OL3od72E71xQ07EDiAJ4OLahkkKF-- -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f198.google.com (mail-pf0-f198.google.com [209.85.192.198]) by kanga.kvack.org (Postfix) with ESMTP id E28566B0038 for ; Thu, 24 Nov 2016 05:57:00 -0500 (EST) Received: by mail-pf0-f198.google.com with SMTP id c4so58410016pfb.7 for ; Thu, 24 Nov 2016 02:57:00 -0800 (PST) Received: from mail-pf0-x232.google.com (mail-pf0-x232.google.com. 
[2607:f8b0:400e:c00::232]) by mx.google.com with ESMTPS id b3si10263702plb.131.2016.11.24.02.56.59 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 24 Nov 2016 02:56:59 -0800 (PST) Received: by mail-pf0-x232.google.com with SMTP id i88so9696969pfk.2 for ; Thu, 24 Nov 2016 02:56:59 -0800 (PST) Date: Thu, 24 Nov 2016 19:56:30 +0900 From: AKASHI Takahiro Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20161124105629.GA23034@linaro.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu Hi, I'm trying to give it a spin on arm64, but ... On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, would it be better to put the whole definition into arch-specific part? > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); Why not PAGE_KERNEL? > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); Again __flush_tlb_one() is x86-specific. flush_tlb_kernel_range() instead? Thanks, -Takahiro AKASHI > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pf0-f200.google.com (mail-pf0-f200.google.com [209.85.192.200]) by kanga.kvack.org (Postfix) with ESMTP id C08376B025E for ; Fri, 9 Dec 2016 04:02:48 -0500 (EST) Received: by mail-pf0-f200.google.com with SMTP id 17so13466580pfy.2 for ; Fri, 09 Dec 2016 01:02:48 -0800 (PST) Received: from mail-pf0-x230.google.com (mail-pf0-x230.google.com. 
[2607:f8b0:400e:c00::230]) by mx.google.com with ESMTPS id q5si32809822pgj.243.2016.12.09.01.02.46 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 09 Dec 2016 01:02:46 -0800 (PST) Received: by mail-pf0-x230.google.com with SMTP id 189so3013206pfz.3 for ; Fri, 09 Dec 2016 01:02:46 -0800 (PST) Date: Fri, 9 Dec 2016 18:02:53 +0900 From: AKASHI Takahiro Subject: Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) Message-ID: <20161209090251.GF23034@linaro.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161124105629.GA23034@linaro.org> Sender: owner-linux-mm@kvack.org List-ID: To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu On Thu, Nov 24, 2016 at 07:56:30PM +0900, AKASHI Takahiro wrote: > Hi, > > I'm trying to give it a spin on arm64, but ... In my experiment on hikey, the kernel boot failed, catching a page fault around cache operations, (a) __clean_dcache_area_pou() on 4KB-page kernel, (b) __inval_cache_range() on 64KB-page kernel, (See more details for backtrace below.) This is because, on arm64, cache operations are by VA (in particular, of direct/linear mapping of physical memory). So I think that naively unmapping a page from physmap in xpfo_kunmap() won't work well on arm64. -Takahiro AKASHI case (a) -------- Unable to handle kernel paging request at virtual address ffff800000cba000 pgd = ffff80003ba8c000 *pgd=0000000000000000 task: ffff80003be38000 task.stack: ffff80003be40000 PC is at __clean_dcache_area_pou+0x20/0x38 LR is at sync_icache_aliases+0x2c/0x40 ... Call trace: ... __clean_dcache_area_pou+0x20/0x38 __sync_icache_dcache+0x6c/0xa8 alloc_set_pte+0x33c/0x588 filemap_map_pages+0x3a8/0x3b8 handle_mm_fault+0x910/0x1080 do_page_fault+0x2b0/0x358 do_mem_abort+0x44/0xa0 el0_ia+0x18/0x1c case (b) -------- Unable to handle kernel paging request at virtual address ffff80002aed0000 pgd = ffff000008f40000 , *pud=000000003dfc0003 , *pmd=000000003dfa0003 , *pte=000000002aed0000 task: ffff800028711900 task.stack: ffff800029020000 PC is at __inval_cache_range+0x3c/0x60 LR is at __swiotlb_map_sg_attrs+0x6c/0x98 ... Call trace: ... __inval_cache_range+0x3c/0x60 dw_mci_pre_dma_transfer.isra.7+0xfc/0x190 dw_mci_pre_req+0x50/0x60 mmc_start_req+0x4c/0x420 mmc_blk_issue_rw_rq+0xb0/0x9b8 mmc_blk_issue_rq+0x154/0x518 mmc_queue_thread+0xac/0x158 kthread+0xd0/0xe8 ret_from_fork+0x10/0x20 > > On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > > This patch adds support for XPFO which protects against 'ret2dir' kernel > > attacks. The basic idea is to enforce exclusive ownership of page frames > > by either the kernel or userspace, unless explicitly requested by the > > kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > > > Additional fields in the page_ext struct are used for XPFO housekeeping. > > Specifically two flags to distinguish user vs. kernel pages and to tag > > unmapped pages and a reference counter to balance kmap/kunmap operations > > and a lock to serialize access to the XPFO fields. 
> > > > Known issues/limitations: > > - Only supports x86-64 (for now) > > - Only supports 4k pages (for now) > > - There are most likely some legitimate uses cases where the kernel needs > > to access userspace which need to be made XPFO-aware > > - Performance penalty > > > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > > > Suggested-by: Vasileios P. Kemerlis > > Signed-off-by: Juerg Haefliger > > --- > > arch/x86/Kconfig | 3 +- > > arch/x86/mm/init.c | 2 +- > > drivers/ata/libata-sff.c | 4 +- > > include/linux/highmem.h | 15 +++- > > include/linux/page_ext.h | 7 ++ > > include/linux/xpfo.h | 39 +++++++++ > > lib/swiotlb.c | 3 +- > > mm/Makefile | 1 + > > mm/page_alloc.c | 2 + > > mm/page_ext.c | 4 + > > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > > security/Kconfig | 19 +++++ > > 12 files changed, 298 insertions(+), 7 deletions(-) > > create mode 100644 include/linux/xpfo.h > > create mode 100644 mm/xpfo.c > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index bada636d1065..38b334f8fde5 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -165,6 +165,7 @@ config X86 > > select HAVE_STACK_VALIDATION if X86_64 > > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > > + select ARCH_SUPPORTS_XPFO if X86_64 > > > > config INSTRUCTION_DECODER > > def_bool y > > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > > > config X86_DIRECT_GBPAGES > > def_bool y > > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > > ---help--- > > Certain kernel features effectively disable kernel > > linear 1 GB mappings (even if the CPU otherwise > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > > index 22af912d66d2..a6fafbae02bb 100644 > > --- a/arch/x86/mm/init.c > > +++ b/arch/x86/mm/init.c > > @@ -161,7 +161,7 @@ static int page_size_mask; > > > > static void __init probe_page_size_mask(void) > > { > > -#if !defined(CONFIG_KMEMCHECK) > > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > > /* > > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > > * use small pages. > > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > > index 051b6158d1b7..58af734be25d 100644 > > --- a/drivers/ata/libata-sff.c > > +++ b/drivers/ata/libata-sff.c > > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use a bounce buffer */ > > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use bounce buffer */ > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > > index bb3f3297062a..7a17c166532f 100644 > > --- a/include/linux/highmem.h > > +++ b/include/linux/highmem.h > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > > #ifndef ARCH_HAS_KMAP > > static inline void *kmap(struct page *page) > > { > > + void *kaddr; > > + > > might_sleep(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > > > static inline void kunmap(struct page *page) > > { > > + xpfo_kunmap(page_address(page), page); > > } > > > > static inline void *kmap_atomic(struct page *page) > > { > > + void *kaddr; > > + > > preempt_disable(); > > pagefault_disable(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > > > static inline void __kunmap_atomic(void *addr) > > { > > + xpfo_kunmap(addr, virt_to_page(addr)); > > pagefault_enable(); > > preempt_enable(); > > } > > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > > index 9298c393ddaa..0e451a42e5a3 100644 > > --- a/include/linux/page_ext.h > > +++ b/include/linux/page_ext.h > > @@ -29,6 +29,8 @@ enum page_ext_flags { > > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > > PAGE_EXT_DEBUG_GUARD, > > PAGE_EXT_OWNER, > > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > PAGE_EXT_YOUNG, > > PAGE_EXT_IDLE, > > @@ -44,6 +46,11 @@ enum page_ext_flags { > > */ > > struct page_ext { > > unsigned long flags; > > +#ifdef CONFIG_XPFO > > + int inited; /* Map counter and lock initialized */ > > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > > +#endif > > }; > > > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > > new file mode 100644 > > index 000000000000..77187578ca33 > > --- /dev/null > > +++ b/include/linux/xpfo.h > > @@ -0,0 +1,39 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. 
> > + */ > > + > > +#ifndef _LINUX_XPFO_H > > +#define _LINUX_XPFO_H > > + > > +#ifdef CONFIG_XPFO > > + > > +extern struct page_ext_operations page_xpfo_ops; > > + > > +extern void xpfo_kmap(void *kaddr, struct page *page); > > +extern void xpfo_kunmap(void *kaddr, struct page *page); > > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > > +extern void xpfo_free_page(struct page *page, int order); > > + > > +extern bool xpfo_page_is_unmapped(struct page *page); > > + > > +#else /* !CONFIG_XPFO */ > > + > > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > > +static inline void xpfo_free_page(struct page *page, int order) { } > > + > > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > > + > > +#endif /* CONFIG_XPFO */ > > + > > +#endif /* _LINUX_XPFO_H */ > > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > > index 22e13a0e19d7..455eff44604e 100644 > > --- a/lib/swiotlb.c > > +++ b/lib/swiotlb.c > > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > > { > > unsigned long pfn = PFN_DOWN(orig_addr); > > unsigned char *vaddr = phys_to_virt(tlb_addr); > > + struct page *page = pfn_to_page(pfn); > > > > - if (PageHighMem(pfn_to_page(pfn))) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > /* The buffer does not have a mapping. Map it in and copy */ > > unsigned int offset = orig_addr & ~PAGE_MASK; > > char *buffer; > > diff --git a/mm/Makefile b/mm/Makefile > > index 295bd7a9f76b..175680f516aa 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > > +obj-$(CONFIG_XPFO) += xpfo.o > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 8fd42aa7c4bd..100e80e008e2 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > > kernel_poison_pages(page, 1 << order, 0); > > kernel_map_pages(page, 1 << order, 0); > > kasan_free_pages(page, order); > > + xpfo_free_page(page, order); > > > > return true; > > } > > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > > kernel_map_pages(page, 1 << order, 1); > > kernel_poison_pages(page, 1 << order, 1); > > kasan_alloc_pages(page, order); > > + xpfo_alloc_page(page, order, gfp_flags); > > set_page_owner(page, order, gfp_flags); > > } > > > > diff --git a/mm/page_ext.c b/mm/page_ext.c > > index 121dcffc4ec1..ba6dbcacc2db 100644 > > --- a/mm/page_ext.c > > +++ b/mm/page_ext.c > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > > * struct page extension > > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > &page_idle_ops, > > #endif > > +#ifdef CONFIG_XPFO > > + &page_xpfo_ops, > > +#endif > > }; > > > > static unsigned long total_usage; > > diff --git a/mm/xpfo.c b/mm/xpfo.c > > new file mode 100644 > > index 000000000000..8e3a6a694b6a > > --- /dev/null > > +++ b/mm/xpfo.c > > @@ -0,0 +1,206 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. 
> > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > + > > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > > + > > +static bool need_xpfo(void) > > +{ > > + return true; > > +} > > + > > +static void init_xpfo(void) > > +{ > > + printk(KERN_INFO "XPFO enabled\n"); > > + static_branch_enable(&xpfo_inited); > > +} > > + > > +struct page_ext_operations page_xpfo_ops = { > > + .need = need_xpfo, > > + .init = init_xpfo, > > +}; > > + > > +/* > > + * Update a single kernel page table entry > > + */ > > +static inline void set_kpte(struct page *page, unsigned long kaddr, > > + pgprot_t prot) { > > + unsigned int level; > > + pte_t *kpte = lookup_address(kaddr, &level); > > + > > + /* We only support 4k pages for now */ > > + BUG_ON(!kpte || level != PG_LEVEL_4K); > > + > > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > > +} > > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, > would it be better to put the whole definition into arch-specific part? > > > + > > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > > +{ > > + int i, flush_tlb = 0; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + > > + /* Initialize the map lock and map counter */ > > + if (!page_ext->inited) { > > + spin_lock_init(&page_ext->maplock); > > + atomic_set(&page_ext->mapcount, 0); > > + page_ext->inited = 1; > > + } > > + BUG_ON(atomic_read(&page_ext->mapcount)); > > + > > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > > + /* > > + * Flush the TLB if the page was previously allocated > > + * to the kernel. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > > + &page_ext->flags)) > > + flush_tlb = 1; > > + } else { > > + /* Tag the page as a kernel page */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + } > > + } > > + > > + if (flush_tlb) { > > + kaddr = (unsigned long)page_address(page); > > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > > + PAGE_SIZE); > > + } > > +} > > + > > +void xpfo_free_page(struct page *page, int order) > > +{ > > + int i; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + if (!page_ext->inited) { > > + /* > > + * The page was allocated before page_ext was > > + * initialized, so it is a kernel page and it needs to > > + * be tagged accordingly. > > + */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + continue; > > + } > > + > > + /* > > + * Map the page back into the kernel if it was previously > > + * allocated to user space. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > > + &page_ext->flags)) { > > + kaddr = (unsigned long)page_address(page + i); > > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > > Why not PAGE_KERNEL? 
> > > + } > > + } > > +} > > + > > +void xpfo_kmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page was previously allocated to user space, so map it back > > + * into the kernel. No TLB flush required. > > + */ > > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kmap); > > + > > +void xpfo_kunmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page is to be allocated back to user space, so unmap it from the > > + * kernel, flush the TLB and tag it as a user page. > > + */ > > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > > + __flush_tlb_one((unsigned long)kaddr); > > Again __flush_tlb_one() is x86-specific. > flush_tlb_kernel_range() instead? > > Thanks, > -Takahiro AKASHI > > > + } > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kunmap); > > + > > +inline bool xpfo_page_is_unmapped(struct page *page) > > +{ > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return false; > > + > > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > > +} > > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > > diff --git a/security/Kconfig b/security/Kconfig > > index 118f4549404e..4502e15c8419 100644 > > --- a/security/Kconfig > > +++ b/security/Kconfig > > @@ -6,6 +6,25 @@ menu "Security options" > > > > source security/keys/Kconfig > > > > +config ARCH_SUPPORTS_XPFO > > + bool > > + > > +config XPFO > > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > > + default n > > + depends on ARCH_SUPPORTS_XPFO > > + select PAGE_EXTENSION > > + help > > + This option offers protection against 'ret2dir' kernel attacks. > > + When enabled, every time a page frame is allocated to user space, it > > + is unmapped from the direct mapped RAM region in kernel space > > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > > + mapped back to physmap. > > + > > + There is a slight performance impact when this option is enabled. > > + > > + If in doubt, say "N". 
> > + > > config SECURITY_DMESG_RESTRICT > > bool "Restrict unprivileged access to the kernel syslog" > > default n > > -- > > 2.10.1 > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> <57C9E37A.9070805@intel.com> From: Juerg Haefliger Message-ID: Date: Mon, 5 Sep 2016 13:54:47 +0200 MIME-Version: 1.0 In-Reply-To: <57C9E37A.9070805@intel.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t" Subject: [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Dave Hansen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu List-ID: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t Content-Type: multipart/mixed; boundary="HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta"; protected-headers="v1" From: Juerg Haefliger To: Dave Hansen , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> <57C9E37A.9070805@intel.com> In-Reply-To: <57C9E37A.9070805@intel.com> --HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable On 09/02/2016 10:39 PM, Dave Hansen wrote: > On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Allocating a page to userspace that was previously allocated to the >> kernel requires an expensive TLB shootdown. To minimize this, we only >> put non-kernel pages into the hot cache to favor their allocation. >=20 > But kernel allocations do allocate from these pools, right? Yes. > Does this > just mean that kernel allocations usually have to pay the penalty to > convert a page? Only pages that are allocated for userspace (gfp & GFP_HIGHUSER =3D=3D GF= P_HIGHUSER) which were previously allocated for the kernel (gfp & GFP_HIGHUSER !=3D GFP_HIGHUSER= ) have to pay the penalty. > So, what's the logic here? You're assuming that order-0 kernel > allocations are more rare than allocations for userspace? The logic is to put reclaimed kernel pages into the cold cache to postpon= e their allocation as long as possible to minimize (potential) TLB flushes. 
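Spelled out without the quoted-printable noise, the check being discussed is the one in xpfo_alloc_page(). As a standalone helper it would look like this (illustrative only, no such helper exists in the series):

	/* Does this allocation target userspace? GFP_HIGHUSER is
	 * GFP_USER | __GFP_HIGHMEM, so all of its bits must be set.
	 */
	static inline bool xpfo_gfp_is_user(gfp_t gfp)
	{
		return (gfp & GFP_HIGHUSER) == GFP_HIGHUSER;
	}

Only a page for which this predicate is true, and whose PAGE_EXT_XPFO_KERNEL bit is still set from its previous life, triggers the flush_tlb_kernel_range() call in xpfo_alloc_page().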
=2E..Juerg --HHOVQnPV2XNCXH9POK17P2Fj5EiJGnEta-- --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJXzV0HAAoJEHVMOpb5+LSM304QAINSMlOQKNAIRon29Uy318Sf J9Vfv3p2L/WIxrL6kKaHYkDqj+b0XnSVlWvnxNp1MX1qAOqeUSipfymvwNaYGuIV IJQeahOcJccupJMw1ILF+H1Rhxn+gBOc9I745omwO/CtlqYaYaXfCeIxI/R1Q9LQ yCtBPnbL4v1St7FnjDhZd3FdgiP+F98MAz8040FYq1cO+qWVDTyIRcpq4rPaAJNi 8zcpLB+A34qjA2i3ZFV/ZNls2L4Buw4pYW1ZGnHxNTKKmbrYkZhBuxYuCNpfnyhB M00AnBKJQ7fqHKxCa64eo59rRTpYQ0Zd8KaKvVaZfZfbBaAg8Ir2UWNoBcPvE8ox D8TMhKlORMhHfnAE73DIlkENt1wYt2gGGScIJ+bL8nulJpqvNo5lPyTT3NHhrNZa prre5DzDQFvyv2SLx2P3MDqtyJ658hKx5own+82N99K5GuhC2++Xaq3/BpOC4rQI rEONoXhm0j63g2udCmkc1BIRSb+ZTaqzC1fxWoYH75nYEiIhGcgTQVJWXMx5DB/Y gvJJn/okC97zSXGk8zQtYIO2aDhUzRowYoy5bslzlR20hoNTWL9ctySE2OobqIdM WmWm/Hyq59cAMneimMv68+/RiWtxL2s5Q+8lci8uf18ollN1g/zp6H/1qOOMWvAr vqBZxyugEM72PbSEFv8Y =Jva/ -----END PGP SIGNATURE----- --feNkja7ck77XjJ8qB7LALqg0b3p8XSw7t-- From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 2 Sep 2016 13:39:06 +0200 Message-Id: <20160902113909.32631-1-juerg.haefliger@hpe.com> In-Reply-To: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. 
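To make the threat model concrete, the aliasing that ret2dir relies on and that this series removes looks roughly like this on x86-64 (conceptual sketch only; page and kaddr are illustrative locals, not code from the series):

	/* Without XPFO, a page handed to userspace keeps a second, kernel-side
	 * alias in physmap that the kernel can be tricked into dereferencing. */
	struct page *page = alloc_page(GFP_HIGHUSER);	/* page destined for userspace */
	void *kaddr = page_address(page);		/* its physmap (kernel) alias */

	/* With CONFIG_XPFO the PTE behind kaddr is cleared in xpfo_alloc_page(),
	 * so a stray kernel access through this alias faults; legitimate accesses
	 * go through kmap()/kunmap(), which temporarily restore the mapping. */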
Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 2 Sep 2016 13:39:07 +0200 Message-Id: <20160902113909.32631-2-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 
100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. 
Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. 
+ */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. + */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. 
+ */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 2 Sep 2016 13:39:08 +0200 Message-Id: <20160902113909.32631-3-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. 
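The change below boils down to a single placement decision in free_hot_cold_page(); the excerpt here carries editorial comments, the code itself is the patch's:

	pcp = &this_cpu_ptr(zone->pageset)->pcp;
	if (!cold && !xpfo_page_is_kernel(page))
		/* head of the per-cpu list: handed out again soon */
		list_add(&page->lru, &pcp->lists[migratetype]);
	else
		/* tail: former kernel pages are reused as late as possible,
		 * deferring the kernel->user TLB shootdown they would cost */
		list_add_tail(&page->lru, &pcp->lists[migratetype]);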
Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 2 Sep 2016 13:39:09 +0200 Message-Id: <20160902113909.32631-4-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. 
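The faults occur because bio_map_user_iov() pins user pages with get_user_pages() and drivers then touch the data through the pages' physmap alias, which XPFO has removed for user-owned pages. A minimal illustration (not code from the series; user_pages, data and len are placeholders):

	/* page was pinned from a userspace iovec via get_user_pages() */
	struct page *page = user_pages[0];

	/* non-highmem drivers commonly do this instead of kmap(): */
	void *kaddr = page_address(page);	/* physmap alias of a user page */
	memcpy(kaddr, data, len);		/* PTE cleared by XPFO -> page fault */

Forcing bio_copy_user_iov() sidesteps the problem by copying through freshly allocated, kernel-owned bounce pages, at the cost of an extra copy.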
Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <1456496467-14247-1-git-send-email-juerg.haefliger@hpe.com> <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160902113909.32631-3-juerg.haefliger@hpe.com> From: Dave Hansen Message-ID: <57C9E37A.9070805@intel.com> Date: Fri, 2 Sep 2016 13:39:22 -0700 MIME-Version: 1.0 In-Reply-To: <20160902113909.32631-3-juerg.haefliger@hpe.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: [kernel-hardening] Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu List-ID: On 09/02/2016 04:39 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. But kernel allocations do allocate from these pools, right? Does this just mean that kernel allocations usually have to pay the penalty to convert a page? So, what's the logic here? You're assuming that order-0 kernel allocations are more rare than allocations for userspace? From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> From: Juerg Haefliger Message-ID: Date: Wed, 14 Sep 2016 09:23:58 +0200 MIME-Version: 1.0 In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA" Subject: [kernel-hardening] Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu List-ID: This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA Content-Type: multipart/mixed; boundary="rOu3TDjgvxEHAGHDHwIoR03apojltv8SM"; protected-headers="v1" From: Juerg Haefliger To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu Message-ID: Subject: Re: [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> --rOu3TDjgvxEHAGHDHwIoR03apojltv8SM Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: quoted-printable Resending to include the kernel-hardening list. Sorry, I wasn't subscribe= d with the correct email address when I sent this the first time. 
=2E..Juerg On 09/14/2016 09:18 AM, Juerg Haefliger wrote: > Changes from: > v1 -> v2: > - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) > arch-agnostic. > - Moved the config to the generic layer and added ARCH_SUPPORTS_XPF= O > for x86. > - Use page_ext for the additional per-page data. > - Removed the clearing of pages. This can be accomplished by using > PAGE_POISONING. > - Split up the patch into multiple patches. > - Fixed additional issues identified by reviewers. >=20 > This patch series adds support for XPFO which protects against 'ret2dir= ' > kernel attacks. The basic idea is to enforce exclusive ownership of pag= e > frames by either the kernel or userspace, unless explicitly requested b= y > the kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. >=20 > Additional fields in the page_ext struct are used for XPFO housekeeping= =2E > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operation= s > and a lock to serialize access to the XPFO fields. >=20 > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel n= eeds > to access userspace which need to be made XPFO-aware > - Performance penalty >=20 > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf >=20 > Juerg Haefliger (3): > Add support for eXclusive Page Frame Ownership (XPFO) > xpfo: Only put previous userspace pages into the hot cache > block: Always use a bounce buffer when XPFO is enabled >=20 > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > block/blk-map.c | 2 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 41 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 10 ++- > mm/page_ext.c | 4 + > mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++= ++++++++ > security/Kconfig | 20 +++++ > 12 files changed, 314 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c >=20 --=20 Juerg Haefliger Hewlett Packard Enterprise --rOu3TDjgvxEHAGHDHwIoR03apojltv8SM-- --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAEBCAAGBQJX2PsOAAoJEHVMOpb5+LSMsA0QAIHMfTGCGDrTmqVL6Bi7/947 CYxgUUi235iAvh9DX+c50Oci+IrhRGKOiXOw7D4lj6MFYnhoBFKFlT/hioJoB+KU iv+Kb9HfN8Ab0BITVhmFKOJ8vsELLI/gbOTBceDimVoEndlkYXeP1AnVL56Y1Dmr 17k5Yhy4pdKLvOt4NYTprKEnc+td1XtbZ/biRZhrCRrhLFgaQDB2gOYZmu0kny7X Plp04Ts/fhsh8nh86ej1BeU4yg0XPexi9I+O8TSrzsG8LUSj3Ev1g/56rETzYeze +QOzUhuMOEZLju+5Cix9tjPG7RPPQJ+k1SqNhE4q+YHwwOhx+Qa5RJRL92/hyoDk cMBhDb5Mk/G2Y0CzvGYurfJxFny6h324NTjvUhNquTV5hXwy61e2qAkd0bOh2W3o 8RfwVp/xYoYeqbkcNcq+tyPSx6rC4MUC07jm28pn9McAyaLIBN63tuyAX9Hm8lAh euxdnSG0EcFMA2PpFVAvIoTY+a7l3gEViQPYdjmDgVY3Sbq7cBZJv4mBcgsnI7oY S2Jbd0y9oE7zv2lJL2xbt1Ylu3wR5+BHUWcg6nUrEHrNNJ/C3QRtoMEKA7MqP+l7 DgSIbUQZ5IFDZ5nNHUmGI6pz49PG+4k8aef2hoH3tzFJ1Az1u/qCSB5H0obSez9Q 2WwUHG3aQkkJwO5Rd0DS =Wq/n -----END PGP SIGNATURE----- --X1S6k1OXnDBv9lOh0eaK4rxduXi2rQ5fA-- From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg 
Haefliger Date: Wed, 14 Sep 2016 09:18:58 +0200 Message-Id: <20160914071901.8127-1-juerg.haefliger@hpe.com> In-Reply-To: <20160902113909.32631-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 0/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: Changes from: v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (3): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache block: Always use a bounce buffer when XPFO is enabled arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- block/blk-map.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 213 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 12 files changed, 314 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Wed, 14 Sep 2016 09:18:59 +0200 Message-Id: <20160914071901.8127-2-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 1/3] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. 
Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 205 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 20 +++++ 11 files changed, 296 insertions(+), 5 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index c580d8c33562..dc5604a710c6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1350,7 +1351,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d28a2d741f9e..426427b54639 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. 
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 03f2a3e7d76d..fdf63dcc399e 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -27,6 +27,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -48,6 +50,11 @@ struct page_ext { int last_migrate_reason; depot_stack_handle_t handle; #endif +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 2ca1faf3fa09..e6f8894423da 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3fbe73a6fe4b..0241c8a7e72a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1029,6 +1029,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1726,6 +1727,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 44a4c029c8e7..1cd7d7f460cc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -63,6 +64,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..ddb1be05485d --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,205 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} diff --git a/security/Kconfig b/security/Kconfig index da10d9b573a4..1eac37a9bec2 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,26 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on DEBUG_KERNEL && ARCH_SUPPORTS_XPFO + select DEBUG_TLBFLUSH + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Wed, 14 Sep 2016 09:19:00 +0200 Message-Id: <20160914071901.8127-3-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. 
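For context, the shootdown this change tries to defer is issued on the allocation side; the excerpt below is from xpfo_alloc_page() in patch 1 of this series, with editorial comments added:

	if (flush_tlb) {
		/* At least one page in this block was a kernel page before;
		 * its stale kernel-mapping TLB entries must be shot down. */
		kaddr = (unsigned long)page_address(page);
		flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * PAGE_SIZE);
	}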
Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 0241c8a7e72a..83404b41e52d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2421,7 +2421,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index ddb1be05485d..f8dffda0c961 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -203,3 +203,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Wed, 14 Sep 2016 09:19:01 +0200 Message-Id: <20160914071901.8127-4-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: juerg.haefliger@hpe.com, vpk@cs.columbia.edu List-ID: This is a temporary hack to prevent the use of bio_map_user_iov() which causes XPFO page faults. 
Signed-off-by: Juerg Haefliger --- block/blk-map.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/blk-map.c b/block/blk-map.c index b8657fa8dc9a..e889dbfee6fb 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -52,7 +52,7 @@ static int __blk_rq_map_user_iov(struct request *rq, struct bio *bio, *orig_bio; int ret; - if (copy) + if (copy || IS_ENABLED(CONFIG_XPFO)) bio = bio_copy_user_iov(q, map_data, iter, gfp_mask); else bio = bio_map_user_iov(q, iter, gfp_mask); -- 2.9.3 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com Date: Wed, 14 Sep 2016 00:33:40 -0700 From: Christoph Hellwig Message-ID: <20160914073340.GA28090@infradead.org> References: <20160902113909.32631-1-juerg.haefliger@hpe.com> <20160914071901.8127-1-juerg.haefliger@hpe.com> <20160914071901.8127-4-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160914071901.8127-4-juerg.haefliger@hpe.com> Subject: [kernel-hardening] Re: [RFC PATCH v2 3/3] block: Always use a bounce buffer when XPFO is enabled To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On Wed, Sep 14, 2016 at 09:19:01AM +0200, Juerg Haefliger wrote: > This is a temporary hack to prevent the use of bio_map_user_iov() > which causes XPFO page faults. > > Signed-off-by: Juerg Haefliger Sorry, but if your scheme doesn't support get_user_pages access to user memory is't a steaming pile of crap and entirely unacceptable. From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 4 Nov 2016 15:45:32 +0100 Message-Id: <20161104144534.14790-1-juerg.haefliger@hpe.com> In-Reply-To: <20160914071901.8127-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v3 0/2] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com List-ID: Changes from: v2 -> v3: - Removed 'depends on DEBUG_KERNEL' and 'select DEBUG_TLBFLUSH'. These are left-overs from the original patch and are not required. - Make libata XPFO-aware, i.e., properly handle pages that were unmapped by XPFO. This takes care of the temporary hack in v2 that forced the use of a bounce buffer in block/blk-map.c. v1 -> v2: - Moved the code from arch/x86/mm/ to mm/ since it's (mostly) arch-agnostic. - Moved the config to the generic layer and added ARCH_SUPPORTS_XPFO for x86. - Use page_ext for the additional per-page data. - Removed the clearing of pages. This can be accomplished by using PAGE_POISONING. - Split up the patch into multiple patches. - Fixed additional issues identified by reviewers. This patch series adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (removed from the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. 
kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Juerg Haefliger (2): Add support for eXclusive Page Frame Ownership (XPFO) xpfo: Only put previous userspace pages into the hot cache arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 41 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 10 ++- mm/page_ext.c | 4 + mm/xpfo.c | 214 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 315 insertions(+), 8 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 4 Nov 2016 15:45:33 +0100 Message-Id: <20161104144534.14790-2-juerg.haefliger@hpe.com> In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com List-ID: This patch adds support for XPFO which protects against 'ret2dir' kernel attacks. The basic idea is to enforce exclusive ownership of page frames by either the kernel or userspace, unless explicitly requested by the kernel. Whenever a page destined for userspace is allocated, it is unmapped from physmap (the kernel's page table). When such a page is reclaimed from userspace, it is mapped back to physmap. Additional fields in the page_ext struct are used for XPFO housekeeping. Specifically two flags to distinguish user vs. kernel pages and to tag unmapped pages and a reference counter to balance kmap/kunmap operations and a lock to serialize access to the XPFO fields. Known issues/limitations: - Only supports x86-64 (for now) - Only supports 4k pages (for now) - There are most likely some legitimate uses cases where the kernel needs to access userspace which need to be made XPFO-aware - Performance penalty Reference paper by the original patch authors: http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Suggested-by: Vasileios P. 
Kemerlis Signed-off-by: Juerg Haefliger --- arch/x86/Kconfig | 3 +- arch/x86/mm/init.c | 2 +- drivers/ata/libata-sff.c | 4 +- include/linux/highmem.h | 15 +++- include/linux/page_ext.h | 7 ++ include/linux/xpfo.h | 39 +++++++++ lib/swiotlb.c | 3 +- mm/Makefile | 1 + mm/page_alloc.c | 2 + mm/page_ext.c | 4 + mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ security/Kconfig | 19 +++++ 12 files changed, 298 insertions(+), 7 deletions(-) create mode 100644 include/linux/xpfo.h create mode 100644 mm/xpfo.c diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index bada636d1065..38b334f8fde5 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -165,6 +165,7 @@ config X86 select HAVE_STACK_VALIDATION if X86_64 select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS + select ARCH_SUPPORTS_XPFO if X86_64 config INSTRUCTION_DECODER def_bool y @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT config X86_DIRECT_GBPAGES def_bool y - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO ---help--- Certain kernel features effectively disable kernel linear 1 GB mappings (even if the CPU otherwise diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 22af912d66d2..a6fafbae02bb 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,7 +161,7 @@ static int page_size_mask; static void __init probe_page_size_mask(void) { -#if !defined(CONFIG_KMEMCHECK) +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) /* * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will * use small pages. diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 051b6158d1b7..58af734be25d 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use a bounce buffer */ @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); - if (PageHighMem(page)) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { unsigned long flags; /* FIXME: use bounce buffer */ diff --git a/include/linux/highmem.h b/include/linux/highmem.h index bb3f3297062a..7a17c166532f 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -7,6 +7,7 @@ #include #include #include +#include #include @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) #ifndef ARCH_HAS_KMAP static inline void *kmap(struct page *page) { + void *kaddr; + might_sleep(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } static inline void kunmap(struct page *page) { + xpfo_kunmap(page_address(page), page); } static inline void *kmap_atomic(struct page *page) { + void *kaddr; + preempt_disable(); pagefault_disable(); - return page_address(page); + kaddr = page_address(page); + xpfo_kmap(kaddr, page); + return kaddr; } #define kmap_atomic_prot(page, prot) kmap_atomic(page) static inline void __kunmap_atomic(void *addr) { + xpfo_kunmap(addr, virt_to_page(addr)); pagefault_enable(); preempt_enable(); } diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index 9298c393ddaa..0e451a42e5a3 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -29,6 +29,8 @@ enum page_ext_flags { PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ PAGE_EXT_DEBUG_GUARD, PAGE_EXT_OWNER, + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, @@ -44,6 +46,11 @@ enum page_ext_flags { */ struct page_ext { unsigned long flags; +#ifdef CONFIG_XPFO + int inited; /* Map counter and lock initialized */ + atomic_t mapcount; /* Counter for balancing map/unmap requests */ + spinlock_t maplock; /* Lock to serialize map/unmap requests */ +#endif }; extern void pgdat_page_ext_init(struct pglist_data *pgdat); diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h new file mode 100644 index 000000000000..77187578ca33 --- /dev/null +++ b/include/linux/xpfo.h @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#ifndef _LINUX_XPFO_H +#define _LINUX_XPFO_H + +#ifdef CONFIG_XPFO + +extern struct page_ext_operations page_xpfo_ops; + +extern void xpfo_kmap(void *kaddr, struct page *page); +extern void xpfo_kunmap(void *kaddr, struct page *page); +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); +extern void xpfo_free_page(struct page *page, int order); + +extern bool xpfo_page_is_unmapped(struct page *page); + +#else /* !CONFIG_XPFO */ + +static inline void xpfo_kmap(void *kaddr, struct page *page) { } +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } +static inline void xpfo_free_page(struct page *page, int order) { } + +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } + +#endif /* CONFIG_XPFO */ + +#endif /* _LINUX_XPFO_H */ diff --git a/lib/swiotlb.c b/lib/swiotlb.c index 22e13a0e19d7..455eff44604e 100644 --- a/lib/swiotlb.c +++ b/lib/swiotlb.c @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, { unsigned long pfn = PFN_DOWN(orig_addr); unsigned char *vaddr = phys_to_virt(tlb_addr); + struct page *page = pfn_to_page(pfn); - if (PageHighMem(pfn_to_page(pfn))) { + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { /* The buffer does not have a mapping. Map it in and copy */ unsigned int offset = orig_addr & ~PAGE_MASK; char *buffer; diff --git a/mm/Makefile b/mm/Makefile index 295bd7a9f76b..175680f516aa 100644 --- a/mm/Makefile +++ b/mm/Makefile @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o +obj-$(CONFIG_XPFO) += xpfo.o diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8fd42aa7c4bd..100e80e008e2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, kernel_poison_pages(page, 1 << order, 0); kernel_map_pages(page, 1 << order, 0); kasan_free_pages(page, order); + xpfo_free_page(page, order); return true; } @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, kernel_map_pages(page, 1 << order, 1); kernel_poison_pages(page, 1 << order, 1); kasan_alloc_pages(page, order); + xpfo_alloc_page(page, order, gfp_flags); set_page_owner(page, order, gfp_flags); } diff --git a/mm/page_ext.c b/mm/page_ext.c index 121dcffc4ec1..ba6dbcacc2db 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -7,6 +7,7 @@ #include #include #include +#include /* * struct page extension @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) &page_idle_ops, #endif +#ifdef CONFIG_XPFO + &page_xpfo_ops, +#endif }; static unsigned long total_usage; diff --git a/mm/xpfo.c b/mm/xpfo.c new file mode 100644 index 000000000000..8e3a6a694b6a --- /dev/null +++ b/mm/xpfo.c @@ -0,0 +1,206 @@ +/* + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. + * Copyright (C) 2016 Brown University. All rights reserved. + * + * Authors: + * Juerg Haefliger + * Vasileios P. Kemerlis + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. 
+ */ + +#include +#include +#include +#include + +#include + +DEFINE_STATIC_KEY_FALSE(xpfo_inited); + +static bool need_xpfo(void) +{ + return true; +} + +static void init_xpfo(void) +{ + printk(KERN_INFO "XPFO enabled\n"); + static_branch_enable(&xpfo_inited); +} + +struct page_ext_operations page_xpfo_ops = { + .need = need_xpfo, + .init = init_xpfo, +}; + +/* + * Update a single kernel page table entry + */ +static inline void set_kpte(struct page *page, unsigned long kaddr, + pgprot_t prot) { + unsigned int level; + pte_t *kpte = lookup_address(kaddr, &level); + + /* We only support 4k pages for now */ + BUG_ON(!kpte || level != PG_LEVEL_4K); + + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); +} + +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) +{ + int i, flush_tlb = 0; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + + /* Initialize the map lock and map counter */ + if (!page_ext->inited) { + spin_lock_init(&page_ext->maplock); + atomic_set(&page_ext->mapcount, 0); + page_ext->inited = 1; + } + BUG_ON(atomic_read(&page_ext->mapcount)); + + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { + /* + * Flush the TLB if the page was previously allocated + * to the kernel. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, + &page_ext->flags)) + flush_tlb = 1; + } else { + /* Tag the page as a kernel page */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + } + } + + if (flush_tlb) { + kaddr = (unsigned long)page_address(page); + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * + PAGE_SIZE); + } +} + +void xpfo_free_page(struct page *page, int order) +{ + int i; + struct page_ext *page_ext; + unsigned long kaddr; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + for (i = 0; i < (1 << order); i++) { + page_ext = lookup_page_ext(page + i); + + if (!page_ext->inited) { + /* + * The page was allocated before page_ext was + * initialized, so it is a kernel page and it needs to + * be tagged accordingly. + */ + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); + continue; + } + + /* + * Map the page back into the kernel if it was previously + * allocated to user space. + */ + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, + &page_ext->flags)) { + kaddr = (unsigned long)page_address(page + i); + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); + } + } +} + +void xpfo_kmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page was previously allocated to user space, so map it back + * into the kernel. No TLB flush required. 
+ */ + if ((atomic_inc_return(&page_ext->mapcount) == 1) && + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kmap); + +void xpfo_kunmap(void *kaddr, struct page *page) +{ + struct page_ext *page_ext; + unsigned long flags; + + if (!static_branch_unlikely(&xpfo_inited)) + return; + + page_ext = lookup_page_ext(page); + + /* + * The page was allocated before page_ext was initialized (which means + * it's a kernel page) or it's allocated to the kernel, so nothing to + * do. + */ + if (!page_ext->inited || + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) + return; + + spin_lock_irqsave(&page_ext->maplock, flags); + + /* + * The page is to be allocated back to user space, so unmap it from the + * kernel, flush the TLB and tag it as a user page. + */ + if (atomic_dec_return(&page_ext->mapcount) == 0) { + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); + __flush_tlb_one((unsigned long)kaddr); + } + + spin_unlock_irqrestore(&page_ext->maplock, flags); +} +EXPORT_SYMBOL(xpfo_kunmap); + +inline bool xpfo_page_is_unmapped(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); +} +EXPORT_SYMBOL(xpfo_page_is_unmapped); diff --git a/security/Kconfig b/security/Kconfig index 118f4549404e..4502e15c8419 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -6,6 +6,25 @@ menu "Security options" source security/keys/Kconfig +config ARCH_SUPPORTS_XPFO + bool + +config XPFO + bool "Enable eXclusive Page Frame Ownership (XPFO)" + default n + depends on ARCH_SUPPORTS_XPFO + select PAGE_EXTENSION + help + This option offers protection against 'ret2dir' kernel attacks. + When enabled, every time a page frame is allocated to user space, it + is unmapped from the direct mapped RAM region in kernel space + (physmap). Similarly, when a page frame is freed/reclaimed, it is + mapped back to physmap. + + There is a slight performance impact when this option is enabled. + + If in doubt, say "N". + config SECURITY_DMESG_RESTRICT bool "Restrict unprivileged access to the kernel syslog" default n -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com From: Juerg Haefliger Date: Fri, 4 Nov 2016 15:45:34 +0100 Message-Id: <20161104144534.14790-3-juerg.haefliger@hpe.com> In-Reply-To: <20161104144534.14790-1-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> Subject: [kernel-hardening] [RFC PATCH v3 2/2] xpfo: Only put previous userspace pages into the hot cache To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com List-ID: Allocating a page to userspace that was previously allocated to the kernel requires an expensive TLB shootdown. To minimize this, we only put non-kernel pages into the hot cache to favor their allocation. 
Signed-off-by: Juerg Haefliger --- include/linux/xpfo.h | 2 ++ mm/page_alloc.c | 8 +++++++- mm/xpfo.c | 8 ++++++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h index 77187578ca33..077d1cfadfa2 100644 --- a/include/linux/xpfo.h +++ b/include/linux/xpfo.h @@ -24,6 +24,7 @@ extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); extern void xpfo_free_page(struct page *page, int order); extern bool xpfo_page_is_unmapped(struct page *page); +extern bool xpfo_page_is_kernel(struct page *page); #else /* !CONFIG_XPFO */ @@ -33,6 +34,7 @@ static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } static inline void xpfo_free_page(struct page *page, int order) { } static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } +static inline bool xpfo_page_is_kernel(struct page *page) { return false; } #endif /* CONFIG_XPFO */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 100e80e008e2..09ef4f7cfd14 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2440,7 +2440,13 @@ void free_hot_cold_page(struct page *page, bool cold) } pcp = &this_cpu_ptr(zone->pageset)->pcp; - if (!cold) + /* + * XPFO: Allocating a page to userspace that was previously allocated + * to the kernel requires an expensive TLB shootdown. To minimize this, + * we only put non-kernel pages into the hot cache to favor their + * allocation. + */ + if (!cold && !xpfo_page_is_kernel(page)) list_add(&page->lru, &pcp->lists[migratetype]); else list_add_tail(&page->lru, &pcp->lists[migratetype]); diff --git a/mm/xpfo.c b/mm/xpfo.c index 8e3a6a694b6a..0e447e38008a 100644 --- a/mm/xpfo.c +++ b/mm/xpfo.c @@ -204,3 +204,11 @@ inline bool xpfo_page_is_unmapped(struct page *page) return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); } EXPORT_SYMBOL(xpfo_page_is_unmapped); + +inline bool xpfo_page_is_kernel(struct page *page) +{ + if (!static_branch_unlikely(&xpfo_inited)) + return false; + + return test_bit(PAGE_EXT_XPFO_KERNEL, &lookup_page_ext(page)->flags); +} -- 2.10.1 From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com Date: Fri, 4 Nov 2016 07:50:40 -0700 From: Christoph Hellwig Message-ID: <20161104145040.GA24930@infradead.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu, Tejun Heo , linux-ide@vger.kernel.org List-ID: The libata parts here really need to be split out and the proper list and maintainer need to be Cc'ed. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h This is just piling one nasty hack on top of another. libata should just use the highmem case unconditionally, as it is the correct thing to do for all cases. From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: "ZhaoJunmin Zhao(Junmin)" Message-ID: <58240B46.7080108@huawei.com> Date: Thu, 10 Nov 2016 13:53:10 +0800 MIME-Version: 1.0 In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Content-Type: text/plain; charset="gbk"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [kernel-hardening] [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: kernel-hardening@lists.openwall.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-x86_64@vger.kernel.org Cc: vpk@cs.columbia.edu, juerg.haefliger@hpe.com List-ID: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. 
Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > When a physical page is assigned to a process in user space, it should be unmaped from kernel physmap. From the code, I can see the patch only handle the page in high memory zone. if the kernel use the high memory zone, it will call the kmap. So I would like to know if the physical page is coming from normal zone,how to handle it. 
Thanks Zhaojunmin From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com MIME-Version: 1.0 Sender: keescook@google.com In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:11:34 -0800 Message-ID: Content-Type: text/plain; charset=UTF-8 Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. Thanks for keeping on this! I'd really like to see it land and then get more architectures to support it. > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty In the Kconfig you say "slight", but I'm curious what kinds of benchmarks you've done and if there's a more specific cost we can declare, just to give people more of an idea what the hit looks like? (What workloads would trigger a lot of XPFO unmapping, for example?) Thanks! -Kees -- Kees Cook Nexus Security From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com MIME-Version: 1.0 Sender: keescook@google.com In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Kees Cook Date: Thu, 10 Nov 2016 11:24:46 -0800 Message-ID: Content-Type: text/plain; charset=UTF-8 Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. 
kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf Would it be possible to create an lkdtm test that can exercise this protection? > Suggested-by: Vasileios P. Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool Can you include a "help" section here to describe what requirements an architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and HAVE_ARCH_VMAP_STACK or some examples. > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". > + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > I've added these patches to my kspp tree on kernel.org, so it should get some 0-day testing now... Thanks! 
-Kees -- Kees Cook Nexus Security From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Juerg Haefliger Message-ID: Date: Tue, 15 Nov 2016 12:15:14 +0100 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: Sorry for the late reply, I just found your email in my cluttered inbox. On 11/10/2016 08:11 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeeping. >> Specifically two flags to distinguish user vs. kernel pages and to tag >> unmapped pages and a reference counter to balance kmap/kunmap operations >> and a lock to serialize access to the XPFO fields. > > Thanks for keeping on this! I'd really like to see it land and then > get more architectures to support it. Good to hear :-) >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty > > In the Kconfig you say "slight", but I'm curious what kinds of > benchmarks you've done and if there's a more specific cost we can > declare, just to give people more of an idea what the hit looks like? > (What workloads would trigger a lot of XPFO unmapping, for example?) That 'slight' wording is based on the performance numbers published in the referenced paper. So far I've only run kernel compilation tests. For that workload, the big performance hit comes from disabling >4k page sizes (around 10%). Adding XPFO on top causes 'only' another 0.5% performance penalty. I'm currently looking into adding support for larger page sizes to see what the real impact is and then generate some more relevant numbers. ...Juerg > Thanks!
> -Kees From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> From: Juerg Haefliger Message-ID: <9c558dfc-112a-bb52-88c5-206f5ca4fc42@hpe.com> Date: Tue, 15 Nov 2016 12:18:10 +0100 MIME-Version: 1.0 In-Reply-To: Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Kees Cook Cc: LKML , Linux-MM , "kernel-hardening@lists.openwall.com" , linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On 11/10/2016 08:24 PM, Kees Cook wrote: > On Fri, Nov 4, 2016 at 7:45 AM, Juerg Haefliger wrote: >> This patch adds support for XPFO which protects against 'ret2dir' kernel >> attacks. The basic idea is to enforce exclusive ownership of page frames >> by either the kernel or userspace, unless explicitly requested by the >> kernel. Whenever a page destined for userspace is allocated, it is >> unmapped from physmap (the kernel's page table). When such a page is >> reclaimed from userspace, it is mapped back to physmap. >> >> Additional fields in the page_ext struct are used for XPFO housekeeping. >> Specifically two flags to distinguish user vs. kernel pages and to tag >> unmapped pages and a reference counter to balance kmap/kunmap operations >> and a lock to serialize access to the XPFO fields.
>> >> Known issues/limitations: >> - Only supports x86-64 (for now) >> - Only supports 4k pages (for now) >> - There are most likely some legitimate uses cases where the kernel needs >> to access userspace which need to be made XPFO-aware >> - Performance penalty >> >> Reference paper by the original patch authors: >> http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Would it be possible to create an lkdtm test that can exercise this protection? I'll look into it. >> diff --git a/security/Kconfig b/security/Kconfig >> index 118f4549404e..4502e15c8419 100644 >> --- a/security/Kconfig >> +++ b/security/Kconfig >> @@ -6,6 +6,25 @@ menu "Security options" >> >> source security/keys/Kconfig >> >> +config ARCH_SUPPORTS_XPFO >> + bool > > Can you include a "help" section here to describe what requirements an > architecture needs to support XPFO? See HAVE_ARCH_SECCOMP_FILTER and > HAVE_ARCH_VMAP_STACK or some examples. Will do. >> +config XPFO >> + bool "Enable eXclusive Page Frame Ownership (XPFO)" >> + default n >> + depends on ARCH_SUPPORTS_XPFO >> + select PAGE_EXTENSION >> + help >> + This option offers protection against 'ret2dir' kernel attacks. >> + When enabled, every time a page frame is allocated to user space, it >> + is unmapped from the direct mapped RAM region in kernel space >> + (physmap). Similarly, when a page frame is freed/reclaimed, it is >> + mapped back to physmap. >> + >> + There is a slight performance impact when this option is enabled. >> + >> + If in doubt, say "N". >> + >> config SECURITY_DMESG_RESTRICT >> bool "Restrict unprivileged access to the kernel syslog" >> default n > > I've added these patches to my kspp tree on kernel.org, so it should > get some 0-day testing now... Very good. Thanks! > Thanks! Appreciate the feedback.
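For illustration only, not part of the posted series: a rough sketch of what such an lkdtm-style probe could look like. It allocates a user page, then reads it through the kernel's linear mapping without going through kmap(), which is the ret2dir access pattern XPFO is meant to block, so with XPFO enabled the final read should fault. The function name is made up for the sketch, and the vm_mmap()/get_user_pages_unlocked() signatures vary between kernel versions, so treat this purely as a sketch.

#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/err.h>
#include <linux/printk.h>

/* Sketch: try to read a user page via the linear map (ret2dir pattern). */
static void xpfo_read_user_test(void)
{
	unsigned long user_addr;
	unsigned long *direct;
	struct page *page;

	user_addr = vm_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_WRITE,
			    MAP_ANONYMOUS | MAP_PRIVATE, 0);
	if (IS_ERR_VALUE(user_addr)) {
		pr_warn("vm_mmap() failed\n");
		return;
	}

	/* Fault the page in and take a reference so we can find its struct page. */
	if (get_user_pages_unlocked(user_addr, 1, &page, FOLL_WRITE) != 1) {
		pr_warn("get_user_pages() failed\n");
		goto unmap;
	}

	/* Bypass kmap() and touch the page through its physmap address. */
	direct = page_address(page);
	pr_info("attempting bad read at %p\n", direct);
	pr_info("read back %lx (should have faulted with XPFO)\n", *direct);

	put_page(page);
unmap:
	vm_munmap(user_addr, PAGE_SIZE);
}

Wired into lkdtm's crash-type table, triggering this with XPFO enabled should oops on the marked read, and merely leak the page contents without it.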
...Juerg > -Kees From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> From: Juerg Haefliger Message-ID: <795a34a6-ed04-dea3-73f5-d23e48f69de6@hpe.com> Date: Mon, 28 Nov 2016 12:15:10 +0100 MIME-Version: 1.0 In-Reply-To: <20161124105629.GA23034@linaro.org> Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: AKASHI Takahiro , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On 11/24/2016 11:56 AM, AKASHI Takahiro wrote: > Hi, > > I'm trying to give it a spin on arm64, but ... Thanks for trying this.
>> +/* >> + * Update a single kernel page table entry >> + */ >> +static inline void set_kpte(struct page *page, unsigned long kaddr, >> + pgprot_t prot) { >> + unsigned int level; >> + pte_t *kpte = lookup_address(kaddr, &level); >> + >> + /* We only support 4k pages for now */ >> + BUG_ON(!kpte || level != PG_LEVEL_4K); >> + >> + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); >> +} > > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K) are arch-specific, > would it be better to put the whole definition into arch-specific part? Well yes but I haven't really looked into splitting up the arch specific stuff. >> + /* >> + * Map the page back into the kernel if it was previously >> + * allocated to user space. >> + */ >> + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, >> + &page_ext->flags)) { >> + kaddr = (unsigned long)page_address(page + i); >> + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > > Why not PAGE_KERNEL? Good catch, thanks! >> + /* >> + * The page is to be allocated back to user space, so unmap it from the >> + * kernel, flush the TLB and tag it as a user page. >> + */ >> + if (atomic_dec_return(&page_ext->mapcount) == 0) { >> + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); >> + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); >> + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); >> + __flush_tlb_one((unsigned long)kaddr); > > Again __flush_tlb_one() is x86-specific. > flush_tlb_kernel_range() instead? I'll take a look. If you can tell me what the relevant arm64 equivalents are for the arch-specific functions, that would help tremendously. Thanks for the comments! ...Juerg > Thanks, > -Takahiro AKASHI -- Juerg Haefliger Hewlett Packard Enterprise From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com Date: Thu, 24 Nov 2016 19:56:30 +0900 From: AKASHI Takahiro Message-ID: <20161124105629.GA23034@linaro.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161104144534.14790-2-juerg.haefliger@hpe.com> Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: Hi, I'm trying to give it a spin on arm64, but ... On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > This patch adds support for XPFO which protects against 'ret2dir' kernel > attacks. The basic idea is to enforce exclusive ownership of page frames > by either the kernel or userspace, unless explicitly requested by the > kernel. Whenever a page destined for userspace is allocated, it is > unmapped from physmap (the kernel's page table). When such a page is > reclaimed from userspace, it is mapped back to physmap. > > Additional fields in the page_ext struct are used for XPFO housekeeping. > Specifically two flags to distinguish user vs. kernel pages and to tag > unmapped pages and a reference counter to balance kmap/kunmap operations > and a lock to serialize access to the XPFO fields. > > Known issues/limitations: > - Only supports x86-64 (for now) > - Only supports 4k pages (for now) > - There are most likely some legitimate uses cases where the kernel needs > to access userspace which need to be made XPFO-aware > - Performance penalty > > Reference paper by the original patch authors: > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > Suggested-by: Vasileios P. Kemerlis > Signed-off-by: Juerg Haefliger > --- > arch/x86/Kconfig | 3 +- > arch/x86/mm/init.c | 2 +- > drivers/ata/libata-sff.c | 4 +- > include/linux/highmem.h | 15 +++- > include/linux/page_ext.h | 7 ++ > include/linux/xpfo.h | 39 +++++++++ > lib/swiotlb.c | 3 +- > mm/Makefile | 1 + > mm/page_alloc.c | 2 + > mm/page_ext.c | 4 + > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > security/Kconfig | 19 +++++ > 12 files changed, 298 insertions(+), 7 deletions(-) > create mode 100644 include/linux/xpfo.h > create mode 100644 mm/xpfo.c > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index bada636d1065..38b334f8fde5 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -165,6 +165,7 @@ config X86 > select HAVE_STACK_VALIDATION if X86_64 > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > + select ARCH_SUPPORTS_XPFO if X86_64 > > config INSTRUCTION_DECODER > def_bool y > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > config X86_DIRECT_GBPAGES > def_bool y > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > ---help--- > Certain kernel features effectively disable kernel > linear 1 GB mappings (even if the CPU otherwise > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > index 22af912d66d2..a6fafbae02bb 100644 > --- a/arch/x86/mm/init.c > +++ b/arch/x86/mm/init.c > @@ -161,7 +161,7 @@ static int page_size_mask; > > static void __init probe_page_size_mask(void) > { > -#if !defined(CONFIG_KMEMCHECK) > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > /* > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > * use small pages. > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > index 051b6158d1b7..58af734be25d 100644 > --- a/drivers/ata/libata-sff.c > +++ b/drivers/ata/libata-sff.c > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use a bounce buffer */ > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > - if (PageHighMem(page)) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > unsigned long flags; > > /* FIXME: use bounce buffer */ > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > index bb3f3297062a..7a17c166532f 100644 > --- a/include/linux/highmem.h > +++ b/include/linux/highmem.h > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > #include > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > #ifndef ARCH_HAS_KMAP > static inline void *kmap(struct page *page) > { > + void *kaddr; > + > might_sleep(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > > static inline void kunmap(struct page *page) > { > + xpfo_kunmap(page_address(page), page); > } > > static inline void *kmap_atomic(struct page *page) > { > + void *kaddr; > + > preempt_disable(); > pagefault_disable(); > - return page_address(page); > + kaddr = page_address(page); > + xpfo_kmap(kaddr, page); > + return kaddr; > } > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > static inline void __kunmap_atomic(void *addr) > { > + xpfo_kunmap(addr, virt_to_page(addr)); > pagefault_enable(); > preempt_enable(); > } > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index 9298c393ddaa..0e451a42e5a3 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -29,6 +29,8 @@ enum page_ext_flags { > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > PAGE_EXT_DEBUG_GUARD, > PAGE_EXT_OWNER, > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > PAGE_EXT_YOUNG, > PAGE_EXT_IDLE, > @@ -44,6 +46,11 @@ enum page_ext_flags { > */ > struct page_ext { > unsigned long flags; > +#ifdef CONFIG_XPFO > + int inited; /* Map counter and lock initialized */ > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > +#endif > }; > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > new file mode 100644 > index 000000000000..77187578ca33 > --- /dev/null > +++ b/include/linux/xpfo.h > @@ -0,0 +1,39 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. 
> + */ > + > +#ifndef _LINUX_XPFO_H > +#define _LINUX_XPFO_H > + > +#ifdef CONFIG_XPFO > + > +extern struct page_ext_operations page_xpfo_ops; > + > +extern void xpfo_kmap(void *kaddr, struct page *page); > +extern void xpfo_kunmap(void *kaddr, struct page *page); > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > +extern void xpfo_free_page(struct page *page, int order); > + > +extern bool xpfo_page_is_unmapped(struct page *page); > + > +#else /* !CONFIG_XPFO */ > + > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > +static inline void xpfo_free_page(struct page *page, int order) { } > + > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > + > +#endif /* CONFIG_XPFO */ > + > +#endif /* _LINUX_XPFO_H */ > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > index 22e13a0e19d7..455eff44604e 100644 > --- a/lib/swiotlb.c > +++ b/lib/swiotlb.c > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > { > unsigned long pfn = PFN_DOWN(orig_addr); > unsigned char *vaddr = phys_to_virt(tlb_addr); > + struct page *page = pfn_to_page(pfn); > > - if (PageHighMem(pfn_to_page(pfn))) { > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > /* The buffer does not have a mapping. Map it in and copy */ > unsigned int offset = orig_addr & ~PAGE_MASK; > char *buffer; > diff --git a/mm/Makefile b/mm/Makefile > index 295bd7a9f76b..175680f516aa 100644 > --- a/mm/Makefile > +++ b/mm/Makefile > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > +obj-$(CONFIG_XPFO) += xpfo.o > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 8fd42aa7c4bd..100e80e008e2 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > kernel_poison_pages(page, 1 << order, 0); > kernel_map_pages(page, 1 << order, 0); > kasan_free_pages(page, order); > + xpfo_free_page(page, order); > > return true; > } > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > kernel_map_pages(page, 1 << order, 1); > kernel_poison_pages(page, 1 << order, 1); > kasan_alloc_pages(page, order); > + xpfo_alloc_page(page, order, gfp_flags); > set_page_owner(page, order, gfp_flags); > } > > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 121dcffc4ec1..ba6dbcacc2db 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -7,6 +7,7 @@ > #include > #include > #include > +#include > > /* > * struct page extension > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > &page_idle_ops, > #endif > +#ifdef CONFIG_XPFO > + &page_xpfo_ops, > +#endif > }; > > static unsigned long total_usage; > diff --git a/mm/xpfo.c b/mm/xpfo.c > new file mode 100644 > index 000000000000..8e3a6a694b6a > --- /dev/null > +++ b/mm/xpfo.c > @@ -0,0 +1,206 @@ > +/* > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > + * Copyright (C) 2016 Brown University. All rights reserved. > + * > + * Authors: > + * Juerg Haefliger > + * Vasileios P. 
Kemerlis > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms of the GNU General Public License version 2 as published by > + * the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include > + > +#include > + > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > + > +static bool need_xpfo(void) > +{ > + return true; > +} > + > +static void init_xpfo(void) > +{ > + printk(KERN_INFO "XPFO enabled\n"); > + static_branch_enable(&xpfo_inited); > +} > + > +struct page_ext_operations page_xpfo_ops = { > + .need = need_xpfo, > + .init = init_xpfo, > +}; > + > +/* > + * Update a single kernel page table entry > + */ > +static inline void set_kpte(struct page *page, unsigned long kaddr, > + pgprot_t prot) { > + unsigned int level; > + pte_t *kpte = lookup_address(kaddr, &level); > + > + /* We only support 4k pages for now */ > + BUG_ON(!kpte || level != PG_LEVEL_4K); > + > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > +} As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, would it be better to put the whole definition into arch-specific part? > + > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > +{ > + int i, flush_tlb = 0; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + > + /* Initialize the map lock and map counter */ > + if (!page_ext->inited) { > + spin_lock_init(&page_ext->maplock); > + atomic_set(&page_ext->mapcount, 0); > + page_ext->inited = 1; > + } > + BUG_ON(atomic_read(&page_ext->mapcount)); > + > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > + /* > + * Flush the TLB if the page was previously allocated > + * to the kernel. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > + &page_ext->flags)) > + flush_tlb = 1; > + } else { > + /* Tag the page as a kernel page */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + } > + } > + > + if (flush_tlb) { > + kaddr = (unsigned long)page_address(page); > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > + PAGE_SIZE); > + } > +} > + > +void xpfo_free_page(struct page *page, int order) > +{ > + int i; > + struct page_ext *page_ext; > + unsigned long kaddr; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + for (i = 0; i < (1 << order); i++) { > + page_ext = lookup_page_ext(page + i); > + > + if (!page_ext->inited) { > + /* > + * The page was allocated before page_ext was > + * initialized, so it is a kernel page and it needs to > + * be tagged accordingly. > + */ > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > + continue; > + } > + > + /* > + * Map the page back into the kernel if it was previously > + * allocated to user space. > + */ > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > + &page_ext->flags)) { > + kaddr = (unsigned long)page_address(page + i); > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); Why not PAGE_KERNEL? > + } > + } > +} > + > +void xpfo_kmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. 
> + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page was previously allocated to user space, so map it back > + * into the kernel. No TLB flush required. > + */ > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kmap); > + > +void xpfo_kunmap(void *kaddr, struct page *page) > +{ > + struct page_ext *page_ext; > + unsigned long flags; > + > + if (!static_branch_unlikely(&xpfo_inited)) > + return; > + > + page_ext = lookup_page_ext(page); > + > + /* > + * The page was allocated before page_ext was initialized (which means > + * it's a kernel page) or it's allocated to the kernel, so nothing to > + * do. > + */ > + if (!page_ext->inited || > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > + return; > + > + spin_lock_irqsave(&page_ext->maplock, flags); > + > + /* > + * The page is to be allocated back to user space, so unmap it from the > + * kernel, flush the TLB and tag it as a user page. > + */ > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > + __flush_tlb_one((unsigned long)kaddr); Again __flush_tlb_one() is x86-specific. flush_tlb_kernel_range() instead? Thanks, -Takahiro AKASHI > + } > + > + spin_unlock_irqrestore(&page_ext->maplock, flags); > +} > +EXPORT_SYMBOL(xpfo_kunmap); > + > +inline bool xpfo_page_is_unmapped(struct page *page) > +{ > + if (!static_branch_unlikely(&xpfo_inited)) > + return false; > + > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > +} > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > diff --git a/security/Kconfig b/security/Kconfig > index 118f4549404e..4502e15c8419 100644 > --- a/security/Kconfig > +++ b/security/Kconfig > @@ -6,6 +6,25 @@ menu "Security options" > > source security/keys/Kconfig > > +config ARCH_SUPPORTS_XPFO > + bool > + > +config XPFO > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > + default n > + depends on ARCH_SUPPORTS_XPFO > + select PAGE_EXTENSION > + help > + This option offers protection against 'ret2dir' kernel attacks. > + When enabled, every time a page frame is allocated to user space, it > + is unmapped from the direct mapped RAM region in kernel space > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > + mapped back to physmap. > + > + There is a slight performance impact when this option is enabled. > + > + If in doubt, say "N". 
> + > config SECURITY_DMESG_RESTRICT > bool "Restrict unprivileged access to the kernel syslog" > default n > -- > 2.10.1 > From mboxrd@z Thu Jan 1 00:00:00 1970 Reply-To: kernel-hardening@lists.openwall.com Date: Fri, 9 Dec 2016 18:02:53 +0900 From: AKASHI Takahiro Message-ID: <20161209090251.GF23034@linaro.org> References: <20160914071901.8127-1-juerg.haefliger@hpe.com> <20161104144534.14790-1-juerg.haefliger@hpe.com> <20161104144534.14790-2-juerg.haefliger@hpe.com> <20161124105629.GA23034@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161124105629.GA23034@linaro.org> Subject: [kernel-hardening] Re: [RFC PATCH v3 1/2] Add support for eXclusive Page Frame Ownership (XPFO) To: Juerg Haefliger , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kernel-hardening@lists.openwall.com, linux-x86_64@vger.kernel.org, vpk@cs.columbia.edu List-ID: On Thu, Nov 24, 2016 at 07:56:30PM +0900, AKASHI Takahiro wrote: > Hi, > > I'm trying to give it a spin on arm64, but ... In my experiment on hikey, the kernel boot failed, catching a page fault around cache operations, (a) __clean_dcache_area_pou() on 4KB-page kernel, (b) __inval_cache_range() on 64KB-page kernel, (See more details for backtrace below.) This is because, on arm64, cache operations are by VA (in particular, of direct/linear mapping of physical memory). So I think that naively unmapping a page from physmap in xpfo_kunmap() won't work well on arm64. -Takahiro AKASHI case (a) -------- Unable to handle kernel paging request at virtual address ffff800000cba000 pgd = ffff80003ba8c000 *pgd=0000000000000000 task: ffff80003be38000 task.stack: ffff80003be40000 PC is at __clean_dcache_area_pou+0x20/0x38 LR is at sync_icache_aliases+0x2c/0x40 ... Call trace: ... __clean_dcache_area_pou+0x20/0x38 __sync_icache_dcache+0x6c/0xa8 alloc_set_pte+0x33c/0x588 filemap_map_pages+0x3a8/0x3b8 handle_mm_fault+0x910/0x1080 do_page_fault+0x2b0/0x358 do_mem_abort+0x44/0xa0 el0_ia+0x18/0x1c case (b) -------- Unable to handle kernel paging request at virtual address ffff80002aed0000 pgd = ffff000008f40000 , *pud=000000003dfc0003 , *pmd=000000003dfa0003 , *pte=000000002aed0000 task: ffff800028711900 task.stack: ffff800029020000 PC is at __inval_cache_range+0x3c/0x60 LR is at __swiotlb_map_sg_attrs+0x6c/0x98 ... Call trace: ... __inval_cache_range+0x3c/0x60 dw_mci_pre_dma_transfer.isra.7+0xfc/0x190 dw_mci_pre_req+0x50/0x60 mmc_start_req+0x4c/0x420 mmc_blk_issue_rw_rq+0xb0/0x9b8 mmc_blk_issue_rq+0x154/0x518 mmc_queue_thread+0xac/0x158 kthread+0xd0/0xe8 ret_from_fork+0x10/0x20 > > On Fri, Nov 04, 2016 at 03:45:33PM +0100, Juerg Haefliger wrote: > > This patch adds support for XPFO which protects against 'ret2dir' kernel > > attacks. The basic idea is to enforce exclusive ownership of page frames > > by either the kernel or userspace, unless explicitly requested by the > > kernel. Whenever a page destined for userspace is allocated, it is > > unmapped from physmap (the kernel's page table). When such a page is > > reclaimed from userspace, it is mapped back to physmap. > > > > Additional fields in the page_ext struct are used for XPFO housekeeping. > > Specifically two flags to distinguish user vs. kernel pages and to tag > > unmapped pages and a reference counter to balance kmap/kunmap operations > > and a lock to serialize access to the XPFO fields. 
> > > > Known issues/limitations: > > - Only supports x86-64 (for now) > > - Only supports 4k pages (for now) > > - There are most likely some legitimate uses cases where the kernel needs > > to access userspace which need to be made XPFO-aware > > - Performance penalty > > > > Reference paper by the original patch authors: > > http://www.cs.columbia.edu/~vpk/papers/ret2dir.sec14.pdf > > > > Suggested-by: Vasileios P. Kemerlis > > Signed-off-by: Juerg Haefliger > > --- > > arch/x86/Kconfig | 3 +- > > arch/x86/mm/init.c | 2 +- > > drivers/ata/libata-sff.c | 4 +- > > include/linux/highmem.h | 15 +++- > > include/linux/page_ext.h | 7 ++ > > include/linux/xpfo.h | 39 +++++++++ > > lib/swiotlb.c | 3 +- > > mm/Makefile | 1 + > > mm/page_alloc.c | 2 + > > mm/page_ext.c | 4 + > > mm/xpfo.c | 206 +++++++++++++++++++++++++++++++++++++++++++++++ > > security/Kconfig | 19 +++++ > > 12 files changed, 298 insertions(+), 7 deletions(-) > > create mode 100644 include/linux/xpfo.h > > create mode 100644 mm/xpfo.c > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > index bada636d1065..38b334f8fde5 100644 > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -165,6 +165,7 @@ config X86 > > select HAVE_STACK_VALIDATION if X86_64 > > select ARCH_USES_HIGH_VMA_FLAGS if X86_INTEL_MEMORY_PROTECTION_KEYS > > select ARCH_HAS_PKEYS if X86_INTEL_MEMORY_PROTECTION_KEYS > > + select ARCH_SUPPORTS_XPFO if X86_64 > > > > config INSTRUCTION_DECODER > > def_bool y > > @@ -1361,7 +1362,7 @@ config ARCH_DMA_ADDR_T_64BIT > > > > config X86_DIRECT_GBPAGES > > def_bool y > > - depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK > > + depends on X86_64 && !DEBUG_PAGEALLOC && !KMEMCHECK && !XPFO > > ---help--- > > Certain kernel features effectively disable kernel > > linear 1 GB mappings (even if the CPU otherwise > > diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c > > index 22af912d66d2..a6fafbae02bb 100644 > > --- a/arch/x86/mm/init.c > > +++ b/arch/x86/mm/init.c > > @@ -161,7 +161,7 @@ static int page_size_mask; > > > > static void __init probe_page_size_mask(void) > > { > > -#if !defined(CONFIG_KMEMCHECK) > > +#if !defined(CONFIG_KMEMCHECK) && !defined(CONFIG_XPFO) > > /* > > * For CONFIG_KMEMCHECK or pagealloc debugging, identity mapping will > > * use small pages. > > diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c > > index 051b6158d1b7..58af734be25d 100644 > > --- a/drivers/ata/libata-sff.c > > +++ b/drivers/ata/libata-sff.c > > @@ -715,7 +715,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use a bounce buffer */ > > @@ -860,7 +860,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) > > > > DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); > > > > - if (PageHighMem(page)) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > unsigned long flags; > > > > /* FIXME: use bounce buffer */ > > diff --git a/include/linux/highmem.h b/include/linux/highmem.h > > index bb3f3297062a..7a17c166532f 100644 > > --- a/include/linux/highmem.h > > +++ b/include/linux/highmem.h > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > > > @@ -55,24 +56,34 @@ static inline struct page *kmap_to_page(void *addr) > > #ifndef ARCH_HAS_KMAP > > static inline void *kmap(struct page *page) > > { > > + void *kaddr; > > + > > might_sleep(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > > > static inline void kunmap(struct page *page) > > { > > + xpfo_kunmap(page_address(page), page); > > } > > > > static inline void *kmap_atomic(struct page *page) > > { > > + void *kaddr; > > + > > preempt_disable(); > > pagefault_disable(); > > - return page_address(page); > > + kaddr = page_address(page); > > + xpfo_kmap(kaddr, page); > > + return kaddr; > > } > > #define kmap_atomic_prot(page, prot) kmap_atomic(page) > > > > static inline void __kunmap_atomic(void *addr) > > { > > + xpfo_kunmap(addr, virt_to_page(addr)); > > pagefault_enable(); > > preempt_enable(); > > } > > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > > index 9298c393ddaa..0e451a42e5a3 100644 > > --- a/include/linux/page_ext.h > > +++ b/include/linux/page_ext.h > > @@ -29,6 +29,8 @@ enum page_ext_flags { > > PAGE_EXT_DEBUG_POISON, /* Page is poisoned */ > > PAGE_EXT_DEBUG_GUARD, > > PAGE_EXT_OWNER, > > + PAGE_EXT_XPFO_KERNEL, /* Page is a kernel page */ > > + PAGE_EXT_XPFO_UNMAPPED, /* Page is unmapped */ > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > PAGE_EXT_YOUNG, > > PAGE_EXT_IDLE, > > @@ -44,6 +46,11 @@ enum page_ext_flags { > > */ > > struct page_ext { > > unsigned long flags; > > +#ifdef CONFIG_XPFO > > + int inited; /* Map counter and lock initialized */ > > + atomic_t mapcount; /* Counter for balancing map/unmap requests */ > > + spinlock_t maplock; /* Lock to serialize map/unmap requests */ > > +#endif > > }; > > > > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > > diff --git a/include/linux/xpfo.h b/include/linux/xpfo.h > > new file mode 100644 > > index 000000000000..77187578ca33 > > --- /dev/null > > +++ b/include/linux/xpfo.h > > @@ -0,0 +1,39 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. > > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. 
> > + */ > > + > > +#ifndef _LINUX_XPFO_H > > +#define _LINUX_XPFO_H > > + > > +#ifdef CONFIG_XPFO > > + > > +extern struct page_ext_operations page_xpfo_ops; > > + > > +extern void xpfo_kmap(void *kaddr, struct page *page); > > +extern void xpfo_kunmap(void *kaddr, struct page *page); > > +extern void xpfo_alloc_page(struct page *page, int order, gfp_t gfp); > > +extern void xpfo_free_page(struct page *page, int order); > > + > > +extern bool xpfo_page_is_unmapped(struct page *page); > > + > > +#else /* !CONFIG_XPFO */ > > + > > +static inline void xpfo_kmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_kunmap(void *kaddr, struct page *page) { } > > +static inline void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) { } > > +static inline void xpfo_free_page(struct page *page, int order) { } > > + > > +static inline bool xpfo_page_is_unmapped(struct page *page) { return false; } > > + > > +#endif /* CONFIG_XPFO */ > > + > > +#endif /* _LINUX_XPFO_H */ > > diff --git a/lib/swiotlb.c b/lib/swiotlb.c > > index 22e13a0e19d7..455eff44604e 100644 > > --- a/lib/swiotlb.c > > +++ b/lib/swiotlb.c > > @@ -390,8 +390,9 @@ static void swiotlb_bounce(phys_addr_t orig_addr, phys_addr_t tlb_addr, > > { > > unsigned long pfn = PFN_DOWN(orig_addr); > > unsigned char *vaddr = phys_to_virt(tlb_addr); > > + struct page *page = pfn_to_page(pfn); > > > > - if (PageHighMem(pfn_to_page(pfn))) { > > + if (PageHighMem(page) || xpfo_page_is_unmapped(page)) { > > /* The buffer does not have a mapping. Map it in and copy */ > > unsigned int offset = orig_addr & ~PAGE_MASK; > > char *buffer; > > diff --git a/mm/Makefile b/mm/Makefile > > index 295bd7a9f76b..175680f516aa 100644 > > --- a/mm/Makefile > > +++ b/mm/Makefile > > @@ -100,3 +100,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o > > obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o > > obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o > > obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o > > +obj-$(CONFIG_XPFO) += xpfo.o > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 8fd42aa7c4bd..100e80e008e2 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -1045,6 +1045,7 @@ static __always_inline bool free_pages_prepare(struct page *page, > > kernel_poison_pages(page, 1 << order, 0); > > kernel_map_pages(page, 1 << order, 0); > > kasan_free_pages(page, order); > > + xpfo_free_page(page, order); > > > > return true; > > } > > @@ -1745,6 +1746,7 @@ inline void post_alloc_hook(struct page *page, unsigned int order, > > kernel_map_pages(page, 1 << order, 1); > > kernel_poison_pages(page, 1 << order, 1); > > kasan_alloc_pages(page, order); > > + xpfo_alloc_page(page, order, gfp_flags); > > set_page_owner(page, order, gfp_flags); > > } > > > > diff --git a/mm/page_ext.c b/mm/page_ext.c > > index 121dcffc4ec1..ba6dbcacc2db 100644 > > --- a/mm/page_ext.c > > +++ b/mm/page_ext.c > > @@ -7,6 +7,7 @@ > > #include > > #include > > #include > > +#include > > > > /* > > * struct page extension > > @@ -68,6 +69,9 @@ static struct page_ext_operations *page_ext_ops[] = { > > #if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) > > &page_idle_ops, > > #endif > > +#ifdef CONFIG_XPFO > > + &page_xpfo_ops, > > +#endif > > }; > > > > static unsigned long total_usage; > > diff --git a/mm/xpfo.c b/mm/xpfo.c > > new file mode 100644 > > index 000000000000..8e3a6a694b6a > > --- /dev/null > > +++ b/mm/xpfo.c > > @@ -0,0 +1,206 @@ > > +/* > > + * Copyright (C) 2016 Hewlett Packard Enterprise Development, L.P. 
> > + * Copyright (C) 2016 Brown University. All rights reserved. > > + * > > + * Authors: > > + * Juerg Haefliger > > + * Vasileios P. Kemerlis > > + * > > + * This program is free software; you can redistribute it and/or modify it > > + * under the terms of the GNU General Public License version 2 as published by > > + * the Free Software Foundation. > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > + > > +#include > > + > > +DEFINE_STATIC_KEY_FALSE(xpfo_inited); > > + > > +static bool need_xpfo(void) > > +{ > > + return true; > > +} > > + > > +static void init_xpfo(void) > > +{ > > + printk(KERN_INFO "XPFO enabled\n"); > > + static_branch_enable(&xpfo_inited); > > +} > > + > > +struct page_ext_operations page_xpfo_ops = { > > + .need = need_xpfo, > > + .init = init_xpfo, > > +}; > > + > > +/* > > + * Update a single kernel page table entry > > + */ > > +static inline void set_kpte(struct page *page, unsigned long kaddr, > > + pgprot_t prot) { > > + unsigned int level; > > + pte_t *kpte = lookup_address(kaddr, &level); > > + > > + /* We only support 4k pages for now */ > > + BUG_ON(!kpte || level != PG_LEVEL_4K); > > + > > + set_pte_atomic(kpte, pfn_pte(page_to_pfn(page), canon_pgprot(prot))); > > +} > > As lookup_address() and set_pte_atomic() (and PG_LEVEL_4K), are arch-specific, > would it be better to put the whole definition into arch-specific part? > > > + > > +void xpfo_alloc_page(struct page *page, int order, gfp_t gfp) > > +{ > > + int i, flush_tlb = 0; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + > > + /* Initialize the map lock and map counter */ > > + if (!page_ext->inited) { > > + spin_lock_init(&page_ext->maplock); > > + atomic_set(&page_ext->mapcount, 0); > > + page_ext->inited = 1; > > + } > > + BUG_ON(atomic_read(&page_ext->mapcount)); > > + > > + if ((gfp & GFP_HIGHUSER) == GFP_HIGHUSER) { > > + /* > > + * Flush the TLB if the page was previously allocated > > + * to the kernel. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_KERNEL, > > + &page_ext->flags)) > > + flush_tlb = 1; > > + } else { > > + /* Tag the page as a kernel page */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + } > > + } > > + > > + if (flush_tlb) { > > + kaddr = (unsigned long)page_address(page); > > + flush_tlb_kernel_range(kaddr, kaddr + (1 << order) * > > + PAGE_SIZE); > > + } > > +} > > + > > +void xpfo_free_page(struct page *page, int order) > > +{ > > + int i; > > + struct page_ext *page_ext; > > + unsigned long kaddr; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + for (i = 0; i < (1 << order); i++) { > > + page_ext = lookup_page_ext(page + i); > > + > > + if (!page_ext->inited) { > > + /* > > + * The page was allocated before page_ext was > > + * initialized, so it is a kernel page and it needs to > > + * be tagged accordingly. > > + */ > > + set_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags); > > + continue; > > + } > > + > > + /* > > + * Map the page back into the kernel if it was previously > > + * allocated to user space. > > + */ > > + if (test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, > > + &page_ext->flags)) { > > + kaddr = (unsigned long)page_address(page + i); > > + set_kpte(page + i, kaddr, __pgprot(__PAGE_KERNEL)); > > Why not PAGE_KERNEL? 
> > > + } > > + } > > +} > > + > > +void xpfo_kmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page was previously allocated to user space, so map it back > > + * into the kernel. No TLB flush required. > > + */ > > + if ((atomic_inc_return(&page_ext->mapcount) == 1) && > > + test_and_clear_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)) > > + set_kpte(page, (unsigned long)kaddr, __pgprot(__PAGE_KERNEL)); > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kmap); > > + > > +void xpfo_kunmap(void *kaddr, struct page *page) > > +{ > > + struct page_ext *page_ext; > > + unsigned long flags; > > + > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return; > > + > > + page_ext = lookup_page_ext(page); > > + > > + /* > > + * The page was allocated before page_ext was initialized (which means > > + * it's a kernel page) or it's allocated to the kernel, so nothing to > > + * do. > > + */ > > + if (!page_ext->inited || > > + test_bit(PAGE_EXT_XPFO_KERNEL, &page_ext->flags)) > > + return; > > + > > + spin_lock_irqsave(&page_ext->maplock, flags); > > + > > + /* > > + * The page is to be allocated back to user space, so unmap it from the > > + * kernel, flush the TLB and tag it as a user page. > > + */ > > + if (atomic_dec_return(&page_ext->mapcount) == 0) { > > + BUG_ON(test_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags)); > > + set_bit(PAGE_EXT_XPFO_UNMAPPED, &page_ext->flags); > > + set_kpte(page, (unsigned long)kaddr, __pgprot(0)); > > + __flush_tlb_one((unsigned long)kaddr); > > Again __flush_tlb_one() is x86-specific. > flush_tlb_kernel_range() instead? > > Thanks, > -Takahiro AKASHI > > > + } > > + > > + spin_unlock_irqrestore(&page_ext->maplock, flags); > > +} > > +EXPORT_SYMBOL(xpfo_kunmap); > > + > > +inline bool xpfo_page_is_unmapped(struct page *page) > > +{ > > + if (!static_branch_unlikely(&xpfo_inited)) > > + return false; > > + > > + return test_bit(PAGE_EXT_XPFO_UNMAPPED, &lookup_page_ext(page)->flags); > > +} > > +EXPORT_SYMBOL(xpfo_page_is_unmapped); > > diff --git a/security/Kconfig b/security/Kconfig > > index 118f4549404e..4502e15c8419 100644 > > --- a/security/Kconfig > > +++ b/security/Kconfig > > @@ -6,6 +6,25 @@ menu "Security options" > > > > source security/keys/Kconfig > > > > +config ARCH_SUPPORTS_XPFO > > + bool > > + > > +config XPFO > > + bool "Enable eXclusive Page Frame Ownership (XPFO)" > > + default n > > + depends on ARCH_SUPPORTS_XPFO > > + select PAGE_EXTENSION > > + help > > + This option offers protection against 'ret2dir' kernel attacks. > > + When enabled, every time a page frame is allocated to user space, it > > + is unmapped from the direct mapped RAM region in kernel space > > + (physmap). Similarly, when a page frame is freed/reclaimed, it is > > + mapped back to physmap. > > + > > + There is a slight performance impact when this option is enabled. > > + > > + If in doubt, say "N". 
> > + > > config SECURITY_DMESG_RESTRICT > > bool "Restrict unprivileged access to the kernel syslog" > > default n > > -- > > 2.10.1 > >
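To close the arm64 question raised above (what the arm64 equivalents of lookup_address(), set_pte_atomic() and __flush_tlb_one() would be), here is a rough, untested sketch of an arch-side set_kpte() plus a generic TLB flush, meant only to show the shape of the arch split the reviewers asked for. It assumes the direct mapping has already been forced down to 4k (PTE-level) mappings, relies on the generic page table walkers and arm64 section checks, and deliberately does not address the cache maintenance faults reported on arm64, which need separate handling.

#include <linux/mm.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/* Sketch of an arm64-side helper: update one 4k physmap PTE. */
static void set_kpte(struct page *page, unsigned long kaddr, pgprot_t prot)
{
	pgd_t *pgd = pgd_offset_k(kaddr);
	pud_t *pud = pud_offset(pgd, kaddr);
	pmd_t *pmd = pmd_offset(pud, kaddr);
	pte_t *pte;

	/* Only 4k (PTE-mapped) physmap entries are handled here. */
	BUG_ON(pgd_none(*pgd) || pud_none(*pud) || pmd_none(*pmd));
	BUG_ON(pud_sect(*pud) || pmd_sect(*pmd));

	pte = pte_offset_kernel(pmd, kaddr);
	set_pte(pte, pfn_pte(page_to_pfn(page), prot));
}

/* Generic replacement for x86's __flush_tlb_one() in xpfo_kunmap(). */
static void xpfo_flush_kernel_page(unsigned long kaddr)
{
	flush_tlb_kernel_range(kaddr, kaddr + PAGE_SIZE);
}

On the map-back side, passing PAGE_KERNEL rather than x86's raw __PAGE_KERNEL bits keeps the callers arch-neutral, which is the PAGE_KERNEL point raised in the review.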