For those of you whom I neglected to cc on v3, here's a quick recap:

My original plan was for my next RFC to be an implementation of Andy's
proposed "dynamic tracking" model, but I was completely flummoxed by the
auditing[1]. Cedric's RFC has the same auditing complexities, so I ended
up back at the "make userspace state its intentions" approach.

There are no significant LSM changes in v4, only a bug fix and some
renaming. I'm spinning v4 early to get the cc list correct, and also
because I'm about to disappear on vacation for two weeks.

Except for patch 12 (see below), the SGX changes have been fully tested,
including updating the kernel's selftest as well as my own fork of (an
old version of) Intel's SDK to use the new UAPI. The LSM changes have
been smoke tested, but I haven't actually configured AppArmor or SELinux
to verify the permissions work as intended.

Patches 1-3 are not directly related to LSM support. They're included
here as the actual LSM RFC patches are essentially untestable without
them, and so that the patches apply to Jarkko's tree. Ignore patches 1-3
unless you actually want to run code.

Patches 4-11 are the meat of the RFC.

Patch 12 is purely to show how we might implement SGX2 support. It's not
intended to be included in the initial upstreaming of SGX.

The full code is available at https://github.com/sean-jc/linux.git in a
few forms (tagged):

  sgx-lsm-v4      - Jarkko's full tree plus patches 1-11
  sgx-lsm-v4-eaug - Everything above plus patch 12

<boilerplate>

This series is a delta to Jarkko's ongoing SGX series and applies on
Jarkko's current master at

  https://github.com/jsakkine-intel/linux-sgx.git:

  91f3aa6d241d ("docs: x86/sgx: Document the enclave API")

The basic gist of the approach is to track an enclave's page protections
separately from any vmas that map the page, and separate from the
hardware enforced protections. The SGX UAPI is modified to require
userspace to explicitly define the protections for each enclave page, i.e.
the ioctl to add pages to an enclave is extended to take
PROT_{READ,WRITE,EXEC} flags.

An enclave page's protections are the maximal protections that userspace
can use to map the page, e.g. mprotect() and mmap() are rejected if the
protections for the vma would be more permissive than those of the
associated enclave page.

Tracking protections for an enclave page (in addition to vmas) allows
SGX to invoke LSM upcalls while the enclave is being built. This is
critical to enabling LSMs to implement policies for enclave pages that
are functionally equivalent to existing policies for normal pages.

</boilerplate>

[1] https://lkml.kernel.org/r/20190614003759.GE18385@linux.intel.com

v4:
  - Rename SGX__EXECMEM and SGX__EXECMOD to SGX__MAPWX and SGX_EXECDIRTY
    respectively [Stephen].
  - Fix an inverted IS_PRIVATE file check [Stephen].
  - Take a '__u8 prot' in SGX_IOC_ENCLAVE_ADD_PAGE [Jarkko].
  - Rebased to Jarkko's latest code base.
  - Replace patch 1 with a variant that does encl_mm tracking via
    mmu_notifier and SRCU. Not relevant for most people, but I wanted
    to show the end state if we get rid of the per-vma tracking.

v3: https://patchwork.kernel.org/cover/11000601/
  - Clear VM_MAY* flags instead of using .may_mprotect() to enforce
    maximal enclave page protections.
  - Update the SGX selftest to work with the new API.
  - Rewrite SELinux code to use SGX specific permissions, with the goal
    of addressing Andy's feedback regarding what people will actually
    care about when it comes to SGX, e.g. add permissions for
    restricting unmeasured code and stop trying to infer permissions
    from the source of each enclave page.
  - Add a (very minimal) AppArmor patch.
  - Show line of sight to SGX2 support.
  - Rebased to Jarkko's latest code base.

v2: https://lkml.kernel.org/r/20190606021145.12604-1-sean.j.christopherson@intel.com
  - Dropped the patch(es) to extend the SGX UAPI to allow adding multiple
    enclave pages in a single syscall [Jarkko].
  - Reject ioctl() immediately on LSM denial [Stephen].
  - Rework SELinux code to avoid checking EXECMEM multiple times [Stephen].
  - Add missing equivalents to existing selinux_file_mprotect() checks
    [Stephen].
  - Hold mmap_sem across copy_to_user() to prevent a TOCTOU race when
    checking the source vma [Stephen].
  - Stubify security_enclave_load() if !CONFIG_SECURITY [Stephen].
  - Make flags a 32-bit field [Andy].
  - Don't validate the SECINFO protection flags against the enclave
    page's protection flags [Andy].
  - Rename mprotect() hook to may_mprotect() [Andy].
  - Test 'vma->vm_flags & VM_MAYEXEC' instead of manually checking for
    a noexec path [Jarkko].
  - Drop the SGX defined flags (use PROT_*) [Jarkko].
  - Improve comments and changelogs [Jarkko].

v1: https://lkml.kernel.org/r/20190531233159.30992-1-sean.j.christopherson@intel.com

Sean Christopherson (12):
  x86/sgx: Use mmu_notifier.release() instead of per-vma refcounting
  x86/sgx: Do not naturally align MAP_FIXED address
  selftests: x86/sgx: Mark the enclave loader as not needing an exec
    stack
  x86/sgx: Require userspace to define enclave pages' protection bits
  x86/sgx: Enforce noexec filesystem restriction for enclaves
  mm: Introduce vm_ops->may_mprotect()
  LSM: x86/sgx: Introduce ->enclave_map() hook for Intel SGX
  security/selinux: Require SGX_MAPWX to map enclave page WX
  LSM: x86/sgx: Introduce ->enclave_load() hook for Intel SGX
  security/selinux: Add enclave_load() implementation
  security/apparmor: Add enclave_load() implementation
  LSM: x86/sgx: Show line of sight to LSM support SGX2's EAUG

 arch/x86/Kconfig                         |   2 +
 arch/x86/include/uapi/asm/sgx.h          |   6 +-
 arch/x86/kernel/cpu/sgx/driver/ioctl.c   |  69 ++++--
 arch/x86/kernel/cpu/sgx/driver/main.c    | 106 ++++++++-
 arch/x86/kernel/cpu/sgx/encl.c           | 277 ++++++++++++-----------
 arch/x86/kernel/cpu/sgx/encl.h           |  22 +-
 arch/x86/kernel/cpu/sgx/reclaim.c        |  71 ++----
 include/linux/lsm_hooks.h                |  20 ++
 include/linux/mm.h                       |   2 +
 include/linux/security.h                 |  18 ++
 mm/mprotect.c                            |  15 +-
 security/apparmor/include/audit.h        |   2 +
 security/apparmor/lsm.c                  |  14 ++
 security/security.c                      |  12 +
 security/selinux/hooks.c                 |  72 ++++++
 security/selinux/include/classmap.h      |   6 +-
 tools/testing/selftests/x86/sgx/Makefile |   2 +-
 tools/testing/selftests/x86/sgx/main.c   |  32 ++-
 18 files changed, 532 insertions(+), 216 deletions(-)

--
2.21.0
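For readers who want the "maximal protections" rule from the cover letter in concrete terms, here is a minimal userspace sketch. It is not code from the series. The helper names prot_within_max() and range_allowed_prot() are hypothetical, and treating a not-yet-added page as having no permissions is an assumption taken from the description of patch 4.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

#define PROT_RWX (PROT_READ | PROT_WRITE | PROT_EXEC)

/*
 * Hypothetical helper: a mapping request is acceptable only if it asks
 * for no R/W/X bit beyond the maximal protections declared for the page.
 */
bool prot_within_max(int requested, int page_max)
{
	return ((requested & PROT_RWX) & ~page_max) == 0;
}

/*
 * Hypothetical helper: AND together the maximal protections of every
 * page in a range.  A negative entry models a page that has not been
 * added to the enclave, which is assumed to allow nothing.
 */
int range_allowed_prot(const int *page_max, size_t nr_pages)
{
	int allowed = PROT_RWX;
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (page_max[i] < 0)
			return PROT_NONE;
		allowed &= page_max[i];
	}
	return allowed;
}
```

With one page declared RW and its neighbor declared RX, a vma spanning both could be mapped at most read-only under this model, which is the behavior the mmap()/mprotect() rejection is meant to produce.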
Using per-vma refcounting to track mm_structs associated with an enclave
requires hooking .vm_close(), which in turn prevents the mm from merging
vmas (precisely to allow refcounting).

Avoid refcounting encl_mm altogether by registering an mmu_notifier at
.mmap(), removing the dying encl_mm at mmu_notifier.release() and
protecting mm_list during reclaim via a per-enclave SRCU.

Removing refcounting/vm_close() allows merging of enclave vmas, at the
cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is
disassociated from an enclave when the mm exits or the enclave dies, as
opposed to when the last vma (in a given mm) is closed.

The impact of delaying encl_mm removal is its memory footprint and
whatever overhead is incurred during EPC reclaim (to walk an mm's vmas).
Practically speaking, a stale encl_mm will exist for a meaningful amount
of time if and only if the enclave is mapped in a long-lived process and
then passed off to another long-lived process. It is expected that the
vast majority of use cases will not encounter this condition, e.g. even
using a daemon to build enclaves should not result in a stale encl_mm as
the builder should never need to mmap() the enclave.

Even if there are scenarios that lead to defunct encl_mms, the cost is
likely far outweighed by the benefits of reducing the number of vmas
across all enclaves.

Note, using SRCU to protect mm_list is not strictly necessary, i.e. the
existing walker with encl_mm refcounting could be massaged to work with
mmu_notifier.release(), but the resulting code is subtle and fragile (I
never actually got it working). The primary issue is that an encl_mm
can't be moved off the list until its refcount goes to zero, otherwise
the custom walker goes off into the weeds. The refcount requirement
then prevents using mm_list to identify if an mmu_notifier.release() has
fired, i.e. another mechanism is needed to guard against races between
exit_mmap() and sgx_release().
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/Kconfig                       |   2 +
 arch/x86/kernel/cpu/sgx/driver/ioctl.c |  14 --
 arch/x86/kernel/cpu/sgx/driver/main.c  |  38 ++++
 arch/x86/kernel/cpu/sgx/encl.c         | 234 +++++++++++--------------
 arch/x86/kernel/cpu/sgx/encl.h         |  19 +-
 arch/x86/kernel/cpu/sgx/reclaim.c      |  71 +++-----
 6 files changed, 182 insertions(+), 196 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index a0fd17c32521..940c52762f24 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1918,6 +1918,8 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
 config INTEL_SGX
 	bool "Intel SGX core functionality"
 	depends on X86_64 && CPU_SUP_INTEL
+	select MMU_NOTIFIER
+	select SRCU
 	---help---
 	Intel(R) SGX is a set of CPU instructions that can be used by
 	applications to set aside private regions of code and data, referred
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index d17c60dca114..3552d642b26f 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -276,7 +276,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 {
 	unsigned long encl_size = secs->size + PAGE_SIZE;
 	struct sgx_epc_page *secs_epc;
-	struct sgx_encl_mm *encl_mm;
 	unsigned long ssaframesize;
 	struct sgx_pageinfo pginfo;
 	struct sgx_secinfo secinfo;
@@ -311,12 +310,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)

 	INIT_WORK(&encl->work, sgx_add_page_worker);

-	encl_mm = sgx_encl_mm_add(encl, current->mm);
-	if (IS_ERR(encl_mm)) {
-		ret = PTR_ERR(encl_mm);
-		goto err_out;
-	}
-
 	secs_epc = sgx_alloc_page(&encl->secs, true);
 	if (IS_ERR(secs_epc)) {
 		ret = PTR_ERR(secs_epc);
@@ -369,13 +362,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
 		encl->backing = NULL;
 	}

-	if (!list_empty(&encl->mm_list)) {
-		encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm,
-					   list);
-		list_del(&encl_mm->list);
-		kfree(encl_mm);
-	}
-
 	mutex_unlock(&encl->lock);
 	return ret;
 }
diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index 0c831ee5e2de..07aa5f91b2dd 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -25,6 +25,7 @@ u32 sgx_xsave_size_tbl[64];
 static int sgx_open(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl;
+	int ret;

 	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
 	if (!encl)
@@ -38,6 +39,12 @@ static int sgx_open(struct inode *inode, struct file *file)
 	INIT_LIST_HEAD(&encl->mm_list);
 	spin_lock_init(&encl->mm_lock);

+	ret = init_srcu_struct(&encl->srcu);
+	if (ret) {
+		kfree(encl);
+		return ret;
+	}
+
 	file->private_data = encl;

 	return 0;
@@ -46,6 +53,32 @@ static int sgx_open(struct inode *inode, struct file *file)
 static int sgx_release(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl = file->private_data;
+	struct sgx_encl_mm *encl_mm;
+
+	/*
+	 * Objects can't be *moved* off an RCU protected list (deletion is ok),
+	 * nor can the object be freed until after synchronize_srcu().
+	 */
+restart:
+	spin_lock(&encl->mm_lock);
+	if (list_empty(&encl->mm_list)) {
+		encl_mm = NULL;
+	} else {
+		encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm,
+					   list);
+		list_del_rcu(&encl_mm->list);
+	}
+	spin_unlock(&encl->mm_lock);
+
+	if (encl_mm) {
+		synchronize_srcu(&encl->srcu);
+
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+
+		sgx_encl_mm_release(encl_mm);
+
+		goto restart;
+	}

 	kref_put(&encl->refcount, sgx_encl_release);

@@ -63,6 +96,11 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = file->private_data;
+	int ret;
+
+	ret = sgx_encl_mm_add(encl, vma->vm_mm);
+	if (ret)
+		return ret;

 	vma->vm_ops = &sgx_vm_ops;
 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 9566eb72d417..c6436bbd4a68 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 	return entry;
 }

-struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
-				    struct mm_struct *mm)
+static void sgx_encl_mm_release_wq(struct work_struct *work)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(work, struct sgx_encl_mm, release_work);
+
+	sgx_encl_mm_release(encl_mm);
+}
+
+/*
+ * Being a call_srcu() callback, this needs to be short, and sgx_encl_release()
+ * is anything but short. Do the final freeing in yet another async callback.
+ */
+static void sgx_encl_mm_release_delayed(struct rcu_head *rcu)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(rcu, struct sgx_encl_mm, rcu);
+
+	INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq);
+	schedule_work(&encl_mm->release_work);
+}
+
+static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
+				     struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm =
+		container_of(mn, struct sgx_encl_mm, mmu_notifier);
+	struct sgx_encl_mm *tmp = NULL;
+
+	/*
+	 * The enclave itself can remove encl_mm. Note, objects can't be moved
+	 * off an RCU protected list, but deletion is ok.
+	 */
+	spin_lock(&encl_mm->encl->mm_lock);
+	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
+		if (tmp == encl_mm) {
+			list_del_rcu(&encl_mm->list);
+			break;
+		}
+	}
+	spin_unlock(&encl_mm->encl->mm_lock);
+
+	if (tmp == encl_mm) {
+		synchronize_srcu(&encl_mm->encl->srcu);
+
+		/*
+		 * Delay freeing encl_mm until after mmu_notifier releases any
+		 * SRCU locks. synchronize_srcu() must be called from process
+		 * context, i.e. we can't throw mmu_notifier_unregister() in a
+		 * work queue and be done with it.
+		 */
+		mmu_notifier_unregister_no_release(mn, mm);
+		mmu_notifier_call_srcu(&encl_mm->rcu,
+				       &sgx_encl_mm_release_delayed);
+	}
+}
+
+static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
+	.release = sgx_mmu_notifier_release,
+};
+
+static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
+					    struct mm_struct *mm)
+{
+	struct sgx_encl_mm *encl_mm = NULL;
+	struct sgx_encl_mm *tmp;
+	int idx;
+
+	idx = srcu_read_lock(&encl->srcu);
+
+	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
+		if (tmp->mm == mm) {
+			encl_mm = tmp;
+			break;
+		}
+	}
+
+	srcu_read_unlock(&encl->srcu, idx);
+
+	return encl_mm;
+}
+
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 {
 	struct sgx_encl_mm *encl_mm;
+	int ret;
+
+	lockdep_assert_held_exclusive(&mm->mmap_sem);
+
+	/*
+	 * mm_structs are kept on mm_list until the mm or the enclave dies,
+	 * i.e. once an mm is off the list, it's gone for good, therefore it's
+	 * impossible to get a false positive on @mm due to a stale mm_list.
+	 */
+	if (sgx_encl_find_mm(encl, mm))
+		return 0;

 	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
 	if (!encl_mm)
-		return ERR_PTR(-ENOMEM);
+		return -ENOMEM;

 	encl_mm->encl = encl;
 	encl_mm->mm = mm;
-	kref_init(&encl_mm->refcount);
+	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
+
+	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
+	if (ret) {
+		kfree(encl_mm);
+		return ret;
+	}
+
+	kref_get(&encl->refcount);

 	spin_lock(&encl->mm_lock);
-	list_add(&encl_mm->list, &encl->mm_list);
+	list_add_rcu(&encl_mm->list, &encl->mm_list);
 	spin_unlock(&encl->mm_lock);

-	return encl_mm;
-}
+	synchronize_srcu(&encl->srcu);

-void sgx_encl_mm_release(struct kref *ref)
-{
-	struct sgx_encl_mm *encl_mm =
-		container_of(ref, struct sgx_encl_mm, refcount);
-
-	spin_lock(&encl_mm->encl->mm_lock);
-	list_del(&encl_mm->list);
-	spin_unlock(&encl_mm->encl->mm_lock);
-
-	kfree(encl_mm);
-}
-
-static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
-					   struct mm_struct *mm)
-{
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
-	int iter;
-
-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
-
-		if (mm == encl_mm->mm)
-			return encl_mm;
-	}
-
-	return NULL;
-}
-
-static void sgx_vma_open(struct vm_area_struct *vma)
-{
-	struct sgx_encl *encl = vma->vm_private_data;
-	struct sgx_encl_mm *encl_mm;
-
-	if (!encl)
-		return;
-
-	if (encl->flags & SGX_ENCL_DEAD)
-		goto error;
-
-	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
-	if (!encl_mm) {
-		encl_mm = sgx_encl_mm_add(encl, vma->vm_mm);
-		if (IS_ERR(encl_mm))
-			goto error;
-	}
-
-	return;
-
-error:
-	vma->vm_private_data = NULL;
-}
-
-static void sgx_vma_close(struct vm_area_struct *vma)
-{
-	struct sgx_encl *encl = vma->vm_private_data;
-	struct sgx_encl_mm *encl_mm;
-
-	if (!encl)
-		return;
-
-	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
-	if (encl_mm) {
-		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
-
-		/* Release kref for the VMA. */
-		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
-	}
+	return 0;
 }

 static unsigned int sgx_vma_fault(struct vm_fault *vmf)
@@ -366,8 +388,6 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 }

 const struct vm_operations_struct sgx_vm_ops = {
-	.close = sgx_vma_close,
-	.open = sgx_vma_open,
 	.fault = sgx_vma_fault,
 	.access = sgx_vma_access,
 };
@@ -465,7 +485,7 @@ void sgx_encl_release(struct kref *ref)
 	if (encl->backing)
 		fput(encl->backing);

-	WARN(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");
+	WARN_ONCE(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");

 	kfree(encl);
 }
@@ -503,46 +523,6 @@ struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index)
 	return shmem_read_mapping_page_gfp(mapping, index, gfpmask);
 }

-/**
- * sgx_encl_next_mm() - Iterate to the next mm
- * @encl:	an enclave
- * @mm:		an mm list entry
- * @iter:	iterator status
- *
- * Return: the enclave mm or NULL
- */
-struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
-				     struct sgx_encl_mm *encl_mm, int *iter)
-{
-	struct list_head *entry;
-
-	WARN(!encl, "%s: encl is NULL", __func__);
-	WARN(!iter, "%s: iter is NULL", __func__);
-
-	spin_lock(&encl->mm_lock);
-
-	entry = encl_mm ? encl_mm->list.next : encl->mm_list.next;
-	WARN(!entry, "%s: entry is NULL", __func__);
-
-	if (entry == &encl->mm_list) {
-		spin_unlock(&encl->mm_lock);
-		*iter = SGX_ENCL_MM_ITER_DONE;
-		return NULL;
-	}
-
-	encl_mm = list_entry(entry, struct sgx_encl_mm, list);
-
-	if (!kref_get_unless_zero(&encl_mm->refcount)) {
-		spin_unlock(&encl->mm_lock);
-		*iter = SGX_ENCL_MM_ITER_RESTART;
-		return NULL;
-	}
-
-	spin_unlock(&encl->mm_lock);
-	*iter = SGX_ENCL_MM_ITER_NEXT;
-	return encl_mm;
-}
-
 static int sgx_encl_test_and_clear_young_cb(pte_t *ptep, pgtable_t token,
 					    unsigned long addr, void *data)
 {
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index c557f0374d74..0904b3c20ed0 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -9,9 +9,11 @@
 #include <linux/kref.h>
 #include <linux/list.h>
 #include <linux/mm_types.h>
+#include <linux/mmu_notifier.h>
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/radix-tree.h>
+#include <linux/srcu.h>
 #include <linux/workqueue.h>

 /**
@@ -57,8 +59,10 @@ enum sgx_encl_flags {
 struct sgx_encl_mm {
 	struct sgx_encl *encl;
 	struct mm_struct *mm;
-	struct kref refcount;
 	struct list_head list;
+	struct mmu_notifier mmu_notifier;
+	struct work_struct release_work;
+	struct rcu_head rcu;
 };

 struct sgx_encl {
@@ -72,6 +76,7 @@ struct sgx_encl {
 	spinlock_t mm_lock;
 	struct file *backing;
 	struct kref refcount;
+	struct srcu_struct srcu;
 	unsigned long base;
 	unsigned long size;
 	unsigned long ssaframesize;
@@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl);
 void sgx_encl_release(struct kref *ref);
 pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
 struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
-struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
-				     struct sgx_encl_mm *encl_mm, int *iter);
-struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
-				    struct mm_struct *mm);
-void sgx_encl_mm_release(struct kref *ref);
+int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
+static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm)
+{
+	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
+
+	kfree(encl_mm);
+}
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
 struct sgx_encl_page *sgx_encl_reserve_page(struct sgx_encl *encl,
diff --git a/arch/x86/kernel/cpu/sgx/reclaim.c b/arch/x86/kernel/cpu/sgx/reclaim.c
index f192ade93245..e9427220415b 100644
--- a/arch/x86/kernel/cpu/sgx/reclaim.c
+++ b/arch/x86/kernel/cpu/sgx/reclaim.c
@@ -140,23 +140,13 @@ static bool sgx_reclaimer_evict(struct sgx_epc_page *epc_page)
 {
 	struct sgx_encl_page *page = epc_page->owner;
 	struct sgx_encl *encl = page->encl;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
+	struct sgx_encl_mm *encl_mm;
 	bool ret = true;
-	int iter;
+	int idx;

-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);

+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;

@@ -164,14 +154,14 @@ static bool sgx_reclaimer_evict(struct sgx_epc_page *epc_page)
 		ret = !sgx_encl_test_and_clear_young(encl_mm->mm, page);
 		up_read(&encl_mm->mm->mmap_sem);

-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);

-		if (!ret || (encl->flags & SGX_ENCL_DEAD)) {
-			kref_put(&encl_mm->refcount, sgx_encl_mm_release);
+		if (!ret || (encl->flags & SGX_ENCL_DEAD))
 			break;
-		}
 	}

+	srcu_read_unlock(&encl->srcu, idx);
+
 	/*
 	 * Do not reclaim this page if it has been recently accessed by any
 	 * mm_struct *and* if the enclave is still alive. No need to take
@@ -195,24 +185,13 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)
 	struct sgx_encl_page *page = epc_page->owner;
 	unsigned long addr = SGX_ENCL_PAGE_ADDR(page);
 	struct sgx_encl *encl = page->encl;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
+	struct sgx_encl_mm *encl_mm;
 	struct vm_area_struct *vma;
-	int iter;
-	int ret;
+	int idx, ret;

-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);

+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;

@@ -224,9 +203,11 @@ static void sgx_reclaimer_block(struct sgx_epc_page *epc_page)

 		up_read(&encl_mm->mm->mmap_sem);

-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);
 	}

+	srcu_read_unlock(&encl->srcu, idx);
+
 	mutex_lock(&encl->lock);

 	if (!(encl->flags & SGX_ENCL_DEAD)) {
@@ -289,32 +270,24 @@ static void sgx_ipi_cb(void *info)
 static const cpumask_t *sgx_encl_ewb_cpumask(struct sgx_encl *encl)
 {
 	cpumask_t *cpumask = &encl->cpumask;
-	struct sgx_encl_mm *encl_mm = NULL;
-	struct sgx_encl_mm *prev_mm = NULL;
-	int iter;
+	struct sgx_encl_mm *encl_mm;
+	int idx;

 	cpumask_clear(cpumask);

-	while (true) {
-		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
-		if (prev_mm)
-			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
-		prev_mm = encl_mm;
-
-		if (iter == SGX_ENCL_MM_ITER_DONE)
-			break;
-
-		if (iter == SGX_ENCL_MM_ITER_RESTART)
-			continue;
+	idx = srcu_read_lock(&encl->srcu);

+	list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
 		if (!mmget_not_zero(encl_mm->mm))
 			continue;

 		cpumask_or(cpumask, cpumask, mm_cpumask(encl_mm->mm));

-		mmput(encl_mm->mm);
+		mmput_async(encl_mm->mm);
 	}

+	srcu_read_unlock(&encl->srcu, idx);
+
 	return cpumask;
 }

--
2.21.0
SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
tracked and enforced by the CPU using a base+mask approach, similar to
how hardware range registers such as the variable MTRRs work. As a
result, the ELRANGE must be naturally sized and aligned.

To reduce boilerplate code that would be needed in every userspace
enclave loader, the SGX driver naturally aligns the mmap() address and
also requires the range to be naturally sized. Unfortunately, SGX fails
to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
if userspace is attempting to map a small slice of an existing enclave.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index 07aa5f91b2dd..29384cdd0842 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -115,7 +115,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
 					   unsigned long pgoff,
 					   unsigned long flags)
 {
-	if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
+	if (flags & MAP_PRIVATE)
+		return -EINVAL;
+
+	if (flags & MAP_FIXED)
+		return addr;
+
+	if (len < 2 * PAGE_SIZE || len & (len - 1))
 		return -EINVAL;

 	addr = current->mm->get_unmapped_area(file, addr, 2 * len, pgoff,
--
2.21.0
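To make the ordering of the checks in the hunk above easier to reason about, here is a small userspace model. It is only a sketch: model_check_mmap_request() and is_naturally_sized() are hypothetical names, MODEL_PAGE_SIZE assumes 4 KiB pages, and the model covers only the accept/reject decision, not the address selection done by get_unmapped_area().

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* "Naturally sized": a power of two that spans at least two pages. */
bool is_naturally_sized(unsigned long len)
{
	return len >= 2 * MODEL_PAGE_SIZE && (len & (len - 1)) == 0;
}

/*
 * Hypothetical model of the post-patch check order.  Returns 0 if the
 * request would be accepted and -EINVAL if it would be rejected.
 */
int model_check_mmap_request(unsigned long len, int flags)
{
	/* Private mappings are rejected regardless of MAP_FIXED. */
	if (flags & MAP_PRIVATE)
		return -EINVAL;

	/* MAP_FIXED gets a waiver from the natural size requirement. */
	if (flags & MAP_FIXED)
		return 0;

	if (!is_naturally_sized(len))
		return -EINVAL;

	return 0;
}
```

In this model a single-page MAP_SHARED | MAP_FIXED request is accepted, which corresponds to the "small slice of an existing enclave" case from the changelog, while the same request without MAP_FIXED is still rejected.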
The SGX enclave loader doesn't need an executable stack, but linkers
will assume it does due to the lack of .note.GNU-stack sections in the
loader's assembly code. As a result, the kernel tags the loader as
having "read implies exec", and so adds PROT_EXEC to all mmap()s, even
those for mapping EPC regions. This will cause problems in the future
when userspace needs to explicitly state a page's protection bits when
the page is added to an enclave, e.g. adding TCS pages as R+W will cause
mmap() to fail when the kernel tacks on +X.

Explicitly tell the linker that an executable stack is not needed.
Alternatively, each .S file could add .note.GNU-stack, but the loader
should never need an executable stack so zap it in one fell swoop.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 tools/testing/selftests/x86/sgx/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/x86/sgx/Makefile b/tools/testing/selftests/x86/sgx/Makefile
index 1fd6f2708e81..10136b73096b 100644
--- a/tools/testing/selftests/x86/sgx/Makefile
+++ b/tools/testing/selftests/x86/sgx/Makefile
@@ -2,7 +2,7 @@ top_srcdir = ../../../../..

 include ../../lib.mk

-HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC
+HOST_CFLAGS := -Wall -Werror -g $(INCLUDES) -fPIC -z noexecstack
 ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
 	       -fno-stack-protector -mrdrnd $(INCLUDES)

--
2.21.0
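One possible way to confirm whether a loader binary ended up with "read implies exec" is to query the process personality at run time. The following is a sketch and not part of the selftest; read_implies_exec() is a hypothetical helper, and whether the flag gets set for a given binary depends on the kernel version and on how the binary was linked.

```c
#include <stdbool.h>
#include <sys/personality.h>

/*
 * Hypothetical helper: report whether the calling process runs with
 * READ_IMPLIES_EXEC, i.e. whether the kernel silently adds PROT_EXEC
 * to readable mappings.
 */
bool read_implies_exec(void)
{
	/* personality(0xffffffff) returns the persona without changing it. */
	int persona = personality(0xffffffff);

	return persona != -1 && (persona & READ_IMPLIES_EXEC) != 0;
}
```

Calling this early in the loader and printing the result before and after adding -z noexecstack would show whether the linker flag had the intended effect on a given system.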
Existing Linux Security Module policies restrict userspace's ability to
map memory, e.g. may require privileged permissions to map a page that
is simultaneously writable and executable. Said permissions are often
tied to the file which backs the mapped memory, i.e. vm_file.

For reasons explained below, SGX does not allow LSMs to enforce policies
using existing LSM hooks such as file_mprotect(). Explicitly track the
protection bits for an enclave page (separate from the vma/pte bits) and
require userspace to explicitly define a page's protection bits when the
page is added to the enclave.

Enclave page protection bits pave the way to adding a
security_enclave_load() LSM hook as an SGX equivalent to
security_file_mprotect(), e.g. SGX can pass the page's protection bits
and source vma to LSMs. The source vma will allow LSMs to tie
permissions to files, e.g. the file containing the enclave's code and
initial data, and the protection bits will allow LSMs to make decisions
based on the capabilities of the process, e.g. if a process is allowed
to load unmeasured code or load code from anonymous memory.

Due to the nature of the Enclave Page Cache, and because the EPC is
manually managed by SGX, all enclave vmas are backed by the same file,
i.e. /dev/sgx/enclave. Specifically, a single file allows SGX to use
file op hooks to move pages in/out of the EPC.

Furthermore, EPC pages for any given enclave are fundamentally shared
between processes, i.e. CoW semantics are not possible with EPC pages
due to hardware restrictions such as 1:1 mappings between virtual and
physical addresses (within the enclave).

Lastly, all real world enclaves will need read, write and execute
permissions to EPC pages.

As a result, SGX does not play nice with existing LSM behavior as it is
impossible to apply policies to enclaves with reasonable granularity, e.g.
an LSM can deny access to EPC altogether, but can't deny potentially
unwanted behavior such as mapping pages WX, loading code from anonymous
memory, loading unmeasured code, etc...

For example, because all (practical) enclaves need RW pages for data and
RX pages for code, SELinux's existing policies will require all enclaves
to have FILE__READ, FILE__WRITE and FILE__EXECUTE permissions on
/dev/sgx/enclave. Withholding FILE__WRITE or FILE__EXECUTE in an attempt
to deny RW->RX or RWX would prevent running *any* enclave, even those
that cleanly separate RW and RX pages. And because /dev/sgx/enclave
requires MAP_SHARED, the anonymous/CoW checks that would trigger
FILE__EXECMOD or PROCESS__EXECMEM permissions will never fire.

Taking protection bits has a second use in that it can be used to
prevent loading an enclave from a noexec file system. On SGX2 hardware,
regardless of kernel support for SGX2, userspace could EADD a page from
a noexec path using read-only permissions and later mprotect() and
ENCLU[EMODPE] the page to gain execute permissions. By requiring the
enclave's page protections up front, SGX will be able to enforce noexec
paths when building enclaves.

To prevent userspace from circumventing the allowed protections, do not
allow PROT_{READ,WRITE,EXEC} mappings to an enclave without an
associated enclave page, i.e. prevent creating a mapping with unchecked
protection bits.

Many alternatives[1][2] have been explored, most notably the concept of
having SGX check (at load time) and save the permissions of the enclave
loader. The permissions would then be enforced by SGX at run time, e.g.
via mmap()/mprotect() hooks of some form. The basic functionality of
pre-checking permissions is relatively straightforward, but supporting
LSM auditing is complex and fraught with pitfalls. If auditing is done
at the time of denial then the audit logs will potentially show a large
number of false positives. Auditing when the denial is enforced, e.g.
at mprotect(), suffers from its own problems, e.g.:

  - Requires LSMs to pre-generate audit messages so that they can be
    replayed by SGX when the denial is actually enforced.

  - System changes can result in stale audit messages, e.g. if files
    are removed from the system, an LSM profile is modified, etc...

  - A process could log what is essentially a false positive denial,
    e.g. if the current process has the requisite capability but the
    original enclave loader did not.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/include/uapi/asm/sgx.h        |  6 ++--
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 15 +++++---
 arch/x86/kernel/cpu/sgx/driver/main.c  | 49 ++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/encl.h         |  1 +
 tools/testing/selftests/x86/sgx/main.c | 32 +++++++++++++++--
 5 files changed, 94 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
index 6dba9f282232..67a3babbb24d 100644
--- a/arch/x86/include/uapi/asm/sgx.h
+++ b/arch/x86/include/uapi/asm/sgx.h
@@ -35,15 +35,17 @@ struct sgx_enclave_create {
  * @src:	address for the page data
  * @secinfo:	address for the SECINFO data
  * @mrmask:	bitmask for the measured 256 byte chunks
+ * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
  */
 struct sgx_enclave_add_page {
 	__u64	addr;
 	__u64	src;
 	__u64	secinfo;
-	__u64	mrmask;
+	__u16	mrmask;
+	__u8	prot;
+	__u8	pad;
 };

-
 /**
  * struct sgx_enclave_init - parameter structure for the
  *                           %SGX_IOC_ENCLAVE_INIT ioctl
diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 3552d642b26f..e18d2afd2aad 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -2,6 +2,7 @@
 // Copyright(c) 2016-19 Intel Corporation.

#include <asm/mman.h> +#include <linux/mman.h> #include <linux/delay.h> #include <linux/file.h> #include <linux/hashtable.h> @@ -235,7 +236,8 @@ static int sgx_validate_secs(const struct sgx_secs *secs, } static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, - unsigned long addr) + unsigned long addr, + unsigned long prot) { struct sgx_encl_page *encl_page; int ret; @@ -247,6 +249,7 @@ static struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, return ERR_PTR(-ENOMEM); encl_page->desc = addr; encl_page->encl = encl; + encl_page->vm_prot_bits = calc_vm_prot_bits(prot, 0); ret = radix_tree_insert(&encl->page_tree, PFN_DOWN(encl_page->desc), encl_page); if (ret) { @@ -517,7 +520,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl, static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, void *data, struct sgx_secinfo *secinfo, - unsigned int mrmask) + unsigned int mrmask, unsigned long prot) { u64 page_type = secinfo->flags & SGX_SECINFO_PAGE_TYPE_MASK; struct sgx_encl_page *encl_page; @@ -543,7 +546,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, goto out; } - encl_page = sgx_encl_page_alloc(encl, addr); + encl_page = sgx_encl_page_alloc(encl, addr, prot); if (IS_ERR(encl_page)) { ret = PTR_ERR(encl_page); goto out; @@ -584,6 +587,7 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg) struct sgx_enclave_add_page addp; struct sgx_secinfo secinfo; struct page *data_page; + unsigned long prot; void *data; int ret; @@ -605,7 +609,10 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg) goto out; } - ret = sgx_encl_add_page(encl, addp.addr, data, &secinfo, addp.mrmask); + prot = addp.prot & (PROT_READ | PROT_WRITE | PROT_EXEC); + + ret = sgx_encl_add_page(encl, addp.addr, data, &secinfo, addp.mrmask, + prot); if (ret) goto out; diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index 
29384cdd0842..dabfe2a7245a 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, } #endif +/* + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages + * covered by the specific VMA. A non-existent (or yet to be added) enclave + * page is considered to have no RWX permissions, i.e. is inaccessible. + */ +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl, + struct vm_area_struct *vma) +{ + unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC; + unsigned long idx, idx_start, idx_end; + struct sgx_encl_page *page; + + idx_start = PFN_DOWN(vma->vm_start); + idx_end = PFN_DOWN(vma->vm_end - 1); + + for (idx = idx_start; idx <= idx_end; ++idx) { + /* + * No need to take encl->lock, vm_prot_bits is set prior to + * insertion and never changes, and racing with adding pages is + * a userspace bug. + */ + rcu_read_lock(); + page = radix_tree_lookup(&encl->page_tree, idx); + rcu_read_unlock(); + + /* Do not allow R|W|X to a non-existent page. 
*/ + if (!page) + allowed_rwx = 0; + else + allowed_rwx &= page->vm_prot_bits; + if (!allowed_rwx) + break; + } + + return allowed_rwx; +} + static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; + unsigned long allowed_rwx; int ret; + allowed_rwx = sgx_allowed_rwx(encl, vma); + if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx) + return -EACCES; + ret = sgx_encl_mm_add(encl, vma->vm_mm); if (ret) return ret; + if (!(allowed_rwx & VM_READ)) + vma->vm_flags &= ~VM_MAYREAD; + if (!(allowed_rwx & VM_WRITE)) + vma->vm_flags &= ~VM_MAYWRITE; + if (!(allowed_rwx & VM_EXEC)) + vma->vm_flags &= ~VM_MAYEXEC; + vma->vm_ops = &sgx_vm_ops; vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; vma->vm_private_data = encl; diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index 0904b3c20ed0..5ad018c8d74c 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -43,6 +43,7 @@ enum sgx_encl_page_desc { struct sgx_encl_page { unsigned long desc; + unsigned long vm_prot_bits; struct sgx_epc_page *epc_page; struct sgx_va_page *va_page; struct sgx_encl *encl; diff --git a/tools/testing/selftests/x86/sgx/main.c b/tools/testing/selftests/x86/sgx/main.c index e2265f841fb0..77e93f8e8a59 100644 --- a/tools/testing/selftests/x86/sgx/main.c +++ b/tools/testing/selftests/x86/sgx/main.c @@ -2,6 +2,7 @@ // Copyright(c) 2016-18 Intel Corporation. 
#include <elf.h> +#include <errno.h> #include <fcntl.h> #include <stdbool.h> #include <stdio.h> @@ -18,6 +19,8 @@ #include "../../../../../arch/x86/kernel/cpu/sgx/arch.h" #include "../../../../../arch/x86/include/uapi/asm/sgx.h" +#define PAGE_SIZE 4096 + static const uint64_t MAGIC = 0x1122334455667788ULL; struct vdso_symtab { @@ -135,8 +138,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size, for (secs->size = 4096; secs->size < bin_size; ) secs->size <<= 1; - base = mmap(NULL, secs->size, PROT_READ | PROT_WRITE | PROT_EXEC, - MAP_SHARED, dev_fd, 0); + base = mmap(NULL, secs->size, PROT_NONE, MAP_SHARED, dev_fd, 0); if (base == MAP_FAILED) { perror("mmap"); return false; @@ -147,7 +149,7 @@ static bool encl_create(int dev_fd, unsigned long bin_size, ioc.src = (unsigned long)secs; rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_CREATE, &ioc); if (rc) { - fprintf(stderr, "ECREATE failed rc=%d.\n", rc); + fprintf(stderr, "ECREATE failed rc=%d, err=%d.\n", rc, errno); munmap(base, secs->size); return false; } @@ -160,8 +162,14 @@ static bool encl_add_page(int dev_fd, unsigned long addr, void *data, { struct sgx_enclave_add_page ioc; struct sgx_secinfo secinfo; + unsigned long prot; int rc; + if (flags == SGX_SECINFO_TCS) + prot = PROT_READ | PROT_WRITE; + else + prot = PROT_READ | PROT_WRITE | PROT_EXEC; + memset(&secinfo, 0, sizeof(secinfo)); secinfo.flags = flags; @@ -169,6 +177,7 @@ static bool encl_add_page(int dev_fd, unsigned long addr, void *data, ioc.mrmask = 0xFFFF; ioc.addr = addr; ioc.src = (uint64_t)data; + ioc.prot = prot; rc = ioctl(dev_fd, SGX_IOC_ENCLAVE_ADD_PAGE, &ioc); if (rc) { @@ -184,6 +193,7 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size) struct sgx_enclave_init ioc; uint64_t offset; uint64_t flags; + void *addr; int dev_fd; int rc; @@ -215,6 +225,22 @@ static bool encl_load(struct sgx_secs *secs, unsigned long bin_size) goto out_map; } + addr = mmap((void *)secs->base, PAGE_SIZE, PROT_READ | PROT_WRITE, + MAP_SHARED | 
MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed on TCS, errno=%d.\n", errno); + return false; + } + + addr = mmap((void *)(secs->base + PAGE_SIZE), bin_size - PAGE_SIZE, + PROT_READ | PROT_WRITE | PROT_EXEC, + MAP_SHARED | MAP_FIXED, dev_fd, 0); + if (addr == MAP_FAILED) { + fprintf(stderr, "mmap() failed, errno=%d.\n", errno); + return false; + } + + close(dev_fd); return true; out_map: -- 2.21.0
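
For readers who want to experiment with the mapping rule outside the
kernel, the following is a minimal userspace model of the check done by
sgx_allowed_rwx() and sgx_mmap() in the patch above.  It is a sketch
under stated assumptions, not driver code: the flat array standing in
for encl->page_tree, the model_* names and the MODEL_NO_PAGE marker are
all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

#define MODEL_RWX	(PROT_READ | PROT_WRITE | PROT_EXEC)
#define MODEL_NO_PAGE	(-1)	/* hypothetical marker: page was never added */

/*
 * AND of the maximal protections of pages [first, last].  A page that
 * was never added contributes no permissions, mirroring the !page case.
 */
static int model_allowed_rwx(const int *page_prot, size_t first, size_t last)
{
	int allowed = MODEL_RWX;
	size_t i;

	assert(first <= last);

	/* Stop early once nothing is allowed, like the kernel loop does. */
	for (i = first; i <= last && allowed; i++) {
		if (page_prot[i] == MODEL_NO_PAGE)
			allowed = 0;
		else
			allowed &= page_prot[i];
	}

	return allowed;
}

/* Returns 0 if mapping the range with @prot is allowed, -1 otherwise. */
static int model_mmap_check(const int *page_prot, size_t first, size_t last,
			    int prot)
{
	int allowed = model_allowed_rwx(page_prot, first, last);

	return (prot & MODEL_RWX & ~allowed) ? -1 : 0;
}
```

With pages declared as { RW, RX, <missing> }, a mapping that spans the
first two pages may only be PROT_READ, and any range that touches the
missing page may only be PROT_NONE, which is why the selftest reserves
the enclave range with PROT_NONE before adding pages.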
Do not allow an enclave page to be mapped with PROT_EXEC if the source
vma does not have VM_MAYEXEC.  This effectively enforces noexec as
do_mmap() clears VM_MAYEXEC if the vma is being loaded from a noexec
path, i.e. prevents executing a file by loading it into an enclave.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 42 +++++++++++++++++++++++---
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index e18d2afd2aad..1fca70a36ce3 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -564,6 +564,39 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 	return ret;
 }
 
+static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)
+{
+	struct vm_area_struct *vma;
+	int ret;
+
+	/* Hold mmap_sem across copy_from_user() to avoid a TOCTOU race. */
+	down_read(&current->mm->mmap_sem);
+
+	/* Query vma's VM_MAYEXEC as an indirect path_noexec() check. */
+	if (prot & PROT_EXEC) {
+		vma = find_vma(current->mm, src);
+		if (!vma) {
+			ret = -EFAULT;
+			goto out;
+		}
+
+		if (!(vma->vm_flags & VM_MAYEXEC)) {
+			ret = -EACCES;
+			goto out;
+		}
+	}
+
+	if (copy_from_user(dst, (void __user *)src, PAGE_SIZE))
+		ret = -EFAULT;
+	else
+		ret = 0;
+
+out:
+	up_read(&current->mm->mmap_sem);
+
+	return ret;
+}
+
 /**
  * sgx_ioc_enclave_add_page - handler for %SGX_IOC_ENCLAVE_ADD_PAGE
  *
@@ -604,13 +637,12 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg)
 
 	data = kmap(data_page);
 
-	if (copy_from_user((void *)data, (void __user *)addp.src, PAGE_SIZE)) {
-		ret = -EFAULT;
-		goto out;
-	}
-
 	prot = addp.prot & (PROT_READ | PROT_WRITE | PROT_EXEC);
 
+	ret = sgx_encl_page_copy(data, addp.src, prot);
+	if (ret)
+		goto out;
+
 	ret = sgx_encl_add_page(encl, addp.addr, data, &secinfo, addp.mrmask,
 				prot);
 	if (ret)
-- 
2.21.0
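
The two-step reasoning in the changelog above (do_mmap() drops
VM_MAYEXEC for noexec paths, then the EADD path refuses PROT_EXEC
without VM_MAYEXEC) can be sketched as a small userspace model.  This is
only an illustration: the model_* helpers are hypothetical, and
MODEL_VM_MAYEXEC is an arbitrary bit chosen for the model rather than
the kernel's definition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Arbitrary bit for the model; stands in for the kernel's VM_MAYEXEC. */
#define MODEL_VM_MAYEXEC	0x40UL

/* Model of do_mmap(): a mapping from a noexec path loses VM_MAYEXEC. */
static unsigned long model_source_vm_flags(bool path_noexec)
{
	unsigned long flags = MODEL_VM_MAYEXEC;

	if (path_noexec)
		flags &= ~MODEL_VM_MAYEXEC;

	return flags;
}

/*
 * Model of the check in sgx_encl_page_copy(): an executable enclave
 * page may only be sourced from a vma that could itself be executable.
 * Returns 0 on success and -EACCES on denial.
 */
static int model_encl_page_copy_check(unsigned long src_vm_flags, int prot)
{
	if ((prot & PROT_EXEC) && !(src_vm_flags & MODEL_VM_MAYEXEC))
		return -EACCES;

	return 0;
}
```

Note that only the declared (maximal) protections matter here: a page
declared RW can still be loaded from a noexec path, it just can never be
mapped executable afterwards.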
SGX will use ->may_mprotect() to invoke an SGX variant of the existing
file_mprotect() and mmap_file() LSM hooks.

The name may_mprotect() is intended to reflect the hook's purpose as a
way to restrict mprotect() as opposed to a wholesale replacement.

Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
MAP_SHARED.  Furthermore, all enclaves need read, write and execute
VMAs.  As a result, applying W^X restrictions on /dev/sgx/enclave using
existing LSM hooks is for all intents and purposes impossible, e.g.
denying either W or X would deny access to *any* enclave.

By hooking mprotect(), SGX can invoke an SGX specific LSM hook, which in
turn allows LSMs to enforce W^X policies.

Alternatively, SGX could provide a helper to identify enclaves given a
vma or file.  LSMs could then check if a mapping is for an enclave and
take action accordingly.

A second alternative would be to have SGX implement its own LSM hooks
for file_mprotect() and mmap_file(), using them to "forward" the call to
the SGX specific hook.

The major con to both alternatives is that they provide zero flexibility
for the SGX specific LSM hook.  The "is_sgx_enclave()" helper doesn't
allow SGX to supply any additional information whatsoever, and the
mmap_file() hook is called before the final address is known, e.g. SGX
can't provide any information about the specific enclave being mapped.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 include/linux/mm.h |  2 ++
 mm/mprotect.c      | 15 +++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0e8834ac32b7..b11ec420c8d7 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -458,6 +458,8 @@ struct vm_operations_struct {
 	void (*close)(struct vm_area_struct * area);
 	int (*split)(struct vm_area_struct * area, unsigned long addr);
 	int (*mremap)(struct vm_area_struct * area);
+	int (*may_mprotect)(struct vm_area_struct *vma, unsigned long start,
+			    unsigned long end, unsigned long prot);
 	vm_fault_t (*fault)(struct vm_fault *vmf);
 	vm_fault_t (*huge_fault)(struct vm_fault *vmf,
 			enum page_entry_size pe_size);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index bf38dfbbb4b4..18732543b295 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -547,13 +547,20 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 			goto out;
 		}
 
-		error = security_file_mprotect(vma, reqprot, prot);
-		if (error)
-			goto out;
-
 		tmp = vma->vm_end;
 		if (tmp > end)
 			tmp = end;
+
+		if (vma->vm_ops && vma->vm_ops->may_mprotect) {
+			error = vma->vm_ops->may_mprotect(vma, nstart, tmp, prot);
+			if (error)
+				goto out;
+		}
+
+		error = security_file_mprotect(vma, reqprot, prot);
+		if (error)
+			goto out;
+
 		error = mprotect_fixup(vma, &prev, nstart, tmp, newflags);
 		if (error)
 			goto out;
-- 
2.21.0
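
To make the calling convention of the new callback concrete, here is a
small userspace model of the per-vma step added to do_mprotect_pkey():
clamp the range to the vma, then give the owner of the vma a chance to
veto.  Everything here is a hypothetical stand-in (struct model_vma,
struct model_vm_ops, model_deny_wx()); in particular the example
callback that rejects W+X is only for illustration, the SGX driver in
this series forwards the request to an LSM hook instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

struct model_vma;

/* Hypothetical stand-in for struct vm_operations_struct. */
struct model_vm_ops {
	int (*may_mprotect)(struct model_vma *vma, unsigned long start,
			    unsigned long end, unsigned long prot);
};

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	const struct model_vm_ops *vm_ops;
};

/* Example callback: veto any request for a writable+executable range. */
static int model_deny_wx(struct model_vma *vma, unsigned long start,
			 unsigned long end, unsigned long prot)
{
	(void)vma;
	(void)start;
	(void)end;

	if ((prot & PROT_WRITE) && (prot & PROT_EXEC))
		return -EACCES;

	return 0;
}

/* Mirrors the per-vma step: clamp the range, then ask the owner, if any. */
static int model_mprotect_step(struct model_vma *vma, unsigned long nstart,
			       unsigned long end, unsigned long prot)
{
	unsigned long tmp = vma->vm_end;

	assert(nstart >= vma->vm_start && nstart < vma->vm_end);

	if (tmp > end)
		tmp = end;

	if (vma->vm_ops && vma->vm_ops->may_mprotect)
		return vma->vm_ops->may_mprotect(vma, nstart, tmp, prot);

	return 0;	/* no callback installed: nothing to veto */
}
```

A vma without the callback behaves exactly as before, which is the
point of making the hook optional.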
enclave_map() is an SGX specific variant of file_mprotect() and mmap_file(), and is provided so that LSMs can apply W^X restrictions to enclaves. Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be MAP_SHARED. Furthermore, all enclaves need read, write and execute VMAs. As a result, applying W^X restrictions on /dev/sgx/enclave using existing LSM hooks is for all intents and purposes impossible, e.g. denying either W or X would deny access to any enclave. Note, extensive discussion yielded no sane alternative to some form of SGX specific LSM hook[1]. [1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- arch/x86/kernel/cpu/sgx/driver/main.c | 9 ++++++++- arch/x86/kernel/cpu/sgx/encl.c | 12 ++++++++++++ include/linux/lsm_hooks.h | 12 ++++++++++++ include/linux/security.h | 11 +++++++++++ security/security.c | 7 +++++++ 5 files changed, 50 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index dabfe2a7245a..4379a2fb1f82 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -133,13 +133,20 @@ static unsigned long sgx_allowed_rwx(struct sgx_encl *encl, static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; - unsigned long allowed_rwx; + unsigned long allowed_rwx, prot; int ret; allowed_rwx = sgx_allowed_rwx(encl, vma); if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx) return -EACCES; + prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) | + _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) | + _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC); + ret = security_enclave_map(prot); + if (ret) + return ret; + ret = sgx_encl_mm_add(encl, vma->vm_mm); if (ret) return ret; diff 
--git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index c6436bbd4a68..059d90dcaa27 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -2,6 +2,7 @@ // Copyright(c) 2016-18 Intel Corporation. #include <linux/mm.h> +#include <linux/security.h> #include <linux/shmem_fs.h> #include <linux/suspend.h> #include <linux/sched/mm.h> @@ -387,9 +388,20 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr, return ret < 0 ? ret : i; } +#ifdef CONFIG_SECURITY +static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start, + unsigned long end, unsigned long prot) +{ + return security_enclave_map(prot); +} +#endif + const struct vm_operations_struct sgx_vm_ops = { .fault = sgx_vma_fault, .access = sgx_vma_access, +#ifdef CONFIG_SECURITY + .may_mprotect = sgx_vma_mprotect, +#endif }; /** diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 47f58cfb6a19..7c1357105e61 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1446,6 +1446,11 @@ * @bpf_prog_free_security: * Clean up the security information stored inside bpf prog. * + * Security hooks for Intel SGX enclaves. + * + * @enclave_map: + * @prot contains the protection that will be applied by the kernel. + * Return 0 if permission is granted. 
*/ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1807,6 +1812,10 @@ union security_list_options { int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX + int (*enclave_map)(unsigned long prot); +#endif /* CONFIG_INTEL_SGX */ }; struct security_hook_heads { @@ -2046,6 +2055,9 @@ struct security_hook_heads { struct hlist_head bpf_prog_alloc_security; struct hlist_head bpf_prog_free_security; #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX + struct hlist_head enclave_map; +#endif /* CONFIG_INTEL_SGX */ } __randomize_layout; /* diff --git a/include/linux/security.h b/include/linux/security.h index 659071c2e57c..6a1f54ba6794 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1829,5 +1829,16 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) #endif /* CONFIG_SECURITY */ #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX +#ifdef CONFIG_SECURITY +int security_enclave_map(unsigned long prot); +#else +static inline int security_enclave_map(unsigned long prot) +{ + return 0; +} +#endif /* CONFIG_SECURITY */ +#endif /* CONFIG_INTEL_SGX */ + #endif /* ! __LINUX_SECURITY_H */ diff --git a/security/security.c b/security/security.c index 613a5c00e602..03951e08bdfc 100644 --- a/security/security.c +++ b/security/security.c @@ -2359,3 +2359,10 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux) call_void_hook(bpf_prog_free_security, aux); } #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX +int security_enclave_map(unsigned long prot) +{ + return call_int_hook(enclave_map, 0, prot); +} +#endif /* CONFIG_INTEL_SGX */ -- 2.21.0
Hook enclave_map() to require a new per-process capability, SGX_MAPWX, when mapping an enclave as simultaneously writable and executable. Note, @prot contains the actual protection bits that will be set by the kernel, not the maximal protection bits specified by userspace when the page was first loaded into the enclave. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- security/selinux/hooks.c | 21 +++++++++++++++++++++ security/selinux/include/classmap.h | 3 ++- 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 3ec702cf46ca..fc239e541b62 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) } #endif +#ifdef CONFIG_INTEL_SGX +static int selinux_enclave_map(unsigned long prot) +{ + const struct cred *cred = current_cred(); + u32 sid = cred_sid(cred); + + /* SGX is supported only in 64-bit kernels. */ + WARN_ON_ONCE(!default_noexec); + + if ((prot & PROT_EXEC) && (prot & PROT_WRITE)) + return avc_has_perm(&selinux_state, sid, sid, + SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX, + NULL); + return 0; +} +#endif + struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct task_security_struct), .lbs_file = sizeof(struct file_security_struct), @@ -6968,6 +6985,10 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free), LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free), #endif + +#ifdef CONFIG_INTEL_SGX + LSM_HOOK_INIT(enclave_map, selinux_enclave_map), +#endif }; static __init int selinux_init(void) diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index 201f7e588a29..cfd91e879bdf 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -51,7 +51,8 @@ struct security_class_mapping secclass_map[] = 
{ "execmem", "execstack", "execheap", "setkeycreate", "setsockcreate", "getrlimit", NULL } }, { "process2", - { "nnp_transition", "nosuid_transition", NULL } }, + { "nnp_transition", "nosuid_transition", + "sgx_mapwx", NULL } }, { "system", { "ipc_info", "syslog_read", "syslog_mod", "syslog_console", "module_request", "module_load", NULL } }, -- 2.21.0
enclave_load() is roughly analogous to the existing file_mprotect(). Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be MAP_SHARED. Furthermore, all enclaves need read, write and execute VMAs. As a result, the existing/standard call to file_mprotect() does not provide any meaningful security for enclaves since an LSM can only deny/grant access to the EPC as a whole. security_enclave_load() is called when SGX is first loading an enclave page, i.e. copying a page from normal memory into the EPC. Although the prototype for enclave_load() is similar to file_mprotect(), e.g. SGX could theoretically use file_mprotect() and set reqprot=prot, a separate hook is desirable as the semantics of an enclave's protection bits are different than those of vmas, e.g. an enclave page tracks the maximal set of protections, whereas file_mprotect() operates on the actual protections being provided. Enclaves also have unique security properties, e.g. measured code, that LSMs may want to consider. In other words, LSMs will likely want to implement different policies for enclave page protections. Note, extensive discussion yielded no sane alternative to some form of SGX specific LSM hook[1]. 

[1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 32 ++++++++++++++------------
 include/linux/lsm_hooks.h              |  8 +++++++
 include/linux/security.h               |  7 ++++++
 security/security.c                    |  5 ++++
 4 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index 1fca70a36ce3..ae1b4d69441c 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -9,6 +9,7 @@
 #include <linux/highmem.h>
 #include <linux/ratelimit.h>
 #include <linux/sched/signal.h>
+#include <linux/security.h>
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
@@ -564,7 +565,8 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 	return ret;
 }
 
-static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)
+static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot,
+			      u16 mrmask)
 {
 	struct vm_area_struct *vma;
 	int ret;
@@ -572,24 +574,24 @@ static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)
 	/* Hold mmap_sem across copy_from_user() to avoid a TOCTOU race. */
 	down_read(&current->mm->mmap_sem);
 
+	vma = find_vma(current->mm, src);
+	if (!vma) {
+		ret = -EFAULT;
+		goto out;
+	}
+
 	/* Query vma's VM_MAYEXEC as an indirect path_noexec() check. */
-	if (prot & PROT_EXEC) {
-		vma = find_vma(current->mm, src);
-		if (!vma) {
-			ret = -EFAULT;
-			goto out;
-		}
-
-		if (!(vma->vm_flags & VM_MAYEXEC)) {
-			ret = -EACCES;
-			goto out;
-		}
+	if ((prot & PROT_EXEC) && !(vma->vm_flags & VM_MAYEXEC)) {
+		ret = -EACCES;
+		goto out;
 	}
 
+	ret = security_enclave_load(vma, prot, mrmask == 0xffff);
+	if (ret)
+		goto out;
+
 	if (copy_from_user(dst, (void __user *)src, PAGE_SIZE))
 		ret = -EFAULT;
-	else
-		ret = 0;
 
 out:
 	up_read(&current->mm->mmap_sem);
@@ -639,7 +641,7 @@ static long sgx_ioc_enclave_add_page(struct file *filep, void __user *arg)
 
 	prot = addp.prot & (PROT_READ | PROT_WRITE | PROT_EXEC);
 
-	ret = sgx_encl_page_copy(data, addp.src, prot);
+	ret = sgx_encl_page_copy(data, addp.src, prot, addp.mrmask);
 	if (ret)
 		goto out;
 
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 7c1357105e61..3bc92c65f287 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1451,6 +1451,11 @@
  * @enclave_map:
  *	@prot contains the protection that will be applied by the kernel.
  *	Return 0 if permission is granted.
+ *
+ * @enclave_load:
+ *	@vma: the source memory region of the enclave page being loaded.
+ *	@prot: the (maximal) protections of the enclave page.
+ *	Return 0 if permission is granted.
  */
 union security_list_options {
 	int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1815,6 +1820,8 @@ union security_list_options {
 
 #ifdef CONFIG_INTEL_SGX
 	int (*enclave_map)(unsigned long prot);
+	int (*enclave_load)(struct vm_area_struct *vma, unsigned long prot,
+			    bool measured);
 #endif /* CONFIG_INTEL_SGX */
 };
 
@@ -2057,6 +2064,7 @@ struct security_hook_heads {
 #endif /* CONFIG_BPF_SYSCALL */
 #ifdef CONFIG_INTEL_SGX
 	struct hlist_head enclave_map;
+	struct hlist_head enclave_load;
 #endif /* CONFIG_INTEL_SGX */
 } __randomize_layout;
 
diff --git a/include/linux/security.h b/include/linux/security.h
index 6a1f54ba6794..572ddfc53039 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1832,11 +1832,18 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
 #ifdef CONFIG_INTEL_SGX
 #ifdef CONFIG_SECURITY
 int security_enclave_map(unsigned long prot);
+int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
+			  bool measured);
 #else
 static inline int security_enclave_map(unsigned long prot)
 {
 	return 0;
 }
+static inline int security_enclave_load(struct vm_area_struct *vma,
+					unsigned long prot, bool measured)
+{
+	return 0;
+}
 #endif /* CONFIG_SECURITY */
 #endif /* CONFIG_INTEL_SGX */
 
diff --git a/security/security.c b/security/security.c
index 03951e08bdfc..00f483beb1cc 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2365,4 +2365,9 @@ int security_enclave_map(unsigned long prot)
 {
 	return call_int_hook(enclave_map, 0, prot);
 }
+int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
+			  bool measured)
+{
+	return call_int_hook(enclave_load, 0, vma, prot, measured);
+}
 #endif /* CONFIG_INTEL_SGX */
-- 
2.21.0
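
The @measured argument passed to the hook above is derived from
"mrmask == 0xffff".  As a rough illustration of why that value means
"fully measured", the sketch below models the mask as one bit per
256-byte chunk of a 4096-byte page, following the @mrmask description in
the UAPI header.  The model_* names and MODEL_* constants are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE		4096u
#define MODEL_MR_CHUNK_SIZE	256u	/* one mrmask bit per 256-byte chunk */
#define MODEL_MR_CHUNKS		(MODEL_PAGE_SIZE / MODEL_MR_CHUNK_SIZE)

/* True only if every chunk of the page is measured, i.e. all 16 bits set. */
static bool model_fully_measured(uint16_t mrmask)
{
	return mrmask == 0xffff;
}

/* Number of bytes of the page that are measured for a given mask. */
static unsigned int model_measured_bytes(uint16_t mrmask)
{
	unsigned int chunks = 0;
	unsigned int bit;

	for (bit = 0; bit < MODEL_MR_CHUNKS; bit++) {
		if (mrmask & (1u << bit))
			chunks++;
	}

	return chunks * MODEL_MR_CHUNK_SIZE;
}
```

Under this model a page with even a single unmeasured chunk is reported
to LSMs as unmeasured, which errs on the side of requiring the stricter
permission.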
The goal of selinux_enclave_load() is to provide a facsimile of the
existing selinux_file_mprotect() and file_map_prot_check() policies,
but tailored to the unique properties of SGX.

For example, an enclave page is technically backed by a MAP_SHARED file,
but the "file" is essentially shared memory that is never persisted
anywhere and also requires execute permissions (for some pages).

Enclaves are also less privileged than normal user code, e.g. SYSCALL
instructions #UD if attempted in an enclave.  For this reason, add SGX
specific permissions instead of reusing existing permissions such as
FILE__EXECUTE so that policies can allow running code in an enclave, or
allow dynamically loading code in an enclave without having to grant the
same capability to normal user code outside of the enclave.

Intended use of each permission:

  - SGX_EXECDIRTY: dynamically load code within the enclave itself
  - SGX_EXECUNMR: load unmeasured code into the enclave, e.g. Graphene
  - SGX_EXECANON: load code from anonymous memory (likely Graphene)
  - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior

Note, equivalents to FILE__READ and FILE__WRITE are intentionally never
required.  Writes to the enclave page are contained to the EPC, i.e.
never hit the original file, and read permissions have already been
vetted (or the VMA doesn't have PROT_READ, in which case loading the
page into the enclave will fail).
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> --- security/selinux/hooks.c | 55 +++++++++++++++++++++++++++-- security/selinux/include/classmap.h | 5 +-- 2 files changed, 55 insertions(+), 5 deletions(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index fc239e541b62..8a431168e454 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -6727,6 +6727,12 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) #endif #ifdef CONFIG_INTEL_SGX +static inline int sgx_has_perm(u32 sid, u32 requested) +{ + return avc_has_perm(&selinux_state, sid, sid, + SECCLASS_PROCESS2, requested, NULL); +} + static int selinux_enclave_map(unsigned long prot) { const struct cred *cred = current_cred(); @@ -6736,11 +6742,53 @@ static int selinux_enclave_map(unsigned long prot) WARN_ON_ONCE(!default_noexec); if ((prot & PROT_EXEC) && (prot & PROT_WRITE)) - return avc_has_perm(&selinux_state, sid, sid, - SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX, - NULL); + return sgx_has_perm(sid, PROCESS2__SGX_MAPWX); + return 0; } + +static int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot, + bool measured) +{ + const struct cred *cred = current_cred(); + u32 sid = cred_sid(cred); + int ret; + + /* SGX is supported only in 64-bit kernels. */ + WARN_ON_ONCE(!default_noexec); + + /* Only executable enclave pages are restricted in any way. */ + if (!(prot & PROT_EXEC)) + return 0; + + /* + * WX at load time only requires EXECDIRTY, e.g. to allow W->X. Actual + * WX mappings require MAPWX (see selinux_enclave_map()). + */ + if (prot & PROT_WRITE) { + ret = sgx_has_perm(sid, PROCESS2__SGX_EXECDIRTY); + if (ret) + goto out; + } + if (!measured) { + ret = sgx_has_perm(sid, PROCESS2__SGX_EXECUNMR); + if (ret) + goto out; + } + + if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file)) || + vma->anon_vma) + /* + * Loading enclave code from an anonymous mapping or from a + * modified private file mapping. 
+ */ + ret = sgx_has_perm(sid, PROCESS2__SGX_EXECANON); + else + /* Loading from a shared or unmodified private file mapping. */ + ret = file_has_perm(cred, vma->vm_file, FILE__SGX_EXECUTE); +out: + return ret; +} #endif struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = { @@ -6988,6 +7036,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { #ifdef CONFIG_INTEL_SGX LSM_HOOK_INIT(enclave_map, selinux_enclave_map), + LSM_HOOK_INIT(enclave_load, selinux_enclave_load), #endif }; diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index cfd91e879bdf..baa1757be46a 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -7,7 +7,7 @@ #define COMMON_FILE_PERMS COMMON_FILE_SOCK_PERMS, "unlink", "link", \ "rename", "execute", "quotaon", "mounton", "audit_access", \ - "open", "execmod" + "open", "execmod", "sgx_execute" #define COMMON_SOCK_PERMS COMMON_FILE_SOCK_PERMS, "bind", "connect", \ "listen", "accept", "getopt", "setopt", "shutdown", "recvfrom", \ @@ -52,7 +52,8 @@ struct security_class_mapping secclass_map[] = { "setsockcreate", "getrlimit", NULL } }, { "process2", { "nnp_transition", "nosuid_transition", - "sgx_mapwx", NULL } }, + "sgx_mapwx", "sgx_execdirty", "sgx_execanon", "sgx_execunmr", + NULL } }, { "system", { "ipc_info", "syslog_read", "syslog_mod", "syslog_console", "module_request", "module_load", NULL } }, -- 2.21.0
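
As a reading aid for the two SELinux hooks above, the following
userspace sketch computes which permissions would be checked for a given
request, assuming every individual check passes.  It is a model, not
SELinux code: the MODEL_SGX_* bit values and the model_* helpers are
hypothetical, and @anon_source stands for the condition the hook tests
on the source vma (no backing file, a private inode, or an anon_vma).

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical bit flags naming the permissions from the changelog. */
enum model_perm {
	MODEL_SGX_MAPWX		= 1 << 0,
	MODEL_SGX_EXECDIRTY	= 1 << 1,
	MODEL_SGX_EXECUNMR	= 1 << 2,
	MODEL_SGX_EXECANON	= 1 << 3,
	MODEL_SGX_EXECUTE	= 1 << 4,
};

/* Permissions checked when an enclave range is mapped with @prot. */
static int model_enclave_map_perms(int prot)
{
	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
		return MODEL_SGX_MAPWX;

	return 0;
}

/* Permissions checked when a page with maximal @prot is loaded. */
static int model_enclave_load_perms(int prot, bool measured, bool anon_source)
{
	int perms = 0;

	/* Only executable enclave pages are restricted in any way. */
	if (!(prot & PROT_EXEC))
		return 0;

	if (prot & PROT_WRITE)
		perms |= MODEL_SGX_EXECDIRTY;
	if (!measured)
		perms |= MODEL_SGX_EXECUNMR;

	perms |= anon_source ? MODEL_SGX_EXECANON : MODEL_SGX_EXECUTE;

	return perms;
}
```

For instance, a fully measured RX page loaded from a regular file only
needs the file-based execute permission, while an unmeasured RWX page
copied from anonymous memory needs EXECDIRTY, EXECUNMR and EXECANON at
load time, plus MAPWX if it is ever actually mapped W+X.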
Require execute permissions when loading an enclave from a file.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 security/apparmor/include/audit.h |  2 ++
 security/apparmor/lsm.c           | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/security/apparmor/include/audit.h b/security/apparmor/include/audit.h
index ee559bc2acb8..84470483e04d 100644
--- a/security/apparmor/include/audit.h
+++ b/security/apparmor/include/audit.h
@@ -107,6 +107,8 @@ enum audit_type {
 #define OP_PROF_LOAD "profile_load"
 #define OP_PROF_RM "profile_remove"
 
+#define OP_ENCL_LOAD "enclave_load"
+
 
 struct apparmor_audit_data {
 	int error;
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 87500bde5a92..2ed1157e1f58 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -517,6 +517,17 @@ static int apparmor_file_mprotect(struct vm_area_struct *vma,
 			   !(vma->vm_flags & VM_SHARED) ? MAP_PRIVATE : 0);
 }
 
+#ifdef CONFIG_INTEL_SGX
+static int apparmor_enclave_load(struct vm_area_struct *vma, unsigned long prot,
+				 bool measured)
+{
+	if (!(prot & PROT_EXEC))
+		return 0;
+
+	return common_file_perm(OP_ENCL_LOAD, vma->vm_file, AA_EXEC_MMAP);
+}
+#endif
+
 static int apparmor_sb_mount(const char *dev_name, const struct path *path,
 			     const char *type, unsigned long flags, void *data)
 {
@@ -1243,6 +1254,9 @@ static struct security_hook_list apparmor_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(secid_to_secctx, apparmor_secid_to_secctx),
 	LSM_HOOK_INIT(secctx_to_secid, apparmor_secctx_to_secid),
 	LSM_HOOK_INIT(release_secctx, apparmor_release_secctx),
+#ifdef CONFIG_INTEL_SGX
+	LSM_HOOK_INIT(enclave_load, apparmor_enclave_load),
+#endif
 };
 
 /*
-- 
2.21.0
Wire up a theoretical EAUG flag to show that the proposed LSM model is
extensible to SGX2, i.e. that SGX can communicate to LSMs that an EAUG'd
page is being mapped executable, as opposed to having to require
userspace to state that an EAUG'd page *may* be mapped executable in the
future.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/main.c | 10 +++++---
 arch/x86/kernel/cpu/sgx/encl.c        | 33 ++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h        |  2 ++
 include/linux/lsm_hooks.h             |  2 +-
 include/linux/security.h              |  4 ++--
 security/security.c                   |  4 ++--
 security/selinux/hooks.c              |  4 +++-
 7 files changed, 49 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index 4379a2fb1f82..b478c0f45279 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -99,7 +99,8 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
  * page is considered to have no RWX permissions, i.e. is inaccessible.
  */
 static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
-				     struct vm_area_struct *vma)
+				     struct vm_area_struct *vma,
+				     bool *eaug)
 {
 	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
 	unsigned long idx, idx_start, idx_end;
@@ -123,6 +124,8 @@ static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
 			allowed_rwx = 0;
 		else
 			allowed_rwx &= page->vm_prot_bits;
+		if (page->vm_prot_bits & SGX_VM_EAUG)
+			*eaug = true;
 		if (!allowed_rwx)
 			break;
 	}
@@ -134,16 +137,17 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = file->private_data;
 	unsigned long allowed_rwx, prot;
+	bool eaug = false;
 	int ret;

-	allowed_rwx = sgx_allowed_rwx(encl, vma);
+	allowed_rwx = sgx_allowed_rwx(encl, vma, &eaug);
 	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
 		return -EACCES;

 	prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
 	       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
 	       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
-	ret = security_enclave_map(prot);
+	ret = security_enclave_map(prot, eaug);
 	if (ret)
 		return ret;

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 059d90dcaa27..2e64676a8144 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -389,10 +389,41 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 }

 #ifdef CONFIG_SECURITY
+static bool is_eaug_range(struct sgx_encl *encl, unsigned long start,
+			  unsigned long end)
+{
+	unsigned long idx, idx_start, idx_end;
+	struct sgx_encl_page *page;
+
+	/* Enclave is dead or inaccessible. */
+	if (!encl)
+		return false;
+
+	idx_start = PFN_DOWN(start);
+	idx_end = PFN_DOWN(end - 1);
+
+	for (idx = idx_start; idx <= idx_end; ++idx) {
+		/*
+		 * No need to take encl->lock, vm_prot_bits is set prior to
+		 * insertion and never changes, and racing with adding pages is
+		 * a userspace bug.
+		 */
+		rcu_read_lock();
+		page = radix_tree_lookup(&encl->page_tree, idx);
+		rcu_read_unlock();
+
+		/* Non-existent page can only be PROT_NONE, bail early. */
+		if (!page || page->vm_prot_bits & SGX_VM_EAUG)
+			return true;
+	}
+
+	return false;
+}
 static int sgx_vma_mprotect(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end, unsigned long prot)
 {
-	return security_enclave_map(prot);
+	return security_enclave_map(prot,
+				    is_eaug_range(vma->vm_private_data, start, end));
 }
 #endif

diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 5ad018c8d74c..dae1a22dc87c 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -41,6 +41,8 @@ enum sgx_encl_page_desc {
 #define SGX_ENCL_PAGE_VA_OFFSET(encl_page) \
 	((encl_page)->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK)

+#define SGX_VM_EAUG BIT(3)
+
 struct sgx_encl_page {
 	unsigned long desc;
 	unsigned long vm_prot_bits;
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 3bc92c65f287..d7da732cf56e 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1819,7 +1819,7 @@ union security_list_options {
 #endif /* CONFIG_BPF_SYSCALL */

 #ifdef CONFIG_INTEL_SGX
-	int (*enclave_map)(unsigned long prot);
+	int (*enclave_map)(unsigned long prot, bool eaug);
 	int (*enclave_load)(struct vm_area_struct *vma, unsigned long prot,
 			    bool measured);
 #endif /* CONFIG_INTEL_SGX */
diff --git a/include/linux/security.h b/include/linux/security.h
index 572ddfc53039..c55e14d776c8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1831,11 +1831,11 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
 #ifdef CONFIG_INTEL_SGX
 #ifdef CONFIG_SECURITY
-int security_enclave_map(unsigned long prot);
+int security_enclave_map(unsigned long prot, bool eaug);
 int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
 			  bool measured);
 #else
-static inline int security_enclave_map(unsigned long prot)
+static inline int security_enclave_map(unsigned long prot, bool eaug)
 {
 	return 0;
 }
diff --git a/security/security.c b/security/security.c
index 00f483beb1cc..f276f05341f2 100644
--- a/security/security.c
+++ b/security/security.c
@@ -2361,9 +2361,9 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
 #endif /* CONFIG_BPF_SYSCALL */

 #ifdef CONFIG_INTEL_SGX
-int security_enclave_map(unsigned long prot)
+int security_enclave_map(unsigned long prot, bool eaug)
 {
-	return call_int_hook(enclave_map, 0, prot);
+	return call_int_hook(enclave_map, 0, prot, eaug);
 }
 int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
 			  bool measured)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 8a431168e454..f349419d4c12 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6733,7 +6733,7 @@ static inline int sgx_has_perm(u32 sid, u32 requested)
 			    SECCLASS_PROCESS2, requested, NULL);
 }

-static int selinux_enclave_map(unsigned long prot)
+static int selinux_enclave_map(unsigned long prot, bool eaug)
 {
 	const struct cred *cred = current_cred();
 	u32 sid = cred_sid(cred);
@@ -6743,6 +6743,8 @@ static int selinux_enclave_map(unsigned long prot)

 	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
 		return sgx_has_perm(sid, PROCESS2__SGX_MAPWX);
+	else if (eaug && (prot & PROT_EXEC))
+		return sgx_has_perm(sid, PROCESS2__SGX_EXECDIRTY);

 	return 0;
 }
--
2.21.0
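For readers following the SELinux hunk, the check order it ends up with can be modelled in userspace roughly as below. This is only a sketch of the lines visible in the hunk: the sketch_* enum and function are hypothetical names, not kernel symbols, and the real hook calls sgx_has_perm() with PROCESS2__SGX_MAPWX or PROCESS2__SGX_EXECDIRTY rather than returning an enum. Any code hidden between the two hunks is not modelled.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical stand-ins for "which permission would be checked". */
enum sketch_perm {
	SKETCH_PERM_NONE,	/* no SGX-specific permission is checked */
	SKETCH_PERM_MAPWX,	/* models PROCESS2__SGX_MAPWX */
	SKETCH_PERM_EXECDIRTY,	/* models PROCESS2__SGX_EXECDIRTY */
};

/*
 * Mirrors the if/else-if order in the hunk: a W+X mapping is checked
 * first, and only otherwise is an executable mapping of an EAUG'd
 * range checked.
 */
enum sketch_perm sketch_enclave_map_perm(unsigned long prot, bool eaug)
{
	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
		return SKETCH_PERM_MAPWX;
	else if (eaug && (prot & PROT_EXEC))
		return SKETCH_PERM_EXECDIRTY;

	return SKETCH_PERM_NONE;
}
```

Because of the else-if, a W+X mapping of an EAUG'd range is only checked against the MAPWX permission in this model, which matches the hunk as posted.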
On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote:
> Using per-vma refcounting to track mm_structs associated with an enclave
> requires hooking .vm_close(), which in turn prevents the mm from merging
> vmas (precisely to allow refcounting).

Why does having sgx_vma_close() prevent that? I do not understand the
problem statement.

> Avoid refcounting encl_mm altogether by registering an mmu_notifier at
> .mmap(), removing the dying encl_mm at mmu_notifier.release() and
> protecting mm_list during reclaim via a per-enclave SRCU.

Right, there is the potential collision with my changes:

1. Your patch: enclave life-cycle equals life-cycle of all processes
   that are associated with the enclave.
2. My (yet to be sent) patch: enclave life-cycle equals the life cycle.

I won't rush with my patch. I'd rather merge neither at this point, and
you can review mine after you come back from your vacation.

> Removing refcounting/vm_close() allows merging of enclave vmas, at the
> cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is
> disassociated from an enclave when the mm exits or the enclave dies, as
> opposed to when the last vma (in a given mm) is closed.
>
> The impact of delaying encl_mm removal is its memory footprint and
> whatever overhead is incurred during EPC reclaim (to walk an mm's vmas).
> Practically speaking, a stale encl_mm will exist for a meaningful amount
> of time if and only if the enclave is mapped in a long-lived process and
> then passed off to another long-lived process. It is expected that the
> vast majority of use cases will not encounter this condition, e.g. even
> using a daemon to build enclaves should not result in a stale encl_mm as
> the builder should never need to mmap() the enclave.

This paragraph speaks only about "well behaving" software.

> Even if there are scenarios that lead to defunct encl_mms, the cost is
> likely far outweighed by the benefits of reducing the number of vmas
> across all enclaves.
>
> Note, using SRCU to protect mm_list is not strictly necessary, i.e. the
> existing walker with encl_mm refcounting could be massaged to work with
> mmu_notifier.release(), but the resulting code is subtle and fragile (I
> never actually got it working). The primary issue is that an encl_mm
> can't be moved off the list until its refcount goes to zero, otherwise
> the custom walker goes off into the weeds. The refcount requirement
> then prevents using mm_list to identify if an mmu_notifier.release()
> has fired, i.e. another mechanism is needed to guard against races
> between exit_mmap() and sgx_release().

Is it really impossible to send a separate SRCU patch? I fully agree
with the SRCU part, whereas the rest of this patch is still under
debate. If you could do that, I can merge it in no time. It is a small
step in a better direction.

> Cc: Dave Hansen <dave.hansen@intel.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

Needs to be rebased because master is missing your earlier bug fix.

> ---
>  arch/x86/Kconfig                       |   2 +
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c |  14 --
>  arch/x86/kernel/cpu/sgx/driver/main.c  |  38 ++++
>  arch/x86/kernel/cpu/sgx/encl.c         | 234 +++++++++++--------------
>  arch/x86/kernel/cpu/sgx/encl.h         |  19 +-
>  arch/x86/kernel/cpu/sgx/reclaim.c      |  71 +++-----
>  6 files changed, 182 insertions(+), 196 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index a0fd17c32521..940c52762f24 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1918,6 +1918,8 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS
>  config INTEL_SGX
>  	bool "Intel SGX core functionality"
>  	depends on X86_64 && CPU_SUP_INTEL
> +	select MMU_NOTIFIER
> +	select SRCU
>  	---help---
>  	  Intel(R) SGX is a set of CPU instructions that can be used by
>  	  applications to set aside private regions of code and data, referred
> diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> index d17c60dca114..3552d642b26f 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> @@ -276,7 +276,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  {
>  	unsigned long encl_size = secs->size + PAGE_SIZE;
>  	struct sgx_epc_page *secs_epc;
> -	struct sgx_encl_mm *encl_mm;
>  	unsigned long ssaframesize;
>  	struct sgx_pageinfo pginfo;
>  	struct sgx_secinfo secinfo;
> @@ -311,12 +310,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>
>  	INIT_WORK(&encl->work, sgx_add_page_worker);
>
> -	encl_mm = sgx_encl_mm_add(encl, current->mm);
> -	if (IS_ERR(encl_mm)) {
> -		ret = PTR_ERR(encl_mm);
> -		goto err_out;
> -	}
> -
>  	secs_epc = sgx_alloc_page(&encl->secs, true);
>  	if (IS_ERR(secs_epc)) {
>  		ret = PTR_ERR(secs_epc);
> @@ -369,13 +362,6 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs)
>  		encl->backing = NULL;
>  	}
>
> -	if (!list_empty(&encl->mm_list)) {
> -		encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm,
> -					   list);
> -		list_del(&encl_mm->list);
> -		kfree(encl_mm);
> -	}
> -
>  	mutex_unlock(&encl->lock);
>  	return ret;
>  }
> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> index 0c831ee5e2de..07aa5f91b2dd 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> @@ -25,6 +25,7 @@ u32 sgx_xsave_size_tbl[64];
>  static int sgx_open(struct inode *inode, struct file *file)
>  {
>  	struct sgx_encl *encl;
> +	int ret;
>
>  	encl = kzalloc(sizeof(*encl), GFP_KERNEL);
>  	if (!encl)
> @@ -38,6 +39,12 @@ static int sgx_open(struct inode *inode, struct file *file)
>  	INIT_LIST_HEAD(&encl->mm_list);
>  	spin_lock_init(&encl->mm_lock);
>
> +	ret = init_srcu_struct(&encl->srcu);
> +	if (ret) {
> +		kfree(encl);
> +		return ret;
> +	}
> +
>  	file->private_data = encl;
>
>  	return 0;
> @@ -46,6 +53,32 @@ static int sgx_open(struct inode *inode, struct file *file)
>  static int sgx_release(struct inode *inode, struct file *file)
>  {
>  	struct sgx_encl *encl = file->private_data;
> +	struct sgx_encl_mm *encl_mm;
> +
> +	/*
> +	 * Objects can't be *moved* off an RCU protected list (deletion is ok),
> +	 * nor can the object be freed until after synchronize_srcu().
> +	 */
> +restart:
> +	spin_lock(&encl->mm_lock);
> +	if (list_empty(&encl->mm_list)) {
> +		encl_mm = NULL;
> +	} else {
> +		encl_mm = list_first_entry(&encl->mm_list, struct sgx_encl_mm,
> +					   list);
> +		list_del_rcu(&encl_mm->list);
> +	}
> +	spin_unlock(&encl->mm_lock);
> +
> +	if (encl_mm) {
> +		synchronize_srcu(&encl->srcu);
> +
> +		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
> +
> +		sgx_encl_mm_release(encl_mm);
> +
> +		goto restart;
> +	}
>
>  	kref_put(&encl->refcount, sgx_encl_release);
>
> @@ -63,6 +96,11 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
>  static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct sgx_encl *encl = file->private_data;
> +	int ret;
> +
> +	ret = sgx_encl_mm_add(encl, vma->vm_mm);
> +	if (ret)
> +		return ret;
>
>  	vma->vm_ops = &sgx_vm_ops;
>  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
> diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> index 9566eb72d417..c6436bbd4a68 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.c
> +++ b/arch/x86/kernel/cpu/sgx/encl.c
> @@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
>  	return entry;
>  }
>
> -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> -				    struct mm_struct *mm)
> +static void sgx_encl_mm_release_wq(struct work_struct *work)
> +{
> +	struct sgx_encl_mm *encl_mm =
> +		container_of(work, struct sgx_encl_mm, release_work);
> +
> +	sgx_encl_mm_release(encl_mm);
> +}
> +
> +/*
> + * Being a call_srcu() callback, this needs to be short, and sgx_encl_release()
> + * is anything but short.  Do the final freeing in yet another async callback.
> + */
> +static void sgx_encl_mm_release_delayed(struct rcu_head *rcu)

Would rename this either as *_tail() or *_deferred().

> +{
> +	struct sgx_encl_mm *encl_mm =
> +		container_of(rcu, struct sgx_encl_mm, rcu);
> +
> +	INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq);
> +	schedule_work(&encl_mm->release_work);
> +}
> +
> +static void sgx_mmu_notifier_release(struct mmu_notifier *mn,
> +				     struct mm_struct *mm)
> +{
> +	struct sgx_encl_mm *encl_mm =
> +		container_of(mn, struct sgx_encl_mm, mmu_notifier);
> +	struct sgx_encl_mm *tmp = NULL;
> +
> +	/*
> +	 * The enclave itself can remove encl_mm.  Note, objects can't be moved
> +	 * off an RCU protected list, but deletion is ok.
> +	 */
> +	spin_lock(&encl_mm->encl->mm_lock);
> +	list_for_each_entry(tmp, &encl_mm->encl->mm_list, list) {
> +		if (tmp == encl_mm) {
> +			list_del_rcu(&encl_mm->list);
> +			break;
> +		}
> +	}
> +	spin_unlock(&encl_mm->encl->mm_lock);
> +
> +	if (tmp == encl_mm) {
> +		synchronize_srcu(&encl_mm->encl->srcu);
> +
> +		/*
> +		 * Delay freeing encl_mm until after mmu_notifier releases any
> +		 * SRCU locks.  synchronize_srcu() must be called from process
> +		 * context, i.e. we can't throw mmu_notifier_unregister() in a
> +		 * work queue and be done with it.
> +		 */
> +		mmu_notifier_unregister_no_release(mn, mm);
> +		mmu_notifier_call_srcu(&encl_mm->rcu,
> +				       &sgx_encl_mm_release_delayed);
> +	}
> +}
> +
> +static const struct mmu_notifier_ops sgx_mmu_notifier_ops = {
> +	.release = sgx_mmu_notifier_release,
> +};
> +
> +static struct sgx_encl_mm *sgx_encl_find_mm(struct sgx_encl *encl,
> +					    struct mm_struct *mm)
> +{
> +	struct sgx_encl_mm *encl_mm = NULL;
> +	struct sgx_encl_mm *tmp;
> +	int idx;
> +
> +	idx = srcu_read_lock(&encl->srcu);
> +
> +	list_for_each_entry_rcu(tmp, &encl->mm_list, list) {
> +		if (tmp->mm == mm) {
> +			encl_mm = tmp;
> +			break;
> +		}
> +	}
> +
> +	srcu_read_unlock(&encl->srcu, idx);
> +
> +	return encl_mm;
> +}
> +
> +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
>  {
>  	struct sgx_encl_mm *encl_mm;
> +	int ret;
> +
> +	lockdep_assert_held_exclusive(&mm->mmap_sem);
> +
> +	/*
> +	 * mm_structs are kept on mm_list until the mm or the enclave dies,
> +	 * i.e. once an mm is off the list, it's gone for good, therefore it's
> +	 * impossible to get a false positive on @mm due to a stale mm_list.
> +	 */
> +	if (sgx_encl_find_mm(encl, mm))
> +		return 0;
>
>  	encl_mm = kzalloc(sizeof(*encl_mm), GFP_KERNEL);
>  	if (!encl_mm)
> -		return ERR_PTR(-ENOMEM);
> +		return -ENOMEM;
>
>  	encl_mm->encl = encl;
>  	encl_mm->mm = mm;
> -	kref_init(&encl_mm->refcount);
> +	encl_mm->mmu_notifier.ops = &sgx_mmu_notifier_ops;
> +
> +	ret = __mmu_notifier_register(&encl_mm->mmu_notifier, mm);
> +	if (ret) {
> +		kfree(encl_mm);
> +		return ret;
> +	}
> +
> +	kref_get(&encl->refcount);
>
>  	spin_lock(&encl->mm_lock);
> -	list_add(&encl_mm->list, &encl->mm_list);
> +	list_add_rcu(&encl_mm->list, &encl->mm_list);
>  	spin_unlock(&encl->mm_lock);
>
> -	return encl_mm;
> -}
> +	synchronize_srcu(&encl->srcu);
>
> -void sgx_encl_mm_release(struct kref *ref)
> -{
> -	struct sgx_encl_mm *encl_mm =
> -		container_of(ref, struct sgx_encl_mm, refcount);
> -
> -	spin_lock(&encl_mm->encl->mm_lock);
> -	list_del(&encl_mm->list);
> -	spin_unlock(&encl_mm->encl->mm_lock);
> -
> -	kfree(encl_mm);
> -}
> -
> -static struct sgx_encl_mm *sgx_encl_get_mm(struct sgx_encl *encl,
> -					   struct mm_struct *mm)
> -{
> -	struct sgx_encl_mm *encl_mm = NULL;
> -	struct sgx_encl_mm *prev_mm = NULL;
> -	int iter;
> -
> -	while (true) {
> -		encl_mm = sgx_encl_next_mm(encl, prev_mm, &iter);
> -		if (prev_mm)
> -			kref_put(&prev_mm->refcount, sgx_encl_mm_release);
> -		prev_mm = encl_mm;
> -
> -		if (iter == SGX_ENCL_MM_ITER_DONE)
> -			break;
> -
> -		if (iter == SGX_ENCL_MM_ITER_RESTART)
> -			continue;
> -
> -		if (mm == encl_mm->mm)
> -			return encl_mm;
> -	}
> -
> -	return NULL;
> -}
> -
> -static void sgx_vma_open(struct vm_area_struct *vma)
> -{
> -	struct sgx_encl *encl = vma->vm_private_data;
> -	struct sgx_encl_mm *encl_mm;
> -
> -	if (!encl)
> -		return;
> -
> -	if (encl->flags & SGX_ENCL_DEAD)
> -		goto error;
> -
> -	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
> -	if (!encl_mm) {
> -		encl_mm = sgx_encl_mm_add(encl, vma->vm_mm);
> -		if (IS_ERR(encl_mm))
> -			goto error;
> -	}
> -
> -	return;
> -
> -error:
> -	vma->vm_private_data = NULL;
> -}
> -
> -static void sgx_vma_close(struct vm_area_struct *vma)
> -{
> -	struct sgx_encl *encl = vma->vm_private_data;
> -	struct sgx_encl_mm *encl_mm;
> -
> -	if (!encl)
> -		return;
> -
> -	encl_mm = sgx_encl_get_mm(encl, vma->vm_mm);
> -	if (encl_mm) {
> -		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
> -
> -		/* Release kref for the VMA. */
> -		kref_put(&encl_mm->refcount, sgx_encl_mm_release);
> -	}
> +	return 0;
>  }
>
>  static unsigned int sgx_vma_fault(struct vm_fault *vmf)
> @@ -366,8 +388,6 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
>  }
>
>  const struct vm_operations_struct sgx_vm_ops = {
> -	.close = sgx_vma_close,
> -	.open = sgx_vma_open,
>  	.fault = sgx_vma_fault,
>  	.access = sgx_vma_access,
>  };
> @@ -465,7 +485,7 @@ void sgx_encl_release(struct kref *ref)
>  	if (encl->backing)
>  		fput(encl->backing);
>
> -	WARN(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");
> +	WARN_ONCE(!list_empty(&encl->mm_list), "sgx: mm_list non-empty");
>
>  	kfree(encl);
>  }
> @@ -503,46 +523,6 @@ struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index)
>  	return shmem_read_mapping_page_gfp(mapping, index, gfpmask);
>  }
>
> -/**
> - * sgx_encl_next_mm() - Iterate to the next mm
> - * @encl:	an enclave
> - * @mm:		an mm list entry
> - * @iter:	iterator status
> - *
> - * Return: the enclave mm or NULL
> - */
> -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> -				     struct sgx_encl_mm *encl_mm, int *iter)
> -{
> -	struct list_head *entry;
> -
> -	WARN(!encl, "%s: encl is NULL", __func__);
> -	WARN(!iter, "%s: iter is NULL", __func__);
> -
> -	spin_lock(&encl->mm_lock);
> -
> -	entry = encl_mm ? encl_mm->list.next : encl->mm_list.next;
> -	WARN(!entry, "%s: entry is NULL", __func__);
> -
> -	if (entry == &encl->mm_list) {
> -		spin_unlock(&encl->mm_lock);
> -		*iter = SGX_ENCL_MM_ITER_DONE;
> -		return NULL;
> -	}
> -
> -	encl_mm = list_entry(entry, struct sgx_encl_mm, list);
> -
> -	if (!kref_get_unless_zero(&encl_mm->refcount)) {
> -		spin_unlock(&encl->mm_lock);
> -		*iter = SGX_ENCL_MM_ITER_RESTART;
> -		return NULL;
> -	}
> -
> -	spin_unlock(&encl->mm_lock);
> -	*iter = SGX_ENCL_MM_ITER_NEXT;
> -	return encl_mm;
> -}
> -
>  static int sgx_encl_test_and_clear_young_cb(pte_t *ptep, pgtable_t token,
>  					    unsigned long addr, void *data)
>  {
> diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
> index c557f0374d74..0904b3c20ed0 100644
> --- a/arch/x86/kernel/cpu/sgx/encl.h
> +++ b/arch/x86/kernel/cpu/sgx/encl.h
> @@ -9,9 +9,11 @@
>  #include <linux/kref.h>
>  #include <linux/list.h>
>  #include <linux/mm_types.h>
> +#include <linux/mmu_notifier.h>
>  #include <linux/mutex.h>
>  #include <linux/notifier.h>
>  #include <linux/radix-tree.h>
> +#include <linux/srcu.h>
>  #include <linux/workqueue.h>
>
>  /**
> @@ -57,8 +59,10 @@ enum sgx_encl_flags {
>  struct sgx_encl_mm {
>  	struct sgx_encl *encl;
>  	struct mm_struct *mm;
> -	struct kref refcount;
>  	struct list_head list;
> +	struct mmu_notifier mmu_notifier;
> +	struct work_struct release_work;
> +	struct rcu_head rcu;
>  };
>
>  struct sgx_encl {
> @@ -72,6 +76,7 @@ struct sgx_encl {
>  	spinlock_t mm_lock;
>  	struct file *backing;
>  	struct kref refcount;
> +	struct srcu_struct srcu;
>  	unsigned long base;
>  	unsigned long size;
>  	unsigned long ssaframesize;
> @@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl);
>  void sgx_encl_release(struct kref *ref);
>  pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
>  struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
> -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> -				     struct sgx_encl_mm *encl_mm, int *iter);
> -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> -				    struct mm_struct *mm);
> -void sgx_encl_mm_release(struct kref *ref);
> +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> +static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm)
> +{
> +	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> +
> +	kfree(encl_mm);
> +}

Please just open code this at the two call sites. The helper makes the
code hard to follow.

Right now I did not find anything else questionable in the code
changes.

Repeating myself, but if it is by any means possible before going away,
can you construct a pure SRCU patch? I could then reconstruct my changes
on top of that, which would make evaluation of both a heck of a lot
easier.

/Jarkko
On Wed, Jun 19, 2019 at 03:23:51PM -0700, Sean Christopherson wrote:
> SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> tracked and enforced by the CPU using a base+mask approach, similar to
> how hardware range registers such as the variable MTRRs work. As a result,
> the ELRANGE must be naturally sized and aligned.
>
> To reduce boilerplate code that would be needed in every userspace
> enclave loader, the SGX driver naturally aligns the mmap() address and
> also requires the range to be naturally sized. Unfortunately, SGX fails
> to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> if userspace is attempting to map a small slice of an existing enclave.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
> arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> index 07aa5f91b2dd..29384cdd0842 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> @@ -115,7 +115,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
> unsigned long pgoff,
> unsigned long flags)
> {
> - if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
> + if (flags & MAP_PRIVATE)
> + return -EINVAL;
> +
> + if (flags & MAP_FIXED)
> + return addr;
> +
> + if (len < 2 * PAGE_SIZE || len & (len - 1))
> return -EINVAL;
Why is the last check needed, given that mmap() is no longer associated
with the layout of the enclave? I'd just wipe it away.
/Jarkko
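For reference, the order of checks in the hunk above can be modelled in userspace along these lines. It is a minimal sketch, assuming 4 KiB pages; sketch_check_area() and SKETCH_PAGE_SIZE are hypothetical names, and the final "return 0" merely stands in for the driver going on to pick a naturally aligned address, which the hunk does not show.

```c
#include <errno.h>
#include <sys/mman.h>

/* Assumption: 4 KiB pages; the kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Models the three checks in the quoted sgx_get_unmapped_area() hunk:
 * private mappings are rejected, MAP_FIXED requests are passed through
 * untouched, and everything else must be at least two pages and a
 * power of two in length.
 */
long sketch_check_area(unsigned long addr, unsigned long len,
		       unsigned long flags)
{
	if (flags & MAP_PRIVATE)
		return -EINVAL;

	if (flags & MAP_FIXED)
		return (long)addr;

	/* len & (len - 1) is non-zero unless len is a power of two. */
	if (len < 2 * SKETCH_PAGE_SIZE || (len & (len - 1)))
		return -EINVAL;

	return 0;	/* placeholder: the driver would choose an address here */
}
```

With this model, a one-page MAP_FIXED mapping at an address inside an existing enclave is accepted, while the same length without MAP_FIXED is rejected by the size check.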
On Wed, Jun 19, 2019 at 03:23:52PM -0700, Sean Christopherson wrote:
> The SGX enclave loader doesn't need an executable stack, but linkers
> will assume it does due to the lack of .note.GNU-stack sections in the
> loader's assembly code. As a result, the kernel tags the loader as
> having "read implies exec", and so adds PROT_EXEC to all mmap()s, even
> those for mapping EPC regions. This will cause problems in the future
> when userspace needs to explicitly state a page's protection bits when the
> page is added to an enclave, e.g. adding TCS pages as R+W will cause
> mmap() to fail when the kernel tacks on +X.
>
> Explicitly tell the linker that an executable stack is not needed.
> Alternatively, each .S file could add .note.GNU-stack, but the loader
> should never need an executable stack so zap it in one fell swoop.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
OK, this one is squashed now. Thanks.
/Jarkko
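The failure mode described in the commit message can be illustrated with a small userspace model. This is a sketch under stated assumptions: the sketch_* helpers are hypothetical, the "read implies exec" handling is reduced to a single boolean (the kernel's additional noexec-path exception is left out), and the maximal page protections are the ones userspace would declare when adding the page.

```c
#include <stdbool.h>
#include <sys/mman.h>

/*
 * Models the kernel tacking PROT_EXEC onto a readable mapping when the
 * process is tagged "read implies exec".
 */
unsigned long sketch_effective_prot(unsigned long prot, bool read_implies_exec)
{
	if ((prot & PROT_READ) && read_implies_exec)
		prot |= PROT_EXEC;

	return prot;
}

/*
 * A mapping is allowed only if it asks for nothing beyond the maximal
 * protections declared for the enclave page.
 */
bool sketch_map_allowed(unsigned long prot, unsigned long page_max_prot)
{
	return (prot & ~page_max_prot) == 0;
}
```

For a TCS page declared as R+W, a plain R+W mmap() passes in this model, but the same request from a loader tagged "read implies exec" becomes R+W+X and is rejected, which is the problem the linker flag avoids.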
On Fri, Jun 21, 2019 at 12:08:51AM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:51PM -0700, Sean Christopherson wrote:
> > SGX enclaves have an associated Enclave Linear Range (ELRANGE) that is
> > tracked and enforced by the CPU using a base+mask approach, similar to
> > how hardware range registers such as the variable MTRRs work. As a result,
> > the ELRANGE must be naturally sized and aligned.
> >
> > To reduce boilerplate code that would be needed in every userspace
> > enclave loader, the SGX driver naturally aligns the mmap() address and
> > also requires the range to be naturally sized. Unfortunately, SGX fails
> > to grant a waiver to the MAP_FIXED case, e.g. incorrectly rejects mmap()
> > if userspace is attempting to map a small slice of an existing enclave.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> > arch/x86/kernel/cpu/sgx/driver/main.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> > index 07aa5f91b2dd..29384cdd0842 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> > @@ -115,7 +115,13 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
> > unsigned long pgoff,
> > unsigned long flags)
> > {
> > - if (len < 2 * PAGE_SIZE || len & (len - 1) || flags & MAP_PRIVATE)
> > + if (flags & MAP_PRIVATE)
> > + return -EINVAL;
> > +
> > + if (flags & MAP_FIXED)
> > + return addr;
> > +
> > + if (len < 2 * PAGE_SIZE || len & (len - 1))
> > return -EINVAL;
>
> Why is the last check needed, given that mmap() is no longer associated
> with the layout of the enclave? I'd just wipe it away.
In any case, I squashed this one.
/Jarkko
On Wed, Jun 19, 2019 at 03:23:53PM -0700, Sean Christopherson wrote:
>  arch/x86/include/uapi/asm/sgx.h        |  6 ++--
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 15 +++++---
>  arch/x86/kernel/cpu/sgx/driver/main.c  | 49 ++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/sgx/encl.h         |  1 +
>  tools/testing/selftests/x86/sgx/main.c | 32 +++++++++++++++--

Please split the kselftest change to a separate patch.

>  5 files changed, 94 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 6dba9f282232..67a3babbb24d 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -35,15 +35,17 @@ struct sgx_enclave_create {
>   * @src:	address for the page data
>   * @secinfo:	address for the SECINFO data
>   * @mrmask:	bitmask for the measured 256 byte chunks
> + * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
>   */
>  struct sgx_enclave_add_page {
>  	__u64	addr;
>  	__u64	src;
>  	__u64	secinfo;
> -	__u64	mrmask;
> +	__u16	mrmask;
> +	__u8	prot;
> +	__u8	pad;

__u8 pad[7];

> +/*
> + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
> + * covered by the specific VMA.  A non-existent (or yet to be added) enclave
> + * page is considered to have no RWX permissions, i.e. is inaccessible.
> + */

That was a bit hard to grasp (at least for me). I would rephrase it
like:

/**
 * sgx_calc_vma_prot_intersection() - Calculate intersection of the permissions
 *				      for a VMA
 * @encl:	an enclave
 * @vma:	a VMA inside the enclave
 *
 * Iterate through the page addresses inside the VMA and calculate a bitmask
 * of permissions that all pages have in common. Page addresses that do
 * not have an associated enclave page are interpreted to zero
 * permissions.
 */

> +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> +				     struct vm_area_struct *vma)

Suggestion for the name: sgx_calc_vma_prot_intersection()

> +{
> +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> +	unsigned long idx, idx_start, idx_end;
> +	struct sgx_encl_page *page;
> +
> +	idx_start = PFN_DOWN(vma->vm_start);
> +	idx_end = PFN_DOWN(vma->vm_end - 1);

Suggestion: just open code these in the for-statement.

> +
> +	for (idx = idx_start; idx <= idx_end; ++idx) {
> +		/*
> +		 * No need to take encl->lock, vm_prot_bits is set prior to
> +		 * insertion and never changes, and racing with adding pages is
> +		 * a userspace bug.
> +		 */
> +		rcu_read_lock();
> +		page = radix_tree_lookup(&encl->page_tree, idx);
> +		rcu_read_unlock();
> +
> +		/* Do not allow R|W|X to a non-existent page. */
> +		if (!page)
> +			allowed_rwx = 0;
> +		else
> +			allowed_rwx &= page->vm_prot_bits;

This would be a cleaner way to express the same:

		if (!page)
			return 0;

		allowed_rwx &= page->vm_prot_bits;

/Jarkko
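The intersection being discussed, with the early return suggested above, can be sketched in plain C as follows. This is a userspace model only: struct sketch_page, the SKETCH_VM_* bit values and both functions are hypothetical, and an array with a "present" flag stands in for the radix tree lookup.

```c
#include <stddef.h>

/* Illustrative permission bits; the kernel uses VM_READ/VM_WRITE/VM_EXEC. */
#define SKETCH_VM_READ	0x1UL
#define SKETCH_VM_WRITE	0x2UL
#define SKETCH_VM_EXEC	0x4UL
#define SKETCH_VM_RWX	(SKETCH_VM_READ | SKETCH_VM_WRITE | SKETCH_VM_EXEC)

struct sketch_page {
	int present;			/* 0 models a page that was never added */
	unsigned long vm_prot_bits;	/* maximal protections for the page */
};

/*
 * AND together the maximal protections of every page in the range.  A
 * missing page means nothing is allowed, so bail out immediately.
 */
unsigned long sketch_prot_intersection(const struct sketch_page *pages,
				       size_t nr_pages)
{
	unsigned long allowed = SKETCH_VM_RWX;
	size_t i;

	for (i = 0; i < nr_pages; i++) {
		if (!pages[i].present)
			return 0;

		allowed &= pages[i].vm_prot_bits;
	}

	return allowed;
}

/* Mirrors the sgx_mmap() test: reject any requested bit outside "allowed". */
int sketch_mmap_allowed(unsigned long vm_flags, unsigned long allowed)
{
	return !(vm_flags & SKETCH_VM_RWX & ~allowed);
}
```

A range covering one R+W page and one R+X page intersects to R only, so a read-only mapping of that range is accepted and a writable or executable one is not.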
On Fri, Jun 21, 2019 at 04:07:53AM +0300, Jarkko Sakkinen wrote:
> /**
> * sgx_calc_vma_prot_intersection() - Calculate intersection of the permissions
> * for a VMA
> * @encl: an enclave
> * @vma: a VMA inside the enclave
> *
> * Iterate through the page addresses inside the VMA and calculate a bitmask
> * of permissions that all pages have in common. Page addresses that do
> * not have an associated enclave page are interpreted to zero
> * permissions.
> */
>
> > +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> > + struct vm_area_struct *vma)
>
> Suggestion for the name: sgx_calc_vma_prot_intersection()
And have you thought of caching these results?
I.e. hold the result for each VMA and only recalculate when the old
value is dirty. Just a random thought with zero analysis, but I thought
it was good to mention anyway.
/Jarkko
On Wed, Jun 19, 2019 at 03:23:54PM -0700, Sean Christopherson wrote:
> Do not allow an enclave page to be mapped with PROT_EXEC if the source
> vma does not have VM_MAYEXEC.  This effectively enforces noexec as
> do_mmap() clears VM_MAYEXEC if the vma is being loaded from a noexec
> path, i.e. prevents executing a file by loading it into an enclave.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 42 +++++++++++++++++++++++---
>  1 file changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> index e18d2afd2aad..1fca70a36ce3 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> @@ -564,6 +564,39 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
>  	return ret;
>  }
>
> +static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)

I will probably forget the context with this name after this has been
merged :-) So many functions dealing with enclave pages. Two
alternatives that come to mind:

1. sgx_encl_page_user_import()
2. sgx_encl_page_user_copy_from()

Not saying that they are beautiful names, but at least you immediately
know the context.

> +{
> +	struct vm_area_struct *vma;
> +	int ret;
> +
> +	/* Hold mmap_sem across copy_from_user() to avoid a TOCTOU race. */
> +	down_read(&current->mm->mmap_sem);
> +
> +	/* Query vma's VM_MAYEXEC as an indirect path_noexec() check. */
> +	if (prot & PROT_EXEC) {
> +		vma = find_vma(current->mm, src);
> +		if (!vma) {
> +			ret = -EFAULT;
> +			goto out;

Should this be -EINVAL instead?

/Jarkko
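As a rough illustration of the check described in the commit message, here is a userspace model of the decision. It is a sketch, not the patch's code: struct sketch_vma, SKETCH_VM_MAYEXEC and sketch_check_source() are hypothetical, the containment test on the source address is an assumption (the quoted snippet only shows the NULL check after find_vma()), and -EACCES for a missing VM_MAYEXEC is likewise assumed because the quoted diff is cut off before that branch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Illustrative flag value; stands in for the kernel's VM_MAYEXEC. */
#define SKETCH_VM_MAYEXEC 0x40UL

struct sketch_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	unsigned long vm_flags;
};

/*
 * Only executable enclave pages need a source vma that may be executed;
 * a noexec source would have had its "may exec" bit cleared at mmap time.
 */
int sketch_check_source(const struct sketch_vma *vma, unsigned long src,
			unsigned long prot)
{
	if (!(prot & PROT_EXEC))
		return 0;

	/* Assumption: treat "no vma covering src" as a bad address. */
	if (!vma || src < vma->start || src >= vma->end)
		return -EFAULT;

	/* Assumption: the error code for a non-executable source. */
	if (!(vma->vm_flags & SKETCH_VM_MAYEXEC))
		return -EACCES;

	return 0;
}
```

Whether the missing-vma case should report -EFAULT or -EINVAL is exactly the open question above; the sketch keeps the value from the quoted code.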
On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> [SNAP]
The reason for delaying my proposal for encl->refcount (which I will
post as an RFC and won't merge before it has been screened by you) was
that I really focused on making the most well thought out comments on
this. Generally I think that we need to get your SRCU changes in before
anything else, in a form where the patch does nothing else but that.
Without that, comparing changes dealing with the life-cycle of an
enclave is way too mind bending.
/Jarkko
On Wed, Jun 19, 2019 at 03:23:55PM -0700, Sean Christopherson wrote:
> SGX will use ->may_mprotect() to invoke an SGX variant of the existing
> file_mprotect() and mmap_file() LSM hooks.
>
> The name may_mprotect() is intended to reflect the hook's purpose as a
> way to restrict mprotect() as opposed to a wholesale replacement.
>
> Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
> VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
> MAP_SHARED. Furthermore, all enclaves need read, write and execute
> VMAs. As a result, applying W^X restrictions on /dev/sgx/enclave using
> existing LSM hooks is for all intents and purposes impossible, e.g.
> denying either W or X would deny access to *any* enclave.
>
> By hooking mprotect(), SGX can invoke an SGX specific LSM hook, which in
> turn allows LSMs to enforce W^X policies.
>
> Alternatively, SGX could provide a helper to identify enclaves given a
> vma or file. LSMs could then check if a mapping is for enclave and take
> action according.
>
> A second alternative would be to have SGX implement its own LSM hooks
> for file_mprotect() and mmap_file(), using them to "forward" the call to
> the SGX specific hook.
>
> The major con to both alternatives is that they provide zero flexibility
> for the SGX specific LSM hook. The "is_sgx_enclave()" helper doesn't
> allow SGX can't supply any additional information whatsoever, and the
> mmap_file() hook is called before the final address is known, e.g. SGX
> can't provide any information about the specific enclave being mapped.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Absolutely nothing to say about this one. We can take it as part of the
main patch set as it is. Not going to apply it though before the things
have been sorted out.
/Jarkko
On Wed, Jun 19, 2019 at 03:23:56PM -0700, Sean Christopherson wrote:
> enclave_map() is an SGX specific variant of file_mprotect() and
> mmap_file(), and is provided so that LSMs can apply W^X restrictions to
> enclaves.
>
> Due to the nature of SGX and its Enclave Page Cache (EPC), all enclave
> VMAs are backed by a single file, i.e. /dev/sgx/enclave, that must be
> MAP_SHARED. Furthermore, all enclaves need read, write and execute
> VMAs. As a result, applying W^X restrictions on /dev/sgx/enclave using
> existing LSM hooks is for all intents and purposes impossible, e.g.
> denying either W or X would deny access to any enclave.
>
> Note, extensive discussion yielded no sane alternative to some form of
> SGX specific LSM hook[1].
>
> [1] https://lkml.kernel.org/r/CALCETrXf8mSK45h7sTK5Wf+pXLVn=Bjsc_RLpgO-h-qdzBRo5Q@mail.gmail.com
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>

All the non-LSM changes are almost cleared from my part.

I would suggest that we scrape v21 together as soon as you return from
your vacation, excluding the LSM hooks. There is no particular reason to
get the LSM changes into mainline before the SGX foundations, so now is
the right time to close the things underlying them.

I'm now in the same boat with your changes to the ioctl API, which means
that we are ready to go. I feel a tiny bit bad that it took me so long
with [1], but I'm a simple-minded person, so what can I do :-)

Once you are back, please deal with the suggestions that I made and
provide a "pure" SRCU patch (apologies for repeating myself). I will
then squash them into the existing patch set. After that is fully done
we can make the v21 scope decision when it comes to the enclave
life-cycle.

Even if the LSM changes are not upstreamed as part of the foundations, I
can start holding versions of them in my tree, but only after v21 is out.

Can you cope with this plan?

[1] https://patchwork.kernel.org/patch/11005431/

/Jarkko
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> index 6dba9f282232..67a3babbb24d 100644
> --- a/arch/x86/include/uapi/asm/sgx.h
> +++ b/arch/x86/include/uapi/asm/sgx.h
> @@ -35,15 +35,17 @@ struct sgx_enclave_create {
>   * @src: address for the page data
>   * @secinfo: address for the SECINFO data
>   * @mrmask: bitmask for the measured 256 byte chunks
> + * @prot: maximal PROT_{READ,WRITE,EXEC} protections for the page
>   */
>  struct sgx_enclave_add_page {
>  	__u64 addr;
>  	__u64 src;
>  	__u64 secinfo;
> -	__u64 mrmask;
> +	__u16 mrmask;
> +	__u8 prot;
> +	__u8 pad;
>  };

Given that EPCM permissions cannot change in SGX1, these maximal PROT_* flags can be the same as the EPCM permissions, so they don't have to be specified by user code until SGX2. Given that we don't have a clear picture of how SGX2 will work yet, I think we should take "prot" off until it is proven necessary.

> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> index 29384cdd0842..dabfe2a7245a 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
>  }
>  #endif
>
> +/*
> + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
> + * covered by the specific VMA. A non-existent (or yet to be added) enclave
> + * page is considered to have no RWX permissions, i.e. is inaccessible.
> + */
> +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> +				     struct vm_area_struct *vma)
> +{
> +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> +	unsigned long idx, idx_start, idx_end;
> +	struct sgx_encl_page *page;
> +
> +	idx_start = PFN_DOWN(vma->vm_start);
> +	idx_end = PFN_DOWN(vma->vm_end - 1);
> +
> +	for (idx = idx_start; idx <= idx_end; ++idx) {
> +		/*
> +		 * No need to take encl->lock, vm_prot_bits is set prior to
> +		 * insertion and never changes, and racing with adding pages is
> +		 * a userspace bug.
> +		 */
> +		rcu_read_lock();
> +		page = radix_tree_lookup(&encl->page_tree, idx);
> +		rcu_read_unlock();

This loop iterates through every page in the range, which could be very slow if the range is large.

> +
> +		/* Do not allow R|W|X to a non-existent page. */
> +		if (!page)
> +			allowed_rwx = 0;
> +		else
> +			allowed_rwx &= page->vm_prot_bits;
> +		if (!allowed_rwx)
> +			break;
> +	}
> +
> +	return allowed_rwx;
> +}
> +
>  static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct sgx_encl *encl = file->private_data;
> +	unsigned long allowed_rwx;
>  	int ret;
>
> +	allowed_rwx = sgx_allowed_rwx(encl, vma);
> +	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> +		return -EACCES;
> +
>  	ret = sgx_encl_mm_add(encl, vma->vm_mm);
>  	if (ret)
>  		return ret;
>
> +	if (!(allowed_rwx & VM_READ))
> +		vma->vm_flags &= ~VM_MAYREAD;
> +	if (!(allowed_rwx & VM_WRITE))
> +		vma->vm_flags &= ~VM_MAYWRITE;
> +	if (!(allowed_rwx & VM_EXEC))
> +		vma->vm_flags &= ~VM_MAYEXEC;
> +

Say a range comprising a RW sub-range and a RX sub-range is being mmap()'ed as R here. It would succeed, but mprotect(<RW sub-range>, RW) afterwards will fail because VM_MAYWRITE is cleared here. However, if those two sub-ranges are mapped by separate mmap() calls then the same mprotect() would succeed. The inconsistency here is unexpected and unprecedented.

>  	vma->vm_ops = &sgx_vm_ops;
>  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
>  	vma->vm_private_data = encl;
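The mmap()/mprotect() inconsistency described above can be reproduced with a small userspace model. This is only a sketch under stated assumptions: `model_allowed_rwx()`, `model_mmap()` and `model_mprotect_ok()` are hypothetical names, the M_* bits are local stand-ins for VM_READ/VM_WRITE/VM_EXEC, and a flat array replaces the radix tree. It mirrors the quoted code in one respect only: the "may" bits of a mapping are the AND of the per-page protections over the whole mmap() range.

```c
#include <stddef.h>

/* Local stand-ins for VM_READ, VM_WRITE and VM_EXEC. */
#define M_READ  0x1UL
#define M_WRITE 0x2UL
#define M_EXEC  0x4UL
#define M_RWX   (M_READ | M_WRITE | M_EXEC)

/*
 * page_prot[i] holds the maximal protections of enclave page i; a value
 * of 0 stands for a page that has not been added yet. Returns the AND of
 * the protections of pages first..last (inclusive).
 */
static unsigned long model_allowed_rwx(const unsigned long *page_prot,
				       size_t first, size_t last)
{
	unsigned long allowed = M_RWX;
	size_t i;

	for (i = first; i <= last; i++) {
		allowed &= page_prot[i];
		if (!allowed)
			break;
	}
	return allowed;
}

/*
 * Model of mmap(): fails with -1 if prot exceeds what the range allows,
 * otherwise stores the resulting "may" bits in *may and returns 0.
 */
static int model_mmap(const unsigned long *page_prot, size_t first,
		      size_t last, unsigned long prot, unsigned long *may)
{
	unsigned long allowed = model_allowed_rwx(page_prot, first, last);

	if (prot & ~allowed)
		return -1;
	*may = allowed;
	return 0;
}

/* Model of a later mprotect(): allowed only within the mapping's may bits. */
static int model_mprotect_ok(unsigned long may, unsigned long requested)
{
	return !(requested & ~may);
}
```

For two RW pages followed by two RX pages, mapping all four as R leaves only the R may-bit, so a later RW mprotect() on the first two pages is refused. Mapping just the first two pages keeps R and W, and the same mprotect() is accepted.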
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> diff --git a/security/security.c b/security/security.c
> index 613a5c00e602..03951e08bdfc 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -2359,3 +2359,10 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
> call_void_hook(bpf_prog_free_security, aux);
> }
> #endif /* CONFIG_BPF_SYSCALL */
> +
> +#ifdef CONFIG_INTEL_SGX
> +int security_enclave_map(unsigned long prot)
> +{
> + return call_int_hook(enclave_map, 0, prot);
> +}
> +#endif /* CONFIG_INTEL_SGX */
Why is this new security_enclave_map() necessary while security_mmap_file() will also be invoked?
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 6a1f54ba6794..572ddfc53039 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1832,11 +1832,18 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
> #ifdef CONFIG_INTEL_SGX
> #ifdef CONFIG_SECURITY
> int security_enclave_map(unsigned long prot);
> +int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
> + bool measured);
> #else
> static inline int security_enclave_map(unsigned long prot)
> {
> return 0;
> }
> +static inline int security_enclave_load(struct vm_area_struct *vma,
> + unsigned long prot, bool measured)
> +{
> + return 0;
> +}
> #endif /* CONFIG_SECURITY */
> #endif /* CONFIG_INTEL_SGX */
The parameters to security_enclave_load() specify what is being loaded, but not which enclave it is being loaded into. That kills the possibility of an LSM module making enclave-dependent decisions.
Btw, if the enclave (in the form of a struct file) is also passed in as a parameter, it would let the LSM know that the file is an enclave, so it would be able to make the same decision in security_mmap_file() as in security_enclave_map(). In other words, you wouldn't need security_enclave_map().
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702cf46ca..fc239e541b62 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> }
> #endif
>
> +#ifdef CONFIG_INTEL_SGX
> +static int selinux_enclave_map(unsigned long prot)
> +{
> + const struct cred *cred = current_cred();
> + u32 sid = cred_sid(cred);
> +
> + /* SGX is supported only in 64-bit kernels. */
> + WARN_ON_ONCE(!default_noexec);
> +
> + if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
> + return avc_has_perm(&selinux_state, sid, sid,
> + SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX,
> + NULL);
Why is SGX_MAPWX process wide rather than enclave specific?
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> index 4379a2fb1f82..b478c0f45279 100644
> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> @@ -99,7 +99,8 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
>   * page is considered to have no RWX permissions, i.e. is inaccessible.
>   */
>  static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> -				     struct vm_area_struct *vma)
> +				     struct vm_area_struct *vma,
> +				     bool *eaug)
>  {
>  	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
>  	unsigned long idx, idx_start, idx_end;
> @@ -123,6 +124,8 @@ static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
>  			allowed_rwx = 0;
>  		else
>  			allowed_rwx &= page->vm_prot_bits;
> +		if (page->vm_prot_bits & SGX_VM_EAUG)
> +			*eaug = true;
>  		if (!allowed_rwx)
>  			break;
>  	}
> @@ -134,16 +137,17 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
>  {
>  	struct sgx_encl *encl = file->private_data;
>  	unsigned long allowed_rwx, prot;
> +	bool eaug = false;
>  	int ret;
>
> -	allowed_rwx = sgx_allowed_rwx(encl, vma);
> +	allowed_rwx = sgx_allowed_rwx(encl, vma, &eaug);
>  	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
>  		return -EACCES;

IIUC, "eaug range" has to be mapped PROT_NONE, then vm_ops->fault() won't be invoked. Am I correct? Then how to EAUG on #PF?

>
>  	prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
>  	       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
>  	       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
> -	ret = security_enclave_map(prot);
> +	ret = security_enclave_map(prot, eaug);
>  	if (ret)
>  		return ret;
>
> From: Christopherson, Sean J
> Sent: Wednesday, June 19, 2019 3:24 PM
>
> Intended use of each permission:
>
>   - SGX_EXECDIRTY: dynamically load code within the enclave itself
>   - SGX_EXECUNMR: load unmeasured code into the enclave, e.g. Graphene

Why does it matter whether a code page is measured or not?

>   - SGX_EXECANON: load code from anonymous memory (likely Graphene)

Graphene doesn't load code from anonymous memory. It loads code dynamically though, as in SGX_EXECDIRTY case.

>   - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior

Why is SGX_EXECUTE needed from security perspective? Or why isn't FILE__EXECUTE sufficient?
On 6/19/19 6:23 PM, Sean Christopherson wrote:
> Hook enclave_map() to require a new per-process capability, SGX_MAPWX,
> when mapping an enclave as simultaneously writable and executable.
> Note, @prot contains the actual protection bits that will be set by the
> kernel, not the maximal protection bits specified by userspace when the
> page was first loaded into the enclave.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>   security/selinux/hooks.c            | 21 +++++++++++++++++++++
>   security/selinux/include/classmap.h |  3 ++-
>   2 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 3ec702cf46ca..fc239e541b62 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
>   }
>   #endif
>
> +#ifdef CONFIG_INTEL_SGX
> +static int selinux_enclave_map(unsigned long prot)
> +{
> +	const struct cred *cred = current_cred();
> +	u32 sid = cred_sid(cred);
> +
> +	/* SGX is supported only in 64-bit kernels. */
> +	WARN_ON_ONCE(!default_noexec);
> +
> +	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
> +		return avc_has_perm(&selinux_state, sid, sid,
> +				    SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX,
> +				    NULL);

Possibly we should use a slightly more general name for the permission
to allow reusing it in the future if/when another architecture
introduces a similar construct under a different branding? ENCLAVE_*
seems slightly more generic than SGX_*.

I was interested in testing this code but sadly the driver reports the
following on my development workstation:

[    1.644191] sgx: The launch control MSRs are not writable
[    1.695477] sgx: EPC section 0x70200000-0x75f7ffff
[    1.771760] sgx: The public key MSRs are not writable

I guess I'm out of luck until/unless I get a NUC or server class
hardware that supports flexible launch control? Seems developer
unfriendly.

> +	return 0;
> +}
> +#endif
> +
>   struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
>   	.lbs_cred = sizeof(struct task_security_struct),
>   	.lbs_file = sizeof(struct file_security_struct),
> @@ -6968,6 +6985,10 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>   	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
>   	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
>   #endif
> +
> +#ifdef CONFIG_INTEL_SGX
> +	LSM_HOOK_INIT(enclave_map, selinux_enclave_map),
> +#endif
>   };
>
>   static __init int selinux_init(void)
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index 201f7e588a29..cfd91e879bdf 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -51,7 +51,8 @@ struct security_class_mapping secclass_map[] = {
>   	    "execmem", "execstack", "execheap", "setkeycreate",
>   	    "setsockcreate", "getrlimit", NULL } },
>   	{ "process2",
> -	  { "nnp_transition", "nosuid_transition", NULL } },
> +	  { "nnp_transition", "nosuid_transition",
> +	    "sgx_mapwx", NULL } },
>   	{ "system",
>   	  { "ipc_info", "syslog_read", "syslog_mod",
>   	    "syslog_console", "module_request", "module_load", NULL } },
>
On 6/19/19 6:23 PM, Sean Christopherson wrote:
> The goal of selinux_enclave_load() is to provide a facsimile of the
> existing selinux_file_mprotect() and file_map_prot_check() policies,
> but tailored to the unique properties of SGX.
>
> For example, an enclave page is technically backed by a MAP_SHARED file,
> but the "file" is essentially shared memory that is never persisted
> anywhere and also requires execute permissions (for some pages).
>
> Enclaves are also less priveleged than normal user code, e.g. SYSCALL
> instructions #UD if attempted in an enclave. For this reason, add SGX
> specific permissions instead of reusing existing permissions such as
> FILE__EXECUTE so that policies can allow running code in an enclave, or
> allow dynamically loading code in an enclave without having to grant the
> same capability to normal user code outside of the enclave.
>
> Intended use of each permission:
>
>   - SGX_EXECDIRTY: dynamically load code within the enclave itself
>   - SGX_EXECUNMR: load unmeasured code into the enclave, e.g. Graphene
>   - SGX_EXECANON: load code from anonymous memory (likely Graphene)
>   - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior
>
> Note, equivalents to FILE__READ and FILE__WRITE are intentionally never
> required. Writes to the enclave page are contained to the EPC, i.e.
> never hit the original file, and read permissions have already been
> vetted (or the VMA doesn't have PROT_READ, in which case loading the
> page into the enclave will fail).
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> ---
>   security/selinux/hooks.c            | 55 +++++++++++++++++++++++++++--
>   security/selinux/include/classmap.h |  5 +--
>   2 files changed, 55 insertions(+), 5 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index fc239e541b62..8a431168e454 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -6727,6 +6727,12 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
>   #endif
>
>   #ifdef CONFIG_INTEL_SGX
> +static inline int sgx_has_perm(u32 sid, u32 requested)
> +{
> +	return avc_has_perm(&selinux_state, sid, sid,
> +			    SECCLASS_PROCESS2, requested, NULL);
> +}
> +
>   static int selinux_enclave_map(unsigned long prot)
>   {
>   	const struct cred *cred = current_cred();
> @@ -6736,11 +6742,53 @@ static int selinux_enclave_map(unsigned long prot)
>   	WARN_ON_ONCE(!default_noexec);
>
>   	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
> -		return avc_has_perm(&selinux_state, sid, sid,
> -				    SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX,
> -				    NULL);
> +		return sgx_has_perm(sid, PROCESS2__SGX_MAPWX);
> +
>   	return 0;
>   }
> +
> +static int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot,
> +				bool measured)
> +{
> +	const struct cred *cred = current_cred();
> +	u32 sid = cred_sid(cred);
> +	int ret;
> +
> +	/* SGX is supported only in 64-bit kernels. */
> +	WARN_ON_ONCE(!default_noexec);
> +
> +	/* Only executable enclave pages are restricted in any way. */
> +	if (!(prot & PROT_EXEC))
> +		return 0;
> +
> +	/*
> +	 * WX at load time only requires EXECDIRTY, e.g. to allow W->X. Actual
> +	 * WX mappings require MAPWX (see selinux_enclave_map()).
> +	 */
> +	if (prot & PROT_WRITE) {
> +		ret = sgx_has_perm(sid, PROCESS2__SGX_EXECDIRTY);
> +		if (ret)
> +			goto out;
> +	}
> +	if (!measured) {
> +		ret = sgx_has_perm(sid, PROCESS2__SGX_EXECUNMR);
> +		if (ret)
> +			goto out;
> +	}
> +
> +	if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file)) ||
> +	    vma->anon_vma)
> +		/*
> +		 * Loading enclave code from an anonymous mapping or from a
> +		 * modified private file mapping.
> +		 */
> +		ret = sgx_has_perm(sid, PROCESS2__SGX_EXECANON);
> +	else
> +		/* Loading from a shared or unmodified private file mapping. */
> +		ret = file_has_perm(cred, vma->vm_file, FILE__SGX_EXECUTE);
> +out:
> +	return ret;
> +}

Same comment on this patch: we might want to generalize the permission
names in hopes of being able to reuse them in the future for similar
constructs on other architectures. SGX -> ENCLAVE throughout?

>   #endif
>
>   struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
> @@ -6988,6 +7036,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
>
>   #ifdef CONFIG_INTEL_SGX
>   	LSM_HOOK_INIT(enclave_map, selinux_enclave_map),
> +	LSM_HOOK_INIT(enclave_load, selinux_enclave_load),
>   #endif
>   };
>
> diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
> index cfd91e879bdf..baa1757be46a 100644
> --- a/security/selinux/include/classmap.h
> +++ b/security/selinux/include/classmap.h
> @@ -7,7 +7,7 @@
>
>   #define COMMON_FILE_PERMS COMMON_FILE_SOCK_PERMS, "unlink", "link", \
>       "rename", "execute", "quotaon", "mounton", "audit_access", \
> -    "open", "execmod"
> +    "open", "execmod", "sgx_execute"
>
>   #define COMMON_SOCK_PERMS COMMON_FILE_SOCK_PERMS, "bind", "connect", \
>       "listen", "accept", "getopt", "setopt", "shutdown", "recvfrom",  \
> @@ -52,7 +52,8 @@ struct security_class_mapping secclass_map[] = {
>   	    "setsockcreate", "getrlimit", NULL } },
>   	{ "process2",
>   	  { "nnp_transition", "nosuid_transition",
> -	    "sgx_mapwx", NULL } },
> +	    "sgx_mapwx", "sgx_execdirty", "sgx_execanon", "sgx_execunmr",
> +	    NULL } },
>   	{ "system",
>   	  { "ipc_info", "syslog_read", "syslog_mod",
>   	    "syslog_console", "module_request", "module_load", NULL } },
>
On 6/21/19 12:54 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Wednesday, June 19, 2019 3:24 PM
>>
>> diff --git a/security/security.c b/security/security.c
>> index 613a5c00e602..03951e08bdfc 100644
>> --- a/security/security.c
>> +++ b/security/security.c
>> @@ -2359,3 +2359,10 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
>> call_void_hook(bpf_prog_free_security, aux);
>> }
>> #endif /* CONFIG_BPF_SYSCALL */
>> +
>> +#ifdef CONFIG_INTEL_SGX
>> +int security_enclave_map(unsigned long prot)
>> +{
>> + return call_int_hook(enclave_map, 0, prot);
>> +}
>> +#endif /* CONFIG_INTEL_SGX */
>
> Why is this new security_enclave_map() necessary while security_mmap_file() will also be invoked?
security_mmap_file() doesn't know about enclaves. It will just end up
checking FILE__READ, FILE__WRITE, and FILE__EXECUTE to /dev/sgx/enclave.
This was noted in the patch description.
On 6/21/19 1:05 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Wednesday, June 19, 2019 3:24 PM
>>
>> diff --git a/include/linux/security.h b/include/linux/security.h
>> index 6a1f54ba6794..572ddfc53039 100644
>> --- a/include/linux/security.h
>> +++ b/include/linux/security.h
>> @@ -1832,11 +1832,18 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
>> #ifdef CONFIG_INTEL_SGX
>> #ifdef CONFIG_SECURITY
>> int security_enclave_map(unsigned long prot);
>> +int security_enclave_load(struct vm_area_struct *vma, unsigned long prot,
>> + bool measured);
>> #else
>> static inline int security_enclave_map(unsigned long prot)
>> {
>> return 0;
>> }
>> +static inline int security_enclave_load(struct vm_area_struct *vma,
>> + unsigned long prot, bool measured)
>> +{
>> + return 0;
>> +}
>> #endif /* CONFIG_SECURITY */
>> #endif /* CONFIG_INTEL_SGX */
>
> Parameters to security_enclave_load() are specific on what's being loading only, but unspecific on which enclave to be loaded into. That kills the possibility of an LSM module making enclave dependent decisions.
>
> Btw, if enclave (in the form of struct file) is also passed in as a parameter, it'd let LSM know that file is an enclave, hence would be able to make the same decision in security_mmap_file() as in security_enclave_map(). In other words, you wouldn't need security_enclave_map().
Sorry, you want security_enclave_load() to stash a reference to the
enclave file in some security module-internal state, then match it upon
later security_mmap_file() calls to determine that it is dealing with an
enclave, and then adjust its logic accordingly? When do we release that
reference?
On 6/21/19 1:09 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Wednesday, June 19, 2019 3:24 PM
>>
>> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
>> index 3ec702cf46ca..fc239e541b62 100644
>> --- a/security/selinux/hooks.c
>> +++ b/security/selinux/hooks.c
>> @@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
>> }
>> #endif
>>
>> +#ifdef CONFIG_INTEL_SGX
>> +static int selinux_enclave_map(unsigned long prot)
>> +{
>> + const struct cred *cred = current_cred();
>> + u32 sid = cred_sid(cred);
>> +
>> + /* SGX is supported only in 64-bit kernels. */
>> + WARN_ON_ONCE(!default_noexec);
>> +
>> + if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
>> + return avc_has_perm(&selinux_state, sid, sid,
>> + SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX,
>> + NULL);
>
> Why isn't SGX_MAPWX enclave specific but process wide?
How would you tie it to a specific enclave? What's the object/target
SID? The SID of the enclave inode? Which one? The source vma file,
the /dev/sgx/enclave open instance, the sigstruct file, ...? If a
process can map one enclave WX, what's the benefit of preventing it from
doing likewise for any other enclave it can load?
On 6/21/19 5:22 PM, Xing, Cedric wrote:
>> From: Christopherson, Sean J
>> Sent: Wednesday, June 19, 2019 3:24 PM
>>
>> Intended use of each permission:
>>
>>    - SGX_EXECDIRTY: dynamically load code within the enclave itself
>>    - SGX_EXECUNMR: load unmeasured code into the enclave, e.g. Graphene
>
> Why does it matter whether a code page is measured or not?

It won't be incorporated into an attestation?

>
>>    - SGX_EXECANON: load code from anonymous memory (likely Graphene)
>
> Graphene doesn't load code from anonymous memory. It loads code dynamically though, as in SGX_EXECDIRTY case.

So do we expect EXECANON to never be triggered at all?

>
>>    - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior
>
> Why is SGX_EXECUTE needed from security perspective? Or why isn't FILE__EXECUTE sufficient?

Splitting the SGX permissions from the regular ones allows distinctions
to be made between what can be executed in the host process and what can
be executed in the enclave. The host process may be allowed
FILE__EXECUTE to numerous files that do not contain any code ever
intended to be executed within the enclave.
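To make the discussion of the four permissions easier to follow, the decision logic of the selinux_enclave_load() hunk quoted earlier in the thread can be condensed into a userspace model. This is a sketch, not the SELinux code: `struct model_source`, `enum model_perm` and `model_enclave_load_perms()` are hypothetical, the three booleans stand in for the vma tests in the patch, and the function returns the full set of permissions that would be checked, whereas the patch checks them one by one and stops at the first denial.

```c
#include <stdbool.h>

/* Local stand-ins for PROT_WRITE and PROT_EXEC. */
#define M_PROT_WRITE 0x2UL
#define M_PROT_EXEC  0x4UL

/* One hypothetical bit per permission named in the patch. */
enum model_perm {
	MODEL_EXECDIRTY = 1 << 0,
	MODEL_EXECUNMR  = 1 << 1,
	MODEL_EXECANON  = 1 << 2,
	MODEL_EXECUTE   = 1 << 3,	/* checked against the source file */
};

/* Stand-ins for the vma properties tested by the patch. */
struct model_source {
	bool has_file;		/* vma->vm_file != NULL */
	bool private_inode;	/* IS_PRIVATE(file_inode(vma->vm_file)) */
	bool has_anon_vma;	/* vma->anon_vma != NULL */
};

/* Returns the set of permissions needed to load a page with prot. */
static unsigned int model_enclave_load_perms(const struct model_source *src,
					     unsigned long prot, bool measured)
{
	unsigned int need = 0;

	/* Only executable enclave pages are restricted in any way. */
	if (!(prot & M_PROT_EXEC))
		return 0;

	/* W+X at load time needs EXECDIRTY, e.g. to allow W->X later. */
	if (prot & M_PROT_WRITE)
		need |= MODEL_EXECDIRTY;

	/* Unmeasured code needs EXECUNMR. */
	if (!measured)
		need |= MODEL_EXECUNMR;

	/* Anonymous or modified private source, else a file-backed source. */
	if (!src->has_file || src->private_inode || src->has_anon_vma)
		need |= MODEL_EXECANON;
	else
		need |= MODEL_EXECUTE;

	return need;
}
```

In this model a measured, read-execute page from a regular file needs only the file-level execute permission, while an unmeasured write-execute page from anonymous memory needs EXECDIRTY, EXECUNMR and EXECANON together.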
On 6/25/19 5:01 PM, Stephen Smalley wrote:
> On 6/21/19 1:05 PM, Xing, Cedric wrote:
>>> From: Christopherson, Sean J
>>> Sent: Wednesday, June 19, 2019 3:24 PM
>>>
>>> diff --git a/include/linux/security.h b/include/linux/security.h
>>> index 6a1f54ba6794..572ddfc53039 100644
>>> --- a/include/linux/security.h
>>> +++ b/include/linux/security.h
>>> @@ -1832,11 +1832,18 @@ static inline void
>>> security_bpf_prog_free(struct bpf_prog_aux *aux)
>>> #ifdef CONFIG_INTEL_SGX
>>> #ifdef CONFIG_SECURITY
>>> int security_enclave_map(unsigned long prot);
>>> +int security_enclave_load(struct vm_area_struct *vma, unsigned long
>>> prot,
>>> + bool measured);
>>> #else
>>> static inline int security_enclave_map(unsigned long prot)
>>> {
>>> return 0;
>>> }
>>> +static inline int security_enclave_load(struct vm_area_struct *vma,
>>> + unsigned long prot, bool measured)
>>> +{
>>> + return 0;
>>> +}
>>> #endif /* CONFIG_SECURITY */
>>> #endif /* CONFIG_INTEL_SGX */
>>
>> Parameters to security_enclave_load() are specific on what's being
>> loading only, but unspecific on which enclave to be loaded into. That
>> kills the possibility of an LSM module making enclave dependent
>> decisions.
>>
>> Btw, if enclave (in the form of struct file) is also passed in as a
>> parameter, it'd let LSM know that file is an enclave, hence would be
>> able to make the same decision in security_mmap_file() as in
>> security_enclave_map(). In other words, you wouldn't need
>> security_enclave_map().
>
> Sorry, you want security_enclave_load() to stash a reference to the
> enclave file in some security module-internal state, then match it upon
> later security_mmap_file() calls to determine that it is dealing with an
> enclave, and then adjust its logic accordingly? When do we release that
> reference?
I guess you mean set a flag in the enclave file security struct upon
security_enclave_load() and check that flag in security_mmap_file().
This seems somewhat similar to one of Sean's alternatives in the patch
description for 06/12, except by pushing the information from sgx to LSM
upon security_enclave_load() rather than pulling it via a
is_sgx_enclave() helper. Not clear if it is still subject to the same
limitations.
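A rough userspace sketch of the "set a flag on load, check it on mmap" idea follows, purely to make the control flow concrete. Everything here is hypothetical: `struct model_file_sec` stands in for a per-file security blob, `struct model_policy` for whatever the policy would decide, and the two functions for the enclave_load and mmap_file hooks. It says nothing about whether the approach escapes the limitations mentioned above.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for PROT_WRITE and PROT_EXEC. */
#define M_PROT_WRITE 0x2UL
#define M_PROT_EXEC  0x4UL

/* Hypothetical per-file LSM state. */
struct model_file_sec {
	bool is_enclave;	/* set when a page is loaded into the file */
};

/* Hypothetical policy decision: may this process map enclaves W+X? */
struct model_policy {
	bool allow_enclave_wx;
};

/* Model of the enclave_load hook: remember that the file is an enclave. */
static int model_enclave_load(struct model_file_sec *fsec)
{
	fsec->is_enclave = true;
	return 0;
}

/* Model of the mmap_file hook: apply the enclave W^X rule when flagged. */
static int model_mmap_file(const struct model_file_sec *fsec,
			   const struct model_policy *pol, unsigned long prot)
{
	/* Not flagged: the regular file checks would run here instead. */
	if (!fsec->is_enclave)
		return 0;

	if ((prot & M_PROT_WRITE) && (prot & M_PROT_EXEC) &&
	    !pol->allow_enclave_wx)
		return -EACCES;

	return 0;
}
```

One property the model makes visible: a file that is mapped before its first page is loaded has not been flagged yet, so the enclave rule is not applied to that mapping.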
On Tue, Jun 25, 2019 at 04:19:35PM -0400, Stephen Smalley wrote: Good morning, I hope the week is going well for everyone. > On 6/19/19 6:23 PM, Sean Christopherson wrote: > >Hook enclave_map() to require a new per-process capability, SGX_MAPWX, > >when mapping an enclave as simultaneously writable and executable. > >Note, @prot contains the actual protection bits that will be set by the > >kernel, not the maximal protection bits specified by userspace when the > >page was first loaded into the enclave. > > > >Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > >--- > > security/selinux/hooks.c | 21 +++++++++++++++++++++ > > security/selinux/include/classmap.h | 3 ++- > > 2 files changed, 23 insertions(+), 1 deletion(-) > > > >diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c > >index 3ec702cf46ca..fc239e541b62 100644 > >--- a/security/selinux/hooks.c > >+++ b/security/selinux/hooks.c > >@@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct > >bpf_prog_aux *aux) > > } > > #endif > > > >+#ifdef CONFIG_INTEL_SGX > >+static int selinux_enclave_map(unsigned long prot) > >+{ > >+ const struct cred *cred = current_cred(); > >+ u32 sid = cred_sid(cred); > >+ > >+ /* SGX is supported only in 64-bit kernels. */ > >+ WARN_ON_ONCE(!default_noexec); > >+ > >+ if ((prot & PROT_EXEC) && (prot & PROT_WRITE)) > >+ return avc_has_perm(&selinux_state, sid, sid, > >+ SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX, > >+ NULL); > Possibly we should use a slightly more general name for the > permission to allow reusing it in the future if/when another > architecture introduces a similar construct under a different > branding? ENCLAVE_* seems slightly more generic than SGX_*. Perhaps TEE_*, since it is generic and expresses the notion of privileges specific to an alternate execution environment. 
> I was interested in testing this code but sadly the driver reports > the following on my development workstation: > > [ 1.644191] sgx: The launch control MSRs are not writable > [ 1.695477] sgx: EPC section 0x70200000-0x75f7ffff > [ 1.771760] sgx: The public key MSRs are not writable > > I guess I'm out of luck until/unless I get a NUC or server class > hardware that supports flexible launch control? Seems developer > unfriendly. Indeed. Most importantly, it is decidedly unfriendly to the future of the technology on Linux. More problematically, from a development perspective, the driver is incompatible with the current Intel runtime, which makes testing at a level beyond the one page test harness that is included with the patchset impossible. As I noted previously, before the LSM discussion, we have a patch that addresses the compatibility, security and launch control issues the original version of the driver had. If you missed the thread, it is available from the following URL: ftp://ftp.idfusion.net/pub/idfusion/jarkko-master-SFLC.patch It will be a bit dated by now and doesn't address the API change needed to set page permissions. It is a pretty solid starting point if you want to use the existing runtime to do more then trivial testing. We have an extension to the existing driver that we will be releasing, so users of our SRDE will be able to use both the out-of-tree and in-tree drivers. It also re-establishes launch control and provides a very simplistic interface to implement ring-0 security for launch control on flexible launch control platforms. I'm in Israel right now but we should have a GIT tree against the current development branches by the weekend. We will be testing the driver with our SRDE against real world enclaves as we advance the driver forward. Have a good day. Dr. Greg As always, Dr. Greg Wettstein, Ph.D, Worker IDfusion, LLC 4206 N. 19th Ave. Implementing measured information privacy Fargo, ND 58102 and integrity architectures. 
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@idfusion.net
------------------------------------------------------------------------------
"This place is so screwed up.  It's just like the Titanic, only we
 don't even have a band playing."
                                -- Terrance George Wieland
                                   Resurrection.
This series intends to make the new SGX subsystem work smoothly with
the existing LSM architecture so that, say, SGX cannot be abused to
work around restrictions set forth by LSM modules/policies.

This series is based on, and applies cleanly on top of, Jarkko
Sakkinen’s SGX patch series v20
(https://patchwork.kernel.org/cover/10905141/).

For those who haven’t followed closely, the whole discussion started
from the primary question of how to prevent the creation of an
executable enclave page from a regular memory page that LSM
modules/policies do NOT allow to be executable. In practice that
translates into two related questions: 1) how to determine the allowed
initial protections of enclave pages when they are being loaded, and
2) how to determine the allowed protections of enclave pages at
runtime.

Those who are familiar with LSM may notice that, for regular files, #1
is determined by security_mmap_file() while #2 is covered by
security_file_mprotect(). Those two hooks, however, are insufficient
for enclaves due to the distinct composition and lifespan of enclave
pages. Specifically, security_mmap_file() only passes in the file and
does not specify which portion of the file is being mmap()’ed, on the
assumption that all pages of the same file share the same set of
allowed/disallowed protections. That assumption no longer holds for
enclaves, for two reasons: a) pages of an enclave may be loaded from
different image files with different attributes, and b) enclave pages
retain their contents across munmap()/mmap(). Because of b), if a
policy prohibits execution of modified pages, then pages flagged
modified have to stay modified across munmap()/mmap() so that the
policy cannot be circumvented by remapping (i.e. munmap() followed by
mmap() on the same range). The lack of range information in
security_mmap_file()’s arguments prevents LSM modules from tracking
enclave pages properly.
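To make the remapping concern concrete, here is a minimal userspace
sketch. It is not kernel code and is not taken from this series: the
names model_encl_page, model_map() and model_unmap() are hypothetical,
and the policy shown (a page that has ever been mapped writable may not
become executable) is only an assumed example of what an LSM might
enforce. The point it illustrates is that the "modified" state has to
live with the enclave page rather than with the mapping, otherwise
munmap() followed by mmap() would reset it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical protection bits for this model only. */
#define MODEL_PROT_READ  0x1u
#define MODEL_PROT_WRITE 0x2u
#define MODEL_PROT_EXEC  0x4u

/*
 * Hypothetical per-page state. It belongs to the enclave, so it
 * outlives any single mapping of the page.
 */
struct model_encl_page {
	bool modified;            /* set once the page was ever mapped writable */
	unsigned int mapped_prot; /* protections of the current mapping, 0 if unmapped */
};

/* Assumed example policy: a modified page must never become executable. */
static int model_map(struct model_encl_page *pg, unsigned int prot)
{
	assert(pg != NULL);

	if ((prot & MODEL_PROT_EXEC) &&
	    (pg->modified || (prot & MODEL_PROT_WRITE)))
		return -EACCES;

	if (prot & MODEL_PROT_WRITE)
		pg->modified = true;
	pg->mapped_prot = prot;
	return 0;
}

/* Unmapping drops the mapping but deliberately keeps the modified flag. */
static void model_unmap(struct model_encl_page *pg)
{
	assert(pg != NULL);
	pg->mapped_prot = 0;
}
```

In this model, mapping a page RW, unmapping it, and then mapping it RX
fails with -EACCES, because model_unmap() leaves the modified flag set.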
A rational solution would always involve tracking the correspondence
between enclave pages and their origin (e.g. the files from which they
were loaded), which is similar to tracking regular memory pages and
their origin via vm_file of struct vm_area_struct. But given that
enclave pages outlive the VMAs they are mapped into, such
correspondence has to be stored in a separate data structure outside
of VMAs. In theory, the correspondence could be stored either in LSM
or in the SGX subsystem. This series picks the former for three
reasons. First, such information is useful only within LSM, so it
makes more sense to keep it “LSM internal”. Second, keeping the data
structure inside LSM allows additional information to be cached in LSM
modules without affecting the rest of the kernel. Third, those data
structures are gone when LSM is disabled and hence impose no
unnecessary overhead.

As you can see in the SELinux implementation of the new hooks enclosed
in this series, system administrators are offered a choice (via a
kernel command line parameter) between accurate auditing (decisions
made at the time of request) and low memory/performance overhead
(decisions made/cached ahead of time at page instantiation). That is
one of the benefits of keeping everything inside LSM.

Those who are familiar with this topic and the related discussions may
also notice that Sean Christopherson recently sent out an RFC patch to
address the same problem as this series. He adopted the other approach
of tracking page/origin correspondence inside the SGX subsystem.
However, to reduce memory overhead in practice, he cached the FSM
(Finite State Machine) instead of page/origin correspondences. By
“FSM”, I mean the policy FSM, defined as sets of states and events
that may trigger state transitions.
Generally speaking, every LSM module has its own definition of the FSM
and usually uses attributes attached to files as arguments to the FSM;
it then advances the FSM as events are observed and gives out
decisions based on the current FSM state. Sean’s implementation
attempts to move the FSM into the SGX subsystem and, by caching the
arguments returned by LSM, tries to monitor events and reach the same
decisions by itself. From an architecture perspective, that model
faces tough challenges in practice, such as how to support multiple
LSM modules that employ different FSMs to govern page protection
transitions. Implementation wise, his model also imposes unwanted
restrictions, specifically on SGX2, such as:

- Complicated/Restricted UAPI – Enclave loaders are required to
  provide “maximal protection” at page load time, but such information
  is NOT always available. For example, Graphene containers may run
  different applications composed of different sets of executables
  and/or shared objects. Some of them may contain self-modifying code
  (or text relocation) while others don’t. The generic enclave loader
  usually doesn’t have such information, so it wouldn’t be able to
  provide it ahead of time.
- Inefficient Auditing – Audit logs are supposed to help system
  administrators determine the set of minimally needed permissions and
  detect abnormal behaviors. But under the “maximal protection” model,
  if “maximal protection” is set too permissively, the audit log
  cannot detect anomalies; if it is set too restrictively, the audit
  log cannot identify the file violating the policy. In either case
  the audit log cannot fulfill its purpose.
- Inability to support #PF driven EPC allocation in SGX2 – For those
  unfamiliar with SGX2 software flows, an SGX2 enclave requests a page
  by issuing EACCEPT on the address where a new page is wanted. The
  resulting #PF is expected to be handled by the kernel by EAUG’ing an
  EPC page at the fault address; the enclave is then resumed and the
  faulting EACCEPT is retried and succeeds. The key requirement is to
  allow mmap()’ing non-existing enclave pages so that the SGX
  module/subsystem can respond to #PFs by EAUG’ing new pages. Sean’s
  implementation doesn’t allow mmap()’ing non-existing pages for a
  variety of reasons and therefore blocks this major SGX2 usage.

History:
- This is version 2 of this patch series, with the following changes
  per comments/requests from the community:
  + A new data structure – EMA (Enclave Memory Area) – is introduced
    to track range/origin correspondences for enclaves. EMAs are
    maintained by the LSM framework and shared among all LSM modules.
    EMAs are allocated for enclave files only, so they impose no
    overhead on regular applications/files.
  + Improved auditing – A new kernel command line option
    “lsm.ema.cache_decisions” is introduced. If on, it causes LSM
    modules to make/cache decisions at page instantiation (i.e. the
    enclave_load() hook) instead of at time of request (i.e. the
    file_mprotect() hook), in order to save memory by NOT keeping
    enclave source files open. System administrators are expected to
    run LSM in permissive mode with this option off to figure out the
    minimal permissions necessary, then turn it back on in enforcing
    mode to minimize memory/performance overheads.
  + In the SELinux implementation of the new hooks, FILE__EXECUTE on
    the file containing SIGSTRUCT is interpreted as approval for
    launch, while FILE__EXECMOD is interpreted as allowing anonymous
    pages (i.e. pages EAUG’ed, or EADD’ed from an anonymous source
    page) to be executable.
    Allowed protections for other pages loaded from files are dictated
    by the source files’ FILE__EXECUTE/FILE__EXECMOD. This series
    intentionally avoids defining new permissions so that user mode
    tools could continue to work by treating enclave files the same
    way as regular executables and/or shared objects.
- v1 – https://patchwork.kernel.org/cover/10984127/

Cedric Xing (3):
  x86/sgx: Add SGX specific LSM hooks
  x86/sgx: Call LSM hooks from SGX subsystem/module
  x86/sgx: Implement SGX specific hooks in SELinux

 arch/x86/kernel/cpu/sgx/driver/ioctl.c |  80 ++++++++-
 arch/x86/kernel/cpu/sgx/driver/main.c  |  16 +-
 include/linux/lsm_ema.h                | 171 ++++++++++++++++++
 include/linux/lsm_hooks.h              |  29 ++++
 include/linux/security.h               |  23 +++
 security/Makefile                      |   1 +
 security/lsm_ema.c                     | 132 ++++++++++++++
 security/security.c                    |  47 ++++-
 security/selinux/hooks.c               | 229 ++++++++++++++++++++++++-
 security/selinux/include/objsec.h      |  24 +++
 10 files changed, 732 insertions(+), 20 deletions(-)
 create mode 100644 include/linux/lsm_ema.h
 create mode 100644 security/lsm_ema.c

--
2.17.1
SGX enclaves are loaded from pages in regular memory. Given the
ability to create executable pages, the newly added SGX subsystem may
present a backdoor for adversaries to circumvent LSM policies, such as
creating an executable enclave page from a modified regular page that
LSM would otherwise prohibit from being made executable. Hence the
primary question of whether an enclave page should be allowed to be
created from a given source page in regular memory.

A related question is whether to grant/deny an mprotect() request on a
given enclave page/range. mprotect() is traditionally covered by the
security_file_mprotect() hook; however, enclave pages have a different
lifespan than either MAP_PRIVATE or MAP_SHARED pages. In particular,
MAP_PRIVATE pages have the same lifespan as the VMA while MAP_SHARED
pages have the same lifespan as the backing file (on disk), but
enclave pages have the lifespan of the enclave’s file descriptor. For
example, enclave pages could be munmap()’ed then mmap()’ed again
without losing contents (like MAP_SHARED), but all enclave pages are
lost once the enclave’s file descriptor has been closed (like
MAP_PRIVATE). That said, LSM modules need some new data structure for
tracking protections of enclave pages/ranges so that they can make
proper decisions at mmap()/mprotect() syscalls.

The last question, which is orthogonal to the two above, is whether or
not to allow a given enclave to launch/run. Enclave pages are not
visible to the rest of the system, so to some extent they offer a
better place for malicious software to hide. Thus, it is sometimes
desirable to whitelist/blacklist enclaves by their measurements,
signing public keys, or image files.

To address the questions above, two new LSM hooks are added for
enclaves.

- security_enclave_load() – This hook allows LSM to decide whether or
  not to allow instantiation of a range of enclave pages using the
  specified VMA. It is invoked when a range of enclave pages is about
  to be loaded.
  It serves three purposes: 1) indicate to LSM that the file struct in
  question is an enclave; 2) allow LSM to decide whether or not to
  instantiate those pages; and 3) allow LSM to initialize internal
  data structures for tracking origins/protections of those pages.
- security_enclave_init() – This hook allows whitelisting/blacklisting
  or performing whatever checks are deemed appropriate before an
  enclave is allowed to run. An LSM module may opt to use the file
  backing the SIGSTRUCT as a proxy to dictate allowed protections for
  anonymous pages.

mprotect() of enclave pages continues to be governed by
security_file_mprotect(), with the expectation that LSM is able to
distinguish between regular and enclave pages inside the hook. For
mmap(), the SGX subsystem is expected to invoke
security_file_mprotect() explicitly to check the requested protections
against those allowed for existing enclave pages. As stated earlier,
enclave pages have a different lifespan than the existing MAP_PRIVATE
and MAP_SHARED pages, so they require a new data structure outside of
VMAs to track their protections and/or origins. Enclave Memory Area
(or EMA for short) has been introduced to address the need. EMAs are
maintained by the LSM framework for all LSM modules to share. EMAs are
instantiated for enclaves only, so they impose no memory/performance
overhead on regular applications/files. Please see
include/linux/lsm_ema.h and security/lsm_ema.c for details.

A new setup parameter – lsm.ema.cache_decisions – has been introduced
to offer the choice between memory consumption and accuracy of audit
logs. Enabling lsm.ema.cache_decisions causes the LSM framework NOT to
keep backing files open for EMAs. While that saves memory, it requires
LSM modules to make and cache decisions ahead of time, and makes it
difficult for LSM modules to generate accurate audit logs.
System administrators are expected to run LSM in permissive mode with lsm.ema.cache_decisions off to determine the minimal permissions needed, and then turn it back on in enforcing mode for optimal performance and memory usage. lsm.ema.cache_decisions is on by default and could be turned off by appending “lsm.ema.cache_decisions=0” or “lsm.ema.cache_decisions=off” to the kernel command line. Signed-off-by: Cedric Xing <cedric.xing@intel.com> --- include/linux/lsm_ema.h | 171 ++++++++++++++++++++++++++++++++++++++ include/linux/lsm_hooks.h | 29 +++++++ include/linux/security.h | 23 +++++ security/Makefile | 1 + security/lsm_ema.c | 132 +++++++++++++++++++++++++++++ security/security.c | 47 ++++++++++- 6 files changed, 402 insertions(+), 1 deletion(-) create mode 100644 include/linux/lsm_ema.h create mode 100644 security/lsm_ema.c diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h new file mode 100644 index 000000000000..a09b8f96da05 --- /dev/null +++ b/include/linux/lsm_ema.h @@ -0,0 +1,171 @@ +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ +/** + * Enclave Memory Area interface for LSM modules + * + * Copyright(c) 2016-19 Intel Corporation. + */ + +#ifndef _LSM_EMA_H_ +#define _LSM_EMA_H_ + +#include <linux/list.h> +#include <linux/mutex.h> +#include <linux/fs.h> +#include <linux/file.h> + +/** + * lsm_ema - LSM Enclave Memory Area structure + * + * Data structure to track origins of enclave pages + * + * @link: + * Link to adjacent EMAs. 
EMAs are sorted by their addresses in ascending + * order + * @start: + * Starting address + * @end: + * Ending address + * @source: + * File from which this range was loaded from, or NULL if not loaded from + * any files + */ +struct lsm_ema { + struct list_head link; + size_t start; + size_t end; + struct file *source; +}; + +#define lsm_ema_data(ema, blob_sizes) \ + ((char *)((struct lsm_ema *)(ema) + 1) + blob_sizes.lbs_ema_data) + +/** + * lsm_ema_map - LSM Enclave Memory Map structure + * + * Container for EMAs of an enclave + * + * @list: + * Head of a list of sorted EMAs + * @lock: + * Acquire before querying/updateing the list EMAs + */ +struct lsm_ema_map { + struct list_head list; + struct mutex lock; +}; + +/** + * These are functions to be used by the LSM framework, and must be defined + * regardless CONFIG_INTEL_SGX is enabled or not. + */ + +#ifdef CONFIG_INTEL_SGX +void lsm_ema_global_init(size_t); +void lsm_free_ema_map(atomic_long_t *); +#else +static inline void lsm_ema_global_init(size_t ema_data_size) +{ +} + +static inline void lsm_free_ema_map(atomic_long_t *p) +{ +} +#endif + +/** + * Below are APIs to be used by LSM modules + */ + +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *); +struct lsm_ema *lsm_alloc_ema(void); +void lsm_free_ema(struct lsm_ema *); +void lsm_init_ema(struct lsm_ema *, size_t, size_t, struct file *); +int lsm_merge_ema(struct lsm_ema *, struct lsm_ema_map *); +struct lsm_ema *lsm_split_ema(struct lsm_ema *, size_t, struct lsm_ema_map *); + +static inline struct lsm_ema_map *lsm_get_ema_map(struct file *f) +{ + return (void *)atomic_long_read(f->f_security); +} + +static inline int __must_check lsm_lock_ema(struct lsm_ema_map *map) +{ + return mutex_lock_interruptible(&map->lock); +} + +static inline void lsm_unlock_ema(struct lsm_ema_map *map) +{ + mutex_unlock(&map->lock); +} + +static inline struct lsm_ema *lsm_prev_ema(struct lsm_ema *p, + struct lsm_ema_map *map) +{ + p = list_prev_entry(p, link); + 
return &p->link == &map->list ? NULL : p; +} + +static inline struct lsm_ema *lsm_next_ema(struct lsm_ema *p, + struct lsm_ema_map *map) +{ + p = list_next_entry(p, link); + return &p->link == &map->list ? NULL : p; +} + +static inline struct lsm_ema *lsm_find_ema(struct lsm_ema_map *map, size_t a) +{ + struct lsm_ema *p; + + BUG_ON(!mutex_is_locked(&map->lock)); + + list_for_each_entry(p, &map->list, link) + if (a < p->end) + break; + return &p->link == &map->list ? NULL : p; +} + +static inline int lsm_insert_ema(struct lsm_ema_map *map, struct lsm_ema *n) +{ + struct lsm_ema *p = lsm_find_ema(map, n->start); + + if (!p) + list_add_tail(&n->link, &map->list); + else if (n->end <= p->start) + list_add_tail(&n->link, &p->link); + else + return -EEXIST; + + lsm_merge_ema(n, map); + if (p) + lsm_merge_ema(p, map); + return 0; +} + +static inline int lsm_for_each_ema(struct lsm_ema_map *map, size_t start, + size_t end, int (*cb)(struct lsm_ema *, + void *), void *arg) +{ + struct lsm_ema *ema; + int rc; + + ema = lsm_find_ema(map, start); + while (ema && end > ema->start) { + if (start > ema->start) + lsm_split_ema(ema, start, map); + if (end < ema->end) + ema = lsm_split_ema(ema, end, map); + + rc = (*cb)(ema, arg); + lsm_merge_ema(ema, map); + if (rc) + return rc; + + ema = lsm_next_ema(ema, map); + } + + if (ema) + lsm_merge_ema(ema, map); + return 0; +} + +#endif /* _LSM_EMA_H_ */ diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 47f58cfb6a19..ade1f9f81e64 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -29,6 +29,8 @@ #include <linux/init.h> #include <linux/rculist.h> +struct lsm_ema; + /** * union security_list_options - Linux Security Module hook function list * @@ -1446,6 +1448,21 @@ * @bpf_prog_free_security: * Clean up the security information stored inside bpf prog. 
* + * @enclave_load: + * Decide if a range of pages shall be allowed to be loaded into an + * enclave + * + * @encl points to the file identifying the target enclave + * @ema specifies the target range to be loaded + * @flags contains protections being requested for the target range + * @source points to the VMA containing the source pages to be loaded + * + * @enclave_init: + * Decide if an enclave shall be allowed to launch + * + * @encl points to the file identifying the target enclave being launched + * @sigstruct contains a copy of the SIGSTRUCT in kernel memory + * @source points to the VMA backing SIGSTRUCT in user memory */ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1807,6 +1824,13 @@ union security_list_options { int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX + int (*enclave_load)(struct file *encl, struct lsm_ema *ema, + size_t flags, struct vm_area_struct *source); + int (*enclave_init)(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *source); +#endif }; struct security_hook_heads { @@ -2046,6 +2070,10 @@ struct security_hook_heads { struct hlist_head bpf_prog_alloc_security; struct hlist_head bpf_prog_free_security; #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX + struct hlist_head enclave_load; + struct hlist_head enclave_init; +#endif } __randomize_layout; /* @@ -2069,6 +2097,7 @@ struct lsm_blob_sizes { int lbs_ipc; int lbs_msg_msg; int lbs_task; + int lbs_ema_data; }; /* diff --git a/include/linux/security.h b/include/linux/security.h index 659071c2e57c..52c200810004 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1829,5 +1829,28 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) #endif /* CONFIG_SECURITY */ #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX +struct sgx_sigstruct; +#ifdef 
CONFIG_SECURITY +int security_enclave_load(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *source); +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *source); +#else +static inline int security_enclave_load(struct file *encl, size_t start, + size_t end, struct vm_area_struct *src) +{ + return 0; +} + +static inline int security_enclave_init(struct file *encl, + struct sgx_sigstruct *sigstruct, + struct vm_area_struct *src) +{ + return 0; +} +#endif /* CONFIG_SECURITY */ +#endif /* CONFIG_INTEL_SGX */ + #endif /* ! __LINUX_SECURITY_H */ diff --git a/security/Makefile b/security/Makefile index c598b904938f..1bab8f1344b6 100644 --- a/security/Makefile +++ b/security/Makefile @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/ obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/ obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o +obj-$(CONFIG_INTEL_SGX) += lsm_ema.o # Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/lsm_ema.c b/security/lsm_ema.c new file mode 100644 index 000000000000..68fae0724d37 --- /dev/null +++ b/security/lsm_ema.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-18 Intel Corporation. 
+ +#include <linux/lsm_ema.h> +#include <linux/slab.h> + +static struct kmem_cache *lsm_ema_cache; +static size_t lsm_ema_data_size; +static int lsm_ema_cache_decisions = 1; + +void lsm_ema_global_init(size_t ema_size) +{ + BUG_ON(lsm_ema_data_size > 0); + + lsm_ema_data_size = ema_size; + + ema_size += sizeof(struct lsm_ema); + ema_size = max(ema_size, sizeof(struct lsm_ema_map)); + lsm_ema_cache = kmem_cache_create("lsm_ema_cache", ema_size, + __alignof__(struct lsm_ema), + SLAB_PANIC, NULL); + +} + +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *p) +{ + struct lsm_ema_map *map; + + map = (typeof(map))atomic_long_read(p); + if (!map) { + long n; + + map = (typeof(map))lsm_alloc_ema(); + if (!map) + return NULL; + + INIT_LIST_HEAD(&map->list); + mutex_init(&map->lock); + + n = atomic_long_cmpxchg(p, 0, (long)map); + if (n) { + atomic_long_t a; + atomic_long_set(&a, (long)map); + map = (typeof(map))n; + lsm_free_ema_map(&a); + } + } + return map; +} + +void lsm_free_ema_map(atomic_long_t *p) +{ + struct lsm_ema_map *map; + struct lsm_ema *ema, *n; + + map = (typeof(map))atomic_long_read(p); + if (!map) + return; + + BUG_ON(mutex_is_locked(&map->lock)); + + list_for_each_entry_safe(ema, n, &map->list, link) + lsm_free_ema(ema); + kmem_cache_free(lsm_ema_cache, map); +} + +struct lsm_ema *lsm_alloc_ema(void) +{ + return kmem_cache_zalloc(lsm_ema_cache, GFP_KERNEL); +} + +void lsm_free_ema(struct lsm_ema *ema) +{ + list_del(&ema->link); + if (ema->source) + fput(ema->source); + kmem_cache_free(lsm_ema_cache, ema); +} + +void lsm_init_ema(struct lsm_ema *ema, size_t start, size_t end, + struct file *source) +{ + INIT_LIST_HEAD(&ema->link); + ema->start = start; + ema->end = end; + if (!lsm_ema_cache_decisions && source) + ema->source = get_file(source); +} + +int lsm_merge_ema(struct lsm_ema *p, struct lsm_ema_map *map) +{ + struct lsm_ema *prev = list_prev_entry(p, link); + + BUG_ON(!mutex_is_locked(&map->lock)); + + if (&prev->link == &map->list || 
prev->end != p->start || + prev->source != p->source || + memcmp(prev + 1, p + 1, lsm_ema_data_size)) + return 0; + + p->start = prev->start; + fput(prev->source); + lsm_free_ema(prev); + return 1; +} + +struct lsm_ema *lsm_split_ema(struct lsm_ema *p, size_t at, + struct lsm_ema_map *map) +{ + struct lsm_ema *n; + + BUG_ON(!mutex_is_locked(&map->lock)); + + if (at <= p->start || at >= p->end) + return p; + + n = lsm_alloc_ema(); + if (likely(n)) { + lsm_init_ema(n, p->start, at, p->source); + memcpy(n + 1, p + 1, lsm_ema_data_size); + p->start = at; + list_add_tail(&n->link, &p->link); + } + return n; +} + +static int __init set_ema_cache_decisions(char *str) +{ + lsm_ema_cache_decisions = (strcmp(str, "0") && strcmp(str, "off")); + return 1; +} +__setup("lsm.ema.cache_decisions=", set_ema_cache_decisions); diff --git a/security/security.c b/security/security.c index f493db0bf62a..d50883f18be2 100644 --- a/security/security.c +++ b/security/security.c @@ -17,6 +17,7 @@ #include <linux/init.h> #include <linux/kernel.h> #include <linux/lsm_hooks.h> +#include <linux/lsm_ema.h> #include <linux/integrity.h> #include <linux/ima.h> #include <linux/evm.h> @@ -41,7 +42,9 @@ static struct kmem_cache *lsm_file_cache; static struct kmem_cache *lsm_inode_cache; char *lsm_names; -static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init; +static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init = { + .lbs_file = sizeof(atomic_long_t) * IS_ENABLED(CONFIG_INTEL_SGX), +}; /* Boot-time LSM user choice */ static __initdata const char *chosen_lsm_order; @@ -169,6 +172,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed) lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc); lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg); lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task); + lsm_set_blob_size(&needed->lbs_ema_data, &blob_sizes.lbs_ema_data); } /* Prepare LSM for initialization. 
*/ @@ -314,6 +318,7 @@ static void __init ordered_lsm_init(void) lsm_inode_cache = kmem_cache_create("lsm_inode_cache", blob_sizes.lbs_inode, 0, SLAB_PANIC, NULL); + lsm_ema_global_init(blob_sizes.lbs_ema_data); lsm_early_cred((struct cred *) current->cred); lsm_early_task(current); @@ -1357,6 +1362,7 @@ void security_file_free(struct file *file) blob = file->f_security; if (blob) { file->f_security = NULL; + lsm_free_ema_map(blob); kmem_cache_free(lsm_file_cache, blob); } } @@ -1420,6 +1426,7 @@ int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, { return call_int_hook(file_mprotect, 0, vma, reqprot, prot); } +EXPORT_SYMBOL(security_file_mprotect); int security_file_lock(struct file *file, unsigned int cmd) { @@ -2355,3 +2362,41 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux) call_void_hook(bpf_prog_free_security, aux); } #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX +int security_enclave_load(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *src) +{ + struct lsm_ema_map *map; + struct lsm_ema *ema; + int rc; + + map = lsm_init_or_get_ema_map(encl->f_security); + if (unlikely(!map)) + return -ENOMEM; + + ema = lsm_alloc_ema(); + if (unlikely(!ema)) + return -ENOMEM; + + lsm_init_ema(ema, start, end, src->vm_file); + rc = call_int_hook(enclave_load, 0, encl, ema, flags, src); + if (!rc) + rc = lsm_lock_ema(map); + if (!rc) { + rc = lsm_insert_ema(map, ema); + lsm_unlock_ema(map); + } + if (rc) + lsm_free_ema(ema); + return rc; +} +EXPORT_SYMBOL(security_enclave_load); + +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *src) +{ + return call_int_hook(enclave_init, 0, encl, sigstruct, src); +} +EXPORT_SYMBOL(security_enclave_init); +#endif /* CONFIG_INTEL_SGX */ -- 2.17.1
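The EMA bookkeeping in the patch above can be hard to follow in diff
form, so here is a minimal userspace sketch of the same idea: a list of
non-overlapping, half-open address ranges kept in ascending order, with
a lookup and an insert that rejects overlap. It is an illustration
only. The names model_ema, model_find() and model_insert() are
hypothetical, a plain singly linked list stands in for the kernel's
list_head, and locking, merging and splitting are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an EMA: the half-open range [start, end). */
struct model_ema {
	size_t start;
	size_t end;
	struct model_ema *next;
};

/*
 * Return the first range whose end lies above addr, or NULL if there
 * is none. This is the same walk the patch uses to locate an EMA, so
 * the result may start above addr when addr falls into a gap.
 */
static struct model_ema *model_find(struct model_ema *head, size_t addr)
{
	struct model_ema *p;

	for (p = head; p; p = p->next)
		if (addr < p->end)
			break;
	return p;
}

/* Insert n in ascending order; refuse a range that overlaps an existing one. */
static int model_insert(struct model_ema **head, struct model_ema *n)
{
	struct model_ema **link = head;

	assert(head != NULL && n != NULL && n->start < n->end);

	/* Skip every range that ends at or before the new range starts. */
	while (*link && (*link)->end <= n->start)
		link = &(*link)->next;

	/* The next range, if any, must start at or after the new range ends. */
	if (*link && n->end > (*link)->start)
		return -EEXIST;

	n->next = *link;
	*link = n;
	return 0;
}
```

Inserting [0x3000, 0x4000), then [0x1000, 0x2000), then
[0x2000, 0x3000) leaves the three ranges linked in address order, and a
later attempt to insert [0x2800, 0x3800) returns -EEXIST.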
It’s straightforward to call the new LSM hooks from the SGX
subsystem/module. There are three places where LSM hooks are invoked.

1) sgx_mmap() invokes security_file_mprotect() to validate the
   requested protection. This is necessary because
   security_mmap_file(), invoked by the mmap() syscall, only validates
   protections against the /dev/sgx/enclave file, not against the
   files from which the pages were loaded.
2) security_enclave_load() is invoked upon loading of every enclave
   page by the EADD ioctl. Please note that if pages are EADD’ed in
   batch, the SGX subsystem/module is responsible for dividing pages
   into chunks so that each chunk is loaded from a single VMA.
3) security_enclave_init() is invoked before initializing (EINIT)
   every enclave.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 80 +++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/driver/main.c  | 16 +++++-
 2 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index b186fb7b48d5..4f5abf9819a7 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 // Copyright(c) 2016-19 Intel Corporation.
-#include <asm/mman.h>
+#include <linux/mman.h>
 #include <linux/delay.h>
 #include <linux/file.h>
 #include <linux/hashtable.h>
@@ -11,6 +11,7 @@
 #include <linux/shmem_fs.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
+#include <linux/security.h>
 #include "driver.h"

 struct sgx_add_page_req {
@@ -575,6 +576,46 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
 	return ret;
 }

+static int sgx_encl_prepare_page(struct file *filp, unsigned long dst,
+				 unsigned long src, void *buf)
+{
+	struct vm_area_struct *vma;
+	unsigned long prot;
+	int rc;
+
+	if (dst & ~PAGE_SIZE)
+		return -EINVAL;
+
+	rc = down_read_killable(&current->mm->mmap_sem);
+	if (rc)
+		return rc;
+
+	vma = find_vma(current->mm, dst);
+	if (vma && dst >= vma->vm_start)
+		prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
+		       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
+		       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
+	else
+		prot = 0;
+
+	vma = find_vma(current->mm, src);
+	if (!vma || src < vma->vm_start || src + PAGE_SIZE > vma->vm_end)
+		rc = -EFAULT;
+
+	if (!rc && !(vma->vm_flags & VM_MAYEXEC))
+		rc = -EACCES;
+
+	if (!rc && copy_from_user(buf, (void __user *)src, PAGE_SIZE))
+		rc = -EFAULT;
+
+	if (!rc)
+		rc = security_enclave_load(filp, dst, PAGE_SIZE, prot, vma);
+
+	up_read(&current->mm->mmap_sem);
+
+	return rc;
+}
+
 /**
  * sgx_ioc_enclave_add_page - handler for %SGX_IOC_ENCLAVE_ADD_PAGE
  *
@@ -613,10 +654,9 @@ static long sgx_ioc_enclave_add_page(struct file *filep, unsigned int cmd,

 	data = kmap(data_page);

-	if (copy_from_user((void *)data, (void __user *)addp->src, PAGE_SIZE)) {
-		ret = -EFAULT;
+	ret = sgx_encl_prepare_page(filep, addp->addr, addp->src, data);
+	if (ret)
 		goto out;
-	}

 	ret = sgx_encl_add_page(encl, addp->addr, data, &secinfo, addp->mrmask);
 	if (ret)
@@ -718,6 +758,31 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct,
 	return ret;
 }

+static int sgx_encl_prepare_sigstruct(struct file *filp, unsigned long src,
+				      struct sgx_sigstruct *ss)
+{
+	struct vm_area_struct *vma;
+	int rc;
+
+	rc = down_read_killable(&current->mm->mmap_sem);
+	if (rc)
+		return rc;
+
+	vma = find_vma(current->mm, src);
+	if (!vma || src < vma->vm_start || src + sizeof(*ss) > vma->vm_end)
+		rc = -EFAULT;
+
+	if (!rc && copy_from_user(ss, (void __user *)src, sizeof(*ss)))
+		rc = -EFAULT;
+
+	if (!rc)
+		rc = security_enclave_init(filp, ss, vma);
+
+	up_read(&current->mm->mmap_sem);
+
+	return rc;
+}
+
 /**
  * sgx_ioc_enclave_init - handler for %SGX_IOC_ENCLAVE_INIT
  *
@@ -753,12 +818,9 @@ static long sgx_ioc_enclave_init(struct file *filep, unsigned int cmd,
 		((unsigned long)sigstruct + PAGE_SIZE / 2);
 	memset(einittoken, 0, sizeof(*einittoken));

-	if (copy_from_user(sigstruct, (void __user *)initp->sigstruct,
-			   sizeof(*sigstruct))) {
-		ret = -EFAULT;
+	ret = sgx_encl_prepare_sigstruct(filep, initp->sigstruct, sigstruct);
+	if (ret)
 		goto out;
-	}
-
 	ret = sgx_encl_init(encl, sigstruct, einittoken);

diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
index afe844aa81d6..95fe18c37b84 100644
--- a/arch/x86/kernel/cpu/sgx/driver/main.c
+++ b/arch/x86/kernel/cpu/sgx/driver/main.c
@@ -63,14 +63,26 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
 static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct sgx_encl *encl = file->private_data;
+	unsigned long prot;
+	int rc;

 	vma->vm_ops = &sgx_vm_ops;
 	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
 	vma->vm_private_data = encl;

-	kref_get(&encl->refcount);
+	prot = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+	vma->vm_flags &= ~prot;

-	return 0;
+	prot = _calc_vm_trans(prot, VM_READ, PROT_READ) |
+	       _calc_vm_trans(prot, VM_WRITE, PROT_WRITE) |
+	       _calc_vm_trans(prot, VM_EXEC, PROT_EXEC);
+	rc = security_file_mprotect(vma, prot, prot);
+	if (!rc) {
+		vma->vm_flags |= calc_vm_prot_bits(prot, 0);
+		kref_get(&encl->refcount);
+	}
+
+	return rc;
 }

 static unsigned long
sgx_get_unmapped_area(struct file *file, -- 2.17.1
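
For readers following along without a kernel tree, the sgx_mmap() hunk above can be modeled in userspace. This is only a sketch: every demo_* name and DEMO_* value is made up for illustration, and demo_mprotect_check() is a hypothetical stand-in for whatever the LSM decides in security_file_mprotect(). It shows the idea of treating mmap() as mprotect() from PROT_NONE: strip the requested R/W/X bits, translate them to PROT_* form, ask the policy, and restore the bits only on success.

```c
#include <errno.h>

/* Made-up bit values, chosen to differ between the two flag spaces so the
 * translation is visible. They are NOT the kernel's VM_* or PROT_* values. */
#define DEMO_VM_READ    0x10UL
#define DEMO_VM_WRITE   0x20UL
#define DEMO_VM_EXEC    0x40UL
#define DEMO_VM_PFNMAP  0x80UL	/* stands in for the non-protection flags */

#define DEMO_PROT_READ  0x1UL
#define DEMO_PROT_WRITE 0x2UL
#define DEMO_PROT_EXEC  0x4UL

/* Return @to if bit @from is set in @x, else 0. */
static unsigned long demo_trans(unsigned long x, unsigned long from,
				unsigned long to)
{
	return (x & from) ? to : 0;
}

static unsigned long demo_vm_to_prot(unsigned long vm_flags)
{
	return demo_trans(vm_flags, DEMO_VM_READ, DEMO_PROT_READ) |
	       demo_trans(vm_flags, DEMO_VM_WRITE, DEMO_PROT_WRITE) |
	       demo_trans(vm_flags, DEMO_VM_EXEC, DEMO_PROT_EXEC);
}

static unsigned long demo_prot_to_vm(unsigned long prot)
{
	return demo_trans(prot, DEMO_PROT_READ, DEMO_VM_READ) |
	       demo_trans(prot, DEMO_PROT_WRITE, DEMO_VM_WRITE) |
	       demo_trans(prot, DEMO_PROT_EXEC, DEMO_VM_EXEC);
}

/* Hypothetical policy: refuse simultaneous W+X, allow everything else. */
static int demo_mprotect_check(unsigned long prot)
{
	if ((prot & DEMO_PROT_WRITE) && (prot & DEMO_PROT_EXEC))
		return -EACCES;
	return 0;
}

/* Model of the mmap path: the mapping starts with no R/W/X bits and only
 * gets them back if the policy approves the requested protections. */
static int demo_mmap(unsigned long *vm_flags)
{
	unsigned long vm_prot = *vm_flags &
		(DEMO_VM_READ | DEMO_VM_WRITE | DEMO_VM_EXEC);
	unsigned long prot;
	int rc;

	*vm_flags &= ~vm_prot;
	prot = demo_vm_to_prot(vm_prot);
	rc = demo_mprotect_check(prot);
	if (!rc)
		*vm_flags |= demo_prot_to_vm(prot);
	return rc;
}
```

On a denied request the flags word is left with the protection bits cleared, which matches the order of operations in the hunk (clear first, restore on success).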
This patch governs enclave page protections in a similar way to how current
SELinux governs protections for regular memory pages. In summary:
- All pages are allowed PROT_READ/PROT_WRITE upon request.
- For pages that are EADD'ed, PROT_EXEC will be granted initially if PROT_EXEC
  could also be granted to the VMA containing the source pages. Afterwards,
  PROT_EXEC will be removed once PROT_WRITE is requested/granted, and could be
  granted again if the backing file has EXECMOD or the calling process has
  EXECMEM. For anonymous pages, backing file is considered to be the file
  containing SIGSTRUCT.
- For pages that are EAUG'ed, they are considered modified initially so
  PROT_EXEC will not be granted unless the file containing SIGSTRUCT has
  EXECMOD, or the calling process has EXECMEM.

Besides, launch control is implemented as EXECUTE permission on the SIGSTRUCT
file. That is,
- SIGSTRUCT file has EXECUTE - Enclave is allowed to launch. But this is
  granted only if the enclosing VMA has the same content as the disk file
  (i.e. vma->anon_vma == NULL).
- SIGSTRUCT file has EXECMOD - All anonymous enclave pages are allowed
  PROT_EXEC.

In all cases, simultaneous WX requires EXECMEM on the calling process.

Implementation wise, 3 bits are associated with every EMA by SELinux.
- sourced - Set if EMA is loaded from a file, cleared otherwise.
- execute - Set if EMA is potentially executable, cleared when EMA has once
  been mapped writable, as result of mmap()/mprotect() syscalls. A page is
  executable if this bit is set AND its backing file or the file containing
  SIGSTRUCT (for anonymous pages) has EXECUTE. This bit will be cleared upon
  PROT_WRITE granted to the EMA.
- execmod - Set if the backing file or the file containing SIGSTRUCT (for
  anonymous pages) has EXECMOD. A page is executable if this bit is set.

All those 3 bits are initialized at selinux_enclave_load() and checked in
selinux_file_mprotect().
SGX subsystem is expected to invoke security_file_mprotect() upon mmap() to not
bypass the check. mmap() shall be treated as mprotect() from PROT_NONE to the
requested protection.

selinux_enclave_init() determines if an enclave is allowed to launch, using the
criteria described earlier. This implementation does NOT accept SIGSTRUCT in
anonymous memory. The backing file is also cached in struct
file_security_struct and will serve as the base for decisions for anonymous
pages.

There are NO new process/file permissions introduced in this patch. The
intention here is to ensure existing SELinux tools will work with enclaves
seamlessly by treating them the same way as regular shared objects.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
---
 security/selinux/hooks.c          | 229 ++++++++++++++++++++++++++++--
 security/selinux/include/objsec.h |  24 ++++
 2 files changed, 245 insertions(+), 8 deletions(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 94de51628fdc..cea4db780eb8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -1663,10 +1663,9 @@ static int cred_has_capability(const struct cred *cred,
 /* Check whether a task has a particular permission to an inode.
    The 'adp' parameter is optional and allows other audit
    data to be passed (e.g. the dentry). */
-static int inode_has_perm(const struct cred *cred,
-			  struct inode *inode,
-			  u32 perms,
-			  struct common_audit_data *adp)
+static inline int inode_has_perm_audit(int audit, const struct cred *cred,
+				       struct inode *inode, u32 perms,
+				       struct common_audit_data *adp)
 {
 	struct inode_security_struct *isec;
 	u32 sid;
@@ -1679,8 +1678,22 @@ static int inode_has_perm(const struct cred *cred,
 	sid = cred_sid(cred);
 	isec = selinux_inode(inode);

-	return avc_has_perm(&selinux_state,
-			    sid, isec->sid, isec->sclass, perms, adp);
+	if (audit)
+		return avc_has_perm(&selinux_state, sid, isec->sid,
+				    isec->sclass, perms, adp);
+	else {
+		struct av_decision avd;
+		return avc_has_perm_noaudit(&selinux_state, sid, isec->sid,
+					    isec->sclass, perms, 0, &avd);
+	}
+}
+
+static int inode_has_perm(const struct cred *cred,
+			  struct inode *inode,
+			  u32 perms,
+			  struct common_audit_data *adp)
+{
+	return inode_has_perm_audit(1, cred, inode, perms, adp);
 }

 /* Same as inode_has_perm, but pass explicit audit data containing
@@ -3499,6 +3512,13 @@ static int selinux_file_alloc_security(struct file *file)
 	return file_alloc_security(file);
 }

+static void selinux_file_free_security(struct file *file)
+{
+	long f = atomic_long_read(&selinux_file(file)->enclave_proxy_file);
+	if (f)
+		fput((struct file *)f);
+}
+
 /*
  * Check whether a task has the ioctl permission and cmd
  * operation to an inode.
@@ -3666,19 +3686,23 @@ static int selinux_mmap_file(struct file *file, unsigned long reqprot,
 				   (flags & MAP_TYPE) == MAP_SHARED);
 }

+#ifdef CONFIG_INTEL_SGX
+static int enclave_mprotect(struct vm_area_struct *, size_t);
+#endif
+
 static int selinux_file_mprotect(struct vm_area_struct *vma,
 				 unsigned long reqprot,
 				 unsigned long prot)
 {
 	const struct cred *cred = current_cred();
 	u32 sid = cred_sid(cred);
+	int rc = 0;

 	if (selinux_state.checkreqprot)
 		prot = reqprot;

 	if (default_noexec &&
 	    (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) {
-		int rc = 0;
 		if (vma->vm_start >= vma->vm_mm->start_brk &&
 		    vma->vm_end <= vma->vm_mm->brk) {
 			rc = avc_has_perm(&selinux_state,
@@ -3705,7 +3729,12 @@ static int selinux_file_mprotect(struct vm_area_struct *vma,
 			return rc;
 	}

-	return file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED);
+	rc = file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED);
+#ifdef CONFIG_INTEL_SGX
+	if (!rc)
+		rc = enclave_mprotect(vma, prot);
+#endif
+	return rc;
 }

 static int selinux_file_lock(struct file *file, unsigned int cmd)
@@ -6740,12 +6769,190 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
 }
 #endif

+#ifdef CONFIG_INTEL_SGX
+struct ema__mprot_cb_params {
+	struct file *encl;
+	size_t curprot;
+	size_t reqprot;
+};
+
+static inline struct file *ema__get_source(struct lsm_ema *ema,
+					   struct file *encl)
+{
+	if (!selinux_ema(ema)->sourced) {
+		struct file_security_struct *fsec = selinux_file(encl);
+		return (void *)atomic_long_read(&fsec->enclave_proxy_file);
+	}
+
+	return ema->source;
+}
+
+static int ema__chk_X_cb(struct lsm_ema *ema, void *a)
+{
+	const struct ema__mprot_cb_params *parm = a;
+	struct ema_security_struct *esec = selinux_ema(ema);
+	struct file *src;
+	int rc;
+
+	if (esec->execmod)
+		/* EXECMOD grants X on all cases */
+		return 0;
+
+	src = ema__get_source(ema, parm->encl);
+	if (src) {
+		if (esec->execute)
+			/* Unmodified range requires FILE__EXECUTE */
+			rc = file_has_perm(current_cred(), src,
+					   FILE__EXECUTE);
+		else {
+			/* Modified range requires FILE__EXECMOD */
+			rc = file_has_perm(current_cred(), src,
+					   FILE__EXECUTE | FILE__EXECMOD);
+			/* Cache FILE__EXECMOD to avoid checking it again */
+			esec->execmod = !rc;
+		}
+	} else
+		rc = esec->execute ? 0 : -EACCES;
+	return rc;
+}
+
+static int ema__clr_X_cb(struct lsm_ema *ema, void *a)
+{
+	selinux_ema(ema)->execute = 0;
+	return 0;
+}
+
+static int enclave_mprotect(struct vm_area_struct *vma, size_t prot)
+{
+	struct lsm_ema_map *map;
+	int rc;
+
+	if (!vma->vm_file)
+		return 0;
+
+	map = lsm_get_ema_map(vma->vm_file);
+	if (!map)
+		/* Not an enclave */
+		return 0;
+
+	if ((prot & VM_WRITE) && (prot && VM_EXEC)) {
+		/* EXECMEM is necessary, and will be checked later */
+		rc = -1;
+	} else {
+		struct ema__mprot_cb_params parm;
+
+		parm.encl = vma->vm_file;
+		parm.curprot = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC);
+		parm.reqprot = calc_vm_prot_bits(prot, 0);
+
+		rc = lsm_lock_ema(map);
+		if (!rc)
+			return rc;
+
+		/* Checks are necessary only if X is being requested */
+		if (prot & VM_EXEC)
+			rc = lsm_for_each_ema(map, vma->vm_start, vma->vm_end,
+					      ema__chk_X_cb, &parm);
+		/* Clear X if W is granted */
+		if (!rc && (prot & VM_WRITE))
+			rc = lsm_for_each_ema(map, vma->vm_start, vma->vm_end,
+					      ema__clr_X_cb, &parm);
+		lsm_unlock_ema(map);
+	}
+
+	/* EXECMEM is the last resort if X is being requested */
+	if (rc && (prot & VM_EXEC)) {
+		/* No need to update selinux_ema(ema)->execute here because it
+		 * doesn't matter anyway when EXECMEM is present
+		 */
+		rc = avc_has_perm(&selinux_state, current_sid(), current_sid(),
+				  SECCLASS_PROCESS, PROCESS__EXECMEM, NULL);
+	}
+	return rc;
+}
+
+static int selinux_enclave_load(struct file *encl, struct lsm_ema *ema,
+				size_t flags, struct vm_area_struct *src)
+{
+	size_t prot = flags & (PROT_READ | PROT_WRITE | PROT_EXEC);
+	struct ema_security_struct *esec;
+	const struct cred *cred = current_cred();
+	u32 sid = cred_sid(cred);
+	int rc;
+
+	/* check if @prot could be granted */
+	rc = 0;
+	if (src) {
+		/* EADD */
+		if (calc_vm_prot_bits(prot, 0) & ~src->vm_flags)
+			rc = selinux_file_mprotect(src, prot, prot);
+	} else if (prot & PROT_EXEC) {
+		/* EAUG implies RW, so RWX here requires EXECMEM */
+		rc = avc_has_perm(&selinux_state, sid, sid,
+				  SECCLASS_PROCESS, PROCESS__EXECMEM, NULL);
+	}
+	if (rc)
+		return rc;
+
+	/* Initialize ema_security_struct now that @prot has been approved */
+	esec = selinux_ema(ema);
+	/* Is @src backed by a file? */
+	if (src && src->vm_file)
+		esec->sourced = 1;
+	/* Is @src mapped shared, or mapped privately and not modified? */
+	if ((esec->sourced && !src->anon_vma) || (prot & PROT_EXEC))
+		esec->execute = 1;
+	/* If the backing file is NOT kept opened, cache FILE__EXECUTE now! No
+	 * audit log will be generated */
+	if (esec->execute && esec->sourced && !ema->source &&
+	    inode_has_perm_audit(0, cred, file_inode(src->vm_file),
+				 FILE__EXECUTE, NULL))
+		esec->execute = 0;
+	/* If the backing file is NOT kept opened, cache FILE__EXECMOD now! No
+	 * audit log will be generated */
+	if (esec->sourced && !ema->source &&
+	    !inode_has_perm_audit(0, cred, file_inode(src->vm_file),
+				  FILE__EXECUTE | FILE__EXECMOD, NULL))
+		esec->execmod = 1;
+
+	return 0;
+}
+
+static int selinux_enclave_init(struct file *encl,
+				struct sgx_sigstruct *sigstruct,
+				struct vm_area_struct *src)
+{
+	struct file_security_struct *fsec = selinux_file(encl);
+	int rc;
+
+	/* Is @src mapped shared, or mapped privately and not modified? */
+	if (!src->vm_file || src->anon_vma)
+		return -EACCES;
+
+	/* FILE__EXECUTE grants enclaves permission to launch */
+	rc = file_has_perm(current_cred(), src->vm_file, FILE__EXECUTE);
+	if (rc)
+		return rc;
+
+	/* SIGSTRUCT file is also used to determine permissions for pages not
+	 * backed by any files */
+	if (atomic_long_cmpxchg(&fsec->enclave_proxy_file, 0,
+				(long)src->vm_file))
+		return -EEXIST;
+
+	get_file(src->vm_file);
+	return 0;
+}
+#endif
+
 struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = {
 	.lbs_cred = sizeof(struct task_security_struct),
 	.lbs_file = sizeof(struct file_security_struct),
 	.lbs_inode = sizeof(struct inode_security_struct),
 	.lbs_ipc = sizeof(struct ipc_security_struct),
 	.lbs_msg_msg = sizeof(struct msg_security_struct),
+	.lbs_ema_data = sizeof(struct ema_security_struct) *
+			IS_ENABLED(CONFIG_INTEL_SGX),
 };

 static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
@@ -6822,6 +7029,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {

 	LSM_HOOK_INIT(file_permission, selinux_file_permission),
 	LSM_HOOK_INIT(file_alloc_security, selinux_file_alloc_security),
+	LSM_HOOK_INIT(file_free_security, selinux_file_free_security),
 	LSM_HOOK_INIT(file_ioctl, selinux_file_ioctl),
 	LSM_HOOK_INIT(mmap_file, selinux_mmap_file),
 	LSM_HOOK_INIT(mmap_addr, selinux_mmap_addr),
@@ -6982,6 +7190,11 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
 	LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
 	LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
 #endif
+
+#ifdef CONFIG_INTEL_SGX
+	LSM_HOOK_INIT(enclave_load, selinux_enclave_load),
+	LSM_HOOK_INIT(enclave_init, selinux_enclave_init),
+#endif
 };

 static __init int selinux_init(void)
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 91c5395dd20c..e58324997e8b 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -23,6 +23,7 @@
 #include <linux/in.h>
 #include <linux/spinlock.h>
 #include <linux/lsm_hooks.h>
+#include <linux/lsm_ema.h>
 #include <linux/msg.h>
 #include <net/net_namespace.h>
 #include "flask.h"
@@ -68,6 +69,7 @@ struct file_security_struct {
 	u32 fown_sid;		/* SID of file owner (for SIGIO) */
 	u32 isid;		/* SID of inode at the time of file open */
 	u32 pseqno;		/* Policy seqno at the time of file open */
+	atomic_long_t enclave_proxy_file;
 };

 struct superblock_security_struct {
@@ -154,6 +156,23 @@ struct bpf_security_struct {
 	u32 sid;  /*SID of bpf obj creater*/
 };

+struct ema_security_struct {
+	/* (@execute && FILE__EXECUTE) grants X.
+	 * FILE__EXECUTE is determined at mprotect() but if backing file is NOT
+	 * kept open, FILE__EXECUTE will be determined at enclave_load() hook
+	 */
+	int execute:1;
+	/* (@execmod || FILE__EXECMOD) grants W->X.
+	 * FILE__EXECMOD is determined at mprotect() but if backing file is NOT
+	 * kept open, FILE__EXECMOD will be determined at enclave_load() hook
+	 */
+	int execmod:1;
+	/* @sourced is set if an enclave range is loaded (EADD'ed) from a file,
+	 * cleared otherwise (i.e. EAUG'ed or EADD'ed from anonymous memory
+	 */
+	int sourced:1;
+};
+
 extern struct lsm_blob_sizes selinux_blob_sizes;
 static inline struct task_security_struct *selinux_cred(const struct cred *cred)
 {
@@ -185,4 +204,9 @@ static inline struct ipc_security_struct *selinux_ipc(
 	return ipc->security + selinux_blob_sizes.lbs_ipc;
 }

+static inline struct ema_security_struct *selinux_ema(struct lsm_ema *ema)
+{
+	return (void *)lsm_ema_data(ema, selinux_blob_sizes);
+}
+
 #endif /* _SELINUX_OBJSEC_H_ */
-- 
2.17.1
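
The per-EMA bits described in the commit message above (sourced, execute, execmod) amount to a small state machine. The following userspace model is a sketch of that description only, not of the patch's code: all demo_* names are hypothetical, the three booleans in struct demo_policy stand in for the SELinux answers to FILE__EXECUTE, FILE__EXECMOD and PROCESS__EXECMEM, and one policy is used for both the backing file and the SIGSTRUCT file.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_PROT_READ  0x1
#define DEMO_PROT_WRITE 0x2
#define DEMO_PROT_EXEC  0x4

/* Hypothetical stand-ins for what the policy would answer. */
struct demo_policy {
	bool file_execute;	/* FILE__EXECUTE on the (proxy) file */
	bool file_execmod;	/* FILE__EXECMOD on the (proxy) file */
	bool proc_execmem;	/* PROCESS__EXECMEM on the caller */
};

/* The three bits tracked per enclave memory area. */
struct demo_ema {
	bool sourced;	/* loaded from a file; would select which file's
			 * labels apply, unused in this simplified model */
	bool execute;	/* still unmodified, so potentially executable */
	bool execmod;	/* cached "modified but may still execute" */
};

static int demo_ema_mprotect(struct demo_ema *ema,
			     const struct demo_policy *pol, int prot)
{
	bool want_w = (prot & DEMO_PROT_WRITE) != 0;
	bool want_x = (prot & DEMO_PROT_EXEC) != 0;
	int rc = 0;

	/* Simultaneous W+X always needs EXECMEM on the process. */
	if (want_w && want_x)
		return pol->proc_execmem ? 0 : -EACCES;

	if (want_x) {
		if (ema->execmod) {
			rc = 0;
		} else if (ema->execute) {
			/* Unmodified range: EXECUTE is enough. */
			rc = pol->file_execute ? 0 : -EACCES;
		} else if (pol->file_execute && pol->file_execmod) {
			/* Modified range: needs EXECMOD too; cache it. */
			ema->execmod = true;
			rc = 0;
		} else {
			rc = -EACCES;
		}
		/* EXECMEM is the last resort. */
		if (rc && pol->proc_execmem)
			rc = 0;
	}

	/* Granting W marks the range as modified. */
	if (!rc && want_w)
		ema->execute = false;
	return rc;
}
```

With file_execute set and nothing else, an EADD'ed range can be mapped RX, loses that ability once it has been mapped RW, and regains it only if the file also carries EXECMOD or the process has EXECMEM.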
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 25, 2019 2:49 PM
>
> On 6/25/19 5:01 PM, Stephen Smalley wrote:
> > On 6/21/19 1:05 PM, Xing, Cedric wrote:
> >>> From: Christopherson, Sean J
> >>> Sent: Wednesday, June 19, 2019 3:24 PM
> >>>
> >>> diff --git a/include/linux/security.h b/include/linux/security.h
> >>> index 6a1f54ba6794..572ddfc53039 100644
> >>> --- a/include/linux/security.h
> >>> +++ b/include/linux/security.h
> >>> @@ -1832,11 +1832,18 @@ static inline void
> >>> security_bpf_prog_free(struct bpf_prog_aux *aux)
> >>>   #ifdef CONFIG_INTEL_SGX
> >>>   #ifdef CONFIG_SECURITY
> >>>   int security_enclave_map(unsigned long prot);
> >>> +int security_enclave_load(struct vm_area_struct *vma, unsigned long
> >>> prot,
> >>> +              bool measured);
> >>>   #else
> >>>   static inline int security_enclave_map(unsigned long prot)
> >>>   {
> >>>       return 0;
> >>>   }
> >>> +static inline int security_enclave_load(struct vm_area_struct *vma,
> >>> +                    unsigned long prot, bool measured) {
> >>> +    return 0;
> >>> +}
> >>>   #endif /* CONFIG_SECURITY */
> >>>   #endif /* CONFIG_INTEL_SGX */
> >>
> >> Parameters to security_enclave_load() are specific on what's being
> >> loading only, but unspecific on which enclave to be loaded into. That
> >> kills the possibility of an LSM module making enclave dependent
> >> decisions.
> >>
> >> Btw, if enclave (in the form of struct file) is also passed in as a
> >> parameter, it'd let LSM know that file is an enclave, hence would be
> >> able to make the same decision in security_mmap_file() as in
> >> security_enclave_map(). In other words, you wouldn't need
> >> security_enclave_map().
> >
> > Sorry, you want security_enclave_load() to stash a reference to the
> > enclave file in some security module-internal state, then match it
> > upon later security_mmap_file() calls to determine that it is dealing
> > with an enclave, and then adjust its logic accordingly? When do we
> > release that reference?
>
> I guess you mean set a flag in the enclave file security struct upon
> security_enclave_load() and check that flag in security_mmap_file().

Yes, by invoking security_enclave_load(), the SGX subsystem indicates to the
LSM that the file struct in question refers to an enclave. But
security_mmap_file() doesn't pass in the range being mmap()'ed, so the LSM
still cannot decide. Instead of changing the definition of
security_mmap_file(), I'd invoke security_file_mprotect() from sgx_mmap().
After all, creating a new mapping is equivalent to changing the target range
from PROT_NONE to the @prot being requested. I just sent out a patch series
with all those details in code.

>
> This seems somewhat similar to one of Sean's alternatives in the patch
> description for 06/12, except by pushing the information from sgx to LSM
> upon security_enclave_load() rather than pulling it via a
> is_sgx_enclave() helper. Not clear if it is still subject to the same
> limitations.

Yes, they are similar except for who keeps track of that piece of information.
As Dr. Greg pointed out, the new hooks do have to be SGX specific. But calling
is_sgx_enclave() really ties LSM to SGX. In contrast, inferring it through
security_enclave_load() makes LSM SGX-agnostic. Then the only SGX specific
thing left in the hooks is the sgx_sigstruct. In theory, that's just a digital
signature and as Dr. Greg pointed out, SGX is probably not the only technology
that uses digital signatures to identify executable contents. And in that
sense, if we rename it to something generic, probably with a tag indicating
its format, then the whole thing would become SGX agnostic and could be useful
for other TEEs on architectures other than x86.
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 25, 2019 2:10 PM
>
> On 6/21/19 5:22 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Wednesday, June 19, 2019 3:24 PM
> >>
> >> Intended use of each permission:
> >>
> >>    - SGX_EXECDIRTY: dynamically load code within the enclave itself
> >>    - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
> >> Graphene
> >
> > Why does it matter whether a code page is measured or not?
>
> It won't be incorporated into an attestation?

Yes, it will. And because of that, I don't think LSM should care.

>
> >
> >>    - SGX_EXECANON: load code from anonymous memory (likely Graphene)
> >
> > Graphene doesn't load code from anonymous memory. It loads code
> dynamically though, as in SGX_EXECDIRTY case.
>
> So do we expect EXECANON to never be triggered at all?

I don't think so. And from a security perspective, I think the decision should
be based on whether the source pages are (allowed to be made) executable.

>
> >
> >>    - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior
> >
> > Why is SGX_EXECUTE needed from security perspective? Or why isn't
> FILE__EXECUTE sufficient?
>
> Splitting the SGX permissions from the regular ones allows distinctions
> to be made between what can be executed in the host process and what can
> be executed in the enclave. The host process may be allowed
> FILE__EXECUTE to numerous files that do not contain any code ever
> intended to be executed within the enclave.

Given an enclave and its host process, any executable contents could be
allowed in
1) Neither the enclave nor the host
2) Enclave only
3) Host only
4) Both the enclave and the host

Given the fact that an enclave can access its host's memory, if a piece of
code is NOT allowed in the host, then it shouldn't be allowed in the enclave
either. So #2 shall never happen. An enclave dictates/enforces its own
contents cryptographically, so it's unnecessary to enforce #3 by LSM IMO. Then
#1 and #4 are the only 2 cases to be supported - a single FILE__EXECUTE is
sufficient. I'm not objecting to new permissions to make things more explicit,
but that'd require updates to user mode tools. I think it's just easier to
reuse existing permissions.
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 25, 2019 2:06 PM
>
> On 6/21/19 1:09 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Wednesday, June 19, 2019 3:24 PM
> >>
> >> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> >> index 3ec702cf46ca..fc239e541b62 100644
> >> --- a/security/selinux/hooks.c
> >> +++ b/security/selinux/hooks.c
> >> @@ -6726,6 +6726,23 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
> >> }
> >> #endif
> >>
> >> +#ifdef CONFIG_INTEL_SGX
> >> +static int selinux_enclave_map(unsigned long prot) {
> >> + const struct cred *cred = current_cred();
> >> + u32 sid = cred_sid(cred);
> >> +
> >> + /* SGX is supported only in 64-bit kernels. */
> >> + WARN_ON_ONCE(!default_noexec);
> >> +
> >> + if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
> >> + return avc_has_perm(&selinux_state, sid, sid,
> >> + SECCLASS_PROCESS2, PROCESS2__SGX_MAPWX,
> >> + NULL);
> >
> > Why isn't SGX_MAPWX enclave specific but process wide?
>
> How would you tie it to a specific enclave? What's the object/target
> SID? The SID of the enclave inode? Which one? The source vma file,
> the /dev/sgx/enclave open instance, the sigstruct file, ...? If a
> process can map one enclave WX, what's the benefit of preventing it from
> doing likewise for any other enclave it can load?

I wasn't saying we should. Rather, I think we can reuse EXECMEM. After all,
under what circumstances is WX necessary? IMHO, WX should be strongly
discouraged, and this SGX_MAPWX is kind of trying to give the enclave that
bears it a dirty look. And if that's the sole purpose, let's make it even
dirtier by requiring EXECMEM on the host process. After all, WX is never a
good thing in security, so I doubt any ISVs would have a practical reason to
require WX in their enclaves.
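
As a minimal sketch of the rule argued for here, assuming the policy is simply "simultaneous W and X on an enclave mapping needs a process-wide permission such as EXECMEM": demo_needs_execmem() is a hypothetical helper, not something from either patch series. Note that both tests are bitwise; a logical `&&` against the flag constant would be true for any nonzero prot.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* Hypothetical helper: true only when both PROT_WRITE and PROT_EXEC are
 * requested at the same time. Each flag is tested with a bitwise AND. */
static bool demo_needs_execmem(unsigned long prot)
{
	return (prot & PROT_WRITE) && (prot & PROT_EXEC);
}
```

A caller would consult the process-wide permission only when this returns true, and leave RW-only or RX-only requests to the per-page checks.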
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Tuesday, June 25, 2019 1:48 PM
>
> On 6/21/19 12:54 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Wednesday, June 19, 2019 3:24 PM
> >>
> >> diff --git a/security/security.c b/security/security.c
> >> index 613a5c00e602..03951e08bdfc 100644
> >> --- a/security/security.c
> >> +++ b/security/security.c
> >> @@ -2359,3 +2359,10 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
> >> call_void_hook(bpf_prog_free_security, aux);
> >> }
> >> #endif /* CONFIG_BPF_SYSCALL */
> >> +
> >> +#ifdef CONFIG_INTEL_SGX
> >> +int security_enclave_map(unsigned long prot)
> >> +{
> >> +	return call_int_hook(enclave_map, 0, prot);
> >> +}
> >> +#endif /* CONFIG_INTEL_SGX */
> >
> > Why is this new security_enclave_map() necessary while
> > security_mmap_file() will also be invoked?
>
> security_mmap_file() doesn't know about enclaves. It will just end up
> checking FILE__READ, FILE__WRITE, and FILE__EXECUTE to /dev/sgx/enclave.
> This was noted in the patch description.

Surely I understand all those. As I mentioned in my other email,
enclave_load() could indicate to LSM that a file is an enclave. Of course
mmap() could be invoked before any pages are loaded, so LSM wouldn't know at
the first mmap(), but that doesn't matter as an empty enclave wouldn't pose
any threat anyway.
On 6/27/2019 11:56 AM, Cedric Xing wrote:
> SGX enclaves are loaded from pages in regular memory. Given the ability to
> create executable pages, the newly added SGX subsystem may present a backdoor
> for adversaries to circumvent LSM policies, such as creating an executable
> enclave page from a modified regular page that would otherwise not be made
> executable as prohibited by LSM. Therefore arises the primary question of
> whether an enclave page should be allowed to be created from a given source
> page in regular memory.
>
> A related question is whether to grant/deny a mprotect() request on a given
> enclave page/range. mprotect() is traditionally covered by
> security_file_mprotect() hook, however, enclave pages have a different lifespan
> than either MAP_PRIVATE or MAP_SHARED. Particularly, MAP_PRIVATE pages have the
> same lifespan as the VMA while MAP_SHARED pages have the same lifespan as the
> backing file (on disk), but enclave pages have the lifespan of the enclave's
> file descriptor. For example, enclave pages could be munmap()'ed then mmap()'ed
> again without losing contents (like MAP_SHARED), but all enclave pages will be
> lost once its file descriptor has been closed (like MAP_PRIVATE). That said,
> LSM modules need some new data structure for tracking protections of enclave
> pages/ranges so that they can make proper decisions at mmap()/mprotect()
> syscalls.
>
> The last question, which is orthogonal to the 2 above, is whether or not to
> allow a given enclave to launch/run. Enclave pages are not visible to the rest
> of the system, so to some extent offer a better place for malicious software to
> hide. Thus, it is sometimes desirable to whitelist/blacklist enclaves by their
> measurements, signing public keys, or image files.
>
> To address the questions above, 2 new LSM hooks are added for enclaves.
>   - security_enclave_load() - This hook allows LSM to decide whether or not to
>     allow instantiation of a range of enclave pages using the specified VMA. It
>     is invoked when a range of enclave pages is about to be loaded. It serves 3
>     purposes: 1) indicate to LSM that the file struct in subject is an enclave;
>     2) allow LSM to decide whether or not to instantiate those pages and 3)
>     allow LSM to initialize internal data structures for tracking
>     origins/protections of those pages.
>   - security_enclave_init() - This hook allows whitelisting/blacklisting or
>     performing whatever checks deemed appropriate before an enclave is allowed
>     to run. An LSM module may opt to use the file backing the SIGSTRUCT as a
>     proxy to dictate allowed protections for anonymous pages.
>
> mprotect() of enclave pages continue to be governed by
> security_file_mprotect(), with the expectation that LSM is able to distinguish
> between regular and enclave pages inside the hook. For mmap(), the SGX
> subsystem is expected to invoke security_file_mprotect() explicitly to check
> protections against the requested protections for existing enclave pages. As
> stated earlier, enclave pages have different lifespan than the existing
> MAP_PRIVATE and MAP_SHARED pages, so would require a new data structure outside
> of VMA to track their protections and/or origins. Enclave Memory Area (or EMA
> for short) has been introduced to address the need. EMAs are maintained by the
> LSM framework for all LSM modules to share. EMAs will be instantiated for
> enclaves only so will not impose memory/performance overheads for regular
> applications/files. Please see include/linux/lsm_ema.h and security/lsm_ema.c
> for details.
>
> A new setup parameter - lsm.ema.cache_decisions has been introduced to offer
> the choice between memory consumption and accuracy of audit logs. Enabling
> lsm.ema.cache_decisions causes LSM framework NOT to keep backing files open for
> EMAs. While that saves memory, it requires LSM modules to make and cache
> decisions ahead of time, and makes it difficult for LSM modules to generate
> accurate audit logs. System administrators are expected to run LSM in
> permissive mode with lsm.ema.cache_decisions off to determine the minimal
> permissions needed, and then turn it back on in enforcing mode for optimal
> performance and memory usage. lsm.ema.cache_decisions is on by default and
> could be turned off by appending "lsm.ema.cache_decisions=0" or
> "lsm.ema.cache_decisions=off" to the kernel command line.
>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> ---
>  include/linux/lsm_ema.h   | 171 ++++++++++++++++++++++++++++++++++++++
>  include/linux/lsm_hooks.h |  29 +++++++
>  include/linux/security.h  |  23 +++++
>  security/Makefile         |   1 +
>  security/lsm_ema.c        | 132 +++++++++++++++++++++++++++++
>  security/security.c       |  47 ++++++++++-
>  6 files changed, 402 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/lsm_ema.h
>  create mode 100644 security/lsm_ema.c

Don't use "lsm_ema". This isn't LSM infrastructure.
Three letter abbreviations are easy to type, but are
doomed to encounter conflicts and lead to confusion.
I suggest that you use "enclave", because it doesn't
start off conflicting with anything and is descriptive.

This code should not be mixed in with the LSM infrastructure.
It should all be contained in its own module, under security/enclave.

> diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h
> new file mode 100644
> index 000000000000..a09b8f96da05
> --- /dev/null
> +++ b/include/linux/lsm_ema.h

There's no need for this header to be used outside the enclave
LSM. It should be "security/enclave/enclave.h"

> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +/**
> + * Enclave Memory Area interface for LSM modules
> + *
> + * Copyright(c) 2016-19 Intel Corporation.
> + */
> +
> +#ifndef _LSM_EMA_H_
> +#define _LSM_EMA_H_
> +
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +
> +/**
> + * lsm_ema - LSM Enclave Memory Area structure

How about s/lsm_ema/enclave/ ?

> + *
> + * Data structure to track origins of enclave pages
> + *
> + * @link:
> + *	Link to adjacent EMAs. EMAs are sorted by their addresses in ascending
> + *	order
> + * @start:
> + *	Starting address
> + * @end:
> + *	Ending address
> + * @source:
> + *	File from which this range was loaded from, or NULL if not loaded from
> + *	any files
> + */
> +struct lsm_ema {
> +	struct list_head	link;
> +	size_t			start;
> +	size_t			end;
> +	struct file		*source;
> +};
> +
> +#define lsm_ema_data(ema, blob_sizes)	\
> +	((char *)((struct lsm_ema *)(ema) + 1) + blob_sizes.lbs_ema_data)

Who uses this? The enclave LSM? Convention would have this
be selinux_enclave(ema) for the SELinux code. This is
inconsistent with the way other blobs are handled.

> +
> +/**
> + * lsm_ema_map - LSM Enclave Memory Map structure

enclave_map

> + *
> + * Container for EMAs of an enclave
> + *
> + * @list:
> + *	Head of a list of sorted EMAs
> + * @lock:
> + *	Acquire before querying/updateing the list EMAs
> + */
> +struct lsm_ema_map {
> +	struct list_head	list;
> +	struct mutex		lock;
> +};
> +
> +/**
> + * These are functions to be used by the LSM framework, and must be defined
> + * regardless CONFIG_INTEL_SGX is enabled or not.

Not acceptable for the LSM infrastructure. They
are inconsistent with the way data is used there.

> + */
> +
> +#ifdef CONFIG_INTEL_SGX
> +void lsm_ema_global_init(size_t);
> +void lsm_free_ema_map(atomic_long_t *);
> +#else
> +static inline void lsm_ema_global_init(size_t ema_data_size)
> +{
> +}
> +
> +static inline void lsm_free_ema_map(atomic_long_t *p)
> +{
> +}
> +#endif
> +
> +/**
> + * Below are APIs to be used by LSM modules
> + */
> +
> +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *);
> +struct lsm_ema *lsm_alloc_ema(void);

Do you mean security_alloc_enclave()?
That would go into security/security.h

> +void lsm_free_ema(struct lsm_ema *);

Do you mean security_free_enclave()?
That would go into security/security.h

> +void lsm_init_ema(struct lsm_ema *, size_t, size_t, struct file *);

This goes in the enclave LSM.

> +int lsm_merge_ema(struct lsm_ema *, struct lsm_ema_map *);
> +struct lsm_ema *lsm_split_ema(struct lsm_ema *, size_t, struct lsm_ema_map *);
> +
> +static inline struct lsm_ema_map *lsm_get_ema_map(struct file *f)
> +{
> +	return (void *)atomic_long_read(f->f_security);
> +}
> +
> +static inline int __must_check lsm_lock_ema(struct lsm_ema_map *map)
> +{
> +	return mutex_lock_interruptible(&map->lock);
> +}
> +
> +static inline void lsm_unlock_ema(struct lsm_ema_map *map)
> +{
> +	mutex_unlock(&map->lock);
> +}
> +
> +static inline struct lsm_ema *lsm_prev_ema(struct lsm_ema *p,
> +					    struct lsm_ema_map *map)
> +{
> +	p = list_prev_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct lsm_ema *lsm_next_ema(struct lsm_ema *p,
> +					    struct lsm_ema_map *map)
> +{
> +	p = list_next_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct lsm_ema *lsm_find_ema(struct lsm_ema_map *map, size_t a)
> +{
> +	struct lsm_ema *p;
> +
> +	BUG_ON(!mutex_is_locked(&map->lock));
> +
> +	list_for_each_entry(p, &map->list, link)
> +		if (a < p->end)
> +			break;
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline int lsm_insert_ema(struct lsm_ema_map *map, struct lsm_ema *n)
> +{
> +	struct lsm_ema *p = lsm_find_ema(map, n->start);
> +
> +	if (!p)
> +		list_add_tail(&n->link, &map->list);
> +	else if (n->end <= p->start)
> +		list_add_tail(&n->link, &p->link);
> +	else
> +		return -EEXIST;
> +
> +	lsm_merge_ema(n, map);
> +	if (p)
> +		lsm_merge_ema(p, map);
> +	return 0;
> +}
> +
> +static inline int lsm_for_each_ema(struct lsm_ema_map *map, size_t start,
> +				   size_t end, int (*cb)(struct lsm_ema *,
> +							 void *), void *arg)
> +{
> +	struct lsm_ema *ema;
> +	int rc;
> +
> +	ema = lsm_find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			lsm_split_ema(ema, start, map);
> +		if (end < ema->end)
> +			ema = lsm_split_ema(ema, end, map);
> +
> +		rc = (*cb)(ema, arg);
> +		lsm_merge_ema(ema, map);
> +		if (rc)
> +			return rc;
> +
> +		ema = lsm_next_ema(ema, map);
> +	}
> +
> +	if (ema)
> +		lsm_merge_ema(ema, map);
> +	return 0;
> +}

There is no way that these belong as part of the LSM
infrastructure. If you need an enclave management API
you need to find some other place for it.

> +
> +#endif /* _LSM_EMA_H_ */
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cfb6a19..ade1f9f81e64 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -29,6 +29,8 @@
>  #include <linux/init.h>
>  #include <linux/rculist.h>
>
> +struct lsm_ema;
> +
>  /**
>   * union security_list_options - Linux Security Module hook function list
>   *
> @@ -1446,6 +1448,21 @@
>   * @bpf_prog_free_security:
>   *	Clean up the security information stored inside bpf prog.
>   *
> + * @enclave_load:
> + *	Decide if a range of pages shall be allowed to be loaded into an
> + *	enclave
> + *
> + *	@encl points to the file identifying the target enclave
> + *	@ema specifies the target range to be loaded
> + *	@flags contains protections being requested for the target range
> + *	@source points to the VMA containing the source pages to be loaded
> + *
> + * @enclave_init:
> + *	Decide if an enclave shall be allowed to launch
> + *
> + *	@encl points to the file identifying the target enclave being launched
> + *	@sigstruct contains a copy of the SIGSTRUCT in kernel memory
> + *	@source points to the VMA backing SIGSTRUCT in user memory
>   */
>  union security_list_options {
>  	int (*binder_set_context_mgr)(struct task_struct *mgr);
> @@ -1807,6 +1824,13 @@ union security_list_options {
>  	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>  	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
>  #endif /* CONFIG_BPF_SYSCALL */
> +
> +#ifdef CONFIG_INTEL_SGX
> +	int (*enclave_load)(struct file *encl, struct lsm_ema *ema,
> +			    size_t flags, struct vm_area_struct *source);
> +	int (*enclave_init)(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			    struct vm_area_struct *source);
> +#endif
>  };
>
>  struct security_hook_heads {
> @@ -2046,6 +2070,10 @@ struct security_hook_heads {
>  	struct hlist_head bpf_prog_alloc_security;
>  	struct hlist_head bpf_prog_free_security;
>  #endif /* CONFIG_BPF_SYSCALL */
> +#ifdef CONFIG_INTEL_SGX
> +	struct hlist_head enclave_load;
> +	struct hlist_head enclave_init;
> +#endif
>  } __randomize_layout;
>
>  /*
> @@ -2069,6 +2097,7 @@ struct lsm_blob_sizes {
>  	int	lbs_ipc;
>  	int	lbs_msg_msg;
>  	int	lbs_task;
> +	int	lbs_ema_data;

Is a module like SELinux expected to have its own
data for enclave? That's the only case where you would
have a enclave entry in the blob.
> }; > > /* > diff --git a/include/linux/security.h b/include/linux/security.h > index 659071c2e57c..52c200810004 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -1829,5 +1829,28 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) > #endif /* CONFIG_SECURITY */ > #endif /* CONFIG_BPF_SYSCALL */ > > +#ifdef CONFIG_INTEL_SGX > +struct sgx_sigstruct; > +#ifdef CONFIG_SECURITY > +int security_enclave_load(struct file *encl, size_t start, size_t end, > + size_t flags, struct vm_area_struct *source); > +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, > + struct vm_area_struct *source); > +#else > +static inline int security_enclave_load(struct file *encl, size_t start, > + size_t end, struct vm_area_struct *src) > +{ > + return 0; > +} > + > +static inline int security_enclave_init(struct file *encl, > + struct sgx_sigstruct *sigstruct, > + struct vm_area_struct *src) > +{ > + return 0; > +} > +#endif /* CONFIG_SECURITY */ > +#endif /* CONFIG_INTEL_SGX */ > + > #endif /* ! __LINUX_SECURITY_H */ > > diff --git a/security/Makefile b/security/Makefile > index c598b904938f..1bab8f1344b6 100644 > --- a/security/Makefile > +++ b/security/Makefile > @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/ > obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/ > obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ > obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o > +obj-$(CONFIG_INTEL_SGX) += lsm_ema.o This belongs in a subdirectory. > > # Object integrity file lists > subdir-$(CONFIG_INTEGRITY) += integrity > diff --git a/security/lsm_ema.c b/security/lsm_ema.c > new file mode 100644 > index 000000000000..68fae0724d37 > --- /dev/null > +++ b/security/lsm_ema.c > @@ -0,0 +1,132 @@ > +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) > +// Copyright(c) 2016-18 Intel Corporation. 
> + > +#include <linux/lsm_ema.h> > +#include <linux/slab.h> > + > +static struct kmem_cache *lsm_ema_cache; > +static size_t lsm_ema_data_size; > +static int lsm_ema_cache_decisions = 1; > + > +void lsm_ema_global_init(size_t ema_size) > +{ > + BUG_ON(lsm_ema_data_size > 0); > + > + lsm_ema_data_size = ema_size; > + > + ema_size += sizeof(struct lsm_ema); > + ema_size = max(ema_size, sizeof(struct lsm_ema_map)); > + lsm_ema_cache = kmem_cache_create("lsm_ema_cache", ema_size, > + __alignof__(struct lsm_ema), > + SLAB_PANIC, NULL); > + > +} > + > +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *p) > +{ > + struct lsm_ema_map *map; > + > + map = (typeof(map))atomic_long_read(p); > + if (!map) { > + long n; > + > + map = (typeof(map))lsm_alloc_ema(); > + if (!map) > + return NULL; > + > + INIT_LIST_HEAD(&map->list); > + mutex_init(&map->lock); > + > + n = atomic_long_cmpxchg(p, 0, (long)map); > + if (n) { > + atomic_long_t a; > + atomic_long_set(&a, (long)map); > + map = (typeof(map))n; > + lsm_free_ema_map(&a); > + } > + } > + return map; > +} > + > +void lsm_free_ema_map(atomic_long_t *p) > +{ > + struct lsm_ema_map *map; > + struct lsm_ema *ema, *n; > + > + map = (typeof(map))atomic_long_read(p); > + if (!map) > + return; > + > + BUG_ON(mutex_is_locked(&map->lock)); > + > + list_for_each_entry_safe(ema, n, &map->list, link) > + lsm_free_ema(ema); > + kmem_cache_free(lsm_ema_cache, map); > +} > + > +struct lsm_ema *lsm_alloc_ema(void) > +{ > + return kmem_cache_zalloc(lsm_ema_cache, GFP_KERNEL); > +} > + > +void lsm_free_ema(struct lsm_ema *ema) > +{ > + list_del(&ema->link); > + if (ema->source) > + fput(ema->source); > + kmem_cache_free(lsm_ema_cache, ema); > +} > + > +void lsm_init_ema(struct lsm_ema *ema, size_t start, size_t end, > + struct file *source) > +{ > + INIT_LIST_HEAD(&ema->link); > + ema->start = start; > + ema->end = end; > + if (!lsm_ema_cache_decisions && source) > + ema->source = get_file(source); > +} > + > +int lsm_merge_ema(struct 
lsm_ema *p, struct lsm_ema_map *map) > +{ > + struct lsm_ema *prev = list_prev_entry(p, link); > + > + BUG_ON(!mutex_is_locked(&map->lock)); > + > + if (&prev->link == &map->list || prev->end != p->start || > + prev->source != p->source || > + memcmp(prev + 1, p + 1, lsm_ema_data_size)) > + return 0; > + > + p->start = prev->start; > + fput(prev->source); > + lsm_free_ema(prev); > + return 1; > +} > + > +struct lsm_ema *lsm_split_ema(struct lsm_ema *p, size_t at, > + struct lsm_ema_map *map) > +{ > + struct lsm_ema *n; > + > + BUG_ON(!mutex_is_locked(&map->lock)); > + > + if (at <= p->start || at >= p->end) > + return p; > + > + n = lsm_alloc_ema(); > + if (likely(n)) { > + lsm_init_ema(n, p->start, at, p->source); > + memcpy(n + 1, p + 1, lsm_ema_data_size); > + p->start = at; > + list_add_tail(&n->link, &p->link); > + } > + return n; > +} > + > +static int __init set_ema_cache_decisions(char *str) > +{ > + lsm_ema_cache_decisions = (strcmp(str, "0") && strcmp(str, "off")); > + return 1; > +} > +__setup("lsm.ema.cache_decisions=", set_ema_cache_decisions); > diff --git a/security/security.c b/security/security.c > index f493db0bf62a..d50883f18be2 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -17,6 +17,7 @@ > #include <linux/init.h> > #include <linux/kernel.h> > #include <linux/lsm_hooks.h> > +#include <linux/lsm_ema.h> > #include <linux/integrity.h> > #include <linux/ima.h> > #include <linux/evm.h> > @@ -41,7 +42,9 @@ static struct kmem_cache *lsm_file_cache; > static struct kmem_cache *lsm_inode_cache; > > char *lsm_names; > -static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init; > +static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init = { > + .lbs_file = sizeof(atomic_long_t) * IS_ENABLED(CONFIG_INTEL_SGX), > +}; This belongs in the module specific code. It does not belong here. 
> > /* Boot-time LSM user choice */ > static __initdata const char *chosen_lsm_order; > @@ -169,6 +172,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed) > lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc); > lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg); > lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task); > + lsm_set_blob_size(&needed->lbs_ema_data, &blob_sizes.lbs_ema_data); > } > > /* Prepare LSM for initialization. */ > @@ -314,6 +318,7 @@ static void __init ordered_lsm_init(void) > lsm_inode_cache = kmem_cache_create("lsm_inode_cache", > blob_sizes.lbs_inode, 0, > SLAB_PANIC, NULL); > + lsm_ema_global_init(blob_sizes.lbs_ema_data); > > lsm_early_cred((struct cred *) current->cred); > lsm_early_task(current); > @@ -1357,6 +1362,7 @@ void security_file_free(struct file *file) > blob = file->f_security; > if (blob) { > file->f_security = NULL; > + lsm_free_ema_map(blob); > kmem_cache_free(lsm_file_cache, blob); > } > } > @@ -1420,6 +1426,7 @@ int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, > { > return call_int_hook(file_mprotect, 0, vma, reqprot, prot); > } > +EXPORT_SYMBOL(security_file_mprotect); > > int security_file_lock(struct file *file, unsigned int cmd) > { > @@ -2355,3 +2362,41 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux) > call_void_hook(bpf_prog_free_security, aux); > } > #endif /* CONFIG_BPF_SYSCALL */ > + > +#ifdef CONFIG_INTEL_SGX > +int security_enclave_load(struct file *encl, size_t start, size_t end, > + size_t flags, struct vm_area_struct *src) You are mixing module specific code into the infrastructure. All of this should be in the enclave code. None of it should be here. 
> +{ > + struct lsm_ema_map *map; > + struct lsm_ema *ema; > + int rc; > + > + map = lsm_init_or_get_ema_map(encl->f_security); > + if (unlikely(!map)) > + return -ENOMEM; > + > + ema = lsm_alloc_ema(); > + if (unlikely(!ema)) > + return -ENOMEM; > + > + lsm_init_ema(ema, start, end, src->vm_file); > + rc = call_int_hook(enclave_load, 0, encl, ema, flags, src); > + if (!rc) > + rc = lsm_lock_ema(map); > + if (!rc) { > + rc = lsm_insert_ema(map, ema); > + lsm_unlock_ema(map); > + } > + if (rc) > + lsm_free_ema(ema); > + return rc; > +} > +EXPORT_SYMBOL(security_enclave_load); > + > +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, > + struct vm_area_struct *src) > +{ > + return call_int_hook(enclave_init, 0, encl, sigstruct, src); > +} > +EXPORT_SYMBOL(security_enclave_init); > +#endif /* CONFIG_INTEL_SGX */
Hi Casey, > From: Casey Schaufler [mailto:casey@schaufler-ca.com] > Sent: Thursday, June 27, 2019 3:07 PM > > Don't use "lsm_ema". This isn't LSM infrastructure. > Three letter abbreviations are easy to type, but are doomed to encounter > conflicts and lead to confusion. > I suggest that you use "enclave", because it doesn't start off > conflicting with anything and is descriptive. > > This code should not be mixed in with the LSM infrastructure. > It should all be contained in its own module, under security/enclave. lsm_ema is *intended* to be part of the LSM infrastructure. It is going to be shared among all LSMs that would like to track enclave pages and their origins. And they could be extended to store more information as deemed appropriate by the LSM module. The last patch of this series shows how to extend EMA inside SELinux. > > > diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h new > > file mode 100644 index 000000000000..a09b8f96da05 > > --- /dev/null > > +++ b/include/linux/lsm_ema.h > > There's no need for this header to be used outside the enclave > LSM. It should be "security/enclave/enclave.h" This header file is supposed to be used by all LSM modules, similar to lsm_hooks.h. Hence it is placed in the same location. > > > > @@ -0,0 +1,171 @@ > > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ > > +/** > > + * Enclave Memory Area interface for LSM modules > > + * > > + * Copyright(c) 2016-19 Intel Corporation. > > + */ > > + > > +#ifndef _LSM_EMA_H_ > > +#define _LSM_EMA_H_ > > + > > +#include <linux/list.h> > > +#include <linux/mutex.h> > > +#include <linux/fs.h> > > +#include <linux/file.h> > > + > > +/** > > + * lsm_ema - LSM Enclave Memory Area structure > > How about s/lsm_ema/enclave/ ? I understand your suggestion, but this structure is shared among all LSMs. And I think lsm_ema is pretty descriptive without being too verbose. 
> > > + * > > + * Data structure to track origins of enclave pages > > + * > > + * @link: > > + * Link to adjacent EMAs. EMAs are sorted by their addresses in > ascending > > + * order > > + * @start: > > + * Starting address > > + * @end: > > + * Ending address > > + * @source: > > + * File from which this range was loaded from, or NULL if not loaded > from > > + * any files > > + */ > > +struct lsm_ema { > > + struct list_head link; > > + size_t start; > > + size_t end; > > + struct file *source; > > +}; > > + > > +#define lsm_ema_data(ema, blob_sizes) \ > > + ((char *)((struct lsm_ema *)(ema) + 1) + blob_sizes.lbs_ema_data) > > Who uses this? The enclave LSM? Convention would have this > be selinux_enclave(ema) for the SELinux code. This is > inconsistent with the way other blobs are handled. This is to be used in various LSMs. As you can see in the last patch of this series, selinux_ema() is defined as a wrapper of this macro. > > > + > > +/** > > + * lsm_ema_map - LSM Enclave Memory Map structure > > enclave_map > > > + * > > + * Container for EMAs of an enclave > > + * > > + * @list: > > + * Head of a list of sorted EMAs > > + * @lock: > > + * Acquire before querying/updateing the list EMAs > > + */ > > +struct lsm_ema_map { > > + struct list_head list; > > + struct mutex lock; > > +}; > > + > > +/** > > + * These are functions to be used by the LSM framework, and must be > defined > > + * regardless CONFIG_INTEL_SGX is enabled or not. > > Not acceptable for the LSM infrastructure. They > are inconsistent with the way data is used there. I'm not sure I understand this comment. 
> > > + */ > > + > > +#ifdef CONFIG_INTEL_SGX > > +void lsm_ema_global_init(size_t); > > +void lsm_free_ema_map(atomic_long_t *); > > +#else > > +static inline void lsm_ema_global_init(size_t ema_data_size) > > +{ > > +} > > + > > +static inline void lsm_free_ema_map(atomic_long_t *p) > > +{ > > +} > > +#endif > > + > > +/** > > + * Below are APIs to be used by LSM modules > > + */ > > + > > +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *); > > +struct lsm_ema *lsm_alloc_ema(void); > > Do you mean security_alloc_enclave()? > That would go into security/security.h No. Neither lsm_alloc_ema() above, nor lsm_free_ema() below, is LSM hook. They are APIs to deal with EMAs. > > > +void lsm_free_ema(struct lsm_ema *); > > Do you mean security_free_enclave()? > That would go into security/security.h > > > +void lsm_init_ema(struct lsm_ema *, size_t, size_t, struct file *); > > This goes in the enclave LSM. There's NO enclave LSM. This patch is introducing new LSM hooks applicable to all LSM modules, but not introducing new LSM modules. > > > +int lsm_merge_ema(struct lsm_ema *, struct lsm_ema_map *); > > +struct lsm_ema *lsm_split_ema(struct lsm_ema *, size_t, struct > lsm_ema_map *); > > + > > +static inline struct lsm_ema_map *lsm_get_ema_map(struct file *f) > > +{ > > + return (void *)atomic_long_read(f->f_security); > > +} > > + > > +static inline int __must_check lsm_lock_ema(struct lsm_ema_map *map) > > +{ > > + return mutex_lock_interruptible(&map->lock); > > +} > > + > > +static inline void lsm_unlock_ema(struct lsm_ema_map *map) > > +{ > > + mutex_unlock(&map->lock); > > +} > > + > > +static inline struct lsm_ema *lsm_prev_ema(struct lsm_ema *p, > > + struct lsm_ema_map *map) > > +{ > > + p = list_prev_entry(p, link); > > + return &p->link == &map->list ? 
NULL : p; > > +} > > + > > +static inline struct lsm_ema *lsm_next_ema(struct lsm_ema *p, > > + struct lsm_ema_map *map) > > +{ > > + p = list_next_entry(p, link); > > + return &p->link == &map->list ? NULL : p; > > +} > > + > > +static inline struct lsm_ema *lsm_find_ema(struct lsm_ema_map *map, > size_t a) > > +{ > > + struct lsm_ema *p; > > + > > + BUG_ON(!mutex_is_locked(&map->lock)); > > + > > + list_for_each_entry(p, &map->list, link) > > + if (a < p->end) > > + break; > > + return &p->link == &map->list ? NULL : p; > > +} > > + > > +static inline int lsm_insert_ema(struct lsm_ema_map *map, struct > lsm_ema *n) > > +{ > > + struct lsm_ema *p = lsm_find_ema(map, n->start); > > + > > + if (!p) > > + list_add_tail(&n->link, &map->list); > > + else if (n->end <= p->start) > > + list_add_tail(&n->link, &p->link); > > + else > > + return -EEXIST; > > + > > + lsm_merge_ema(n, map); > > + if (p) > > + lsm_merge_ema(p, map); > > + return 0; > > +} > > + > > +static inline int lsm_for_each_ema(struct lsm_ema_map *map, size_t > start, > > + size_t end, int (*cb)(struct lsm_ema *, > > + void *), void *arg) > > +{ > > + struct lsm_ema *ema; > > + int rc; > > + > > + ema = lsm_find_ema(map, start); > > + while (ema && end > ema->start) { > > + if (start > ema->start) > > + lsm_split_ema(ema, start, map); > > + if (end < ema->end) > > + ema = lsm_split_ema(ema, end, map); > > + > > + rc = (*cb)(ema, arg); > > + lsm_merge_ema(ema, map); > > + if (rc) > > + return rc; > > + > > + ema = lsm_next_ema(ema, map); > > + } > > + > > + if (ema) > > + lsm_merge_ema(ema, map); > > + return 0; > > +} > > There is no way that these belong as part of the LSM > infrastructure. If you need an enclave management API > you need to find some other place for it. They are NO enclave management APIs. They don't manage enclaves. They track origins of enclave pages. They are needed by all LSMs. 
As I stated in the cover letter, the primary question is how to prevent SGX from being abused as a backdoor to make executable pages that would otherwise not be executable without SGX. Any LSM module unaware of that would leave that "hole" open. So tracking enclave pages will become a common task for all LSMs that care about page protections, and that's why I placed it inside the LSM infrastructure.
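To make the tracking being debated concrete, here is a minimal userspace sketch of an address-sorted range list with overlap rejection, loosely modelled on lsm_find_ema() and lsm_insert_ema() from the quoted patch. It is an illustration only: struct range, range_find() and range_insert() are hypothetical names, a plain singly linked list replaces the kernel's list_head, and the locking, merging and per-LSM data of the patch are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct lsm_ema: one tracked range [start, end). */
struct range {
	size_t start;
	size_t end;
	struct range *next;
};

/* Return the first range whose end lies above addr, or NULL if there is none. */
struct range *range_find(struct range *head, size_t addr)
{
	struct range *p;

	for (p = head; p; p = p->next)
		if (addr < p->end)
			return p;
	return NULL;
}

/*
 * Insert [start, end) into the address-sorted list at *head.
 * Returns 0 on success, -EEXIST if the new range overlaps an existing
 * one, or -ENOMEM if the allocation fails.
 */
int range_insert(struct range **head, size_t start, size_t end)
{
	struct range **link = head;
	struct range *n;

	assert(start < end);

	/* Skip every range that ends at or below the new start. */
	while (*link && (*link)->end <= start)
		link = &(*link)->next;

	/* The following range, if any, must begin at or after the new end. */
	if (*link && (*link)->start < end)
		return -EEXIST;

	n = malloc(sizeof(*n));
	if (!n)
		return -ENOMEM;
	n->start = start;
	n->end = end;
	n->next = *link;
	*link = n;
	return 0;
}
```

The sketch keeps adjacent ranges separate; the patch additionally merges neighbours whose source file and per-LSM data compare equal.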
On 6/27/2019 3:52 PM, Xing, Cedric wrote: > Hi Casey, > >> From: Casey Schaufler [mailto:casey@schaufler-ca.com] >> Sent: Thursday, June 27, 2019 3:07 PM >> >> Don't use "lsm_ema". This isn't LSM infrastructure. >> Three letter abbreviations are easy to type, but are doomed to encounter >> conflicts and lead to confusion. >> I suggest that you use "enclave", because it doesn't start off >> conflicting with anything and is descriptive. >> >> This code should not be mixed in with the LSM infrastructure. >> It should all be contained in its own module, under security/enclave. > lsm_ema is *intended* to be part of the LSM infrastructure. That's not going to fly, not for a minute. > It is going to be shared among all LSMs that would like to track enclave pages and their origins. That's true for InfiniBand, tun and sctp as well. Look at their implementations. > And they could be extended to store more information as deemed appropriate by the LSM module. Which is what blobs are for, but that does not appear to be how you're using either the file blob or your new ema blob. > The last patch of this series shows how to extend EMA inside SELinux. I don't see (but I admit the code doesn't make a lot of sense to me) anything you couldn't do in the SELinux code by adding data to the file blob. The data you're adding to the LSM infrastructure doesn't belong there, and it doesn't need to be there. > >>> diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h new >>> file mode 100644 index 000000000000..a09b8f96da05 >>> --- /dev/null >>> +++ b/include/linux/lsm_ema.h >> There's no need for this header to be used outside the enclave >> LSM. It should be "security/enclave/enclave.h" > This header file is supposed to be used by all LSM modules, similar to lsm_hooks.h. Hence it is placed in the same location. 
> >> >>> @@ -0,0 +1,171 @@ >>> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */ >>> +/** >>> + * Enclave Memory Area interface for LSM modules >>> + * >>> + * Copyright(c) 2016-19 Intel Corporation. >>> + */ >>> + >>> +#ifndef _LSM_EMA_H_ >>> +#define _LSM_EMA_H_ >>> + >>> +#include <linux/list.h> >>> +#include <linux/mutex.h> >>> +#include <linux/fs.h> >>> +#include <linux/file.h> >>> + >>> +/** >>> + * lsm_ema - LSM Enclave Memory Area structure >> How about s/lsm_ema/enclave/ ? > I understand your suggestion, but this structure is shared among all LSMs. And I think lsm_ema is pretty descriptive without being too verbose. > >>> + * >>> + * Data structure to track origins of enclave pages >>> + * >>> + * @link: >>> + * Link to adjacent EMAs. EMAs are sorted by their addresses in >> ascending >>> + * order >>> + * @start: >>> + * Starting address >>> + * @end: >>> + * Ending address >>> + * @source: >>> + * File from which this range was loaded from, or NULL if not loaded >> from >>> + * any files >>> + */ >>> +struct lsm_ema { >>> + struct list_head link; >>> + size_t start; >>> + size_t end; >>> + struct file *source; >>> +}; >>> + >>> +#define lsm_ema_data(ema, blob_sizes) \ >>> + ((char *)((struct lsm_ema *)(ema) + 1) + blob_sizes.lbs_ema_data) >> Who uses this? The enclave LSM? Convention would have this >> be selinux_enclave(ema) for the SELinux code. This is >> inconsistent with the way other blobs are handled. > This is to be used in various LSMs. As you can see in the last patch of this series, selinux_ema() is defined as a wrapper of this macro. 
> >>> + >>> +/** >>> + * lsm_ema_map - LSM Enclave Memory Map structure >> enclave_map >> >>> + * >>> + * Container for EMAs of an enclave >>> + * >>> + * @list: >>> + * Head of a list of sorted EMAs >>> + * @lock: >>> + * Acquire before querying/updateing the list EMAs >>> + */ >>> +struct lsm_ema_map { >>> + struct list_head list; >>> + struct mutex lock; >>> +}; >>> + >>> +/** >>> + * These are functions to be used by the LSM framework, and must be >> defined >>> + * regardless CONFIG_INTEL_SGX is enabled or not. >> Not acceptable for the LSM infrastructure. They >> are inconsistent with the way data is used there. > I'm not sure I understand this comment. It means that your definition and use of the lsm_ema_blob does not match the way other blobs are managed and used. The LSM infrastructure uses these entries in a very particular way, and you're trying to use it differently. Your might be able to change the rest of the enclave system to use it correctly, or you might be able to find a different place for it. >>> + */ >>> + >>> +#ifdef CONFIG_INTEL_SGX >>> +void lsm_ema_global_init(size_t); >>> +void lsm_free_ema_map(atomic_long_t *); >>> +#else >>> +static inline void lsm_ema_global_init(size_t ema_data_size) >>> +{ >>> +} >>> + >>> +static inline void lsm_free_ema_map(atomic_long_t *p) >>> +{ >>> +} >>> +#endif >>> + >>> +/** >>> + * Below are APIs to be used by LSM modules >>> + */ >>> + >>> +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *); >>> +struct lsm_ema *lsm_alloc_ema(void); >> Do you mean security_alloc_enclave()? >> That would go into security/security.h > No. Neither lsm_alloc_ema() above, nor lsm_free_ema() below, is LSM hook. They are APIs to deal with EMAs. > >>> +void lsm_free_ema(struct lsm_ema *); >> Do you mean security_free_enclave()? >> That would go into security/security.h >> >>> +void lsm_init_ema(struct lsm_ema *, size_t, size_t, struct file *); >> This goes in the enclave LSM. > There's NO enclave LSM. 
This patch is introducing new LSM hooks applicable to all LSM modules, but not introducing new LSM modules. > >>> +int lsm_merge_ema(struct lsm_ema *, struct lsm_ema_map *); >>> +struct lsm_ema *lsm_split_ema(struct lsm_ema *, size_t, struct >> lsm_ema_map *); >>> + >>> +static inline struct lsm_ema_map *lsm_get_ema_map(struct file *f) >>> +{ >>> + return (void *)atomic_long_read(f->f_security); >>> +} >>> + >>> +static inline int __must_check lsm_lock_ema(struct lsm_ema_map *map) >>> +{ >>> + return mutex_lock_interruptible(&map->lock); >>> +} >>> + >>> +static inline void lsm_unlock_ema(struct lsm_ema_map *map) >>> +{ >>> + mutex_unlock(&map->lock); >>> +} >>> + >>> +static inline struct lsm_ema *lsm_prev_ema(struct lsm_ema *p, >>> + struct lsm_ema_map *map) >>> +{ >>> + p = list_prev_entry(p, link); >>> + return &p->link == &map->list ? NULL : p; >>> +} >>> + >>> +static inline struct lsm_ema *lsm_next_ema(struct lsm_ema *p, >>> + struct lsm_ema_map *map) >>> +{ >>> + p = list_next_entry(p, link); >>> + return &p->link == &map->list ? NULL : p; >>> +} >>> + >>> +static inline struct lsm_ema *lsm_find_ema(struct lsm_ema_map *map, >> size_t a) >>> +{ >>> + struct lsm_ema *p; >>> + >>> + BUG_ON(!mutex_is_locked(&map->lock)); >>> + >>> + list_for_each_entry(p, &map->list, link) >>> + if (a < p->end) >>> + break; >>> + return &p->link == &map->list ? 
NULL : p; >>> +} >>> + >>> +static inline int lsm_insert_ema(struct lsm_ema_map *map, struct >> lsm_ema *n) >>> +{ >>> + struct lsm_ema *p = lsm_find_ema(map, n->start); >>> + >>> + if (!p) >>> + list_add_tail(&n->link, &map->list); >>> + else if (n->end <= p->start) >>> + list_add_tail(&n->link, &p->link); >>> + else >>> + return -EEXIST; >>> + >>> + lsm_merge_ema(n, map); >>> + if (p) >>> + lsm_merge_ema(p, map); >>> + return 0; >>> +} >>> + >>> +static inline int lsm_for_each_ema(struct lsm_ema_map *map, size_t >> start, >>> + size_t end, int (*cb)(struct lsm_ema *, >>> + void *), void *arg) >>> +{ >>> + struct lsm_ema *ema; >>> + int rc; >>> + >>> + ema = lsm_find_ema(map, start); >>> + while (ema && end > ema->start) { >>> + if (start > ema->start) >>> + lsm_split_ema(ema, start, map); >>> + if (end < ema->end) >>> + ema = lsm_split_ema(ema, end, map); >>> + >>> + rc = (*cb)(ema, arg); >>> + lsm_merge_ema(ema, map); >>> + if (rc) >>> + return rc; >>> + >>> + ema = lsm_next_ema(ema, map); >>> + } >>> + >>> + if (ema) >>> + lsm_merge_ema(ema, map); >>> + return 0; >>> +} >> There is no way that these belong as part of the LSM >> infrastructure. If you need an enclave management API >> you need to find some other place for it. > They are NO enclave management APIs. They don't manage enclaves. They track origins of enclave pages. They are needed by all LSMs. > > As I stated in the cover letter, the primary question is how to prevent SGX from being abused as a backdoor to make executable pages that would otherwise not be executable without SGX. Any LSM module unaware of that would leave that "hole" open. So tracking enclave pages will become a common task for all LSMs that care page protections, and that's why I place it inside LSM infrastructure. Page protections are an important part of many security features, but that's beside the point. The LSM system provides mechanism for providing additional restrictions to existing security mechanisms. 
First, you create the security mechanism (e.g. enclaves) then you add LSM hooks so that security modules (e.g. SELinux) can apply their own policies in addition. In support of this, the LSM blob mechanism allows security modules to maintain their own information about the system components (e.g. file, inode, cred, task) they care about. The LSM infrastructure does not itself provide or support security data or policy. That's strictly for the modules to do.
> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> Sent: Thursday, June 27, 2019 4:37 PM
>
> >> This code should not be mixed in with the LSM infrastructure.
> >> It should all be contained in its own module, under security/enclave.
> > lsm_ema is *intended* to be part of the LSM infrastructure.
>
> That's not going to fly, not for a minute.

Why not, if there's a need for it?

And what's the concern here if it becomes part of the LSM infrastructure?

> > It is going to be shared among all LSMs that would like to track
> > enclave pages and their origins.
>
> That's true for InfiniBand, tun and sctp as well. Look at their
> implementations.

As far as I can tell, InfiniBand, tun and sctp all seem to be used inside
SELinux only.

If you had a chance to look at v1 of my series, I started by burying
everything inside SELinux too. But Stephen pointed out such tracking would
be needed by all LSMs so code duplication might be a concern. Thus I
responded by moving it into LSM infrastructure.

> > And they could be extended to store more information as deemed
> > appropriate by the LSM module.
>
> Which is what blobs are for, but that does not appear to be how
> you're using either the file blob or your new ema blob.

A lsm_ema_map pointer is stored in file->f_security. Each lsm_ema_map
contains a list of lsm_ema structures. In my last patch, SELinux stores a
ema_security_struct with every ema, by setting
selinux_blob_sizes.lbs_ema_data to sizeof(ema_security_struct).

ema_security_struct is initialized in selinux_enclave_load(), and checked
in enclave_mprotect(), which is a subroutine of selinux_file_mprotect().
BTW, it is alloced/freed automatically by LSM infrastructure in
security_enclave_load()/security_file_free().

> > The last patch of this series shows how to extend EMA inside SELinux.
>
> I don't see (but I admit the code doesn't make a lot of sense to me)
> anything you couldn't do in the SELinux code by adding data to the
> file blob. The data you're adding to the LSM infrastructure doesn't
> belong there, and it doesn't need to be there.

You are correct. My v1 did it inside SELinux.

The key question I think is whether only SELinux needs it, or all LSMs need
it. Stephen thought it was the latter (and I agree with him) so I moved it
into the LSM infrastructure to be shared, just like the auditing code.

> >> Not acceptable for the LSM infrastructure. They
> >> are inconsistent with the way data is used there.
> > I'm not sure I understand this comment.
>
> It means that your definition and use of the lsm_ema_blob
> does not match the way other blobs are managed and used.
> The LSM infrastructure uses these entries in a very particular
> way, and you're trying to use it differently. You might be
> able to change the rest of the enclave system to use it
> correctly, or you might be able to find a different place
> for it.

I'm still not sure why you think this (lbs_ema_data) is inconsistent with
other blobs.

Same as all other blobs, an LSM requests it by storing the needed size in
it, and is assigned an offset, and the buffer is allocated/freed by the
infrastructure. Am I missing anything?

> > As I stated in the cover letter, the primary question is how to
> > prevent SGX from being abused as a backdoor to make executable pages
> > that would otherwise not be executable without SGX. Any LSM module
> > unaware of that would leave that "hole" open. So tracking enclave pages
> > will become a common task for all LSMs that care about page protections,
> > and that's why I place it inside LSM infrastructure.
>
> Page protections are an important part of many security features,
> but that's beside the point. The LSM system provides mechanism for
> providing additional restrictions to existing security mechanisms.
> First, you create the security mechanism (e.g. enclaves) then you
> add LSM hooks so that security modules (e.g. SELinux) can apply
> their own policies in addition. In support of this, the LSM blob
> mechanism allows security modules to maintain their own information
> about the system components (e.g. file, inode, cred, task) they
> care about. The LSM infrastructure does not itself provide or
> support security data or policy. That's strictly for the modules
> to do.

Agreed!

EMA doesn't dictate policies for sure. Is it considered "security data"?
I'm not sure of the definition of "security data" here. It does store some
"data", something that multiple LSM modules would need to duplicate if not
pulled into a common place. It is meant to be a "helper" data structure,
just like the auditing code.

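
For readers following the blob argument, the "store the needed size, get an offset back" scheme mentioned above can be modelled in a few lines of userspace C. This is only a sketch under assumptions: set_blob_size(), module_blob() and struct blob_sizes are hypothetical names that imitate the pattern, not the kernel's actual definitions.

```c
#include <assert.h>

/* Hypothetical, userspace-only model of per-module blob sizing. */
struct blob_sizes {
	int lbs_file;
	int lbs_ema_data;
};

/*
 * On entry *need holds the number of bytes a module wants. On return it
 * holds the offset of that module's region inside the shared blob, and
 * *total has grown by the requested size.
 */
void set_blob_size(int *need, int *total)
{
	int offset;

	assert(need && total);

	if (*need > 0) {
		offset = *total;
		*total += *need;
		*need = offset;
	}
}

/* Locate one module's region inside a blob allocated by the framework. */
void *module_blob(void *blob, int offset)
{
	return (char *)blob + offset;
}
```

Two modules asking for 16 and 24 bytes would end up at offsets 0 and 16 of a 40-byte allocation; each module touches only its own region, and the allocator never interprets the contents.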
On 6/27/19 4:19 PM, Xing, Cedric wrote:
>> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
>> owner@vger.kernel.org] On Behalf Of Stephen Smalley
>> Sent: Tuesday, June 25, 2019 2:10 PM
>>
>> On 6/21/19 5:22 PM, Xing, Cedric wrote:
>>>> From: Christopherson, Sean J
>>>> Sent: Wednesday, June 19, 2019 3:24 PM
>>>>
>>>> Intended use of each permission:
>>>>
>>>> - SGX_EXECDIRTY: dynamically load code within the enclave itself
>>>> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
>>>> Graphene
>>>
>>> Why does it matter whether a code page is measured or not?
>>
>> It won't be incorporated into an attestation?
>
> Yes, it will. And because of that, I don't think LSM should care.
>
>>
>>>
>>>> - SGX_EXECANON: load code from anonymous memory (likely Graphene)
>>>
>>> Graphene doesn't load code from anonymous memory. It loads code
>>> dynamically though, as in SGX_EXECDIRTY case.
>>
>> So do we expect EXECANON to never be triggered at all?
>
> I don't think so. And from security perspective, the decision I think shall base on whether the source pages are (allowed to be made) executable.
>
>>
>>>
>>>> - SGX_EXECUTE: load an enclave from a file, i.e. normal behavior
>>>
>>> Why is SGX_EXECUTE needed from security perspective? Or why isn't
>>> FILE__EXECUTE sufficient?
>>
>> Splitting the SGX permissions from the regular ones allows distinctions
>> to be made between what can be executed in the host process and what can
>> be executed in the enclave. The host process may be allowed
>> FILE__EXECUTE to numerous files that do not contain any code ever
>> intended to be executed within the enclave.
>
> Given an enclave and its host process, any executable contents could be allowed in
> 1) Neither the enclave nor the host
> 2) Enclave only
> 3) Host only
> 4) Both the enclave and the host
>
> Given the fact that enclave can access host's memory, if a piece of code is NOT allowed in the host, then it shouldn't be allowed in enclave either. So #2 shall never happen.
>
> An enclave dictates/enforces its own contents cryptographically, so it's unnecessary to enforce #3 by LSM IMO.
>
> Then #1 and #4 are the only 2 cases to be supported - a single FILE__EXECUTE is sufficient.
>
> I'm not objecting to new permissions to make things more explicit, but that'd require updates to user mode tools. I think it's just easier to reuse existing permissions.

FWIW, adding new permissions only requires updating policy
configuration, not userspace code/tools. But in any event, we can reuse
the execute-related permissions if it makes sense but still consider
introducing additional, new permissions, possibly in a separate
"enclave" security class, if we want explicit control over enclave
loading, e.g. ENCLAVE__LOAD, ENCLAVE__INIT, etc.

One residual concern I have with the reuse of FILE__EXECUTE is using it
for the sigstruct file as the fallback case. If the sigstruct is always
part of the same file as the code, then it probably doesn't matter. But
otherwise, it is somewhat odd to have to allow the host process to
execute from the sigstruct file if it is only data (the signature).
On 6/27/19 2:56 PM, Cedric Xing wrote:
> SGX enclaves are loaded from pages in regular memory. Given the ability to
> create executable pages, the newly added SGX subsystem may present a backdoor
> for adversaries to circumvent LSM policies, such as creating an executable
> enclave page from a modified regular page that would otherwise not be made
> executable as prohibited by LSM. Therefore arises the primary question of
> whether an enclave page should be allowed to be created from a given source
> page in regular memory.
>
> A related question is whether to grant/deny a mprotect() request on a given
> enclave page/range. mprotect() is traditionally covered by
> security_file_mprotect() hook, however, enclave pages have a different lifespan
> than either MAP_PRIVATE or MAP_SHARED. Particularly, MAP_PRIVATE pages have the
> same lifespan as the VMA while MAP_SHARED pages have the same lifespan as the
> backing file (on disk), but enclave pages have the lifespan of the enclave’s
> file descriptor. For example, enclave pages could be munmap()’ed then mmap()’ed
> again without losing contents (like MAP_SHARED), but all enclave pages will be
> lost once its file descriptor has been closed (like MAP_PRIVATE). That said,
> LSM modules need some new data structure for tracking protections of enclave
> pages/ranges so that they can make proper decisions at mmap()/mprotect()
> syscalls.
>
> The last question, which is orthogonal to the 2 above, is whether or not to
> allow a given enclave to launch/run. Enclave pages are not visible to the rest
> of the system, so to some extent offer a better place for malicious software to
> hide. Thus, it is sometimes desirable to whitelist/blacklist enclaves by their
> measurements, signing public keys, or image files.
>
> To address the questions above, 2 new LSM hooks are added for enclaves.
>    - security_enclave_load() – This hook allows LSM to decide whether or not to
>      allow instantiation of a range of enclave pages using the specified VMA. It
>      is invoked when a range of enclave pages is about to be loaded. It serves 3
>      purposes: 1) indicate to LSM that the file struct in subject is an enclave;
>      2) allow LSM to decide whether or not to instantiate those pages and 3)
>      allow LSM to initialize internal data structures for tracking
>      origins/protections of those pages.
>    - security_enclave_init() – This hook allows whitelisting/blacklisting or
>      performing whatever checks deemed appropriate before an enclave is allowed
>      to run. An LSM module may opt to use the file backing the SIGSTRUCT as a
>      proxy to dictate allowed protections for anonymous pages.
>
> mprotect() of enclave pages continue to be governed by
> security_file_mprotect(), with the expectation that LSM is able to distinguish
> between regular and enclave pages inside the hook. For mmap(), the SGX
> subsystem is expected to invoke security_file_mprotect() explicitly to check
> protections against the requested protections for existing enclave pages. As
> stated earlier, enclave pages have different lifespan than the existing
> MAP_PRIVATE and MAP_SHARED pages, so would require a new data structure outside
> of VMA to track their protections and/or origins. Enclave Memory Area (or EMA
> for short) has been introduced to address the need. EMAs are maintained by the
> LSM framework for all LSM modules to share. EMAs will be instantiated for
> enclaves only so will not impose memory/performance overheads for regular
> applications/files. Please see include/linux/lsm_ema.h and security/lsm_ema.c
> for details.
>
> A new setup parameter – lsm.ema.cache_decisions has been introduced to offer
> the choice between memory consumption and accuracy of audit logs. Enabling
> lsm.ema.cache_decisions causes LSM framework NOT to keep backing files open for
> EMAs. While that saves memory, it requires LSM modules to make and cache
> decisions ahead of time, and makes it difficult for LSM modules to generate
> accurate audit logs. System administrators are expected to run LSM in
> permissive mode with lsm.ema.cache_decisions off to determine the minimal
> permissions needed, and then turn it back on in enforcing mode for optimal
> performance and memory usage. lsm.ema.cache_decisions is on by default and
> could be turned off by appending “lsm.ema.cache_decisions=0” or
> “lsm.ema.cache_decisions=off” to the kernel command line.

This seems problematic on a few fronts:

- Specifying it as a boot parameter requires teaching admins / policy
developers to do something in addition to what they are already doing
when they want to create policy,

- Limiting it to a boot parameter requires a reboot to change the mode
of operation, whereas SELinux offers runtime toggling of permissive mode
and even per-process (domain) permissive mode (and so does AppArmor),

- In the cache_decisions=1 case, do we get any auditing at all? If not,
that's a problem. We want auditing not only when we are
generating/learning policy but also in operation.

- There is the potential for inconsistencies to arise between the
enforcement applied with different cache_decisions values.

I would suggest that we just never cache the decision and accept the
cost if we are going to take this approach.

>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> ---
>   include/linux/lsm_ema.h   | 171 ++++++++++++++++++++++++++++++++++++++
>   include/linux/lsm_hooks.h |  29 +++++++
>   include/linux/security.h  |  23 +++++
>   security/Makefile         |   1 +
>   security/lsm_ema.c        | 132 +++++++++++++++++++++++++++++
>   security/security.c       |  47 ++++++++++-
>   6 files changed, 402 insertions(+), 1 deletion(-)
>   create mode 100644 include/linux/lsm_ema.h
>   create mode 100644 security/lsm_ema.c
>
> diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h
> new file mode 100644
> index 000000000000..a09b8f96da05
> --- /dev/null
> +++ b/include/linux/lsm_ema.h
> @@ -0,0 +1,171 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +/**
> + * Enclave Memory Area interface for LSM modules
> + *
> + * Copyright(c) 2016-19 Intel Corporation.
> + */
> +
> +#ifndef _LSM_EMA_H_
> +#define _LSM_EMA_H_
> +
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +
> +/**
> + * lsm_ema - LSM Enclave Memory Area structure
> + *
> + * Data structure to track origins of enclave pages
> + *
> + * @link:
> + *	Link to adjacent EMAs. EMAs are sorted by their addresses in ascending
> + *	order
> + * @start:
> + *	Starting address
> + * @end:
> + *	Ending address
> + * @source:
> + *	File from which this range was loaded from, or NULL if not loaded from
> + *	any files
> + */
> +struct lsm_ema {
> +	struct list_head	link;
> +	size_t			start;
> +	size_t			end;
> +	struct file		*source;
> +};
> +
> +#define lsm_ema_data(ema, blob_sizes)	\
> +	((char *)((struct lsm_ema *)(ema) + 1) + blob_sizes.lbs_ema_data)
> +
> +/**
> + * lsm_ema_map - LSM Enclave Memory Map structure
> + *
> + * Container for EMAs of an enclave
> + *
> + * @list:
> + *	Head of a list of sorted EMAs
> + * @lock:
> + *	Acquire before querying/updateing the list EMAs
> + */
> +struct lsm_ema_map {
> +	struct list_head	list;
> +	struct mutex		lock;
> +};
> +
> +/**
> + * These are functions to be used by the LSM framework, and must be defined
> + * regardless CONFIG_INTEL_SGX is enabled or not.
> + */
> +
> +#ifdef CONFIG_INTEL_SGX
> +void lsm_ema_global_init(size_t);
> +void lsm_free_ema_map(atomic_long_t *);
> +#else
> +static inline void lsm_ema_global_init(size_t ema_data_size)
> +{
> +}
> +
> +static inline void lsm_free_ema_map(atomic_long_t *p)
> +{
> +}
> +#endif
> +
> +/**
> + * Below are APIs to be used by LSM modules
> + */
> +
> +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *);
> +struct lsm_ema *lsm_alloc_ema(void);
> +void lsm_free_ema(struct lsm_ema *);
> +void lsm_init_ema(struct lsm_ema *, size_t, size_t, struct file *);
> +int lsm_merge_ema(struct lsm_ema *, struct lsm_ema_map *);
> +struct lsm_ema *lsm_split_ema(struct lsm_ema *, size_t, struct lsm_ema_map *);
> +
> +static inline struct lsm_ema_map *lsm_get_ema_map(struct file *f)
> +{
> +	return (void *)atomic_long_read(f->f_security);
> +}
> +
> +static inline int __must_check lsm_lock_ema(struct lsm_ema_map *map)
> +{
> +	return mutex_lock_interruptible(&map->lock);
> +}
> +
> +static inline void lsm_unlock_ema(struct lsm_ema_map *map)
> +{
> +	mutex_unlock(&map->lock);
> +}
> +
> +static inline struct lsm_ema *lsm_prev_ema(struct lsm_ema *p,
> +					   struct lsm_ema_map *map)
> +{
> +	p = list_prev_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct lsm_ema *lsm_next_ema(struct lsm_ema *p,
> +					   struct lsm_ema_map *map)
> +{
> +	p = list_next_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct lsm_ema *lsm_find_ema(struct lsm_ema_map *map, size_t a)
> +{
> +	struct lsm_ema *p;
> +
> +	BUG_ON(!mutex_is_locked(&map->lock));
> +
> +	list_for_each_entry(p, &map->list, link)
> +		if (a < p->end)
> +			break;
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline int lsm_insert_ema(struct lsm_ema_map *map, struct lsm_ema *n)
> +{
> +	struct lsm_ema *p = lsm_find_ema(map, n->start);
> +
> +	if (!p)
> +		list_add_tail(&n->link, &map->list);
> +	else if (n->end <= p->start)
> +		list_add_tail(&n->link, &p->link);
> +	else
> +		return -EEXIST;
> +
> +	lsm_merge_ema(n, map);
> +	if (p)
> +		lsm_merge_ema(p, map);
> +	return 0;
> +}
> +
> +static inline int lsm_for_each_ema(struct lsm_ema_map *map, size_t start,
> +				   size_t end, int (*cb)(struct lsm_ema *,
> +				   void *), void *arg)
> +{
> +	struct lsm_ema *ema;
> +	int rc;
> +
> +	ema = lsm_find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			lsm_split_ema(ema, start, map);
> +		if (end < ema->end)
> +			ema = lsm_split_ema(ema, end, map);
> +
> +		rc = (*cb)(ema, arg);
> +		lsm_merge_ema(ema, map);
> +		if (rc)
> +			return rc;
> +
> +		ema = lsm_next_ema(ema, map);
> +	}
> +
> +	if (ema)
> +		lsm_merge_ema(ema, map);
> +	return 0;
> +}
> +
> +#endif /* _LSM_EMA_H_ */
> diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
> index 47f58cfb6a19..ade1f9f81e64 100644
> --- a/include/linux/lsm_hooks.h
> +++ b/include/linux/lsm_hooks.h
> @@ -29,6 +29,8 @@
>  #include <linux/init.h>
>  #include <linux/rculist.h>
>
> +struct lsm_ema;
> +
>  /**
>   * union security_list_options - Linux Security Module hook function list
>   *
> @@ -1446,6 +1448,21 @@
>   * @bpf_prog_free_security:
>   *	Clean up the security information stored inside bpf prog.
>   *
> + * @enclave_load:
> + *	Decide if a range of pages shall be allowed to be loaded into an
> + *	enclave
> + *
> + *	@encl points to the file identifying the target enclave
> + *	@ema specifies the target range to be loaded
> + *	@flags contains protections being requested for the target range
> + *	@source points to the VMA containing the source pages to be loaded
> + *
> + * @enclave_init:
> + *	Decide if an enclave shall be allowed to launch
> + *
> + *	@encl points to the file identifying the target enclave being launched
> + *	@sigstruct contains a copy of the SIGSTRUCT in kernel memory
> + *	@source points to the VMA backing SIGSTRUCT in user memory
>   */
>  union security_list_options {
>  	int (*binder_set_context_mgr)(struct task_struct *mgr);
> @@ -1807,6 +1824,13 @@ union security_list_options {
>  	int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
>  	void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
>  #endif /* CONFIG_BPF_SYSCALL */
> +
> +#ifdef CONFIG_INTEL_SGX
> +	int (*enclave_load)(struct file *encl, struct lsm_ema *ema,
> +			    size_t flags, struct vm_area_struct *source);
> +	int (*enclave_init)(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			    struct vm_area_struct *source);
> +#endif
>  };
>
>  struct security_hook_heads {
> @@ -2046,6 +2070,10 @@ struct security_hook_heads {
>  	struct hlist_head bpf_prog_alloc_security;
>  	struct hlist_head bpf_prog_free_security;
>  #endif /* CONFIG_BPF_SYSCALL */
> +#ifdef CONFIG_INTEL_SGX
> +	struct hlist_head enclave_load;
> +	struct hlist_head enclave_init;
> +#endif
>  } __randomize_layout;
>
>  /*
> @@ -2069,6 +2097,7 @@ struct lsm_blob_sizes {
>  	int	lbs_ipc;
>  	int	lbs_msg_msg;
>  	int	lbs_task;
> +	int	lbs_ema_data;
>  };
>
>  /*
> diff --git a/include/linux/security.h b/include/linux/security.h
> index 659071c2e57c..52c200810004 100644
> --- a/include/linux/security.h
> +++ b/include/linux/security.h
> @@ -1829,5 +1829,28 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
>  #endif /* CONFIG_SECURITY */
>  #endif /* CONFIG_BPF_SYSCALL */
>
> +#ifdef CONFIG_INTEL_SGX
> +struct sgx_sigstruct;
> +#ifdef CONFIG_SECURITY
> +int security_enclave_load(struct file *encl, size_t start, size_t end,
> +			  size_t flags, struct vm_area_struct *source);
> +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			  struct vm_area_struct *source);
> +#else
> +static inline int security_enclave_load(struct file *encl, size_t start,
> +					size_t end, struct vm_area_struct *src)
> +{
> +	return 0;
> +}
> +
> +static inline int security_enclave_init(struct file *encl,
> +					struct sgx_sigstruct *sigstruct,
> +					struct vm_area_struct *src)
> +{
> +	return 0;
> +}
> +#endif /* CONFIG_SECURITY */
> +#endif /* CONFIG_INTEL_SGX */
> +
>  #endif /* ! __LINUX_SECURITY_H */
>
> diff --git a/security/Makefile b/security/Makefile
> index c598b904938f..1bab8f1344b6 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>  obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>  obj-$(CONFIG_SECURITY_SAFESETID)	+= safesetid/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
> +obj-$(CONFIG_INTEL_SGX)			+= lsm_ema.o
>
>  # Object integrity file lists
>  subdir-$(CONFIG_INTEGRITY)		+= integrity
> diff --git a/security/lsm_ema.c b/security/lsm_ema.c
> new file mode 100644
> index 000000000000..68fae0724d37
> --- /dev/null
> +++ b/security/lsm_ema.c
> @@ -0,0 +1,132 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2016-18 Intel Corporation.
> +
> +#include <linux/lsm_ema.h>
> +#include <linux/slab.h>
> +
> +static struct kmem_cache *lsm_ema_cache;
> +static size_t lsm_ema_data_size;
> +static int lsm_ema_cache_decisions = 1;
> +
> +void lsm_ema_global_init(size_t ema_size)
> +{
> +	BUG_ON(lsm_ema_data_size > 0);
> +
> +	lsm_ema_data_size = ema_size;
> +
> +	ema_size += sizeof(struct lsm_ema);
> +	ema_size = max(ema_size, sizeof(struct lsm_ema_map));
> +	lsm_ema_cache = kmem_cache_create("lsm_ema_cache", ema_size,
> +					  __alignof__(struct lsm_ema),
> +					  SLAB_PANIC, NULL);
> +
> +}
> +
> +struct lsm_ema_map *lsm_init_or_get_ema_map(atomic_long_t *p)
> +{
> +	struct lsm_ema_map *map;
> +
> +	map = (typeof(map))atomic_long_read(p);
> +	if (!map) {
> +		long n;
> +
> +		map = (typeof(map))lsm_alloc_ema();
> +		if (!map)
> +			return NULL;
> +
> +		INIT_LIST_HEAD(&map->list);
> +		mutex_init(&map->lock);
> +
> +		n = atomic_long_cmpxchg(p, 0, (long)map);
> +		if (n) {
> +			atomic_long_t a;
> +			atomic_long_set(&a, (long)map);
> +			map = (typeof(map))n;
> +			lsm_free_ema_map(&a);
> +		}
> +	}
> +	return map;
> +}
> +
> +void lsm_free_ema_map(atomic_long_t *p)
> +{
> +	struct lsm_ema_map *map;
> +	struct lsm_ema *ema, *n;
> +
> +	map = (typeof(map))atomic_long_read(p);
> +	if (!map)
> +		return;
> +
> +	BUG_ON(mutex_is_locked(&map->lock));
> +
> +	list_for_each_entry_safe(ema, n, &map->list, link)
> +		lsm_free_ema(ema);
> +	kmem_cache_free(lsm_ema_cache, map);
> +}
> +
> +struct lsm_ema *lsm_alloc_ema(void)
> +{
> +	return kmem_cache_zalloc(lsm_ema_cache, GFP_KERNEL);
> +}
> +
> +void lsm_free_ema(struct lsm_ema *ema)
> +{
> +	list_del(&ema->link);
> +	if (ema->source)
> +		fput(ema->source);
> +	kmem_cache_free(lsm_ema_cache, ema);
> +}
> +
> +void lsm_init_ema(struct lsm_ema *ema, size_t start, size_t end,
> +		  struct file *source)
> +{
> +	INIT_LIST_HEAD(&ema->link);
> +	ema->start = start;
> +	ema->end = end;
> +	if (!lsm_ema_cache_decisions && source)
> +		ema->source = get_file(source);
> +}
> +
> +int lsm_merge_ema(struct lsm_ema *p, struct lsm_ema_map *map)
> +{
> +	struct lsm_ema *prev = list_prev_entry(p, link);
> +
> +	BUG_ON(!mutex_is_locked(&map->lock));
> +
> +	if (&prev->link == &map->list || prev->end != p->start ||
> +	    prev->source != p->source ||
> +	    memcmp(prev + 1, p + 1, lsm_ema_data_size))
> +		return 0;
> +
> +	p->start = prev->start;
> +	fput(prev->source);
> +	lsm_free_ema(prev);
> +	return 1;
> +}
> +
> +struct lsm_ema *lsm_split_ema(struct lsm_ema *p, size_t at,
> +			      struct lsm_ema_map *map)
> +{
> +	struct lsm_ema *n;
> +
> +	BUG_ON(!mutex_is_locked(&map->lock));
> +
> +	if (at <= p->start || at >= p->end)
> +		return p;
> +
> +	n = lsm_alloc_ema();
> +	if (likely(n)) {
> +		lsm_init_ema(n, p->start, at, p->source);
> +		memcpy(n + 1, p + 1, lsm_ema_data_size);
> +		p->start = at;
> +		list_add_tail(&n->link, &p->link);
> +	}
> +	return n;
> +}
> +
> +static int __init set_ema_cache_decisions(char *str)
> +{
> +	lsm_ema_cache_decisions = (strcmp(str, "0") && strcmp(str, "off"));
> +	return 1;
> +}
> +__setup("lsm.ema.cache_decisions=", set_ema_cache_decisions);
> diff --git a/security/security.c b/security/security.c
> index f493db0bf62a..d50883f18be2 100644
> --- a/security/security.c
> +++ b/security/security.c
> @@ -17,6 +17,7 @@
>  #include <linux/init.h>
>  #include <linux/kernel.h>
>  #include <linux/lsm_hooks.h>
> +#include <linux/lsm_ema.h>
>  #include <linux/integrity.h>
>  #include <linux/ima.h>
>  #include <linux/evm.h>
> @@ -41,7 +42,9 @@ static struct kmem_cache *lsm_file_cache;
>  static struct kmem_cache *lsm_inode_cache;
>
>  char *lsm_names;
> -static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init;
> +static struct lsm_blob_sizes blob_sizes __lsm_ro_after_init = {
> +	.lbs_file = sizeof(atomic_long_t) * IS_ENABLED(CONFIG_INTEL_SGX),
> +};
>
>  /* Boot-time LSM user choice */
>  static __initdata const char *chosen_lsm_order;
> @@ -169,6 +172,7 @@ static void __init lsm_set_blob_sizes(struct lsm_blob_sizes *needed)
>  	lsm_set_blob_size(&needed->lbs_ipc, &blob_sizes.lbs_ipc);
>  	lsm_set_blob_size(&needed->lbs_msg_msg, &blob_sizes.lbs_msg_msg);
>  	lsm_set_blob_size(&needed->lbs_task, &blob_sizes.lbs_task);
> +	lsm_set_blob_size(&needed->lbs_ema_data, &blob_sizes.lbs_ema_data);
>  }
>
>  /* Prepare LSM for initialization. */
> @@ -314,6 +318,7 @@ static void __init ordered_lsm_init(void)
>  	lsm_inode_cache = kmem_cache_create("lsm_inode_cache",
>  					    blob_sizes.lbs_inode, 0,
>  					    SLAB_PANIC, NULL);
> +	lsm_ema_global_init(blob_sizes.lbs_ema_data);
>
>  	lsm_early_cred((struct cred *) current->cred);
>  	lsm_early_task(current);
> @@ -1357,6 +1362,7 @@ void security_file_free(struct file *file)
>  	blob = file->f_security;
>  	if (blob) {
>  		file->f_security = NULL;
> +		lsm_free_ema_map(blob);
>  		kmem_cache_free(lsm_file_cache, blob);
>  	}
>  }
> @@ -1420,6 +1426,7 @@ int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
>  {
>  	return call_int_hook(file_mprotect, 0, vma, reqprot, prot);
>  }
> +EXPORT_SYMBOL(security_file_mprotect);
>
>  int security_file_lock(struct file *file, unsigned int cmd)
>  {
> @@ -2355,3 +2362,41 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
>  	call_void_hook(bpf_prog_free_security, aux);
>  }
>  #endif /* CONFIG_BPF_SYSCALL */
> +
> +#ifdef CONFIG_INTEL_SGX
> +int security_enclave_load(struct file *encl, size_t start, size_t end,
> +			  size_t flags, struct vm_area_struct *src)
> +{
> +	struct lsm_ema_map *map;
> +	struct lsm_ema *ema;
> +	int rc;
> +
> +	map = lsm_init_or_get_ema_map(encl->f_security);
> +	if (unlikely(!map))
> +		return -ENOMEM;
> +
> +	ema = lsm_alloc_ema();
> +	if (unlikely(!ema))
> +		return -ENOMEM;
> +
> +	lsm_init_ema(ema, start, end, src->vm_file);
> +	rc = call_int_hook(enclave_load, 0, encl, ema, flags, src);
> +	if (!rc)
> +		rc = lsm_lock_ema(map);
> +	if (!rc) {
> +		rc = lsm_insert_ema(map, ema);
> +		lsm_unlock_ema(map);
> +	}
> +	if (rc)
> +		lsm_free_ema(ema);
> +	return rc;
> +}
> +EXPORT_SYMBOL(security_enclave_load);
> +
> +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			  struct vm_area_struct *src)
> +{
> +	return call_int_hook(enclave_init, 0, encl, sigstruct, src);
> +}
> +EXPORT_SYMBOL(security_enclave_init);
> +#endif /* CONFIG_INTEL_SGX */
>
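
The core of the EMA bookkeeping in the patch above is a list of address ranges kept sorted and non-overlapping, with an overlapping insert rejected with -EEXIST. A minimal userspace sketch of that idea follows, assuming half-open [start, end) ranges and a singly linked list. struct range, range_find() and range_insert() are hypothetical names chosen for illustration; merging of adjacent ranges and locking are deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an EMA: one half-open range [start, end). */
struct range {
	size_t start;
	size_t end;
	struct range *next;
};

struct range_map {
	struct range *head;	/* sorted by ascending address */
};

/* Return the first range whose end lies above addr, or NULL if none. */
struct range *range_find(struct range_map *map, size_t addr)
{
	struct range *r;

	for (r = map->head; r; r = r->next)
		if (addr < r->end)
			return r;
	return NULL;
}

/* Insert n in address order; refuse ranges that overlap an existing one. */
int range_insert(struct range_map *map, struct range *n)
{
	struct range **link = &map->head;

	assert(n->start < n->end);

	/* Skip every range that ends at or before the new one starts. */
	while (*link && (*link)->end <= n->start)
		link = &(*link)->next;

	/* The following range, if any, must begin at or after the new end. */
	if (*link && (*link)->start < n->end)
		return -EEXIST;

	n->next = *link;
	*link = n;
	return 0;
}
```

Inserting 0x3000-0x4000, then 0x1000-0x2000, then 0x2000-0x3000 yields three ranges in address order, and a later attempt to add 0x2800-0x3800 fails because it straddles two of them.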
On 6/27/2019 5:47 PM, Xing, Cedric wrote:
>> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
>> Sent: Thursday, June 27, 2019 4:37 PM
>>>> This code should not be mixed in with the LSM infrastructure.
>>>> It should all be contained in its own module, under security/enclave.
>>> lsm_ema is *intended* to be part of the LSM infrastructure.
>> That's not going to fly, not for a minute.
> Why not, if there's a need for it?
>
> And what's the concern here if it becomes part of the LSM infrastructure.

The LSM infrastructure provides a framework for hooks
and allocation of blobs. That's it. It's a layer for
connecting system features like VFS, IPC and the IP stack
to the security modules. It does not implement any policy
of its own. We are not going to implement SGX or any other
mechanism within the LSM infrastructure.

>>> It is going to be shared among all LSMs that would like to track
>>> enclave pages and their origins.
>>
>> That's true for InfiniBand, tun and sctp as well. Look at their
>> implementations.
> As far as I can tell, InfiniBand, tun and sctp, all of them seemed used
> inside SELinux only.

So?

> If you had a chance to look at v1 of my series, I started by burying
> everything inside SELinux too. But Stephen pointed out such tracking
> would be needed by all LSMs so code duplication might be a concern. Thus
> I responded by moving it into LSM infrastructure.

What you need to do is move all the lsm_ema code into its own
place (which could be security/enclave). Manage your internal
data as you like. LSMs (e.g. SELinux) can call your APIs if
needed. If the LSMs need to store SGX information with the file
structure they need to include that in the space they ask for in
the file blob.

>>> And they could be extended to store more information as deemed
>>> appropriate by the LSM module.
>>
>> Which is what blobs are for, but that does not appear to be how
>> you're using either the file blob or your new ema blob.
> A lsm_ema_map pointer is stored in file->f_security.

That's up to the individual security module to decide.

> Each lsm_ema_map contains a list of lsm_ema structures. In my last
> patch, SELinux stores a ema_security_struct with every ema, by setting
> selinux_blob_sizes.lbs_ema_data to sizeof(ema_security_struct).

You are managing the ema map lists. You don't need the LSM
infrastructure to do that.

> ema_security_struct is initialized in selinux_enclave_load(), and
> checked in enclave_mprotect(), which is a subroutine of
> selinux_file_mprotect(). BTW, it is alloced/freed automatically by LSM
> infrastructure in security_enclave_load()/security_file_free().

Do you mean security_enclave_load()/security_enclave_free() ?
There is no way you can possibly have sane behavior if your
allocation and free aren't tied to the same blob.

>>> The last patch of this series shows how to extend EMA inside SELinux.
>> I don't see (but I admit the code doesn't make a lot of sense to me)
>> anything you couldn't do in the SELinux code by adding data to the
>> file blob. The data you're adding to the LSM infrastructure doesn't
>> belong there, and it doesn't need to be there.
> You are correct. My v1 did it inside SELinux.
>
> The key question I think is whether only SELinux needs it, or all LSMs
> need it. Stephen thought it was the latter (and I agree with him) so I
> moved it into the LSM infrastructure to be shared, just like the
> auditing code.

You are both right that it doesn't belong in the SELinux code.
It also doesn't belong as part of the LSM infrastructure.

>>>> Not acceptable for the LSM infrastructure. They
>>>> are inconsistent with the way data is used there.
>>> I'm not sure I understand this comment.
>> It means that your definition and use of the lsm_ema_blob
>> does not match the way other blobs are managed and used.
>> The LSM infrastructure uses these entries in a very particular
>> way, and you're trying to use it differently. You might be
>> able to change the rest of the enclave system to use it
>> correctly, or you might be able to find a different place
>> for it.
> I'm still not sure why you think this (lbs_ema_data) is inconsistent
> with other blobs.
>
> Same as all other blobs, an LSM requests it by storing the needed size
> in it, and is assigned an offset, and the buffer is allocated/freed by
> the infrastructure. Am I missing anything?

Yes. Aside from allocation and deletion the infrastructure does
nothing with the blobs. The blobs are used only by the security
modules. All other data is maintained and used elsewhere. SGX
specific data needs to be maintained and managed elsewhere.

>>> As I stated in the cover letter, the primary question is how to
>>> prevent SGX from being abused as a backdoor to make executable pages
>>> that would otherwise not be executable without SGX. Any LSM module
>>> unaware of that would leave that "hole" open. So tracking enclave pages
>>> will become a common task for all LSMs that care page protections, and
>>> that's why I place it inside LSM infrastructure.
>>
>> Page protections are an important part of many security features,
>> but that's beside the point. The LSM system provides mechanism for
>> providing additional restrictions to existing security mechanisms.
>> First, you create the security mechanism (e.g. enclaves) then you
>> add LSM hooks so that security modules (e.g. SELinux) can apply
>> their own policies in addition. In support of this, the LSM blob
>> mechanism allows security modules to maintain their own information
>> about the system components (e.g. file, inode, cred, task) they
>> care about. The LSM infrastructure does not itself provide or
>> support security data or policy. That's strictly for the modules
>> to do.
> Agreed!
>
> EMA doesn't dictate policies for sure. Is it considered "security
> data"? I'm not sure the definition of "security data" here. It does
> store some "data", something that multiple LSM modules would need to
> duplicate if not pulled into a common place. It is meant to be a
> "helper" data structure, just like the auditing code.

Good example. You'll see that there is no audit code in the
LSM infrastructure. None. No audit data, either. It's all taken
care of in the audit system.

> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Friday, June 28, 2019 9:17 AM
>
> FWIW, adding new permissions only requires updating policy configuration,
> not userspace code/tools. But in any event, we can reuse the
> execute-related permissions if it makes sense but still consider
> introducing additional, new permissions, possibly in a separate "enclave"
> security class, if we want explicit control over enclave loading, e.g.
> ENCLAVE__LOAD, ENCLAVE__INIT, etc.

I'm not so familiar with SELinux tools, so my apologies in advance if I end
up mixing things up.

I'm not only talking about the new permissions, but also how to apply them
to enclave files. Intel SGX SDK packages enclaves as .so files, and I guess
that's the most straightforward way that most others would do it. So if
different permissions are defined, then user mode tools would have to
distinguish enclaves from regular .so files in order to grant them
different permissions. Would that be something extra to existing tools?

> One residual concern I have with the reuse of FILE__EXECUTE is using it
> for the sigstruct file as the fallback case. If the sigstruct is always
> part of the same file as the code, then it probably doesn't matter. But
> otherwise, it is somewhat odd to have to allow the host process to
> execute from the sigstruct file if it is only data (the signature).

I agree with you. But do you think it's a practical problem today? As far as
I know, no one is deploying sigstructs in dedicated files. I'm just trying
to touch as few things as possible until there's definitely a need to do so.

> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Stephen Smalley
> Sent: Friday, June 28, 2019 9:37 AM
>
> > lsm.ema.cache_decisions is on by default and
> > could be turned off by appending “lsm.ema.cache_decisions=0” or
> > “lsm.ema.cache_decisions=off” to the kernel command line.
>
> This seems problematic on a few fronts:
>
> - Specifying it as a boot parameter requires teaching admins / policy
> developers to do something in addition to what they are already doing
> when they want to create policy,
>
> - Limiting it to a boot parameter requires a reboot to change the mode
> of operation, whereas SELinux offers runtime toggling of permissive mode
> and even per-process (domain) permissive mode (and so does AppArmor),

How about making it a variable tunable via sysctl?

> - In the cache_decisions=1 case, do we get any auditing at all? If not,
> that's a problem. We want auditing not only when we are
> generating/learning policy but also in operation.

Currently it doesn't generate an audit log, but I could add it, except it
couldn't point to the exact file. But I can use the sigstruct file instead
so administrators can at least tell which enclave violates the policy. Do
you think that's acceptable?

> - There is the potential for inconsistencies to arise between the
> enforcement applied with different cache_decisions values.

The enforcement will be consistent. The difference only lies in the logs.

> I would suggest that we just never cache the decision and accept the
> cost if we are going to take this approach.

This will also be a viable option. I don't think any enclaves would be
comprised of a large number of files anyway. When SGX2 comes up, I think
most enclaves will be instantiated from only one file and defer loading
libraries at runtime. So in practice we are looking at keeping only one file
open per enclave, which seems totally acceptable.

Stephen (and everyone having an opinion on this), which way do you prefer?
sysctl variable? Or never cache decisions?

> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> Sent: Friday, June 28, 2019 10:22 AM
>
> > And what's the concern here if it becomes part of the LSM infrastructure.
>
> The LSM infrastructure provides a framework for hooks
> and allocation of blobs. That's it. It's a layer for
> connecting system features like VFS, IPC and the IP stack
> to the security modules. It does not implement any policy
> of its own. We are not going to implement SGX or any other
> mechanism within the LSM infrastructure.

EMA doesn't force/implement any policy either. It just supplements VMA.

>
> >>> It is going to be shared among all LSMs that would like to track
> >> enclave pages and their origins.
> >>
> >> That's true for InfiniBand, tun and sctp as well. Look at their
> >> implementations.
> > As far as I can tell, InfiniBand, tun and sctp, all of them seemed used inside SELinux only.
>
> So?

So they are NOT shared among LSMs, which is different from EMA.

>
> > If you had a chance to look at v1 of my series, I started by burying everything inside SELinux too. But Stephen pointed out such tracking would be needed by all LSMs so code duplication might be a concern. Thus I responded by moving it into LSM infrastructure.
>
> What you need to do is move all the lsm_ema code into its own
> place (which could be security/enclave). Manage your internal
> data as you like. LSMs (e.g. SELinux) can call your APIs if
> needed. If the LSMs need to store SGX information with the file
> structure they need to include that in the space they ask for in
> the file blob.

I thought subdirectories were for LSM modules. EMA is more like auditing code, which has a header in include/linux/ and an implementation in security/. Is that right?

>
> >>> And they could be extended to store more information as deemed
> >> appropriate by the LSM module.
> >>
> >> Which is what blobs are for, but that does not appear to be how
> >> you're using either the file blob or your new ema blob.
> > A lsm_ema_map pointer is stored in file->f_security.
>
> That's up to the individual security module to decide.

That's doable. The drawback is, if there are N LSM modules active, then the same information will be duplicated N times. Will that be a problem?

>
> > Each lsm_ema_map contains a list of lsm_ema structures. In my last patch, SELinux stores a ema_security_struct with every ema, by setting selinux_blob_sizes.lbs_ema_data to sizeof(ema_security_struct).
>
> You are managing the ema map lists. You don't need the LSM
> infrastructure to do that.
>
> > ema_security_struct is initialized in selinux_enclave_load(), and checked in enclave_mprotect(), which is a subroutine of selinux_file_mprotect(). BTW, it is alloced/freed automatically by LSM infrastructure in security_enclave_load()/security_file_free().
>
> Do you mean security_enclave_load()/security_enclave_free() ?
> There is no way you can possibly have sane behavior if your
> allocation and free aren't tied to the same blob.

There's no security_*enclave*_free(). lsm_ema_map is allocated only for enclaves. But LSM doesn't know which file is an enclave, so the allocation is deferred until the first security_enclave_load(). security_file_free() frees the map if it isn't NULL.

>
> >>> The last patch of this series shows how to extend EMA inside SELinux.
> >> I don't see (but I admit the code doesn't make a lot of sense to me)
> >> anything you couldn't do in the SELinux code by adding data to the
> >> file blob. The data you're adding to the LSM infrastructure doesn't
> >> belong there, and it doesn't need to be there.
> > You are correct. My v1 did it inside SELinux.
> >
> > The key question I think is whether only SELinux needs it, or all LSMs need it. Stephen thought it was the latter (and I agree with him) so I moved it into the LSM infrastructure to be shared, just like the auditing code.
>
> You are both right that it doesn't belong in the SELinux code.
> It also doesn't belong as part of the LSM infrastructure.

Then what is your suggestion?

Is it the code in security_enclave_load()/security_file_free() that bothers you? Because you think they shouldn't do anything more than just a single line of call_int/void_hooks()?

>
> >>>> Not acceptable for the LSM infrastructure. They
> >>>> are inconsistent with the way data is used there.
> >>> I'm not sure I understand this comment.
> >> It means that your definition and use of the lsm_ema_blob
> >> does not match the way other blobs are managed and used.
> >> The LSM infrastructure uses these entries in a very particular
> >> way, and you're trying to use it differently. You might be
> >> able to change the rest of the enclave system to use it
> >> correctly, or you might be able to find a different place
> >> for it.
> > I'm still not sure why you think this (lbs_ema_data) is inconsistent with other blobs.
> >
> > Same as all other blobs, an LSM requests it by storing the needed size in it, and is assigned an offset, and the buffer is allocated/freed by the infrastructure. Am I missing anything?
>
> Yes. Aside from allocation and deletion the infrastructure does
> nothing with the blobs. The blobs are used only by the security
> modules. All other data is maintained and used elsewhere. SGX
> specific data needs to be maintained and managed elsewhere.
>
> >>> As I stated in the cover letter, the primary question is how to
> >> prevent SGX from being abused as a backdoor to make executable pages
> >> that would otherwise not be executable without SGX. Any LSM module
> >> unaware of that would leave that "hole" open. So tracking enclave pages
> >> will become a common task for all LSMs that care about page protections, and
> >> that's why I place it inside LSM infrastructure.
> >>
> >> Page protections are an important part of many security features,
> >> but that's beside the point. The LSM system provides mechanism for
> >> providing additional restrictions to existing security mechanisms.
> >> First, you create the security mechanism (e.g. enclaves) then you
> >> add LSM hooks so that security modules (e.g. SELinux) can apply
> >> their own policies in addition. In support of this, the LSM blob
> >> mechanism allows security modules to maintain their own information
> >> about the system components (e.g. file, inode, cred, task) they
> >> care about. The LSM infrastructure does not itself provide or
> >> support security data or policy. That's strictly for the modules
> >> to do.
> > Agreed!
> >
> > EMA doesn't dictate policies for sure. Is it considered "security data"? I'm not sure the definition of "security data" here. It does store some "data", something that multiple LSM modules would need to duplicate if not pulled into a common place. It is meant to be a "helper" data structure, just like the auditing code.
>
> Good example. You'll see that there is no audit code in the
> LSM infrastructure. None. No audit data, either. It's all taken
> care of in the audit system.

Did you mean security/security.c doesn't call into any audit functions? lsm_audit.c is located in security/ and its header in include/linux/ but you don't have a problem with them. Am I right?

IIUC, you want me to pack whatever is inside security_enclave_load()/security_file_free() into some APIs to be called by individual LSM modules. But if you pay a bit more attention to the code, you will see that an EMA can be inserted into the map only after *all* LSM modules have approved it. So if it is spread into individual LSMs and if there are multiple active LSMs, there could be inconsistencies among LSMs if they each maintain their own map and make different decisions on the same EMA at enclave_load(). I'm not saying that's unresolvable but it is definitely more error prone, besides wasting memory. Or do you have any practical suggestions?
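
For illustration only, the ordering described above (a range is recorded in the shared map only after every active LSM has approved it) can be sketched in plain userspace C. This is not code from the series: enclave_load_all(), struct ema_range, the P_* flags and the two sample hooks are hypothetical names, and a single slot stands in for the shared map.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define P_READ  0x1
#define P_WRITE 0x2
#define P_EXEC  0x4

/* Hypothetical per-LSM hook: returns 0 to allow, negative errno to deny. */
typedef int (*enclave_load_hook)(unsigned long start, unsigned long end, int prot);

/* Simplified stand-in for one entry in a shared map (illustrative only). */
struct ema_range {
	unsigned long start, end;	/* covers [start, end) */
	int prot;
	int valid;
};

/* Sample hook that allows everything. */
int hook_allow(unsigned long start, unsigned long end, int prot)
{
	(void)start; (void)end; (void)prot;
	return 0;
}

/* Sample hook that refuses ranges that are both writable and executable. */
int hook_deny_wx(unsigned long start, unsigned long end, int prot)
{
	(void)start; (void)end;
	return ((prot & P_WRITE) && (prot & P_EXEC)) ? -EACCES : 0;
}

/*
 * Consult every hook in order. The first denial is returned and nothing is
 * recorded; the range is written to *slot only if all hooks returned 0.
 */
int enclave_load_all(const enclave_load_hook *hooks, size_t nhooks,
		     struct ema_range *slot,
		     unsigned long start, unsigned long end, int prot)
{
	assert(slot != NULL);
	for (size_t i = 0; i < nhooks; i++) {
		int rc = hooks[i](start, end, prot);

		if (rc)
			return rc;
	}
	slot->start = start;
	slot->end = end;
	slot->prot = prot;
	slot->valid = 1;
	return 0;
}
```

With both sample hooks registered, a read+exec range is recorded, while a write+exec range returns -EACCES and leaves the slot untouched. If each module kept a private map instead, a module that ran before the denying one would already have recorded the range, which is the inconsistency being described.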
On Fri, Jun 28, 2019 at 5:20 PM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-owner@vger.kernel.org] On Behalf Of Stephen Smalley
> > Sent: Friday, June 28, 2019 9:17 AM
> >
> > FWIW, adding new permissions only requires updating policy configuration,
> > not userspace code/tools. But in any event, we can reuse the execute-
> > related permissions if it makes sense but still consider introducing
> > additional, new permissions, possibly in a separate "enclave" security
> > class, if we want explicit control over enclave loading, e.g.
> > ENCLAVE__LOAD, ENCLAVE__INIT, etc.
>
> I'm not so familiar with SELinux tools so my apology in advance if I end up mixing up things.
>
> I'm not only talking about the new permissions, but also how to apply them to enclave files. Intel SGX SDK packages enclaves as .so files, and I guess that's the most straightforward way that most others would do. So if different permissions are defined, then user mode tools would have to distinguish enclaves from regular .so files in order to grant them different permissions. Would that be something extra to existing tools?

It doesn't require any userspace code changes. It is just a matter of defining some configuration data in the policy for the new permissions, one or more security labels (tags) for the SGX .so files, and rules allowing access where desired, and then setting those security labels on the SGX .so files (via the security.selinux extended attribute on the files). Even the last part is generally handled by updating a configuration specifying how files should be labeled and then rpm automatically labels the files when created, or you can manually restorecon them.

If the new permissions are defined in their own security class rather than reusing existing ones, then they can even be defined entirely via a local or third party policy module separate from the distro policy if desired/needed.

> >
> > One residual concern I have with the reuse of FILE__EXECUTE is using it
> > for the sigstruct file as the fallback case. If the sigstruct is always
> > part of the same file as the code, then it probably doesn't matter. But
> > otherwise, it is somewhat odd to have to allow the host process to
> > execute from the sigstruct file if it is only data (the signature).
>
> I agree with you. But do you think it a practical problem today? As far as I know, no one is deploying sigstructs in dedicated files. I'm just trying to touch as few things as possible until there's definitely a need to do so.

I don't know, and it wasn't clear to me from the earlier discussions. If not and if it is acceptable to require them to be in files in the first place, then perhaps it isn't necessary.
On Fri, Jun 28, 2019 at 5:54 PM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-owner@vger.kernel.org] On Behalf Of Stephen Smalley
> > Sent: Friday, June 28, 2019 9:37 AM
> >
> > > lsm.ema.cache_decisions is on by default and
> > > could be turned off by appending “lsm.ema.cache_decisions=0” or
> > > “lsm.ema.cache_decisions=off” to the kernel command line.
> >
> > This seems problematic on a few fronts:
> >
> > - Specifying it as a boot parameter requires teaching admins / policy
> > developers to do something in addition to what they are already doing
> > when they want to create policy,
> >
> > - Limiting it to a boot parameter requires a reboot to change the mode
> > of operation, whereas SELinux offers runtime toggling of permissive mode
> > and even per-process (domain) permissive mode (and so does AppArmor),
>
> How about making it a variable tunable via sysctl?

Better than a boot parameter but still not amenable to per-domain permissive and still requires admins to remember and perform an extra step before collecting denials.

> >
> > - In the cache_decisions=1 case, do we get any auditing at all? If not,
> > that's a problem. We want auditing not only when we are
> > generating/learning policy but also in operation.
>
> Currently it doesn't generate an audit log, but I could add it, except it couldn't point to the exact file. But I can use the sigstruct file instead so administrators can at least tell which enclave violates the policy. Do you think that is acceptable?

Seems prone to user confusion and lacks precision in why the denial occurred.

> >
> > - There is the potential for inconsistencies to arise between the
> > enforcement applied with different cache_decisions values.
>
> The enforcement will be consistent. The difference only lies in the logs.
>
> >
> > I would suggest that we just never cache the decision and accept the
> > cost if we are going to take this approach.
>
> This will also be a viable option. I don't think any enclaves would be comprised of a large number of files anyway. When SGX2 comes up, I think most enclaves will be instantiated from only one file and defer loading libraries at runtime. So in practice we are looking at keeping only one file open per enclave, which seems totally acceptable.
>
> Stephen (and everyone having an opinion on this), which way do you prefer? sysctl variable? Or never cache decisions?

I'd favor never caching decisions.
On Fri, Jun 28, 2019 at 1:22 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>
> On 6/27/2019 5:47 PM, Xing, Cedric wrote:
> >> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> >> Sent: Thursday, June 27, 2019 4:37 PM
> >>>> This code should not be mixed in with the LSM infrastructure.
> >>>> It should all be contained in its own module, under security/enclave.
> >>> lsm_ema is *intended* to be part of the LSM infrastructure.
> >> That's not going to fly, not for a minute.
> > Why not, if there's a need for it?
> >
> > And what's the concern here if it becomes part of the LSM infrastructure.
>
> The LSM infrastructure provides a framework for hooks
> and allocation of blobs. That's it. It's a layer for
> connecting system features like VFS, IPC and the IP stack
> to the security modules. It does not implement any policy
> of its own. We are not going to implement SGX or any other
> mechanism within the LSM infrastructure.

I don't think you understand the purpose of this code. It isn't
implementing SGX, nor is it needed by SGX.
It is providing shared infrastructure for security modules, similar to
lsm_audit.c, so that security modules can enforce W^X or similar
memory protection guarantees for SGX enclave memory, which has unique
properties that render the existing mmap and mprotect hooks
insufficient. They can certainly implement it only for SELinux, but
then any other security module that wants to provide such guarantees
will have to replicate that code.

>
> >>> It is going to be shared among all LSMs that would like to track
> >> enclave pages and their origins.
> >>
> >> That's true for InfiniBand, tun and sctp as well. Look at their
> >> implementations.
> > As far as I can tell, InfiniBand, tun and sctp, all of them seemed used inside SELinux only.
>
> So?
>
> > If you had a chance to look at v1 of my series, I started by burying everything inside SELinux too. But Stephen pointed out such tracking would be needed by all LSMs so code duplication might be a concern. Thus I responded by moving it into LSM infrastructure.
>
> What you need to do is move all the lsm_ema code into its own
> place (which could be security/enclave). Manage your internal
> data as you like. LSMs (e.g. SELinux) can call your APIs if
> needed. If the LSMs need to store SGX information with the file
> structure they need to include that in the space they ask for in
> the file blob.
>
> >>> And they could be extended to store more information as deemed
> >> appropriate by the LSM module.
> >>
> >> Which is what blobs are for, but that does not appear to be how
> >> you're using either the file blob or your new ema blob.
> > A lsm_ema_map pointer is stored in file->f_security.
>
> That's up to the individual security module to decide.
>
> > Each lsm_ema_map contains a list of lsm_ema structures. In my last patch, SELinux stores a ema_security_struct with every ema, by setting selinux_blob_sizes.lbs_ema_data to sizeof(ema_security_struct).
>
> You are managing the ema map lists. You don't need the LSM
> infrastructure to do that.
>
> > ema_security_struct is initialized in selinux_enclave_load(), and checked in enclave_mprotect(), which is a subroutine of selinux_file_mprotect(). BTW, it is alloced/freed automatically by LSM infrastructure in security_enclave_load()/security_file_free().
>
> Do you mean security_enclave_load()/security_enclave_free() ?
> There is no way you can possibly have sane behavior if your
> allocation and free aren't tied to the same blob.
>
> >>> The last patch of this series shows how to extend EMA inside SELinux.
> >> I don't see (but I admit the code doesn't make a lot of sense to me)
> >> anything you couldn't do in the SELinux code by adding data to the
> >> file blob. The data you're adding to the LSM infrastructure doesn't
> >> belong there, and it doesn't need to be there.
> > You are correct. My v1 did it inside SELinux.
> >
> > The key question I think is whether only SELinux needs it, or all LSMs need it. Stephen thought it was the latter (and I agree with him) so I moved it into the LSM infrastructure to be shared, just like the auditing code.
>
> You are both right that it doesn't belong in the SELinux code.
> It also doesn't belong as part of the LSM infrastructure.
>
> >>>> Not acceptable for the LSM infrastructure. They
> >>>> are inconsistent with the way data is used there.
> >>> I'm not sure I understand this comment.
> >> It means that your definition and use of the lsm_ema_blob
> >> does not match the way other blobs are managed and used.
> >> The LSM infrastructure uses these entries in a very particular
> >> way, and you're trying to use it differently. You might be
> >> able to change the rest of the enclave system to use it
> >> correctly, or you might be able to find a different place
> >> for it.
> > I'm still not sure why you think this (lbs_ema_data) is inconsistent with other blobs.
> >
> > Same as all other blobs, an LSM requests it by storing the needed size in it, and is assigned an offset, and the buffer is allocated/freed by the infrastructure. Am I missing anything?
>
> Yes. Aside from allocation and deletion the infrastructure does
> nothing with the blobs. The blobs are used only by the security
> modules. All other data is maintained and used elsewhere. SGX
> specific data needs to be maintained and managed elsewhere.
>
> >>> As I stated in the cover letter, the primary question is how to
> >> prevent SGX from being abused as a backdoor to make executable pages
> >> that would otherwise not be executable without SGX. Any LSM module
> >> unaware of that would leave that "hole" open. So tracking enclave pages
> >> will become a common task for all LSMs that care about page protections, and
> >> that's why I place it inside LSM infrastructure.
> >>
> >> Page protections are an important part of many security features,
> >> but that's beside the point. The LSM system provides mechanism for
> >> providing additional restrictions to existing security mechanisms.
> >> First, you create the security mechanism (e.g. enclaves) then you
> >> add LSM hooks so that security modules (e.g. SELinux) can apply
> >> their own policies in addition. In support of this, the LSM blob
> >> mechanism allows security modules to maintain their own information
> >> about the system components (e.g. file, inode, cred, task) they
> >> care about. The LSM infrastructure does not itself provide or
> >> support security data or policy. That's strictly for the modules
> >> to do.
> > Agreed!
> >
> > EMA doesn't dictate policies for sure. Is it considered "security data"? I'm not sure the definition of "security data" here. It does store some "data", something that multiple LSM modules would need to duplicate if not pulled into a common place. It is meant to be a "helper" data structure, just like the auditing code.
>
> Good example. You'll see that there is no audit code in the
> LSM infrastructure. None. No audit data, either. It's all taken
> care of in the audit system.
On 6/28/2019 6:37 PM, Stephen Smalley wrote:
> On Fri, Jun 28, 2019 at 1:22 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>> On 6/27/2019 5:47 PM, Xing, Cedric wrote:
>>>> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
>>>> Sent: Thursday, June 27, 2019 4:37 PM
>>>>>> This code should not be mixed in with the LSM infrastructure.
>>>>>> It should all be contained in its own module, under security/enclave.
>>>>> lsm_ema is *intended* to be part of the LSM infrastructure.
>>>> That's not going to fly, not for a minute.
>>> Why not, if there's a need for it?
>>>
>>> And what's the concern here if it becomes part of the LSM infrastructure.
>> The LSM infrastructure provides a framework for hooks
>> and allocation of blobs. That's it. It's a layer for
>> connecting system features like VFS, IPC and the IP stack
>> to the security modules. It does not implement any policy
>> of its own. We are not going to implement SGX or any other
>> mechanism within the LSM infrastructure.
> I don't think you understand the purpose of this code. It isn't
> implementing SGX, nor is it needed by SGX.
> It is providing shared infrastructure for security modules, similar to
> lsm_audit.c, so that security modules can enforce W^X or similar
> memory protection guarantees for SGX enclave memory, which has unique
> properties that render the existing mmap and mprotect hooks
> insufficient. They can certainly implement it only for SELinux, but
> then any other security module that wants to provide such guarantees
> will have to replicate that code.
I am not objecting to the purpose of the code.
I *am* objecting to calling it part of the LSM infrastructure.
It needs to be its own thing, off somewhere else.
It must not use the lsm_ prefix. That's namespace pollution.
The code must not be embedded in the LSM infrastructure code,
that breaks with how everything else works.
... and the notion that you allocate data for one blob
that gets freed relative to another breaks the data management
model.
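
For readers less familiar with the blob convention being argued over, here is a minimal userspace model of it: each module states how many bytes it needs, is handed back an offset, and one buffer per object is allocated and freed as a unit. The function and variable names below are illustrative assumptions, not the kernel's, and the sketch says nothing about where EMA data should live.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * On entry *need is the number of bytes one module wants in the shared blob;
 * on exit it holds that module's offset, and *total has grown by the request.
 */
void set_blob_size(int *need, int *total)
{
	if (*need > 0) {
		int offset = *total;

		*total += *need;
		*need = offset;
	}
}

/* One zeroed allocation per object, sized for all modules together. */
void *blob_alloc(int total)
{
	return total > 0 ? calloc(1, (size_t)total) : NULL;
}

/* The matching free: the same object that was allocated is released. */
void blob_free(void *blob)
{
	free(blob);
}

/* A module reaches its own slice through the offset it was assigned. */
void *blob_slice(void *blob, int offset)
{
	assert(blob != NULL);
	return (char *)blob + offset;
}
```

Two modules asking for 16 and 24 bytes end up at offsets 0 and 16 of a 40-byte buffer. The objection above is about lifetime pairing: in this model the buffer is created and destroyed with the one object it belongs to, rather than being allocated on behalf of one object type and freed on behalf of another.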
On Tue, Jun 25, 2019 at 2:09 PM Stephen Smalley <sds@tycho.nsa.gov> wrote:
>
> On 6/21/19 5:22 PM, Xing, Cedric wrote:
> >> From: Christopherson, Sean J
> >> Sent: Wednesday, June 19, 2019 3:24 PM
> >>
> >> Intended use of each permission:
> >>
> >> - SGX_EXECDIRTY: dynamically load code within the enclave itself
> >> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g. Graphene
> >
> > Why does it matter whether a code page is measured or not?
>
> It won't be incorporated into an attestation?
>
Also, if there is, in parallel, a policy that limits the set of
enclave SIGSTRUCTs that are accepted, requiring all code be measured
makes it harder to subvert by writing incompetent or maliciously
incompetent enclaves.
On Thu, Jun 27, 2019 at 11:56 AM Cedric Xing <cedric.xing@intel.com> wrote:
>
> SGX enclaves are loaded from pages in regular memory. Given the ability to
> create executable pages, the newly added SGX subsystem may present a backdoor
> for adversaries to circumvent LSM policies, such as creating an executable
> enclave page from a modified regular page that would otherwise not be made
> executable as prohibited by LSM. Therefore arises the primary question of
> whether an enclave page should be allowed to be created from a given source
> page in regular memory.
>
> A related question is whether to grant/deny a mprotect() request on a given
> enclave page/range. mprotect() is traditionally covered by
> security_file_mprotect() hook, however, enclave pages have a different lifespan
> than either MAP_PRIVATE or MAP_SHARED. Particularly, MAP_PRIVATE pages have the
> same lifespan as the VMA while MAP_SHARED pages have the same lifespan as the
> backing file (on disk), but enclave pages have the lifespan of the enclave’s
> file descriptor. For example, enclave pages could be munmap()’ed then mmap()’ed
> again without losing contents (like MAP_SHARED), but all enclave pages will be
> lost once its file descriptor has been closed (like MAP_PRIVATE). That said,
> LSM modules need some new data structure for tracking protections of enclave
> pages/ranges so that they can make proper decisions at mmap()/mprotect()
> syscalls.
>
> The last question, which is orthogonal to the 2 above, is whether or not to
> allow a given enclave to launch/run. Enclave pages are not visible to the rest
> of the system, so to some extent offer a better place for malicious software to
> hide. Thus, it is sometimes desirable to whitelist/blacklist enclaves by their
> measurements, signing public keys, or image files.
>
> To address the questions above, 2 new LSM hooks are added for enclaves.
> - security_enclave_load() – This hook allows LSM to decide whether or not to
> allow instantiation of a range of enclave pages using the specified VMA. It
> is invoked when a range of enclave pages is about to be loaded. It serves 3
> purposes: 1) indicate to LSM that the file struct in subject is an enclave;
> 2) allow LSM to decide whether or not to instantiate those pages and 3)
> allow LSM to initialize internal data structures for tracking
> origins/protections of those pages.
> - security_enclave_init() – This hook allows whitelisting/blacklisting or
> performing whatever checks deemed appropriate before an enclave is allowed
> to run. An LSM module may opt to use the file backing the SIGSTRUCT as a
> proxy to dictate allowed protections for anonymous pages.
>
> mprotect() of enclave pages continues to be governed by
> security_file_mprotect(), with the expectation that LSM is able to distinguish
> between regular and enclave pages inside the hook. For mmap(), the SGX
> subsystem is expected to invoke security_file_mprotect() explicitly to check
> protections against the requested protections for existing enclave pages. As
> stated earlier, enclave pages have a different lifespan than the existing
> MAP_PRIVATE and MAP_SHARED pages, so would require a new data structure outside
> of VMA to track their protections and/or origins. Enclave Memory Area (or EMA
> for short) has been introduced to address the need. EMAs are maintained by the
> LSM framework for all LSM modules to share. EMAs will be instantiated for
> enclaves only so will not impose memory/performance overheads for regular
> applications/files. Please see include/linux/lsm_ema.h and security/lsm_ema.c
> for details.
>
> A new setup parameter – lsm.ema.cache_decisions has been introduced to offer
> the choice between memory consumption and accuracy of audit logs. Enabling
> lsm.ema.cache_decisions causes LSM framework NOT to keep backing files open for
> EMAs. While that saves memory, it requires LSM modules to make and cache
> decisions ahead of time, and makes it difficult for LSM modules to generate
> accurate audit logs. System administrators are expected to run LSM in
> permissive mode with lsm.ema.cache_decisions off to determine the minimal
> permissions needed, and then turn it back on in enforcing mode for optimal
> performance and memory usage. lsm.ema.cache_decisions is on by default and
> could be turned off by appending “lsm.ema.cache_decisions=0” or
> “lsm.ema.cache_decisions=off” to the kernel command line.
Just on a very cursory review, this seems like it's creating a bunch
of complexity (a whole new library and data structure), and I'm not
convinced the result is any better than Sean's version.
Hi Andy,
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Saturday, June 29, 2019 4:47 PM
>
> Just on a very cursory review, this seems like it's creating a bunch of
> complexity (a whole new library and data structure), and I'm not
> convinced the result is any better than Sean's version.
The new EMA data structure tracks enclave pages by range. Yes, Sean avoided that by storing similar information in the existing encl_page structure inside the SGX subsystem. But as I pointed out, his code has to iterate through *every* page in the range, so mprotect() will be very slow if the range is large. So he would end up introducing something similar to achieve the same performance.
And that's not the most important point. The major problem in his patch lies in SGX2 support, as #PF driven EAUG cannot be supported (or he'd have to amend his code accordingly, which will add complexity and tip your scale).
Other oddities that you have noticed, such as mmap()'ing page by page vs. mmap()'ing the whole range affecting subsequent mprotect() calls, don't exist in my series.
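
To make the per-range versus per-page cost concrete, here is a small userspace model of a range-based check. It is not the EMA code from the series: struct ema, ema_may_mprotect() and the P_* flags are hypothetical, the ranges are assumed to be sorted and non-overlapping, and treating an uncovered hole as a denial is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define P_READ  0x1
#define P_WRITE 0x2
#define P_EXEC  0x4

/* One tracked range [start, end) with the maximal protections allowed on it. */
struct ema {
	unsigned long start, end;
	int max_prot;
};

/*
 * emas[] is sorted by start and non-overlapping. Returns 1 if [start, end)
 * is fully covered by ranges that all permit prot, 0 otherwise. The loop
 * visits each range at most once, so the cost depends on the number of
 * ranges rather than the number of pages in the request.
 */
int ema_may_mprotect(const struct ema *emas, size_t n,
		     unsigned long start, unsigned long end, int prot)
{
	unsigned long pos = start;

	assert(start < end);
	for (size_t i = 0; i < n && pos < end; i++) {
		if (emas[i].end <= pos)
			continue;	/* range lies entirely before the request */
		if (emas[i].start > pos)
			return 0;	/* hole: part of the request is untracked */
		if (prot & ~emas[i].max_prot)
			return 0;	/* asks for more than this range allows */
		pos = emas[i].end;
	}
	return pos >= end;
}
```

A request spanning a 1 GiB range backed by two such entries takes two iterations here, whereas a per-page scheme would touch 262144 entries for 4 KiB pages.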
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Saturday, June 29, 2019 4:42 PM
>
> On Tue, Jun 25, 2019 at 2:09 PM Stephen Smalley <sds@tycho.nsa.gov>
> wrote:
> >
> > On 6/21/19 5:22 PM, Xing, Cedric wrote:
> > >> From: Christopherson, Sean J
> > >> Sent: Wednesday, June 19, 2019 3:24 PM
> > >>
> > >> Intended use of each permission:
> > >>
> > >> - SGX_EXECDIRTY: dynamically load code within the enclave itself
> > >> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
> > >> Graphene
> > >
> > > Why does it matter whether a code page is measured or not?
> >
> > It won't be incorporated into an attestation?
> >
>
> Also, if there is, in parallel, a policy that limits the set of enclave
> SIGSTRUCTs that are accepted, requiring all code be measured makes it
> harder to subvert by writing incompetent or maliciously incompetent
> enclaves.
As analyzed in my reply to one of Stephen's comments, no executable page shall be "enclave only" as enclaves have access to the host's memory, so if an executable page in regular memory is considered to pose a threat to the process, it should be considered to pose the same threat inside an enclave as well.
That said, every executable enclave page should have an executable source page (it doesn’t have to be executable, as long as mprotect(X) would succeed on it, as shown in my patch), hence any exploit mountable on the enclave page shall also be mountable using the source page. Given that only the weakest link matters in security, I argue that SGX_EXECUNMR is unnecessary from the process's perspective.
SGX_EXECUNMR does impact security from the enclave's perspective, thus it is reflected in the enclave's measurement, which is part of the SGX ISA. It's the enclave vendor's responsibility to ensure code pages are properly measured, and that's largely automated by tools. It's highly unlikely an ISV would "forget" to measure a page, so I don't think SGX_EXECUNMR has much value for ISVs.
So the only case left is the enclave author leaving a page unmeasured with malicious intent. As that's part of the enclave measurement, it would either get caught at EINIT because of an untrusted/blacklisted signing key, or not get caught because of the lack of a whitelisting/blacklisting mechanism. But in the latter case, the adversary could just measure the malicious page, as the final measurement or signing key doesn't matter anyway. Sean's series doesn't have an enclave_init() hook so it will always be the latter case, where the final measurement doesn't matter. Therefore, SGX_EXECUNMR doesn't have any value, as adversaries could always measure all code pages to satisfy the policy without worrying about final measurements.
On Mon, Jul 1, 2019 at 10:46 AM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: Andy Lutomirski [mailto:luto@kernel.org]
> > Sent: Saturday, June 29, 2019 4:42 PM
> >
> > On Tue, Jun 25, 2019 at 2:09 PM Stephen Smalley <sds@tycho.nsa.gov>
> > wrote:
> > >
> > > On 6/21/19 5:22 PM, Xing, Cedric wrote:
> > > >> From: Christopherson, Sean J
> > > >> Sent: Wednesday, June 19, 2019 3:24 PM
> > > >>
> > > >> Intended use of each permission:
> > > >>
> > > >> - SGX_EXECDIRTY: dynamically load code within the enclave itself
> > > >> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
> > > >> Graphene
> > > >
> > > > Why does it matter whether a code page is measured or not?
> > >
> > > It won't be incorporated into an attestation?
> > >
> >
> > Also, if there is, in parallel, a policy that limits the set of enclave
> > SIGSTRUCTs that are accepted, requiring all code be measured makes it
> > harder to subvert by writing incompetent or maliciously incompetent
> > enclaves.
>
> As analyzed in my reply to one of Stephen's comments, no executable page shall be "enclave only" as enclaves have access to the host's memory, so if an executable page in regular memory is considered to pose a threat to the process, it should be considered to pose the same threat inside an enclave as well.

Huh? The SDM (37.3 in whatever version I'm reading) says "Code fetches from inside an enclave to a linear address outside that enclave result in a #GP(0) exception." Enclaves execute enclave code only.

In any event, I believe we're discussing taking readable memory from outside the enclave and copying it to executable code inside the enclave.

>
> That said, every executable enclave page should have an executable source page (it doesn’t have to be executable, as long as mprotect(X) would succeed on it, as shown in my patch)

Does Sean's series require this? I think that, if we can get away with it, it's a lot nicer to *not* require user code to map the source pages PROT_EXEC. Some policy may check that it's VM_MAYEXEC or check some other attribute of the VMA, but actually requiring PROT_EXEC seems like we're weakening existing hardening measures to enforce a policy, which is a mistake.
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Casey Schaufler
>
> On 6/28/2019 6:37 PM, Stephen Smalley wrote:
> > On Fri, Jun 28, 2019 at 1:22 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
> >> On 6/27/2019 5:47 PM, Xing, Cedric wrote:
> >>>> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> >>>> Sent: Thursday, June 27, 2019 4:37 PM
> >>>>>> This code should not be mixed in with the LSM infrastructure.
> >>>>>> It should all be contained in its own module, under security/enclave.
> >>>>> lsm_ema is *intended* to be part of the LSM infrastructure.
> >>>> That's not going to fly, not for a minute.
> >>> Why not, if there's a need for it?
> >>>
> >>> And what's the concern here if it becomes part of the LSM infrastructure.
> >> The LSM infrastructure provides a framework for hooks and allocation
> >> of blobs. That's it. It's a layer for connecting system features like
> >> VFS, IPC and the IP stack to the security modules. It does not
> >> implement any policy of it's own. We are not going to implement SGX
> >> or any other mechanism within the LSM infrastructure.
> > I don't think you understand the purpose of this code. It isn't
> > implementing SGX, nor is it needed by SGX.
> > It is providing shared infrastructure for security modules, similar to
> > lsm_audit.c, so that security modules can enforce W^X or similar
> > memory protection guarantees for SGX enclave memory, which has unique
> > properties that render the existing mmap and mprotect hooks
> > insufficient. They can certainly implement it only for SELinux, but
> > then any other security module that wants to provide such guarantees
> > will have to replicate that code.
>
> I am not objecting to the purpose of the code.
> I *am* objecting to calling it part of the LSM infrastructure.
> It needs to be it's own thing, off somewhere else.
> It must not use the lsm_ prefix. That's namespace pollution.
> The code must not be embedded in the LSM infrastructure code, that
> breaks with how everything else works.

If you understand the purpose, then why are you objecting to the lsm_ prefix, as they are APIs to be used by all LSM modules? Or what should be the prefix in your mind? Or what kind of APIs do you think can qualify for the lsm_ prefix?

And I'd like to clarify that it doesn't break anything, but is just a bit different, in that security_enclave_load() and security_file_free() call into those APIs. But there's a need for them because otherwise code/data would have to be duplicated among LSMs and the logic would be harder to comprehend. So that's a trade-off. Then what's the practical drawback of doing that? If there is none, why would we want to pay the cost of not doing that?

>
> ... and the notion that you allocate data for one blob that gets freed
> relative to another breaks the data management model.

What do you mean here?

EMA blobs are allocated/freed *not* relative to any other blobs.
On Mon, Jul 1, 2019 at 10:11 AM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> Hi Andy,
>
> > From: Andy Lutomirski [mailto:luto@kernel.org]
> > Sent: Saturday, June 29, 2019 4:47 PM
> >
> > Just on a very cursory review, this seems like it's creating a bunch of
> > complexity (a whole new library and data structure), and I'm not
> > convinced the result is any better than Sean's version.
>
> The new EMA data structure is to track enclave pages by range. Yes, Sean avoided that by storing similar information in the existing encl_page structure inside SGX subsystem. But as I pointed out, his code has to iterate through *every* page in range so mprotect() will be very slow if the range is large. So he would end up introducing something similar to achieve the same performance.

It seems odd to stick it in security/ if it only has one user, though.
Also, if it wasn't in security/, then the security folks would stop
complaining :)

>
> And that's not the most important point. The major problem in his patch lies in SGX2 support, as #PF driven EAUG cannot be supported (or he'd have to amend his code accordingly, which will add complexity and tip your scale).
>

Why can't it be?
On Wed, Jun 19, 2019 at 3:24 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
> static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> {
> struct sgx_encl *encl = file->private_data;
> + unsigned long allowed_rwx;
> int ret;
>
> + allowed_rwx = sgx_allowed_rwx(encl, vma);
> + if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> + return -EACCES;
> +
> ret = sgx_encl_mm_add(encl, vma->vm_mm);
> if (ret)
> return ret;
>
> + if (!(allowed_rwx & VM_READ))
> + vma->vm_flags &= ~VM_MAYREAD;
> + if (!(allowed_rwx & VM_WRITE))
> + vma->vm_flags &= ~VM_MAYWRITE;
> + if (!(allowed_rwx & VM_EXEC))
> + vma->vm_flags &= ~VM_MAYEXEC;
> +
I'm with Cedric here -- this is no good. The reason I think we need
.may_mprotect or similar is exactly to avoid doing this.
mmap() just needs to make the same type of VMA regardless of the pages
in the range.
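
For readers following the hunk quoted above, the two steps it adds to sgx_mmap() can be modelled in userspace as plain mask arithmetic. This is only a sketch: the X_VM_* constants and both helper names are illustrative stand-ins, not the kernel's definitions, and the code makes no claim about which approach the series should take.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's VM_* and VM_MAY* bits. */
#define X_VM_READ     0x01UL
#define X_VM_WRITE    0x02UL
#define X_VM_EXEC     0x04UL
#define X_VM_MAYREAD  0x10UL
#define X_VM_MAYWRITE 0x20UL
#define X_VM_MAYEXEC  0x40UL
#define X_VM_RWX      (X_VM_READ | X_VM_WRITE | X_VM_EXEC)

/* Step 1: reject a mapping that asks for more than the pages allow. */
static bool exceeds_allowed(unsigned long vm_flags, unsigned long allowed_rwx)
{
	assert((allowed_rwx & ~X_VM_RWX) == 0);
	return (vm_flags & X_VM_RWX & ~allowed_rwx) != 0;
}

/*
 * Step 2: strip the MAY bits that the pages in range do not allow.
 * The resulting flags depend on the pages covered by the mapping,
 * which is the property objected to in the reply above.
 */
static unsigned long clear_may_bits(unsigned long vm_flags,
				    unsigned long allowed_rwx)
{
	if (!(allowed_rwx & X_VM_READ))
		vm_flags &= ~X_VM_MAYREAD;
	if (!(allowed_rwx & X_VM_WRITE))
		vm_flags &= ~X_VM_MAYWRITE;
	if (!(allowed_rwx & X_VM_EXEC))
		vm_flags &= ~X_VM_MAYEXEC;
	return vm_flags;
}
```

With this model, two mmap() calls using identical arguments can end up with different flag sets if the underlying enclave pages differ.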
> From: Stephen Smalley [mailto:stephen.smalley@gmail.com]
> Sent: Friday, June 28, 2019 6:22 PM
>
> On Fri, Jun 28, 2019 at 5:54 PM Xing, Cedric <cedric.xing@intel.com>
> wrote:
> >
> > > From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> > > owner@vger.kernel.org] On Behalf Of Stephen Smalley
> > > Sent: Friday, June 28, 2019 9:37 AM
> > >
> > > > lsm.ema.cache_decisions is on by default and could be turned off
> > > > by appending “lsm.ema.cache_decisions=0” or
> > > > “lsm.ema.cache_decisions=off” to the kernel command line.
> > >
> > > This seems problematic on a few fronts:
> > >
> > > - Specifying it as a boot parameter requires teaching admins /
> > > policy developers to do something in addition to what they are
> > > already doing when they want to create policy,
> > >
> > > - Limiting it to a boot parameter requires a reboot to change the
> > > mode of operation, whereas SELinux offers runtime toggling of
> > > permissive mode and even per-process (domain) permissive mode (and
> > > so does AppArmor),
> >
> > How about making a variable tunable via sysctl?
>
> Better than a boot parameter but still not amenable to per-domain
> permissive and still requires admins to remember and perform an extra
> step before collecting denials.
>
> >
> > >
> > > - In the cache_decisions=1 case, do we get any auditing at all? If
> > > not, that's a problem. We want auditing not only when we are
> > > generating/learning policy but also in operation.
> >
> > > Currently it doesn't generate audit log, but I could add it, except it
> > > couldn't point to the exact file. But I can use the sigstruct file
> > > instead so administrators can at least tell which enclave violates the
> > > policy. Do you think it acceptable?
>
> Seems prone to user confusion and lacks precision in why the denial
> occurred.
>
> >
> > >
> > > - There is the potential for inconsistencies to arise between the
> > > enforcement applied with different cache_decisions values.
> >
> > > The enforcement will be consistent. The difference only lies in the
> > > logs.
> >
> > >
> > > I would suggest that we just never cache the decision and accept the
> > > cost if we are going to take this approach.
> >
> > > This will also be a viable option. I don't think any enclaves would be
> > > comprised of a large number of files anyway. When SGX2 comes up, I think
> > > most enclaves will be instantiated from only one file and defer loading
> > > libraries at runtime. So in practice we are looking to keeping only one
> > > file open per enclave, which seems totally acceptable.
> >
> > > Stephen (and everyone having an opinion on this), which way do you
> > > prefer? sysctl variable? Or never cache decisions?
>
> I'd favor never caching decisions.
Alright, I'll remove the boot parameter and never cache decisions.
> From: Stephen Smalley [mailto:stephen.smalley@gmail.com]
> Sent: Friday, June 28, 2019 6:16 PM
>
> On Fri, Jun 28, 2019 at 5:20 PM Xing, Cedric <cedric.xing@intel.com>
> wrote:
> >
> > > From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> > > owner@vger.kernel.org] On Behalf Of Stephen Smalley
> > > Sent: Friday, June 28, 2019 9:17 AM
> > >
> > > FWIW, adding new permissions only requires updating policy
> > > configuration, not userspace code/tools. But in any event, we can
> > > reuse the execute-related permissions if it makes sense but still
> > > consider introducing additional, new permissions, possibly in a
> > > separate "enclave" security class, if we want explicit control over
> > > enclave loading, e.g.
> > > ENCLAVE__LOAD, ENCLAVE__INIT, etc.
> >
> > I'm not so familiar with SELinux tools so my apology in advance if I
> > end up mixing up things.
> >
> > I'm not only talking about the new permissions, but also how to apply
> > them to enclave files. Intel SGX SDK packages enclaves as .so files, and
> > I guess that's the most straight forward way that most others would do.
> > So if different permissions are defined, then user mode tools would have
> > to distinguish enclaves from regular .so files in order to grant them
> > different permissions. Would that be something extra to existing tools?
>
> It doesn't require any userspace code changes. It is just a matter of
> defining some configuration data in the policy for the new permissions,
> one or more security labels (tags) for the SGX .so files, and rules
> allowing access where desired, and then setting those security labels on
> the SGX .so files (via the security.selinux extended attribute on the
> files). Even the last part is generally handled by updating a
> configuration specifying how files should be labeled and then rpm
> automatically labels the files when created, or you can manually
> restorecon them. If the new permissions are defined in their own
> security class rather than reusing existing ones, then they can even be
> defined entirely via a local or third party policy module separate from
> the distro policy if desired/needed.

I'm not objecting to what you proposed but just trying to understand more.

SGX enclaves don't look any different than regular shared objects except the meta data section, which is implementation dependent (all enclaves built by Intel's SDK have .note.sgxmeta sections but others could do something completely different and may not even use ELF sections). Then how does rpm tell whether a .so file is a regular shared object or an SGX enclave? My understanding is, rpm has to be able to distinguish those two in order to label them correctly (differently). Am I correct?

>
> >
> > >
> > > One residual concern I have with the reuse of FILE__EXECUTE is using
> > > it for the sigstruct file as the fallback case. If the sigstruct is
> > > always part of the same file as the code, then it probably doesn't
> > > matter. But otherwise, it is somewhat odd to have to allow the host
> > > process to execute from the sigstruct file if it is only data (the
> > > signature).
> >
> > I agree with you. But do you think it a practical problem today? As
> > far as I know, no one is deploying sigstructs in dedicated files. I'm
> > just trying to touch as few things as possible until there's definitely
> > a need to do so.
>
> I don't know, and it wasn't clear to me from the earlier discussions.
> If not and if it is acceptable to require them to be in files in the
> first place, then perhaps it isn't necessary.
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 01, 2019 10:58 AM
>
> On Mon, Jul 1, 2019 at 10:11 AM Xing, Cedric <cedric.xing@intel.com>
> wrote:
> >
> > Hi Andy,
> >
> > > From: Andy Lutomirski [mailto:luto@kernel.org]
> > > Sent: Saturday, June 29, 2019 4:47 PM
> > >
> > > Just on a very cursory review, this seems like it's creating a bunch
> > > of complexity (a whole new library and data structure), and I'm not
> > > convinced the result is any better than Sean's version.
> >
> > The new EMA data structure is to track enclave pages by range. Yes,
> > Sean avoided that by storing similar information in the existing
> > encl_page structure inside SGX subsystem. But as I pointed out, his code
> > has to iterate through *every* page in range so mprotect() will be very
> > slow if the range is large. So he would end up introducing something
> > similar to achieve the same performance.
>
> It seems odd to stick it in security/ if it only has one user, though.
> Also, if it wasn't in security/, then the security folks would stop
> complaining :)

That's where I started. EMA (though named differently in my v1) was buried inside and used only by SELinux. But Stephen thought it useful for other LSMs as well, as it could be expected that other LSMs would also need to track enclave pages and end up duplicating what's done inside SELinux.

I'm ok either way, though I do agree with Stephen's assessment.

> >
> > And that's not the most important point. The major problem in his
> > patch lies in SGX2 support, as #PF driven EAUG cannot be supported (or
> > he'd have to amend his code accordingly, which will add complexity and
> > tip your scale).
> >
>
> Why can't it be?

Let me take it back. It's important as it is where LSM folks are divided.

I intended to say the major reason I objected to Sean's approach was its inability to support SGX2 smoothly - as #PF driven EAUG requires non-existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be dispatched so EAUG couldn't be issued in response to #PF.
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 01, 2019 10:54 AM
>
> On Mon, Jul 1, 2019 at 10:46 AM Xing, Cedric <cedric.xing@intel.com>
> wrote:
> >
> > > From: Andy Lutomirski [mailto:luto@kernel.org]
> > > Sent: Saturday, June 29, 2019 4:42 PM
> > >
> > > On Tue, Jun 25, 2019 at 2:09 PM Stephen Smalley <sds@tycho.nsa.gov>
> > > wrote:
> > > >
> > > > On 6/21/19 5:22 PM, Xing, Cedric wrote:
> > > > >> From: Christopherson, Sean J
> > > > >> Sent: Wednesday, June 19, 2019 3:24 PM
> > > > >>
> > > > >> Intended use of each permission:
> > > > >>
> > > > >> - SGX_EXECDIRTY: dynamically load code within the enclave itself
> > > > >> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
> > > > >> Graphene
> > > > >
> > > > > Why does it matter whether a code page is measured or not?
> > > >
> > > > It won't be incorporated into an attestation?
> > > >
> > >
> > > Also, if there is, in parallel, a policy that limits the set of
> > > enclave SIGSTRUCTs that are accepted, requiring all code be measured
> > > makes it harder to subvert by writing incompetent or maliciously
> > > incompetent enclaves.
> >
> > As analyzed in my reply to one of Stephen's comments, no executable
> > page shall be "enclave only" as enclaves have access to host's memory,
> > so if an executable page in regular memory is considered posting a
> > threat to the process, it should be considered posting the same threat
> > inside an enclave as well.

What I was trying to say was, an executable page, if considered a threat to the enclosing process, should always be considered a threat no matter whether it is in that process's memory or inside an enclave enclosed in that same process's address space.

Therefore, for a page in regular memory, if it is denied executable, it is because it is considered a potential security threat to the enclosing process, so it shall not be used as the source for an executable enclave page, as the same threat exists regardless of whether it is in regular memory or EPC. Does that make more sense?

>
> Huh? The SDM (37.3 in whateve version I'm reading) says "Code fetches
> from inside an enclave to a linear address outside that enclave result
> in a #GP(0) exception." Enclaves execute enclave code only.
>
> In any event, I believe we're discussing taking readable memory from
> outside the enclave and copying it to an executable code inside the
> enclave.

You are correct. SGX ISA doesn't care about the source page as it only takes care of the security of the enclave itself. But LSM on the other hand also takes care of the enclosing process. That said, a page, if denied executable because it is considered a potential threat to the process by LSM, should also be denied (by LSM) as the source for an executable enclave page because the same threat would exist even if it resides inside an enclave, for enclaves have access to all of the enclosing process's memory.

>
> >
> > That said, every executable enclave page should have an executable
> > source page (doesn’t have to executable, as long as mprotect(X) would
> > succeed on it, as shown in my patch)
>
> Does Sean's series require this? I think that, if we can get away with
> it, it's a lot nicer to *not* require user code to map the source pages
> PROT_EXEC. Some policy may check that it's VM_MAYEXEC or check some
> other attribute of the VMA, but actually requiring PROT_EXEC seems like
> we're weakening existing hardening measures to enforce a policy, which
> is a mistake.

My patch doesn't require X on source pages either. I said "would", meaning X *would* be granted but doesn't have to be granted. You can see this in selinux_enclave_load() calling selinux_file_mprotect() in my code. The purpose is to determine if X *would* be granted to the source pages without actually granting X.
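
The distinction between requiring X on the source pages and asking whether X *would* be granted can be sketched as a side-effect-free query. Everything below is hypothetical: `struct src_mapping`, `policy_allows()` and the sample policy are invented for illustration and are not taken from the patch, which is described above as calling selinux_file_mprotect() from selinux_enclave_load().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define X_PROT_READ 0x1UL
#define X_PROT_EXEC 0x4UL

/* Hypothetical description of the source mapping. */
struct src_mapping {
	unsigned long prot;  /* protections currently applied */
	bool file_backed;    /* attribute a policy might inspect */
	bool modified;       /* e.g. a private file page that was written */
};

/* Hypothetical policy: exec only on unmodified file-backed pages. */
static bool policy_allows(const struct src_mapping *m, unsigned long prot)
{
	if (prot & X_PROT_EXEC)
		return m->file_backed && !m->modified;
	return true;
}

/*
 * Ask whether adding exec to the current protections would be allowed.
 * The mapping is const here, so nothing is actually granted.
 */
static bool would_allow_exec(const struct src_mapping *m)
{
	assert(m != NULL);
	return policy_allows(m, m->prot | X_PROT_EXEC);
}
```

The caller never has to map the source PROT_EXEC; only the policy decision is consulted.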
Hi Andy,
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Xing, Cedric
> Sent: Monday, July 01, 2019 11:54 AM
> > >
> > > That said, every executable enclave page should have an executable
> > > source page (doesn’t have to executable, as long as mprotect(X) would
> > > succeed on it, as shown in my patch)
> >
> > Does Sean's series require this? I think that, if we can get away with
> > it, it's a lot nicer to *not* require user code to map the source pages
> > PROT_EXEC. Some policy may check that it's VM_MAYEXEC or check some
> > other attribute of the VMA, but actually requiring PROT_EXEC seems like
> > we're weakening existing hardening measures to enforce a policy, which
> > is a mistake.
>
> My patch doesn't require X on source pages either. I said "would",
> meaning X *would* be granted but doesn't have to be granted. You can see
> this in selinux_enclave_load() calling selinux_file_mprotect() in my
> code. The purpose is to determine if X *would* be granted to the source
> pages without actually granting X.
Forgot to conclude that we are on the same page for the requirement on the source pages.
And given that requirement (enclave page cannot be X unless source would also be allowed X), measuring enclave code pages or not doesn't make any difference from the enclosing process's perspective in terms of security. So it only makes a difference for the enclave, which however has been covered cryptographically by its measurement already. So SGX_EXECUNMR doesn't have any practical use, thus I don't think it should be added as a new permission.
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 01, 2019 11:00 AM
>
> On Wed, Jun 19, 2019 at 3:24 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> > static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> > {
> > struct sgx_encl *encl = file->private_data;
> > + unsigned long allowed_rwx;
> > int ret;
> >
> > + allowed_rwx = sgx_allowed_rwx(encl, vma);
> > + if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> > + return -EACCES;
> > +
> > ret = sgx_encl_mm_add(encl, vma->vm_mm);
> > if (ret)
> > return ret;
> >
> > + if (!(allowed_rwx & VM_READ))
> > + vma->vm_flags &= ~VM_MAYREAD;
> > + if (!(allowed_rwx & VM_WRITE))
> > + vma->vm_flags &= ~VM_MAYWRITE;
> > + if (!(allowed_rwx & VM_EXEC))
> > + vma->vm_flags &= ~VM_MAYEXEC;
> > +
>
> I'm with Cedric here -- this is no good. The reason I think we
> need .may_mprotect or similar is exactly to avoid doing this.
>
> mmap() just needs to make the same type of VMA regardless of the pages
> in the range.
Instead of making decisions on its own, a more generic approach is for SGX subsystem/module to ask LSM for a decision, by calling security_file_mprotect() - as a new mapping could be considered as changing protection from PROT_NONE to (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC)).
.may_mprotect() also solves part of the problem - i.e. VMAs will be created consistently but non-existent pages still cannot be mapped, which however is necessary for #PF driven EAUG in SGX2. Given that security_file_mprotect() is invoked by mprotect() syscall, it looks to me a more streamlined solution to call the same hook (security_file_mprotect) from both places (mmap and mprotect).
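
A minimal userspace model of the idea above, treating a new mapping as a protection change from PROT_NONE so that mmap and mprotect go through one and the same hook. The hook type, the `deny_wx` policy and both `model_*` helpers are assumptions made for this sketch; they only mirror the shape of the proposal and are not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define X_PROT_NONE  0x0UL
#define X_PROT_READ  0x1UL
#define X_PROT_WRITE 0x2UL
#define X_PROT_EXEC  0x4UL

/* Hypothetical stand-in for a file_mprotect-style LSM hook. */
typedef int (*mprotect_hook_t)(unsigned long old_prot, unsigned long new_prot);

/* Sample policy: never allow writable and executable at once. */
static int deny_wx(unsigned long old_prot, unsigned long new_prot)
{
	(void)old_prot;
	if ((new_prot & X_PROT_WRITE) && (new_prot & X_PROT_EXEC))
		return -EACCES;
	return 0;
}

/* mprotect path: the current protections are the starting point. */
static int model_mprotect(mprotect_hook_t hook, unsigned long cur,
			  unsigned long new_prot)
{
	assert(hook != NULL);
	return hook(cur, new_prot);
}

/* mmap path: same hook, with PROT_NONE as the starting point. */
static int model_mmap(mprotect_hook_t hook, unsigned long prot)
{
	assert(hook != NULL);
	return hook(X_PROT_NONE, prot);
}
```

In this model the driver makes no decision of its own; both entry points defer to whatever the hook returns.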
On Mon, Jul 1, 2019 at 11:54 AM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: Andy Lutomirski [mailto:luto@kernel.org]
> > Sent: Monday, July 01, 2019 10:54 AM
> >
> > On Mon, Jul 1, 2019 at 10:46 AM Xing, Cedric <cedric.xing@intel.com>
> > wrote:
> > >
> > > > From: Andy Lutomirski [mailto:luto@kernel.org]
> > > > Sent: Saturday, June 29, 2019 4:42 PM
> > > >
> > > > On Tue, Jun 25, 2019 at 2:09 PM Stephen Smalley <sds@tycho.nsa.gov>
> > > > wrote:
> > > > >
> > > > > On 6/21/19 5:22 PM, Xing, Cedric wrote:
> > > > > >> From: Christopherson, Sean J
> > > > > >> Sent: Wednesday, June 19, 2019 3:24 PM
> > > > > >>
> > > > > >> Intended use of each permission:
> > > > > >>
> > > > > >> - SGX_EXECDIRTY: dynamically load code within the enclave itself
> > > > > >> - SGX_EXECUNMR: load unmeasured code into the enclave, e.g.
> > > > > >> Graphene
> > > > > >
> > > > > > Why does it matter whether a code page is measured or not?
> > > > >
> > > > > It won't be incorporated into an attestation?
> > > > >
> > > >
> > > > Also, if there is, in parallel, a policy that limits the set of
> > > > enclave SIGSTRUCTs that are accepted, requiring all code be measured
> > > > makes it harder to subvert by writing incompetent or maliciously
> > > > incompetent enclaves.
> > >
> > > As analyzed in my reply to one of Stephen's comments, no executable
> > > page shall be "enclave only" as enclaves have access to host's memory,
> > > so if an executable page in regular memory is considered posting a
> > > threat to the process, it should be considered posting the same threat
> > > inside an enclave as well.
>
> What I was trying to say was, an executable page, if considered a threat to the enclosing process, should always be considered a threat no matter it is in that process's memory or inside an enclave enclosed in that same process's address space.
>
> Therefore, for a page in regular memory, if it is denied executable, it is because it is considered a potential security threat to the enclosing process, so it shall not be used as the source for an executable enclave page, as the same threat exists regardless it is in regular memory or EPC. Does that make more sense?

It does make sense, but I'm not sure it's correct to assume that any
LSM policy will always allow execution on enclave source pages if it
would allow execution inside the enclave. As an example, here is a
policy that seems reasonable:

Task A cannot execute dynamic non-enclave code (no execmod, no
execmem, etc -- only approved unmodified file pages can be executed).
But task A can execute an enclave with MRENCLAVE == such-and-such, and
that enclave may be loaded from regular anonymous memory -- the
MRENCLAVE is considered enough verification.

>
> My patch doesn't require X on source pages either. I said "would", meaning X *would* be granted but doesn't have to be granted. You can see this in selinux_enclave_load() calling selinux_file_mprotect() in my code. The purpose is to determine if X *would* be granted to the source pages without actually granting X.

As above, I'm not convinced this assumption is valid.
On Mon, Jul 1, 2019 at 11:31 AM Xing, Cedric <cedric.xing@intel.com> wrote:
> I intended to say the major reason I objected Sean's approach was its inability to support SGX2 smoothly - as #PF driven EAUG requires non-existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be dispatched so EAUG couldn't be issued in response to #PF.
I still think that, if the kernel wants to support #PF-driven EAUG, it
should be an opt-in thing. It would be something like
SGX_IOC_ADD_LAZY_EAUG_PAGES or similar. If it's done that way, then
the driver needs to learn how to track ranges of pages efficiently,
which is another reason to consider leaving all the fancy page / page
range tracking in the driver.
I don't think it's a good idea for a page fault on any non-EADDed page
in ELRANGE to automatically populate the page.
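
To make the opt-in idea concrete, here is a small sketch in which userspace first registers the ranges it wants populated lazily, and a fault is only eligible for EAUG if it falls inside one of them. SGX_IOC_ADD_LAZY_EAUG_PAGES is only a name floated above, so `add_lazy_range()` is a purely hypothetical stand-in for it, and the fixed-size array is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LAZY_RANGES 8

/* Half-open address range [start, end). */
struct lazy_range {
	unsigned long start;
	unsigned long end;
};

struct encl_model {
	struct lazy_range ranges[MAX_LAZY_RANGES];
	size_t nr;
};

/* Models the hypothetical opt-in call; returns 0, or -1 when full. */
static int add_lazy_range(struct encl_model *e, unsigned long start,
			  unsigned long end)
{
	assert(start < end);
	if (e->nr == MAX_LAZY_RANGES)
		return -1;
	e->ranges[e->nr].start = start;
	e->ranges[e->nr].end = end;
	e->nr++;
	return 0;
}

/* A fault may trigger EAUG only inside an explicitly registered range. */
static bool fault_may_eaug(const struct encl_model *e, unsigned long addr)
{
	for (size_t i = 0; i < e->nr; i++) {
		if (addr >= e->ranges[i].start && addr < e->ranges[i].end)
			return true;
	}
	return false;
}
```

A fault anywhere else in ELRANGE is simply not eligible, which matches the objection to populating pages automatically.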
On 7/1/2019 10:57 AM, Xing, Cedric wrote:
>> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
>> owner@vger.kernel.org] On Behalf Of Casey Schaufler
>>
>> On 6/28/2019 6:37 PM, Stephen Smalley wrote:
>>> On Fri, Jun 28, 2019 at 1:22 PM Casey Schaufler <casey@schaufler-ca.com> wrote:
>>>> On 6/27/2019 5:47 PM, Xing, Cedric wrote:
>>>>>> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
>>>>>> Sent: Thursday, June 27, 2019 4:37 PM
>>>>>>>> This code should not be mixed in with the LSM infrastructure.
>>>>>>>> It should all be contained in its own module, under security/enclave.
>>>>>>> lsm_ema is *intended* to be part of the LSM infrastructure.
>>>>>> That's not going to fly, not for a minute.
>>>>> Why not, if there's a need for it?
>>>>>
>>>>> And what's the concern here if it becomes part of the LSM infrastructure.
>>>> The LSM infrastructure provides a framework for hooks and allocation
>>>> of blobs. That's it. It's a layer for connecting system features like
>>>> VFS, IPC and the IP stack to the security modules. It does not
>>>> implement any policy of it's own. We are not going to implement SGX
>>>> or any other mechanism within the LSM infrastructure.
>>> I don't think you understand the purpose of this code. It isn't
>>> implementing SGX, nor is it needed by SGX.
>>> It is providing shared infrastructure for security modules, similar to
>>> lsm_audit.c, so that security modules can enforce W^X or similar
>>> memory protection guarantees for SGX enclave memory, which has unique
>>> properties that render the existing mmap and mprotect hooks
>>> insufficient. They can certainly implement it only for SELinux, but
>>> then any other security module that wants to provide such guarantees
>>> will have to replicate that code.
>> I am not objecting to the purpose of the code.
>> I *am* objecting to calling it part of the LSM infrastructure.
>> It needs to be it's own thing, off somewhere else.
>> It must not use the lsm_ prefix. That's namespace pollution.
>> The code must not be embedded in the LSM infrastructure code, that
>> breaks with how everything else works.
> If you understand the purpose,

The purpose is to support the SGX hardware, is it not?
If you don't have SGX hardware (e.g. MIPS, ARM, s390) you don't need
this code.

> then why are you objecting the lsm_ prefix as they are APIs to be used by all LSM modules?

We name interfaces based on what they provide, not who consumes them.
As your code provides enclave services, that is how they should be named.

> Or what should be the prefix in your mind?

I'm pretty sure that I've consistently suggested "enclave".

> Or what kind of APIs do you think can qualify the lsm_ prefix?

Code that implements the LSM infrastructure. There is one LSM blob
allocation interface, lsm_inode_alloc(), that is used in early set-up
that is exported. As I've mentioned more than once, enclave/page
management is not an LSM infrastructure function, it's a memory
management function.

> And I'd like to clarify that it doesn't break anything, but is just a bit different, in that security_enclave_load() and security_file_free() call into those APIs.

There should be nothing in security_enclave_load() except calls to the
enclave_load() hooks (e.g. selinux_enclave_load()). There should be
nothing in security_file_free() except file blob management calls to
the file_free() hooks (e.g. apparmor_file_free()).

> But there's a need for them because otherwise code/data would have to be duplicated among LSMs

There's plenty of code duplication among the LSMs, because a lot of
what they do is the same thing. Someday there may be an effort to
address some of that, but I don't think it's on anybody's radar.
As for data duplication, there's a reason we use lots of pointers.

> and the logic would be harder to comprehend.

Keeping the layering clean is critical to comprehension.
There's a lot of VFS code that could have been implemented
within the LSM infrastructure, but I don't think that anyone
would argue that it should have been.

> So that's a trade-off.

I remain completely unconvinced that your proposal
represents a good way to implement your scheme.

> Then what's the practical drawback of doing that?

Name space pollution.
Layering violation.
Architecture specific implementation detail in a
general infrastructure.

> If no, why would we want to pay for the cost for not doing that?

Modularity and maintainability come directly to mind.

>> ... and the notion that you allocate data for one blob that gets freed
>> relative to another breaks the data management model.
> What do you mean here?

You're freeing the EMA data from security_file_free().
If selinux wants to free EMA data it has allocated in
selinux_enclave_load() in selinux_file_free() that's fine,
but the LSM infrastructure has no need to know about it.
EMA needs to manage its own data, just like VFS does.
The LSM infrastructure provides blob management so that
the security modules can extend data if they want to.

> EMA blobs are allocated/freed *not* relative to any other blobs.

In the code you proposed they are freed in security_file_free().
That is for file blob management.
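
The layering described above, where the infrastructure entry point does nothing but walk the registered hooks, can be sketched as follows. This is a userspace toy under stated assumptions: the `hook_list` structure, the two sample modules and `model_security_enclave_load()` are invented names, and the real LSM hook machinery is considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HOOKS    4
#define X_PROT_WRITE 0x2UL
#define X_PROT_EXEC  0x4UL

/* Hypothetical hook signature for an enclave_load-style hook. */
typedef int (*enclave_load_hook_t)(unsigned long prot);

struct hook_list {
	enclave_load_hook_t hooks[MAX_HOOKS];
	size_t nr;
};

/* Sample module A: allows everything. */
static int lsm_a_enclave_load(unsigned long prot)
{
	(void)prot;
	return 0;
}

/* Sample module B: rejects pages that are both writable and executable. */
static int lsm_b_enclave_load(unsigned long prot)
{
	if ((prot & X_PROT_WRITE) && (prot & X_PROT_EXEC))
		return -EACCES;
	return 0;
}

/*
 * Infrastructure entry point: only dispatches to the hooks and stops at
 * the first denial. Any per-enclave state lives inside the modules.
 */
static int model_security_enclave_load(const struct hook_list *l,
				       unsigned long prot)
{
	assert(l->nr <= MAX_HOOKS);
	for (size_t i = 0; i < l->nr; i++) {
		int ret = l->hooks[i](prot);

		if (ret)
			return ret;
	}
	return 0;
}
```

Whether shared bookkeeping may also run at this level, after the hooks have returned, is exactly the point in dispute in this subthread.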
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 01, 2019 12:36 PM
>
> On Mon, Jul 1, 2019 at 11:31 AM Xing, Cedric <cedric.xing@intel.com>
> wrote:
> > I intended to say the major reason I objected Sean's approach was its
> inability to support SGX2 smoothly - as #PF driven EAUG requires non-
> existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be
> dispatched so EAUG couldn't be issued in response to #PF.
>
> I still think that, if the kernel wants to support #PF-driven EAUG, it
> should be an opt-in thing. It would be something like
> SGX_IOC_ADD_LAZY_EAUG_PAGES or similar. If it's done that way, then
> the driver needs to learn how to track ranges of pages efficiently,
> which is another reason to consider leaving all the fancy page / page
> range tracking in the driver.
>
> I don't think it's a good idea for a page fault on any non-EADDed page
> in ELRANGE to automatically populate the page.
I'm with you. The user code shall be explicit on which range to EAUG pages upon #PF. What I'm saying is, a range has to be mapped before the driver could receive #PFs (in the form of vm_ops->fault callbacks). But Sean's series doesn’t support that because no pages can be mapped before coming into existence.
For "page tracking", what information to track is LSM dependent, so it may run into problems if different LSMs want to track different things. And that's the major reason I think it should be done inside LSM.
Besides, the current page tracking structure in the driver is page oriented and its sole purpose is to serve #PFs. Page protection is better tracked using range oriented structures. Those 2 are orthogonal. It wouldn't reduce the complexity of the whole kernel by moving it into SGX driver.
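
As an illustration of why range oriented tracking scales with the number of ranges rather than the number of pages, a protection check over sorted, non-overlapping ranges could look like the sketch below. The structure and function names are hypothetical, and treating an untracked gap as a denial is just one possible choice made for the example; the thread does not settle how non-existent pages should be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One tracked range [start, end) with its maximal protections. */
struct prot_range {
	unsigned long start;
	unsigned long end;
	unsigned long max_prot;
};

/*
 * Check that [start, end) is fully covered by tracked ranges and that
 * none of them forbids a bit in prot. The ranges must be sorted by
 * start and must not overlap. The cost is one step per range, however
 * many pages each range spans.
 */
static bool range_allows(const struct prot_range *r, size_t n,
			 unsigned long start, unsigned long end,
			 unsigned long prot)
{
	unsigned long pos = start;

	assert(start < end);
	for (size_t i = 0; i < n && pos < end; i++) {
		if (r[i].end <= pos)
			continue;       /* range lies before the request */
		if (r[i].start > pos)
			return false;   /* gap: untracked addresses */
		if (prot & ~r[i].max_prot)
			return false;   /* asks for more than allowed */
		pos = r[i].end;
	}
	return pos >= end;
}
```

A per-page structure answering the same question would have to visit every page between start and end.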
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 01, 2019 12:33 PM
>
> It does make sense, but I'm not sure it's correct to assume that any LSM
> policy will always allow execution on enclave source pages if it would
> allow execution inside the enclave. As an example, here is a policy
> that seems reasonable:
>
> Task A cannot execute dynamic non-enclave code (no execmod, no execmem,
> etc -- only approved unmodified file pages can be executed).
> But task A can execute an enclave with MRENCLAVE == such-and-such, and
> that enclave may be loaded from regular anonymous memory -- the
> MRENCLAVE is considered enough verification.
You are right. That's a reasonable policy. But I still can't see the need for SGX_EXECUNMR, as MRENCLAVE is considered enough verification.
> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> owner@vger.kernel.org] On Behalf Of Casey Schaufler
> Sent: Monday, July 01, 2019 12:54 PM
> > If you understand the purpose,
>
> The purpose is to support the SGX hardware, is it not?
> If you don't have SGX hardware (e.g. MIPS, ARM, s390) you don't need
> this code.

No, it is NOT to support SGX - i.e. SGX doesn't require this piece of code to work.

And as Dr. Greg pointed out, it can be used for other TEEs than SGX. It doesn't contain SGX h/w specifics. It is compiled out because there's no module calling it on other architectures today. But it doesn't conflict with any h/w and may be useful (for other TEEs) on other architectures in future.

>
> > then why are you objecting the lsm_ prefix as they are APIs to be used
> > by all LSM modules?
>
> We name interfaces based on what they provide, not who consumes them.
> As your code provides enclave services, that is how they should be named.
>
> > Or what should be the prefix in your mind?
>
> I'm pretty sure that I've consistently suggested "enclave".
>
> > Or what kind of APIs do you think can qualify the lsm_ prefix?
>
> Code that implements the LSM infrastructure. There is one LSM blob
> allocation interface, lsm_inode_alloc(), that is used in early set-up
> that is exported. As I've mentioned more than once, enclave/page
> management is not an LSM infrastructure function, it's a memory
> management function.

It doesn't manage anything.

The reason it appears in the infrastructure is because the decision of inserting an EMA depends on the decisions from *all* active LSMs. That is NOT new either, as you can see it in security_file_permission() and security_vm_enough_memory_mm(), both do something after all LSM modules make their decisions.

Would you please explain why you don't see those as problems but calling EMA functions in security_enclave_load() is a problem?

>
> > And I'd like to clarify that it doesn't break anything, but is just a
> > bit different, in that security_enclave_load() and security_file_free()
> > call into those APIs.
>
> There should be nothing in security_enclave_load() except calls to the
> enclave_load() hooks (e.g. selinux_enclave_load()). There should be
> nothing in security_file_free() except file blob management calls to the
> file_free() hooks (e.g. apparmor_file_free()).

As above, there are examples in security/security.c where the hook does more than just calling registered hooks from LSMs.

>
> > But there's a need for them because otherwise code/data would have to
> > be duplicated among LSMs
>
> There's plenty of code duplication among the LSMs, because a lot of what
> they do is the same thing. Someday there may be an effort to address
> some of that, but I don't think it's on anybody's radar.
> As for data duplication, there's a reason we use lots of pointers.

As stated above, security_enclave_load() needs to do something extra after all LSMs make their decisions. How can pointers help here?

>
> > and the logic would be harder to comprehend.
>
> Keeping the layering clean is critical to comprehension.
> There's a lot of VFS code that could have been implemented within the
> LSM infrastructure, but I don't think that anyone would argue that it
> should have been.
>
> > So that's a trade-off.
>
> I remain completely unconvinced that your proposal represents a good way
> to implement you scheme.
>
> > Then what's the practical drawback of doing that?
>
> Name space pollution.

Alright, I can fix the names.

> Layering violation.

Not sure what you are referring to.

If you are referring to buffers allocated in one layer and freed in elsewhere, you have got the code wrong. Buffers allocated in security_enclave_load() is freed in security_file_free().

Whatever else allocated in LSMs are not seen or taken care of by the infrastructure. The purpose of allocating EMAs in enclave_load() is trying to minimize overhead for non-enclave files, otherwise it could be done in file_alloc() to be more "paired" with file_free(). But I don't see it necessary.

> Architecture specific implementation detail in a general infrastructure.

Stated earlier, it doesn't contain any h/w specifics but just a TEE abstraction. It could be left on all the time or controlled by a different config macro. It is contingent to CONFIG_INTEL_SGX just for convenience, as SGX is the first (and only so far) TEE that needs attention from LSM, but there could be more in future.

>
> > If no, why would we want to pay for the cost for not doing that?
>
> Modularity and maintainability come directly to mind.

Putting it elsewhere will incur more maintenance cost.

>
> >> ... and the notion that you allocate data for one blob that gets
> >> freed relative to another breaks the data management model.
> > What do you mean here?
>
> You're freeing the EMA data from security_file_free().
> If selinux wants to free EMA data it has allocated in
> selinux_enclave_load() in selinux_file_free() that's fine, but the LSM
> infrastructure has no need to know about it.
> EMA needs to manage its own data, just like VFS does.
> The LSM infrastructure provides blob management so that the security
> modules can extend data if they want to.

You've got the code wrong. selinux_enclave_load() doesn't allocate any memory. selinux_file_mprotect() may, due to EMA split. But that's transparent to all LSMs.

The LSM infrastructure doesn't know anything about what LSM modules do, nor does it manage any buffers allocated by any LSM modules.

EMA is currently managing its own data. What's needed is the trigger - to let EMA know when to update its states. The trigger could be placed in LSM infrastructure or inside individual LSMs.

The reason to put it in the infrastructure, is that it depends on the decision of *all* LSMs whether to insert a new EMA. That's similar to vm_enough_memory() where the final __vm_enough_memory() call is made by the infrastructure but not individual LSMs.

>
> > EMA blobs are allocated/freed *not* relative to any other blobs.
>
> In the code you proposed they are freed in security_file_free().
> That is for file blob management.

Yes. EMA contributes to the file blob. But it only frees memory allocated by the infrastructure itself, not anything from any LSM modules.
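[Editorial aside] The ordering argued for above (ask every active LSM, stop at the first denial, and insert the EMA only if all of them approved) can be sketched as a small userspace model. This is only an illustration under stated assumptions: model_enclave_load(), struct ema_set, and the two toy hooks are hypothetical names invented here, not code from the RFC or from security/security.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-LSM hook: returns 0 to approve, a negative errno to deny. */
typedef int (*enclave_load_hook_t)(unsigned long addr, unsigned long len,
				   int prot);

/* Toy stand-in for the set of EMAs tracked for one enclave file. */
struct ema_set {
	int nr_emas;
};

/* Toy hook that approves every request. */
static int approve_hook(unsigned long addr, unsigned long len, int prot)
{
	(void)addr; (void)len; (void)prot;
	return 0;
}

/* Toy hook that denies every request. */
static int deny_hook(unsigned long addr, unsigned long len, int prot)
{
	(void)addr; (void)len; (void)prot;
	return -EACCES;
}

/*
 * Model of an infrastructure-level hook: every registered LSM is asked in
 * turn, the first denial ends the walk, and the shared state is updated
 * once, only after all LSMs have approved.
 */
static int model_enclave_load(struct ema_set *set,
			      const enclave_load_hook_t *hooks,
			      size_t nr_hooks, unsigned long addr,
			      unsigned long len, int prot)
{
	size_t i;

	assert(set != NULL);

	for (i = 0; i < nr_hooks; i++) {
		int rc = hooks[i](addr, len, prot);

		if (rc)
			return rc;	/* one denial is final */
	}

	set->nr_emas++;			/* stands in for inserting an EMA */
	return 0;
}
```

With two approving hooks the counter goes up by one; adding a denying hook leaves it unchanged and the error is passed back to the caller. Whether that final step belongs in the infrastructure or in each module is exactly the point under dispute in this thread.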
On 7/1/2019 2:45 PM, Xing, Cedric wrote:
>> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
>> owner@vger.kernel.org] On Behalf Of Casey Schaufler
>> Sent: Monday, July 01, 2019 12:54 PM
>>> If you understand the purpose,
>> The purpose is to support the SGX hardware, is it not?
>> If you don't have SGX hardware (e.g. MIPS, ARM, s390) you don't need
>> this code.
> No, it is NOT to support SGX

Then what *is* it for?

> - i.e. SGX doesn't require this piece of code to work.
>
> And as Dr. Greg pointed out, it can be used for other TEEs than SGX.

That sure makes it sound like it's for SGX to me.

> It doesn't contain SGX h/w specifics.

I never said it did. But no one ever suggested doing anything
here before SGX, and your subject line:

"x86/sgx: Add SGX specific LSM hooks"

says it does.

> It is compiled out because there's no module calling it on other architectures today. But it doesn't conflict with any h/w and may be useful (for other TEEs) on other architectures in future.
>
>>> then why are you objecting the lsm_ prefix as they are APIs to be used
>>> by all LSM modules?
>>
>> We name interfaces based on what they provide, not who consumes them.
>> As your code provides enclave services, that is how they should be named.
>>
>>> Or what should be the prefix in your mind?
>> I'm pretty sure that I've consistently suggested "enclave".
>>
>>> Or what kind of APIs do you think can qualify the lsm_ prefix?
>> Code that implements the LSM infrastructure. There is one LSM blob
>> allocation interface, lsm_inode_alloc(), that is used in early set-up
>> that is exported. As I've mentioned more than once, enclave/page
>> management is not an LSM infrastructure function, it's a memory
>> management function.
> It doesn't manage anything.

Sorry, "memory management" as in all that stuff around pages and
TLBs and who gets what pages, as opposed to keeping track of anything
on its own.

> The reason it appears in the infrastructure is because the decision of inserting an EMA depends on the decisions from *all* active LSMs.

You have not been listening. Most LSMs use VFS. We haven't rolled VFS
functions into the LSM infrastructure.

> That is NOT new either, as you can see it in security_file_permission() and security_vm_enough_memory_mm(), both do something after all LSM modules make their decisions.

Did you look to see what it is they're doing? If you had,
you would see that is nothing like what you're proposing.

> Would you please explain why you don't see those as problems but calling EMA functions in security_enclave_load() is a problem?

The enclave code should be calling security_enclave_load(),
not the other way around. That assumes you're using the naming
convention correctly.

security_vm_enough_memory_mm() was discussed at length and there
wasn't a clean way to get the logic right without putting the code
here. security_file_permission() has the fs_notify_perm call for
similar reasons. Neither is anything like what you're suggesting.

>>> And I'd like to clarify that it doesn't break anything, but is just a
>>> bit different, in that security_enclave_load() and security_file_free()
>>> call into those APIs.
>>
>> There should be nothing in security_enclave_load() except calls to the
>> enclave_load() hooks (e.g. selinux_enclave_load()). There should be
>> nothing in security_file_free() except file blob management calls to the
>> file_free() hooks (e.g. apparmor_file_free()).
> As above, there are examples in security/security.c where the hook does more than just calling registered hooks from LSMs.

And as I've said, that doesn't matter. You're still going about
using the LSM infrastructure backwards.

>>> But there's a need for them because otherwise code/data would have to
>>> be duplicated among LSMs
>> There's plenty of code duplication among the LSMs, because a lot of what
>> they do is the same thing. Someday there may be an effort to address
>> some of that, but I don't think it's on anybody's radar.
>> As for data duplication, there's a reason we use lots of pointers.
> As stated above, security_enclave_load() needs to do something extra after all LSMs make their decisions. How can pointers help here?

I can explain it, but you clearly have no interest in doing
anything to make your code fit into the system. I have a lot
of other things to be doing.

>>> and the logic would be harder to comprehend.
>> Keeping the layering clean is critical to comprehension.
>> There's a lot of VFS code that could have been implemented within the
>> LSM infrastructure, but I don't think that anyone would argue that it
>> should have been.
>>
>>> So that's a trade-off.
>> I remain completely unconvinced that your proposal represents a good way
>> to implement you scheme.
>>
>>> Then what's the practical drawback of doing that?
>> Name space pollution.
> Alright, I can fix the names.

Good!

>> Layering violation.
> Not sure what you are referring to.

The only places where the blob freed by security_file_free()
may be allocated is security_file_alloc(). The security modules
are welcome to do anything they like in addition, provided
they clean up after themselves in their file_free() hooks.

If SELinux wants to support controls on enclave information,
and that requires additional data, SELinux should include
space in its file blob for that information, or a pointer to
the place where the enclave code is maintaining it.

That's the way audit works.

> If you are referring to buffers allocated in one layer and freed in elsewhere, you have got the code wrong. Buffers allocated in security_enclave_load() is freed in security_file_free().

It's up to the security module's file_free() to clean up anything that
wasn't allocated in security_file_free(). Interested security modules
should call enclave_load(), and put the information into their portion
of the security blob. The module specific code can call enclave_file_free(),
or whatever interface you want to provide, to clean up. That might take
place in file_free(), but it also might be elsewhere.

> Whatever else allocated in LSMs are not seen or taken care of by the infrastructure. The purpose of allocating EMAs in enclave_load() is trying to minimize overhead for non-enclave files, otherwise it could be done in file_alloc() to be more "paired" with file_free(). But I don't see it necessary.

Try looking at maintaining what you've put into the LSM code as
a separate entity. It makes it simpler. Really.

>> Architecture specific implementation detail in a general infrastructure.
> Stated earlier, it doesn't contain any h/w specifics but just a TEE abstraction.

Then put it in the TEE system.

> It could be left on all the time or controlled by a different config macro.

True in any case.

> It is contingent to CONFIG_INTEL_SGX just for convenience, as SGX is the first (and only so far) TEE that needs attention from LSM, but there could be more in future.

All the more reason to keep it separate. These things never get simpler
when they get more generalized.

>>> If no, why would we want to pay for the cost for not doing that?
>> Modularity and maintainability come directly to mind.
> Putting it elsewhere will incur more maintenance cost.

I don't believe that for a second. 40 years of C programming
have taught me that trying to do multiple things in one place
is always a bad idea.

>>>> ... and the notion that you allocate data for one blob that gets
>>>> freed relative to another breaks the data management model.
>>> What do you mean here?
>> You're freeing the EMA data from security_file_free().
>> If selinux wants to free EMA data it has allocated in
>> selinux_enclave_load() in selinux_file_free() that's fine, but the LSM
>> infrastructure has no need to know about it.
>> EMA needs to manage its own data, just like VFS does.
>> The LSM infrastructure provides blob management so that the security
>> modules can extend data if they want to.
> You've got the code wrong. selinux_enclave_load() doesn't allocate any memory. selinux_file_mprotect() may, due to EMA split. But that's transparent to all LSMs.

... and the LSM infrastructure doesn't care and
must not be made to care. It's all up to SELinux.

> The LSM infrastructure doesn't know anything about what LSM modules do, nor does it manage any buffers allocated by any LSM modules.

Right, which is why putting your lsm_ema_blob is wrong, and why
forcing into the file blob is wrong.

> EMA is currently managing its own data. What's needed is the trigger - to let EMA know when to update its states. The trigger could be placed in LSM infrastructure or inside individual LSMs.

Yes. The latter.

> The reason to put it in the infrastructure, is that it depends on the decision of *all* LSMs whether to insert a new EMA.

That's basic stacking behavior. "Bail on fail", which says that once
denial is detected, you're done.

> That's similar to vm_enough_memory() where the final __vm_enough_memory() call is made by the infrastructure but not individual LSMs.

Do you really understand the painful reasons that case is required?
And if so, why you aren't taking steps to avoid them?

>>> EMA blobs are allocated/freed *not* relative to any other blobs.
>> In the code you proposed they are freed in security_file_free().
>> That is for file blob management.
> Yes. EMA contributes to the file blob. But it only frees memory allocated by the infrastructure itself, not anything from any LSM modules.

That's not the way it's supposed to be done. The module tells
the infrastructure what it needs, which may include space for
EMA data. The module asks EMA for the data it needs and stuffs
it somewhere, and the file blob is a fine choice. The module
cleans up in file_free, or at any time before that. If no module
uses EMA, nothing goes in the blob. If two modules use EMA each
is responsible for the data it uses, which may be the same or
may be different.

I've looked at your code. Making it work the way it should would
not be difficult and would likely simplify a bunch of it.
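[Editorial aside] The ownership model described in the message above (the module keeps a pointer in its own part of the file blob, asks a separate enclave service for the data, and releases it from its own file_free hook) might look roughly like the following userspace model. Every name here is hypothetical: struct enclave_info, enclave_info_alloc(), enclave_info_free(), struct toy_file_security, toy_enclave_load() and toy_file_free() are placeholders invented for illustration and do not correspond to any existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical enclave-tracking service, kept outside the LSM infrastructure. */
struct enclave_info {
	int nr_ranges;
};

/* Number of live enclave_info objects, used here only to show cleanup. */
static int live_enclave_infos;

static struct enclave_info *enclave_info_alloc(void)
{
	struct enclave_info *info = calloc(1, sizeof(*info));

	if (info)
		live_enclave_infos++;
	return info;
}

static void enclave_info_free(struct enclave_info *info)
{
	if (!info)
		return;
	free(info);
	live_enclave_infos--;
}

/* One module's slice of the file blob: a pointer that the module owns. */
struct toy_file_security {
	struct enclave_info *enclave;
};

/* Module hook: allocate tracking data lazily, on the first enclave load. */
static int toy_enclave_load(struct toy_file_security *fsec)
{
	assert(fsec != NULL);

	if (!fsec->enclave) {
		fsec->enclave = enclave_info_alloc();
		if (!fsec->enclave)
			return -ENOMEM;
	}
	fsec->enclave->nr_ranges++;
	return 0;
}

/* Module hook: the module that allocated the data also releases it. */
static void toy_file_free(struct toy_file_security *fsec)
{
	assert(fsec != NULL);

	enclave_info_free(fsec->enclave);
	fsec->enclave = NULL;
}
```

In this arrangement non-enclave files pay only for one NULL pointer per interested module, and the infrastructure never learns that enclave data exists. The cost, as raised elsewhere in the thread, is that each module using the service repeats the allocate-and-release logic.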
On Mon, Jul 1, 2019 at 12:56 PM Xing, Cedric <cedric.xing@intel.com> wrote:
>
> > From: Andy Lutomirski [mailto:luto@kernel.org]
> > Sent: Monday, July 01, 2019 12:36 PM
> >
> > On Mon, Jul 1, 2019 at 11:31 AM Xing, Cedric <cedric.xing@intel.com>
> > wrote:
> > > I intended to say the major reason I objected Sean's approach was its
> > inability to support SGX2 smoothly - as #PF driven EAUG requires non-
> > existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be
> > dispatched so EAUG couldn't be issued in response to #PF.
> >
> > I still think that, if the kernel wants to support #PF-driven EAUG, it
> > should be an opt-in thing. It would be something like
> > SGX_IOC_ADD_LAZY_EAUG_PAGES or similar. If it's done that way, then
> > the driver needs to learn how to track ranges of pages efficiently,
> > which is another reason to consider leaving all the fancy page / page
> > range tracking in the driver.
> >
> > I don't think it's a good idea for a page fault on any non-EADDed page
> > in ELRANGE to automatically populate the page.
>
> I'm with you. The user code shall be explicit on which range to EAUG pages upon #PF. What I'm saying is, a range has to be mapped before the driver could receive #PFs (in the form of vm_ops->fault callbacks). But Sean's series doesn’t support that because no pages can be mapped before coming into existence.
>
> For "page tracking", what information to track is LSM dependent, so it may run into problems if different LSMs want to track different things. And that's the major reason I think it should be done inside LSM.
>
> Besides, the current page tracking structure in the driver is page oriented and its sole purpose is to serve #PFs. Page protection is better tracked using range oriented structures. Those 2 are orthogonal. It wouldn't reduce the complexity of the whole kernel by moving it into SGX driver.
It seems to me that the driver is going to need to improve its data
structures to track ranges of pages regardless of any LSM issues. If
we're going to have an enclave with a huge ELRANGE and we're going to
mark some large subset of the full ELRANGE as allocate-on-demand, then
we are going to want to track that range in some efficient way. It
could be a single extent or a set of power-of-two-sized extents (i.e.
radix tree entries), or something else, but a list of pages, of which
some are marked not-yet-allocated, isn't going to cut it.
Once that happens, it seems natural to put whatever permission
tracking we need into the same data structure. That's why my proposal
had the driver getting coarse-grained info from LSM ("may execute
dirty page", for example) rather than asking the LSM to track the
whole state machine.
Does that make sense?
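[Editorial aside] To make the extent idea above concrete, here is a minimal userspace sketch, assuming a sorted array of non-overlapping extents in place of whatever tree the driver would really use. struct enclave_extent, extent_find(), fault_may_eaug() and the TOY_PROT_* bits are hypothetical names introduced for illustration only; nothing here is taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative permission bits; not the kernel's PROT_* or VM_* values. */
#define TOY_PROT_READ	0x1u
#define TOY_PROT_WRITE	0x2u
#define TOY_PROT_EXEC	0x4u

/* One contiguous range of ELRANGE that shares the same attributes. */
struct enclave_extent {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	unsigned int max_prot;	/* maximal protections for the range */
	bool lazy_eaug;		/* opted in to allocate-on-demand */
};

/* Binary search over extents sorted by start address, non-overlapping. */
static const struct enclave_extent *
extent_find(const struct enclave_extent *ext, size_t nr, unsigned long addr)
{
	size_t lo = 0, hi = nr;

	assert(ext != NULL || nr == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (addr < ext[mid].start)
			hi = mid;
		else if (addr >= ext[mid].end)
			lo = mid + 1;
		else
			return &ext[mid];
	}
	return NULL;
}

/*
 * A fault on an unpopulated address may be served by adding a page only
 * if the address lies in a range that explicitly opted in.
 */
static bool fault_may_eaug(const struct enclave_extent *ext, size_t nr,
			   unsigned long addr)
{
	const struct enclave_extent *e = extent_find(ext, nr, addr);

	return e != NULL && e->lazy_eaug;
}
```

A single entry can describe a very large allocate-on-demand region, and the permission field rides along in the same structure, which is the coupling suggested above.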
> From: Andy Lutomirski [mailto:luto@kernel.org]
> Sent: Monday, July 1, 2019 7:29 PM
>
> On Mon, Jul 1, 2019 at 12:56 PM Xing, Cedric <cedric.xing@intel.com> wrote:
> >
> > > From: Andy Lutomirski [mailto:luto@kernel.org]
> > > Sent: Monday, July 01, 2019 12:36 PM
> > >
> > > On Mon, Jul 1, 2019 at 11:31 AM Xing, Cedric <cedric.xing@intel.com>
> > > wrote:
> > > > I intended to say the major reason I objected Sean's approach was its
> > > inability to support SGX2 smoothly - as #PF driven EAUG requires non-
> > > existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be
> > > dispatched so EAUG couldn't be issued in response to #PF.
> > >
> > > I still think that, if the kernel wants to support #PF-driven EAUG, it
> > > should be an opt-in thing. It would be something like
> > > SGX_IOC_ADD_LAZY_EAUG_PAGES or similar. If it's done that way, then
> > > the driver needs to learn how to track ranges of pages efficiently,
> > > which is another reason to consider leaving all the fancy page / page
> > > range tracking in the driver.
> > >
> > > I don't think it's a good idea for a page fault on any non-EADDed page
> > > in ELRANGE to automatically populate the page.
> >
> > I'm with you. The user code shall be explicit on which range to EAUG pages upon #PF.
> What I'm saying is, a range has to be mapped before the driver could receive #PFs (in the
> form of vm_ops->fault callbacks). But Sean's series doesn’t support that because no pages
> can be mapped before coming into existence.
> >
> > For "page tracking", what information to track is LSM dependent, so it may run into
> problems if different LSMs want to track different things. And that's the major reason I
> think it should be done inside LSM.
> >
> > Besides, the current page tracking structure in the driver is page oriented and its sole
> purpose is to serve #PFs. Page protection is better tracked using range oriented
> structures. Those 2 are orthogonal. It wouldn't reduce the complexity of the whole kernel
> by moving it into SGX driver.
>
> It seems to me that the driver is going to need to improve its data
> structures to track ranges of pages regardless of any LSM issues. If
> we're going to have an enclave with a huge ELRANGE and we're going to
> mark some large subset of the full ELRANGE as allocate-on-demand, then
> we are going to want to track that range in some efficient way. It
> could be a single extent or a set of power-of-two-sized extents (i.e.
> radix tree entries), or something else, but a list of pages, of which
> some are marked not-yet-allocated, isn't going to cut it.
>
> Once that happens, it seems natural to put whatever permission
> tracking we need into the same data structure. That's why my proposal
> had the driver getting coarse-grained info from LSM ("may execute
> dirty page", for example) rather than asking the LSM to track the
> whole state machine.
>
> Does that make sense?
The driver will eventually need some range oriented structures for managing EAUGs. But they don't necessarily have to live in the same structure as other per-page information. After all, they are touched by different components of the driver and are indeed pretty orthogonal. The devil is always in the details. It may be counter-intuitive, but per our prototype years ago, it would be simpler to just keep them separate.
IIUC, your idea is in fact keeping an FSM inside the SGX driver and using return values from security_enclave_load() to augment it. That means LSM has to work quite "closely" with the SGX driver (i.e. LSM needs to understand the FSM in the SGX driver), which is quite different from all other existing hooks, which basically make binary decisions only. And I'm not sure how to chain LSMs if there are multiple active at the same time. Auditing is also a problem, as you can't generate an audit log at the time a real mprotect() request is approved/denied. The UAPI is also unpleasant, as the enclave loader has to "predict" the maximal protection, which is not always available to the loader at load time, or significant changes to enclave build tools would be necessary. I think the FSM is really part of the policy and internal to LSM (or more particularly, SELinux, as different LSM modules may have different FSMs), so it still makes more sense to me to keep "LSM internals" internal to LSM.
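[Editorial aside] The "predict the maximal protection" concern refers to the rule from the cover letter that a mapping may never be more permissive than the protections declared for the page when it was added. A minimal sketch of that rule, assuming the check reduces to a bitmask comparison (enclave_prot_allowed() is a hypothetical helper, not a function from the series):

```c
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * Hypothetical helper: a request is acceptable only if it asks for no
 * protection bit beyond the maximum declared at load time.
 */
static int enclave_prot_allowed(int max_prot, int requested)
{
	assert((max_prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC)) == 0);

	return (requested & ~max_prot) ? -EACCES : 0;
}
```

Under this rule a page declared PROT_READ | PROT_WRITE at load time can never later be mapped executable, so a loader that wants to write a page and execute it afterwards has to declare all three bits up front. That up-front declaration is the prediction objected to above.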
> From: Casey Schaufler [mailto:casey@schaufler-ca.com]
> Sent: Monday, July 1, 2019 4:12 PM
>
> On 7/1/2019 2:45 PM, Xing, Cedric wrote:
> >> From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> >> owner@vger.kernel.org] On Behalf Of Casey Schaufler
> >> Sent: Monday, July 01, 2019 12:54 PM
> >>> If you understand the purpose,
> >> The purpose is to support the SGX hardware, is it not?
> >> If you don't have SGX hardware (e.g. MIPS, ARM, s390) you don't need
> >> this code.
> > No, it is NOT to support SGX
>
> Then what *is* it for?
>
> > - i.e. SGX doesn't require this piece of code to work.
> >
> > And as Dr. Greg pointed out, it can be used for other TEEs than SGX.
>
> That sure makes it sound like it's for SGX to me.

I meant it is generic and potentially useful to more TEEs but so far useful to SGX, which is the first technology that uses this infrastructure. I hope that makes more sense.

>
> > It doesn't contain SGX h/w specifics.
>
> I never said it did. But no one ever suggested doing anything
> here before SGX, and your subject line:
>
> "x86/sgx: Add SGX specific LSM hooks"
>
> says it does.

Yes. And in the commit message I also stated that the need for such tracking structure is due to the unique lifespan of enclave pages. Hence this infrastructure will also be useful for other TEEs whose pages share the same lifespan.

>
> > It is compiled out because there's no module calling it on other architectures today.
> > But it doesn't conflict with any h/w and may be useful (for other TEEs) on other
> > architectures in future.
> >
> >>> then why are you objecting the lsm_ prefix as they are APIs to be used
> >>> by all LSM modules?
> >>
> >> We name interfaces based on what they provide, not who consumes them.
> >> As your code provides enclave services, that is how they should be named.
> >>
> >>> Or what should be the prefix in your mind?
> >> I'm pretty sure that I've consistently suggested "enclave".
> >>
> >>> Or what kind of APIs do you think can qualify the lsm_ prefix?
> >> Code that implements the LSM infrastructure. There is one LSM blob
> >> allocation interface, lsm_inode_alloc(), that is used in early set-up
> >> that is exported. As I've mentioned more than once, enclave/page
> >> management is not an LSM infrastructure function, it's a memory
> >> management function.
> > It doesn't manage anything.
>
> Sorry, "memory management" as in all that stuff around pages and
> TLBs and who gets what pages, as opposed to keeping track of anything
> on its own.
>
> > The reason it appears in the infrastructure is because the decision of inserting an EMA
> > depends on the decisions from *all* active LSMs.
>
> You have not been listening. Most LSMs use VFS. We haven't rolled VFS
> functions into the LSM infrastructure.
>
> > That is NOT new either, as you can see it in security_file_permission() and
> > security_vm_enough_memory_mm(), both do something after all LSM modules make their
> > decisions.
>
> Did you look to see what it is they're doing? If you had,
> you would see that is nothing like what you're proposing.

I feel like we are talking about different things. I said those hooks did more than just calling registered hooks. And security_enclave_load() is similar to them, also for a similar reason - something needs to be done after *all* LSM modules make their decisions. I'm not sure what you are talking about.

>
> > Would you please explain why you don't see those as problems but calling EMA functions
> > in security_enclave_load() is a problem?
>
> The enclave code should be calling security_enclave_load(),
> not the other way around. That assumes you're using the naming
> convention correctly.

Yes. The enclave code (SGX driver) calls security_enclave_load/init. Never the other way around. Again, EMA code is similar to auditing code. It is supposed to be called by LSM modules.

>
> security_vm_enough_memory_mm() was discussed at length and there
> wasn't a clean way to get the logic right without putting the code
> here. security_file_permission() has the fs_notify_perm call for
> similar reasons. Neither is anything like what you're suggesting.

Guess we are discussing "at length" right now on how to get the logic right. I'm not sure why "neither is anything like what I'm suggesting".

>
> >>> And I'd like to clarify that it doesn't break anything, but is just a
> >>> bit different, in that security_enclave_load() and security_file_free()
> >>> call into those APIs.
> >>
> >> There should be nothing in security_enclave_load() except calls to the
> >> enclave_load() hooks (e.g. selinux_enclave_load()). There should be
> >> nothing in security_file_free() except file blob management calls to the
> >> file_free() hooks (e.g. apparmor_file_free()).
> > As above, there are examples in security/security.c where the hook does more than just
> > calling registered hooks from LSMs.
>
> And as I've said, that doesn't matter. You're still going about
> using the LSM infrastructure backwards.
>
> >>> But there's a need for them because otherwise code/data would have to
> >>> be duplicated among LSMs
> >> There's plenty of code duplication among the LSMs, because a lot of what
> >> they do is the same thing. Someday there may be an effort to address
> >> some of that, but I don't think it's on anybody's radar.
> >> As for data duplication, there's a reason we use lots of pointers.
> > As stated above, security_enclave_load() needs to do something extra after all LSMs make
> > their decisions. How can pointers help here?
>
> I can explain it, but you clearly have no interest in doing
> anything to make your code fit into the system. I have a lot
> of other things to be doing.

I'm so interested in getting things fit. Just that what I see fit doesn't seem fit from your perspective.

>
> >>> and the logic would be harder to comprehend.
> >> Keeping the layering clean is critical to comprehension.
> >> There's a lot of VFS code that could have been implemented within the
> >> LSM infrastructure, but I don't think that anyone would argue that it
> >> should have been.
> >>
> >>> So that's a trade-off.
> >> I remain completely unconvinced that your proposal represents a good way
> >> to implement you scheme.
> >>
> >>> Then what's the practical drawback of doing that?
> >> Name space pollution.
> > Alright, I can fix the names.
>
> Good!
>
> >> Layering violation.
> > Not sure what you are referring to.
>
> The only places where the blob freed by security_file_free()
> may be allocated is security_file_alloc(). The security modules
> are welcome to do anything they like in addition, provided
> they clean up after themselves in their file_free() hooks.

Exactly! Like I said, allocation could happen in security_file_alloc() but I did it in security_enclave_load() to avoid unnecessary allocation for non-enclaves. I know "security" looks close to "selinux" but I beg your attention in the function names. Whatever allocated inside SELinux *never* gets freed by the infrastructure, except those implicit allocations due to EMA splits.

>
> If SELinux wants to support controls on enclave information,
> and that requires additional data, SELinux should include
> space in its file blob for that information, or a pointer to
> the place where the enclave code is maintaining it.
>
> That's the way audit works.

I have to repeat myself. This was what v1 does. The drawback is, there could be multiple LSMs active at the same time. And an EMA is inserted iff *all* LSMs have approved it. Thus the actual insertion is now done at the end of security_enclave_load(). That makes the logic clear and less error prone, and saves code that'd be duplicated into multiple LSMs otherwise. And that's why I cited security_file_permission()/security_vm_enough_memory() as precedence to security_enclave_load().

>
> > If you are referring to buffers allocated in one layer and freed in elsewhere, you have
> > got the code wrong. Buffers allocated in security_enclave_load() is freed in
> > security_file_free().
>
> It's up to the security module's file_free() to clean up anything that
> wasn't allocated in security_file_free(). Interested security modules
> should call enclave_load(), and put the information into their portion
> of the security blob. The module specific code can call enclave_file_free(),
> or whatever interface you want to provide, to clean up. That might take
> place in file_free(), but it also might be elsewhere.

Would you please take a closer look at my code? Whatever I added to SELinux did *not* allocate anything! Or are you talking about something else?

>
> > Whatever else allocated in LSMs are not seen or taken care of by the infrastructure. The
> > purpose of allocating EMAs in enclave_load() is trying to minimize overhead for non-
> > enclave files, otherwise it could be done in file_alloc() to be more "paired" with
> > file_free(). But I don't see it necessary.
>
> Try looking at maintaining what you've put into the LSM code as
> a separate entity. It makes it simpler. Really.

It is already a separate entity. It has its own header and own C file.

>
> >> Architecture specific implementation detail in a general infrastructure.
> > Stated earlier, it doesn't contain any h/w specifics but just a TEE abstraction.
>
> Then put it in the TEE system.

It's LSM's abstraction of TEE - i.e., it tracks what matters to LSM only. TEE doesn't care. It just provides information and asks for a decision at return. That's how LSM works.

>
> > It could be left on all the time or controlled by a different config macro.
>
> True in any case.
>
> > It is contingent to CONFIG_INTEL_SGX just for convenience, as SGX is the first (and only
> > so far) TEE that needs attention from LSM, but there could be more in future.
>
> All the more reason to keep it separate. These things never get simpler
> when they get more generalized.

I have a hard time understanding what you mean by "separate".

>
> >>> If no, why would we want to pay for the cost for not doing that?
> >> Modularity and maintainability come directly to mind.
> > Putting it elsewhere will incur more maintenance cost.
>
> I don't believe that for a second. 40 years of C programming
> have taught me that trying to do multiple things in one place
> is always a bad idea.

Agreed. I'm doing only one thing.

>
> >>>> ... and the notion that you allocate data for one blob that gets
> >>>> freed relative to another breaks the data management model.
> >>> What do you mean here?
> >> You're freeing the EMA data from security_file_free().
> >> If selinux wants to free EMA data it has allocated in
> >> selinux_enclave_load() in selinux_file_free() that's fine, but the LSM
> >> infrastructure has no need to know about it.
> >> EMA needs to manage its own data, just like VFS does.
> >> The LSM infrastructure provides blob management so that the security
> >> modules can extend data if they want to.
> > You've got the code wrong. selinux_enclave_load() doesn't allocate any memory.
> > selinux_file_mprotect() may, due to EMA split. But that's transparent to all LSMs.
>
> ... and the LSM infrastructure doesn't care and
> must not be made to care. It's all up to SELinux.

It doesn't care the decisions. But it assists in maintaining information on which decisions (from multiple LSMs) are based.

>
> > The LSM infrastructure doesn't know anything about what LSM modules do, nor does it
> > manage any buffers allocated by any LSM modules.
>
> Right, which is why putting your lsm_ema_blob is wrong, and why
> forcing into the file blob is wrong.

lsm_ema_blob is NOT part of file blob.

>
> > EMA is currently managing its own data. What's needed is the trigger - to let EMA know
> > when to update its states. The trigger could be placed in LSM infrastructure or inside
> > individual LSMs.
>
> Yes. The latter.
>
> > The reason to put it in the infrastructure, is that it depends on the decision of *all*
> > LSMs whether to insert a new EMA.
>
> That's basic stacking behavior. "Bail on fail", which says that once
> denial is detected, you're done.

Who does the insertion on success, if not the LSM infrastructure? This is again similar to security_file_permission/security_vm_enough_memory. The last step is done by the infrastructure on success.

>
> > That's similar to vm_enough_memory() where the final __vm_enough_memory() call is made
> > by the infrastructure but not individual LSMs.
>
> Do you really understand the painful reasons that case is required?
> And if so, why you aren't taking steps to avoid them?

I think I've run into the same painful reasons. Honestly, I tried not to do anything more than just a call_int_hooks. But I realized that'd make thing much more complicated and error prone in multiple-active-LSM cases. So I think I've run into the same painful reasons. And I don't see any actionable suggestions from you so far.

>
> >>> EMA blobs are allocated/freed *not* relative to any other blobs.
> >> In the code you proposed they are freed in security_file_free().
> >> That is for file blob management.
> > Yes. EMA contributes to the file blob. But it only frees memory allocated by the
> > infrastructure itself, not anything from any LSM modules.
>
> That's not the way it's supposed to be done. The module tells
> the infrastructure what it needs, which may include space for
> EMA data. The module asks EMA for the data it needs and stuffs
> it somewhere, and the file blob is a fine choice. The module
> cleans up in file_free, or at any time before that. If no module
> uses EMA, nothing goes in the blob. If two modules use EMA each
> is responsible for the data it uses, which may be the same or
> may be different.
>
> I've looked at your code. Making it work the way it should would
> not be difficult and would likely simplify a bunch of it.
Guess this discussion will never end if we don't get into code. Guess it'd be more productive to talk over phone then come back to this thread with a conclusion. Will that be ok with you?
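The stacking behaviour argued over in this exchange ("bail on fail", with the insertion done once by the infrastructure only when every module has approved) can be sketched roughly as below. This is a userspace toy model, not the code from either RFC: every `toy_*` name, the hook signature and the counter standing in for the EMA map are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PROT_EXEC 0x4UL	/* illustrative value, mirrors PROT_EXEC */

/* Hypothetical per-module hook: 0 allows, a negative errno denies. */
typedef int (*toy_enclave_load_hook)(unsigned long start, unsigned long end,
				     unsigned long prot);

/* Toy stand-in for an infrastructure-owned EMA map: counts insertions. */
struct toy_ema_map {
	size_t nr_inserted;
};

/*
 * Call every registered hook in order and stop at the first denial
 * ("bail on fail").  The range is recorded only if all hooks return 0,
 * so the commit step runs at most once and only on unanimous approval.
 */
int toy_enclave_load(struct toy_ema_map *map,
		     const toy_enclave_load_hook *hooks, size_t nr_hooks,
		     unsigned long start, unsigned long end,
		     unsigned long prot)
{
	size_t i;

	assert(start < end);

	for (i = 0; i < nr_hooks; i++) {
		int rc = hooks[i](start, end, prot);

		if (rc)
			return rc;	/* one denial vetoes the insertion */
	}

	map->nr_inserted++;		/* commit after every module agreed */
	return 0;
}

/* Example module that allows everything. */
int toy_allow_all(unsigned long start, unsigned long end, unsigned long prot)
{
	(void)start; (void)end; (void)prot;
	return 0;
}

/* Example module that rejects executable ranges. */
int toy_deny_exec(unsigned long start, unsigned long end, unsigned long prot)
{
	(void)start; (void)end;
	return (prot & TOY_PROT_EXEC) ? -EACCES : 0;
}
```

With both example modules registered, a read-only range is recorded once, while an executable range is rejected by the second module and nothing is recorded. Whether that final commit belongs in the infrastructure or in each module is exactly the point in dispute above; the sketch only shows the control flow.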
On 7/2/2019 12:42 AM, Xing, Cedric wrote:
> ...
> Guess this discussion will never end if we don't get into code. Guess it'd be more productive to talk over phone then come back to this thread with a conclusion. Will that be ok with you?
I don't think that a phone call is going to help. Talking
code issues tends to muddle them in my brain. If you can give
me a few days I will propose a rough version of how I think
your code should be integrated into the LSM environment. I'm
spending more time trying (unsuccessfully :( ) to describe
the issues in English than it will probably take in C.
On Tue, Jul 02, 2019 at 08:44:40AM -0700, Casey Schaufler wrote:

Good morning, I hope this note finds the week going well for everyone.

> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
> > ...
> > Guess this discussion will never end if we don't get into
> > code. Guess it'd be more productive to talk over phone then come back
> > to this thread with a conclusion. Will that be ok with you?

> I don't think that a phone call is going to help. Talking code
> issues tends to muddle them in my brain. If you can give me a few
> days I will propose a rough version of how I think your code should
> be integrated into the LSM environment. I'm spending more time
> trying (unsuccessfully :( ) to describe the issues in English than
> it will probably take in C.

While Casey is off writing his rosetta stone, let me suggest that the
most important thing we need to do is to take a little time, step back
and look at the big picture with respect to what we are trying to
accomplish and if we are going about it in a way that makes any sense
from an engineering perspective.

This conversation shouldn't be about SGX, it should be about the best
way for the kernel/LSM to discipline a Trusted Execution Environment
(TEE).  As I have noted previously, a TEE is a 'blackbox' that, by
design, is intended to allow execution of code and processing of data
in a manner that is resistant to manipulation or inspection by
untrusted userspace, the kernel and/or the hardware itself.

Given that fact, if we are to be intellectually honest, we need to ask
ourselves how effective we believe we can be in controlling any TEE
with kernel based mechanisms.  This is particularly the case if the
author of any code running in the TEE has adversarial intent.

Here is the list of controls that we believe an LSM can, effectively,
implement against a TEE:

1.) Code provenance and origin.

2.) Cryptographic verification of dynamically executable content.

3.) The ability of a TEE to implement anonymous executable content.

If people are in agreement with this concept, it is difficult to
understand why we should be implementing complex state machines and
the like, whether it is in the driver or the LSM.  Security code has
to be measured with a metric of effectiveness, otherwise we are
engaging in security theater.

I believe that if we were using this lens, we would already have a
mainline SGX driver, since we seem to have most of the needed LSM
infrastructure and any additional functionality would be a straight
forward implementation.  Most importantly, the infrastructure would
not be SGX specific, which would seem to be a desirable political
concept.

If we are not willing to engage in this discussion we are going to end
up with a quasi-technology specific solution that isn't implementing
any relevant security guarantees.

FWIW, we wouldn't even be having this, now lengthy discussion, if I
wouldn't have aggressively advocated, starting last November, that an
SGX driver needed some form of execution control if there was a desire
for the technology to not pose a security risk to the platform.  So
humor me a little bit.... :-)

Best wishes for a productive remainder of the week to everyone.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker
IDfusion, LLC
4206 N. 19th Ave.           Implementing measured information privacy
Fargo, ND  58102            and integrity architectures.
PH: 701-281-1686
FAX: 701-281-3949           EMAIL: greg@idfusion.net
------------------------------------------------------------------------------
"... remember that innovation is saying 'no' to 1000 things."
On 7/3/2019 2:46 AM, Dr. Greg wrote:
> On Tue, Jul 02, 2019 at 08:44:40AM -0700, Casey Schaufler wrote:
>
> Good morning, I hope this note finds the week going well for everyone.
>
>> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
>>> ...
>>> Guess this discussion will never end if we don't get into
>>> code. Guess it'd be more productive to talk over phone then come back
>>> to this thread with a conclusion. Will that be ok with you?
>> I don't think that a phone call is going to help. Talking code
>> issues tends to muddle them in my brain. If you can give me a few
>> days I will propose a rough version of how I think your code should
>> be integrated into the LSM environment. I'm spending more time
>> trying (unsuccessfully :( ) to describe the issues in English than
>> it will probably take in C.
> While Casey is off writing his rosetta stone,

I'd hardly call it that. More of an effort to round the corners on
the square peg. And Cedric has some ideas on how to approach that.

> let me suggest that the
> most important thing we need to do is to take a little time, step back
> and look at the big picture with respect to what we are trying to
> accomplish and if we are going about it in a way that makes any sense
> from an engineering perspective.
>
> This conversation shouldn't be about SGX, it should be about the best
> way for the kernel/LSM to discipline a Trusted Execution Environment
> (TEE).  As I have noted previously, a TEE is a 'blackbox' that, by
> design, is intended to allow execution of code and processing of data
> in a manner that is resistant to manipulation or inspection by
> untrusted userspace, the kernel and/or the hardware itself.
>
> Given that fact, if we are to be intellectually honest, we need to ask
> ourselves how effective we believe we can be in controlling any TEE
> with kernel based mechanisms.  This is particularly the case if the
> author of any code running in the TEE has adversarial intent.
>
> Here is the list of controls that we believe an LSM can, effectively,
> implement against a TEE:
>
> 1.) Code provenance and origin.
>
> 2.) Cryptographic verification of dynamically executable content.
>
> 3.) The ability of a TEE to implement anonymous executable content.
>
> If people are in agreement with this concept, it is difficult to
> understand why we should be implementing complex state machines and
> the like, whether it is in the driver or the LSM.  Security code has
> to be measured with a metric of effectiveness, otherwise we are
> engaging in security theater.
>
> I believe that if we were using this lens, we would already have a
> mainline SGX driver, since we seem to have most of the needed LSM
> infrastructure and any additional functionality would be a straight
> forward implementation.  Most importantly, the infrastructure would
> not be SGX specific, which would seem to be a desirable political
> concept.

Generality introduced in the absence of multiple instances
often results in unnecessary complexity, unused interfaces
and feature compromise. Guessing what other TEE systems might
do, and constraining SGX to those models (or the other way around)
is a well established road to ruin. The LSM infrastructure is
a fine example. For the first ten years the "general" mechanism
had a single user. I'd say to hold off on the general until there
is more experience with the specific. It's easier to construct
a general mechanism around things that work than to fit things
that need to work into some preconceived notion of generality.

>
> If we are not willing to engage in this discussion we are going to end
> up with a quasi-technology specific solution that isn't implementing
> any relevant security guarantees.
>
> FWIW, we wouldn't even be having this, now lengthy discussion, if I
> wouldn't have aggressively advocated, starting last November, that an
> SGX driver needed some form of execution control if there was a desire
> for the technology to not pose a security risk to the platform.  So
> humor me a little bit.... :-)
>
> Best wishes for a productive remainder of the week to everyone.
>
> Dr. Greg
>
> As always,
> Dr. Greg Wettstein, Ph.D, Worker
> IDfusion, LLC
> 4206 N. 19th Ave.           Implementing measured information privacy
> Fargo, ND  58102            and integrity architectures.
> PH: 701-281-1686
> FAX: 701-281-3949           EMAIL: greg@idfusion.net
> ------------------------------------------------------------------------------
> "... remember that innovation is saying 'no' to 1000 things."
On Thu, Jun 27, 2019 at 11:56:18AM -0700, Cedric Xing wrote:

I think it is fine to have these patch sets as discussion starters but
it does not make any sense to me to upstream LSM changes with the SGX
foundations.

This is exactly the same situation as with KVM changes. The patch set is
already way too big to fit to the standards [1].

The eye should be on whether the uapi (e.g. device files, ioctl's) will
work for LSM's in a legit way. Do we need more of these different
flavors of experimental LSM changes or can we make some conclusions with
the real issue we are trying to deal with?

[1] "Do not send more than 15 patches at once to the vger mailing lists!!!"
    https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#select-the-recipients-for-your-patch

/Jarkko
> The eye should be on whether the uapi (e.g. device files, ioctl's) will
> work for LSM's in a legit way. Do we need more of these different
> flavors of experimental LSM changes or can we make some conclusions with
> the real issue we are trying to deal with?
Anyway, sending v21 soonish. Finished it on Thu but have been waiting
for internal QA feedback. If nothing pops up, I'll send it tmrw.
/Jarkko
On Thu, Jul 04, 2019 at 02:22:21AM +0300, Jarkko Sakkinen wrote:
> > The eye should be on whether the uapi (e.g. device files, ioctl's) will
> > work for LSM's in a legit way. Do we need more of these different
> > flavors of experimental LSM changes or can we make some conclusions with
> > the real issue we are trying to deal with?
>
> Anyway, sending v21 soonish. Finished it on Thu but have been waiting
> any internal QA feedback. If nothing pops up, I'll send it tmrw.
Ugh, the point I forgot to add was that it contains an update to
SGX_IOC_ENCLAVE_ADD_PAGE that is relevant for the discussion (probably
the same as Sean proposed; I cannot recall whether I tuned it).
/Jarkko
On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:

I still don't get why we need this whole mess and do not simply admit
that there are two distinct roles:

1. Creator
2. User

In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
User does not. It just gets the fd from the Creator. I'm sure that all
the SGX2 related functionality can be solved somehow in this role
playing game.

An example would be the usual case where the enclave is actually a loader
that loads the actual piece of software that one wants to run. Things
simply need to be designed in a way that the Creator runs the loader part.
These are non-trivial problems but an oddball security model is not going
to make them disappear - on the contrary it will make designing user
space only more complicated.

I think this is a classical example of when something overly complicated
is invented in the kernel only to realize that it should be solved in
the user space.

It would not be the only use case where some kind of privileged
daemon is used for managing a kernel provided resource.

I think a really good conclusion from this discussion that has taken two
months is to realize that nothing needs to be done in this area (except
*maybe* noexec check).

/Jarkko
On 7/3/2019 4:16 PM, Jarkko Sakkinen wrote:
> On Thu, Jun 27, 2019 at 11:56:18AM -0700, Cedric Xing wrote:
>
> I think it is fine to have these patch sets as discussion starters but
> it does not make any sense to me to upstream LSM changes with the SGX
> foundations.

Guess LSM is a gating factor, because otherwise SGX could be abused to make executable EPC from pages that are otherwise not allowed to be executable. Am I missing anything?

>
> This is exactly the same situation as with KVM changes. The patch set is
> already way too big to fit to the standards [1].
>
> The eye should be on whether the uapi (e.g. device files, ioctl's) will
> work for LSM's in a legit way. Do we need more of these different
> flavors of experimental LSM changes or can we make some conclusions with
> the real issue we are trying to deal with?
>
> [1] "Do not send more than 15 patches at once to the vger mailing lists!!!"
>     https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html#select-the-recipients-for-your-patch
>
> /Jarkko
>
On Wed, Jul 03, 2019 at 08:32:10AM -0700, Casey Schaufler wrote:

Good morning, I hope the weekend has been enjoyable for everyone.

> >> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
> >>> ...
> >>> Guess this discussion will never end if we don't get into
> >>> code. Guess it'd be more productive to talk over phone then come back
> >>> to this thread with a conclusion. Will that be ok with you?
> >> I don't think that a phone call is going to help. Talking code
> >> issues tends to muddle them in my brain. If you can give me a few
> >> days I will propose a rough version of how I think your code should
> >> be integrated into the LSM environment. I'm spending more time
> >> trying (unsuccessfully :( ) to describe the issues in English than
> >> it will probably take in C.
> > While Casey is off writing his rosetta stone,

> I'd hardly call it that. More of an effort to round the corners on
> the square peg. And Cedric has some ideas on how to approach that.

Should we infer from this comment that, of the two competing
strategies, Cedric's is the favored architecture?

> > let me suggest that the
> > most important thing we need to do is to take a little time, step back
> > and look at the big picture with respect to what we are trying to
> > accomplish and if we are going about it in a way that makes any sense
> > from an engineering perspective.
> >
> > This conversation shouldn't be about SGX, it should be about the best
> > way for the kernel/LSM to discipline a Trusted Execution Environment
> > (TEE).  As I have noted previously, a TEE is a 'blackbox' that, by
> > design, is intended to allow execution of code and processing of data
> > in a manner that is resistant to manipulation or inspection by
> > untrusted userspace, the kernel and/or the hardware itself.
> >
> > Given that fact, if we are to be intellectually honest, we need to ask
> > ourselves how effective we believe we can be in controlling any TEE
> > with kernel based mechanisms.  This is particularly the case if the
> > author of any code running in the TEE has adversarial intent.
> >
> > Here is the list of controls that we believe an LSM can, effectively,
> > implement against a TEE:
> >
> > 1.) Code provenance and origin.
> >
> > 2.) Cryptographic verification of dynamically executable content.
> >
> > 3.) The ability of a TEE to implement anonymous executable content.
> >
> > If people are in agreement with this concept, it is difficult to
> > understand why we should be implementing complex state machines and
> > the like, whether it is in the driver or the LSM.  Security code has
> > to be measured with a metric of effectiveness, otherwise we are
> > engaging in security theater.
> >
> > I believe that if we were using this lens, we would already have a
> > mainline SGX driver, since we seem to have most of the needed LSM
> > infrastructure and any additional functionality would be a straight
> > forward implementation.  Most importantly, the infrastructure would
> > not be SGX specific, which would seem to be a desirable political
> > concept.

> Generality introduced in the absence of multiple instances
> often results in unnecessary complexity, unused interfaces
> and feature compromise. Guessing what other TEE systems might
> do, and constraining SGX to those models (or the other way around)
> is a well established road to ruin. The LSM infrastructure is
> a fine example. For the first ten years the "general" mechanism
> had a single user. I'd say to hold off on the general until there
> is more experience with the specific. It's easier to construct
> a general mechanism around things that work than to fit things
> that need to work into some preconceived notion of generality.

All well taken points from an implementation perspective, but they
elide the point I was trying to make.  Which is the fact that without
any semblance of a discussion regarding the requirements needed to
implement a security architecture around the concept of a TEE, this
entire process, despite Cedric's well intentioned efforts, amounts to
pounding a square solution into the round hole of a security problem.

Which, as I noted in my e-mail, is tantamount to security theater.

Everyone wants to see this driver upstream.  If we would have had a
reasoned discussion regarding what it means to implement proper
controls around a TEE, when we started to bring these issues forward
last November, we could have possibly been on the road to having a
driver with reasoned security controls and one that actually delivers
the security guarantees the hardware was designed to deliver.

Best wishes for a productive week to everyone.

Dr. Greg

As always,
Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
4206 N. 19th Ave.           Specializing in information infra-structure
Fargo, ND  58102            development.
PH: 701-281-1686            EMAIL: greg@enjellic.com
------------------------------------------------------------------------------
"Any intelligent fool can make things bigger and more complex... It
takes a touch of genius - and a lot of courage to move in the opposite
direction."
                                -- Albert Einstein
On Thu, Jun 27, 2019 at 01:29:39PM -0700, Xing, Cedric wrote:
> > From: linux-sgx-owner@vger.kernel.org [mailto:linux-sgx-
> > owner@vger.kernel.org] On Behalf Of Stephen Smalley
> > Sent: Tuesday, June 25, 2019 1:48 PM
> >
> > On 6/21/19 12:54 PM, Xing, Cedric wrote:
> > >> From: Christopherson, Sean J
> > >> Sent: Wednesday, June 19, 2019 3:24 PM
> > >>
> > >> diff --git a/security/security.c b/security/security.c
> > >> index 613a5c00e602..03951e08bdfc 100644
> > >> --- a/security/security.c
> > >> +++ b/security/security.c
> > >> @@ -2359,3 +2359,10 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux)
> > >> 	call_void_hook(bpf_prog_free_security, aux);
> > >> }
> > >> #endif /* CONFIG_BPF_SYSCALL */
> > >> +
> > >> +#ifdef CONFIG_INTEL_SGX
> > >> +int security_enclave_map(unsigned long prot)
> > >> +{
> > >> +	return call_int_hook(enclave_map, 0, prot);
> > >> +}
> > >> +#endif /* CONFIG_INTEL_SGX */
> > >
> > > Why is this new security_enclave_map() necessary while
> > > security_mmap_file() will also be invoked?
> >
> > security_mmap_file() doesn't know about enclaves. It will just end up
> > checking FILE__READ, FILE__WRITE, and FILE__EXECUTE to /dev/sgx/enclave.
> > This was noted in the patch description.
>
> Surely I understand all those. As I mentioned in my other email,
> enclave_load() could indicate to LSM that a file is an enclave. Of course
> mmap() could be invoked before any pages are loaded so LSM wouldn't know at
> the first mmap(), but that doesn't matter as an empty enclave wouldn't pose
> any threats anyway.
security_mmap_file() is invoked before the final address is known, and
MAP_FIXED isn't technically required.
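The userspace side of this point can be illustrated with a small sketch: without MAP_FIXED the address passed to mmap() is only a hint, so the final placement is decided by the kernel and is not something a caller (or a check that runs before placement) can rely on. The helper names below are hypothetical and the example uses anonymous memory rather than /dev/sgx/enclave; it is only meant to show the hint semantics, not the SGX driver's behaviour.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory, passing `hint` without MAP_FIXED.
 * Returns the address the kernel actually chose, or MAP_FAILED.
 */
void *map_with_hint(void *hint, size_t len)
{
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/*
 * Request a second mapping at an address that is already occupied.
 * Returns 1 if the kernel placed it elsewhere, 0 if it did not,
 * and -1 if either mmap() call failed.
 */
int hint_is_only_advisory(size_t len)
{
	void *first, *second;
	int moved;

	assert(len > 0);

	first = map_with_hint(NULL, len);
	if (first == MAP_FAILED)
		return -1;

	/* Same address again: without MAP_FIXED this is just a hint. */
	second = map_with_hint(first, len);
	if (second == MAP_FAILED) {
		munmap(first, len);
		return -1;
	}

	moved = (second != first);
	munmap(second, len);
	munmap(first, len);
	return moved;
}
```

Because the requested range is still mapped when the second call is made, the kernel has to pick a different address, which is why a hook that only sees the request cannot assume it knows which enclave range will end up being mapped.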
On Mon, Jul 01, 2019 at 01:03:51PM -0700, Xing, Cedric wrote:
> > From: Andy Lutomirski [mailto:luto@kernel.org]
> > Sent: Monday, July 01, 2019 12:33 PM
> >
> > It does make sense, but I'm not sure it's correct to assume that any LSM
> > policy will always allow execution on enclave source pages if it would
> > allow execution inside the enclave. As an example, here is a policy
> > that seems reasonable:
> >
> > Task A cannot execute dynamic non-enclave code (no execmod, no execmem,
> > etc -- only approved unmodified file pages can be executed).
> > But task A can execute an enclave with MRENCLAVE == such-and-such, and
> > that enclave may be loaded from regular anonymous memory -- the
> > MRENCLAVE is considered enough verification.
>
> You are right. That's a reasonable policy. But I still can't see the need for
> SGX_EXECUNMR, as MRENCLAVE is considered enough verification.
That assumes the enclave/loader developer will never make a mistake, and
that policy owners are going to do a deep dive on the EEXTEND values for
an enclave (and will never make a mistake).
User errors aside, EXECUNMR would also be useful in conjunction with
MRSIGNER, e.g. allow all enclaves signed by X, but disallow unmeasured
code.
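A rough model of the policy described here (trust every enclave from one signer, yet still refuse unmeasured executable code) might look like the sketch below. The structures, field names and `toy_policy_allows()` are hypothetical and exist only for illustration; the only SGX fact relied on is that MRSIGNER is a 256-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MRSIGNER_LEN 32	/* MRSIGNER is a 256-bit hash of the signing key */

/* Hypothetical summary of what a loader asked for while building. */
struct toy_enclave {
	unsigned char mrsigner[MRSIGNER_LEN];
	bool has_unmeasured_exec;	/* executable page added without EEXTEND */
};

/* Hypothetical policy: one trusted signer plus an EXECUNMR-style switch. */
struct toy_policy {
	unsigned char trusted_signer[MRSIGNER_LEN];
	bool allow_execunmr;
};

/*
 * Allow the enclave only if it is signed by the trusted signer and,
 * unless the policy grants the EXECUNMR-like permission, contains no
 * executable page that escaped measurement.
 */
bool toy_policy_allows(const struct toy_policy *p, const struct toy_enclave *e)
{
	if (memcmp(p->trusted_signer, e->mrsigner, MRSIGNER_LEN) != 0)
		return false;
	if (e->has_unmeasured_exec && !p->allow_execunmr)
		return false;
	return true;
}
```

The second check is what a signer-only policy cannot express: the signature says who vouched for the enclave, while the flag says whether every executable page was actually covered by the measurement.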
On Fri, Jun 21, 2019 at 04:26:54AM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:54PM -0700, Sean Christopherson wrote:
> > Do not allow an enclave page to be mapped with PROT_EXEC if the source
> > vma does not have VM_MAYEXEC.  This effectively enforces noexec as
> > do_mmap() clears VM_MAYEXEC if the vma is being loaded from a noexec
> > path, i.e. prevents executing a file by loading it into an enclave.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> >  arch/x86/kernel/cpu/sgx/driver/ioctl.c | 42 +++++++++++++++++++++++---
> >  1 file changed, 37 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> > index e18d2afd2aad..1fca70a36ce3 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
> > @@ -564,6 +564,39 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
> >  	return ret;
> >  }
> >
> > +static int sgx_encl_page_copy(void *dst, unsigned long src, unsigned long prot)
>
> I will probably forget the context with this name after this has been
> merged :-) So many functions dealing with enclave pages. Two
> alternatives that come up to my mind:
>
> 1. sgx_encl_page_user_import()
> 2. sgx_encl_page_user_copy_from()

What about sgx_encl_page_copy_from_user() to align with copy_from_user()?

> Not saying that they are beautiful names but at least you immediately
> know the context.
>
> > +{
> > +	struct vm_area_struct *vma;
> > +	int ret;
> > +
> > +	/* Hold mmap_sem across copy_from_user() to avoid a TOCTOU race. */
> > +	down_read(&current->mm->mmap_sem);
> > +
> > +	/* Query vma's VM_MAYEXEC as an indirect path_noexec() check. */
> > +	if (prot & PROT_EXEC) {
> > +		vma = find_vma(current->mm, src);
> > +		if (!vma) {
> > +			ret = -EFAULT;
> > +			goto out;
>
> Should this be -EINVAL instead?
A copy_from_user() failure is handled via -EFAULT, and this is effectively an equivalent check.
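The check being reviewed can be modelled in userspace as follows. This is a toy under stated assumptions, not the driver code: `struct toy_vma`, `toy_find_vma()` and the flag values are made up, the quoted patch is cut off before the VM_MAYEXEC test so the -EACCES return is a guess, and the explicit `src < vma->start` test is added here only because a find_vma()-style lookup may return a region that starts above the address.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_PROT_EXEC	0x4UL	/* illustrative, mirrors PROT_EXEC */
#define TOY_VM_MAYEXEC	0x1UL	/* illustrative flag value */

struct toy_vma {
	unsigned long start, end, flags;	/* covers [start, end) */
};

/*
 * Like find_vma(): return the first region whose end lies above addr,
 * or NULL if there is none.  Regions must be sorted by address.
 */
const struct toy_vma *toy_find_vma(const struct toy_vma *vmas, size_t n,
				   unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/*
 * Reject an executable enclave page whose source address is unmapped
 * (-EFAULT, matching what a failed copy would report) or whose source
 * region may never become executable (-EACCES, an assumption).
 */
int toy_check_source(const struct toy_vma *vmas, size_t n,
		     unsigned long src, unsigned long prot)
{
	const struct toy_vma *vma;

	if (!(prot & TOY_PROT_EXEC))
		return 0;	/* only executable pages need the check */

	vma = toy_find_vma(vmas, n, src);
	if (!vma || src < vma->start)
		return -EFAULT;
	if (!(vma->flags & TOY_VM_MAYEXEC))
		return -EACCES;
	return 0;
}
```

Returning -EFAULT for a missing region keeps the error consistent with what the subsequent copy would have produced for the same bad address, which is the argument made in the reply above.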
This series intends to make the new SGX subsystem work smoothly with the existing LSM architecture so that, say, SGX cannot be abused to work around restrictions set forth by LSM modules/policies.

This patch is based on and could be applied cleanly on top of Jarkko Sakkinen’s SGX patch series v20 (https://patchwork.kernel.org/cover/10905141/).

For those who haven’t followed closely, the whole discussion started from the primary question of how to prevent creating an executable enclave page from a regular memory page that is NOT executable as prohibited by LSM modules/policies. And that can be translated into 2 related questions in practice, i.e. 1) how to determine the allowed initial protections of enclave pages when they are being loaded and 2) how to determine the allowed protections of enclave pages at runtime.

Those who are familiar with LSM may notice that, for regular files, #1 is determined by security_mmap_file() while #2 is covered by security_file_mprotect(). Those 2 hooks however are insufficient for enclaves due to the distinct composition and lifespan of enclave pages. Specifically, security_mmap_file() only passes in the file but does not specify which portion of the file is being mmap()’ed, with the assumption that all pages of the same file shall have the same set of allowed/disallowed protections. But that assumption is no longer true for enclaves for 2 reasons: a) pages of an enclave may be loaded from different image files with different attributes and b) enclave pages retain contents across munmap()/mmap(), therefore, say, if a policy prohibits execution of modified pages, then pages flagged modified have to stay modified across munmap()/mmap() so that the policy cannot be circumvented by remapping (i.e. munmap() followed by mmap() on the same range). But the lack of range information in security_mmap_file()’s arguments simply blocks LSM modules from tracking enclave pages properly.
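Reason b) above can be made concrete with a very small model: if the "modified" flag lives with the enclave rather than with a mapping, then unmapping and remapping cannot clear it. Everything in the sketch (`toy_encl`, `toy_mapping` and the helpers) is hypothetical and far simpler than the EMA structure proposed in the series; it only shows why the state has to outlive the VMA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ENCL_PAGES 8

/* State that lives as long as the enclave, not as long as any mapping. */
struct toy_encl {
	bool modified[TOY_ENCL_PAGES];
};

/* State that is discarded on munmap() and recreated on mmap(). */
struct toy_mapping {
	bool mapped;
};

/* A write to an enclave page marks it modified for the enclave's lifetime. */
void toy_write_page(struct toy_encl *encl, size_t page)
{
	assert(page < TOY_ENCL_PAGES);
	encl->modified[page] = true;
}

/* Tearing down or recreating a mapping never touches enclave state. */
void toy_munmap(struct toy_mapping *m)
{
	m->mapped = false;
}

void toy_mmap(struct toy_mapping *m)
{
	m->mapped = true;
}

/* Policy "no execution of modified pages", judged on enclave state only. */
bool toy_may_exec(const struct toy_encl *encl, size_t page)
{
	assert(page < TOY_ENCL_PAGES);
	return !encl->modified[page];
}
```

A page that has been written stays non-executable under this policy after a toy_munmap()/toy_mmap() cycle, while untouched pages remain executable; had the flag been stored in `toy_mapping`, the remap would have reset it.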
A rational solution would always involve tracking the correspondence between enclave pages and their origin (e.g. files from which they were loaded), which is similar to tracking regular memory pages and their origin via vm_file of struct vm_area_struct. But given the longer lifespan of enclave pages (than the VMAs they are mapped into), such correspondence has to be stored in a separate data structure outside of VMAs. In theory, the correspondence could be stored either in LSM or in the SGX subsystem. This series has picked the former for three reasons: firstly, such information is useful only within LSM, so it makes more sense to keep it “LSM internal”; secondly, keeping the data structure inside LSM allows additional information to be cached in LSM modules without affecting the rest of the kernel; and lastly, those data structures would be gone when LSM is disabled and hence would not impose any unnecessary overhead.

Those who are familiar with this topic and related discussions may also notice that Sean Christopherson has recently sent out an RFC patch to address the same problem as this series. He adopted the other approach of tracking page/origin correspondence inside the SGX subsystem. However, to reduce memory overhead in practice, he cached the FSM (Finite State Machine) instead of page/origin correspondences. By “FSM”, I mean a policy FSM defined as sets of states and events that may trigger state transitions. Generally speaking, any LSM module has its own definition of FSM and usually uses attributes attached to files as arguments to the FSM; it then advances the FSM as events are observed and gives out decisions based on the current FSM state. Sean’s implementation attempts to move the FSM into the SGX subsystem, and by caching the arguments returned by LSM it tries to monitor events and reach the same decisions by itself.
So from an architecture perspective, that model has to face tough challenges in reality, such as how to support multiple LSM modules that employ different FSMs to govern page protection transitions. Implementation wise, his model also imposes unwanted restrictions specifically on SGX2, such as:

· Complicated/Restricted UAPI – Enclave loaders are required to provide “maximal protection” at page load time, but such information is NOT always available. For example, Graphene containers may run different applications comprised of different sets of executables and/or shared objects. Some of them may contain self-modifying code (or text relocation) while others don’t. The generic enclave loader usually doesn’t have such information so wouldn’t be able to provide it ahead of time.

· Inefficient Auditing – Audit logs are supposed to help system administrators determine the set of minimally needed permissions and detect abnormal behaviors. But consider the “maximal protection” model: if “maximal protection” is set too permissively, then the audit log wouldn’t be able to detect anomalies; or if “maximal protection” is too restrictive, then the audit log cannot identify the file violating the policy. In either case the audit log cannot fulfill its purposes.

· Inability to support #PF driven EPC allocation in SGX2 – For those unfamiliar with SGX2 software flows, an SGX2 enclave requests a page by issuing EACCEPT on the address at which a new page is wanted, and the resulting #PF is expected to be handled by the kernel by EAUG’ing an EPC page at the fault address; the enclave is then resumed and the faulting EACCEPT is retried and succeeds. The key requirement is to allow mmap()’ing non-existing enclave pages so that the SGX module/subsystem can respond to #PFs by EAUG’ing new pages. Sean’s implementation doesn’t allow mmap()’ing non-existing pages for a variety of reasons and therefore blocks this major SGX2 usage.

Please note that this series has only been compile-tested.
History:

· This is version 3 of this patch series, with the following changes over version 2 per comments/requests from the community.
  - Per Casey Schaufler, moved EMA from the LSM infrastructure into a new LSM module, named “ema”, which is responsible for maintaining EMA maps for enclaves and offers APIs for other LSM modules to query/update EMA nodes.
  - Per Stephen Smalley, removed kernel command line option “lsm.ema.cache_decisions”. Enclave page origins will always be tracked and audit logs will always be accurate.
  - Per Andy Lutomirski, a new PROCESS2__ENCLAVE_EXECANON permission has been added to SELinux to allow EADD’ing executable pages from non-executable anonymous source pages.
  - Revised permission checks for enclave pages in SELinux. See selinux_enclave_load() and enclave_mprotect() functions in security/selinux/hooks.c for details.
· v2 – https://patchwork.kernel.org/cover/11020301/
· v1 – https://patchwork.kernel.org/cover/10984127/

Cedric Xing (4):
  x86/sgx: Add SGX specific LSM hooks
  x86/64: Call LSM hooks from SGX subsystem/module
  X86/sgx: Introduce EMA as a new LSM module
  x86/sgx: Implement SGX specific hooks in SELinux

 arch/x86/kernel/cpu/sgx/driver/ioctl.c |  80 ++++++-
 arch/x86/kernel/cpu/sgx/driver/main.c  |  16 +-
 include/linux/lsm_ema.h                |  97 +++++++++
 include/linux/lsm_hooks.h              |  27 +++
 include/linux/security.h               |  23 ++
 security/Makefile                      |   1 +
 security/commonema.c                   | 277 +++++++++++++++++++++++++
 security/security.c                    |  17 ++
 security/selinux/hooks.c               | 236 ++++++++++++++++++++-
 security/selinux/include/classmap.h    |   3 +-
 security/selinux/include/objsec.h      |   7 +
 11 files changed, 770 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/lsm_ema.h
 create mode 100644 security/commonema.c

--
2.17.1
SGX enclaves are loaded from pages in regular memory. Given its ability to create executable pages, the newly added SGX subsystem may present a backdoor for adversaries to circumvent LSM policies, for example by creating an executable enclave page from a modified regular page that LSM would otherwise prohibit from being made executable. This raises the primary question of whether an enclave page should be allowed to be created from a given source page in regular memory.

A related question is whether to grant or deny an mprotect() request on a given enclave page/range. mprotect() is traditionally covered by the security_file_mprotect() hook; however, enclave pages have a different lifespan than either MAP_PRIVATE or MAP_SHARED pages. In particular, MAP_PRIVATE pages have the same lifespan as the VMA and MAP_SHARED pages have the same lifespan as the backing file (on disk), but enclave pages have the lifespan of the enclave’s file descriptor. For example, enclave pages can be munmap()’ed and then mmap()’ed again without losing their contents (like MAP_SHARED), but all enclave pages are lost once the enclave’s file descriptor has been closed (like MAP_PRIVATE). Consequently, LSM modules need a new data structure for tracking the protections of enclave pages/ranges so that they can make proper decisions at mmap()/mprotect() syscalls.

The last question, which is orthogonal to the two above, is whether or not to allow a given enclave to launch/run. Enclave pages are not visible to the rest of the system, so to some extent they offer a better place for malicious software to hide. Thus, it is sometimes desirable to whitelist/blacklist enclaves by their measurements, signing public keys, or image files.

To address the questions above, 2 new LSM hooks are added for enclaves.

· security_enclave_load() – This hook allows LSM to decide whether or not to allow instantiation of a range of enclave pages using the specified VMA. It is invoked when a range of enclave pages is about to be loaded.
It serves 3 purposes: 1) indicate to LSM that the file struct in question is an enclave; 2) allow LSM to decide whether or not to instantiate those pages; and 3) allow LSM to initialize internal data structures for tracking the origins/protections of those pages.

· security_enclave_init() – This hook allows whitelisting/blacklisting, or performing whatever checks are deemed appropriate, before an enclave is allowed to run. An LSM module may opt to use the file backing the SIGSTRUCT as a proxy to dictate the allowed protections for anonymous pages.

mprotect() of enclave pages continues to be governed by security_file_mprotect(), with the expectation that LSM is able to distinguish between regular and enclave pages inside the hook. For mmap(), the SGX subsystem is expected to invoke security_file_mprotect() explicitly to check the requested protections against those of existing enclave pages.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
---
 include/linux/lsm_hooks.h | 27 +++++++++++++++++++++++++++
 include/linux/security.h  | 23 +++++++++++++++++++++++
 security/security.c       | 17 +++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 47f58cfb6a19..9d9e44200683 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1446,6 +1446,22 @@
  * @bpf_prog_free_security:
  *	Clean up the security information stored inside bpf prog.
* + * @enclave_load: + * Decide if a range of pages shall be allowed to be loaded into an + * enclave + * + * @encl points to the file identifying the target enclave + * @start target range starting address + * @end target range ending address + * @flags contains protections being requested for the target range + * @source points to the VMA containing the source pages to be loaded + * + * @enclave_init: + * Decide if an enclave shall be allowed to launch + * + * @encl points to the file identifying the target enclave being launched + * @sigstruct contains a copy of the SIGSTRUCT in kernel memory + * @source points to the VMA backing SIGSTRUCT in user memory */ union security_list_options { int (*binder_set_context_mgr)(struct task_struct *mgr); @@ -1807,6 +1823,13 @@ union security_list_options { int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux); void (*bpf_prog_free_security)(struct bpf_prog_aux *aux); #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX + int (*enclave_load)(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *source); + int (*enclave_init)(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *source); +#endif }; struct security_hook_heads { @@ -2046,6 +2069,10 @@ struct security_hook_heads { struct hlist_head bpf_prog_alloc_security; struct hlist_head bpf_prog_free_security; #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX + struct hlist_head enclave_load; + struct hlist_head enclave_init; +#endif } __randomize_layout; /* diff --git a/include/linux/security.h b/include/linux/security.h index 659071c2e57c..52c200810004 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1829,5 +1829,28 @@ static inline void security_bpf_prog_free(struct bpf_prog_aux *aux) #endif /* CONFIG_SECURITY */ #endif /* CONFIG_BPF_SYSCALL */ +#ifdef CONFIG_INTEL_SGX +struct sgx_sigstruct; +#ifdef CONFIG_SECURITY +int security_enclave_load(struct file *encl, size_t start, size_t 
end, + size_t flags, struct vm_area_struct *source); +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *source); +#else +static inline int security_enclave_load(struct file *encl, size_t start, + size_t end, size_t flags, struct vm_area_struct *src) +{ + return 0; +} + +static inline int security_enclave_init(struct file *encl, + struct sgx_sigstruct *sigstruct, + struct vm_area_struct *src) +{ + return 0; +} +#endif /* CONFIG_SECURITY */ +#endif /* CONFIG_INTEL_SGX */ + #endif /* ! __LINUX_SECURITY_H */ diff --git a/security/security.c b/security/security.c index f493db0bf62a..72c10f5e4f95 100644 --- a/security/security.c +++ b/security/security.c @@ -1420,6 +1420,7 @@ int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, { return call_int_hook(file_mprotect, 0, vma, reqprot, prot); } +EXPORT_SYMBOL(security_file_mprotect); int security_file_lock(struct file *file, unsigned int cmd) { @@ -2355,3 +2356,19 @@ void security_bpf_prog_free(struct bpf_prog_aux *aux) call_void_hook(bpf_prog_free_security, aux); } #endif /* CONFIG_BPF_SYSCALL */ + +#ifdef CONFIG_INTEL_SGX +int security_enclave_load(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *src) +{ + return call_int_hook(enclave_load, 0, encl, start, end, flags, src); +} +EXPORT_SYMBOL(security_enclave_load); + +int security_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *src) +{ + return call_int_hook(enclave_init, 0, encl, sigstruct, src); +} +EXPORT_SYMBOL(security_enclave_init); +#endif /* CONFIG_INTEL_SGX */ -- 2.17.1
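For readers less familiar with the LSM infrastructure: call_int_hook() walks the hooks registered for a hook point and returns the first non-zero result, so any single module can veto an enclave load. Below is a minimal userspace sketch of that dispatch. The names enclave_load_fn, dispatch_enclave_load(), and the two sample modules are hypothetical and exist only for illustration; they are not kernel API, and the real hooks take the full argument list shown in the patch.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-in for one module's enclave_load hook. */
typedef int (*enclave_load_fn)(size_t start, size_t end, size_t prot);

#define PROT_EXEC_BIT 0x4	/* same value as PROT_EXEC on Linux */

/* Sample module 1: only does bookkeeping, never objects. */
static int tracker_enclave_load(size_t start, size_t end, size_t prot)
{
	(void)start; (void)end; (void)prot;
	return 0;
}

/* Sample module 2: refuses executable pages. */
static int strict_enclave_load(size_t start, size_t end, size_t prot)
{
	(void)start; (void)end;
	return (prot & PROT_EXEC_BIT) ? -EACCES : 0;
}

/*
 * Models the behavior of call_int_hook(enclave_load, 0, ...): call each
 * hook in registration order and stop at the first non-zero return value.
 */
static int dispatch_enclave_load(const enclave_load_fn *hooks, size_t n,
				 size_t start, size_t end, size_t prot)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc = hooks[i](start, end, prot);

		if (rc)
			return rc;
	}
	return 0;	/* default when no module objects */
}
```

With both sample modules registered, a read/write load passes through both hooks, while a load requesting PROT_EXEC is rejected by the second one.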
It’s straightforward to call the new LSM hooks from the SGX subsystem/module. There are three places where LSM hooks are invoked.

1) sgx_mmap() invokes security_file_mprotect() to validate the requested protection. This is necessary because security_mmap_file(), invoked by the mmap() syscall, only validates protections against the /dev/sgx/enclave file, but not against the files from which the pages were loaded.
2) security_enclave_load() is invoked upon loading of every enclave page by the EADD ioctl. Please note that if pages are EADD’ed in batch, the SGX subsystem/module is responsible for dividing the pages into chunks so that each chunk is loaded from a single VMA.
3) security_enclave_init() is invoked before initializing (EINIT) every enclave.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
---
 arch/x86/kernel/cpu/sgx/driver/ioctl.c | 80 +++++++++++++++++++++++---
 arch/x86/kernel/cpu/sgx/driver/main.c  | 16 +++++-
 2 files changed, 85 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver/ioctl.c b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
index b186fb7b48d5..4f5abf9819a7 100644
--- a/arch/x86/kernel/cpu/sgx/driver/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/driver/ioctl.c
@@ -1,7 +1,7 @@
 // SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
 // Copyright(c) 2016-19 Intel Corporation.
-#include <asm/mman.h> +#include <linux/mman.h> #include <linux/delay.h> #include <linux/file.h> #include <linux/hashtable.h> @@ -11,6 +11,7 @@ #include <linux/shmem_fs.h> #include <linux/slab.h> #include <linux/suspend.h> +#include <linux/security.h> #include "driver.h" struct sgx_add_page_req { @@ -575,6 +576,46 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr, return ret; } +static int sgx_encl_prepare_page(struct file *filp, unsigned long dst, + unsigned long src, void *buf) +{ + struct vm_area_struct *vma; + unsigned long prot; + int rc; + + if (dst & ~PAGE_MASK) + return -EINVAL; + + rc = down_read_killable(&current->mm->mmap_sem); + if (rc) + return rc; + + vma = find_vma(current->mm, dst); + if (vma && dst >= vma->vm_start) + prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) | + _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) | + _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC); + else + prot = 0; + + vma = find_vma(current->mm, src); + if (!vma || src < vma->vm_start || src + PAGE_SIZE > vma->vm_end) + rc = -EFAULT; + + if (!rc && !(vma->vm_flags & VM_MAYEXEC)) + rc = -EACCES; + + if (!rc && copy_from_user(buf, (void __user *)src, PAGE_SIZE)) + rc = -EFAULT; + + if (!rc) + rc = security_enclave_load(filp, dst, dst + PAGE_SIZE, prot, vma); + + up_read(&current->mm->mmap_sem); + + return rc; +} + /** * sgx_ioc_enclave_add_page - handler for %SGX_IOC_ENCLAVE_ADD_PAGE * @@ -613,10 +654,9 @@ static long sgx_ioc_enclave_add_page(struct file *filep, unsigned int cmd, data = kmap(data_page); - if (copy_from_user((void *)data, (void __user *)addp->src, PAGE_SIZE)) { - ret = -EFAULT; + ret = sgx_encl_prepare_page(filep, addp->addr, addp->src, data); + if (ret) goto out; - } ret = sgx_encl_add_page(encl, addp->addr, data, &secinfo, addp->mrmask); if (ret) @@ -718,6 +758,31 @@ static int sgx_encl_init(struct sgx_encl *encl, struct sgx_sigstruct *sigstruct, return ret; } +static int sgx_encl_prepare_sigstruct(struct file *filp, unsigned long src, +
struct sgx_sigstruct *ss) +{ + struct vm_area_struct *vma; + int rc; + + rc = down_read_killable(&current->mm->mmap_sem); + if (rc) + return rc; + + vma = find_vma(current->mm, src); + if (!vma || src < vma->vm_start || src + sizeof(*ss) > vma->vm_end) + rc = -EFAULT; + + if (!rc && copy_from_user(ss, (void __user *)src, sizeof(*ss))) + rc = -EFAULT; + + if (!rc) + rc = security_enclave_init(filp, ss, vma); + + up_read(&current->mm->mmap_sem); + + return rc; +} + /** * sgx_ioc_enclave_init - handler for %SGX_IOC_ENCLAVE_INIT * @@ -753,12 +818,9 @@ static long sgx_ioc_enclave_init(struct file *filep, unsigned int cmd, ((unsigned long)sigstruct + PAGE_SIZE / 2); memset(einittoken, 0, sizeof(*einittoken)); - if (copy_from_user(sigstruct, (void __user *)initp->sigstruct, - sizeof(*sigstruct))) { - ret = -EFAULT; + ret = sgx_encl_prepare_sigstruct(filep, initp->sigstruct, sigstruct); + if (ret) goto out; - } - ret = sgx_encl_init(encl, sigstruct, einittoken); diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c index 58ba6153070b..8848711a55bd 100644 --- a/arch/x86/kernel/cpu/sgx/driver/main.c +++ b/arch/x86/kernel/cpu/sgx/driver/main.c @@ -63,14 +63,26 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd, static int sgx_mmap(struct file *file, struct vm_area_struct *vma) { struct sgx_encl *encl = file->private_data; + unsigned long prot; + int rc; vma->vm_ops = &sgx_vm_ops; vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO; vma->vm_private_data = encl; - kref_get(&encl->refcount); + prot = vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC); + vma->vm_flags &= ~prot; - return 0; + prot = _calc_vm_trans(prot, VM_READ, PROT_READ) | + _calc_vm_trans(prot, VM_WRITE, PROT_WRITE) | + _calc_vm_trans(prot, VM_EXEC, PROT_EXEC); + rc = security_file_mprotect(vma, prot, prot); + if (!rc) { + vma->vm_flags |= calc_vm_prot_bits(prot, 0); + kref_get(&encl->refcount); + } + + return rc; } static unsigned long
sgx_get_unmapped_area(struct file *file, -- 2.17.1
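The commit message above asks the SGX subsystem to split a batched EADD so that every piece handed to security_enclave_load() comes from a single VMA. This patch only handles one page per ioctl, so the following is just a sketch of how such a split could look in userspace terms. struct range and split_by_vma() are hypothetical names, and a sorted array of ranges stands in for the process's VMA tree.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a VMA: half-open range [start, end). */
struct range {
	size_t start;
	size_t end;
};

/*
 * Split the source range [src, src + len) into chunks that each lie within
 * a single entry of @vmas (sorted by address, non-overlapping). Writes up to
 * @max chunks to @out and returns the number of chunks, or -1 if any part
 * of the source range is unmapped or @out is too small.
 */
static int split_by_vma(const struct range *vmas, size_t nr_vmas,
			size_t src, size_t len,
			struct range *out, size_t max)
{
	size_t end = src + len;
	size_t i, n = 0;

	for (i = 0; i < nr_vmas && src < end; i++) {
		if (vmas[i].end <= src)
			continue;		/* VMA lies below the cursor */
		if (vmas[i].start > src)
			return -1;		/* hole in the source range */
		if (n == max)
			return -1;		/* caller's buffer is full */

		out[n].start = src;
		out[n].end = vmas[i].end < end ? vmas[i].end : end;
		src = out[n].end;
		n++;
	}
	return src < end ? -1 : (int)n;
}
```

Each resulting chunk would then be checked on its own, with the VMA that contains it passed as the source argument of the hook.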
As enclave pages have a different lifespan than the existing MAP_PRIVATE and MAP_SHARED pages, a new data structure is required outside of VMAs to track their protections and/or origins. The Enclave Memory Area (or EMA for short) has been introduced to address this need. EMAs are maintained by a new LSM module named “ema”, which is similar in idea to the “capability” LSM module.

This new “ema” module has LSM_ORDER_FIRST, so it will always be dispatched before other LSM_ORDER_MUTABLE modules (e.g. selinux, apparmor, etc.). It is responsible for initializing EMA maps and for inserting and freeing EMA nodes, and it offers APIs for other LSM modules to query/update EMAs. Details can be found in include/linux/lsm_ema.h and security/commonema.c.

Signed-off-by: Cedric Xing <cedric.xing@intel.com>
---
 include/linux/lsm_ema.h |  97 ++++++++++++++
 security/Makefile       |   1 +
 security/commonema.c    | 277 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 375 insertions(+)
 create mode 100644 include/linux/lsm_ema.h
 create mode 100644 security/commonema.c

diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h
new file mode 100644
index 000000000000..59fc4ea6fa78
--- /dev/null
+++ b/include/linux/lsm_ema.h
@@ -0,0 +1,97 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
+/**
+ * Enclave Memory Area interface for LSM modules
+ *
+ * Copyright(c) 2016-19 Intel Corporation.
+ */
+
+#ifndef _LSM_EMA_H_
+#define _LSM_EMA_H_
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/fs.h>
+#include <linux/file.h>
+
+/**
+ * ema - Enclave Memory Area structure for LSM modules
+ *
+ * Data structure to track origins of enclave pages
+ *
+ * @link:
+ *	Link to adjacent EMAs.
EMAs are sorted by their addresses in ascending + * order + * @start: + * Starting address + * @end: + * Ending address + * @source: + * File from which this range was loaded, or NULL if not loaded from + * any files + */ +struct ema { + struct list_head link; + size_t start; + size_t end; + struct file *source; +}; + +#define ema_data(ema, offset) \ + ((void *)((char *)((struct ema *)(ema) + 1) + offset)) + +/** + * ema_map - LSM Enclave Memory Map structure for LSM modules + * + * Container for EMAs of an enclave + * + * @list: + * Head of a list of sorted EMAs + * @lock: + * Acquire before querying/updating the list of EMAs + */ +struct ema_map { + struct list_head list; + struct mutex lock; +}; + +size_t __init ema_request_blob(size_t blob_size); +struct ema_map *ema_get_map(struct file *encl); +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end, + int (*cb)(struct ema *ema, void *arg), void *arg); +void ema_remove_range(struct ema_map *map, size_t start, size_t end); + +static inline int __must_check ema_lock_map(struct ema_map *map) +{ + return mutex_lock_interruptible(&map->lock); +} + +static inline void ema_unlock_map(struct ema_map *map) +{ + mutex_unlock(&map->lock); +} + +static inline int ema_lock_apply_to_range(struct ema_map *map, + size_t start, size_t end, + int (*cb)(struct ema *, void *), + void *arg) +{ + int rc = ema_lock_map(map); + if (!rc) { + rc = ema_apply_to_range(map, start, end, cb, arg); + ema_unlock_map(map); + } + return rc; +} + +static inline int ema_lock_remove_range(struct ema_map *map, + size_t start, size_t end) +{ + int rc = ema_lock_map(map); + if (!rc) { + ema_remove_range(map, start, end); + ema_unlock_map(map); + } + return rc; +} + +#endif /* _LSM_EMA_H_ */ diff --git a/security/Makefile b/security/Makefile index c598b904938f..b66d03a94853 100644 --- a/security/Makefile +++ b/security/Makefile @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/ obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o +obj-$(CONFIG_INTEL_SGX) += commonema.o # Object integrity file lists subdir-$(CONFIG_INTEGRITY) += integrity diff --git a/security/commonema.c b/security/commonema.c new file mode 100644 index 000000000000..c5b0bdfdc013 --- /dev/null +++ b/security/commonema.c @@ -0,0 +1,277 @@ +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) +// Copyright(c) 2016-18 Intel Corporation. + +#include <linux/lsm_ema.h> +#include <linux/lsm_hooks.h> +#include <linux/slab.h> + +static struct kmem_cache *_map_cache; +static struct kmem_cache *_node_cache; +static size_t _data_size __lsm_ro_after_init; + +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = { + .lbs_file = sizeof(atomic_long_t), +}; + +static atomic_long_t *_map_file(struct file *encl) +{ + return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file); +} + +static struct ema_map *_alloc_map(void) +{ + struct ema_map *m; + + m = kmem_cache_zalloc(_map_cache, GFP_KERNEL); + if (likely(m)) { + INIT_LIST_HEAD(&m->list); + mutex_init(&m->lock); + } + return m; +} + +static struct ema *_new_ema(size_t start, size_t end, struct file *src) +{ + struct ema *ema; + + if (unlikely(!_node_cache)) { + struct kmem_cache *c; + + c = kmem_cache_create("lsm-ema", sizeof(*ema) + _data_size, + __alignof__(typeof(*ema)), SLAB_PANIC, + NULL); + if (atomic_long_cmpxchg((atomic_long_t *)&_node_cache, 0, + (long)c)) + kmem_cache_destroy(c); + } + + ema = kmem_cache_zalloc(_node_cache, GFP_KERNEL); + if (likely(ema)) { + INIT_LIST_HEAD(&ema->link); + ema->start = start; + ema->end = end; + if (src) + ema->source = get_file(src); + } + return ema; +} + +static void _free_ema(struct ema *ema) +{ + if (ema->source) + fput(ema->source); + kmem_cache_free(_node_cache, ema); +} + +static void _free_map(struct ema_map *map) +{ + struct ema *p, *n; + + WARN_ON(mutex_is_locked(&map->lock)); + list_for_each_entry_safe(p, n, &map->list, 
link) + _free_ema(p); + kmem_cache_free(_map_cache, map); +} + +static struct ema_map *_init_map(struct file *encl) +{ + struct ema_map *m = ema_get_map(encl); + if (!m) { + m = _alloc_map(); + if (atomic_long_cmpxchg(_map_file(encl), 0, (long)m)) { + _free_map(m); + m = ema_get_map(encl); + } + } + return m; +} + +static inline struct ema *_next_ema(struct ema *p, struct ema_map *map) +{ + p = list_next_entry(p, link); + return &p->link == &map->list ? NULL : p; +} + +static inline struct ema *_find_ema(struct ema_map *map, size_t a) +{ + struct ema *p; + + WARN_ON(!mutex_is_locked(&map->lock)); + + list_for_each_entry(p, &map->list, link) + if (a < p->end) + break; + return &p->link == &map->list ? NULL : p; +} + +static struct ema *_split_ema(struct ema *p, size_t at) +{ + typeof(p) n; + + if (at <= p->start || at >= p->end) + return p; + + n = _new_ema(p->start, at, p->source); + if (likely(n)) { + memcpy(n + 1, p + 1, _data_size); + p->start = at; + list_add_tail(&n->link, &p->link); + } + return n; +} + +static int _merge_ema(struct ema *p, struct ema_map *map) +{ + typeof(p) prev = list_prev_entry(p, link); + + WARN_ON(!mutex_is_locked(&map->lock)); + + if (&prev->link == &map->list || prev->end != p->start || + prev->source != p->source || memcmp(prev + 1, p + 1, _data_size)) + return 0; + + p->start = prev->start; + list_del(&prev->link); + _free_ema(prev); + return 1; +} + +static inline int _insert_ema(struct ema_map *map, struct ema *n) +{ + typeof(n) p = _find_ema(map, n->start); + + if (!p) + list_add_tail(&n->link, &map->list); + else if (n->end <= p->start) + list_add_tail(&n->link, &p->link); + else + return -EEXIST; + + _merge_ema(n, map); + if (p) + _merge_ema(p, map); + return 0; +} + +static void ema_file_free_security(struct file *encl) +{ + struct ema_map *m = ema_get_map(encl); + if (m) + _free_map(m); +} + +static int ema_enclave_load(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *vma) +{ + struct ema_map
*m; + struct ema *ema; + int rc; + + m = _init_map(encl); + if (unlikely(!m)) + return -ENOMEM; + + ema = _new_ema(start, end, vma ? vma->vm_file : NULL); + if (unlikely(!ema)) + return -ENOMEM; + + rc = ema_lock_map(m); + if (!rc) { + rc = _insert_ema(m, ema); + ema_unlock_map(m); + } + if (rc) + _free_ema(ema); + return rc; +} + +static int ema_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct, + struct vm_area_struct *vma) +{ + if (unlikely(!_init_map(encl))) + return -ENOMEM; + return 0; +} + +static struct security_hook_list ema_hooks[] __lsm_ro_after_init = { + LSM_HOOK_INIT(file_free_security, ema_file_free_security), + LSM_HOOK_INIT(enclave_load, ema_enclave_load), + LSM_HOOK_INIT(enclave_init, ema_enclave_init), +}; + +static int __init ema_init(void) +{ + _map_cache = kmem_cache_create("lsm-ema_map", sizeof(struct ema_map), + __alignof__(struct ema_map), SLAB_PANIC, + NULL); + security_add_hooks(ema_hooks, ARRAY_SIZE(ema_hooks), "ema"); + return 0; +} + +DEFINE_LSM(ema) = { + .name = "ema", + .order = LSM_ORDER_FIRST, + .init = ema_init, + .blobs = &ema_blob_sizes, +}; + +/* ema_request_blob shall only be called from LSM module init function */ +size_t __init ema_request_blob(size_t size) +{ + typeof(_data_size) offset = _data_size; + _data_size += size; + return offset; +} + +struct ema_map *ema_get_map(struct file *encl) +{ + return (struct ema_map *)atomic_long_read(_map_file(encl)); +} + +/** + * Invoke a callback function on every EMA falls within range, split EMAs as + * needed + */ +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end, + int (*cb)(struct ema *, void *), void *arg) +{ + struct ema *ema; + int rc; + + ema = _find_ema(map, start); + while (ema && end > ema->start) { + if (start > ema->start) + _split_ema(ema, start); + if (end < ema->end) + ema = _split_ema(ema, end); + + rc = (*cb)(ema, arg); + _merge_ema(ema, map); + if (rc) + return rc; + + ema = _next_ema(ema, map); + } + + if (ema) + _merge_ema(ema, 
map); + return 0; +} + +/* Remove all EMAs falling within range, split EMAs as needed */ +void ema_remove_range(struct ema_map *map, size_t start, size_t end) +{ + struct ema *ema, *n; + + ema = _find_ema(map, start); + while (ema && end > ema->start) { + if (start > ema->start) + _split_ema(ema, start); + if (end < ema->end) + ema = _split_ema(ema, end); + + n = _next_ema(ema, map); + _free_ema(ema); + ema = n; + } +} -- 2.17.1
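To make the split-and-apply behavior of ema_apply_to_range() easier to follow, here is a simplified userspace model. It is a sketch under loud assumptions, not the patch's code: struct node, insert_range(), split_node(), apply_to_range(), and set_modified() are hypothetical names; the list is singly linked; there is no locking, no merging of equal neighbors, and no file reference counting; and split_node() keeps the lower half in the original node, whereas _split_ema() keeps the upper half.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model of an EMA node: half-open range [start, end). */
struct node {
	struct node *next;
	size_t start, end;
	int modified;		/* stand-in for per-LSM data */
};

/* Insert a new range, keeping the list sorted; fail on overlap or OOM. */
static int insert_range(struct node **head, size_t start, size_t end)
{
	struct node **pp = head, *n;

	while (*pp && (*pp)->end <= start)
		pp = &(*pp)->next;
	if (*pp && (*pp)->start < end)
		return -1;		/* overlaps an existing node */

	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->start = start;
	n->end = end;
	n->next = *pp;
	*pp = n;
	return 0;
}

/* Split @n at @at: @n keeps [start, at), a new node gets [at, end). */
static int split_node(struct node *n, size_t at)
{
	struct node *upper;

	if (at <= n->start || at >= n->end)
		return 0;		/* nothing to split */
	upper = malloc(sizeof(*upper));
	if (!upper)
		return -1;
	*upper = *n;			/* inherit data and ->next */
	upper->start = at;
	n->end = at;
	n->next = upper;
	return 0;
}

/* Call @cb on every node piece inside [start, end), splitting at the edges. */
static int apply_to_range(struct node *head, size_t start, size_t end,
			  int (*cb)(struct node *n, void *arg), void *arg)
{
	struct node *n;
	int rc;

	for (n = head; n && n->start < end; n = n->next) {
		if (n->end <= start)
			continue;	/* entirely below the range */
		if (split_node(n, start) || split_node(n, end))
			return -1;
		if (n->start < start)
			continue;	/* lower piece of a split, out of range */
		rc = cb(n, arg);
		if (rc)
			return rc;
	}
	return 0;
}

/* Example callback: mark a piece as having been made writable. */
static int set_modified(struct node *n, void *arg)
{
	(void)arg;
	n->modified = 1;
	return 0;
}
```

Applying set_modified() to the middle of a single node leaves three nodes behind, with only the middle one marked. That is the property the SELinux patch relies on to track state per page range rather than per original load.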
This patch governs enclave page protections in a way similar to how SELinux currently governs protections for regular memory pages. In summary:

· All pages are allowed PROT_READ/PROT_WRITE upon request.

· For pages that are EADD’ed, PROT_EXEC will be granted initially if PROT_EXEC could also be granted to the VMA containing the source pages, or if the calling process has ENCLAVE_EXECANON permission. Afterwards, PROT_EXEC will be removed once PROT_WRITE is requested/granted, and could be granted again if the backing file has EXECMOD or the calling process has PROCMEM. For anonymous pages, the backing file is considered to be the file containing SIGSTRUCT.

· Pages that are EAUG’ed are considered modified initially, so PROT_EXEC will not be granted unless the file containing SIGSTRUCT has EXECMOD, or the calling process has EXECMEM.

In addition, launch control is implemented as EXECUTE permission on the SIGSTRUCT file. That is,

· SIGSTRUCT file has EXECUTE – Enclave is allowed to launch. But this is granted only if the enclosing VMA has the same content as the disk file (i.e. vma->anon_vma == NULL).

· SIGSTRUCT file has EXECMOD – All anonymous enclave pages are allowed PROT_EXEC.

In all cases, simultaneous WX requires EXECMEM on the calling process.

Implementation-wise, 2 bits are associated with every EMA by SELinux.

· sourced – Set if the EMA is loaded from some memory page (i.e. EADD’ed), cleared otherwise. When cleared, the backing file is considered to be the file containing SIGSTRUCT.

· modified – Set if the EMA has ever been mapped writable as a result of mmap()/mprotect() syscalls. When set, FILE__EXECMOD is required on the backing file for the range to be executable.

Both bits are initialized in selinux_enclave_load() and checked in selinux_file_mprotect(). The SGX subsystem is expected to invoke security_file_mprotect() upon mmap() so that the check is not bypassed. mmap() shall be treated as mprotect() from PROT_NONE to the requested protection.
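The PROT_EXEC rules above can be condensed into a single predicate. The sketch below mirrors the three branches of ema__chk_X_cb() in this patch, with the AVC lookups replaced by plain booleans. struct exec_query and may_exec() are hypothetical names for illustration only. Where the prose and the code differ (the EAUG branch in the code consults only EXECMOD on the SIGSTRUCT file), the sketch follows the code. The separate "WX requires EXECMEM" check done in enclave_mprotect() is not modeled.

```c
#include <stdbool.h>

/* Hypothetical flattening of the inputs consulted by ema__chk_X_cb(). */
struct exec_query {
	bool sourced;		/* page was EADD'ed (false: EAUG'ed) */
	bool from_file;		/* EADD source was a file mapping */
	bool modified;		/* page has ever been writable */
	bool src_execute;	/* FILE__EXECUTE on the source file */
	bool src_execmod;	/* FILE__EXECMOD on the source file */
	bool ss_execmod;	/* FILE__EXECMOD on the SIGSTRUCT file */
	bool execanon;		/* PROCESS2__ENCLAVE_EXECANON */
	bool execmem;		/* PROCESS__EXECMEM */
};

/* Returns true if PROT_EXEC may be granted on the page range. */
static bool may_exec(const struct exec_query *q)
{
	if (!q->sourced)		/* EAUG'ed: treated as modified */
		return q->ss_execmod;

	if (!q->from_file) {		/* EADD'ed from anonymous memory */
		if (!q->execanon && !q->execmem)
			return false;
		return !q->modified || q->ss_execmod;
	}

	/* EADD'ed from a file */
	if (!q->src_execute)
		return false;
	return !q->modified || q->src_execmod;
}
```

For example, a page EADD'ed from an executable file is executable until it is made writable; after that, EXECMOD on the source file is required to make it executable again.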
selinux_enclave_init() determines if an enclave is allowed to launch, using the criteria described earlier. This implementation does NOT accept SIGSTRUCT in anonymous memory. The backing file is also cached in struct file_security_struct and will serve as the base for decisions for anonymous pages. There’s one new process permission – PROCESS2__ENCLAVE_EXECANON introduced by this patch. It is equivalent to FILE__EXECUTE for all enclave pages loaded from anonymous mappings. Signed-off-by: Cedric Xing <cedric.xing@intel.com> --- security/selinux/hooks.c | 236 +++++++++++++++++++++++++++- security/selinux/include/classmap.h | 3 +- security/selinux/include/objsec.h | 7 + 3 files changed, 243 insertions(+), 3 deletions(-) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 94de51628fdc..c7fe1d47654d 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3499,6 +3499,13 @@ static int selinux_file_alloc_security(struct file *file) return file_alloc_security(file); } +static void selinux_file_free_security(struct file *file) +{ + long f = atomic_long_read(&selinux_file(file)->encl_ss); + if (f) + fput((struct file *)f); +} + /* * Check whether a task has the ioctl permission and cmd * operation to an inode. 
@@ -3666,19 +3673,23 @@ static int selinux_mmap_file(struct file *file, unsigned long reqprot, (flags & MAP_TYPE) == MAP_SHARED); } +#ifdef CONFIG_INTEL_SGX +static int enclave_mprotect(struct vm_area_struct *, size_t); +#endif + static int selinux_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, unsigned long prot) { const struct cred *cred = current_cred(); u32 sid = cred_sid(cred); + int rc = 0; if (selinux_state.checkreqprot) prot = reqprot; if (default_noexec && (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) { - int rc = 0; if (vma->vm_start >= vma->vm_mm->start_brk && vma->vm_end <= vma->vm_mm->brk) { rc = avc_has_perm(&selinux_state, @@ -3705,7 +3716,12 @@ static int selinux_file_mprotect(struct vm_area_struct *vma, return rc; } - return file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED); + rc = file_map_prot_check(vma->vm_file, prot, vma->vm_flags&VM_SHARED); +#ifdef CONFIG_INTEL_SGX + if (!rc) + rc = enclave_mprotect(vma, prot); +#endif + return rc; } static int selinux_file_lock(struct file *file, unsigned int cmd) @@ -6740,6 +6756,213 @@ static void selinux_bpf_prog_free(struct bpf_prog_aux *aux) } #endif +#ifdef CONFIG_INTEL_SGX +static size_t ema__blob __lsm_ro_after_init; + +static inline struct ema_security_struct *selinux_ema(struct ema *ema) +{ + return ema_data(ema, ema__blob); +} + +static int ema__chk_X_cb(struct ema *ema, void *a) +{ + struct file_security_struct *fsec = selinux_file(a); + struct ema_security_struct *esec = selinux_ema(ema); + struct file *ess = (struct file *)atomic_long_read(&fsec->encl_ss); + int rc; + + if (!esec->sourced) { + /* EAUG'ed pages */ + rc = file_has_perm(current_cred(), ess, FILE__EXECMOD); + } else if (!ema->source) { + /* EADD'ed anonymous pages */ + u32 sid = current_sid(); + rc = avc_has_perm(&selinux_state, sid, sid, SECCLASS_PROCESS2, + PROCESS2__ENCLAVE_EXECANON, NULL); + if (rc) + rc = avc_has_perm(&selinux_state, sid, sid, + SECCLASS_PROCESS, PROCESS__EXECMEM, + NULL); 
+ if (!rc && esec->modified) + rc = file_has_perm(current_cred(), ess, FILE__EXECMOD); + } else { + /* EADD'ed pages from files */ + u32 av = FILE__EXECUTE; + if (esec->modified) + av |= FILE__EXECMOD; + rc = file_has_perm(current_cred(), ema->source, av); + } + + return rc; +} + +static int ema__set_M_cb(struct ema *ema, void *a) +{ + selinux_ema(ema)->modified = 1; + return 0; +} + +static int enclave_mprotect(struct vm_area_struct *vma, size_t prot) +{ + struct ema_map *m; + int rc; + + /* is vma an enclave vma ? */ + if (!vma->vm_file) + return 0; + m = ema_get_map(vma->vm_file); + if (!m) + return 0; + + /* WX requires EXECMEM */ + if ((prot & PROT_WRITE) && (prot & PROT_EXEC)) { + rc = avc_has_perm(&selinux_state, current_sid(), current_sid(), + SECCLASS_PROCESS, PROCESS__EXECMEM, NULL); + if (rc) + return rc; + } + + rc = ema_lock_map(m); + if (rc) + return rc; + + if ((prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) + rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end, + ema__chk_X_cb, vma->vm_file); + if (!rc && (prot & PROT_WRITE) && !(vma->vm_flags & VM_WRITE)) + rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end, + ema__set_M_cb, NULL); + + ema_unlock_map(m); + + return rc; +} + +static int enclave_load_prot_check(struct file *encl, size_t prot, + struct vm_area_struct *vma) +{ + struct file_security_struct *fsec = selinux_file(encl); + struct file *ess; + const struct cred *cred = current_cred(); + u32 sid = cred_sid(cred); + int rc; + int modified = 0; + + /* R/W without X are always allowed */ + if (!(prot & PROT_EXEC)) + /* R/W always allowed */ + return 0; + + if (!vma) { + ess = (struct file *)atomic_long_read(&fsec->encl_ss); + WARN_ON(!ess); + if (unlikely(!ess)) + return -EPERM; + + /* For EAUG, X is considered self-modifying code */ + rc = file_has_perm(cred, ess, FILE__EXECMOD); + } else if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file))) { + /* EADD from anonymous pages requires ENCLAVE_EXECANON */ + if (!(prot & PROT_WRITE)
&& + avc_has_perm(&selinux_state, sid, sid, SECCLASS_PROCESS2, + PROCESS2__ENCLAVE_EXECANON, NULL)) { + /* On failure, trigger EXECMEM check at the end */ + prot |= PROT_WRITE; + } + rc = 0; + } else { + /* EADD from file requires EXECUTE */ + u32 av = FILE__EXECUTE; + + /* EXECMOD required for modified private mapping */ + if (vma->anon_vma) { + av |= FILE__EXECMOD; + modified = 1; + } + + rc = file_has_perm(cred, vma->vm_file, av); + } + + /* WX requires EXECMEM additionally */ + if (!rc && (prot & PROT_WRITE)) + rc = avc_has_perm(&selinux_state, sid, sid, SECCLASS_PROCESS, + PROCESS__EXECMEM, NULL); + + return rc ? rc : modified; +} + +static int ema__set_cb(struct ema *ema, void *a) +{ + struct ema_security_struct *esec = selinux_ema(ema); + struct ema_security_struct *s = a; + + esec->modified = s->modified; + esec->sourced = s->sourced; + return 0; +} + +static int selinux_enclave_load(struct file *encl, size_t start, size_t end, + size_t flags, struct vm_area_struct *src) +{ + struct ema_map *m; + size_t prot; + int rc; + + m = ema_get_map(encl); + WARN_ON(!m); + if (unlikely(!m)) + return -EPERM; + + prot = flags & (PROT_READ | PROT_WRITE | PROT_EXEC); + + /* check if @prot could be granted */ + rc = enclave_load_prot_check(encl, prot, src); + + /* initialize ema */ + if (rc >= 0) { + struct ema_security_struct esec = { 0 }; + + if ((prot & PROT_WRITE) || rc) + esec.modified = 1; + if (src) + esec.sourced = 1; + + rc = ema_lock_apply_to_range(m, start, end, + ema__set_cb, &esec); + } + + /* remove ema on error */ + if (rc) + ema_remove_range(m, start, end); + + return rc; +} + +static int selinux_enclave_init(struct file *encl, + struct sgx_sigstruct *sigstruct, + struct vm_area_struct *src) +{ + struct file_security_struct *fsec = selinux_file(encl); + int rc; + + /* Is @src mapped shared, or mapped privately and not modified?
*/ + if (!src->vm_file || src->anon_vma) + return -EACCES; + + /* EXECUTE grants enclaves permission to launch */ + rc = file_has_perm(current_cred(), src->vm_file, FILE__EXECUTE); + if (rc) + return rc; + + /* Store SIGSTRUCT file for future use */ + if (atomic_long_cmpxchg(&fsec->encl_ss, 0, (long)src->vm_file)) + return -EEXIST; + + get_file(src->vm_file); + return 0; +} +#endif + struct lsm_blob_sizes selinux_blob_sizes __lsm_ro_after_init = { .lbs_cred = sizeof(struct task_security_struct), .lbs_file = sizeof(struct file_security_struct), @@ -6822,6 +7045,7 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(file_permission, selinux_file_permission), LSM_HOOK_INIT(file_alloc_security, selinux_file_alloc_security), + LSM_HOOK_INIT(file_free_security, selinux_file_free_security), LSM_HOOK_INIT(file_ioctl, selinux_file_ioctl), LSM_HOOK_INIT(mmap_file, selinux_mmap_file), LSM_HOOK_INIT(mmap_addr, selinux_mmap_addr), @@ -6982,6 +7206,11 @@ static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free), LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free), #endif + +#ifdef CONFIG_INTEL_SGX + LSM_HOOK_INIT(enclave_load, selinux_enclave_load), + LSM_HOOK_INIT(enclave_init, selinux_enclave_init), +#endif }; static __init int selinux_init(void) @@ -7007,6 +7236,9 @@ static __init int selinux_init(void) hashtab_cache_init(); +#ifdef CONFIG_INTEL_SGX + ema__blob = ema_request_blob(sizeof(struct ema_security_struct)); +#endif security_add_hooks(selinux_hooks, ARRAY_SIZE(selinux_hooks), "selinux"); if (avc_add_callback(selinux_netcache_avc_callback, AVC_CALLBACK_RESET)) diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h index 201f7e588a29..0d3161a52577 100644 --- a/security/selinux/include/classmap.h +++ b/security/selinux/include/classmap.h @@ -51,7 +51,8 @@ struct security_class_mapping secclass_map[] = { "execmem", 
"execstack", "execheap", "setkeycreate", "setsockcreate", "getrlimit", NULL } }, { "process2", - { "nnp_transition", "nosuid_transition", NULL } }, + { "nnp_transition", "nosuid_transition", + "enclave_execanon", NULL } }, { "system", { "ipc_info", "syslog_read", "syslog_mod", "syslog_console", "module_request", "module_load", NULL } }, diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h index 91c5395dd20c..8d1ce9c6d6fa 100644 --- a/security/selinux/include/objsec.h +++ b/security/selinux/include/objsec.h @@ -23,6 +23,7 @@ #include <linux/in.h> #include <linux/spinlock.h> #include <linux/lsm_hooks.h> +#include <linux/lsm_ema.h> #include <linux/msg.h> #include <net/net_namespace.h> #include "flask.h" @@ -68,6 +69,7 @@ struct file_security_struct { u32 fown_sid; /* SID of file owner (for SIGIO) */ u32 isid; /* SID of inode at the time of file open */ u32 pseqno; /* Policy seqno at the time of file open */ + atomic_long_t encl_ss; /* Enclave sigstruct file */ }; struct superblock_security_struct { @@ -154,6 +156,11 @@ struct bpf_security_struct { u32 sid; /*SID of bpf obj creater*/ }; +struct ema_security_struct { + int modified:1; /* Set when W is granted */ + int sourced:1; /* Set if loaded from source in regular memory */ +}; + extern struct lsm_blob_sizes selinux_blob_sizes; static inline struct task_security_struct *selinux_cred(const struct cred *cred) { -- 2.17.1
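The branching in enclave_load_prot_check() above can be easier to follow as a small userspace model. The following is only a sketch of how the checks appear to combine, not the SELinux implementation: the DEMO_* names, the demo_source enum and demo_checks() are hypothetical, and the model ignores the early exit the patch takes when a file check is denied.

```c
#include <sys/mman.h>

/* Hypothetical labels for the checks named in the patch; values are arbitrary. */
#define DEMO_FILE_EXECUTE	0x1u
#define DEMO_FILE_EXECMOD	0x2u
#define DEMO_ENCLAVE_EXECANON	0x4u
#define DEMO_PROCESS_EXECMEM	0x8u

/* Where the page contents come from, as distinguished by the patch. */
enum demo_source {
	DEMO_SRC_EAUG,		/* no source vma */
	DEMO_SRC_ANON,		/* anonymous or private-inode source */
	DEMO_SRC_FILE,		/* file-backed, unmodified */
	DEMO_SRC_FILE_DIRTY,	/* file-backed private mapping, modified */
};

/*
 * Return the set of checks that would be consulted for @prot and @src.
 * @execanon_ok models whether the ENCLAVE_EXECANON check passes.
 */
static unsigned int demo_checks(unsigned int prot, enum demo_source src,
				int execanon_ok)
{
	unsigned int checks = 0;

	/* Without X, nothing is checked. */
	if (!(prot & PROT_EXEC))
		return 0;

	switch (src) {
	case DEMO_SRC_EAUG:
		checks |= DEMO_FILE_EXECMOD;	/* against the SIGSTRUCT file */
		break;
	case DEMO_SRC_ANON:
		if (!(prot & PROT_WRITE)) {
			checks |= DEMO_ENCLAVE_EXECANON;
			if (!execanon_ok)
				prot |= PROT_WRITE;	/* fall back to EXECMEM */
		}
		break;
	case DEMO_SRC_FILE:
		checks |= DEMO_FILE_EXECUTE;
		break;
	case DEMO_SRC_FILE_DIRTY:
		checks |= DEMO_FILE_EXECUTE | DEMO_FILE_EXECMOD;
		break;
	}

	/* WX additionally consults EXECMEM. */
	if (prot & PROT_WRITE)
		checks |= DEMO_PROCESS_EXECMEM;

	return checks;
}
```

In this model an RX page from an anonymous source consults only ENCLAVE_EXECANON when that check passes, and falls back to EXECMEM when it does not, which seems to be the intent of the "prot |= PROT_WRITE" trick in the patch.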
On Fri, Jun 21, 2019 at 10:18:55AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Wednesday, June 19, 2019 3:24 PM
> >
> > diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c b/arch/x86/kernel/cpu/sgx/driver/main.c
> > index 4379a2fb1f82..b478c0f45279 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> > @@ -99,7 +99,8 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
> >   * page is considered to have no RWX permissions, i.e. is inaccessible.
> >   */
> >  static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> > -				     struct vm_area_struct *vma)
> > +				     struct vm_area_struct *vma,
> > +				     bool *eaug)
> >  {
> >  	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> >  	unsigned long idx, idx_start, idx_end;
> > @@ -123,6 +124,8 @@ static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> >  			allowed_rwx = 0;
> >  		else
> >  			allowed_rwx &= page->vm_prot_bits;
> > +		if (page->vm_prot_bits & SGX_VM_EAUG)
> > +			*eaug = true;
> >  		if (!allowed_rwx)
> >  			break;
> >  	}
> > @@ -134,16 +137,17 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> >  {
> >  	struct sgx_encl *encl = file->private_data;
> >  	unsigned long allowed_rwx, prot;
> > +	bool eaug = false;
> >  	int ret;
> >
> > -	allowed_rwx = sgx_allowed_rwx(encl, vma);
> > +	allowed_rwx = sgx_allowed_rwx(encl, vma, &eaug);
> >  	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> >  		return -EACCES;
>
> IIUC, "eaug range" has to be mapped PROT_NONE, then vm_ops->fault() won't be
> invoked. Am I correct? Then how to EAUG on #PF?

Pages tagged SGX_VM_EAUG also have maximal permissions and can be mapped
PROT_{READ,WRITE,EXEC} accordingly.

> >
> >  	prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
> >  	       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
> >  	       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
> > -	ret = security_enclave_map(prot);
> > +	ret = security_enclave_map(prot, eaug);
> >  	if (ret)
> >  		return ret;
> >
On Fri, 2019-07-05 at 22:04 -0700, Xing, Cedric wrote:
> On 7/3/2019 4:16 PM, Jarkko Sakkinen wrote:
> > On Thu, Jun 27, 2019 at 11:56:18AM -0700, Cedric Xing wrote:
> >
> > I think it is fine to have these patch sets as discussion starters, but
> > it does not make any sense to me to upstream LSM changes with the SGX
> > foundations.
>
> Guess LSM is a gating factor, because otherwise SGX could be abused to
> make executable EPC from pages that are otherwise not allowed to be
> executable. Am I missing anything?

No, but what was the point? LSM is always an additional gating factor.
That does not make a case for any of the proposed LSM changes.

/Jarkko
On Fri, Jun 21, 2019 at 12:03:36AM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote:
> > Using per-vma refcounting to track mm_structs associated with an enclave
> > requires hooking .vm_close(), which in turn prevents the mm from merging
> > vmas (precisely to allow refcounting).
>
> Why having sgx_vma_close() prevents that? I do not understand the
> problem statement.

vmas that define .vm_close() cannot be merged.

/*
 * If the vma has a ->close operation then the driver probably needs to release
 * per-vma resources, so we don't attempt to merge those.
 */
static inline int is_mergeable_vma(struct vm_area_struct *vma,
				struct file *file, unsigned long vm_flags,
				struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
{
	...

	if (vma->vm_ops && vma->vm_ops->close)
		return 0;
	if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
		return 0;
	return 1;
}

> > Avoid refcounting encl_mm altogether by registering an mmu_notifier at
> > .mmap(), removing the dying encl_mm at mmu_notifier.release() and
> > protecting mm_list during reclaim via a per-enclave SRCU.
>
> Right, there is the potential collision with my changes:
>
> 1. Your patch: enclave life-cycle equals life-cycle of all processes
>    that are associated with the enclave.
> 2. My (yet to be sent) patch: enclave life-cycle equals the life cycle.
>
> I won't rush with my patch. I rather merge neither at this point and
> you can review mine after you come back from your vacation.
>
> > Removing refcounting/vm_close() allows merging of enclave vmas, at the
> > cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is
> > disassociated from an enclave when the mm exits or the enclave dies, as
> > opposed to when the last vma (in a given mm) is closed.
> >
> > The impact of delaying encl_mm removal is its memory footprint and
> > whatever overhead is incurred during EPC reclaim (to walk an mm's vmas).
> > Practically speaking, a stale encl_mm will exist for a meaningful amount
> > of time if and only if the enclave is mapped in a long-lived process and
> > then passed off to another long-lived process. It is expected that the
> > vast majority of use cases will not encounter this condition, e.g. even
> > using a daemon to build enclaves should not result in a stale encl_mm as
> > the builder should never need to mmap() the enclave.
>
> This paragraph speaks only about "well behaving" software.

Malicious software isn't all that interesting as there are far easier ways
to waste system resources. That being said, the encl_mm allocation can use
GFP_KERNEL_ACCOUNT.

> > Even if there are scenarios that lead to defunct encl_mms, the cost is
> > likely far outweighed by the benefits of reducing the number of vmas
> > across all enclaves.
> >
> > Note, using SRCU to protect mm_list is not strictly necessary, i.e. the
> > existing walker with encl_mm refcounting could be massaged to work with
> > mmu_notifier.release(), but the resulting code is subtle and fragile (I
> > never actually got it working). The primary issue is that an encl_mm
> > can't be moved off the list until its refcount goes to zero, otherwise
> > the custom walker goes off into the weeds. The refcount requirement
> > then prevents using mm_list to identify if an mmu_notifier.release()
> > has fired, i.e. another mechanism is needed to guard against races
> > between exit_mmap() and sgx_release().
>
> Is it really impossible to send a separate SRCU patch?

I can split out the SRCU as a precursor. It'll likely take me a few days to
get it sent.

> I fully agree with the SRCU whereas rest of this patch is still
> under debate.
>
> If you could do that, I can merge it in no time. It is a small
> step into better direction.
>
> > Cc: Dave Hansen <dave.hansen@intel.com>
> > Cc: Andy Lutomirski <luto@kernel.org>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>
> Needs to be rebased because the master is missing your earlier bug fix.

...

> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 9566eb72d417..c6436bbd4a68 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> >  	return entry;
> >  }
> >
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > -				    struct mm_struct *mm)
> > +static void sgx_encl_mm_release_wq(struct work_struct *work)
> > +{
> > +	struct sgx_encl_mm *encl_mm =
> > +		container_of(work, struct sgx_encl_mm, release_work);
> > +
> > +	sgx_encl_mm_release(encl_mm);
> > +}
> > +
> > +/*
> > + * Being a call_srcu() callback, this needs to be short, and sgx_encl_release()
> > + * is anything but short. Do the final freeing in yet another async callback.
> > + */
> > +static void sgx_encl_mm_release_delayed(struct rcu_head *rcu)
>
> Would rename this either as *_tail() or *_deferred().

Deferred works for me.

> > +{
> > +	struct sgx_encl_mm *encl_mm =
> > +		container_of(rcu, struct sgx_encl_mm, rcu);
> > +
> > +	INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq);
> > +	schedule_work(&encl_mm->release_work);
> > +}
> > +

...

> > @@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl);
> >  void sgx_encl_release(struct kref *ref);
> >  pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
> >  struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
> > -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> > -				     struct sgx_encl_mm *encl_mm, int *iter);
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > -				    struct mm_struct *mm);
> > -void sgx_encl_mm_release(struct kref *ref);
> > +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> > +static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm)
> > +{
> > +	kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> > +
> > +	kfree(encl_mm);
> > +}
>
> Please just open code this to the two call sites. Makes the code hard to
> follow.

Heh, I waffled between a helper and open coding. I chose poorly :-)

> Right now I did not find anything else questionable from the code
> changes. Repeating myself but if it is by any means possible before
> going away, can you construct a pure SRCU patch.
>
> I could then reconstruct my changes on top of that, which would
> make evaluation of both a heck of a lot easier.
>
> /Jarkko
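To make the merge restriction discussed in this thread concrete, here is a minimal userspace sketch, assuming a heavily simplified model of a vma. The demo_* types and demo_is_mergeable() are hypothetical; the real is_mergeable_vma() checks more than this (file, flags, userfaultfd context), and the only point illustrated is that a ->close hook blocks merging.

```c
#include <stddef.h>

/* Simplified stand-ins for vm_operations_struct and vm_area_struct. */
struct demo_vm_ops {
	void (*close)(void *vma);
};

struct demo_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	const struct demo_vm_ops *ops;
};

/* A do-nothing ->close hook, standing in for a driver's per-vma teardown. */
static void demo_close(void *vma)
{
	(void)vma;
}

/*
 * Two adjacent vmas with identical flags may merge only if neither has a
 * ->close operation, because ->close implies per-vma state to release.
 */
static int demo_is_mergeable(const struct demo_vma *a, const struct demo_vma *b)
{
	if (a->end != b->start || a->flags != b->flags)
		return 0;
	if ((a->ops && a->ops->close) || (b->ops && b->ops->close))
		return 0;
	return 1;
}
```

Under this model, dropping the per-vma ->close hook (as the mmu_notifier approach does) is what lets adjacent enclave vmas with equal flags collapse into one.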
On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:

...

> different FSMs to govern page protection transitions. Implementation wise, his
> model also imposes unwanted restrictions specifically to SGX2, such as:
> · Complicated/Restricted UAPI – Enclave loaders are required to provide

I don't think "complicated" is a fair assessment. For SGX1 enclaves it's
literally a direct propagation of the SECINFO RWX flags.

> “maximal protection” at page load time, but such information is NOT always
> available. For example, Graphene containers may run different applications
> comprised of different set of executables and/or shared objects. Some of
> them may contain self-modifying code (or text relocation) while others
> don’t. The generic enclave loader usually doesn’t have such information so
> wouldn’t be able to provide it ahead of time.

I'm unconvinced that it would be remotely difficult to teach an enclave
loader that an enclave or hosted application employs SMC, relocation or any
other behavior that would require declaring RWX on all pages.

> · Inefficient Auditing – Audit logs are supposed to help system
> administrators to determine the set of minimally needed permissions and to
> detect abnormal behaviors. But consider the “maximal protection” model, if
> “maximal protection” is set to be too permissive, then audit log wouldn’t
> be able to detect anomalies;

Huh? Declaring overly permissive protections is only problematic if an LSM
denies the permission, in which case it will generate an accurate audit
log. If the enclave/loader "requires" a permission it doesn't actually
need, e.g. EXECDIRTY, then it's a software bug that should be fixed.

I don't see how this scenario is any different than an application that
uses assembly code without 'noexecstack' and inadvertently "requires"
EXECSTACK due to triggering "read implies exec". In both cases the denied
permission is unnecessary due to a userspace application bug.

> or if “maximal protection” is too restrictive,
> then audit log cannot identify the file violating the policy.

Maximal protections that are too restrictive are completely orthogonal to
LSMs as the enclave would fail to run irrespective of LSMs. This is no
different than specifying the wrong RWX flags in SECINFO, or opening a file
as RO instead of RW.

> In either case the audit log cannot fulfill its purposes.
> · Inability to support #PF driven EPC allocation in SGX2 – For those
> unfamiliar with SGX2 software flows, an SGX2 enclave requests a page by
> issuing EACCEPT on the address that a new page is wanted, and the resulted
> #PF is expected to be handled by the kernel by EAUG’ing an EPC page at the
> fault address, and then the enclave would be resumed and the faulting
> EACCEPT retried, and succeed. The key requirement is to allow mmap()’ing
> non-existing enclave pages so that the SGX module/subsystem could respond
> to #PFs by EAUG’ing new pages. Sean’s implementation doesn’t allow
> mmap()’ing non-existing pages for variety of reasons and therefore blocks
> this major SGX2 usage.

This is simply wrong. The key requirement in the theoretical EAUG scheme is
to mmap() pages that have not been added to the _hardware_ maintained
enclave. The pages (or some optimized representation of a range of pages)
would exist in the kernel's software model of the enclave.
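As a rough illustration of the "direct propagation of the SECINFO RWX flags" point, the sketch below shows how a loader might derive the declared protections. It is a userspace model under stated assumptions: the DEMO_SECINFO_* bit positions (R, W, X in bits 0..2) and both helper functions are hypothetical names introduced here, not part of the proposed UAPI.

```c
#include <stdint.h>
#include <sys/mman.h>

/* Assumed SECINFO permission bits (R, W, X in bits 0..2); illustrative only. */
#define DEMO_SECINFO_R	(UINT64_C(1) << 0)
#define DEMO_SECINFO_W	(UINT64_C(1) << 1)
#define DEMO_SECINFO_X	(UINT64_C(1) << 2)

/* SGX1 case: the declared protections mirror the SECINFO RWX flags. */
static uint8_t demo_secinfo_to_prot(uint64_t secinfo_flags)
{
	uint8_t prot = 0;

	if (secinfo_flags & DEMO_SECINFO_R)
		prot |= PROT_READ;
	if (secinfo_flags & DEMO_SECINFO_W)
		prot |= PROT_WRITE;
	if (secinfo_flags & DEMO_SECINFO_X)
		prot |= PROT_EXEC;
	return prot;
}

/*
 * A loader that knows (or must assume) the enclave uses self-modifying code
 * or relocations declares RWX; otherwise it propagates SECINFO.
 */
static uint8_t demo_declared_prot(uint64_t secinfo_flags, int self_modifying)
{
	if (self_modifying)
		return PROT_READ | PROT_WRITE | PROT_EXEC;
	return demo_secinfo_to_prot(secinfo_flags);
}
```

The second helper corresponds to the "declare RWX on all pages" fallback mentioned above; whether a given loader can cheaply know the self_modifying input is exactly the point under debate.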
On 7/7/2019 4:41 PM, Cedric Xing wrote:
> As enclave pages have different lifespan than the existing MAP_PRIVATE and
> MAP_SHARED pages, a new data structure is required outside of VMA to track
> their protections and/or origins. Enclave Memory Area (or EMA for short) has
> been introduced to address the need. EMAs are maintained by a new LSM module
> named “ema”, which is similar to the idea of the “capability” LSM module.

First off, I'll say that this is an improvement over the LSM
integrated version that preceded it. I still have some issues
with the naming, but I'll address that inline. I do have a
suggestion that I think will make this more conventional.

In this scheme you use an ema LSM to manage your ema data.
A quick sketch looks like:

	sgx_something_in() calls
		security_enclave_load() calls
			ema_enclave_load()
			selinux_enclave_load()
			otherlsm_enclave_load()

Why is this better than:

	sgx_something_in() calls
		ema_enclave_load()
		security_enclave_load() calls
			selinux_enclave_load()
			otherlsm_enclave_load()

If you did really want ema to behave like an LSM
you would put the file data that SELinux is managing
into the ema portion of the blob and provide interfaces
for the SELinux (or whoever) to use that. Also, it's
an abomination (as I've stated before) for ema to
rely on SELinux to provide a file_free() hook for
ema's data. If you continue down the LSM route, you
need to provide an ema_file_free() hook. You can't
count on SELinux to do it for you. If there are multiple
LSMs (coming soon!) that use the ema data, they'll all
try to free it, and then Bad Things can happen.

>
> This new “ema” module has LSM_ORDER_FIRST so will always be dispatched before
> other LSM_ORDER_MUTABLE modules (e.g. selinux, apparmor, etc.). It is
> responsible for initializing EMA maps, and inserting and freeing EMA nodes, and
> offers APIs for other LSM modules to query/update EMAs. Details could be found
> in include/linux/lsm_ema.h and security/commonema.c.
>
> Signed-off-by: Cedric Xing <cedric.xing@intel.com>
> ---
>  include/linux/lsm_ema.h |  97 ++++++++++++++

I still think this should be enclave.h (or commonema.h)
as it is not LSM code.

>  security/Makefile       |   1 +
>  security/commonema.c    | 277 ++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 375 insertions(+)
>  create mode 100644 include/linux/lsm_ema.h
>  create mode 100644 security/commonema.c
>
> diff --git a/include/linux/lsm_ema.h b/include/linux/lsm_ema.h
> new file mode 100644
> index 000000000000..59fc4ea6fa78
> --- /dev/null
> +++ b/include/linux/lsm_ema.h
> @@ -0,0 +1,97 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> +/**
> + * Enclave Memory Area interface for LSM modules
> + *
> + * Copyright(c) 2016-19 Intel Corporation.
> + */
> +
> +#ifndef _LSM_EMA_H_
> +#define _LSM_EMA_H_
> +
> +#include <linux/list.h>
> +#include <linux/mutex.h>
> +#include <linux/fs.h>
> +#include <linux/file.h>
> +
> +/**
> + * ema - Enclave Memory Area structure for LSM modules

LSM modules is redundant. "LSM" or "LSMs" would be better.

> + *
> + * Data structure to track origins of enclave pages
> + *
> + * @link:
> + *	Link to adjacent EMAs. EMAs are sorted by their addresses in ascending
> + *	order
> + * @start:
> + *	Starting address
> + * @end:
> + *	Ending address
> + * @source:
> + *	File from which this range was loaded from, or NULL if not loaded from
> + *	any files
> + */
> +struct ema {
> +	struct list_head link;
> +	size_t start;
> +	size_t end;
> +	struct file *source;
> +};
> +
> +#define ema_data(ema, offset)	\
> +	((void *)((char *)((struct ema *)(ema) + 1) + offset))
> +
> +/**
> + * ema_map - LSM Enclave Memory Map structure for LSM modules

As above.

> + *
> + * Container for EMAs of an enclave
> + *
> + * @list:
> + *	Head of a list of sorted EMAs
> + * @lock:
> + *	Acquire before querying/updating the list EMAs
> + */
> +struct ema_map {
> +	struct list_head list;
> +	struct mutex lock;
> +};
> +
> +size_t __init ema_request_blob(size_t blob_size);
> +struct ema_map *ema_get_map(struct file *encl);
> +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end,
> +		       int (*cb)(struct ema *ema, void *arg), void *arg);
> +void ema_remove_range(struct ema_map *map, size_t start, size_t end);
> +
> +static inline int __must_check ema_lock_map(struct ema_map *map)
> +{
> +	return mutex_lock_interruptible(&map->lock);
> +}
> +
> +static inline void ema_unlock_map(struct ema_map *map)
> +{
> +	mutex_unlock(&map->lock);
> +}
> +
> +static inline int ema_lock_apply_to_range(struct ema_map *map,
> +					  size_t start, size_t end,
> +					  int (*cb)(struct ema *, void *),
> +					  void *arg)
> +{
> +	int rc = ema_lock_map(map);
> +	if (!rc) {
> +		rc = ema_apply_to_range(map, start, end, cb, arg);
> +		ema_unlock_map(map);
> +	}
> +	return rc;
> +}
> +
> +static inline int ema_lock_remove_range(struct ema_map *map,
> +					size_t start, size_t end)
> +{
> +	int rc = ema_lock_map(map);
> +	if (!rc) {
> +		ema_remove_range(map, start, end);
> +		ema_unlock_map(map);
> +	}
> +	return rc;
> +}
> +
> +#endif /* _LSM_EMA_H_ */
> diff --git a/security/Makefile b/security/Makefile
> index c598b904938f..b66d03a94853 100644
> --- a/security/Makefile
> +++ b/security/Makefile
> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA)		+= yama/
>  obj-$(CONFIG_SECURITY_LOADPIN)		+= loadpin/
>  obj-$(CONFIG_SECURITY_SAFESETID)	+= safesetid/
>  obj-$(CONFIG_CGROUP_DEVICE)		+= device_cgroup.o
> +obj-$(CONFIG_INTEL_SGX)			+= commonema.o

The config option and the file name ought to match,
or at least be closer.

>
>  # Object integrity file lists
>  subdir-$(CONFIG_INTEGRITY)		+= integrity
> diff --git a/security/commonema.c b/security/commonema.c

Put this in a subdirectory. Please.

> new file mode 100644
> index 000000000000..c5b0bdfdc013
> --- /dev/null
> +++ b/security/commonema.c
> @@ -0,0 +1,277 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2016-18 Intel Corporation.
> +
> +#include <linux/lsm_ema.h>
> +#include <linux/lsm_hooks.h>
> +#include <linux/slab.h>
> +
> +static struct kmem_cache *_map_cache;
> +static struct kmem_cache *_node_cache;
> +static size_t _data_size __lsm_ro_after_init;
> +
> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
> +	.lbs_file = sizeof(atomic_long_t),
> +};

If this is ema's data ema must manage it. You *must* have
a file_free().

> +
> +static atomic_long_t *_map_file(struct file *encl)
> +{
> +	return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);

I don't trust all the casting going on here, especially since
you don't end up with the type you should be returning.

> +}
> +
> +static struct ema_map *_alloc_map(void)

Function header comments, please.

> +{
> +	struct ema_map *m;
> +
> +	m = kmem_cache_zalloc(_map_cache, GFP_KERNEL);
> +	if (likely(m)) {
> +		INIT_LIST_HEAD(&m->list);
> +		mutex_init(&m->lock);
> +	}
> +	return m;
> +}
> +
> +static struct ema *_new_ema(size_t start, size_t end, struct file *src)
> +{
> +	struct ema *ema;
> +
> +	if (unlikely(!_node_cache)) {
> +		struct kmem_cache *c;
> +
> +		c = kmem_cache_create("lsm-ema", sizeof(*ema) + _data_size,
> +				      __alignof__(typeof(*ema)), SLAB_PANIC,
> +				      NULL);
> +		if (atomic_long_cmpxchg((atomic_long_t *)&_node_cache, 0,
> +					(long)c))
> +			kmem_cache_destroy(c);
> +	}
> +
> +	ema = kmem_cache_zalloc(_node_cache, GFP_KERNEL);
> +	if (likely(ema)) {
> +		INIT_LIST_HEAD(&ema->link);
> +		ema->start = start;
> +		ema->end = end;
> +		if (src)
> +			ema->source = get_file(src);
> +	}
> +	return ema;
> +}
> +
> +static void _free_ema(struct ema *ema)
> +{
> +	if (ema->source)
> +		fput(ema->source);
> +	kmem_cache_free(_node_cache, ema);
> +}
> +
> +static void _free_map(struct ema_map *map)
> +{
> +	struct ema *p, *n;
> +
> +	WARN_ON(mutex_is_locked(&map->lock));
> +	list_for_each_entry_safe(p, n, &map->list, link)
> +		_free_ema(p);
> +	kmem_cache_free(_map_cache, map);
> +}
> +
> +static struct ema_map *_init_map(struct file *encl)
> +{
> +	struct ema_map *m = ema_get_map(encl);
> +	if (!m) {
> +		m = _alloc_map();
> +		if (atomic_long_cmpxchg(_map_file(encl), 0, (long)m)) {
> +			_free_map(m);
> +			m = ema_get_map(encl);
> +		}
> +	}
> +	return m;
> +}
> +
> +static inline struct ema *_next_ema(struct ema *p, struct ema_map *map)
> +{
> +	p = list_next_entry(p, link);
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static inline struct ema *_find_ema(struct ema_map *map, size_t a)
> +{
> +	struct ema *p;
> +
> +	WARN_ON(!mutex_is_locked(&map->lock));
> +
> +	list_for_each_entry(p, &map->list, link)
> +		if (a < p->end)
> +			break;
> +	return &p->link == &map->list ? NULL : p;
> +}
> +
> +static struct ema *_split_ema(struct ema *p, size_t at)
> +{
> +	typeof(p) n;
> +
> +	if (at <= p->start || at >= p->end)
> +		return p;
> +
> +	n = _new_ema(p->start, at, p->source);
> +	if (likely(n)) {
> +		memcpy(n + 1, p + 1, _data_size);
> +		p->start = at;
> +		list_add_tail(&n->link, &p->link);
> +	}
> +	return n;
> +}
> +
> +static int _merge_ema(struct ema *p, struct ema_map *map)
> +{
> +	typeof(p) prev = list_prev_entry(p, link);
> +
> +	WARN_ON(!mutex_is_locked(&map->lock));
> +
> +	if (&prev->link == &map->list || prev->end != p->start ||
> +	    prev->source != p->source || memcmp(prev + 1, p + 1, _data_size))
> +		return 0;
> +
> +	p->start = prev->start;
> +	fput(prev->source);
> +	_free_ema(prev);
> +	return 1;
> +}
> +
> +static inline int _insert_ema(struct ema_map *map, struct ema *n)
> +{
> +	typeof(n) p = _find_ema(map, n->start);
> +
> +	if (!p)
> +		list_add_tail(&n->link, &map->list);
> +	else if (n->end <= p->start)
> +		list_add_tail(&n->link, &p->link);
> +	else
> +		return -EEXIST;
> +
> +	_merge_ema(n, map);
> +	if (p)
> +		_merge_ema(p, map);
> +	return 0;
> +}
> +
> +static void ema_file_free_security(struct file *encl)
> +{
> +	struct ema_map *m = ema_get_map(encl);
> +	if (m)
> +		_free_map(m);
> +}
> +
> +static int ema_enclave_load(struct file *encl, size_t start, size_t end,
> +			    size_t flags, struct vm_area_struct *vma)
> +{
> +	struct ema_map *m;
> +	struct ema *ema;
> +	int rc;
> +
> +	m = _init_map(encl);
> +	if (unlikely(!m))
> +		return -ENOMEM;
> +
> +	ema = _new_ema(start, end, vma ? vma->vm_file : NULL);
> +	if (unlikely(!ema))
> +		return -ENOMEM;
> +
> +	rc = ema_lock_map(m);
> +	if (!rc) {
> +		rc = _insert_ema(m, ema);
> +		ema_unlock_map(m);
> +	}
> +	if (rc)
> +		_free_ema(ema);
> +	return rc;
> +}
> +
> +static int ema_enclave_init(struct file *encl, struct sgx_sigstruct *sigstruct,
> +			    struct vm_area_struct *vma)
> +{
> +	if (unlikely(!_init_map(encl)))
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +static struct security_hook_list ema_hooks[] __lsm_ro_after_init = {
> +	LSM_HOOK_INIT(file_free_security, ema_file_free_security),
> +	LSM_HOOK_INIT(enclave_load, ema_enclave_load),
> +	LSM_HOOK_INIT(enclave_init, ema_enclave_init),
> +};
> +
> +static int __init ema_init(void)
> +{
> +	_map_cache = kmem_cache_create("lsm-ema_map", sizeof(struct ema_map),
> +				       __alignof__(struct ema_map), SLAB_PANIC,
> +				       NULL);
> +	security_add_hooks(ema_hooks, ARRAY_SIZE(ema_hooks), "ema");
> +	return 0;
> +}
> +
> +DEFINE_LSM(ema) = {
> +	.name = "ema",
> +	.order = LSM_ORDER_FIRST,
> +	.init = ema_init,
> +	.blobs = &ema_blob_sizes,
> +};
> +
> +/* ema_request_blob shall only be called from LSM module init function */
> +size_t __init ema_request_blob(size_t size)
> +{
> +	typeof(_data_size) offset = _data_size;
> +	_data_size += size;
> +	return offset;
> +}
> +
> +struct ema_map *ema_get_map(struct file *encl)
> +{
> +	return (struct ema_map *)atomic_long_read(_map_file(encl));
> +}
> +
> +/**
> + * Invoke a callback function on every EMA falls within range, split EMAs as
> + * needed
> + */
> +int ema_apply_to_range(struct ema_map *map, size_t start, size_t end,
> +		       int (*cb)(struct ema *, void *), void *arg)
> +{
> +	struct ema *ema;
> +	int rc;
> +
> +	ema = _find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			_split_ema(ema, start);
> +		if (end < ema->end)
> +			ema = _split_ema(ema, end);
> +
> +		rc = (*cb)(ema, arg);
> +		_merge_ema(ema, map);
> +		if (rc)
> +			return rc;
> +
> +		ema = _next_ema(ema, map);
> +	}
> +
> +
> +	if (ema)
> +		_merge_ema(ema, map);
> +	return 0;
> +}
> +
> +/* Remove all EMAs falling within range, split EMAs as needed */
> +void ema_remove_range(struct ema_map *map, size_t start, size_t end)
> +{
> +	struct ema *ema, *n;
> +
> +	ema = _find_ema(map, start);
> +	while (ema && end > ema->start) {
> +		if (start > ema->start)
> +			_split_ema(ema, start);
> +		if (end < ema->end)
> +			ema = _split_ema(ema, end);
> +
> +		n = _next_ema(ema, map);
> +		_free_ema(ema);
> +		ema = n;
> +	}
> +}
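The ema_request_blob()/ema_data() pairing quoted above hands each LSM an offset into per-EMA trailing storage. The following userspace sketch models that scheme with malloc-style allocation; the demo_* names are hypothetical stand-ins, and, like the quoted code as posted, the sketch applies no alignment padding between blobs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header that precedes the per-LSM data, standing in for struct ema. */
struct demo_ema {
	size_t start;
	size_t end;
};

/* Total bytes of trailing data requested so far by all "modules". */
static size_t demo_data_size;

/* Reserve @size bytes of per-EMA data; returns the offset of the blob. */
static size_t demo_request_blob(size_t size)
{
	size_t offset = demo_data_size;

	demo_data_size += size;
	return offset;
}

/* Allocate a zeroed EMA with room for every requested blob after the header. */
static struct demo_ema *demo_new_ema(size_t start, size_t end)
{
	struct demo_ema *ema = calloc(1, sizeof(*ema) + demo_data_size);

	if (ema) {
		ema->start = start;
		ema->end = end;
	}
	return ema;
}

/* Locate a module's blob: skip the header, then add the module's offset. */
static void *demo_ema_data(struct demo_ema *ema, size_t offset)
{
	return (char *)(ema + 1) + offset;
}
```

One consequence worth noting is that every request must happen before the first allocation, since the trailing size is fixed at allocation time; that is presumably why the quoted ema_request_blob() is restricted to LSM init functions.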
On Fri, Jun 21, 2019 at 09:42:54AM -0700, Xing, Cedric wrote:
> > From: Christopherson, Sean J
> > Sent: Wednesday, June 19, 2019 3:24 PM
> >
> > diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
> > index 6dba9f282232..67a3babbb24d 100644
> > --- a/arch/x86/include/uapi/asm/sgx.h
> > +++ b/arch/x86/include/uapi/asm/sgx.h
> > @@ -35,15 +35,17 @@ struct sgx_enclave_create {
> >   * @src:	address for the page data
> >   * @secinfo:	address for the SECINFO data
> >   * @mrmask:	bitmask for the measured 256 byte chunks
> > + * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
> >   */
> >  struct sgx_enclave_add_page {
> >  	__u64	addr;
> >  	__u64	src;
> >  	__u64	secinfo;
> > -	__u64	mrmask;
> > +	__u16	mrmask;
> > +	__u8	prot;
> > +	__u8	pad;
> >  };
>
> Given EPCM permissions cannot change in SGX1, these maximal PROT_* flags can
> be the same as EPCM permissions, so don't have to be specified by user code
> until SGX2. Given we don't have a clear picture on how SGX2 will work yet, I
> think we shall take "prot" off until it is proven necessary.

I'm ok with deriving the maximal protections from SECINFO, so long as we
acknowledge that we're preventing userspace from utilizing EMODPE (until
the kernel supports SGX2).

> > diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c
> > b/arch/x86/kernel/cpu/sgx/driver/main.c
> > index 29384cdd0842..dabfe2a7245a 100644
> > --- a/arch/x86/kernel/cpu/sgx/driver/main.c
> > +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
> > @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
> >  }
> >  #endif
> >
> > +/*
> > + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
> > + * covered by the specific VMA.  A non-existent (or yet to be added) enclave
> > + * page is considered to have no RWX permissions, i.e. is inaccessible.
> > + */
> > +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
> > +				     struct vm_area_struct *vma)
> > +{
> > +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
> > +	unsigned long idx, idx_start, idx_end;
> > +	struct sgx_encl_page *page;
> > +
> > +	idx_start = PFN_DOWN(vma->vm_start);
> > +	idx_end = PFN_DOWN(vma->vm_end - 1);
> > +
> > +	for (idx = idx_start; idx <= idx_end; ++idx) {
> > +		/*
> > +		 * No need to take encl->lock, vm_prot_bits is set prior to
> > +		 * insertion and never changes, and racing with adding pages is
> > +		 * a userspace bug.
> > +		 */
> > +		rcu_read_lock();
> > +		page = radix_tree_lookup(&encl->page_tree, idx);
> > +		rcu_read_unlock();
>
> This loop iterates through every page in the range, which could be very slow
> if the range is large.

At this point I'm shooting for functional correctness and minimal code
changes.  Optimizations will be in order at some point, just not now.

> > +
> > +		/* Do not allow R|W|X to a non-existent page. */
> > +		if (!page)
> > +			allowed_rwx = 0;
> > +		else
> > +			allowed_rwx &= page->vm_prot_bits;
> > +		if (!allowed_rwx)
> > +			break;
> > +	}
> > +
> > +	return allowed_rwx;
> > +}
> > +
> >  static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
> >  {
> >  	struct sgx_encl *encl = file->private_data;
> > +	unsigned long allowed_rwx;
> >  	int ret;
> >
> > +	allowed_rwx = sgx_allowed_rwx(encl, vma);
> > +	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
> > +		return -EACCES;
> > +
> >  	ret = sgx_encl_mm_add(encl, vma->vm_mm);
> >  	if (ret)
> >  		return ret;
> >
> > +	if (!(allowed_rwx & VM_READ))
> > +		vma->vm_flags &= ~VM_MAYREAD;
> > +	if (!(allowed_rwx & VM_WRITE))
> > +		vma->vm_flags &= ~VM_MAYWRITE;
> > +	if (!(allowed_rwx & VM_EXEC))
> > +		vma->vm_flags &= ~VM_MAYEXEC;
> > +
>
> Say a range comprised of a RW sub-range and a RX sub-range is being mmap()'ed
> as R here. It'd succeed but mprotect(<RW sub-range>, RW) afterwards will fail
> because VM_MAYWRITE is cleared here. However, if those two sub-ranges are
> mapped by separate mmap() calls then the same mprotect() would succeed. The
> inconsistency here is unexpected and unprecedented.

Boo, I thought I was being super clever.

> >  	vma->vm_ops = &sgx_vm_ops;
> >  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
> >  	vma->vm_private_data = encl;
>
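The inconsistency Cedric describes can be reproduced with a small userspace
model of the quoted mmap() logic. This is only a sketch under stated
assumptions: the EX_* bits and the example_* helpers are hypothetical
stand-ins for VM_{READ,WRITE,EXEC}, sgx_allowed_rwx(), sgx_mmap() and the
VM_MAY* check done on mprotect(); none of it is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_READ/VM_WRITE/VM_EXEC; values are illustrative. */
enum { EX_READ = 1, EX_WRITE = 2, EX_EXEC = 4 };

/*
 * AND of the per-page maximal protections over pages [start, end).
 * A page that was never added is modelled as protection 0.
 */
unsigned int example_allowed_rwx(const unsigned int *page_prot,
				 size_t start, size_t end)
{
	unsigned int allowed = EX_READ | EX_WRITE | EX_EXEC;
	size_t i;

	assert(start < end);
	for (i = start; i < end && allowed; i++)
		allowed &= page_prot[i];
	return allowed;
}

/*
 * Model of the patched mmap(): returns the "may" mask left on the mapping
 * (the analogue of the surviving VM_MAY* bits), or -1 in place of -EACCES.
 */
int example_mmap(const unsigned int *page_prot, size_t start, size_t end,
		 unsigned int prot)
{
	unsigned int allowed = example_allowed_rwx(page_prot, start, end);

	if (prot & ~allowed)
		return -1;
	return (int)allowed;
}

/* Model of mprotect(): succeeds only if prot stays within the "may" mask. */
int example_mprotect(int may, unsigned int prot)
{
	if (may < 0 || (prot & ~(unsigned int)may))
		return -1;
	return 0;
}
```

With pages {RW, RW, RX, RX}, mapping all four as R leaves a "may" mask of R
only, so a later RW mprotect() on the first two pages is refused. Mapping
just the first two pages as R leaves RW in the mask and the same mprotect()
passes.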
On 7/8/2019 9:26 AM, Casey Schaufler wrote:
> In this scheme you use an ema LSM to manage your ema data.
> A quick sketch looks like:
>
> 	sgx_something_in() calls
> 		security_enclave_load() calls
> 			ema_enclave_load()
> 			selinux_enclave_load()
> 			otherlsm_enclave_load()
>
> Why is this better than:
>
> 	sgx_something_in() calls
> 		ema_enclave_load()
> 		security_enclave_load() calls
> 			selinux_enclave_load()
> 			otherlsm_enclave_load()

Are you talking about moving EMA somewhere outside LSM? If so, where?

>
> If you did really want ema to behave like an LSM
> you would put the file data that SELinux is managing
> into the ema portion of the blob and provide interfaces
> for the SELinux (or whoever) to use that. Also, it's
> an abomination (as I've stated before) for ema to
> rely on SELinux to provide a file_free() hook for
> ema's data. If you continue down the LSM route, you
> need to provide an ema_file_free() hook. You can't
> count on SELinux to do it for you. If there are multiple
> LSMs (coming soon!) that use the ema data, they'll all
> try to free it, and then Bad Things can happen.

I'm afraid you have misunderstood the code. What is kept open and gets
closed in selinux_file_free() is the sigstruct file. SELinux uses it to
determine the page permissions for enclave pages from anonymous sources.
It is a policy choice made inside SELinux and has nothing to do with EMA.

There's indeed an ema_file_free_security() to free the EMA map for
enclaves being closed. EMA does *NOT* rely on any other LSMs to free data
for it. The only exception is when an LSM fails enclave_load(), it has to
call ema_remove_range() to remove the range being added, which was *not*
required originally in v2.

>> +/**
>> + * ema - Enclave Memory Area structure for LSM modules
>
> LSM modules is redundant. "LSM" or "LSMs" would be better.

Noted

>> diff --git a/security/Makefile b/security/Makefile
>> index c598b904938f..b66d03a94853 100644
>> --- a/security/Makefile
>> +++ b/security/Makefile
>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
>>  obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
>>  obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
>>  obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
>> +obj-$(CONFIG_INTEL_SGX) += commonema.o
>
> The config option and the file name ought to match,
> or at least be closer.

Just trying to match file names as "capability" uses commoncap.c.

Like I said, this feature could potentially be used by TEEs other than
SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I
can rename it to ema.c or enclave.c. Do you have a preference?

>> diff --git a/security/commonema.c b/security/commonema.c
>
> Put this in a subdirectory. Please.

Then why is commoncap.c located in this directory? I'm just trying to
match the existing convention.

>> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
>> +	.lbs_file = sizeof(atomic_long_t),
>> +};
>
> If this is ema's data ema must manage it. You *must* have
> a file_free().

There is one indeed - ema_file_free_security().

>
>> +
>> +static atomic_long_t *_map_file(struct file *encl)
>> +{
>> +	return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);
>
> I don't trust all the casting going on here, especially since
> you don't end up with the type you should be returning.

Will change.

>> +}
>> +
>> +static struct ema_map *_alloc_map(void)
>
> Function header comments, please.

Will add.
On 7/8/2019 9:34 AM, Sean Christopherson wrote:
> On Fri, Jun 21, 2019 at 09:42:54AM -0700, Xing, Cedric wrote:
>>> From: Christopherson, Sean J
>>> Sent: Wednesday, June 19, 2019 3:24 PM
>>>
>>> diff --git a/arch/x86/include/uapi/asm/sgx.h b/arch/x86/include/uapi/asm/sgx.h
>>> index 6dba9f282232..67a3babbb24d 100644
>>> --- a/arch/x86/include/uapi/asm/sgx.h
>>> +++ b/arch/x86/include/uapi/asm/sgx.h
>>> @@ -35,15 +35,17 @@ struct sgx_enclave_create {
>>>   * @src:	address for the page data
>>>   * @secinfo:	address for the SECINFO data
>>>   * @mrmask:	bitmask for the measured 256 byte chunks
>>> + * @prot:	maximal PROT_{READ,WRITE,EXEC} protections for the page
>>>   */
>>>  struct sgx_enclave_add_page {
>>>  	__u64	addr;
>>>  	__u64	src;
>>>  	__u64	secinfo;
>>> -	__u64	mrmask;
>>> +	__u16	mrmask;
>>> +	__u8	prot;
>>> +	__u8	pad;
>>>  };
>>
>> Given EPCM permissions cannot change in SGX1, these maximal PROT_* flags can
>> be the same as EPCM permissions, so don't have to be specified by user code
>> until SGX2. Given we don't have a clear picture on how SGX2 will work yet, I
>> think we shall take "prot" off until it is proven necessary.
>
> I'm ok with deriving the maximal protections from SECINFO, so long as we
> acknowledge that we're preventing userspace from utilizing EMODPE (until
> the kernel supports SGX2).

I think that's alright.

>>> diff --git a/arch/x86/kernel/cpu/sgx/driver/main.c
>>> b/arch/x86/kernel/cpu/sgx/driver/main.c
>>> index 29384cdd0842..dabfe2a7245a 100644
>>> --- a/arch/x86/kernel/cpu/sgx/driver/main.c
>>> +++ b/arch/x86/kernel/cpu/sgx/driver/main.c
>>> @@ -93,15 +93,64 @@ static long sgx_compat_ioctl(struct file *filep, unsigned int cmd,
>>>  }
>>>  #endif
>>>
>>> +/*
>>> + * Returns the AND of VM_{READ,WRITE,EXEC} permissions across all pages
>>> + * covered by the specific VMA.  A non-existent (or yet to be added) enclave
>>> + * page is considered to have no RWX permissions, i.e. is inaccessible.
>>> + */
>>> +static unsigned long sgx_allowed_rwx(struct sgx_encl *encl,
>>> +				     struct vm_area_struct *vma)
>>> +{
>>> +	unsigned long allowed_rwx = VM_READ | VM_WRITE | VM_EXEC;
>>> +	unsigned long idx, idx_start, idx_end;
>>> +	struct sgx_encl_page *page;
>>> +
>>> +	idx_start = PFN_DOWN(vma->vm_start);
>>> +	idx_end = PFN_DOWN(vma->vm_end - 1);
>>> +
>>> +	for (idx = idx_start; idx <= idx_end; ++idx) {
>>> +		/*
>>> +		 * No need to take encl->lock, vm_prot_bits is set prior to
>>> +		 * insertion and never changes, and racing with adding pages is
>>> +		 * a userspace bug.
>>> +		 */
>>> +		rcu_read_lock();
>>> +		page = radix_tree_lookup(&encl->page_tree, idx);
>>> +		rcu_read_unlock();
>>
>> This loop iterates through every page in the range, which could be very slow
>> if the range is large.
>
> At this point I'm shooting for functional correctness and minimal code
> changes.  Optimizations will be in order at some point, just not now.

I was trying to point out in this thread that your approach isn't as
simple as it looks like.

>>> +
>>> +		/* Do not allow R|W|X to a non-existent page. */
>>> +		if (!page)
>>> +			allowed_rwx = 0;
>>> +		else
>>> +			allowed_rwx &= page->vm_prot_bits;
>>> +		if (!allowed_rwx)
>>> +			break;
>>> +	}
>>> +
>>> +	return allowed_rwx;
>>> +}
>>> +
>>>  static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
>>>  {
>>>  	struct sgx_encl *encl = file->private_data;
>>> +	unsigned long allowed_rwx;
>>>  	int ret;
>>>
>>> +	allowed_rwx = sgx_allowed_rwx(encl, vma);
>>> +	if (vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC) & ~allowed_rwx)
>>> +		return -EACCES;
>>> +
>>>  	ret = sgx_encl_mm_add(encl, vma->vm_mm);
>>>  	if (ret)
>>>  		return ret;
>>>
>>> +	if (!(allowed_rwx & VM_READ))
>>> +		vma->vm_flags &= ~VM_MAYREAD;
>>> +	if (!(allowed_rwx & VM_WRITE))
>>> +		vma->vm_flags &= ~VM_MAYWRITE;
>>> +	if (!(allowed_rwx & VM_EXEC))
>>> +		vma->vm_flags &= ~VM_MAYEXEC;
>>> +
>>
>> Say a range comprised of a RW sub-range and a RX sub-range is being mmap()'ed
>> as R here. It'd succeed but mprotect(<RW sub-range>, RW) afterwards will fail
>> because VM_MAYWRITE is cleared here. However, if those two sub-ranges are
>> mapped by separate mmap() calls then the same mprotect() would succeed. The
>> inconsistency here is unexpected and unprecedented.
>
> Boo, I thought I was being super clever.
>
>>>  	vma->vm_ops = &sgx_vm_ops;
>>>  	vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
>>>  	vma->vm_private_data = encl;
>>
On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
>
> I still don't get why we need this whole mess and do not simply admit
> that there are two distinct roles:
>
> 1. Creator
> 2. User

Because SELinux has existing concepts of EXECMEM and EXECMOD.

> In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
> User does not. It just gets the fd from the Creator. I'm sure that all
> the SGX2 related functionality can be solved somehow in this role
> playing game.
>
> An example would be the usual case where enclave is actually a loader
> that loads the actual piece of software that one wants to run. Things
> simply need to be designed in a way the Creator runs the loader part.
> These are non-trivial problems but oddball security model is not going
> to make them disappear - on the contrary it will make designing user
> space only more complicated.
>
> I think this is classical example of when something overly complicated
> is invented in the kernel only to realize that it should be solved in
> the user space.
>
> It would not be like the only use case where some kind of privileged
> daemon is used for managing a kernel provided resource.
>
> I think a really good conclusion from this discussion that has taken two
> months is to realize that nothing needs to be done in this area (except
> *maybe* noexec check).

Hmm, IMO we need to support at least equivalents to EXECMEM and EXECMOD.

That being said, we can do so without functional changes to the SGX uapi,
e.g. add reserved fields so that the initial uapi can be extended *if* we
decide to go with the "userspace provides maximal protections" path, and
use the EPCM permissions as the maximal protections for the initial
upstreaming.

That'd give us a minimal implementation for initial upstreaming and would
eliminate Cedric's blocking complaint.  The "whole mess" of whitelisting,
blacklisting and SGX2 support would be deferred until post-upstreaming.
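One way to read the "add reserved fields" suggestion above is sketched
below. The struct layout, the number of reserved bytes and the example_*
names are assumptions made purely for illustration, not the actual SGX
uapi. The point is only that bytes which must be zero today can later be
given a meaning (for instance a "prot" byte) without changing the size of
the struct that userspace passes in.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_RESERVED_BYTES 6

/* Hypothetical ioctl argument; NOT the real struct sgx_enclave_add_page. */
struct example_add_page {
	uint64_t addr;
	uint64_t src;
	uint64_t secinfo;
	uint16_t mrmask;
	uint8_t  reserved[EXAMPLE_RESERVED_BYTES];	/* must be zero for now */
};

/* The total size stays fixed, so a reserved byte can be repurposed later. */
_Static_assert(sizeof(struct example_add_page) == 32,
	       "layout must not change when reserved bytes are repurposed");

/* Reject any request that sets a bit the current version does not define. */
int example_check_reserved(const struct example_add_page *arg)
{
	static const uint8_t zero[EXAMPLE_RESERVED_BYTES];

	assert(arg != NULL);
	if (memcmp(arg->reserved, zero, sizeof(zero)) != 0)
		return -EINVAL;
	return 0;
}
```

Rejecting non-zero reserved bytes up front is what keeps the extension
path open: no existing caller can be relying on garbage in those bytes
when they are eventually assigned.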
On 7/8/2019 10:29 AM, Sean Christopherson wrote:
> On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
>> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
>>
>> I still don't get why we need this whole mess and do not simply admit
>> that there are two distinct roles:
>>
>> 1. Creator
>> 2. User
>
> Because SELinux has existing concepts of EXECMEM and EXECMOD.
>
>> In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
>> User does not. It just gets the fd from the Creator. I'm sure that all
>> the SGX2 related functionality can be solved somehow in this role
>> playing game.
>>
>> An example would be the usual case where enclave is actually a loader
>> that loads the actual piece of software that one wants to run. Things
>> simply need to be designed in a way the Creator runs the loader part.
>> These are non-trivial problems but oddball security model is not going
>> to make them disappear - on the contrary it will make designing user
>> space only more complicated.
>>
>> I think this is classical example of when something overly complicated
>> is invented in the kernel only to realize that it should be solved in
>> the user space.

Are you talking about changing enclave loaders in user mode? That'd break
all existing code. I don't think we shall ever consider this approach.

>>
>> It would not be like the only use case where some kind of privileged
>> daemon is used for managing a kernel provided resource.
>>
>> I think a really good conclusion from this discussion that has taken two
>> months is to realize that nothing needs to be done in this area (except
>> *maybe* noexec check).
>
> Hmm, IMO we need to support at least equivalents to EXECMEM and EXECMOD.
>
> That being said, we can do so without functional changes to the SGX uapi,
> e.g. add reserved fields so that the initial uapi can be extended *if* we
> decide to go with the "userspace provides maximal protections" path, and
> use the EPCM permissions as the maximal protections for the initial
> upstreaming.
>
> That'd give us a minimal implementation for initial upstreaming and would
> eliminate Cedric's blocking complaint.  The "whole mess" of whitelisting,
> blacklisting and SGX2 support would be deferred until post-upstreaming.
>
On 7/8/2019 8:55 AM, Sean Christopherson wrote:
> On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
>
> ...
>
>> different FSMs to govern page protection transitions. Implementation wise, his
>> model also imposes unwanted restrictions specifically to SGX2, such as:
>>   · Complicated/Restricted UAPI – Enclave loaders are required to provide
>
> I don't think "complicated" is a fair assessment.  For SGX1 enclaves it's
> literally a direct propagation of the SECINFO RWX flags.

True only for SGX1.

>>     “maximal protection” at page load time, but such information is NOT always
>>     available. For example, Graphene containers may run different applications
>>     comprised of different set of executables and/or shared objects. Some of
>>     them may contain self-modifying code (or text relocation) while others
>>     don’t. The generic enclave loader usually doesn’t have such information so
>>     wouldn’t be able to provide it ahead of time.
>
> I'm unconvinced that it would be remotely difficult to teach an enclave
> loader that an enclave or hosted application employs SMC, relocation or
> any other behavior that would require declaring RWX on all pages.

You've been talking as if "enclave loader" is tailored to the enclave it
is loading. But in reality "enclave loader" is usually a library knowing
usually nothing about the enclave. How could it know if an enclave
contains self-modifying code?

>>   · Inefficient Auditing – Audit logs are supposed to help system
>>     administrators to determine the set of minimally needed permissions and to
>>     detect abnormal behaviors. But consider the “maximal protection” model, if
>>     “maximal protection” is set to be too permissive, then audit log wouldn’t
>>     be able to detect anomalies;
>
> Huh?  Declaring overly permissive protections is only problematic if an
> LSM denies the permission, in which case it will generate an accurate
> audit log.
>
> If the enclave/loader "requires" a permission it doesn't actually need,
> e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
> see how this scenario is any different than an application that uses
> assembly code without 'noexecstack' and inadvertently "requires"
> EXECSTACK due to triggering "read implies exec".  In both cases the
> denied permission is unnecessary due to a userspace application bug.

You see, you've been assuming "enclave loader" knows everything and
tailored to what it loads in a particular application. But the reality is
the loader is generic and probably shared by multiple applications. It
needs some generic way to figure out the "maximal protection". An
implementation could use information embedded in the enclave file, or
could just be "configurable". In the former case, you put extra burdens
on the build tools, while in the latter case, your audit logs cannot help
generating an appropriate configuration.

>>     or if “maximal protection” is too restrictive,
>>     then audit log cannot identify the file violating the policy.
>
> Maximal protections that are too restrictive are completely orthogonal to
> LSMs as the enclave would fail to run irrespective of LSMs.  This is no
> different than specifying the wrong RWX flags in SECINFO, or opening a
> file as RO instead of RW.

Say loader is configurable. By looking at the log, can an administrator
tell which file has too restrictive "maximal protection"?

>>     In either case the audit log cannot fulfill its purposes.
>>   · Inability to support #PF driven EPC allocation in SGX2 – For those
>>     unfamiliar with SGX2 software flows, an SGX2 enclave requests a page by
>>     issuing EACCEPT on the address that a new page is wanted, and the resulted
>>     #PF is expected to be handled by the kernel by EAUG’ing an EPC page at the
>>     fault address, and then the enclave would be resumed and the faulting
>>     EACCEPT retried, and succeed. The key requirement is to allow mmap()’ing
>>     non-existing enclave pages so that the SGX module/subsystem could respond
>>     to #PFs by EAUG’ing new pages. Sean’s implementation doesn’t allow
>>     mmap()’ing non-existing pages for variety of reasons and therefore blocks
>>     this major SGX2 usage.
>
> This is simply wrong.  The key requirement in the theoretical EAUG scheme
> is to mmap() pages that have not been added to the _hardware_ maintained
> enclave.  The pages (or some optimized representation of a range of pages)
> would exist in the kernel's software model of the enclave.

You are right. Code can always change. My assessment was made on what's
in your patch. The key point here is your patch is more complicated than
it seems because you've been hand-waving on imminent requirements.
On Mon, Jul 08, 2019 at 10:49:59AM -0700, Xing, Cedric wrote:
> On 7/8/2019 8:55 AM, Sean Christopherson wrote:
> >On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
> True only for SGX1.
> >>     “maximal protection” at page load time, but such information is NOT always
> >>     available. For example, Graphene containers may run different applications
> >>     comprised of different set of executables and/or shared objects. Some of
> >>     them may contain self-modifying code (or text relocation) while others
> >>     don’t. The generic enclave loader usually doesn’t have such information so
> >>     wouldn’t be able to provide it ahead of time.
> >
> >I'm unconvinced that it would be remotely difficult to teach an enclave
> >loader that an enclave or hosted application employs SMC, relocation or
> >any other behavior that would require declaring RWX on all pages.
>
> You've been talking as if "enclave loader" is tailored to the enclave it is
> loading. But in reality "enclave loader" is usually a library knowing
> usually nothing about the enclave. How could it know if an enclave contains
> self-modifying code?

Given the rarity of SMC, require enclaves to declare "I do SMC"...  The
Intel SDK already requires the enclave developer to declare heap size,
stack size, thread affinity, etc...  I have a very hard time believing
that it can't support SMC and relocation flags.

> >>   · Inefficient Auditing – Audit logs are supposed to help system
> >>     administrators to determine the set of minimally needed permissions and to
> >>     detect abnormal behaviors. But consider the “maximal protection” model, if
> >>     “maximal protection” is set to be too permissive, then audit log wouldn’t
> >>     be able to detect anomalies;
> >
> >Huh?  Declaring overly permissive protections is only problematic if an
> >LSM denies the permission, in which case it will generate an accurate
> >audit log.
> >
> >If the enclave/loader "requires" a permission it doesn't actually need,
> >e.g. EXECDIRTY, then it's a software bug that should be fixed.  I don't
> >see how this scenario is any different than an application that uses
> >assembly code without 'noexecstack' and inadvertently "requires"
> >EXECSTACK due to triggering "read implies exec".  In both cases the
> >denied permission is unnecessary due to a userspace application bug.
>
> You see, you've been assuming "enclave loader" knows everything and tailored
> to what it loads in a particular application. But the reality is the loader
> is generic and probably shared by multiple applications.

No, I'm assuming that an enclave can communicate its basic needs without
undue pain.

> It needs some generic way to figure out the "maximal protection". An
> implementation could use information embedded in the enclave file, or could
> just be "configurable". In the former case, you put extra burdens on the build
> tools, while in the latter case, your audit logs cannot help generating an
> appropriate configuration.

I'm contending the "extra burdens" is minimal.

	if (do_smc || do_relocation)
		max_prot = RWX;
	else
		max_prot = SECINFO.FLAGS;

> >>     or if “maximal protection” is too restrictive,
> >>     then audit log cannot identify the file violating the policy.
> >
> >Maximal protections that are too restrictive are completely orthogonal to
> >LSMs as the enclave would fail to run irrespective of LSMs.  This is no
> >different than specifying the wrong RWX flags in SECINFO, or opening a
> >file as RO instead of RW.
>
> Say loader is configurable. By looking at the log, can an administrator tell
> which file has too restrictive "maximal protection"?

Again, this fails irrespective of LSMs.  So the answer is "no", because
there is no log.  But the admin will never have to deal with this issue
because the enclave will *never* run, i.e. would unconditionally fail to
run during initial development.  And the developer has bigger problems if
they can't debug their own code.

> >>In either case the audit log cannot fulfill its purposes.
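The pseudocode in the message above can be fleshed out into a small loader
helper. This is a sketch only, and everything named example_* or EX_* is
hypothetical: it assumes the enclave metadata carries two developer-declared
hints (do_smc, do_relocation) and that the R/W/X permissions occupy the low
three bits of SECINFO.FLAGS with the same values as PROT_READ, PROT_WRITE
and PROT_EXEC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative permission bits, assumed to match SECINFO.FLAGS bits 0..2. */
#define EX_PROT_R	0x1u
#define EX_PROT_W	0x2u
#define EX_PROT_X	0x4u
#define EX_PROT_RWX	(EX_PROT_R | EX_PROT_W | EX_PROT_X)

/* Hypothetical hints an enclave developer would declare at build time. */
struct example_enclave_hints {
	bool do_smc;		/* enclave uses self-modifying code */
	bool do_relocation;	/* enclave performs text relocation */
};

/*
 * Maximal protections a generic loader would declare for one page:
 * RWX if the enclave says it needs it, otherwise just the page's own
 * SECINFO permissions (other SECINFO bits such as the page type are
 * masked off).
 */
uint8_t example_max_prot(uint64_t secinfo_flags,
			 const struct example_enclave_hints *hints)
{
	assert(hints != NULL);
	if (hints->do_smc || hints->do_relocation)
		return EX_PROT_RWX;
	return (uint8_t)(secinfo_flags & EX_PROT_RWX);
}
```

Whether build tools can realistically supply such hints is exactly the
point in dispute in this thread; the sketch only shows how little the
loader itself would have to do once they exist.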
Hi Sean,
What's in my cover letter is my assessment of what's in your series. You
may disagree, but I don't think further debate is productive until you
can prove your points in code.
The key points I'm making are:
(1) The impact of the UAPI change on user mode code is more significant
than you have envisioned.
(2) Your series has implemented less than required in practice.
For #1, regular shared objects don't carry info such as whether they
contain self-modifying code or generate code on the fly. So your
requirement of "maximal protection" is new, and you should at least put
together a story to show everyone how it could be met, especially without
changing build tools.
For #2, SGX2 is imminent, and the upcoming ICX server will support 512GB
of EPC. So the problems in mprotect() performance and EAUG-on-#PF must
be solved, let alone other problems. I guess you have to code them up so
everyone will be able to evaluate whether your approach is really as
simple as you have claimed.
-Cedric
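To put the mprotect() performance concern in #2 into numbers, the helper
below counts how many per-page lookups a loop like the one in
sgx_allowed_rwx() would perform for a given range. It is an illustrative
sketch: example_pages_in_range() is a hypothetical name, "512GB" is taken
to mean 512 GiB, and a 4 KiB page size (page_shift of 12) is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages needed to cover `bytes` with pages of 1 << page_shift bytes. */
uint64_t example_pages_in_range(uint64_t bytes, unsigned int page_shift)
{
	uint64_t mask;

	assert(page_shift < 64);
	mask = (UINT64_C(1) << page_shift) - 1;
	/* Full pages plus one more if there is a partial page at the end. */
	return (bytes >> page_shift) + ((bytes & mask) != 0);
}
```

Under those assumptions a single mapping that spans the whole EPC covers
2^39 / 2^12 = 2^27 = 134,217,728 pages, i.e. that many radix tree lookups
for one mmap() or mprotect() call in the worst case.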
On 7/8/2019 11:49 AM, Sean Christopherson wrote:
> On Mon, Jul 08, 2019 at 10:49:59AM -0700, Xing, Cedric wrote:
>> On 7/8/2019 8:55 AM, Sean Christopherson wrote:
>>> On Sun, Jul 07, 2019 at 04:41:30PM -0700, Cedric Xing wrote:
>> True only for SGX1.
>>>> “maximal protection” at page load time, but such information is NOT always
>>>> available. For example, Graphene containers may run different applications
>>>> comprised of different set of executables and/or shared objects. Some of
>>>> them may contain self-modifying code (or text relocation) while others
>>>> don’t. The generic enclave loader usually doesn’t have such information so
>>>> wouldn’t be able to provide it ahead of time.
>>>
>>> I'm unconvinced that it would be remotely difficult to teach an enclave
>>> loader that an enclave or hosted application employs SMC, relocation or
>>> any other behavior that would require declaring RWX on all pages.
>>
>> You've been talking as if "enclave loader" is tailored to the enclave it is
>> loading. But in reality "enclave loader" is usually a library knowing
>> usually nothing about the enclave. How could it know if an enclave contains
>> self-modifying code?
>
> Given the rarity of SMC, require enclaves to declare "I do SMC"... The
> Intel SDK already requires the enclave developer to declare heap size,
> stack size, thread affinity, etc... I have a very hard time believing
> that it can't support SMC and relocation flags.
>
>>>> · Inefficient Auditing – Audit logs are supposed to help system
>>>> administrators to determine the set of minimally needed permissions and to
>>>> detect abnormal behaviors. But consider the “maximal protection” model, if
>>>> “maximal protection” is set to be too permissive, then audit log wouldn’t
>>>> be able to detect anomalies;
>>>
>>> Huh? Declaring overly permissive protections is only problematic if an
>>> LSM denies the permission, in which case it will generate an accurate
>>> audit log.
>>>
>>> If the enclave/loader "requires" a permission it doesn't actually need,
>>> e.g. EXECDIRTY, then it's a software bug that should be fixed. I don't
>>> see how this scenario is any different than an application that uses
>>> assembly code without 'noexecstack' and inadvertently "requires"
>>> EXECSTACK due to triggering "read implies exec". In both cases the
>>> denied permission is unnecessary due to a userspace application bug.
>>
>> You see, you've been assuming "enclave loader" knows everything and tailored
>> to what it loads in a particular application. But the reality is the loader
>> is generic and probably shared by multiple applications.
>
> No, I'm assuming that an enclave can communicate its basic needs without
> undue pain.
>
>> It needs some generic way to figure out the "maximal protection". An
>> implementation could use information embedded in the enclave file, or could
>> just be "configurable". In the former case, you put extra burdens on the build
>> tools, while in the latter case, your audit logs cannot help generating an
>> appropriate configuration.
>
> I'm contending the "extra burdens" is minimal.
>
> 	if (do_smc || do_relocation)
> 		max_prot = RWX;
> 	else
> 		max_prot = SECINFO.FLAGS;
>
>>>> or if “maximal protection” is too restrictive,
>>>> then audit log cannot identify the file violating the policy.
>>>
>>> Maximal protections that are too restrictive are completely orthogonal to
>>> LSMs as the enclave would fail to run irrespective of LSMs. This is no
>>> different than specifying the wrong RWX flags in SECINFO, or opening a
>>> file as RO instead of RW.
>>
>> Say loader is configurable. By looking at the log, can an administrator tell
>> which file has too restrictive "maximal protection"?
>
> Again, this fails irrespective of LSMs. So the answer is "no", because
> there is no log. But the admin will never have to deal with this issue
> because the enclave will *never* run, i.e. would unconditionally fail to
> run during initial development. And the developer has bigger problems if
> they can't debug their own code.
>
>>>> In either case the audit log cannot fulfill its purposes.
On 7/8/2019 10:16 AM, Xing, Cedric wrote:
> On 7/8/2019 9:26 AM, Casey Schaufler wrote:
>> In this scheme you use an ema LSM to manage your ema data.
>> A quick sketch looks like:
>>
>> 	sgx_something_in() calls
>> 		security_enclave_load() calls
>> 			ema_enclave_load()
>> 			selinux_enclave_load()
>> 			otherlsm_enclave_load()
>>
>> Why is this better than:
>>
>> 	sgx_something_in() calls
>> 		ema_enclave_load()
>> 		security_enclave_load() calls
>> 			selinux_enclave_load()
>> 			otherlsm_enclave_load()
>
> Are you talking about moving EMA somewhere outside LSM?

Yes. That's what I've been saying all along.

> If so, where?

I tried to make it obvious. Put the call to your EMA code
on the line before you call security_enclave_load().

>
>>
>> If you did really want ema to behave like an LSM
>> you would put the file data that SELinux is managing
>> into the ema portion of the blob and provide interfaces
>> for the SELinux (or whoever) to use that. Also, it's
>> an abomination (as I've stated before) for ema to
>> rely on SELinux to provide a file_free() hook for
>> ema's data. If you continue down the LSM route, you
>> need to provide an ema_file_free() hook. You can't
>> count on SELinux to do it for you. If there are multiple
>> LSMs (coming soon!) that use the ema data, they'll all
>> try to free it, and then Bad Things can happen.
>
> I'm afraid you have misunderstood the code. What is kept open and gets
> closed in selinux_file_free() is the sigstruct file. SELinux uses it to
> determine the page permissions for enclave pages from anonymous sources.
> It is a policy choice made inside SELinux and has nothing to do with EMA.

OK.

>
> There's indeed an ema_file_free_security() to free the EMA map for
> enclaves being closed. EMA does *NOT* rely on any other LSMs to free data
> for it. The only exception is when an LSM fails enclave_load(), it has to
> call ema_remove_range() to remove the range being added, which was *not*
> required originally in v2.

OK.

>
>>> +/**
>>> + * ema - Enclave Memory Area structure for LSM modules
>>
>> LSM modules is redundant. "LSM" or "LSMs" would be better.
>
> Noted
>
>>> diff --git a/security/Makefile b/security/Makefile
>>> index c598b904938f..b66d03a94853 100644
>>> --- a/security/Makefile
>>> +++ b/security/Makefile
>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
>>>  obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
>>>  obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
>>>  obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o
>>
>> The config option and the file name ought to match,
>> or at least be closer.
>
> Just trying to match file names as "capability" uses commoncap.c.

Fine, then you should be using CONFIG_SECURITY_EMA.

>
> Like I said, this feature could potentially be used by TEEs other than
> SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I
> can rename it to ema.c or enclave.c. Do you have a preference?

Make

	CONFIG_SECURITY_EMA
		depends on CONFIG_INTEL_SGX

When another TEE (maybe MIPS_SSRPQ) comes along you can have

	CONFIG_SECURITY_EMA
		depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ

>
>>> diff --git a/security/commonema.c b/security/commonema.c
>>
>> Put this in a subdirectory. Please.
>
> Then why is commoncap.c located in this directory? I'm just trying to
> match the existing convention.

commoncap is not optional. It is a base part of the
security subsystem. ema is optional.

>
>>> +static struct lsm_blob_sizes ema_blob_sizes __lsm_ro_after_init = {
>>> +	.lbs_file = sizeof(atomic_long_t),
>>> +};
>>
>> If this is ema's data ema must manage it. You *must* have
>> a file_free().
>
> There is one indeed - ema_file_free_security().

I see it now.

>
>>
>>> +
>>> +static atomic_long_t *_map_file(struct file *encl)
>>> +{
>>> +	return (void *)((char *)(encl->f_security) + ema_blob_sizes.lbs_file);
>>
>> I don't trust all the casting going on here, especially since
>> you don't end up with the type you should be returning.
>
> Will change.
>
>>> +}
>>> +
>>> +static struct ema_map *_alloc_map(void)
>>
>> Function header comments, please.
>
> Will add.
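The concern about the casts in _map_file() can be illustrated with a
userspace model of a typed blob accessor. This is a sketch under stated
assumptions, not the eventual fix: struct example_file, struct
example_blob_sizes and example_map_file() are hypothetical stand-ins, a
plain long stands in for atomic_long_t, and lbs_file is assumed to hold
this module's offset into the shared blob once the LSM framework has
assigned it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the security blob pointer in struct file. */
struct example_file {
	void *f_security;
};

/* Hypothetical stand-in for struct lsm_blob_sizes (offset after setup). */
struct example_blob_sizes {
	size_t lbs_file;
};

/* Stand-in for atomic_long_t, the type the accessor is declared to return. */
typedef long example_map_slot;

/*
 * Return this module's slot inside the shared per-file blob.  The byte
 * offset is applied through a char pointer and the result is converted
 * once, explicitly, to the declared return type rather than to void *.
 */
example_map_slot *example_map_file(const struct example_file *file,
				   const struct example_blob_sizes *sizes)
{
	assert(file != NULL && file->f_security != NULL && sizes != NULL);
	return (example_map_slot *)((char *)file->f_security + sizes->lbs_file);
}
```

The offset must be suitably aligned for the slot type; in this model that
is the caller's responsibility, whereas in the kernel it would be up to
the blob allocator.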
On 7/7/2019 6:30 AM, Dr. Greg wrote:
> On Wed, Jul 03, 2019 at 08:32:10AM -0700, Casey Schaufler wrote:
>
> Good morning, I hope the weekend has been enjoyable for everyone.
>
>>>> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
>>>>> ...
>>>>> Guess this discussion will never end if we don't get into
>>>>> code. Guess it'd be more productive to talk over phone then come back
>>>>> to this thread with a conclusion. Will that be ok with you?
>>>> I don't think that a phone call is going to help. Talking code
>>>> issues tends to muddle them in my brain. If you can give me a few
>>>> days I will propose a rough version of how I think your code should
>>>> be integrated into the LSM environment. I'm spending more time
>>>> trying (unsuccessfully :( ) to describe the issues in English than
>>>> it will probably take in C.
>>> While Casey is off writing his rosetta stone,
>> I'd hardly call it that. More of an effort to round the corners on
>> the square peg. And Cedric has some ideas on how to approach that.
> Should we infer from this comment that, of the two competing
> strategies, Cedric's is the favored architecture?

With Cedric's latest patches I'd say there's only one
strategy. There's still some refinement to do, but we're
getting there.

>>> let me suggest that the
>>> most important thing we need to do is to take a little time, step back
>>> and look at the big picture with respect to what we are trying to
>>> accomplish and if we are going about it in a way that makes any sense
>>> from an engineering perspective.
>>>
>>> This conversation shouldn't be about SGX, it should be about the best
>>> way for the kernel/LSM to discipline a Trusted Execution Environment
>>> (TEE).  As I have noted previously, a TEE is a 'blackbox' that, by
>>> design, is intended to allow execution of code and processing of data
>>> in a manner that is resistant to manipulation or inspection by
>>> untrusted userspace, the kernel and/or the hardware itself.
>>>
>>> Given that fact, if we are to be intellectually honest, we need to ask
>>> ourselves how effective we believe we can be in controlling any TEE
>>> with kernel based mechanisms.  This is particularly the case if the
>>> author of any code running in the TEE has adversarial intent.
>>>
>>> Here is the list of controls that we believe an LSM can, effectively,
>>> implement against a TEE:
>>>
>>> 1.) Code provenance and origin.
>>>
>>> 2.) Cryptographic verification of dynamically executable content.
>>>
>>> 3.) The ability of a TEE to implement anonymous executable content.
>>>
>>> If people are in agreement with this concept, it is difficult to
>>> understand why we should be implementing complex state machines and
>>> the like, whether it is in the driver or the LSM.  Security code has
>>> to be measured with a metric of effectiveness, otherwise we are
>>> engaging in security theater.
>>>
>>> I believe that if we were using this lens, we would already have a
>>> mainline SGX driver, since we seem to have most of the needed LSM
>>> infrastructure and any additional functionality would be a straight
>>> forward implementation.  Most importantly, the infrastructure would
>>> not be SGX specific, which would seem to be a desirable political
>>> concept.
>> Generality introduced in the absence of multiple instances
>> often results in unnecessary complexity, unused interfaces
>> and feature compromise. Guessing what other TEE systems might
>> do, and constraining SGX to those models (or the other way around)
>> is a well established road to ruin. The LSM infrastructure is
>> a fine example. For the first ten years the "general" mechanism
>> had a single user. I'd say to hold off on the general until there
>> is more experience with the specific. It's easier to construct
>> a general mechanism around things that work than to fit things
>> that need to work into some preconceived notion of generality.
> All well taken points from an implementation perspective, but they
> elide the point I was trying to make.  Which is the fact that without
> any semblance of a discussion regarding the requirements needed to
> implement a security architecture around the concept of a TEE, this
> entire process, despite Cedric's well intentioned efforts, amounts to
> pounding a square solution into the round hole of a security problem.

Lead with code. I love a good requirements document, but
one of the few places where I agree with the agile folks is
that working code speaks loudly.

> Which, as I noted in my e-mail, is tantamount to security theater.

Not buying that. Not rejecting it, either. Without code
to judge it's kind of hard to say.

> Everyone wants to see this driver upstream.  If we would have had a
> reasoned discussion regarding what it means to implement proper
> controls around a TEE, when we started to bring these issues forward
> last November, we could have possibly been on the road to having a
> driver with reasoned security controls and one that actually delivers
> the security guarantees the hardware was designed to deliver.
>
> Best wishes for a productive week to everyone.
>
> Dr. Greg
>
> As always,
> Dr. G.W. Wettstein, Ph.D.   Enjellic Systems Development, LLC.
> 4206 N. 19th Ave.           Specializing in information infra-structure
> Fargo, ND  58102            development.
> PH: 701-281-1686            EMAIL: greg@enjellic.com
> ------------------------------------------------------------------------------
> "Any intelligent fool can make things bigger and more complex... It
> takes a touch of genius - and a lot of courage to move in the opposite
> direction."
>                                 -- Albert Einstein
On Sun, Jul 07, 2019 at 04:41:32PM -0700, Cedric Xing wrote:

...

> @@ -575,6 +576,46 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long addr,
>  	return ret;
>  }
>
> +static int sgx_encl_prepare_page(struct file *filp, unsigned long dst,
> +				 unsigned long src, void *buf)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned long prot;
> +	int rc;
> +
> +	if (dst & ~PAGE_SIZE)
> +		return -EINVAL;
> +
> +	rc = down_read_killable(&current->mm->mmap_sem);
> +	if (rc)
> +		return rc;
> +
> +	vma = find_vma(current->mm, dst);
> +	if (vma && dst >= vma->vm_start)
> +		prot = _calc_vm_trans(vma->vm_flags, VM_READ, PROT_READ) |
> +		       _calc_vm_trans(vma->vm_flags, VM_WRITE, PROT_WRITE) |
> +		       _calc_vm_trans(vma->vm_flags, VM_EXEC, PROT_EXEC);
> +	else
> +		prot = 0;
> +
> +	vma = find_vma(current->mm, src);
> +	if (!vma || src < vma->vm_start || src + PAGE_SIZE > vma->vm_end)
> +		rc = -EFAULT;
> +
> +	if (!rc && !(vma->vm_flags & VM_MAYEXEC))
> +		rc = -EACCES;

Disallowing loading enclave *data* from a noexec file system is an arbitrary
and weird restriction.

> +
> +	if (!rc && copy_from_user(buf, (void __user *)src, PAGE_SIZE))
> +		rc = -EFAULT;
> +
> +	if (!rc)
> +		rc = security_enclave_load(filp, dst, PAGE_SIZE, prot, vma);
> +
> +	up_read(&current->mm->mmap_sem);
> +
> +	return rc;
> +}
On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
> +static int enclave_mprotect(struct vm_area_struct *vma, size_t prot)
> +{
> +	struct ema_map *m;
> +	int rc;
> +
> +	/* is vma an enclave vma ? */
> +	if (!vma->vm_file)
> +		return 0;
> +	m = ema_get_map(vma->vm_file);
> +	if (!m)
> +		return 0;
> +
> +	/* WX requires EXECMEM */
> +	if ((prot && PROT_WRITE) && (prot & PROT_EXEC)) {
> +		rc = avc_has_perm(&selinux_state, current_sid(), current_sid(),
> +				  SECCLASS_PROCESS, PROCESS__EXECMEM, NULL);
> +		if (rc)
> +			return rc;
> +	}
> +
> +	rc = ema_lock_map(m);
> +	if (rc)
> +		return rc;
> +
> +	if ((prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC))
> +		rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end,
> +					ema__chk_X_cb, vma->vm_file);
> +	if (!rc && (prot & PROT_WRITE) && !(vma->vm_flags & VM_WRITE))
> +		rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end,
> +					ema__set_M_cb, NULL);

Not tracking whether a page has been mapped X and having ema__chk_W_cb()
allows an application to circumvent W^X policies by spinning up a helper
process.

Ignoring that issue, this approach suffers from the same race condition I
pointed out a while back[1]. If process A maps a page W and process B
maps the same page X, then the result of ema__chk_X_cb() depends on the
order of mprotect() calls between A and B.

[1] https://lore.kernel.org/linux-security-module/20190614200123.GA32570@linux.intel.com/

> +	ema_unlock_map(m);
> +
> +	return rc;
> +}
On Mon, Jul 08, 2019 at 05:02:00PM -0700, Casey Schaufler wrote:
> On 7/7/2019 6:30 AM, Dr. Greg wrote:
> > On Wed, Jul 03, 2019 at 08:32:10AM -0700, Casey Schaufler wrote:
> >
> > Good morning, I hope the weekend has been enjoyable for everyone.
> >
> >>>> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
> >>>>> ...
> >>>>> Guess this discussion will never end if we don't get into
> >>>>> code. Guess it'd be more productive to talk over phone then come back
> >>>>> to this thread with a conclusion. Will that be ok with you?
> >>>> I don't think that a phone call is going to help. Talking code
> >>>> issues tends to muddle them in my brain. If you can give me a few
> >>>> days I will propose a rough version of how I think your code should
> >>>> be integrated into the LSM environment. I'm spending more time
> >>>> trying (unsuccessfully :( ) to discribe the issues in English than
> >>>> it will probably take in C.
> >>> While Casey is off writing his rosetta stone,
> >> I'd hardly call it that. More of an effort to round the corners on
> >> the square peg. And Cedric has some ideas on how to approach that.
> > Should we infer from this comment that, of the two competing
> > strategies, Cedric's is the favored architecture?
>
> With Cedric's latest patches I'd say there's only one
> strategy. There's still some refinement to do, but we're
> getting there.
Dynamic tracking has an unsolvable race condition. If process A maps a
page W and process B maps the same page X, then the result of W^X checks
depends on the order of mprotect() calls between A and B.
If we're ok saying "don't do that" then I can get behind dynamic tracking
as a whole. Even if we settle on dynamic tracking, where that tracking
code lives is still an open question IMO.
On Mon, Jul 08, 2019 at 07:57:07AM -0700, Sean Christopherson wrote:
> On Fri, Jun 21, 2019 at 12:03:36AM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote:
> > > Using per-vma refcounting to track mm_structs associated with an enclave
> > > requires hooking .vm_close(), which in turn prevents the mm from merging
> > > vmas (precisely to allow refcounting).
> >
> > Why having sgx_vma_close() prevents that? I do not understand the
> > problem statement.
>
> vmas that define .vm_close() cannot be merged.
Ugh, did not know that :-) Thank you.
/Jarkko
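
For readers who also did not know this: the rule can be pictured with a
small userspace model. This is only a sketch of the idea; the real check
lives in the kernel's vma merge path and looks at more fields. The model_*
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_vma;

/* Hypothetical stand-in for vm_operations_struct. */
struct model_vm_ops {
	void (*close)(struct model_vma *vma);
};

/* Hypothetical stand-in for vm_area_struct. */
struct model_vma {
	unsigned long start, end, flags;
	const struct model_vm_ops *ops;
};

/* Two vmas are merge candidates only if neither has a close() callback. */
static bool model_can_merge(const struct model_vma *a,
			    const struct model_vma *b)
{
	if (a->end != b->start || a->flags != b->flags)
		return false;
	if ((a->ops && a->ops->close) || (b->ops && b->ops->close))
		return false;
	return true;
}

/* A driver that refcounts per vma would hook close() like this. */
static void model_sgx_vma_close(struct model_vma *vma)
{
	(void)vma;
}
```

Two adjacent vmas with identical flags merge in this model until a close()
hook such as model_sgx_vma_close() is installed on either of them.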
On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
> On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> > On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> >
> > I still don't get why we need this whole mess and do not simply admit
> > that there are two distinct roles:
> >
> > 1. Creator
> > 2. User
>
> Because SELinux has existing concepts of EXECMEM and EXECMOD.

What is the official documentation for those? I've only found some
explanations from discussions and some RHEL sysadmin guides.

> That being said, we can do so without functional changes to the SGX uapi,
> e.g. add reserved fields so that the initial uapi can be extended *if* we
> decide to go with the "userspace provides maximal protections" path, and
> use the EPCM permissions as the maximal protections for the initial
> upstreaming.
>
> That'd give us a minimal implemenation for initial upstreaming and would
> eliminate Cedric's blocking complaint. The "whole mess" of whitelisting,
> blacklisting and SGX2 support would be deferred until post-upstreaming.

I'd like that approach more too.

/Jarkko
On Tue, Jul 09, 2019 at 07:22:03PM +0300, Jarkko Sakkinen wrote:
> On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
> > On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> > > On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> > >
> > > I still don't get why we need this whole mess and do not simply admit
> > > that there are two distinct roles:
> > >
> > > 1. Creator
> > > 2. User
> >
> > Because SELinux has existing concepts of EXECMEM and EXECMOD.
>
> What is the official documentation for those? I've only found some
> explanations from discussions and some RHEL sysadmin guides.

No clue. My knowledge was gleaned from the code and from Stephen's
feedback.

The high level breakdown:

  - FILE__EXECUTE: required to gain X on a mapping to a regular file

  - FILE__EXECUTE + FILE__WRITE: required to gain WX or W->X on a shared
                                 mapping to a regular file

  - FILE__EXECMOD: required to gain W->X on a private mapping of a regular
                   file

  - PROCESS__EXECMEM: required to gain WX on a private mapping to a regular
                      file, OR to gain X on an anonymous mapping.

Translating those to SGX, with a lot of input from Stephen, I ended up
with the following:

  - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
                           on an enclave page loaded from a regular file

  - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
                                 required to gain W->X on an enclave page

  - PROCESS2__ENCLAVE_EXECANON: subset of EXECMEM, required to gain X on
                                an enclave page that is loaded from an
                                anonymous mapping

  - PROCESS2__ENCLAVE_MAPWX: subset of EXECMEM, required to gain WX on an
                             enclave page

> > That being said, we can do so without functional changes to the SGX uapi,
> > e.g. add reserved fields so that the initial uapi can be extended *if* we
> > decide to go with the "userspace provides maximal protections" path, and
> > use the EPCM permissions as the maximal protections for the initial
> > upstreaming.
> >
> > That'd give us a minimal implemenation for initial upstreaming and would
> > eliminate Cedric's blocking complaint. The "whole mess" of whitelisting,
> > blacklisting and SGX2 support would be deferred until post-upstreaming.
>
> I'd like that approach more too.
>
> /Jarkko
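
The breakdown of the existing SELinux permissions in this message can be
read as a small decision table. The sketch below encodes one reading of it
as a userspace function. It is an illustration, not SELinux code: the enum
names are hypothetical, and it assumes FILE__EXECUTE is needed for every
executable mapping of a regular file and that any X on an anonymous mapping
falls under PROCESS__EXECMEM.

```c
/* Hypothetical labels for the kind of mapping being changed. */
enum map_kind { KIND_FILE_SHARED, KIND_FILE_PRIVATE, KIND_ANON };

/* Hypothetical labels for how X is being gained. */
enum x_request { GAIN_X, GAIN_WX, GAIN_W_TO_X };

/* Bit flags standing in for the SELinux permissions named in the thread. */
enum {
	NEED_FILE_EXECUTE    = 1 << 0,
	NEED_FILE_WRITE      = 1 << 1,
	NEED_FILE_EXECMOD    = 1 << 2,
	NEED_PROCESS_EXECMEM = 1 << 3,
};

/* Return the set of permissions required, per the breakdown in this mail. */
static unsigned int required_perms(enum map_kind kind, enum x_request req)
{
	unsigned int need;

	/* X on an anonymous mapping requires EXECMEM. */
	if (kind == KIND_ANON)
		return NEED_PROCESS_EXECMEM;

	/* Assumption: any X on a regular file requires FILE__EXECUTE. */
	need = NEED_FILE_EXECUTE;
	if (req == GAIN_X)
		return need;

	/* WX or W->X on a shared file mapping also requires FILE__WRITE. */
	if (kind == KIND_FILE_SHARED)
		return need | NEED_FILE_WRITE;

	/* Private file mapping: W->X needs EXECMOD, WX needs EXECMEM. */
	if (req == GAIN_W_TO_X)
		return need | NEED_FILE_EXECMOD;
	return need | NEED_PROCESS_EXECMEM;
}
```

The proposed ENCLAVE_* permissions follow the same shape, with the enclave
page's source (regular file or anonymous memory) taking the place of the
mapping kind.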
On 7/9/2019 10:09 AM, Sean Christopherson wrote: > On Tue, Jul 09, 2019 at 07:22:03PM +0300, Jarkko Sakkinen wrote: >> On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote: >>> On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote: >>>> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote: >>>> >>>> I still don't get why we need this whole mess and do not simply admit >>>> that there are two distinct roles: >>>> >>>> 1. Creator >>>> 2. User >>> >>> Because SELinux has existing concepts of EXECMEM and EXECMOD. >> >> What is the official documentation for those? I've only found some >> explanations from discussions and some RHEL sysadmin guides. > > No clue. My knowledge was gleaned from the code and from Stephen's > feedback. > > > The high level breakdown: > > - FILE__EXECUTE: required to gain X on a mapping to a regular file > > > - FILE__EXECUTE + FILE__WRITE: required to gain WX or W->X on a shared > mapping to a regular file > > - FILE__EXECMOD: required to gain W->X on a private mapping of a regular > file > > - PROCESS__EXECMEM: required to gain WX on a private mapping to a regular > file, OR to gain X on an anonymous mapping. > > > Translating those to SGX, with a lot of input from Stephen, I ended up > with the following: > > - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X > on an enclave page loaded from a regular file > > - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE, > required to gain W->X on an enclave page EXECMOD basically indicates a file containing self-modifying code. Your ENCLAVE_EXECDIRTY is however a process permission, which is illogical. > - PROCESS2__ENCLAVE_EXECANON: subset of EXECMEM, required to gain X on > an enclave page that is loaded from an > anonymous mapping > > - PROCESS2__ENCLAVE_MAPWX: subset of EXECMEM, required to gain WX on an > enclave page > > > >>> That being said, we can do so without functional changes to the SGX uapi, >>> e.g. 
add reserved fields so that the initial uapi can be extended *if* we >>> decide to go with the "userspace provides maximal protections" path, and >>> use the EPCM permissions as the maximal protections for the initial >>> upstreaming. >>> >>> That'd give us a minimal implemenation for initial upstreaming and would >>> eliminate Cedric's blocking complaint. The "whole mess" of whitelisting, >>> blacklisting and SGX2 support would be deferred until post-upstreaming. >> >> I'd like that approach more too. >> >> /Jarkko
On 7/8/2019 6:52 PM, Sean Christopherson wrote:
> On Mon, Jul 08, 2019 at 05:02:00PM -0700, Casey Schaufler wrote:
>> On 7/7/2019 6:30 AM, Dr. Greg wrote:
>>> On Wed, Jul 03, 2019 at 08:32:10AM -0700, Casey Schaufler wrote:
>>>
>>> Good morning, I hope the weekend has been enjoyable for everyone.
>>>
>>>>>> On 7/2/2019 12:42 AM, Xing, Cedric wrote:
>>>>>>> ...
>>>>>>> Guess this discussion will never end if we don't get into
>>>>>>> code. Guess it'd be more productive to talk over phone then come back
>>>>>>> to this thread with a conclusion. Will that be ok with you?
>>>>>> I don't think that a phone call is going to help. Talking code
>>>>>> issues tends to muddle them in my brain. If you can give me a few
>>>>>> days I will propose a rough version of how I think your code should
>>>>>> be integrated into the LSM environment. I'm spending more time
>>>>>> trying (unsuccessfully :( ) to discribe the issues in English than
>>>>>> it will probably take in C.
>>>>> While Casey is off writing his rosetta stone,
>>>> I'd hardly call it that. More of an effort to round the corners on
>>>> the square peg. And Cedric has some ideas on how to approach that.
>>> Should we infer from this comment that, of the two competing
>>> strategies, Cedric's is the favored architecture?
>>
>> With Cedric's latest patches I'd say there's only one
>> strategy. There's still some refinement to do, but we're
>> getting there.
>
> Dynamic tracking has an unsolvable race condition. If process A maps a
> page W and process B maps the same page X, then the result of W^X checks
> depends on the order of mprotect() calls between A and B.
I don't quite understand where the term "dynamic tracking" came from.
What's done in the patch is just to track which page was contributed by
which file. The same is already done for all file mappings in Linux.
Back to the "race condition". A similar situation already exists in
SELinux, between EXECMOD and EXECMEM. Say a file does *not* have EXECMOD
but the calling process has EXECMEM. Then WX could be granted to a fresh
private mapping (because of EXECMEM). However, once the mapping has been
written to, X should have been revoked (because of lack of EXECMOD) but
will still be retained until dropped by an explicit mprotect().
Afterwards, mprotect(X) will be denied. That's not the same situation as
in this enclave case but they do share one thing in common - X should
have been revoked from an existing mapping but it isn't, which is just a
policy choice.
Nothing is unsolvable. Here are 2 options.
(1) We argue that it doesn't matter, similar to the EXECMOD/EXECMEM case
on regular file mappings described above; or
(2) EXECMOD is required for both W->X and X->W transitions, hence W
requested by the 2nd process will be denied because X has been granted
to the 1st process while EXECMOD is absent.
Please note that #2 is effectively the same concept as
PROCESS2__ENCLAVE_EXECDIRTY in Sean's patch, except that EXECMOD is per
file while ENCLAVE_EXECDIRTY is per process.
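
Option (2) can be sketched as a symmetric check in a small userspace model.
This is only an illustration of the policy described above, under the
assumption that the tracker records both "ever mapped W" and "ever mapped X"
per page. The names tracked_page, policy2_request, WANT_W and WANT_X are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-page state: both directions are remembered. */
struct tracked_page {
	bool ever_w;	/* some process has mapped the page W */
	bool ever_x;	/* some process has mapped the page X */
};

enum { WANT_W = 1, WANT_X = 2 };

/*
 * EXECMOD is required for W->X and for X->W.
 * Returns 0 if the request is allowed, -1 if it is denied.
 */
static int policy2_request(struct tracked_page *pg, int want, bool has_execmod)
{
	if ((want & WANT_X) && pg->ever_w && !has_execmod)
		return -1;
	if ((want & WANT_W) && pg->ever_x && !has_execmod)
		return -1;

	/* A simultaneous W+X request on a fresh page is not covered here. */
	if (want & WANT_W)
		pg->ever_w = true;
	if (want & WANT_X)
		pg->ever_x = true;
	return 0;
}
```

In this model the second of two conflicting requests is denied whichever
order they arrive in, so W and X never coexist on a page without EXECMOD.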
On 7/8/2019 6:33 PM, Sean Christopherson wrote:
> On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
>> +static int enclave_mprotect(struct vm_area_struct *vma, size_t prot)
>> +{
>> +	struct ema_map *m;
>> +	int rc;
>> +
>> +	/* is vma an enclave vma ? */
>> +	if (!vma->vm_file)
>> +		return 0;
>> +	m = ema_get_map(vma->vm_file);
>> +	if (!m)
>> +		return 0;
>> +
>> +	/* WX requires EXECMEM */
>> +	if ((prot && PROT_WRITE) && (prot & PROT_EXEC)) {
>> +		rc = avc_has_perm(&selinux_state, current_sid(), current_sid(),
>> +				  SECCLASS_PROCESS, PROCESS__EXECMEM, NULL);
>> +		if (rc)
>> +			return rc;
>> +	}
>> +
>> +	rc = ema_lock_map(m);
>> +	if (rc)
>> +		return rc;
>> +
>> +	if ((prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC))
>> +		rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end,
>> +					ema__chk_X_cb, vma->vm_file);
>> +	if (!rc && (prot & PROT_WRITE) && !(vma->vm_flags & VM_WRITE))
>> +		rc = ema_apply_to_range(m, vma->vm_start, vma->vm_end,
>> +					ema__set_M_cb, NULL);
>
> Not tracking whether a page has been mapped X and having ema__chk_W_cb()
> allows an application to circumvent W^X policies by spinning up a helper
> process.

See my response in another email. This problem has nothing to do with
the architecture, but is just a policy choice. Your patch of EXECDIRTY
is another possible policy, by combining (or *not* distinguishing) W->X
and X->W into a single WX "maximal protection".

> Ignoring that issue, this approach suffers from the same race condition I
> pointed out a while back[1]. If process A maps a page W and process B
> maps the same page X, then the result of ema__chk_X_cb() depends on the
> order of mprotect() calls between A and B.
>
> [1] https://lore.kernel.org/linux-security-module/20190614200123.GA32570@linux.intel.com/

You seem to be talking about the same problem in both places.

>> +	ema_unlock_map(m);
>> +
>> +	return rc;
>> +}
On 7/8/2019 4:53 PM, Casey Schaufler wrote: > On 7/8/2019 10:16 AM, Xing, Cedric wrote: >> On 7/8/2019 9:26 AM, Casey Schaufler wrote: >>> In this scheme you use an ema LSM to manage your ema data. >>> A quick sketch looks like: >>> >>> sgx_something_in() calls >>> security_enclave_load() calls >>> ema_enclave_load() >>> selinux_enclave_load() >>> otherlsm_enclave_load() >>> >>> Why is this better than: >>> >>> sgx_something_in() calls >>> ema_enclave_load() >>> security_enclave_load() calls >>> selinux_enclave_load() >>> otherlsm_enclave_load() >> >> Are you talking about moving EMA somewhere outside LSM? > > Yes. That's what I've been saying all along. > >> If so, where? > > I tried to make it obvious. Put the call to your EMA code > on the line before you call security_enclave_load(). Sorry but I'm still confused. EMA code is used by LSMs only. Making it callable from other parts of the kernel IMHO is probably not a good idea. And more importantly I don't understand the motivation behind it. Would you please elaborate? >>>> +/** >>>> + * ema - Enclave Memory Area structure for LSM modules >>> >>> LSM modules is redundant. "LSM" or "LSMs" would be better. >> >> Noted >> >>>> diff --git a/security/Makefile b/security/Makefile >>>> index c598b904938f..b66d03a94853 100644 >>>> --- a/security/Makefile >>>> +++ b/security/Makefile >>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/ >>>> obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/ >>>> obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ >>>> obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o >>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o >>> >>> The config option and the file name ought to match, >>> or at least be closer. >> >> Just trying to match file names as "capability" uses commoncap.c. > > Fine, then you should be using CONFIG_SECURITY_EMA. > >> >> Like I said, this feature could potentially be used by TEEs other than SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. 
I can rename it to ema.c or enclave.c. Do you have a preference? > > Make > CONFIG_SECURITY_EMA > depends on CONFIG_INTEL_SGX > > When another TEE (maybe MIPS_SSRPQ) comes along you can have > > CONFIG_SECURITY_EMA > depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ Your suggestions are reasonable. Given such config change wouldn't affect any code, can we do it later, e.g., when additional TEEs come online and make use of these new hooks? After all, security_enclave_init() will need amendment anyway as one of its current parameters is of type 'struct sgx_sigstruct', which will need to be replaced with something more generic. At the time being, I'd like to keep things intuitive so as not to confuse reviewers. >> >>>> diff --git a/security/commonema.c b/security/commonema.c >>> >>> Put this in a subdirectory. Please. >> >> Then why is commoncap.c located in this directory? I'm just trying to match the existing convention. > > commoncap is not optional. It is a base part of the > security subsystem. ema is optional. Alright. I'd move it into a sub-folder and rename it to ema.c. Would you be ok with that?
On Tue, Jul 09, 2019 at 01:41:28PM -0700, Xing, Cedric wrote:
> On 7/9/2019 10:09 AM, Sean Christopherson wrote:
> >Translating those to SGX, with a lot of input from Stephen, I ended up
> >with the following:
> >
> > - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
> > on an enclave page loaded from a regular file
> >
> > - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
> > required to gain W->X on an enclave page
>
> EXECMOD basically indicates a file containing self-modifying code. Your
> ENCLAVE_EXECDIRTY is however a process permission, which is illogical.
How is it illogical? If a PROCESS wants to EXECute a DIRTY ENCLAVE page,
then it needs PROCESS2__ENCLAVE_EXECDIRTY.
FILE__EXECMOD on /dev/sgx/enclave is a process permission masquerading as
a file permission, let's call it what it is.
On 7/9/2019 3:25 PM, Sean Christopherson wrote:
> On Tue, Jul 09, 2019 at 01:41:28PM -0700, Xing, Cedric wrote:
>> On 7/9/2019 10:09 AM, Sean Christopherson wrote:
>>> Translating those to SGX, with a lot of input from Stephen, I ended up
>>> with the following:
>>>
>>> - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
>>> on an enclave page loaded from a regular file
>>>
>>> - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
>>> required to gain W->X on an enclave page
>>
>> EXECMOD basically indicates a file containing self-modifying code. Your
>> ENCLAVE_EXECDIRTY is however a process permission, which is illogical.
>
> How is it illogical? If a PROCESS wants to EXECute a DIRTY ENCLAVE page,
> then it needs PROCESS2__ENCLAVE_EXECDIRTY
Just think of the purpose of FILE__EXECMOD. It indicates to the LSM that
the file has self-modifying code, hence a W->X transition should be
considered "normal" and allowed, regardless of which process that file is
loaded into.
The same holds for enclaves here. Whether an enclave contains
self-modifying code is specific to that enclave, regardless of which
process it is loaded into.
But what you are doing is quite the opposite, and that's what I mean by
"illogical".
On 7/9/2019 3:13 PM, Xing, Cedric wrote: > On 7/8/2019 4:53 PM, Casey Schaufler wrote: >> On 7/8/2019 10:16 AM, Xing, Cedric wrote: >>> On 7/8/2019 9:26 AM, Casey Schaufler wrote: >>>> In this scheme you use an ema LSM to manage your ema data. >>>> A quick sketch looks like: >>>> >>>> sgx_something_in() calls >>>> security_enclave_load() calls >>>> ema_enclave_load() >>>> selinux_enclave_load() >>>> otherlsm_enclave_load() >>>> >>>> Why is this better than: >>>> >>>> sgx_something_in() calls >>>> ema_enclave_load() >>>> security_enclave_load() calls >>>> selinux_enclave_load() >>>> otherlsm_enclave_load() >>> >>> Are you talking about moving EMA somewhere outside LSM? >> >> Yes. That's what I've been saying all along. >> >>> If so, where? >> >> I tried to make it obvious. Put the call to your EMA code >> on the line before you call security_enclave_load(). > > Sorry but I'm still confused. > > EMA code is used by LSMs only. Making it callable from other parts of the kernel IMHO is probably not a good idea. And more importantly I don't understand the motivation behind it. Would you please elaborate? LSM modules implement additional access control restrictions. The EMA code does not do that, it provides management of data that is used by security modules. It is not one itself. VFS also performs this role, but no one would consider making VFS a security module. >>>>> +/** >>>>> + * ema - Enclave Memory Area structure for LSM modules >>>> >>>> LSM modules is redundant. "LSM" or "LSMs" would be better. 
>>> >>> Noted >>> >>>>> diff --git a/security/Makefile b/security/Makefile >>>>> index c598b904938f..b66d03a94853 100644 >>>>> --- a/security/Makefile >>>>> +++ b/security/Makefile >>>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/ >>>>> obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/ >>>>> obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/ >>>>> obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o >>>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o >>>> >>>> The config option and the file name ought to match, >>>> or at least be closer. >>> >>> Just trying to match file names as "capability" uses commoncap.c. >> >> Fine, then you should be using CONFIG_SECURITY_EMA. >> >>> >>> Like I said, this feature could potentially be used by TEEs other than SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I can rename it to ema.c or enclave.c. Do you have a preference? >> >> Make >> CONFIG_SECURITY_EMA >> depends on CONFIG_INTEL_SGX >> >> When another TEE (maybe MIPS_SSRPQ) comes along you can have >> >> CONFIG_SECURITY_EMA >> depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ > > Your suggestions are reasonable. Given such config change wouldn't affect any code, can we do it later, That doesn't make the current options any less confusing, and it will be easier to make the change now than at some point in the future. > e.g., when additional TEEs come online and make use of these new hooks? After all, security_enclave_init() will need amendment anyway as one of its current parameters is of type 'struct sgx_sigstruct', which will need to be replaced with something more generic. At the time being, I'd like to keep things intuitive so as not to confuse reviewers. Reviewers (including me) are already confused by the inconsistency. > >>> >>>>> diff --git a/security/commonema.c b/security/commonema.c >>>> >>>> Put this in a subdirectory. Please. >>> >>> Then why is commoncap.c located in this directory? I'm just trying to match the existing convention. 
>> >> commoncap is not optional. It is a base part of the >> security subsystem. ema is optional. > > Alright. I'd move it into a sub-folder and rename it to ema.c. Would you be ok with that? Sounds fine.
On 7/9/2019 5:10 PM, Casey Schaufler wrote:
> On 7/9/2019 3:13 PM, Xing, Cedric wrote:
>> On 7/8/2019 4:53 PM, Casey Schaufler wrote:
>>> On 7/8/2019 10:16 AM, Xing, Cedric wrote:
>>>> On 7/8/2019 9:26 AM, Casey Schaufler wrote:
>>>>> In this scheme you use an ema LSM to manage your ema data.
>>>>> A quick sketch looks like:
>>>>>
>>>>>     sgx_something_in() calls
>>>>>         security_enclave_load() calls
>>>>>             ema_enclave_load()
>>>>>             selinux_enclave_load()
>>>>>             otherlsm_enclave_load()
>>>>>
>>>>> Why is this better than:
>>>>>
>>>>>     sgx_something_in() calls
>>>>>         ema_enclave_load()
>>>>>         security_enclave_load() calls
>>>>>             selinux_enclave_load()
>>>>>             otherlsm_enclave_load()
>>>>
>>>> Are you talking about moving EMA somewhere outside LSM?
>>>
>>> Yes. That's what I've been saying all along.
>>>
>>>> If so, where?
>>>
>>> I tried to make it obvious. Put the call to your EMA code
>>> on the line before you call security_enclave_load().
>>
>> Sorry but I'm still confused.
>>
>> EMA code is used by LSMs only. Making it callable from other parts of the kernel IMHO is probably not a good idea. And more importantly I don't understand the motivation behind it. Would you please elaborate?
>
> LSM modules implement additional access control restrictions.
> The EMA code does not do that, it provides management of data
> that is used by security modules. It is not one itself. VFS
> also performs this role, but no one would consider making VFS
> a security module.

You are right. EMA is more like a helper library than a real LSM. But the
practical problem is, it has a piece of initialization code, to basically
request some space in the file blob from the LSM infrastructure. That
cannot be done by any LSMs at runtime. So it has to either be done in LSM
infrastructure directly, or make itself an LSM to make its initialization
function invoked by LSM infrastructure automatically. You have objected to
the former, so I switched to the latter. Are you now objecting to the
latter as well? Then what are you suggesting, really?

VFS is a completely different story. It's the file system abstraction so
it has a natural place to live in the kernel, and its initialization
doesn't depend on the LSM infrastructure. EMA on the other hand, shall
belong to LSM because it is both produced and consumed within LSM.

And, Stephen, do you have an opinion on this?

>>>>>> +/**
>>>>>> + * ema - Enclave Memory Area structure for LSM modules
>>>>>
>>>>> LSM modules is redundant. "LSM" or "LSMs" would be better.
>>>>
>>>> Noted
>>>>
>>>>>> diff --git a/security/Makefile b/security/Makefile
>>>>>> index c598b904938f..b66d03a94853 100644
>>>>>> --- a/security/Makefile
>>>>>> +++ b/security/Makefile
>>>>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
>>>>>> obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
>>>>>> obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
>>>>>> obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
>>>>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o
>>>>>
>>>>> The config option and the file name ought to match,
>>>>> or at least be closer.
>>>>
>>>> Just trying to match file names as "capability" uses commoncap.c.
>>>
>>> Fine, then you should be using CONFIG_SECURITY_EMA.
>>>
>>>>
>>>> Like I said, this feature could potentially be used by TEEs other than SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I can rename it to ema.c or enclave.c. Do you have a preference?
>>>
>>> Make
>>>     CONFIG_SECURITY_EMA
>>>     depends on CONFIG_INTEL_SGX
>>>
>>> When another TEE (maybe MIPS_SSRPQ) comes along you can have
>>>
>>>     CONFIG_SECURITY_EMA
>>>     depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ
>>
>> Your suggestions are reasonable. Given such config change wouldn't affect any code, can we do it later,
>
> That doesn't make the current options any less confusing,
> and it will be easier to make the change now than at some
> point in the future.
>
>> e.g., when additional TEEs come online and make use of these new hooks? After all, security_enclave_init() will need amendment anyway as one of its current parameters is of type 'struct sgx_sigstruct', which will need to be replaced with something more generic. At the time being, I'd like to keep things intuitive so as not to confuse reviewers.
>
> Reviewers (including me) are already confused by the inconsistency.

OK. Let me make this change.

>>>>
>>>>>> diff --git a/security/commonema.c b/security/commonema.c
>>>>>
>>>>> Put this in a subdirectory. Please.
>>>>
>>>> Then why is commoncap.c located in this directory? I'm just trying to match the existing convention.
>>>
>>> commoncap is not optional. It is a base part of the
>>> security subsystem. ema is optional.
>>
>> Alright. I'd move it into a sub-folder and rename it to ema.c. Would you be ok with that?
>
> Sounds fine.

This is another part that confuses me. Per your comment here, I think you
are OK with EMA being part of LSM (I mean, living somewhere under
security/). But your other comment of calling ema_enclave_load() alongside
security_enclave_load() made me think EMA and LSM were separate. What do
you want really?
On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
Good evening to everyone.
> That being said, we can do so without functional changes to the SGX
> uapi, e.g. add reserved fields so that the initial uapi can be
> extended *if* we decide to go with the "userspace provides maximal
> protections" path, and use the EPCM permissions as the maximal
> protections for the initial upstreaming.
>
> That'd give us a minimal implementation for initial upstreaming and
> would eliminate Cedric's blocking complaint. The "whole mess" of
> whitelisting, blacklisting and SGX2 support would be deferred until
> post-upstreaming.
Are we convinced the 'mess' will be any easier to clean up after the
driver is upstreamed?
The primary problem is that we haven't addressed the issue of what
this technology is designed to do and its implications with respect to
the kernel. As a result we are attempting to implement controls which
we are comfortable with and understand rather than those that are
relevant.
Have a good evening.
Dr. Greg
As always,
Dr. Greg Wettstein, Ph.D, Worker
IDfusion, LLC Implementing SGX secured and modeled
4206 N. 19th Ave. intelligent network endpoints.
Fargo, ND 58102
PH: 701-281-1686 EMAIL: greg@idfusion.net
------------------------------------------------------------------------------
"Courage is not the absence of fear, but rather the judgement that
something else is more important than fear."
-- Ambrose Redmoon
On 7/9/2019 6:28 PM, Dr. Greg wrote:
> On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
>
> Good evening to everyone.
>
>> That being said, we can do so without functional changes to the SGX
>> uapi, e.g. add reserved fields so that the initial uapi can be
>> extended *if* we decide to go with the "userspace provides maximal
>> protections" path, and use the EPCM permissions as the maximal
>> protections for the initial upstreaming.
>>
>> That'd give us a minimal implementation for initial upstreaming and
>> would eliminate Cedric's blocking complaint. The "whole mess" of
>> whitelisting, blacklisting and SGX2 support would be deferred until
>> post-upstreaming.
>
> Are we convinced the 'mess' will be any easier to clean up after the
> driver is upstreamed?
>
> The primary problem is that we haven't addressed the issue of what
> this technology is designed to do and its implications with respect to
> the kernel. As a result we are attempting to implement controls which
> we are comfortable with and understand rather than those that are
> relevant.

I don't think it's about easier or harder to clean up the mess, but a
divide-and-conquer strategy. After all, SGX and LSM are kind of orthogonal
as long as SGX doesn't compromise the protection provided by LSM.

Let's step back and look at what started this lengthy discussion. The
primary problem of v20 was that the SGX module allows executable enclave
pages to be created from non-executable regular pages, which could be
exploited by adversaries to grant executable permissions to pages that
would otherwise be denied without SGX. And that could be fixed simply by
capping EPCM permissions to whatever is allowed on the source page,
without touching LSM. Of course the drawback is loss of functionality -
i.e. self-modifying enclaves cannot be loaded unless the calling process
has EXECMEM. But that should suffice, as most SGX1 applications don't
contain self-modifying code anyway.

Then we could switch our focus back to LSM and work out what's relevant,
especially for SGX2 and beyond.

> Have a good evening.
>
> Dr. Greg
>
> As always,
> Dr. Greg Wettstein, Ph.D, Worker
> IDfusion, LLC Implementing SGX secured and modeled
> 4206 N. 19th Ave. intelligent network endpoints.
> Fargo, ND 58102
> PH: 701-281-1686 EMAIL: greg@idfusion.net
> ------------------------------------------------------------------------------
> "Courage is not the absence of fear, but rather the judgement that
> something else is more important than fear."
> -- Ambrose Redmoon
>
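A minimal sketch of the "cap EPCM permissions to the source page" idea from the message above. The helper name is hypothetical, and it assumes the cap is applied by silently masking the request; the thread does not say whether the driver would mask or reject an over-permissive request.

```c
#include <assert.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/*
 * Hypothetical helper, not taken from any posted patch: limit the
 * protections requested for an enclave page to those of the source
 * page (e.g. the source vma's protections).
 */
static int demo_cap_epcm_prot(int requested, int source_prot)
{
	int capped = requested & source_prot;

	/* The result can never exceed what the source page allows. */
	assert((capped & ~source_prot) == 0);
	return capped;
}
```

Under this assumption, a request for R+X on a page copied from an R+W source yields R only, so an enclave page cannot become executable unless its source already was.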
On 2019-07-08 10:29, Sean Christopherson wrote:
> On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
>> On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
>>
>> I still don't get why we need this whole mess and do not simply admit
>> that there are two distinct roles:
>>
>> 1. Creator
>> 2. User
>
> Because SELinux has existing concepts of EXECMEM and EXECMOD.
>
>> In the SELinux context Creator needs FILE__WRITE and FILE__EXECUTE but
>> User does not. It just gets the fd from the Creator. I'm sure that all
>> the SGX2 related functionality can be solved somehow in this role
>> playing game.
>>
>> An example would be the usual case where enclave is actually a loader
>> that loads the actual piece of software that one wants to run. Things
>> simply need to be designed in a way the Creator runs the loader part.
>> These are non-trivial problems but oddball security model is not going
>> to make them disappear - on the contrary it will make designing user
>> space only more complicated.
>>
>> I think this is classical example of when something overly complicated
>> is invented in the kernel only to realize that it should be solved in
>> the user space.
>>
>> It would not be like the only use case where some kind of privileged
>> daemon is used for managing a kernel provided resource.
>>
>> I think a really good conclusion from this discussion that has taken two
>> months is to realize that nothing needs to be done in this area (except
>> *maybe* noexec check).
>
> Hmm, IMO we need to support at least equivalents to EXECMEM and EXECMOD.
>
> That being said, we can do so without functional changes to the SGX uapi,
> e.g. add reserved fields so that the initial uapi can be extended *if* we
> decide to go with the "userspace provides maximal protections" path, and
> use the EPCM permissions as the maximal protections for the initial
> upstreaming.

Why do you need to add reserved fields now? Isn't this what incorporating
the struct size in the ioctl number is for?

--
Jethro Beekman | Fortanix
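To illustrate the point about the struct size being part of the ioctl number: the sketch below uses the standard _IOW() and _IOC_SIZE() macros from <linux/ioctl.h>. The two structs, the magic value and the command names are made up for the example and are not the SGX uapi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/ioctl.h>	/* _IOW(), _IOC_SIZE(), _IOC_NR() */

/* Hypothetical "v1" argument layout, for illustration only. */
struct demo_add_page_v1 {
	uint64_t addr;
	uint64_t src;
};

/* Hypothetical "v2": same leading fields plus one new trailing field. */
struct demo_add_page_v2 {
	uint64_t addr;
	uint64_t src;
	uint64_t prot;
};

static_assert(sizeof(struct demo_add_page_v2) > sizeof(struct demo_add_page_v1),
	      "v2 must extend v1");

#define DEMO_MAGIC		0xA4	/* arbitrary value for the example */
#define DEMO_IOC_ADD_PAGE_V1	_IOW(DEMO_MAGIC, 0x01, struct demo_add_page_v1)
#define DEMO_IOC_ADD_PAGE_V2	_IOW(DEMO_MAGIC, 0x01, struct demo_add_page_v2)

/* Argument size that a handler can recover from the command number. */
static size_t demo_arg_size(unsigned long cmd)
{
	return _IOC_SIZE(cmd);
}
```

Both commands share the same magic and sequence number, yet they are distinct values because sizeof() of the argument struct is encoded in the command. A handler can therefore tell an extended struct from the original one without reserved fields, provided it is written to dispatch on the size.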
On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
> selinux_enclave_init() determines if an enclave is allowed to launch, using the
> criteria described earlier. This implementation does NOT accept SIGSTRUCT in
> anonymous memory. The backing file is also cached in struct
> file_security_struct and will serve as the base for decisions for anonymous
> pages.

Did we ever reach a consensus on whether sigstruct must reside in a file?

> +	/* Store SIGSTRUCT file for future use */
> +	if (atomic_long_cmpxchg(&fsec->encl_ss, 0, (long)src->vm_file))
> +		return -EEXIST;
> +
> +	get_file(src->vm_file);

My understanding is that Andy is strongly against pinning a file for the
duration of the enclave, has that changed?
On 2019-07-10 08:49, Sean Christopherson wrote:
> On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
>> selinux_enclave_init() determines if an enclave is allowed to launch, using the
>> criteria described earlier. This implementation does NOT accept SIGSTRUCT in
>> anonymous memory. The backing file is also cached in struct
>> file_security_struct and will serve as the base for decisions for anonymous
>> pages.
>
> Did we ever reach a consensus on whether sigstruct must reside in a file?

This would be inconvenient for me, but I guess I can create a memfd?

--
Jethro Beekman | Fortanix
On Tue, Jul 09, 2019 at 04:11:08PM -0700, Xing, Cedric wrote:
> On 7/9/2019 3:25 PM, Sean Christopherson wrote:
> >On Tue, Jul 09, 2019 at 01:41:28PM -0700, Xing, Cedric wrote:
> >>On 7/9/2019 10:09 AM, Sean Christopherson wrote:
> >>>Translating those to SGX, with a lot of input from Stephen, I ended up
> >>>with the following:
> >>>
> >>> - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
> >>> on an enclave page loaded from a regular file
> >>>
> >>> - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
> >>> required to gain W->X on an enclave page
> >>
> >>EXECMOD basically indicates a file containing self-modifying code. Your
> >>ENCLAVE_EXECDIRTY is however a process permission, which is illogical.
> >
> >How is it illogical? If a PROCESS wants to EXECute a DIRTY ENCLAVE page,
> >then it needs PROCESS2__ENCLAVE_EXECDIRTY
> Just think of the purpose of FILE__EXECMOD. It indicates to LSM the file has
> self-modifying code, hence W->X transition should be considered "normal" and
> allowed, regardless which process that file is loaded into.
>
> The same thing for enclaves here. Whether an enclave contains self-modifying
> code is specific to that enclave, regardless which process it is loaded
> into.
>
> But what you are doing is quite the opposite, and that's what I mean by
> "illogical".
Ah. My intent was to minimize the number of new labels, and because W->X
scenarios are not guaranteed to be backed by a file, I went with a per
process permission. Ditto for EXECANON. I'm not opposed to also having a
per file permission that can be used when possible.
Something like this? And maybe merge EXECANON and EXECDIRTY into a single
permission?
Depending on whether sigstruct is required to be in a pinned file, EAUG
pages would need either EXECDIRTY or EXECMOD.
static int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot,
				bool measured)
{
	const struct cred *cred = current_cred();
	u32 sid = cred_sid(cred);
	int ret;

	/* Currently supported only in noexec kernels. */
	WARN_ON_ONCE(!default_noexec);

	/* Only executable enclave pages are restricted in any way. */
	if (!(prot & PROT_EXEC))
		return 0;

	if (!measured) {
		ret = enclave_has_perm(sid, PROCESS2__ENCLAVE_EXECUNMR);
		if (ret)
			goto out;
	}

	if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file))) {
		ret = enclave_has_perm(sid, PROCESS2__ENCLAVE_EXECANON);
		if (ret)
			goto out;

		/* Ability to do W->X within the enclave. */
		if (prot & PROT_WRITE)
			ret = enclave_has_perm(sid,
					       PROCESS2__ENCLAVE_EXECDIRTY);
	} else {
		ret = file_has_perm(cred, vma->vm_file, FILE__ENCLAVE_EXECUTE);
		if (ret)
			goto out;

		/*
		 * Load code from a modified private mapping, or from any file
		 * mapping with the ability to do W->X within the enclave.
		 */
		if (vma->anon_vma || (prot & PROT_WRITE))
			ret = file_has_perm(cred, vma->vm_file,
					    FILE__ENCLAVE_EXECMOD);
	}

out:
	return ret;
}
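For readers who want to poke at the decision table above outside the kernel, here is a userspace model of the same branches. It is only a sketch: the enum, struct and function names are hypothetical, and it returns the set of permissions that would be consulted if every check passes, whereas the hook above stops at the first denial.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_WRITE, PROT_EXEC */

/* Hypothetical stand-ins for the SELinux permissions named above. */
enum demo_perm {
	DEMO_PROC_EXECUNMR  = 1 << 0,	/* PROCESS2__ENCLAVE_EXECUNMR */
	DEMO_PROC_EXECANON  = 1 << 1,	/* PROCESS2__ENCLAVE_EXECANON */
	DEMO_PROC_EXECDIRTY = 1 << 2,	/* PROCESS2__ENCLAVE_EXECDIRTY */
	DEMO_FILE_EXECUTE   = 1 << 3,	/* FILE__ENCLAVE_EXECUTE */
	DEMO_FILE_EXECMOD   = 1 << 4,	/* FILE__ENCLAVE_EXECMOD */
};

/* Hypothetical summary of the source vma. */
struct demo_src {
	bool file_backed;	/* vm_file set and the inode is not IS_PRIVATE */
	bool has_anon_vma;	/* vma->anon_vma set, i.e. some COW was done */
};

/* Permissions consulted for a page with the given maximal protections. */
static unsigned int demo_required_perms(const struct demo_src *src, int prot,
					bool measured)
{
	unsigned int need = 0;

	assert(src != NULL);

	/* Only executable enclave pages are restricted in any way. */
	if (!(prot & PROT_EXEC))
		return 0;

	if (!measured)
		need |= DEMO_PROC_EXECUNMR;

	if (!src->file_backed) {
		need |= DEMO_PROC_EXECANON;
		if (prot & PROT_WRITE)
			need |= DEMO_PROC_EXECDIRTY;
	} else {
		need |= DEMO_FILE_EXECUTE;
		if (src->has_anon_vma || (prot & PROT_WRITE))
			need |= DEMO_FILE_EXECMOD;
	}
	return need;
}
```

For example, a measured R+X page from a clean file mapping needs only the file execute permission, while an unmeasured RWX page from anonymous memory needs all three process permissions.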
On 7/10/2019 8:49 AM, Sean Christopherson wrote:
> On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
>> selinux_enclave_init() determines if an enclave is allowed to launch, using the
>> criteria described earlier. This implementation does NOT accept SIGSTRUCT in
>> anonymous memory. The backing file is also cached in struct
>> file_security_struct and will serve as the base for decisions for anonymous
>> pages.
>
> Did we ever reach a consensus on whether sigstruct must reside in a file?

No. We reached the opposite agreement of *not* requiring sigstruct to
reside in a file at the interface level - i.e., security_enclave_init()
takes a VMA but *not* a file struct as input. At the implementation level,
an LSM may require sigstruct to reside in a file. But that's a per-LSM
decision.

>> +	/* Store SIGSTRUCT file for future use */
>> +	if (atomic_long_cmpxchg(&fsec->encl_ss, 0, (long)src->vm_file))
>> +		return -EEXIST;
>> +
>> +	get_file(src->vm_file);
>
> My understanding is that Andy is strongly against pinning a file for the
> duration of the enclave, has that changed?

I think everyone including Andy prefers not to pin any files. But it's a
trade-off among code simplicity, auditing accuracy and memory consumption.
I think the latest suggestion from Stephen was to keep files open, for
SELinux. Again, that's a per-LSM decision.
On 7/10/2019 9:08 AM, Jethro Beekman wrote:
> On 2019-07-10 08:49, Sean Christopherson wrote:
>> On Sun, Jul 07, 2019 at 04:41:34PM -0700, Cedric Xing wrote:
>>> selinux_enclave_init() determines if an enclave is allowed to launch,
>>> using the criteria described earlier. This implementation does NOT
>>> accept SIGSTRUCT in anonymous memory. The backing file is also cached
>>> in struct file_security_struct and will serve as the base for
>>> decisions for anonymous pages.
>>
>> Did we ever reach a consensus on whether sigstruct must reside in a file?
>
> This would be inconvenient for me, but I guess I can create a memfd?

No, sigstruct doesn't have to reside in a file. But the current direction
is, in SELinux, what the enclave can do depends on permissions given to
the file containing sigstruct. That said, if SELinux is in effect,
sigstruct has to reside in a real file with FILE__EXECUTE permission for
the enclave to launch. memfd wouldn't work. To some extent, that serves
the purpose of whitelisting.

> --
> Jethro Beekman | Fortanix
>
On Tue, Jul 09, 2019 at 10:09:17AM -0700, Sean Christopherson wrote:
> On Tue, Jul 09, 2019 at 07:22:03PM +0300, Jarkko Sakkinen wrote:
> > On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
> > > On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> > > > On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> > > >
> > > > I still don't get why we need this whole mess and do not simply admit
> > > > that there are two distinct roles:
> > > >
> > > > 1. Creator
> > > > 2. User
> > >
> > > Because SELinux has existing concepts of EXECMEM and EXECMOD.
> >
> > What is the official documentation for those? I've only found some
> > explanations from discussions and some RHEL sysadmin guides.
>
> No clue. My knowledge was gleaned from the code and from Stephen's
> feedback.
OK, thanks for elaboration. Got some details nailed down that I was missing :-)
Anyway, to accompany your code changes I'm eager to document this not
least because it is a good peer test that this all makes sense (you
cannot "unit test" a security model so that is the next best thing).
Still, we need a documentation reference to reflect the narrative
for these changes, seriously. It cannot be that SELinux is widely
deployed and it completely lacks documentation for its basic
objects, can it?
/Jarkko
On Wed, Jul 10, 2019 at 11:19:30PM +0300, Jarkko Sakkinen wrote:
> On Tue, Jul 09, 2019 at 10:09:17AM -0700, Sean Christopherson wrote:
> > On Tue, Jul 09, 2019 at 07:22:03PM +0300, Jarkko Sakkinen wrote:
> > > On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
> > > > On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> > > > > On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> > > > >
> > > > > I still don't get why we need this whole mess and do not simply admit
> > > > > that there are two distinct roles:
> > > > >
> > > > > 1. Creator
> > > > > 2. User
> > > >
> > > > Because SELinux has existing concepts of EXECMEM and EXECMOD.
> > >
> > > What is the official documentation for those? I've only found some
> > > explanations from discussions and some RHEL sysadmin guides.
> >
> > No clue. My knowledge was gleaned from the code and from Stephen's
> > feedback.
>
> OK, thanks for elaboration. Got some details nailed down that I was missing :-)
>
> Anyway, to accompany your code changes I'm eager to document this not
> least because it is a good peer test that this all makes sense (you
> cannot "unit test" a security model so that is the next best thing).
>
> Still, we need a documentation reference to reflect the narrative
> for these changes, seriously. It cannot be that SELinux is widely
> deployed and it completely lacks documentation for its basic
> objects, can it?
The vast majority of documentation I've found is on *using* SELinux,
e.g. writing policies and whatnot. I haven't found anything on its
internal details, although I admittedly haven't looked all that hard.
On 7/9/2019 5:55 PM, Xing, Cedric wrote:
> On 7/9/2019 5:10 PM, Casey Schaufler wrote:
>> On 7/9/2019 3:13 PM, Xing, Cedric wrote:
>>> On 7/8/2019 4:53 PM, Casey Schaufler wrote:
>>>> On 7/8/2019 10:16 AM, Xing, Cedric wrote:
>>>>> On 7/8/2019 9:26 AM, Casey Schaufler wrote:
>>>>>> In this scheme you use an ema LSM to manage your ema data.
>>>>>> A quick sketch looks like:
>>>>>>
>>>>>>     sgx_something_in() calls
>>>>>>         security_enclave_load() calls
>>>>>>             ema_enclave_load()
>>>>>>             selinux_enclave_load()
>>>>>>             otherlsm_enclave_load()
>>>>>>
>>>>>> Why is this better than:
>>>>>>
>>>>>>     sgx_something_in() calls
>>>>>>         ema_enclave_load()
>>>>>>         security_enclave_load() calls
>>>>>>             selinux_enclave_load()
>>>>>>             otherlsm_enclave_load()
>>>>>
>>>>> Are you talking about moving EMA somewhere outside LSM?
>>>>
>>>> Yes. That's what I've been saying all along.
>>>>
>>>>> If so, where?
>>>>
>>>> I tried to make it obvious. Put the call to your EMA code
>>>> on the line before you call security_enclave_load().
>>>
>>> Sorry but I'm still confused.
>>>
>>> EMA code is used by LSMs only. Making it callable from other parts of the kernel IMHO is probably not a good idea. And more importantly I don't understand the motivation behind it. Would you please elaborate?
>>
>> LSM modules implement additional access control restrictions.
>> The EMA code does not do that, it provides management of data
>> that is used by security modules. It is not one itself. VFS
>> also performs this role, but no one would consider making VFS
>> a security module.
>
> You are right.

So far, so good ...

> EMA is more like a helper library than a real LSM.

Then you should use it as such.

> But the practical problem is, it has a piece of initialization code, to basically request some space in the file blob from the LSM infrastructure.

The security modules that want to use EMA should request that space.

> That cannot be done by any LSMs at runtime.

Sure it can. And it has to. What if you don't have any security modules
that use the EMA data? Surely you don't want to be allocating blob space
for EMA data if no one is going to use it.

> So it has to either be done in LSM infrastructure directly, or make itself an LSM to make its initialization function invoked by LSM infrastructure automatically.

That is not true. The security module that wants to use the EMA data can
call whatever allocation function you use. Or, the call can be made from
the code just before you call the security hook, which would be identical
to calling it as a "first" hook.

> You have objected to the former, so I switched to the latter. Are you now objecting to the latter as well? Then what are you suggesting, really?

Call your allocation function just before you call security_enclave_load().
There is no way that selinux_enclave_load() could tell the difference.

> VFS is a completely different story. It's the file system abstraction so it has a natural place to live in the kernel, and its initialization doesn't depend on the LSM infrastructure. EMA on the other hand, shall belong to LSM because it is both produced and consumed within LSM.

And this is the enclave abstraction, or rather, should be according to at
least half the people joining in on the thread. It does not belong in the
LSM infrastructure because it is its own thing, with its own state and
data, which it needs to maintain in its own way and place. It needs
interfaces so that security modules can use that information
appropriately. It needs a hook or two so that the enclave abstraction can
ask the security modules to make decisions.

>
> And, Stephen, do you have an opinion on this?
>
>>>>>>> +/**
>>>>>>> + * ema - Enclave Memory Area structure for LSM modules
>>>>>>
>>>>>> LSM modules is redundant. "LSM" or "LSMs" would be better.
>>>>>
>>>>> Noted
>>>>>
>>>>>>> diff --git a/security/Makefile b/security/Makefile
>>>>>>> index c598b904938f..b66d03a94853 100644
>>>>>>> --- a/security/Makefile
>>>>>>> +++ b/security/Makefile
>>>>>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
>>>>>>> obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
>>>>>>> obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
>>>>>>> obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
>>>>>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o
>>>>>>
>>>>>> The config option and the file name ought to match,
>>>>>> or at least be closer.
>>>>>
>>>>> Just trying to match file names as "capability" uses commoncap.c.
>>>>
>>>> Fine, then you should be using CONFIG_SECURITY_EMA.
>>>>
>>>>>
>>>>> Like I said, this feature could potentially be used by TEEs other than SGX. For now, SGX is the only user so it is tied to CONFIG_INTEL_SGX. I can rename it to ema.c or enclave.c. Do you have a preference?
>>>>
>>>> Make
>>>>     CONFIG_SECURITY_EMA
>>>>     depends on CONFIG_INTEL_SGX
>>>>
>>>> When another TEE (maybe MIPS_SSRPQ) comes along you can have
>>>>
>>>>     CONFIG_SECURITY_EMA
>>>>     depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ
>>>
>>> Your suggestions are reasonable. Given such config change wouldn't affect any code, can we do it later,
>>
>> That doesn't make the current options any less confusing,
>> and it will be easier to make the change now than at some
>> point in the future.
>>
>>> e.g., when additional TEEs come online and make use of these new hooks? After all, security_enclave_init() will need amendment anyway as one of its current parameters is of type 'struct sgx_sigstruct', which will need to be replaced with something more generic. At the time being, I'd like to keep things intuitive so as not to confuse reviewers.
>>
>> Reviewers (including me) are already confused by the inconsistency.
>
> OK. Let me make this change.

Thank you.

>>>>>
>>>>>>> diff --git a/security/commonema.c b/security/commonema.c
>>>>>>
>>>>>> Put this in a subdirectory. Please.
>>>>>
>>>>> Then why is commoncap.c located in this directory? I'm just trying to match the existing convention.
>>>>
>>>> commoncap is not optional. It is a base part of the
>>>> security subsystem. ema is optional.
>>>
>>> Alright. I'd move it into a sub-folder and rename it to ema.c. Would you be ok with that?
>>
>> Sounds fine.
>
> This is another part that confuses me. Per your comment here, I think you are OK with EMA being part of LSM

Ah. Being in the security directory does not mean it's a part of the LSM
system. Keys and integrity are security subsystems that are related to,
but not part of, the LSM sub-system.

> (I mean, living somewhere under security/). But your other comment of calling ema_enclave_load() alongside security_enclave_load() made me think EMA and LSM were separate. What do you want really?

Please stop asking the same question over and over. You're not going to
get a different answer from what you've gotten already. Look back at
what's already been said.
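The two call orderings debated above can be modelled in a few lines of userspace C. Everything here is a hypothetical stand-in (there is no real LSM hook list, and the demo_* names do not exist in the kernel); the sketch only shows that a security module's hook observes the same EMA state whether the EMA update runs as a "first" hook or is called by SGX just before the hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-enclave state maintained by the EMA code. */
struct demo_enclave {
	bool ema_ready;
};

static void demo_ema_enclave_load(struct demo_enclave *e)
{
	assert(e != NULL);
	e->ema_ready = true;
}

/* Stand-in for a security module hook that consumes the EMA data. */
static int demo_selinux_enclave_load(const struct demo_enclave *e)
{
	return e->ema_ready ? 0 : -1;
}

/* Ordering 1: EMA runs as the first hook inside the LSM dispatcher. */
static int demo_sgx_load_as_first_hook(struct demo_enclave *e)
{
	/* models security_enclave_load() walking its hook list */
	demo_ema_enclave_load(e);
	return demo_selinux_enclave_load(e);
}

/* Ordering 2: the caller invokes EMA, then the LSM dispatcher. */
static int demo_sgx_load_before_hook(struct demo_enclave *e)
{
	demo_ema_enclave_load(e);
	/* models security_enclave_load() with only real LSMs hooked */
	return demo_selinux_enclave_load(e);
}
```

In this model both paths return the same result and leave the enclave in the same state, so the remaining difference between the two designs is where the EMA code lives and who owns its initialization, not what the hooks can observe.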
On Wed, Jul 10, 2019 at 11:19:30PM +0300, Jarkko Sakkinen wrote:
> Still, we need a documentation reference to reflect the narrative
> for these changes, seriously. It cannot be that SELinux is widely
> deployed and it completely lacks documentation for its basic
> objects, can it?

I found one good reference:

https://selinuxproject.org/page/ObjectClassesPerms

It describes EXECMOD as:

"Make executable a file mapping that has been modified by copy-on-write.
(Text relocation)"

This makes me wonder how EXECMOD even connects to this discussion? Enclave
is never a COW mapping.

Seems like there is a huge diff on how SELinux's official documentation
describes it and how it is described here...

/Jarkko
Just some questions on these.

On Tue, Jul 09, 2019 at 10:09:17AM -0700, Sean Christopherson wrote:
> - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
>                          on an enclave page loaded from a regular file

One thing that I have hard time to perceive is that whether the process
or the target object has them. So would this be in the files extended
attribute or does process need to possess this or both?

> - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
>                                required to gain W->X on an enclave page

Still puzzling with EXECMOD given that how it is documented in
https://selinuxproject.org/page/ObjectClassesPerms. If anything in that
document is out of date, would be nice if it was updated.

> - PROCESS2__ENCLAVE_EXECANON: subset of EXECMEM, required to gain X on
>                               an enclave page that is loaded from an
>                               anonymous mapping
>
> - PROCESS2__ENCLAVE_MAPWX: subset of EXECMEM, required to gain WX on an
>                            enclave page

I guess these three belong to the process and are not attached to file.

How in SELinux anyway process in the first place acquires any SELinux
permissions? I guess getty or whatever login process can set its perms
before setuid() et al somehow (I don't know how) because they run as
root?

/Jarkko
On 7/10/2019 3:16 PM, Jarkko Sakkinen wrote:
> Just some questions on these.
>
> On Tue, Jul 09, 2019 at 10:09:17AM -0700, Sean Christopherson wrote:
>> - FILE__ENCLAVE_EXECUTE: equivalent to FILE__EXECUTE, required to gain X
>>                          on an enclave page loaded from a regular file
>
> One thing that I have hard time to perceive is that whether the process
> or the target object has them. So would this be in the files extended
> attribute or does process need to possess this or both?

The target object.

>> - PROCESS2__ENCLAVE_EXECDIRTY: hybrid of EXECMOD and EXECUTE+WRITE,
>>                                required to gain W->X on an enclave page
>
> Still puzzling with EXECMOD given that how it is documented in
> https://selinuxproject.org/page/ObjectClassesPerms. If anything in that
> document is out of date, would be nice if it was updated.

If you search for "EXECMOD" in security/selinux/hooks.c in the latest
(Linux-5.2) master, you'll find only one occurrence - at line 3702.

The logic over there, if translated into English, basically says
FILE__EXECMOD is required (on the backing file) if mprotect() is called to
request X on a private file mapping that has been modified by the calling
process. That's what Sean meant by "W->X".

ENCLAVE_EXECDIRTY is similar to EXECMOD but because of his "maximal
protection" model, LSM couldn't distinguish between "W->X" and "X->W",
hence those two are collapsed into a single case - WX in "maximal
protection".

>> - PROCESS2__ENCLAVE_EXECANON: subset of EXECMEM, required to gain X on
>>                               an enclave page that is loaded from an
>>                               anonymous mapping
>>
>> - PROCESS2__ENCLAVE_MAPWX: subset of EXECMEM, required to gain WX on an
>>                            enclave page
>
> I guess these three belong to the process and are not attached to file.

Correct.

ENCLAVE_EXECANON basically means the calling process doesn't care what
permissions are given to enclave pages as the SIGSTRUCT alone is
considered sufficient validation. This has a security impact process wide
so shall be a process permission.

ENCLAVE_{EXECDIRTY|MAPWX} express enclave specific requirements/behaviors
and IMO shall be enclave permissions, probably manifested as file
permissions on the file containing SIGSTRUCT. Sean was taking a shortcut
to make them process scope in order to avoid keeping the SIGSTRUCT file
around, which was what I criticized as "illogical".

> How in SELinux anyway process in the first place acquires any SELinux
> permissions? I guess getty or whatever login process can set its perms
> before setuid() et al somehow (I don't know how) because they run as
> root?
>
> /Jarkko
>
On Wed, Jul 10, 2019 at 01:31:04PM -0700, Sean Christopherson wrote:
> On Wed, Jul 10, 2019 at 11:19:30PM +0300, Jarkko Sakkinen wrote:
> > On Tue, Jul 09, 2019 at 10:09:17AM -0700, Sean Christopherson wrote:
> > > On Tue, Jul 09, 2019 at 07:22:03PM +0300, Jarkko Sakkinen wrote:
> > > > On Mon, Jul 08, 2019 at 10:29:30AM -0700, Sean Christopherson wrote:
> > > > > On Fri, Jul 05, 2019 at 07:05:49PM +0300, Jarkko Sakkinen wrote:
> > > > > > On Wed, Jun 19, 2019 at 03:23:49PM -0700, Sean Christopherson wrote:
> > > > > >
> > > > > > I still don't get why we need this whole mess and do not simply admit
> > > > > > that there are two distinct roles:
> > > > > >
> > > > > > 1. Creator
> > > > > > 2. User
> > > > >
> > > > > Because SELinux has existing concepts of EXECMEM and EXECMOD.
> > > >
> > > > What is the official documentation for those? I've only found some
> > > > explanations from discussions and some RHEL sysadmin guides.
> > >
> > > No clue. My knowledge was gleaned from the code and from Stephen's
> > > feedback.
> >
> > OK, thanks for elaboration. Got some details nailed down that I was missing :-)
> >
> > Anyway, to accompany your code changes I'm eager to document this not
> > least because it is a good peer test that this all makes sense (you
> > cannot "unit test" a security model so that is the next best thing).
> >
> > Still, we need a documentation reference to reflect the narrative
> > for these changes, seriously. It cannot be that SELinux is widely
> > deployed and it completely lacks documentation for its basic
> > objects, can it?
>
> The vast majority of documentation I've found is on *using* SELinux,
> e.g. writing policies and whatnot. I haven't found anything on its
> internal details, although I admittedly haven't looked all that hard.

I think these are the best references.

Object classes:

https://selinuxproject.org/page/ObjectClassesPerms

Background/concepts:

https://selinuxproject.org/page/Category:Notebook#Notebook_Sections

/Jarkko
On Wed, Jul 10, 2019 at 04:16:42PM -0700, Xing, Cedric wrote:
> > Still puzzling with EXECMOD given that how it is documented in
> > https://selinuxproject.org/page/ObjectClassesPerms. If anything in that
> > document is out of date, would be nice if it was updated.
>
> If you search for "EXECMOD" in security/selinux/hooks.c in the latest
> (Linux-5.2) master, you'll find only one occurrence - at line 3702.
>
> The logic over there, if translated into English, basically says
> FILE__EXECMOD is required (on the backing file) if mprotect() is called to
> request X on a private file mapping that has been modified by the calling
> process. That's what Sean meant by "W->X".
Looking at that part of code, there is this comment:
/*
* We are making executable a file mapping that has
* had some COW done. Since pages might have been
* written, check ability to execute the possibly
* modified content. This typically should only
* occur for text relocations.
*/
There is no COW done with enclaves, never. Thus, EXECMOD does not
connect in any possible way to SGX. OR, that comment is false.
Which one is it?
Also the official documentation for SELinux speaks only about COW
mappings.
Also the condition supports all this as a *private* file mapping ends up
on the anon_vma list when it gets written. We have a *shared* file
mapping.
Nothing that you say makes sense to me, sorry...
/Jarkko
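A small userspace model of the EXECMOD condition being argued about, based only on how it is described in this thread (X requested via mprotect() on a file mapping that has had COW done). The struct and function names are hypothetical, and this is a simplification, not a copy of the code in security/selinux/hooks.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>	/* PROT_EXEC */

/* Hypothetical summary of a mapping, for illustration only. */
struct demo_mapping {
	bool file_backed;	/* models vma->vm_file != NULL */
	bool shared;		/* MAP_SHARED rather than MAP_PRIVATE */
	bool written;		/* at least one page has been written */
};

/*
 * Models "vma->anon_vma is set" for a file mapping: only a private
 * mapping gets private (COW) copies of pages when it is written.
 */
static bool demo_has_cow_pages(const struct demo_mapping *m)
{
	assert(m != NULL);
	return m->file_backed && !m->shared && m->written;
}

/* Would making this mapping executable consult FILE__EXECMOD? */
static bool demo_mprotect_needs_execmod(const struct demo_mapping *m, int prot)
{
	return (prot & PROT_EXEC) && demo_has_cow_pages(m);
}
```

In this model a written shared file mapping never triggers the check, which is the point made above: if enclave mappings are always shared, the stock EXECMOD condition as described cannot fire for them.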
On Mon, Jul 08, 2019 at 05:02:00PM -0700, Casey Schaufler wrote:
>
> On 7/7/2019 6:30 AM, Dr. Greg wrote:
> > All well taken points from an implementation perspective, but they
> > elide the point I was trying to make. Which is the fact that without
> > any semblance of a discussion regarding the requirements needed to
> > implement a security architecture around the concept of a TEE, this
> > entire process, despite Cedric's well intentioned efforts, amounts to
> > pounding a square solution into the round hole of a security problem.

> Lead with code. I love a good requirements document, but one of the
> few places where I agree with the agile folks is that working code
> speaks loudly.
>
> > Which, as I noted in my e-mail, is tantamount to security theater.
>
> Not buying that. Not rejecting it, either. Without code
> to judge it's kind of hard to say.

We tried the code approach.

By most accounts it seemed to go poorly, given that it ended up with
Jonathan Corbet writing an LWN feature article on the state of
dysfunction and chaos surrounding Linux SGX driver development.

So we are standing around and mumbling until we can figure out what
kind of code we need to write to make the new driver relevant to the
CISOs and security architects we need to defend SGX to.

Have a good week.

Dr. Greg

As always,
Dr. Greg Wettstein, Ph.D, Worker
IDfusion, LLC Implementing SGX secured and modeled
4206 N. 19th Ave. intelligent network endpoints.
Fargo, ND 58102
PH: 701-281-1686 EMAIL: greg@idfusion.net
------------------------------------------------------------------------------
"Five year projections, are you kidding me. We don't know what we are
supposed to be doing at the 4 o'clock meeting this afternoon."
-- Terry Wieland
Resurrection
On 7/9/19 8:55 PM, Xing, Cedric wrote:
> On 7/9/2019 5:10 PM, Casey Schaufler wrote:
>> On 7/9/2019 3:13 PM, Xing, Cedric wrote:
>>> On 7/8/2019 4:53 PM, Casey Schaufler wrote:
>>>> On 7/8/2019 10:16 AM, Xing, Cedric wrote:
>>>>> On 7/8/2019 9:26 AM, Casey Schaufler wrote:
>>>>>> In this scheme you use an ema LSM to manage your ema data.
>>>>>> A quick sketch looks like:
>>>>>>
>>>>>>     sgx_something_in() calls
>>>>>>         security_enclave_load() calls
>>>>>>             ema_enclave_load()
>>>>>>             selinux_enclave_load()
>>>>>>             otherlsm_enclave_load()
>>>>>>
>>>>>> Why is this better than:
>>>>>>
>>>>>>     sgx_something_in() calls
>>>>>>         ema_enclave_load()
>>>>>>         security_enclave_load() calls
>>>>>>             selinux_enclave_load()
>>>>>>             otherlsm_enclave_load()
>>>>>
>>>>> Are you talking about moving EMA somewhere outside LSM?
>>>>
>>>> Yes. That's what I've been saying all along.
>>>>
>>>>> If so, where?
>>>>
>>>> I tried to make it obvious. Put the call to your EMA code
>>>> on the line before you call security_enclave_load().
>>>
>>> Sorry but I'm still confused.
>>>
>>> EMA code is used by LSMs only. Making it callable from other parts of
>>> the kernel IMHO is probably not a good idea. And more importantly I
>>> don't understand the motivation behind it. Would you please elaborate?
>>
>> LSM modules implement additional access control restrictions.
>> The EMA code does not do that, it provides management of data
>> that is used by security modules. It is not one itself. VFS
>> also performs this role, but no one would consider making VFS
>> a security module.
>
> You are right. EMA is more like a helper library than a real LSM. But
> the practical problem is, it has a piece of initialization code, to
> basically request some space in the file blob from the LSM
> infrastructure. That cannot be done by any LSMs at runtime. So it has to
> either be done in LSM infrastructure directly, or make itself an LSM to
> make its initialization function invoked by LSM infrastructure
> automatically. You have objected to the former, so I switched to the
> latter. Are you now objecting to the latter as well? Then what are you
> suggesting, really?
>
> VFS is a completely different story. It's the file system abstraction so
> it has a natural place to live in the kernel, and its initialization
> doesn't depend on the LSM infrastructure. EMA on the other hand, shall
> belong to LSM because it is both produced and consumed within LSM.
>
> And, Stephen, do you have an opinion on this?

I don't really understand Casey's position. I also wouldn't necessarily
view his opinion on the matter as necessarily authoritative since he is
not the LSM maintainer as far as I know although he has contributed a
lot of changes in recent years.

I understood the architecture of the original EMA implementation (i.e. a
support library to be used by the security modules to help them in
implementing their own access control policies), although I do have some
concerns about the complexity and lifecycle issues, and wonder if a
simpler model as suggested by Dr. Greg isn't feasible.

I'd also feel better if there was clear consensus among all of the
@intel.com participants that this is the right approach. To date that
has seemed elusive.

If there were consensus that the EMA approach was the right one and that
a simpler model is not feasible, and if the only obstacle to adopting
EMA was Casey's objections, then I'd say just put it directly into
SELinux and be done with it. I originally thought that was a mistake
but if other security modules don't want the support, so be it.

>
>>>>>>> +/**
>>>>>>> + * ema - Enclave Memory Area structure for LSM modules
>>>>>>
>>>>>> LSM modules is redundant. "LSM" or "LSMs" would be better.
>>>>>
>>>>> Noted
>>>>>
>>>>>>> diff --git a/security/Makefile b/security/Makefile
>>>>>>> index c598b904938f..b66d03a94853 100644
>>>>>>> --- a/security/Makefile
>>>>>>> +++ b/security/Makefile
>>>>>>> @@ -28,6 +28,7 @@ obj-$(CONFIG_SECURITY_YAMA) += yama/
>>>>>>>  obj-$(CONFIG_SECURITY_LOADPIN) += loadpin/
>>>>>>>  obj-$(CONFIG_SECURITY_SAFESETID) += safesetid/
>>>>>>>  obj-$(CONFIG_CGROUP_DEVICE) += device_cgroup.o
>>>>>>> +obj-$(CONFIG_INTEL_SGX) += commonema.o
>>>>>>
>>>>>> The config option and the file name ought to match,
>>>>>> or at least be closer.
>>>>>
>>>>> Just trying to match file names as "capability" uses commoncap.c.
>>>>
>>>> Fine, then you should be using CONFIG_SECURITY_EMA.
>>>>
>>>>>
>>>>> Like I said, this feature could potentially be used by TEEs other
>>>>> than SGX. For now, SGX is the only user so it is tied to
>>>>> CONFIG_INTEL_SGX. I can rename it to ema.c or enclave.c. Do you
>>>>> have a preference?
>>>>
>>>> Make
>>>>     CONFIG_SECURITY_EMA
>>>>         depends on CONFIG_INTEL_SGX
>>>>
>>>> When another TEE (maybe MIPS_SSRPQ) comes along you can have
>>>>
>>>>     CONFIG_SECURITY_EMA
>>>>         depends on CONFIG_INTEL_SGX || CONFIG_MIPS_SSRPQ
>>>
>>> Your suggestions are reasonable. Given such config change wouldn't
>>> affect any code, can we do it later,
>>
>> That doesn't make the current options any less confusing,
>> and it will be easier to make the change now than at some
>> point in the future.
>>
>>> e.g., when additional TEEs come online and make use of these new
>>> hooks? After all, security_enclave_init() will need amendment anyway
>>> as one of its current parameters is of type 'struct sgx_sigstruct',
>>> which will need to be replaced with something more generic. At the
>>> time being, I'd like to keep things intuitive so as not to confuse
>>> reviewers.
>>
>> Reviewers (including me) are already confused by the inconsistency.
>
> OK. Let me make this change.
>
>>>>>
>>>>>>> diff --git a/security/commonema.c b/security/commonema.c
>>>>>>
>>>>>> Put this in a subdirectory. Please.
>>>>>
>>>>> Then why is commoncap.c located in this directory? I'm just trying
>>>>> to match the existing convention.
>>>>
>>>> commoncap is not optional. It is a base part of the
>>>> security subsystem. ema is optional.
>>>
>>> Alright. I'd move it into a sub-folder and rename it to ema.c. Would
>>> you be ok with that?
>>
>> Sounds fine.
>
> This is another part that confuses me. Per you comment here, I think you
> are OK with EMA being part of LSM (I mean, living somewhere under
> security/). But your other comment of calling ema_enclave_load()
> alongside security_enclave_load() made me think EMA and LSM were
> separate. What do you want really?
On 7/11/19 5:26 AM, Jarkko Sakkinen wrote:
> On Wed, Jul 10, 2019 at 04:16:42PM -0700, Xing, Cedric wrote:
>>> Still puzzling with EXECMOD given that how it is documented in
>>> https://selinuxproject.org/page/ObjectClassesPerms. If anything in that
>>> document is out of date, would be nice if it was updated.
>>
>> If you search for "EXECMOD" in security/selinux/hooks.c in the latest
>> (Linux-5.2) master, you'll find only one occurrence - at line 3702.
>>
>> The logic over there, if translated into English, basically says
>> FILE__EXECMOD is required (on the backing file) if mprotect() is called to
>> request X on a private file mapping that has been modified by the calling
>> process. That's what Sean meant by "W->X".
>
> Looking at that part of code, there is this comment:
>
> /*
> * We are making executable a file mapping that has
> * had some COW done. Since pages might have been
> * written, check ability to execute the possibly
> * modified content. This typically should only
> * occur for text relocations.
> */
>
> There is no COW done with enclaves, never. Thus, EXECMOD does not
> connect in any possible way to SGX. OR, that comment is false.
>
> Which one is it?
>
> Also the official documentation for SELinux speaks only about COW
> mappings.
>
> Also the condition supports all this as a *private* file mapping ends up
> to the anon_vma list when it gets written. We have a *shared* file
> mapping
>
> Nothing that you say makes sense to me, sorry...
The existing permissions don't map cleanly to SGX but I think Sean and
Cedric were trying to make a best-effort approximation to the underlying
concepts in a manner that permits control over the introduction of
executable content.
Sure, the existing EXECMOD check is only applied today when there is an
attempt to make executable a previously modified (detected based on COW
having occurred) private file mapping. But the general notion of
controlling the ability to execute modified content is still meaningful.
In the case of regular files, having both FILE__WRITE and FILE__EXECUTE
to the file is sufficient because that implies the ability to execute
modified content. And those FILE__* checks predated the introduction of
EXECMOD and EXECMEM.
The mapping of /dev/sgx/enclave doesn't really fit existing categories
because it doesn't provide the same semantics as a shared mapping of a
regular file. Userspace will always need both FILE__WRITE and
FILE__EXECUTE to /dev/sgx/enclave.
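
For readers following along, the existing EXECMOD condition described above can be modeled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct vma_model` and `needs_execmod()` are hypothetical names invented for illustration, not kernel symbols, and the model deliberately ignores the separate heap and stack cases that the real mprotect hook also handles.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the vma state the check inspects. */
struct vma_model {
	bool file_backed;	/* models vma->vm_file != NULL */
	bool cow_done;		/* models vma->anon_vma != NULL (a private page was written) */
	bool already_exec;	/* models VM_EXEC already being set on the vma */
};

/*
 * Returns true when an mprotect() request would trigger the FILE__EXECMOD
 * check in this model: PROT_EXEC is newly requested on a file mapping that
 * has already had copy-on-write done.
 */
bool needs_execmod(const struct vma_model *vma, bool want_exec)
{
	if (!want_exec || vma->already_exec)
		return false;

	return vma->file_backed && vma->cow_done;
}
```

In this model a clean private file mapping never needs EXECMOD, which is the crux of the mismatch with SGX: an enclave mapping is shared and never goes through COW, so the condition as written can never fire for it.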
On Thu, Jul 11, 2019 at 09:51:19AM -0400, Stephen Smalley wrote:
> I'd also feel better if there was clear consensus among all of the
> @intel.com participants that this is the right approach. To date that has
> seemed elusive.
That's a very kind way to phrase things :-)
For initial upstreaming, we've agreed that there is no need to extend the
uapi, i.e. we can punt on deciding between on-the-fly tracking and having
userspace specify maximal permissions until we add SGX2 support.
The last open (knock on wood) for initial upstreaming is whether SELinux
would prefer to have new enclave specific permissions or reuse the
existing PROCESS__EXECMEM, FILE__EXECUTE and FILE__EXECMOD permissions.
My understanding is that enclave specific permissions are preferred.
On 7/11/19 11:12 AM, Sean Christopherson wrote:
> On Thu, Jul 11, 2019 at 09:51:19AM -0400, Stephen Smalley wrote:
>> I'd also feel better if there was clear consensus among all of the
>> @intel.com participants that this is the right approach. To date that has
>> seemed elusive.
>
> That's a very kind way to phrase things :-)
>
> For initial upstreaming, we've agreed that there is no need to extend the
> uapi, i.e. we can punt on deciding between on-the-fly tracking and having
> userspace specify maximal permissions until we add SGX2 support.
>
> The last open (knock on wood) for initial upstreaming is whether SELinux
> would prefer to have new enclave specific permissions or reuse the
> existing PROCESS__EXECMEM, FILE__EXECUTE and FILE__EXECMOD permissions.
> My understanding is that enclave specific permissions are preferred.
I was left unclear on this topic after the email exchanges with Cedric.
There are at least three options:
1) Reuse the existing EXECMEM, EXECUTE, and EXECMOD permissions. Pros:
Existing distro policies will be applied in the expected manner with
respect to the introduction of executable code into the system,
consistent control will be provided over the enclave and the host
process, no change for users/documentation wrt policy. Cons: Existing
permissions don't map exactly to SGX semantics, no ability to
distinguish executable content within the enclave versus the host
process at the LSM level (argued earlier by Cedric to be unnecessary and
perhaps meaningless), need to allow FILE__EXECUTE or other checks on
sigstruct files that may not actually contain code.
2) Define new permissions within existing security classes (e.g.
process2, file). Pros: Can tailor permission names and definitions to
SGX semantics, ability to distinguish enclave versus host process
execute access, no need to grant FILE__EXECUTE to sigstruct files, class
matches the target object, permissions computed and cached upon existing
checks (i.e. when a process accesses a file, all of the permissions to
that file are computed and then cached at once, including the
enclave-related ones). Cons: Typical distro policies (unlike Android)
allow unknown permissions by default for forward kernel compatibility
reasons, so existing policies will permit these new permissions by
default and enforcement will only truly take effect once policies are
updated, adding new permissions to existing classes requires an update
to the base policy (so they can't be shipped as a third party policy
module alongside the SGX driver or installed as a local module by an
admin, for example), documentation/user education required for new
permissions.
3) Define new permissions in new security classes (e.g. enclave). Pros
relative to #2: New classes and permissions can be defined and installed
in third party or local policy module without requiring a change to the
base policy. Cons relative to #2: Class won't correspond to the target
object, permissions won't be computed and cached upon existing checks
(only when performing the checks against the new classes).
Combinations are also possible, of course.
On Thu, Jul 11, 2019 at 12:11:06PM -0400, Stephen Smalley wrote:
> On 7/11/19 11:12 AM, Sean Christopherson wrote:
> >On Thu, Jul 11, 2019 at 09:51:19AM -0400, Stephen Smalley wrote:
> >>I'd also feel better if there was clear consensus among all of the
> >>@intel.com participants that this is the right approach. To date that has
> >>seemed elusive.
> >
> >That's a very kind way to phrase things :-)
> >
> >For initial upstreaming, we've agreed that there is no need to extend the
> >uapi, i.e. we can punt on deciding between on-the-fly tracking and having
> >userspace specify maximal permissions until we add SGX2 support.
> >
> >The last open (knock on wood) for initial upstreaming is whether SELinux
> >would prefer to have new enclave specific permissions or reuse the
> >existing PROCESS__EXECMEM, FILE__EXECUTE and FILE__EXECMOD permissions.
> >My understanding is that enclave specific permissions are preferred.
>
> I was left unclear on this topic after the email exchanges with Cedric.
> There are at least three options:
>
> 1) Reuse the existing EXECMEM, EXECUTE, and EXECMOD permissions. Pros:
> Existing distro policies will be applied in the expected manner with respect
> to the introduction of executable code into the system, consistent control
> will be provided over the enclave and the host process, no change for
> users/documentation wrt policy. Cons: Existing permissions don't map
> exactly to SGX semantics, no ability to distinguish executable content
> within the enclave versus the host process at the LSM level (argued earlier
> by Cedric to be unnecessary and perhaps meaningless), need to allow
> FILE__EXECUTE or other checks on sigstruct files that may not actually
> contain code.
>
> 2) Define new permissions within existing security classes (e.g. process2,
> file). Pros: Can tailor permission names and definitions to SGX semantics,
> ability to distinguish enclave versus host process execute access, no need
> to grant FILE__EXECUTE to sigstruct files, class matches the target object,
> permissions computed and cached upon existing checks (i.e. when a process
> accesses a file, all of the permissions to that file are computed and then
> cached at once, including the enclave-related ones). Cons: Typical distro
> policies (unlike Android) allow unknown permissions by default for forward
> kernel compatibility reasons, so existing policies will permit these new
> permissions by default and enforcement will only truly take effect once
> policies are updated, adding new permissions to existing classes requires an
> update to the base policy (so they can't be shipped as a third party policy
> module alongside the SGX driver or installed as a local module by an admin,
> for example), documentation/user education required for new permissions.
>
> 3) Define new permissions in new security classes (e.g. enclave). Pros
> relative to #2: New classes and permissions can be defined and installed in
> third party or local policy module without requiring a change to the base
> policy. Cons relative to #2: Class won't correspond to the target object,
> permissions won't be computed and cached upon existing checks (only when
> performing the checks against the new classes).
>
> Combinations are also possible, of course.
What's the impact on distros/ecosystems if we go with #1 for now and later
decide to switch to #2 after upstreaming? I.e. can we take a minimal-ish
approach now without painting ourselves into a corner?
We can map quite closely to the existing intent of EXECUTE, EXECMOD and
EXECMEM via a combination of checking protections at enclave load time and
again at mmap()/mprotect(), e.g.:
#ifdef CONFIG_INTEL_SGX
static inline int enclave_has_perm(u32 sid, u32 requested)
{
	return avc_has_perm(&selinux_state, sid, sid,
			    SECCLASS_PROCESS, requested, NULL);
}

static int selinux_enclave_map(unsigned long prot)
{
	const struct cred *cred = current_cred();
	u32 sid = cred_sid(cred);

	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
		return enclave_has_perm(sid, PROCESS__EXECMEM);

	return 0;
}

static int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot)
{
	const struct cred *cred = current_cred();
	u32 sid = cred_sid(cred);
	int ret;

	/* Only executable enclave pages are restricted in any way. */
	if (!(prot & PROT_EXEC))
		return 0;

	if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file))) {
		ret = enclave_has_perm(sid, PROCESS__EXECMEM);
	} else {
		ret = file_has_perm(cred, vma->vm_file, FILE__EXECUTE);
		if (ret)
			goto out;

		/*
		 * Load code from a modified private mapping or from a file
		 * with the ability to do W->X within the enclave.
		 */
		if (vma->anon_vma || (prot & PROT_WRITE))
			ret = file_has_perm(cred, vma->vm_file,
					    FILE__EXECMOD);
	}

out:
	return ret;
}
#endif
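
The decision logic of the two hooks above can be exercised outside the kernel with a small model. The following is a sketch, not the proposed implementation: `enum needed_perm`, `struct src_vma`, `perms_for_enclave_map()` and `perms_for_enclave_load()` are hypothetical names, and instead of calling the AVC the functions return the set of permissions that would all have to be granted for the hook to succeed.

```c
#include <stdbool.h>
#include <sys/mman.h>	/* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical bit flags naming the SELinux permissions discussed above. */
enum needed_perm {
	NEED_NONE    = 0,
	NEED_EXECMEM = 1 << 0,	/* stands for PROCESS__EXECMEM */
	NEED_EXECUTE = 1 << 1,	/* stands for FILE__EXECUTE on the source file */
	NEED_EXECMOD = 1 << 2,	/* stands for FILE__EXECMOD on the source file */
};

/* Hypothetical summary of the source vma handed to the load hook. */
struct src_vma {
	bool has_file;		/* models vma->vm_file != NULL */
	bool private_inode;	/* models IS_PRIVATE(file_inode(vma->vm_file)) */
	bool has_anon_vma;	/* models vma->anon_vma != NULL */
};

/* Models the map hook: only simultaneous W+X needs a permission. */
unsigned int perms_for_enclave_map(unsigned long prot)
{
	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
		return NEED_EXECMEM;

	return NEED_NONE;
}

/* Models the load hook: only executable enclave pages are restricted. */
unsigned int perms_for_enclave_load(const struct src_vma *vma,
				    unsigned long prot)
{
	unsigned int need;

	if (!(prot & PROT_EXEC))
		return NEED_NONE;

	/* No backing file, or a kernel-private inode: treat as anonymous. */
	if (!vma->has_file || vma->private_inode)
		return NEED_EXECMEM;

	need = NEED_EXECUTE;

	/* Modified private mapping, or W->X possible inside the enclave. */
	if (vma->has_anon_vma || (prot & PROT_WRITE))
		need |= NEED_EXECMOD;

	return need;
}
```

For example, loading a read-execute page from an unmodified file mapping yields only `NEED_EXECUTE` in this model, while declaring the same page RWX adds `NEED_EXECMOD`.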
On 7/11/19 12:25 PM, Sean Christopherson wrote:
> On Thu, Jul 11, 2019 at 12:11:06PM -0400, Stephen Smalley wrote:
>> On 7/11/19 11:12 AM, Sean Christopherson wrote:
>>> On Thu, Jul 11, 2019 at 09:51:19AM -0400, Stephen Smalley wrote:
>>>> I'd also feel better if there was clear consensus among all of the
>>>> @intel.com participants that this is the right approach. To date that has
>>>> seemed elusive.
>>>
>>> That's a very kind way to phrase things :-)
>>>
>>> For initial upstreaming, we've agreed that there is no need to extend the
>>> uapi, i.e. we can punt on deciding between on-the-fly tracking and having
>>> userspace specify maximal permissions until we add SGX2 support.
>>>
>>> The last open (knock on wood) for initial upstreaming is whether SELinux
>>> would prefer to have new enclave specific permissions or reuse the
>>> existing PROCESS__EXECMEM, FILE__EXECUTE and FILE__EXECMOD permissions.
>>> My understanding is that enclave specific permissions are preferred.
>>
>> I was left unclear on this topic after the email exchanges with Cedric.
>> There are at least three options:
>>
>> 1) Reuse the existing EXECMEM, EXECUTE, and EXECMOD permissions. Pros:
>> Existing distro policies will be applied in the expected manner with respect
>> to the introduction of executable code into the system, consistent control
>> will be provided over the enclave and the host process, no change for
>> users/documentation wrt policy. Cons: Existing permissions don't map
>> exactly to SGX semantics, no ability to distinguish executable content
>> within the enclave versus the host process at the LSM level (argued earlier
>> by Cedric to be unnecessary and perhaps meaningless), need to allow
>> FILE__EXECUTE or other checks on sigstruct files that may not actually
>> contain code.
>>
>> 2) Define new permissions within existing security classes (e.g. process2,
>> file). Pros: Can tailor permission names and definitions to SGX semantics,
>> ability to distinguish enclave versus host process execute access, no need
>> to grant FILE__EXECUTE to sigstruct files, class matches the target object,
>> permissions computed and cached upon existing checks (i.e. when a process
>> accesses a file, all of the permissions to that file are computed and then
>> cached at once, including the enclave-related ones). Cons: Typical distro
>> policies (unlike Android) allow unknown permissions by default for forward
>> kernel compatibility reasons, so existing policies will permit these new
>> permissions by default and enforcement will only truly take effect once
>> policies are updated, adding new permissions to existing classes requires an
>> update to the base policy (so they can't be shipped as a third party policy
>> module alongside the SGX driver or installed as a local module by an admin,
>> for example), documentation/user education required for new permissions.
>>
>> 3) Define new permissions in new security classes (e.g. enclave). Pros
>> relative to #2: New classes and permissions can be defined and installed in
>> third party or local policy module without requiring a change to the base
>> policy. Cons relative to #2: Class won't correspond to the target object,
>> permissions won't be computed and cached upon existing checks (only when
>> performing the checks against the new classes).
>>
>> Combinations are also possible, of course.
>
> What's the impact on distros/ecosystems if we go with #1 for now and later
> decide to switch to #2 after upstreaming? I.e. can we take a minimal-ish
> approach now without painting ourselves into a corner?

Yes, I think that's fine.

> We can map quite closely to the existing intent of EXECUTE, EXECMOD and
> EXECMEM via a combination of checking protections at enclave load time and
> again at mmap()/mprotect(), e.g.:
>
> #ifdef CONFIG_INTEL_SGX
> static inline int enclave_has_perm(u32 sid, u32 requested)
> {
> 	return avc_has_perm(&selinux_state, sid, sid,
> 			    SECCLASS_PROCESS, requested, NULL);
> }
>
> static int selinux_enclave_map(unsigned long prot)
> {
> 	const struct cred *cred = current_cred();
> 	u32 sid = cred_sid(cred);
>
> 	if ((prot & PROT_EXEC) && (prot & PROT_WRITE))
> 		return enclave_has_perm(sid, PROCESS__EXECMEM);
>
> 	return 0;
> }
>
> static int selinux_enclave_load(struct vm_area_struct *vma, unsigned long prot)
> {
> 	const struct cred *cred = current_cred();
> 	u32 sid = cred_sid(cred);
> 	int ret;
>
> 	/* Only executable enclave pages are restricted in any way. */
> 	if (!(prot & PROT_EXEC))
> 		return 0;
>
> 	if (!vma->vm_file || IS_PRIVATE(file_inode(vma->vm_file))) {
> 		ret = enclave_has_perm(sid, PROCESS__EXECMEM);
> 	} else {
> 		ret = file_has_perm(cred, vma->vm_file, FILE__EXECUTE);
> 		if (ret)
> 			goto out;
>
> 		/*
> 		 * Load code from a modified private mapping or from a file
> 		 * with the ability to do W->X within the enclave.
> 		 */
> 		if (vma->anon_vma || (prot & PROT_WRITE))
> 			ret = file_has_perm(cred, vma->vm_file,
> 					    FILE__EXECMOD);
> 	}
>
> out:
> 	return ret;
> }
> #endif
>
On Thu, Jul 11, 2019 at 10:32:41AM -0400, Stephen Smalley wrote:
> The existing permissions don't map cleanly to SGX but I think Sean and
> Cedric were trying to make a best-effort approximation to the underlying
> concepts in a manner that permits control over the introduction of
> executable content.
>
> Sure, the existing EXECMOD check is only applied today when there is an
> attempt to make executable a previously modified (detected based on COW
> having occurred) private file mapping. But the general notion of
> controlling the ability to execute modified content is still meaningful.

OK to summarize EXECMOD does not connect with SGX in any possible way
but SGX needs something that mimics EXECMOD behaviour?

And the same probably goes for EXECMEM, which is also private mapping
related concept.

> In the case of regular files, having both FILE__WRITE and FILE__EXECUTE to
> the file is sufficient because that implies the ability to execute modified
> content. And those FILE__* checks predated the introduction of EXECMOD and
> EXECMEM.

Right.

> The mapping of /dev/sgx/enclave doesn't really fit existing categories
> because it doesn't provide the same semantics as a shared mapping of a
> regular file. Userspace will always need both FILE__WRITE and FILE__EXECUTE
> to /dev/sgx/enclave.

Right.

Yeah, that has been the confusing part for me that they've been shuffled
in the discussion together with SGX concepts and hooks like there was
any connection.

Thank you for clarifying this!

/Jarkko
On 7/11/2019 9:32 AM, Stephen Smalley wrote:
> On 7/11/19 12:25 PM, Sean Christopherson wrote:
>> On Thu, Jul 11, 2019 at 12:11:06PM -0400, Stephen Smalley wrote:
>>> On 7/11/19 11:12 AM, Sean Christopherson wrote:
>>>> On Thu, Jul 11, 2019 at 09:51:19AM -0400, Stephen Smalley wrote:
>>>>> I'd also feel better if there was clear consensus among all of the
>>>>> @intel.com participants that this is the right approach. To date
>>>>> that has
>>>>> seemed elusive.
>>>>
>>>> That's a very kind way to phrase things :-)
>>>>
>>>> For initial upstreaming, we've agreed that there is no need to
>>>> extend the
>>>> uapi, i.e. we can punt on deciding between on-the-fly tracking and
>>>> having
>>>> userspace specify maximal permissions until we add SGX2 support.
>>>>
>>>> The last open (knock on wood) for initial upstreaming is whether
>>>> SELinux
>>>> would prefer to have new enclave specific permissions or reuse the
>>>> existing PROCESS__EXECMEM, FILE__EXECUTE and FILE__EXECMOD permissions.
>>>> My understanding is that enclave specific permissions are preferred.
>>>
>>> I was left unclear on this topic after the email exchanges with Cedric.
>>> There are at least three options:
>>>
>>> 1) Reuse the existing EXECMEM, EXECUTE, and EXECMOD permissions. Pros:
>>> Existing distro policies will be applied in the expected manner with
>>> respect
>>> to the introduction of executable code into the system, consistent
>>> control
>>> will be provided over the enclave and the host process, no change for
>>> users/documentation wrt policy. Cons: Existing permissions don't map
>>> exactly to SGX semantics, no ability to distinguish executable content
>>> within the enclave versus the host process at the LSM level (argued
>>> earlier
>>> by Cedric to be unnecessary and perhaps meaningless), need to allow
>>> FILE__EXECUTE or other checks on sigstruct files that may not actually
>>> contain code.
>>>
>>> 2) Define new permissions within existing security classes (e.g.
>>> process2,
>>> file). Pros: Can tailor permission names and definitions to SGX
>>> semantics,
>>> ability to distinguish enclave versus host process execute access, no
>>> need
>>> to grant FILE__EXECUTE to sigstruct files, class matches the target
>>> object,
>>> permissions computed and cached upon existing checks (i.e. when a
>>> process
>>> accesses a file, all of the permissions to that file are computed and
>>> then
>>> cached at once, including the enclave-related ones). Cons: Typical
>>> distro
>>> policies (unlike Android) allow unknown permissions by default for
>>> forward
>>> kernel compatibility reasons, so existing policies will permit these new
>>> permissions by default and enforcement will only truly take effect once
>>> policies are updated, adding new permissions to existing classes
>>> requires an
>>> update to the base policy (so they can't be shipped as a third party
>>> policy
>>> module alongside the SGX driver or installed as a local module by an
>>> admin,
>>> for example), documentation/user education required for new permissions.
>>>
>>> 3) Define new permissions in new security classes (e.g. enclave). Pros
>>> relative to #2: New classes and permissions can be defined and
>>> installed in
>>> third party or local policy module without requiring a change to the
>>> base
>>> policy. Cons relative to #2: Class won't correspond to the target
>>> object,
>>> permissions won't be computed and cached upon existing checks (only when
>>> performing the checks against the new classes).
>>>
>>> Combinations are also possible, of course.
>>
>> What's the impact on distros/ecosystems if we go with #1 for now and
>> later
>> decide to switch to #2 after upstreaming? I.e. can we take a minimal-ish
>> approach now without painting ourselves into a corner?
>
> Yes, I think that's fine.
I can't agree more on this. It's easier to add new things than to take
existing things out. We can just wait until usages come up that really
require new permissions.
On 7/11/2019 10:51 AM, Jarkko Sakkinen wrote:
> On Thu, Jul 11, 2019 at 10:32:41AM -0400, Stephen Smalley wrote:
>> The existing permissions don't map cleanly to SGX but I think Sean and
>> Cedric were trying to make a best-effort approximation to the underlying
>> concepts in a manner that permits control over the introduction of
>> executable content.
>>
>> Sure, the existing EXECMOD check is only applied today when there is an
>> attempt to make executable a previously modified (detected based on COW
>> having occurred) private file mapping. But the general notion of
>> controlling the ability to execute modified content is still meaningful.
>
> OK to summarize EXECMOD does not connect with SGX in any possible way
> but SGX needs something that mimics EXECMOD behaviour?
Stephen may correct me if I'm wrong. EXECMOD is granted to files, to
indicate the bearer contains self-modifying code (or text relocation).
So if it applies to enclaves, there are two aspects to it:
(1) An enclave may be loaded from multiple image files, among which some
may contain self-modifying code and hence would require EXECMOD on each
of them. At runtime, W->X will be allowed/denied for pages loaded from
image files having/not having EXECMOD.
(2) But there are pages not loaded from any files - e.g. pages EAUG'ed
at runtime. We are trying to use the file containing SIGSTRUCT as the
"proxy file" - i.e. EXECMOD on the proxy file indicates the enclave may
load code into EAUG'ed pages at runtime.
(3) Well, this is not an aspect of EXECMOD. Yet there's another category
of pages - pages EADD'ed from anonymous memory. They are different than
EAUG'ed pages in that their contents are supplied by untrusted code. How
to control their accesses is still being debated. My argument was that
the source pages must be executable before they could be loaded
executable in EPC. Andy argued that SIGSTRUCT alone could be considered
sufficient validation on the contents in certain usages, so both Sean
and I had proposed PROCESS2__ENCLAVE_EXECANON as a new permission. But
for the 1st upstream I think we all agree to do just the minimum until
real requirements come up. After all, adding is generally easier than
taking away existing things.
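
The three categories of enclave pages above can be summarized as a lookup from page origin to the permission that would gate it. This is only a sketch of the positions stated in this message, with `enum page_origin` and `gate_for_page()` as hypothetical names; the third case in particular reflects a proposal that was still under debate, not a settled design.

```c
/* Hypothetical classification of where an enclave page's contents came from. */
enum page_origin {
	PAGE_FROM_IMAGE_FILE,	/* (1) loaded from an enclave image file */
	PAGE_EAUG,		/* (2) added at runtime, no source file */
	PAGE_EADD_ANON,		/* (3) EADD'ed from anonymous memory */
};

/*
 * Returns a description of the permission that would gate the page in this
 * summary: W->X for cases (1) and (2), executable loading for case (3).
 */
const char *gate_for_page(enum page_origin origin)
{
	switch (origin) {
	case PAGE_FROM_IMAGE_FILE:
		return "FILE__EXECMOD on the image file";
	case PAGE_EAUG:
		return "FILE__EXECMOD on the SIGSTRUCT (proxy) file";
	case PAGE_EADD_ANON:
		return "PROCESS2__ENCLAVE_EXECANON (proposed, not agreed)";
	}

	return "unknown";
}
```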
On Thu, Jul 11, 2019 at 3:23 AM Dr. Greg <greg@idfusion.net> wrote:
>
> On Mon, Jul 08, 2019 at 05:02:00PM -0700, Casey Schaufler wrote:
>
> > > On 7/7/2019 6:30 AM, Dr. Greg wrote:
> > > All well taken points from an implementation perspective, but they
> > > elide the point I was trying to make. Which is the fact that without
> > > any semblance of a discussion regarding the requirements needed to
> > > implement a security architecture around the concept of a TEE, this
> > > entire process, despite Cedric's well intentioned efforts, amounts to
> > > pounding a square solution into the round hole of a security problem.
>
> > Lead with code. I love a good requirements document, but one of the
> > few places where I agree with the agile folks is that working code
> > speaks loudly.
> >
> > > Which, as I noted in my e-mail, is tantamount to security theater.
> >
> > Not buying that. Not rejecting it, either. Without code
> > to judge it's kind of hard to say.
>
> We tried the code approach.
>
You sent code. That code did not, in any respect, address the issue
of how LSMs were supposed to control what code got executed.
Do you have an actual suggestion here that we should pay attention to?