On 2019-10-11 20:15, Sean Christopherson wrote: > On Fri, Oct 11, 2019 at 04:37:25PM +0000, Jethro Beekman wrote: >> UAPI: >> >> This got a whole lot more complex for userspace compared to the out-of-tree >> driver. >> >> 1. Manually needing to mmap a naturally-aligned memory region by allocating >> too much memory and then unmapping parts is quite annoying. Why was the >> auto-aligning removed? I think this will need to be handled the same for >> every consumer of SGX, so I don't see why this is not handled in the kernel. >> It never seems wrong to align if NULL is passed as the requested address. >> Alternatively, is there room in the flags for a MAP_ALIGNED bit? > > I'm pretty sure everyone agrees it's annoying. The short of it is that > the SGX driver is the wrong place to do the alignment. The driver could > key off addr=0, but we don't want to take on that implicit behavior. Why not? > A MAP_ALIGNED flag to have the allocation be naturally aligned is the > ideal solution. It's definitely something we should pursue, but that can > and probably should be done in parallel to the SGX series. > >> 2. Having to re-open the device for every enclave is also annoying. This >> means you need a filesystem available throughout the process lifetime. I >> tried dup, but that doesn't work. Can we make dup work? > > The semantics of dup() won't get you what want, as dup() just creates a > new descriptor pointing at the same file. > > An alternative solution that was proposed was to have an ioctl() for > creating an enclave. But that means using an anonymous inode, which runs > afoul of SELinux permissions, e.g. every _process_ that runs enclaves > would require EXECMEM. Linus was quite clear that SGX wouldn't be merged > if using it required users to degrade existing security. It's ok if it's the same inode, it just needs to be a different struct file. > I'm open to other ideas. I wasn't aware this was a pain point and file > stuff isn't exactly my area of expertise, so I haven't put much/any > thought into alternatives. The default permissions for /dev/sgx/enclave are root-only. This means you want to be able to do the same thing as network servers: initialize some resources as root, then drop privileges. This used to mean opening /dev/sgx and keeping the fd around which meant you could launch enclaves at will. With the new API, this is no longer possible, you can only launch one enclave per fd. Is there a different type of operation that doesn't just duplicate the fd but also the struct file? If not, can we add an ioctl for that? There are other scenarios where it's not just the permissions on /dev/sgx/enclave that are the problem but using the filesystem in general that is. Maybe you've used seccomp to disable file operations, etc. >> 3. Needing to mprotect every page with the precise permissions needed after >> EINIT is really bad. This means I have to remember this data for every page >> between EADD and EINIT. I don't care about SELinux, I trust the ECPM will do >> its job for me. Can we make it so that I can protect the whole range at once, >> or protect the individual pages at EADD time? > > You can mprotect() or mmap(..., MAP_FIXED) an enclave range once all > pages covered by the specified range have been added to the enclave, i.e. > at EADD. I double checked this with the selftest. Holler if you're > seeing different behavior. I'd swear I tried this flow and I was getting EACCES. But I implemented it again now, and it works fine. So this is not an issue. I also saw mmap(..., MAP_FIXED) being used in the selftest. Is there a reason to use this over mprotect? >> VDSO: >> >> It is *difficult* to link to weakly link to a symbol in the VDSO. Anyway, I >> figured it out. >> >> 1. What if I don't want to automatically ERESUME after kernel interrupt? > > Do EENTER/ERESUME directly instead of going through the vDSO. That kind of defeats the point. >> 2. I normally do a sanity check after ENCLU[EENTER] that EAX = EEXIT. The >> current implementation just clears EAX instead without looking at it. > > Hmm, the only reason I can think of for checking EAX would be to support > userspace mucking with EAX in a #DB/#BP signal handler. At that point, I > would expect the signal handler to modify RIP as well. Reaching the XOR > via any other non-EEXIT path would require a kernel bug. Or a CPU bug. > Was there a specific scenario or use case you had in mind? I'm not > against adding a check, I just don't see what value it would provide. Nothing specific. It just seems like a prudent thing to do when messing with control flow in unexpected ways. Alternatively, just remove the xor, and change the API of the function so that 3 is the normal return value? Then the user can decide themselves if they think the check is worth it. -- Jethro Beekman | Fortanix