On 2019-06-10 11:53, Sean Christopherson wrote: > On Fri, Jun 07, 2019 at 12:32:23PM -0700, Andy Lutomirski wrote: >> >>> On Jun 6, 2019, at 10:32 AM, Sean Christopherson wrote: >>> >>>> On Thu, Jun 06, 2019 at 10:20:38AM -0700, Andy Lutomirski wrote: >>>> On Wed, Jun 5, 2019 at 12:49 PM Sean Christopherson >>>> wrote: >>>>> >>>>> For some enclaves, e.g. an enclave with a small code footprint and a >>>>> large working set, the vast majority of pages added to the enclave are >>>>> zero pages. Introduce a flag to denote such zero pages. The major >>>>> benefit of the flag will be realized in a future patch to use Linux's >>>>> actual zero page as the source, as opposed to explicitly zeroing the >>>>> enclave's backing memory. >>>>> >>>> >>>> I feel like I probably asked this at some point, but why is there a >>>> workqueue here at all? >>> >>> Performance. A while back I wrote a patch set to remove the worker queue >>> and discovered that it tanks enclave build time when the enclave is being >>> hosted by a Golang application. Here's a snippet from a mail discussing >>> the code. >>> >>> The bad news is that I don't think we can remove the add page worker >>> as applications with userspace schedulers, e.g. Go's M:N scheduler, >>> can see a 10x or more throughput improvement when using the worker >>> queue. I did a bit of digging for the Golang case to make sure I >>> wasn't doing something horribly stupid/naive and found that it's a >>> generic issue in Golang with blocking (or just long-running) system >>> calls. Because Golang multiplexes Goroutines on top of OS threads, >>> blocking syscalls introduce latency and context switching overhead, >>> e.g. Go's scheduler will spin up a new OS thread to service other >>> Goroutines after it realizes the syscall has blocked, and will later >>> destroy one of the OS threads so that it doesn't build up too many >>> unused. >>> >>> IIRC, the scenario is spinning up several goroutines, each building an >>> enclave. I played around with adding a flag to do a synchronous EADD >>> but didn't see a meaningful change in performance for the simple case. >>> Supporting both the worker queue and direct paths was complex enough >>> that I decided it wasn't worth the trouble for initial upstreaming. >> >> Sigh. >> >> It seems silly to add a workaround for a language that has trouble calling >> somewhat-but-not-too-slow syscalls or ioctls. >> >> How about fixing this in Go directly? Either convince the golang people to >> add a way to allocate a real thread for a particular region of code or have >> the Go SGX folks write a bit of C code to do a whole bunch of ioctls and >> have Go call *that*. Then the mess stays in Go where it belongs. > > Actually, I'm pretty sure changing the ioctl() from ADD_PAGE to ADD_REGION > would eliminate the worst of the golang slowdown without requiring > userspace to get super fancy. I'm in favor of eliminating the work queue, > especially if the UAPI is changed to allow adding multiple pages in a > single syscall. > I don't know if this is going to matter a whole lot, but have you considered the performance impact of needing to the EPC paging while doing the EADD ioctl and how this interacts with having a workqueue? -- Jethro Beekman | Fortanix