On 7/28/20 12:16 PM, Andy Lutomirski wrote:
> On Tue, Jul 28, 2020 at 9:32 AM Madhavan T. Venkataraman
> wrote:
>> Thanks. See inline..
>>
>> On 7/28/20 10:13 AM, David Laight wrote:
>>> From: madvenka@linux.microsoft.com
>>>> Sent: 28 July 2020 14:11
>>> ...
>>>> The kernel creates the trampoline mapping without any permissions. When
>>>> the trampoline is executed by user code, a page fault happens and the
>>>> kernel gets control. The kernel recognizes that this is a trampoline
>>>> invocation. It sets up the user registers based on the specified
>>>> register context, and/or pushes values on the user stack based on the
>>>> specified stack context, and sets the user PC to the requested target
>>>> PC. When the kernel returns, execution continues at the target PC.
>>>> So, the kernel does the work of the trampoline on behalf of the
>>>> application.
>>> Isn't the performance of this going to be horrid?
>> It takes about the same amount of time as getpid(). So, it is
>> one quick trip into the kernel. I expect that applications will
>> typically not care about this extra overhead as long as
>> they are able to run.
> What did you test this on? A page fault on any modern x86_64 system
> is much, much, much, much slower than a syscall.

I tested it on a KVM guest running Ubuntu. So, when you say
that a page fault is much slower, do you mean a regular page
fault that is handled through the VM layer? Here is the relevant code
in do_user_addr_fault():

        if (unlikely(access_error(hw_error_code, vma))) {
                /*
                 * If it is a user execute fault, it could be a trampoline
                 * invocation.
                 */
                if ((hw_error_code & tflags) == tflags &&
                    trampfd_fault(vma, regs)) {
                        up_read(&mm->mmap_sem);
                        return;
                }
                bad_area_access_error(regs, hw_error_code, address, vma);
                return;
        }

        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo
         * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
         * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
         *
         * Note that handle_userfault() may also release and reacquire mmap_sem
         * (and not return with VM_FAULT_RETRY), when returning to userland to
         * repeat the page fault later with a VM_FAULT_NOPAGE retval
         * (potentially after handling any pending signal during the return to
         * userland). The return to userland is identified whenever
         * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
         */
        fault = handle_mm_fault(vma, address, flags);

trampfd faults are instruction faults that go through a different code path
than the one that calls handle_mm_fault(). Could you clarify?

Thanks.

Madhavan
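
P.S. To make the (hw_error_code & tflags) == tflags test above concrete,
here is a rough user-space sketch of the mask check. The definition of
tflags is not shown in the excerpt, so ASSUMED_TFLAGS below is only a guess
(user-mode fault plus instruction fetch). The helper names
is_trampoline_candidate() and trampfd_sketch_selfcheck() are made up for
illustration and are not part of the patch. The X86_PF_* values are the
standard x86 page-fault error code bits.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits, as defined by the architecture. */
#define X86_PF_PROT   (1UL << 0)  /* 0: page not present, 1: protection violation */
#define X86_PF_WRITE  (1UL << 1)  /* fault was caused by a write access */
#define X86_PF_USER   (1UL << 2)  /* fault happened in user mode */
#define X86_PF_RSVD   (1UL << 3)  /* reserved bit set in a paging entry */
#define X86_PF_INSTR  (1UL << 4)  /* fault was caused by an instruction fetch */

/*
 * ASSUMPTION: the excerpt does not show how tflags is set. A "user execute
 * fault" is modelled here as USER and INSTR both being set.
 */
#define ASSUMED_TFLAGS (X86_PF_USER | X86_PF_INSTR)

/*
 * Hypothetical helper: true only when every bit of ASSUMED_TFLAGS is set in
 * the error code. Other bits (for example PROT) do not affect the result.
 */
static bool is_trampoline_candidate(unsigned long hw_error_code)
{
        return (hw_error_code & ASSUMED_TFLAGS) == ASSUMED_TFLAGS;
}

/* Hypothetical self-check that walks through a few error codes. */
void trampfd_sketch_selfcheck(void)
{
        /* User-mode instruction fetch: passes the mask check. */
        assert(is_trampoline_candidate(X86_PF_USER | X86_PF_INSTR));
        /* Extra bits such as PROT do not matter. */
        assert(is_trampoline_candidate(X86_PF_USER | X86_PF_INSTR | X86_PF_PROT));
        /* User-mode write fault: not an execute fault. */
        assert(!is_trampoline_candidate(X86_PF_USER | X86_PF_WRITE));
        /* Instruction fetch without the USER bit: not a user fault. */
        assert(!is_trampoline_candidate(X86_PF_INSTR));
}
```

Under that assumption, only faults that pass this check would reach
trampfd_fault(); everything else that fails access_error() falls through
to bad_area_access_error() as before.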