On 7/28/20 12:16 PM, Andy Lutomirski wrote:
> On Tue, Jul 28, 2020 at 9:32 AM Madhavan T. Venkataraman
> <madvenka@linux.microsoft.com> wrote:
>> Thanks. See inline..
>>
>> On 7/28/20 10:13 AM, David Laight wrote:
>>> From:  madvenka@linux.microsoft.com
>>>> Sent: 28 July 2020 14:11
>>> ...
>>>> The kernel creates the trampoline mapping without any permissions. When
>>>> the trampoline is executed by user code, a page fault happens and the
>>>> kernel gets control. The kernel recognizes that this is a trampoline
>>>> invocation. It sets up the user registers based on the specified
>>>> register context, and/or pushes values on the user stack based on the
>>>> specified stack context, and sets the user PC to the requested target
>>>> PC. When the kernel returns, execution continues at the target PC.
>>>> So, the kernel does the work of the trampoline on behalf of the
>>>> application.
>>> Isn't the performance of this going to be horrid?
>> It takes about the same amount of time as getpid(). So, it is
>> one quick trip into the kernel. I expect that applications will
>> typically not care about this extra overhead as long as
>> they are able to run.
> What did you test this on?  A page fault on any modern x86_64 system
> is much, much, much, much slower than a syscall.

I tested it on a KVM guest running Ubuntu. When you say that a
page fault is much slower, do you mean a regular page fault that is
handled through the VM layer? Here is the relevant code in
do_user_addr_fault():

            if (unlikely(access_error(hw_error_code, vma))) {
                    /*
                     * If it is a user execute fault, it could be a trampoline
                     * invocation.
                     */
                    if ((hw_error_code & tflags) == tflags &&
                        trampfd_fault(vma, regs)) {
                            up_read(&mm->mmap_sem);
                            return;
                    }
                    bad_area_access_error(regs, hw_error_code, address, vma);
                    return;
            }

            /*
             * If for any reason at all we couldn't handle the fault,
             * make sure we exit gracefully rather than endlessly redo
             * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
             * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
             *
             * Note that handle_userfault() may also release and reacquire mmap_sem
             * (and not return with VM_FAULT_RETRY), when returning to userland to
             * repeat the page fault later with a VM_FAULT_NOPAGE retval
             * (potentially after handling any pending signal during the return to
             * userland). The return to userland is identified whenever
             * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
             */
            fault = handle_mm_fault(vma, address, flags);

trampfd faults are instruction fetch faults. They are caught in the
access_error() branch above and return before handle_mm_fault() is
called, so they take a different code path from a regular page fault.
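
To illustrate the flag test in the snippet in isolation, here is a
minimal sketch. It assumes that tflags means "user-mode instruction
fetch"; the snippet itself does not show how tflags is defined. The
PF_* names are local stand-ins that mirror the architectural x86
page-fault error code bits, and is_user_exec_fault() is a hypothetical
helper, not something from the patch.

```c
#include <stdbool.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PROT   (1UL << 0)  /* set: protection violation, clear: page not present */
#define PF_WRITE  (1UL << 1)  /* fault was caused by a write */
#define PF_USER   (1UL << 2)  /* fault was taken in user mode */
#define PF_INSTR  (1UL << 4)  /* fault was caused by an instruction fetch */

/* ASSUMPTION: this is what tflags stands for in the snippet. */
#define TFLAGS (PF_USER | PF_INSTR)

/* Hypothetical helper: true only if every bit in TFLAGS is set. */
bool is_user_exec_fault(unsigned long hw_error_code)
{
    return (hw_error_code & TFLAGS) == TFLAGS;
}
```

Comparing against the full mask, rather than testing for a non-zero
result, means a user-mode write fault or a kernel-mode instruction
fetch does not match.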

Could you clarify?
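
For a rough user-space comparison of the two costs under discussion,
something along these lines could be used. It is only a sketch: it
times a raw getpid syscall against a regular minor fault (first touch
of a fresh anonymous page), which goes through handle_mm_fault() and
includes page allocation and zeroing. It does not measure the trampfd
path. The function names getpid_ns() and minor_fault_ns() are made up
for this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average cost of one getpid syscall, in nanoseconds. */
double getpid_ns(long iters)
{
    double start = now_ns();

    for (long i = 0; i < iters; i++)
        syscall(SYS_getpid);    /* raw syscall, so no libc caching */
    return (now_ns() - start) / (double)iters;
}

/*
 * Average cost of the first touch of a fresh anonymous page, in
 * nanoseconds. Returns -1.0 if the mapping cannot be created.
 * Caveat: with transparent huge pages enabled, one fault may map
 * more than one base page, which lowers the per-page figure.
 */
double minor_fault_ns(size_t pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    volatile unsigned char *p;
    double start, elapsed;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1.0;

    start = now_ns();
    for (size_t i = 0; i < pages; i++)
        p[i * page] = 1;        /* first write to each page faults */
    elapsed = now_ns() - start;

    munmap((void *)p, len);
    return elapsed / (double)pages;
}
```

Calling getpid_ns(1000000) and minor_fault_ns(4096) from a small
main() and printing both values gives a first-order comparison on a
given machine. Numbers from a KVM guest and from bare metal can differ
considerably.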

Thanks.

Madhavan