On 7/28/20 12:16 PM, Andy Lutomirski wrote:
> On Tue, Jul 28, 2020 at 9:32 AM Madhavan T. Venkataraman
> <madvenka@linux.microsoft.com> wrote:
>> Thanks. See inline..
>>
>> On 7/28/20 10:13 AM, David Laight wrote:
>>> From: madvenka@linux.microsoft.com
>>> Sent: 28 July 2020 14:11
>>> ...
>>>> The kernel creates the trampoline mapping without any permissions. When
>>>> the trampoline is executed by user code, a page fault happens and the
>>>> kernel gets control. The kernel recognizes that this is a trampoline
>>>> invocation. It sets up the user registers based on the specified
>>>> register context, and/or pushes values on the user stack based on the
>>>> specified stack context, and sets the user PC to the requested target
>>>> PC. When the kernel returns, execution continues at the target PC.
>>>> So, the kernel does the work of the trampoline on behalf of the
>>>> application.
>>> Isn't the performance of this going to be horrid?
>> It takes about the same amount of time as getpid(). So, it is
>> one quick trip into the kernel. I expect that applications will
>> typically not care about this extra overhead as long as
>> they are able to run.
> What did you test this on?  A page fault on any modern x86_64 system
> is much, much, much, much slower than a syscall.

I tested it on a KVM guest running Ubuntu. So, when you say
that a page fault is much slower, do you mean a regular page
fault that is handled through the VM layer? Here is the relevant code
in do_user_addr_fault():

        if (unlikely(access_error(hw_error_code, vma))) {
                /*
                 * If it is a user execute fault, it could be a trampoline
                 * invocation.
                 */
                if ((hw_error_code & tflags) == tflags &&
                    trampfd_fault(vma, regs)) {
                        up_read(&mm->mmap_sem);
                        return;
                }
                bad_area_access_error(regs, hw_error_code, address, vma);
                return;
        }

        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo
         * the fault.  Since we never set FAULT_FLAG_RETRY_NOWAIT, if
         * we get VM_FAULT_RETRY back, the mmap_sem has been unlocked.
         *
         * Note that handle_userfault() may also release and reacquire mmap_sem
         * (and not return with VM_FAULT_RETRY), when returning to userland to
         * repeat the page fault later with a VM_FAULT_NOPAGE retval
         * (potentially after handling any pending signal during the return to
         * userland). The return to userland is identified whenever
         * FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in flags.
         */
        fault = handle_mm_fault(vma, address, flags);
trampfd faults are instruction faults that go through a different code
path than the one that calls handle_mm_fault().
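
For reference, here is a minimal user-space sketch for putting numbers on
the two costs being compared. It is not part of the patch. The helper names
(now_ns, avg_syscall_ns, avg_minor_fault_ns) are made up for illustration.
Note that the fault half times ordinary first-touch faults on anonymous
memory, i.e. the handle_mm_fault() path, so it does not measure the trampfd
path itself, and the results will vary with the CPU, the hypervisor, and
settings such as transparent huge pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Average cost of one getpid system call, in nanoseconds.
 * syscall(SYS_getpid) is used so that every iteration really enters
 * the kernel, independent of any libc-level caching.
 */
static double avg_syscall_ns(size_t iters)
{
        uint64_t start, end;
        size_t i;

        assert(iters > 0);
        start = now_ns();
        for (i = 0; i < iters; i++)
                syscall(SYS_getpid);
        end = now_ns();
        return (double)(end - start) / (double)iters;
}

/*
 * Average cost of one first-touch fault on fresh anonymous memory,
 * in nanoseconds. Each write to an untouched page takes a fault that
 * is resolved through the regular VM layer (handle_mm_fault()).
 * Returns a negative value if the mapping cannot be created.
 */
static double avg_minor_fault_ns(size_t pages)
{
        size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
        size_t len = pages * page_size;
        volatile char *p;
        uint64_t start, end;
        size_t i;

        assert(pages > 0);
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
                return -1.0;

        start = now_ns();
        for (i = 0; i < pages; i++)
                p[i * page_size] = 1;   /* first touch of each page */
        end = now_ns();

        munmap((void *)p, len);
        return (double)(end - start) / (double)pages;
}
```

Calling avg_syscall_ns(1000000) and avg_minor_fault_ns(65536) from a small
main() and printing both values gives a rough baseline on a given machine.
A comparable number for the trampfd path would need a kernel with the patch
applied.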

Could you clarify?

Thanks.

Madhavan