From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 17 Jul 2021 18:34:39 -0700
From: Andrei Vagin
To: Andy Lutomirski
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com, Andrew Morton, Anton Ivanov, Christian Brauner, Dmitry Safonov <0x7f454c46@gmail.com>, Ingo Molnar, Jeff Dike, Mike Rapoport, Michael Kerrisk, Oleg Nesterov, Peter Zijlstra, Richard Weinberger, Thomas Gleixner
Subject: Re: [PATCH 0/4 POC] Allow executing code and syscalls in another address space
References: <20210414055217.543246-1-avagin@gmail.com> <6073e4c6-6fe8-0448-4586-5d04d7154164@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain;
 charset=utf-8
In-Reply-To: <6073e4c6-6fe8-0448-4586-5d04d7154164@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jul 02, 2021 at 03:44:41PM -0700, Andy Lutomirski wrote:
> On 4/13/21 10:52 PM, Andrei Vagin wrote:
>
> > process_vm_exec has two modes:
> >
> > * Execute code in an address space of a target process and stop on any
> >   signal or system call.
>
> We already have a perfectly good context switch mechanism: context
> switches. If you execute code, you are basically guaranteed to be
> subject to being hijacked, which means you pretty much can't allow
> syscalls. But there's a lot of non-syscall state, and I think context
> switching needs to be done with extreme care.
>
> (Just as example, suppose you switch mms, then set %gs to point to the
> LDT, then switch back. Now you're in a weird state. With %ss the plot
> is a bit thicker. And there are emulated vsyscalls and such.)
>
> If you, PeterZ, and the UMCG could all find an acceptable, efficient way
> to wake-and-wait so you can switch into an injected task in the target
> process and switch back quickly, then I think a much nicer solution will
> become available.

I know about UMCG, and I even did a prototype that used futex_swap (the
previous attempt at UMCG). Here are a few problems, and maybe you will
have some ideas on how to solve them.

The main question is how to hijack a stub process where guest code is
executing. We need to trap system calls, memory faults, and other
exceptions and handle them in the Sentry (supervisor/kernel). All
events of interest except system calls generate signals. We can use
seccomp to get signals on system calls too. In my prototype, guest code
runs in stub processes, one stub process per guest address space. In a
stub process, I set a signal handler for SIGSEGV, SIGBUS, SIGFPE,
SIGSYS, and SIGILL, set an alternate signal stack, and set seccomp
rules.
The signal handler communicates with the Sentry (supervisor/kernel) via
shared memory and uses futex_swap to make fast switches to the Sentry
and back to a stub process.

Here are a few problems. First, the signal handler code, its stack, and
a shared memory region all live in the guest address space, and we need
to guarantee that guest code will not be able to use them to do
something unexpected.

The second problem is performance. This approach is much faster than
the ptrace platform, but it is still a few times slower than
process_vm_exec. Signal handling is expensive: the kernel has to
generate a signal frame and execute the signal handler, and then the
handler needs to call rt_sigreturn. futex_swap makes fast context
switches, but it is still slower than process_vm_exec. UMCG should be
faster because it doesn't have the futex overhead.

Andy, what do you think about the idea of reworking process_vm_exec so
that it executes code and syscalls in the context of a target process?
Maybe you see other ways we can "hijack" a remote process?

Thanks,
Andrei

>
> >
> > * Execute a system call in an address space of a target process.
>
> I could get behind this, but there are plenty of cans of worms to watch
> out for. Serious auditing would be needed.
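
For readers who want something concrete, the stub setup described in the
message (one handler for SIGSEGV, SIGBUS, SIGFPE, SIGSYS and SIGILL,
running on an alternate signal stack) might look roughly like the sketch
below. This is not the prototype's code. All names (stub_handler,
stub_install_handlers, stub_self_check, STUB_ALTSTACK_SIZE) and the stack
size are hypothetical, the handler only records what it saw, and the
seccomp filter and the futex_swap hand-off to the Sentry are left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical size for the alternate signal stack. */
#define STUB_ALTSTACK_SIZE (64 * 1024)

static char *stub_altstack;
/* Illustration only: a real stub would publish state via shared memory. */
static volatile sig_atomic_t stub_last_signal;
static volatile sig_atomic_t stub_on_altstack;

static void stub_handler(int sig, siginfo_t *info, void *ucontext)
{
	char marker;
	uintptr_t sp = (uintptr_t)&marker;
	uintptr_t lo = (uintptr_t)stub_altstack;

	(void)info;
	(void)ucontext;	/* a real stub would hand the saved registers to the Sentry */

	stub_last_signal = sig;
	/* Record whether this frame sits inside the alternate stack. */
	stub_on_altstack = sp >= lo && sp < lo + STUB_ALTSTACK_SIZE;
}

/* Install the alternate stack and the handlers; returns 0 on success. */
int stub_install_handlers(void)
{
	static const int sigs[] = { SIGSEGV, SIGBUS, SIGFPE, SIGSYS, SIGILL };
	struct sigaction sa;
	stack_t ss;
	size_t i;
	void *p;

	p = mmap(NULL, STUB_ALTSTACK_SIZE, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	stub_altstack = p;

	memset(&ss, 0, sizeof(ss));
	ss.ss_sp = stub_altstack;
	ss.ss_size = STUB_ALTSTACK_SIZE;
	if (sigaltstack(&ss, NULL) != 0)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = stub_handler;
	sa.sa_flags = SA_SIGINFO | SA_ONSTACK;	/* run on the alternate stack */
	sigemptyset(&sa.sa_mask);
	for (i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
		if (sigaction(sigs[i], &sa, NULL) != 0)
			return -1;
	return 0;
}

/* Raise one of the signals and check that the handler ran as expected. */
void stub_self_check(void)
{
	int rc = stub_install_handlers();

	assert(rc == 0);
	rc = raise(SIGSYS);
	assert(rc == 0);
	assert(stub_last_signal == SIGSYS);
	assert(stub_on_altstack);
}
```

Even in this reduced form, every trapped event pays for a signal frame,
a handler invocation and an rt_sigreturn, which is the cost the message
compares against process_vm_exec.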