Date: Mon, 28 Jun 2021 18:01:43 -0700
From: Andrei Vagin
To: "Eric W. Biederman"
Cc: Jann Horn, Andy Lutomirski, Linux Kernel Mailing List, Linux API,
    linux-um@lists.infradead.org, criu@openvz.org, avagin@google.com,
    Andrew Morton, Anton Ivanov, Christian Brauner,
    Dmitry Safonov <0x7f454c46@gmail.com>, Ingo Molnar, Jeff Dike,
    Mike Rapoport, Michael Kerrisk, Oleg Nesterov,
    "Peter Zijlstra (Intel)", Richard Weinberger, Thomas Gleixner
Subject: Re: [PATCH 2/4] arch/x86: implement the process_vm_exec syscall
References: <20210414055217.543246-1-avagin@gmail.com>
    <20210414055217.543246-3-avagin@gmail.com> <87o8bpyhsw.fsf@disp2133>
In-Reply-To: <87o8bpyhsw.fsf@disp2133>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 28, 2021 at 01:18:07PM -0500, Eric W. Biederman wrote:
> Jann Horn writes:
>
> > On Mon, Jun 28, 2021 at 6:30 PM Andy Lutomirski wrote:
> >> On Mon, Jun 28, 2021, at 9:13 AM, Jann Horn wrote:
> >> > On Wed, Apr 14, 2021 at 7:59 AM Andrei Vagin wrote:
> >> > > This change introduces the new system call:
> >> > > process_vm_exec(pid_t pid, struct sigcontext *uctx, unsigned long flags,
> >> > >                 siginfo_t * uinfo, sigset_t *sigmask, size_t sizemask)
> >> > >
> >> > > process_vm_exec allows to execute the current process in an address
> >> > > space of another process.
> >> > [...]
> >> >
> >> > I still think that this whole API is fundamentally the wrong approach
> >> > because it tries to shoehorn multiple usecases with different
> >> > requirements into a single API.
> >> > But that aside:
> >> >
> >> > > +static void swap_mm(struct mm_struct *prev_mm, struct mm_struct *target_mm)
> >> > > +{
> >> > > +       struct task_struct *tsk = current;
> >> > > +       struct mm_struct *active_mm;
> >> > > +
> >> > > +       task_lock(tsk);
> >> > > +       /* Hold off tlb flush IPIs while switching mm's */
> >> > > +       local_irq_disable();
> >> > > +
> >> > > +       sync_mm_rss(prev_mm);
> >> > > +
> >> > > +       vmacache_flush(tsk);
> >> > > +
> >> > > +       active_mm = tsk->active_mm;
> >> > > +       if (active_mm != target_mm) {
> >> > > +               mmgrab(target_mm);
> >> > > +               tsk->active_mm = target_mm;
> >> > > +       }
> >> > > +       tsk->mm = target_mm;
> >> >
> >> > I'm pretty sure you're not currently allowed to overwrite the ->mm
> >> > pointer of a userspace thread. For example, zap_threads() assumes that
> >> > all threads running under a process have the same ->mm. (And if you're
> >> > fiddling with ->mm stuff, you should probably CC linux-mm@.)
> >>
> >> exec_mmap() does it, so it can’t be entirely impossible.
> >
> > Yeah, true, execve can do it - I guess the thing that makes that
> > special is that it's running after de_thread(), so it's guaranteed to
> > be single-threaded?
>
> Even the implementation detail of swapping the mm aside.  Even the idea
> of swapping the mm is completely broken, as endless system calls
> depend upon the state held in task_struct.  io_uring just tried running
> system calls of a process in a different context and we ultimately had
> to make the threads part of the original process to make enough things
> work to keep the problem tractable.
>
> System calls deeply and fundamentally depend on task_struct and
> signal_struct.

Unlike io_uring, process_vm_exec doesn't intend to run system calls in
the context of the target process. We stated from the start that system
calls are executed in the context of the current process, just with
another mm. If we are talking about user-mode kernels, they will need
just two system calls: mmap and munmap.
In the case of CRIU, vmsplice will be used too.

>
> I can think of two possibilities.
> 1) Hijack an existing process thread.
> 2) Inject a new thread into an existing process.

I am not sure that I understand what you mean here, but it sounds like
we would need to do a context switch to execute anything in the context
of a hijacked thread. If I am right, that defeats the main idea of
process_vm_exec. If I have misunderstood your idea, maybe you can
describe it in more detail.

Thanks,
Andrei