From: Ryan Houdek
Date: Tue, 18 May 2021 16:52:10 -0700
Subject: Re: [RESEND PATCH v4 8/8] arm64: Allow 64-bit tasks to invoke compat syscalls
To: Arnd Bergmann
Cc: "Amanieu d'Antras", Catalin Marinas, Will Deacon, Mark Rutland, Steven Price, David Laight, Mark Brown, Linux ARM, Linux Kernel Mailing List
References: <20210518090658.9519-1-amanieu@gmail.com> <20210518090658.9519-9-amanieu@gmail.com>
On Tue, May 18, 2021 at 6:03 AM Arnd Bergmann wrote:
>
> On Tue, May 18, 2021 at 11:06 AM Amanieu d'Antras wrote:
> >
> > Setting bit 31 in x8 when performing a syscall will do the following:
> > - The remainder of x8 is treated as a compat syscall number and is used
> >   to index the compat syscall table.
> > - in_compat_syscall will return true for the duration of the syscall.
> > - VM allocations performed by the syscall will be located in the lower
> >   4G of the address space.
> > - Interrupted syscalls are properly restarted as compat syscalls.
> > - Seccomp will treat the syscall as having AUDIT_ARCH_ARM instead of
> >   AUDIT_ARCH_AARCH64. This affects the arch value seen by seccomp
> >   filters and reported by SIGSYS.
> > - PTRACE_GET_SYSCALL_INFO also treats the syscall as having
> >   AUDIT_ARCH_ARM. Recent versions of strace will correctly report the
> >   system call name and parameters when an AArch64 task mixes 32-bit and
> >   64-bit syscalls.
> >
> > Previously, setting bit 31 of the syscall number would always cause the
> > syscall to return ENOSYS. This allows user programs to reliably detect
> > kernel support for compat syscalls by trying a simple syscall such as
> > getpid.
> >
> > The AArch32-private compat syscalls (__ARM_NR_compat_*) are not exposed
> > through this interface. These syscalls do not make sense in the context
> > of an AArch64 task.
> >
> > Signed-off-by: Amanieu d'Antras
> > Co-developed-by: Ryan Houdek
> > Signed-off-by: Ryan Houdek
>
> I'm still undecided about this approach. It is an easy way to expose the 32-bit
> ABIs, it mostly copies what x86-64 already does with 32-bit syscalls and
> it doesn't expose a lot of attack surface that isn't already exposed to normal
> 32-bit tasks running in compat mode.
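A minimal user-space sketch of the interface described in the patch above. It assumes a kernel with this proposed series applied. COMPAT_SYSCALL_BIT, ARM32_NR_GETPID, make_compat_x8(), decode_x8() and kernel_has_compat_syscalls() are hypothetical names made up for this sketch. 20 is the arm32 EABI number for getpid. decode_x8() only models the dispatch rule in user space and is not the kernel's code. The svc probe is compiled only on AArch64.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of x8 selects the compat (AArch32) syscall table, as described
 * in the patch. Macro and function names here are hypothetical. */
#define COMPAT_SYSCALL_BIT (UINT64_C(1) << 31)

/* arm32 EABI syscall number for getpid. */
#define ARM32_NR_GETPID 20

/* Build the x8 value that requests compat syscall `compat_nr`. */
uint64_t make_compat_x8(uint32_t compat_nr)
{
    assert(compat_nr < COMPAT_SYSCALL_BIT); /* number must not use bit 31 */
    return (uint64_t)compat_nr | COMPAT_SYSCALL_BIT;
}

/* User-space model of the dispatch rule: if bit 31 is set, store the rest
 * of x8 as the compat table index and return true; otherwise the value is
 * a native syscall number and we return false. */
bool decode_x8(uint64_t x8, uint64_t *compat_index)
{
    if (!(x8 & COMPAT_SYSCALL_BIT))
        return false;
    *compat_index = x8 & ~COMPAT_SYSCALL_BIT;
    return true;
}

#if defined(__aarch64__)
/* Detection probe: issue compat getpid with "svc #0". A kernel with the
 * series returns the PID; an arm64 kernel without it returns -ENOSYS. */
bool kernel_has_compat_syscalls(void)
{
    register uint64_t x8 __asm__("x8") = make_compat_x8(ARM32_NR_GETPID);
    register long x0 __asm__("x0");
    __asm__ volatile("svc #0" : "=r"(x0) : "r"(x8) : "memory");
    return x0 != -ENOSYS;
}
#endif
```

On an arm64 kernel without the series, the probe simply reports false. This is the detection property the patch description relies on, because setting bit 31 has always produced ENOSYS.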
>
> On the other hand, exposing the entire aarch32 syscall set seems both
> too broad and not broad enough: Half of the system calls behave the
> exact same way in native and compat mode, so they wouldn't need to
> be exposed like this, a lot of others are trivially emulated in user space
> by calling the native versions. The syscalls that are actually hard to do
> such as ioctl() or the signal handling will work for aarch32 emulation, but
> they are still insufficient to correctly emulate other 32-bit architectures
> that have a slightly different ABI. This means the interface is a fairly good
> fit for Tango, but much less so for FEX.

You are correct here. This fits Tango's use case perfectly, since the
syscalls match exactly along its aarch32->aarch64->compat syscall path.
For FEX's use case, we still need to deal with any data structure that
doesn't match across the 32-bit x86 to compat syscall boundary. While
x86->compat will require significantly fewer fixups than x86->aarch64,
there are still likely to be some structure differences that need fixing.

>
> It's also worth pointing out that this approach has a few things in common
> with Yury's ilp32 tree at https://github.com/norov/linux/tree/ilp32-5.2
> Unlike the x86 x32 mode, that port however does not allow calling compat
> syscalls from normal 64-bit tasks but rather keys the syscall entry point
> off the executable format, which wouldn't work here. It also uses the
> asm-generic system call numbers instead of the arm32 syscall numbers.
>
> I assume you have already considered or tried the alternative approach of
> only adding a minimal set of syscalls that are needed for the emulation.
> Having a way to limit the address space for mmap() and similar
> system calls sounds like a generally useful addition, and having an
> extended variant of ioctl() that lets you pick the target ABI (arm32, x86-32,
> ...) on supported drivers would probably be better for FEX.
> Can you explain the tradeoffs that led you towards duplicating the syscall
> entry points instead?

I'm not likely to be very concise here. Every route taken here has its
paper cuts. For me, this one is the best route because it future-proofs us
against any upcoming syscall additions.

If we instead took the path of duplicating a set of compat syscalls so that
they work from the 64-bit side, we would first need to start with the roughly
nine syscalls that are causing immediate problems: mmap/mmap2, mremap, shmat,
ioctl, recvmsg, recvmmsg, getdents, and getdents64.
So we carve those out, adding effectively the same memory handling code that
is being added here[1], and do the ML dance to get them upstream. Now we have
nine-ish syscalls that were added specifically for userspace compatibility
layers. That already has a bad smell.

Next, a couple of months down the line, someone adds a super cool syscall
that, say, allocates memory that is secure over InfiniBand and flushes to
persistence on hibernate. Neato. Oops: it allocates memory, and since FEX
tracks upstream kernel syscall support very closely, we now need to add yet
another syscall that handles the compat version from 64-bit space.
Or maybe the new syscall appends to a linked list of secure memory regions,
with only the head of the list passed in (hello, robust futexes).

See what I mean? Exposing the 32-bit compat syscalls removes the burden of
having to think about every syscall in the contexts of 32-bit, 64-bit, and
32-bit on 64-bit. It also removes the burden of me coming back to pester the
ML every single time with new patchsets that add syscalls only for compat
layers. And I'm all about removing unnecessary burden.

[1] The sidegrade option, personality flags, won't be pretty here. FEX lives
in a mixed syscall world and doesn't want only one or the other working.
FEX does a bunch of work in the background, and a personality flag would be
hard to work around whenever it needs to do memory allocations, file system
handling, or its own 64-bit ioctl handling. It's just not very versatile.
As a partial workaround here, FEX already allocates all of the 48/52-bit VA
space, which breaks ASLR and stack growth.

>
> Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel