From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75DD3C64EC4 for ; Fri, 3 Mar 2023 16:58:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 113EA6B0071; Fri, 3 Mar 2023 11:58:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0C3EB6B0075; Fri, 3 Mar 2023 11:58:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id ECD796B0078; Fri, 3 Mar 2023 11:58:06 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id DBCB46B0071 for ; Fri, 3 Mar 2023 11:58:06 -0500 (EST) Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 37262AB939 for ; Fri, 3 Mar 2023 16:58:06 +0000 (UTC) X-FDA: 80528194572.18.4959B42 Received: from mail-yb1-f169.google.com (mail-yb1-f169.google.com [209.85.219.169]) by imf25.hostedemail.com (Postfix) with ESMTP id 6B37CA0003 for ; Fri, 3 Mar 2023 16:58:04 +0000 (UTC) Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IDIBW0UN; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf25.hostedemail.com: domain of hjl.tools@gmail.com designates 209.85.219.169 as permitted sender) smtp.mailfrom=hjl.tools@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1677862684; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=; b=HjDyJ/9y5nSSRKrD4jitQKUv4SKvzM1Xgp/MBW6/CfC4n5ebS6SbuQ6aMb6PT7VkiXesSp NYWQjIV6sB1uzha7/WtWaSohZvIfpZQlBzruN0YEg5WOyDj21010mrrLQo+eg98JB95Rml 8Ftx8KcvGenKH50XRCbwGSxvgBdD3l4= ARC-Authentication-Results: i=1; imf25.hostedemail.com; dkim=pass header.d=gmail.com header.s=20210112 header.b=IDIBW0UN; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf25.hostedemail.com: domain of hjl.tools@gmail.com designates 209.85.219.169 as permitted sender) smtp.mailfrom=hjl.tools@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1677862684; a=rsa-sha256; cv=none; b=LGqX8JkvQD+AiOGEsv0PleiLc03yDAMBlnl9oLE7CZXwTQHfjBC9e+FQSz8UK4icLtLrym oHHTFiqif6awhT7zDXjuk/c8c61Sv9QY4UuYILzM+GKJNwbDxRLYAlL7Eh368cvfGkk0qn N628KNLpuppFzK+DdPLE9u7fRt0WwrM= Received: by mail-yb1-f169.google.com with SMTP id k199so2570085ybf.4 for ; Fri, 03 Mar 2023 08:58:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1677862683; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=; b=IDIBW0UNVpMI+IMCx1+mckdBWjg/FVAdJKROu+nBsttHf8pTuYRxWlw4sNGnlqQR5l 7WIyPrarL2OAHw3eugjv2bBaRATtrgVVY+6JMknIEirhKSPLNcEhhPNa7WHiZ2LpZ260 /SzgyM/a3KmbmSGl9Vo+MJfY9mQLHxSmn2Iil/Me5OusWN6U2DYkkp10BA9g9dTpQSO/ CWxVyJROAUMr++qVJPCqqdcFIgaP86Ysrw5vdbFlF8nf2oIfRk9GGkjmwINitq3+VFcm R6PaMqu3cjBYb9ZZlRdEIWH9NgDwNYjvOID4ClEtARGUOukOi+HIYQE2E1a5ZDVS3z2f z85g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1677862683; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=; b=HMBHviIM6Etp9hcpS/TqOCXXYPdhNn+n+0bxyh+RHb/OyNFohTdOX5LUcvlzUIo6pt ngluThh4kXEaPNU9B0dRE/j18jcZnvGTa+uQUyMUYbwRqpyq+s+1YYytxijrNiSojt/T /7ROgKkcRW5+ugLBWjOTN33ONlFdpJ1zaWqTVUIh8V7OHTlEnSSlaMeyelRAzKOUqF9g wR+bN0CtmVRlCWqk4DxuZgSdgOTPzMIXnPZlhR8c3PKxHGPYXivJiAWm+NKerv97YpSU NRbHPJigDUzaqLmGXQWEPjMluIZz5dxoU7bkNsaUyO9aECMGjP22Nkw+kShoxLuspGu+ yKfw== X-Gm-Message-State: AO0yUKV0Lh/FTOrr4RandVKnpWStIEwz7dkZ12kigb89LYPKLF93LV5O YJ0S4hRh5JZ7dLeATxSHOCU0AS7oFCmBQlVGi4A= X-Google-Smtp-Source: AK7set8CJEPgUHfm0NWAyZb3Ct1QVtkyvLCsvbeoZUIfZc1JR5vgiNci4fMnknHET2767/ewn501c8bkQgWEQc7mcyw= X-Received: by 2002:a05:6902:2ca:b0:8a3:d147:280b with SMTP id w10-20020a05690202ca00b008a3d147280bmr1316955ybh.3.1677862683371; Fri, 03 Mar 2023 08:58:03 -0800 (PST) MIME-Version: 1.0 References: <636de4a28a42a082f182e940fbd8e63ea23895cc.camel@intel.com> <8153f5d15ec6aa4a221fb945e16d315068bd06e4.camel@intel.com> In-Reply-To: From: "H.J. Lu" Date: Fri, 3 Mar 2023 08:57:27 -0800 Message-ID: Subject: Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description To: "szabolcs.nagy@arm.com" Cc: "Edgecombe, Rick P" , "david@redhat.com" , "bsingharora@gmail.com" , "hpa@zytor.com" , "Syromiatnikov, Eugene" , "peterz@infradead.org" , "rdunlap@infradead.org" , "keescook@chromium.org" , "Eranian, Stephane" , "kirill.shutemov@linux.intel.com" , "dave.hansen@linux.intel.com" , "linux-mm@kvack.org" , "fweimer@redhat.com" , "nadav.amit@gmail.com" , "jannh@google.com" , "dethoma@microsoft.com" , "broonie@kernel.org" , "kcc@google.com" , "linux-arch@vger.kernel.org" , "bp@alien8.de" , "oleg@redhat.com" , "Yang, Weijiang" , "Lutomirski, Andy" , "pavel@ucw.cz" , "arnd@arndb.de" , "tglx@linutronix.de" , "Schimpe, Christina" , "mike.kravetz@oracle.com" , "x86@kernel.org" , "linux-doc@vger.kernel.org" , "debug@rivosinc.com" , "jamorris@linux.microsoft.com" , "john.allen@amd.com" , "rppt@kernel.org" , "andrew.cooper3@citrix.com" , "mingo@redhat.com" , "corbet@lwn.net" , "linux-kernel@vger.kernel.org" , "linux-api@vger.kernel.org" , "gorcunov@gmail.com" , "akpm@linux-foundation.org" , "Yu, Yu-cheng" , "nd@arm.com" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 6B37CA0003 X-Rspamd-Server: rspam09 X-Rspam-User: X-Stat-Signature: dadopukfo6qh8fusdhka3g8zxqjsmka8 X-HE-Tag: 1677862684-205141 X-HE-Meta: U2FsdGVkX1+zgT5oPpnjVocoQ19FuN1pw8wc1en5MjSu0yhkyNy6Hk90TlWub99Q6kIfRo4ZtClB7Kv2WKptVhqIhHMhX0oSklO0Sl7Ca2qMxmJ6qmj9HcMY+WyxHdfQd4gWUHTFBP/irZQzhs09wIbFAo26rgCAasu/aC7MCcj2Ntd55j1eYMjvD3PaHn2GC5w2Sm1ilZSVWkDbi1yILR3TN/WFSGchDOY+saaW1r4LNJ61/xodrwvX5KI5jtuHpEhanU1ElqzGUj8v+9an/KyAANTOPeAHOGqmkB81dN2SV2lI4b7Wr7G7VdriSJasUC3px3BA7Mom4E7bgEPLEQ+WZNnbu1y3v5cc2H+XnQu30dsHkL4k8YJMeqV3eqlxQrp2x9HyoYvjfVZaettAdoKVxqadgEwRJsVl97kYhzXuxaSbrbKCwUNVCgd1do+P6E0YIr2bQAaHVc5fFDYuLf0b0Rr2RCjR8RDS2Dmc0n6F+Dz5ViF6RXprVVUOeuCtMVl6xfgoo6XTQqOYG0xEMz/hfWyd+cKjk1lhpTG/TZY5S602VDH8FcUyNlN59stCZ9K9cNUhUR/VkNdzmvJzRVaH6Uiv5ORSi2s0++ppikuYvT+3tnC0/XTpMiI0flPL0HrhWC2u8qZRX0xkSu4FQo8Nu6UM1d1sEtCprJyjM9dsyhqlxpgF393PyaPcPvZzqb7YcXHRyrVZ6XkEs00oEuS+xwg3p9h5lHwL/V099VkG6lD+KnkUoAKnxgcmOdQ9/ZF8DtenoUmFg3KkmlqaigqqO9EPjpkaPqz0dJhFRctqb/JmH4H/ox4aH2kZASUrLhk/5CmdJ5WBe3OwIT/YMYChWMJzrajlvRAKiSlBWdHlsYJ3fu47Id/WP2LexpJDFgfrEX9G1fqbMGO1imH74WL9QGivZvn+XKDXanwKSMh+p3H7p2GUkRu/ZMGq9fOmF4BeoCqhxntMZDcijc7 5WCDxTK1 uml1uAQUqyyFIrNnHe1MOIYnPXaqDPFgSnK31gyBhfYc8MSuwhaUJGOIx5l0VAsjX0JDXw+bQGH1FKYqTUR8doiCavdtpW14fG+9zlPqSBv6w/ntJsVYYmzg1BBmZ4tVIWh1GZI4p0YOx6pNPxVA5LQTGKbpHOApBQXH0I0YK5k/w1o3VSI9+16ZSHahkQCChj9L8VcCboboJr8g6ZzdVuRKM/0VNhiHkpSmfCIXjNAmcEM3gWIkSYUe5tZqnBNPaxtcv6kF1sxHjpLB+OL+qKRyocZwr/J1tzCXI/S8bcACNk+ZfV9V8o8RqBoUcH1BEp8pulJTuyW8NDiS4iLe05VJz1uGd/WZuEq+v9zZJlMBGkaywVPB0AhzulzCTc49B14oKbb8zfrCdB+n8YD559VYw7bGEFAbZ0YnA2czPTvLdWo+flD5oxSFW7AbCqYu1cbmR05XU8KWTROVZp/Lk2GlAb1y0X7RliSx42rn2BN7Rp7V2nETddtJOLB3C2jA24KelQQawKS+/AGcikMbuUPZUMouI5PjJe+WKUzg1dyI7HzzC8jwQ3YvQwrE6/BJ5vPcG X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Mar 3, 2023 at 8:31=E2=80=AFAM szabolcs.nagy@arm.com wrote: > > The 03/02/2023 21:17, Edgecombe, Rick P wrote: > > Is the idea that shadow stack would be forced on regardless of if the > > linked libraries support it? In which case it could be allowed to crash > > if they do not? > > execute a binary > - with shstk enabled and locked (only if marked?). > - with shstk disabled and locked. > could be managed in userspace, but it is libc dependent then. > > > > > > - I think it's better to have a new limit specifically for shadow > > > > > stack size (which by default can be RLIMIT_STACK) so userspace > > > > > can adjust it if needed (another reason is that stack size is > > > > > not always a good indicator of max call depth). > > > > Looking at this again, I'm not sure why a new rlimit is needed. It > > seems many of those points were just formulations of that the clone3 > > stack size was not used, but it actually is and just not documented. If > > you disagree perhaps you could elaborate on what the requirements are > > and we can see if it seems tricky to do in a follow up. > > - tiny thread stack and deep signal stack. > (note that this does not really work with glibc because it has > implementation internal signals that don't run on alt stack, > cannot be masked and don't fit on a tiny thread stack, but > with other runtimes this can be a valid use-case, e.g. musl > allows tiny thread stacks, < pagesize.) > > - thread runtimes with clone (glibc uses clone3 but some dont). > > - huge stacks but small call depth (problem if some va limit > is hit or memory overcommit is disabled). > > > > "sigaltshstk() is separate from sigaltstack(). You can have one > > > without the other, neither or both together. Because the shadow > > > stack specific state is pushed to the shadow stack, the two > > > features don=E2=80=99t need to know about each other." > ... > > > i don't see why automatic alt shadow stack allocation would > > > not work (kernel manages it transparently when an alt stack > > > is installed or disabled). > > > > Ah, I think I see where maybe I can fill you in. Andy Luto had > > discounted this idea out of hand originally, but I didn't see it at > > first. sigaltstack lets you set, retrieve, or disable the shadow stack, > > right... But this doesn't allocate anything, it just sets where the > > next signal will be handled. This is different than things like threads > > where there is a new resources being allocated and it makes coming up > > with logic to guess when to de-allocate the alt shadow stack difficult. > > You probably already know... > > > > But because of this there can be some modes where the shadow stack is > > changed while on it. For one example, SS_AUTODISARM will disable the > > alt shadow stack while switching to it and restore when sigreturning. > > At which point a new altstack can be set. In the non-shadow stack case > > this is nice because future signals won't clobber the alt stack if you > > switch away from it (swapcontext(), etc). But it also means you can > > "change" the alt stack while on it ("change" sort of, the auto disarm > > results in the kernel forgetting it temporarily). > > the problem with swapcontext is that it may unmask signals > that run on the alt stack, which means the code cannot jump > back after another signal clobbered the alt stack. > > the non-standard SS_AUTODISARM aims to solve this by disabling > alt stack settings on signal entry until the handler returns. > > so this use case is not about supporting swapcontext out, but > about jumping back. however that does not work reliably with > this patchset: if swapcontext goes to the thread stack (and > not to another stack e.g. used by makecontext), then jump back > fails. (and if there is a sigaltshstk installed then even jump > out fails.) > > assuming > - jump out from alt shadow stack can be made to work. > - alt shadow stack management can be automatic. > then this can be improved so jump back works reliably. > > > I hear where you are coming from with the desire to have it "just work" > > with existing code, but I think the resulting ABI around the alt shadow > > stack allocation lifecycle would be way too complicated even if it > > could be made to work. Hence making a new interface. But also, the idea > > was that the x86 signal ABI should support handling alt shadow stacks, > > which is what we have done with this series. If a different interface > > for configuring it is better than the one from the POC, I'm not seeing > > a problem jump out. Is there any specific concern about backwards > > compatibility here? > > sigaltstack syscall behaviour may be hard to change later > and currently > - shadow stack overflow cannot be recovered from. > - longjmp out of signal handler fails (with sigaltshstk). > - SS_AUTODISARM does not work (jump back can fail). > > > > "Since shadow alt stacks are a new feature, longjmp()ing from an > > > alt shadow stack will simply not be supported. If a libc want=E2=80= =99s > > > to support this it will need to enable WRSS and write it=E2=80=99s ow= n > > > restore token." > > > > > > i think longjmp should work without enabling writes to the shadow > > > stack in the libc. this can also affect unwinding across signal > > > handlers (not for c++ but e.g. glibc thread cancellation). > > > > glibc today does not support longjmp()ing from a different stack (for > > example even today after a swapcontext()) when shadow stack is used. If > > glibc used wrss it could be supported maybe, but otherwise I don't see > > how the HW can support it. > > > > HJ and I were actually just discussing this the other day. Are you > > looking at this series with respect to the arm shadow stack feature by > > any chance? I would love if glibc/tools would document what the shadow > > stack limitations are. If the all the arch's have the same or similar > > limitations perhaps this could be one developer guide. For the most > > part though, the limitations I've encountered are in glibc and the > > kernel is more the building blocks. > > well we hope that shadow stack behaviour and limitations can > be similar across targets. > > longjmp to different stack should work: it can do the same as > setcontext/swapcontext: scan for the pivot token. then only > longjmp out of alt shadow stack fails. (this is non-conforming > longjmp use, but e.g. qemu relies on it.) Restore token may not be used with longjmp. Unlike setcontext/swapcontext, longjmp is optional. If longjmp isn't called, there will be an extra token on shadow stack and RET will fail. > for longjmp out of alt shadow stack, the target shadow stack > needs a pivot token, which implies the kernel needs to push that > on signal entry, which can overflow. but i suspect that can be > handled the same way as stackoverflow on signal entry is handled. > > > A general comment. Not sure if you are aware, but this shadow stack > > enabling effort is quite old at this point and there have been many > > discussions on these topics stretching back years. The latest > > conversation was around getting this series into linux-next soon to get > > some testing on the MM pieces. I really appreciate getting this ABI > > feedback as it is always tricky to get right, but at this stage I would > > hope to be focusing mostly on concrete problems. > > > > I also expect to have some amount of ABI growth going forward with all > > the normal things that entails. Shadow stack is not special in that it > > can come fully finalized without the need for the real world usage > > iterative feedback process. At some point we need to move forward with > > something, and we have quite a bit of initial changes at this point. > > > > So I would like to minimize the initial implementation unless anyone > > sees any likely problems with future growth. Can you be clear if you > > see any concrete problems at this point or are more looking to evaluate > > the design reasoning? I'm under the assumption there is nothing that > > would prohibit linux-next testing while any ABI shakedown happens > > concurrently at least? > > understood. > > the points that i think are worth raising: > > - shadow stack size logic may need to change later. > (it can be too big, or too small in practice.) > - shadow stack overflow is not recoverable and the > possible fix for that (sigaltshstk) breaks longjmp > out of signal handlers. > - jump back after SS_AUTODISARM swapcontext cannot be > reliable if alt signal uses thread shadow stack. > - the above two concerns may be mitigated by different > sigaltstack behaviour which may be hard to add later. > - end token for backtrace may be useful, if added > later it can be hard to check. > > thanks. --=20 H.J.