From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 75DD3C64EC4
	for <linux-mm@archiver.kernel.org>; Fri,  3 Mar 2023 16:58:07 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 113EA6B0071; Fri,  3 Mar 2023 11:58:07 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 0C3EB6B0075; Fri,  3 Mar 2023 11:58:07 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id ECD796B0078; Fri,  3 Mar 2023 11:58:06 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12])
	by kanga.kvack.org (Postfix) with ESMTP id DBCB46B0071
	for <linux-mm@kvack.org>; Fri,  3 Mar 2023 11:58:06 -0500 (EST)
Received: from smtpin18.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay06.hostedemail.com (Postfix) with ESMTP id 37262AB939
	for <linux-mm@kvack.org>; Fri,  3 Mar 2023 16:58:06 +0000 (UTC)
X-FDA: 80528194572.18.4959B42
Received: from mail-yb1-f169.google.com (mail-yb1-f169.google.com [209.85.219.169])
	by imf25.hostedemail.com (Postfix) with ESMTP id 6B37CA0003
	for <linux-mm@kvack.org>; Fri,  3 Mar 2023 16:58:04 +0000 (UTC)
Authentication-Results: imf25.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=IDIBW0UN;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf25.hostedemail.com: domain of hjl.tools@gmail.com designates 209.85.219.169 as permitted sender) smtp.mailfrom=hjl.tools@gmail.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1677862684;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=;
	b=HjDyJ/9y5nSSRKrD4jitQKUv4SKvzM1Xgp/MBW6/CfC4n5ebS6SbuQ6aMb6PT7VkiXesSp
	NYWQjIV6sB1uzha7/WtWaSohZvIfpZQlBzruN0YEg5WOyDj21010mrrLQo+eg98JB95Rml
	8Ftx8KcvGenKH50XRCbwGSxvgBdD3l4=
ARC-Authentication-Results: i=1;
	imf25.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20210112 header.b=IDIBW0UN;
	dmarc=pass (policy=none) header.from=gmail.com;
	spf=pass (imf25.hostedemail.com: domain of hjl.tools@gmail.com designates 209.85.219.169 as permitted sender) smtp.mailfrom=hjl.tools@gmail.com
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1677862684; a=rsa-sha256;
	cv=none;
	b=LGqX8JkvQD+AiOGEsv0PleiLc03yDAMBlnl9oLE7CZXwTQHfjBC9e+FQSz8UK4icLtLrym
	oHHTFiqif6awhT7zDXjuk/c8c61Sv9QY4UuYILzM+GKJNwbDxRLYAlL7Eh368cvfGkk0qn
	N628KNLpuppFzK+DdPLE9u7fRt0WwrM=
Received: by mail-yb1-f169.google.com with SMTP id k199so2570085ybf.4
        for <linux-mm@kvack.org>; Fri, 03 Mar 2023 08:58:04 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112; t=1677862683;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=;
        b=IDIBW0UNVpMI+IMCx1+mckdBWjg/FVAdJKROu+nBsttHf8pTuYRxWlw4sNGnlqQR5l
         7WIyPrarL2OAHw3eugjv2bBaRATtrgVVY+6JMknIEirhKSPLNcEhhPNa7WHiZ2LpZ260
         /SzgyM/a3KmbmSGl9Vo+MJfY9mQLHxSmn2Iil/Me5OusWN6U2DYkkp10BA9g9dTpQSO/
         CWxVyJROAUMr++qVJPCqqdcFIgaP86Ysrw5vdbFlF8nf2oIfRk9GGkjmwINitq3+VFcm
         R6PaMqu3cjBYb9ZZlRdEIWH9NgDwNYjvOID4ClEtARGUOukOi+HIYQE2E1a5ZDVS3z2f
         z85g==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112; t=1677862683;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=tURdhw498bOHnYQo1Wbcyg/CVD2GUHQ/ilVefn7+ZzY=;
        b=HMBHviIM6Etp9hcpS/TqOCXXYPdhNn+n+0bxyh+RHb/OyNFohTdOX5LUcvlzUIo6pt
         ngluThh4kXEaPNU9B0dRE/j18jcZnvGTa+uQUyMUYbwRqpyq+s+1YYytxijrNiSojt/T
         /7ROgKkcRW5+ugLBWjOTN33ONlFdpJ1zaWqTVUIh8V7OHTlEnSSlaMeyelRAzKOUqF9g
         wR+bN0CtmVRlCWqk4DxuZgSdgOTPzMIXnPZlhR8c3PKxHGPYXivJiAWm+NKerv97YpSU
         NRbHPJigDUzaqLmGXQWEPjMluIZz5dxoU7bkNsaUyO9aECMGjP22Nkw+kShoxLuspGu+
         yKfw==
X-Gm-Message-State: AO0yUKV0Lh/FTOrr4RandVKnpWStIEwz7dkZ12kigb89LYPKLF93LV5O
	YJ0S4hRh5JZ7dLeATxSHOCU0AS7oFCmBQlVGi4A=
X-Google-Smtp-Source: AK7set8CJEPgUHfm0NWAyZb3Ct1QVtkyvLCsvbeoZUIfZc1JR5vgiNci4fMnknHET2767/ewn501c8bkQgWEQc7mcyw=
X-Received: by 2002:a05:6902:2ca:b0:8a3:d147:280b with SMTP id
 w10-20020a05690202ca00b008a3d147280bmr1316955ybh.3.1677862683371; Fri, 03 Mar
 2023 08:58:03 -0800 (PST)
MIME-Version: 1.0
References: <Y/9fdYQ8Cd0GI+8C@arm.com> <636de4a28a42a082f182e940fbd8e63ea23895cc.camel@intel.com>
 <ZADLZJI1W1PCJf5t@arm.com> <8153f5d15ec6aa4a221fb945e16d315068bd06e4.camel@intel.com>
 <ZAIgrXQ4670gxlE4@arm.com>
In-Reply-To: <ZAIgrXQ4670gxlE4@arm.com>
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 3 Mar 2023 08:57:27 -0800
Message-ID: <CAMe9rOrM=HXBY25rYrjLnHzSvHFuui06qRpc4xufxeaaGW-Fmw@mail.gmail.com>
Subject: Re: [PATCH v7 01/41] Documentation/x86: Add CET shadow stack description
To: "szabolcs.nagy@arm.com" <szabolcs.nagy@arm.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>, "david@redhat.com" <david@redhat.com>, 
	"bsingharora@gmail.com" <bsingharora@gmail.com>, "hpa@zytor.com" <hpa@zytor.com>, 
	"Syromiatnikov, Eugene" <esyr@redhat.com>, "peterz@infradead.org" <peterz@infradead.org>, 
	"rdunlap@infradead.org" <rdunlap@infradead.org>, "keescook@chromium.org" <keescook@chromium.org>, 
	"Eranian, Stephane" <eranian@google.com>, 
	"kirill.shutemov@linux.intel.com" <kirill.shutemov@linux.intel.com>, 
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>, "linux-mm@kvack.org" <linux-mm@kvack.org>, 
	"fweimer@redhat.com" <fweimer@redhat.com>, "nadav.amit@gmail.com" <nadav.amit@gmail.com>, 
	"jannh@google.com" <jannh@google.com>, "dethoma@microsoft.com" <dethoma@microsoft.com>, 
	"broonie@kernel.org" <broonie@kernel.org>, "kcc@google.com" <kcc@google.com>, 
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>, "bp@alien8.de" <bp@alien8.de>, 
	"oleg@redhat.com" <oleg@redhat.com>, "Yang, Weijiang" <weijiang.yang@intel.com>, 
	"Lutomirski, Andy" <luto@kernel.org>, "pavel@ucw.cz" <pavel@ucw.cz>, "arnd@arndb.de" <arnd@arndb.de>, 
	"tglx@linutronix.de" <tglx@linutronix.de>, "Schimpe, Christina" <christina.schimpe@intel.com>, 
	"mike.kravetz@oracle.com" <mike.kravetz@oracle.com>, "x86@kernel.org" <x86@kernel.org>, 
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>, "debug@rivosinc.com" <debug@rivosinc.com>, 
	"jamorris@linux.microsoft.com" <jamorris@linux.microsoft.com>, "john.allen@amd.com" <john.allen@amd.com>, 
	"rppt@kernel.org" <rppt@kernel.org>, "andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>, 
	"mingo@redhat.com" <mingo@redhat.com>, "corbet@lwn.net" <corbet@lwn.net>, 
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, 
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>, "gorcunov@gmail.com" <gorcunov@gmail.com>, 
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>, "Yu, Yu-cheng" <yu-cheng.yu@intel.com>, 
	"nd@arm.com" <nd@arm.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 6B37CA0003
X-Rspamd-Server: rspam09
X-Rspam-User: 
X-Stat-Signature: dadopukfo6qh8fusdhka3g8zxqjsmka8
X-HE-Tag: 1677862684-205141
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, Mar 3, 2023 at 8:31 AM szabolcs.nagy@arm.com
<szabolcs.nagy@arm.com> wrote:
>
> The 03/02/2023 21:17, Edgecombe, Rick P wrote:
> > Is the idea that shadow stack would be forced on regardless of if the
> > linked libraries support it? In which case it could be allowed to crash
> > if they do not?
>
> execute a binary
> - with shstk enabled and locked (only if marked?).
> - with shstk disabled and locked.
> could be managed in userspace, but it is libc dependent then.
>
> > > > > - I think it's better to have a new limit specifically for shadow
> > > > >   stack size (which by default can be RLIMIT_STACK) so userspace
> > > > >   can adjust it if needed (another reason is that stack size is
> > > > >   not always a good indicator of max call depth).
> >
> > Looking at this again, I'm not sure why a new rlimit is needed. It
> > seems many of those points were just formulations of that the clone3
> > stack size was not used, but it actually is and just not documented. If
> > you disagree perhaps you could elaborate on what the requirements are
> > and we can see if it seems tricky to do in a follow up.
>
> - tiny thread stack and deep signal stack.
> (note that this does not really work with glibc because it has
> implementation internal signals that don't run on alt stack,
> cannot be masked and don't fit on a tiny thread stack, but
> with other runtimes this can be a valid use-case, e.g. musl
> allows tiny thread stacks, < pagesize.)
>
> - thread runtimes with clone (glibc uses clone3 but some dont).
>
> - huge stacks but small call depth (problem if some va limit
>   is hit or memory overcommit is disabled).
>
> > > "sigaltshstk() is separate from sigaltstack(). You can have one
> > > without the other, neither or both together. Because the shadow
> > > stack specific state is pushed to the shadow stack, the two
> > > features don’t need to know about each other."
> ...
> > > i don't see why automatic alt shadow stack allocation would
> > > not work (kernel manages it transparently when an alt stack
> > > is installed or disabled).
> >
> > Ah, I think I see where maybe I can fill you in. Andy Luto had
> > discounted this idea out of hand originally, but I didn't see it at
> > first. sigaltstack lets you set, retrieve, or disable the shadow stack,
> > right... But this doesn't allocate anything, it just sets where the
> > next signal will be handled. This is different than things like threads
> > where there is a new resources being allocated and it makes coming up
> > with logic to guess when to de-allocate the alt shadow stack difficult.
> > You probably already know...
> >
> > But because of this there can be some modes where the shadow stack is
> > changed while on it. For one example, SS_AUTODISARM will disable the
> > alt shadow stack while switching to it and restore when sigreturning.
> > At which point a new altstack can be set. In the non-shadow stack case
> > this is nice because future signals won't clobber the alt stack if you
> > switch away from it (swapcontext(), etc). But it also means you can
> > "change" the alt stack while on it ("change" sort of, the auto disarm
> > results in the kernel forgetting it temporarily).
>
> the problem with swapcontext is that it may unmask signals
> that run on the alt stack, which means the code cannot jump
> back after another signal clobbered the alt stack.
>
> the non-standard SS_AUTODISARM aims to solve this by disabling
> alt stack settings on signal entry until the handler returns.
>
> so this use case is not about supporting swapcontext out, but
> about jumping back. however that does not work reliably with
> this patchset: if swapcontext goes to the thread stack (and
> not to another stack e.g. used by makecontext), then jump back
> fails. (and if there is a sigaltshstk installed then even jump
> out fails.)
>
> assuming
> - jump out from alt shadow stack can be made to work.
> - alt shadow stack management can be automatic.
> then this can be improved so jump back works reliably.
>
> > I hear where you are coming from with the desire to have it "just work"
> > with existing code, but I think the resulting ABI around the alt shadow
> > stack allocation lifecycle would be way too complicated even if it
> > could be made to work. Hence making a new interface. But also, the idea
> > was that the x86 signal ABI should support handling alt shadow stacks,
> > which is what we have done with this series. If a different interface
> > for configuring it is better than the one from the POC, I'm not seeing
> > a problem jump out. Is there any specific concern about backwards
> > compatibility here?
>
> sigaltstack syscall behaviour may be hard to change later
> and currently
> - shadow stack overflow cannot be recovered from.
> - longjmp out of signal handler fails (with sigaltshstk).
> - SS_AUTODISARM does not work (jump back can fail).
>
> > > "Since shadow alt stacks are a new feature, longjmp()ing from an
> > > alt shadow stack will simply not be supported. If a libc want’s
> > > to support this it will need to enable WRSS and write it’s own
> > > restore token."
> > >
> > > i think longjmp should work without enabling writes to the shadow
> > > stack in the libc. this can also affect unwinding across signal
> > > handlers (not for c++ but e.g. glibc thread cancellation).
> >
> > glibc today does not support longjmp()ing from a different stack (for
> > example even today after a swapcontext()) when shadow stack is used. If
> > glibc used wrss it could be supported maybe, but otherwise I don't see
> > how the HW can support it.
> >
> > HJ and I were actually just discussing this the other day. Are you
> > looking at this series with respect to the arm shadow stack feature by
> > any chance? I would love if glibc/tools would document what the shadow
> > stack limitations are. If the all the arch's have the same or similar
> > limitations perhaps this could be one developer guide. For the most
> > part though, the limitations I've encountered are in glibc and the
> > kernel is more the building blocks.
>
> well we hope that shadow stack behaviour and limitations can
> be similar across targets.
>
> longjmp to different stack should work: it can do the same as
> setcontext/swapcontext: scan for the pivot token. then only
> longjmp out of alt shadow stack fails. (this is non-conforming
> longjmp use, but e.g. qemu relies on it.)

A restore token may not be usable with longjmp.  Unlike setcontext/swapcontext,
longjmp is optional.  If longjmp isn't called, the extra token stays on the
shadow stack and RET will fail.
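
To make the failure mode concrete, here is a toy model, not real CET
semantics.  It assumes a hypothetical setjmp that leaves a token on the
shadow stack; the toy_* names and the token encoding are made up for
illustration only.  RET is modeled as popping the top shadow stack entry
and comparing it with the return address from the normal stack.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SHSTK_DEPTH 16

/* Toy shadow stack: an array of 64-bit slots plus a count of used slots. */
struct toy_shstk {
	uint64_t slot[TOY_SHSTK_DEPTH];
	size_t top;
};

/* CALL: record the return address on the shadow stack. */
static void toy_call(struct toy_shstk *s, uint64_t ret_addr)
{
	s->slot[s->top++] = ret_addr;
}

/*
 * Hypothetical setjmp side effect: leave a token on the shadow stack.
 * Bit 0 is set so the made-up token can never equal the aligned
 * return address used below.
 */
static void toy_push_token(struct toy_shstk *s)
{
	uint64_t token = ((uint64_t)s->top << 3) | 1;

	s->slot[s->top++] = token;
}

/* RET: pop the top entry; it must match the normal stack's return address. */
static bool toy_ret(struct toy_shstk *s, uint64_t ret_addr)
{
	if (s->top == 0)
		return false;
	return s->slot[--s->top] == ret_addr;
}

/*
 * A function calls setjmp, never calls longjmp, and returns normally.
 * Returns true if the modeled RET succeeds.
 */
static bool toy_return_after_setjmp(bool setjmp_pushes_token)
{
	struct toy_shstk s = { .top = 0 };
	const uint64_t caller = 0x401000;

	toy_call(&s, caller);
	if (setjmp_pushes_token)
		toy_push_token(&s);
	/* longjmp is never called, so nothing consumes the token. */
	return toy_ret(&s, caller);
}
```

In this model toy_return_after_setjmp(false) succeeds, while
toy_return_after_setjmp(true) fails because RET pops the leftover token
instead of the return address.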

> for longjmp out of alt shadow stack, the target shadow stack
> needs a pivot token, which implies the kernel needs to push that
> on signal entry, which can overflow. but i suspect that can be
> handled the same way as stackoverflow on signal entry is handled.
>
> > A general comment. Not sure if you are aware, but this shadow stack
> > enabling effort is quite old at this point and there have been many
> > discussions on these topics stretching back years. The latest
> > conversation was around getting this series into linux-next soon to get
> > some testing on the MM pieces. I really appreciate getting this ABI
> > feedback as it is always tricky to get right, but at this stage I would
> > hope to be focusing mostly on concrete problems.
> >
> > I also expect to have some amount of ABI growth going forward with all
> > the normal things that entails. Shadow stack is not special in that it
> > can come fully finalized without the need for the real world usage
> > iterative feedback process. At some point we need to move forward with
> > something, and we have quite a bit of initial changes at this point.
> >
> > So I would like to minimize the initial implementation unless anyone
> > sees any likely problems with future growth. Can you be clear if you
> > see any concrete problems at this point or are more looking to evaluate
> > the design reasoning? I'm under the assumption there is nothing that
> > would prohibit linux-next testing while any ABI shakedown happens
> > concurrently at least?
>
> understood.
>
> the points that i think are worth raising:
>
> - shadow stack size logic may need to change later.
>   (it can be too big, or too small in practice.)
> - shadow stack overflow is not recoverable and the
>   possible fix for that (sigaltshstk) breaks longjmp
>   out of signal handlers.
> - jump back after SS_AUTODISARM swapcontext cannot be
>   reliable if alt signal uses thread shadow stack.
> - the above two concerns may be mitigated by different
>   sigaltstack behaviour which may be hard to add later.
> - end token for backtrace may be useful, if added
>   later it can be hard to check.
>
> thanks.


-- 
H.J.