From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 4 Mar 2021 18:51:48 +0000
From: Mark Rutland
To: Marco Elver
Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
    Michael Ellerman, LKML, linuxppc-dev@lists.ozlabs.org, kasan-dev,
    Catalin Marinas, Will Deacon, Linux ARM, broonie@kernel.org,
    linux-toolchains@vger.kernel.org
Subject: Re: [PATCH v1] powerpc: Include running function as first entry
    in save_stack_trace() and friends
Message-ID: <20210304185148.GE60457@C02TD0UTHF1T.local>
References: <1802be3e-dc1a-52e0-1754-a40f0ea39658@csgroup.eu>
    <20210304145730.GC54534@C02TD0UTHF1T.local>
    <20210304165923.GA60457@C02TD0UTHF1T.local>
    <20210304180154.GD60457@C02TD0UTHF1T.local>

On Thu, Mar 04, 2021 at 07:22:53PM +0100, Marco Elver wrote:
> On Thu, 4 Mar 2021 at 19:02, Mark Rutland wrote:
> > On Thu, Mar 04, 2021 at 06:25:33PM +0100, Marco Elver wrote:
> > > On Thu, Mar 04, 2021 at 04:59PM +0000, Mark Rutland wrote:
> > > > On Thu, Mar 04, 2021 at 04:30:34PM +0100, Marco Elver wrote:
> > > > > On Thu, 4 Mar 2021 at 15:57, Mark Rutland wrote:
> > > > > > [adding Mark Brown]
> > > > > >
> > > > > > The bigger problem here is that skipping is dodgy to begin with, and
> > > > > > this is still liable to break in some cases. One big concern is that
> > > > > > (especially with LTO) we cannot guarantee the compiler will not inline
> > > > > > or outline functions, causing the skip value to be too large or too
> > > > > > small. That's liable to happen to callers, and in theory (though
> > > > > > unlikely in practice), portions of arch_stack_walk() or
> > > > > > stack_trace_save() could get outlined too.
> > > > > >
> > > > > > Unless we can get some strong guarantees from compiler folk such that we
> > > > > > can guarantee a specific function acts as a boundary for unwinding (and
> > > > > > doesn't itself get split, etc), the only reliable way I can think of to
> > > > > > solve this requires an assembly trampoline. Whatever we do is liable to
> > > > > > need some invasive rework.
> > > > >
> > > > > Will LTO and friends respect 'noinline'?
> > > >
> > > > I hope so (and suspect we'd have more problems otherwise), but I don't
> > > > know whether they actually do so.
> > > >
> > > > I suspect even with 'noinline' the compiler is permitted to outline
> > > > portions of a function if it wanted to (and IIUC it could still make
> > > > specialized copies in the absence of 'noclone').
> > > > > One thing I also noticed is that tail calls would also cause the stack
> > > > > trace to appear somewhat incomplete (for some of my tests I've
> > > > > disabled tail call optimizations).
> > > >
> > > > I assume you mean for a chain A->B->C where B tail-calls C, you get a
> > > > trace A->C? ... or is A going missing too?
> > >
> > > Correct, it's just the A->C outcome.
> >
> > I'd assumed that those cases were benign, e.g. for livepatching what
> > matters is what can be returned to, so B disappearing from the trace
> > isn't a problem there.
> >
> > Is the concern debuggability, or is there a functional issue you have in
> > mind?
>
> For me, it's just been debuggability, and reliable test cases.
>
> > > > > Is there a way to also mark a function non-tail-callable?
> > > >
> > > > I think this can be bodged using __attribute__((optimize("$OPTIONS")))
> > > > on a caller to inhibit TCO (though IIRC GCC doesn't reliably support
> > > > function-local optimization options), but I don't expect there's any way
> > > > to mark a callee as not being tail-callable.
> > >
> > > I don't think this is reliable. It'd be
> > > __attribute__((optimize("-fno-optimize-sibling-calls"))), but it doesn't
> > > work if applied to the function we do not want to tail-call-optimize;
> > > it would have to be applied to the function that does the tail-calling.
> >
> > Yup; that's what I meant when I said you could do that on the caller but
> > not the callee.
> >
> > I don't follow why you'd want to put this on the callee, though, so I
> > think I'm missing something. Considering a set of functions in different
> > compilation units:
> >
> >   A->B->C->D->E->F->G->H->I->J->K
>
> I was having this problem with KCSAN, where the compiler would
> tail-call-optimize __tsan_X instrumentation.

Those are compiler-generated calls, right?
When those are generated the compilation unit (and whatever it has
included) might not have provided a prototype anyway, and the compiler
has special knowledge of the functions, so it feels like the compiler
would need to inhibit TCO here for this to be robust. For their intended
usage, subjecting them to TCO doesn't seem to make sense AFAICT.

I suspect that compilers have some way of handling that; otherwise I'd
expect to have heard stories of mcount/fentry calls getting TCO'd and
causing problems. So maybe there's an easy fix there?

> This would mean that KCSAN runtime functions ended up in the trace,
> but the function where the access happened would not. However, I don't
> care about the runtime functions, and instead want to see the function
> where the access happened. In that case, I'd like to just mark
> __tsan_X and any other KCSAN instrumentation functions as
> do-not-tail-call-optimize, which would solve the problem.

I understand why we don't want to TCO these calls, but given that the
calls are implicitly generated, I strongly suspect it's better to fix
the implicit call generation not to be TCO'd to begin with.

> The solution today is that when you compile a kernel with KCSAN, every
> instrumented TU is compiled with -fno-optimize-sibling-calls. The
> better solution would be to just mark KCSAN runtime functions somehow,
> but permit tail-calling other things. Although, I probably still want
> to see the full trace, and would decide that having
> -fno-optimize-sibling-calls is a small price to pay in a
> debug-only kernel to get complete traces.
>
> > ... if K were marked in this way, and J was compiled with visibility of
> > this, J would stick around, but J's callers might not, and so the trace
> > might see:
> >
> >   A->J->K
> >
> > ... do you just care about the final caller, i.e. you just need
> > certainty that J will be in the trace?
>
> Yes. But maybe it's a special problem that only sanitizers have.
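[The per-TU arrangement described above corresponds roughly to the build
fragment below; this is a sketch of the idea only, and the kernel's
actual KCSAN Makefile machinery differs in detail:]

```make
# Sketch: build every KCSAN-instrumented translation unit with
# sibling-call (tail-call) optimization disabled, so the instrumented
# caller keeps its stack frame and appears in stack traces.
CFLAGS_KCSAN := -fsanitize=thread -fno-optimize-sibling-calls

%.o: %.c
	$(CC) $(CFLAGS) $(CFLAGS_KCSAN) -c -o $@ $<
```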
I reckon for basically any instrumentation we don't want calls to be
TCO'd, though I'm not immediately sure of cases beyond sanitizers and
mcount/fentry.

Thanks,
Mark.