From mboxrd@z Thu Jan 1 00:00:00 1970
From: Florent Revest
Date: Tue, 9 Aug 2022 19:03:52 +0200
Subject: Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
To: Xu Kuohai
Cc: Mark Rutland, bpf@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
 linux-kselftest@vger.kernel.org, Catalin Marinas, Will Deacon,
 Steven Rostedt, Ingo Molnar, Daniel Borkmann, Alexei Starovoitov,
 Zi Shen Lim, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, "David S. Miller", Hideaki YOSHIFUJI,
 David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen,
 x86@kernel.org, hpa@zytor.com, Shuah Khan, Jakub Kicinski,
 Jesper Dangaard Brouer, Pasha Tatashin, Ard Biesheuvel, Daniel Kiss,
 Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
 Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo,
 cj.chengjian@huawei.com, huawei.libin@huawei.com, xiexiuqi@huawei.com,
 liwei391@huawei.com
In-Reply-To: <55c1b9d6-1d53-9752-fb03-00f60ed15db7@huawei.com>
References: <0f8fe661-c450-ccd8-761f-dbfff449c533@huawei.com>
 <40fda0b0-0efc-ea1b-96d5-e51a4d1593dd@huawei.com>
 <55c1b9d6-1d53-9752-fb03-00f60ed15db7@huawei.com>
Content-Type: text/plain; charset="UTF-8"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 9, 2022 at 6:27 AM Xu Kuohai wrote:
> On 6/7/2022 12:35 AM, Mark Rutland wrote:
> > On Thu, May 26, 2022 at 10:48:05PM +0800, Xu Kuohai wrote:
> >> On 5/26/2022 6:06 PM, Mark Rutland wrote:
> >>> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
> >>>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
> >>>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
> >>>>>>
> >>>>> As noted in that thread, I have a few concerns which equally apply here:
> >>>>>
> >>>>> * Due to the limited range of BL instructions, it's not always possible to
> >>>>>   patch an ftrace call-site to branch to an arbitrary trampoline. The way this
> >>>>>   works for ftrace today relies upon knowing the set of trampolines at
> >>>>>   compile-time, and allocating module PLTs for those, and that approach cannot
> >>>>>   work reliably for dynamically allocated trampolines.
> >>>>
> >>>> Currently patch 5 returns -ENOTSUPP when a long jump is detected, so no
> >>>> bpf trampoline is constructed for an out of range patch-site:
> >>>>
> >>>>     if (is_long_jump(orig_call, image))
> >>>>             return -ENOTSUPP;
> >>>
> >>> Sure, my point is that in practice that means that (from the user's PoV) this
> >>> may randomly fail to work, and I'd like something that we can ensure works
> >>> consistently.
> >>>
> >>
> >> OK, should I suspend this work until you finish refactoring ftrace?
> >
> > Yes; I'd appreciate it if we could hold on this for a bit.
> >
> > I think with some ground work we can avoid most of the painful edge cases and
> > might be able to avoid the need for custom trampolines.
> >
>
> I've read your WIP code, but unfortunately I didn't find any mechanism to
> replace the bpf trampoline in your code, sorry.
>
> It looks like the bpf trampoline and ftrace work can be done at the same
> time. I think for now we can just attach the bpf trampoline to the bpf prog.
> Once your ftrace work is done, we can add support for attaching the bpf
> trampoline to regular kernel functions. Is this OK?

Hey Mark and Xu! :)

I'm interested in this feature too and would be happy to help. I've been
trying to understand what you both have in mind to figure out a way
forward, please correct me if I got anything wrong! :)

It looks like, currently, there are three places where an indirection to
BPF is technically possible. Chronologically these are:

- the function's patchsite (currently there are 2 nops, this could become
  4 nops with Mark's series on per call-site ops)

- the ftrace ops (currently called by iterating over a global list but
  could be called more directly with Mark's series on per-call-site ops,
  or by dynamically generated branches with Wang's series on dynamic
  trampolines)

- a ftrace trampoline tail call (currently, this is after restoring a full
  pt_regs but this could become an args-only restoration with Mark's
  series on DYNAMIC_FTRACE_WITH_ARGS)
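
For context, the hook at the second of these points is an ordinary
ftrace_ops callback. A rough sketch of what a BPF dispatcher op could look
like if it went through that interface (the op and its names are
hypothetical; only the ftrace_func_t prototype and the registration calls
in the comment are existing API):

    #include <linux/ftrace.h>

    /* Hypothetical op: called by the ftrace trampoline with the traced
     * function's address in @ip and its register state in @fregs.  It
     * would have to re-marshal the argument registers from @fregs into a
     * BPF ctx array before running the attached program(s). */
    static void bpf_dispatch_func(unsigned long ip, unsigned long parent_ip,
                                  struct ftrace_ops *op,
                                  struct ftrace_regs *fregs)
    {
            /* copy args out of @fregs into a u64 ctx[], then run the
             * attached BPF program(s) on that ctx */
    }

    static struct ftrace_ops bpf_dispatch_ops = {
            .func = bpf_dispatch_func,
    };

    /* Attachment to a single function would then be roughly:
     *     ftrace_set_filter_ip(&bpf_dispatch_ops, ip, 0, 0);
     *     register_ftrace_function(&bpf_dispatch_ops);
     */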

If we first consider the situation when only a BPF program is attached to
a kernel function:

- Using the patchsite for indirection (proposed by Xu, same as on x86)
  Pros:
  - We have BPF trampolines anyway because they are required for
    orthogonal features such as calling BPF programs as functions, so
    jumping into that existing JITed code is straightforward
  - This has the minimum overhead (eg: these trampolines only save the
    actual number of args used by the function in ctx and avoid indirect
    calls)
  Cons:
  - If the BPF trampoline is JITed outside BL's limits, attachment can
    randomly fail

- Using a ftrace op for indirection (proposed by Mark)
  Pros:
  - BPF doesn't need to care about BL's range, ftrace_caller will be in
    range
  Cons:
  - The ftrace trampoline would first save all args in an ftrace_regs only
    for the BPF op to then re-save them in a BPF ctx array (as per the BPF
    calling convention), so we'd effectively do the work of saving args
    twice
  - BPF currently uses the DYNAMIC_FTRACE_WITH_DIRECT_CALLS APIs. Either
    arm64 should implement DIRECT_CALLS with... an indirect call :) (that
    is, the arch_ftrace_set_direct_caller op would turn its ftrace_regs
    back into arguments for the BPF trampoline), or BPF would need to use
    a different ftrace API just on arm64 (to define new ops which, unless
    they were dynamically JITed, wouldn't be as performant as the existing
    BPF trampolines)

- Using a ftrace trampoline tail call for indirection (not discussed yet
  iiuc)
  Pros:
  - BPF also doesn't need to care about BL's range
  - This also leverages the existing BPF trampolines
  Cons:
  - This also does the work of saving/restoring arguments twice
  - DYNAMIC_FTRACE_WITH_DIRECT_CALLS currently depends on
    DYNAMIC_FTRACE_WITH_REGS, although in practice the registers kept by
    DYNAMIC_FTRACE_WITH_ARGS should be enough to call BPF trampolines

If we consider the situation when both ftrace ops and BPF programs are
attached to a kernel function:

- Using the patchsite for indirection can't solve this
- Using a ftrace op for indirection (proposed by Mark) or using a ftrace
  trampoline tail call as an indirection (proposed by Xu, same as on x86)
  have the same pros & cons as in the BPF-only situation, except that this
  time we pay the cost of saving registers twice for good reasons (we need
  args in both the ftrace_regs and the BPF ctx array formats anyway)

Unless I'm missing something, it sounds like the following approach would
work (sketched just after this list):

- Always patch patchsites with calls to ftrace trampolines (within BL
  ranges)
- Always go through ops and have arch_ftrace_set_direct_caller set
  ftrace_regs->direct_call (instead of pt_regs->orig_x0 in this patch)
- If ftrace_regs->direct_call != 0 at the end of the ftrace trampoline,
  tail call it
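
Concretely, the second and third steps could look roughly like the sketch
below (e.g. in arch/arm64/include/asm/ftrace.h). The direct_call field is
an assumption of mine that mirrors what this patch does with
pt_regs->orig_x0; it is not an existing member:

    #include <linux/ftrace.h>

    /* Sketch only: assumes arm64's struct ftrace_regs gains a
     * "direct_call" member.  A DIRECT_CALLS user (the BPF trampoline
     * attach code) would have its ops callback record the trampoline's
     * address here instead of in pt_regs->orig_x0. */
    static inline void arch_ftrace_set_direct_caller(struct ftrace_regs *fregs,
                                                     unsigned long addr)
    {
            fregs->direct_call = addr;
    }

    /* The (assembly) ftrace_caller trampoline would then finish with the
     * equivalent of:
     *
     *     if (fregs->direct_call != 0)
     *             restore the args and tail call fregs->direct_call;
     *     else
     *             return to the traced function as usual;
     */

That way only ftrace_caller has to be reachable with a BL from every
patchsite, and the BPF trampoline itself can live anywhere.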

Once Mark's series on DYNAMIC_FTRACE_WITH_ARGS is merged, we would need to
have DYNAMIC_FTRACE_WITH_DIRECT_CALLS depend on DYNAMIC_FTRACE_WITH_REGS
|| DYNAMIC_FTRACE_WITH_ARGS. BPF trampolines (the only users of this API
now) only care about args to the attachment point anyway, so I think this
would work transparently?

Once Mark's series on per-callsite ops is merged, the second step (going
through ops) would be significantly faster in the situation where only one
program is used, therefore only one arch_ftrace_set_direct_caller op.

Once Wang's series on dynamic trampolines is merged, the second step
(going through ops) would also be significantly faster in the case when
multiple ops are attached.

What are your thoughts? If this sounds somewhat sane, I'm happy to help
out with the implementation as well :)

Thanks!
Florent