Subject: Re: [PATCH bpf-next v5 1/6] arm64: ftrace: Add ftrace direct call support
From: Xu Kuohai
To: Mark Rutland
Cc: Catalin Marinas, Will Deacon, Steven Rostedt, Ingo Molnar, Daniel Borkmann,
    Alexei Starovoitov, Zi Shen Lim, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
    Yonghong Song, John Fastabend, KP Singh, "David S. Miller", Hideaki YOSHIFUJI,
    David Ahern, Thomas Gleixner, Borislav Petkov, Dave Hansen, Shuah Khan,
    Jakub Kicinski, Jesper Dangaard Brouer, Pasha Tatashin, Ard Biesheuvel,
    Daniel Kiss, Steven Price, Sudeep Holla, Marc Zyngier, Peter Collingbourne,
    Mark Brown, Delyan Kratunov, Kumar Kartikeya Dwivedi, Wang ShaoBo
Date: Thu, 26 May 2022 22:48:05 +0800
Message-ID: <40fda0b0-0efc-ea1b-96d5-e51a4d1593dd@huawei.com>
References: <0f8fe661-c450-ccd8-761f-dbfff449c533@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/26/2022 6:06 PM, Mark Rutland wrote:
> On Thu, May 26, 2022 at 05:45:03PM +0800, Xu Kuohai wrote:
>> On 5/25/2022 9:38 PM, Mark Rutland wrote:
>>> On Wed, May 18, 2022 at 09:16:33AM -0400, Xu Kuohai wrote:
>>>> Add ftrace direct support for arm64.
>>>>
>>>> 1. When there is custom trampoline only, replace the fentry nop to a
>>>> jump instruction that jumps directly to the custom trampoline.
>>>>
>>>> 2. When ftrace trampoline and custom trampoline coexist, jump from
>>>> fentry to ftrace trampoline first, then jump to custom trampoline
>>>> when ftrace trampoline exits. The current unused register
>>>> pt_regs->orig_x0 is used as an intermediary for jumping from ftrace
>>>> trampoline to custom trampoline.
>>>
>>> For those of us not all that familiar with BPF, can you explain *why* you want
>>> this? The above explains what the patch implements, but not why that's useful.
>>>
>>> e.g.
>>> is this just to avoid the overhead of the ops list processing in the
>>> regular ftrace code, or is the custom trampoline there to allow you to do
>>> something special?
>>
>> IIUC, ftrace direct call was designed to *remove* the unnecessary
>> overhead of saving regs completely [1][2].
>
> Ok. My plan is to get rid of most of the register saving generally, so I think
> that aspect can be solved without direct calls.

Looking forward to your new solution.

>
>> [1] https://lore.kernel.org/all/20191022175052.frjzlnjjfwwfov64@ast-mbp.dhcp.thefacebook.com/
>> [2] https://lore.kernel.org/all/20191108212834.594904349@goodmis.org/
>>
>> This patch itself is just a variant of [3].
>>
>> [3] https://lore.kernel.org/all/20191108213450.891579507@goodmis.org/
>>
>>>
>>> There is another patch series on the list from some of your colleagues which
>>> uses dynamic trampolines to try to avoid that ops list overhead, and it's not
>>> clear to me whether these are trying to solve the largely same problem or
>>> something different. That other thread is at:
>>>
>>> https://lore.kernel.org/linux-arm-kernel/20220316100132.244849-1-bobo.shaobowang@huawei.com/
>>>
>>> ... and I've added the relevant parties to CC here, since there doesn't seem to
>>> be any overlap in the CC lists of the two threads.
>>
>> We're not working to solve the same problem. The trampoline introduced
>> in this series helps us to monitor kernel functions or other bpf progs
>> with bpf, and also helps us to use a bpf prog like a normal kernel
>> function pointer.
>
> Ok, but why is it necessary to have a special trampoline?
>
> Is that *just* to avoid overhead, or do you need to do something special that
> the regular trampoline won't do?
>

Sorry for not explaining the problem. The main bpf prog accepts only a
single argument 'ctx' in r1, so to allow kernel code to call a bpf prog
transparently, we need a trampoline to convert the native calling
convention into the BPF calling convention [1].

[1] https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org/

For example,

SEC("struct_ops/dctcp_state")
void BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state)
{
	// do something
}

The above bpf prog will be compiled to something like this:

dctcp_state:
	r2 = *(u64 *)(r1 + 8)	// new_state
	r1 = *(u64 *)(r1 + 0)	// sk
	...

It accepts only one argument 'ctx' in r1, and loads the actual arguments
'sk' and 'new_state' from r1 + 0 and r1 + 8, respectively. So before calling
this prog, we need to construct 'ctx' and store its address in r1.

>>>
>>> In that other thread I've suggested a general approach we could follow at:
>>>
>>> https://lore.kernel.org/linux-arm-kernel/YmGF%2FOpIhAF8YeVq@lakrids/
>>>
>>
>> Is it possible for a kernel function to take a long jump to the common
>> trampoline when we get a huge kernel image?
>
> It is possible, but only where the kernel Image itself is massive and the .text
> section exceeds 128MiB, at which point other things break anyway. Practically
> speaking, this doesn't happen for production kernels, or reasonable test
> kernels.
>

So even for normal kernel functions, we need some way to construct and
remove long jumps atomically and safely.

> I've been meaning to add some logic to detect this at boot time and disable
> ftrace (or at build time), since live patching would also be broken in that
> case.
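
To make the range limitation concrete: an arm64 B/BL encodes a signed 26-bit
immediate scaled by 4, so a patched call-site can only reach targets within
roughly +/-128MiB. A range check therefore boils down to something like the
sketch below (illustrative only -- the helper name and constant are made up
here, this is not the is_long_jump() from patch 5 quoted further down):

/* Illustrative sketch, not kernel code: can 'pc' reach 'target' with a BL? */
#include <stdbool.h>
#include <stdint.h>

#define BL_RANGE	(128UL * 1024 * 1024)	/* imm26 * 4 => +/-128MiB */

static bool bl_in_range(uintptr_t pc, uintptr_t target)
{
	intptr_t offset = (intptr_t)(target - pc);

	/* Approximate; the exact encodable range is [-2^27, 2^27 - 4]. */
	return offset >= -(intptr_t)BL_RANGE && offset < (intptr_t)BL_RANGE;
}

When the offset is out of range, the call-site needs a PLT-like veneer, or
the direct attachment has to be rejected.
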

>>> As noted in that thread, I have a few concerns which equally apply here:
>>>
>>> * Due to the limited range of BL instructions, it's not always possible to
>>> patch an ftrace call-site to branch to an arbitrary trampoline. The way this
>>> works for ftrace today relies upon knowing the set of trampolines at
>>> compile-time, and allocating module PLTs for those, and that approach cannot
>>> work reliably for dynamically allocated trampolines.
>>
>> Currently patch 5 returns -ENOTSUPP when a long jump is detected, so no
>> bpf trampoline is constructed for an out-of-range patch-site:
>>
>>     if (is_long_jump(orig_call, image))
>>         return -ENOTSUPP;
>
> Sure, my point is that in practice that means that (from the user's PoV) this
> may randomly fail to work, and I'd like something that we can ensure works
> consistently.
>

OK, should I suspend this work until you finish refactoring ftrace?

>>> I'd strongly prefer to avoid custom trampolines unless they're strictly
>>> necessary for functional reasons, so that we can have this work reliably and
>>> consistently.
>>
>> bpf trampoline is needed by bpf itself, not to replace ftrace trampolines.
>
> As above, can you please let me know *why* specifically it is needed? Why can't
> we invoke the BPF code through the usual ops mechanism?
>
> Is that to avoid overhead, or are there other functional reasons you need a
> special trampoline?
>
>>> * If this is mostly about avoiding the ops list processing overhead, I believe
>>> we can implement some custom ops support more generally in ftrace which would
>>> still use a common trampoline but could directly call into those custom ops.
>>> I would strongly prefer this over custom trampolines.
>>>
>>> * I'm looking to minimize the set of regs ftrace saves, and never save a full
>>> pt_regs, since today we (incompletely) fill that with bogus values and cannot
>>> acquire some state reliably (e.g. PSTATE). I'd like to avoid usage of pt_regs
>>> unless necessary, and I don't want to add additional reliance upon that
>>> structure.
>>
>> Even if such a common trampoline is used, bpf trampoline is still
>> necessary since we need to construct custom instructions to implement
>> bpf functions, for example, to implement a kernel function pointer with a
>> bpf prog.
>
> Sorry, but I'm struggling to understand this. What specifically do you need to
> do that means this can't use the same calling convention as the regular ops
> function pointers?
>
> Thanks,
> Mark.
> .
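
To illustrate the calling convention issue with the dctcp_state example
above: conceptually, the trampoline has to stash the native arguments in
memory and hand the address of that area to the prog as 'ctx'. A rough
C-level sketch follows (names like bpf_prog_dctcp_state and call_dctcp_state
are invented for illustration; the real trampoline is JIT-generated arm64
code, not C):

/*
 * What a bpf trampoline conceptually does for
 * BPF_PROG(dctcp_state, struct sock *sk, __u8 new_state).
 */
typedef unsigned long long u64;

struct sock;

/* The JITed prog: its single 'ctx' pointer arrives in the first argument
 * register (x0 on arm64), which the prog sees as r1. */
extern unsigned int bpf_prog_dctcp_state(u64 *ctx);

static void call_dctcp_state(struct sock *sk, unsigned char new_state)
{
	u64 ctx[2];

	ctx[0] = (u64)(unsigned long)sk;	/* prog loads 'sk' from ctx + 0 */
	ctx[1] = new_state;			/* prog loads 'new_state' from ctx + 8 */

	bpf_prog_dctcp_state(ctx);		/* pass &ctx[0] as the single argument */
}

This repacking is the conversion from the native calling convention to the
BPF calling convention mentioned above.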