From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 271BDC4332F for ; Tue, 8 Nov 2022 18:50:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229707AbiKHSuv (ORCPT ); Tue, 8 Nov 2022 13:50:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36982 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229501AbiKHSuY (ORCPT ); Tue, 8 Nov 2022 13:50:24 -0500 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9D5A6DCFC for ; Tue, 8 Nov 2022 10:50:19 -0800 (PST) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 43F4B61750 for ; Tue, 8 Nov 2022 18:50:19 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A751AC433D7 for ; Tue, 8 Nov 2022 18:50:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1667933418; bh=zu9PMhVfzzTPBM578gdkzAALFYMkI9GwJjbL3eyvjAM=; h=References:In-Reply-To:From:Date:Subject:To:Cc:From; b=Y86jQbHnVfUqG1sMkGTtdBeomtv/wb2PQODh1qzPhXx7eWj4bLqsm0za77Jf2M6Mx PKjkEklqRZVcq283TRIz2nZyfRjafK+1zWjj2hVzWuEbSgjviV/t+agMyoQnqp1Wgj UDtoaJma16bxcNNTkDDlZ2E+rwWzePTfXRms9XpFplP/eB7u9oOEOKl/Kfh7kubWhQ QYw9wsUSp4rssTJSlbtSCGHHyrkyiK+20JUWBbNvBxRq56rA5afe3CD7RRskeOe1Op 3xglcKn2N0aLn6oEc+VjRGWxgKSdKjx2ey/hKmt9vep3zJVPRmF8uF6aIzQ9VGds/W DHKNQKmRVKsKw== Received: by mail-ej1-f51.google.com with SMTP id kt23so40957777ejc.7 for ; Tue, 08 Nov 2022 10:50:18 -0800 (PST) X-Gm-Message-State: ACrzQf3l4Ohve8DJr0+NdilZXB3j7pBd0/rlOyy+4xoAUB93AbQBc0J/ m+X6DmB4MUFn57pKA/r/+XZ+mTGAdI5npwSBAjc= X-Google-Smtp-Source: AMsMyM56A9ld/ijomWhhxZcM0Q6GqvKYUhlV48OrAak6z2K8RXAGhJXA8FLeGzN+XDmHx5xVq2WtTK4fY+TaCd2Ejuw= X-Received: by 2002:a17:907:b602:b0:7ad:e82c:3355 with SMTP id vl2-20020a170907b60200b007ade82c3355mr41175687ejc.3.1667933416863; Tue, 08 Nov 2022 10:50:16 -0800 (PST) MIME-Version: 1.0 References: <20221107223921.3451913-1-song@kernel.org> <9e59a4e8b6f071cf380b9843cdf1e9160f798255.camel@intel.com> In-Reply-To: <9e59a4e8b6f071cf380b9843cdf1e9160f798255.camel@intel.com> From: Song Liu Date: Tue, 8 Nov 2022 10:50:04 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH bpf-next v2 0/5] execmem_alloc for BPF programs To: "Edgecombe, Rick P" Cc: "rppt@kernel.org" , "peterz@infradead.org" , "bpf@vger.kernel.org" , "linux-mm@kvack.org" , "hch@lst.de" , "x86@kernel.org" , "akpm@linux-foundation.org" , "mcgrof@kernel.org" , "Lu, Aaron" Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Tue, Nov 8, 2022 at 8:51 AM Edgecombe, Rick P wrote: > > On Tue, 2022-11-08 at 13:27 +0200, Mike Rapoport wrote: > > > Based on our experiments [5], we measured 0.5% performance > > > improvement > > > from bpf_prog_pack. This patchset further boosts the improvement to > > > 0.7%. > > > The difference is because bpf_prog_pack uses 512x 4kB pages instead > > > of > > > 1x 2MB page, bpf_prog_pack as-is doesn't resolve #2 above. > > > > > > This patchset replaces bpf_prog_pack with a better API and makes it > > > available for other dynamic kernel text, such as modules, ftrace, > > > kprobe. > > > > > > The proposed execmem_alloc() looks to me very much tailored for x86 > > to be > > used as a replacement for module_alloc(). Some architectures have > > module_alloc() that is quite different from the default or x86 > > version, so > > I'd expect at least some explanation how modules etc can use execmem_ > > APIs > > without breaking !x86 architectures. > > I think this is fair, but I think we should ask ask ourselves - how > much should we do in one step? > > For non-text_poke() architectures, the way you can make it work is have > the API look like: > execmem_alloc() <- Does the allocation, but necessarily usable yet > execmem_write() <- Loads the mapping, doesn't work after finish() > execmem_finish() <- Makes the mapping live (loaded, executable, ready) > > So for text_poke(): > execmem_alloc() <- reserves the mapping > execmem_write() <- text_pokes() to the mapping > execmem_finish() <- does nothing > > And non-text_poke(): > execmem_alloc() <- Allocates a regular RW vmalloc allocation > execmem_write() <- Writes normally to it > execmem_finish() <- does set_memory_ro()/set_memory_x() on it Yeah, some fallback mechanism like this is missing in current version. It is not a problem for BPF programs, as we call it from arch code. But we do need better APIs for modules. Thanks, Song > > Non-text_poke() only gets the benefits of centralized logic, but the > interface works for both. This is pretty much what the perm_alloc() RFC > did to make it work with other arch's and modules. But to fit with the > existing modules code (which is actually spread all over) and also > handle RO sections, it also needed some additional bells and whistles. > > So the question I'm trying to ask is, how much should we target for the > next step? I first thought that this functionality was so intertwined, > it would be too hard to do iteratively. So if we want to try > iteratively, I'm ok if it doesn't solve everything. > >