From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6DA6C77B7C for ; Sat, 27 May 2023 03:20:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237800AbjE0DUB (ORCPT ); Fri, 26 May 2023 23:20:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55688 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjE0DT7 (ORCPT ); Fri, 26 May 2023 23:19:59 -0400 Received: from out-29.mta0.migadu.com (out-29.mta0.migadu.com [IPv6:2001:41d0:1004:224b::1d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2CA299 for ; Fri, 26 May 2023 20:19:56 -0700 (PDT) Date: Fri, 26 May 2023 23:19:48 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1685157595; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=zC6ZNRAMXSTd1zjJb1ou+NimilKWTlR9ciflQq7y0Bw=; b=oWx2NpJvKxrzV3oXwp2njxlTCvxR3WH88zCPvD2fyst1ulzM9qe96PAlVTOQETHCG81FGd rMVp8WLbEw6k0SPHd08q8Y611MvjKz8NUXREJBkL3JBxYqI+6GLH6qHyDUE0mMfEXCca0D 3yNNaKHa1oS6rPQunUyGJMxDV6Hcgig= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Kent Overstreet To: Song Liu Cc: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, mcgrof@kernel.org, peterz@infradead.org, tglx@linutronix.de, x86@kernel.org, rppt@kernel.org Subject: Re: [PATCH 1/3] module: Introduce module_alloc_type Message-ID: References: <20230526051529.3387103-1-song@kernel.org> <20230526051529.3387103-2-song@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, May 26, 2023 at 05:03:18PM -0700, Song Liu wrote: > On Fri, May 26, 2023 at 4:39 PM Kent Overstreet > wrote: > > > [...] > > > > > But it should be an internal implementation detail, I don't think we > > want the external interface tied to vmalloc - > > > > > These two APIs allow the core code work with all archs. They won't break > > > sub-page allocations. (not all archs will have sub-page allocations) > > > > So yes, text_poke() doesn't work on all architectures, and we need a > > fallback. > > > > But if the fallback needs to go the unprotect/protect route, I think we > > need to put that in the helper, and not expose it. Part of what people > > are wanting is to limit or eliminate pages that are RWX, so we > > definitely shouldn't be creating new interfaces to flipping page > > permissions: that should be pushed down to as low a level as possible. > > > > E.g., with my jitalloc_update(), a more complete version would look like > > > > void jitalloc_update(void *dst, void *src, size_t len) > > { > > if (text_poke_available) { > > text_poke(...); > > } else { > > unprotect(); > > memcpy(); > > protect(); > > } > > } > > I think this doesn't work for sub page allocation? Perhaps I elided too much - it does if you add a single lock. You can't do that if it's not a common helper. > At the end of all this, we will have modules running from huge pages, data > and text. It will give significant performance benefit when some key driver > cannot be compiled into the kernel. Yeah, I've seen the numbers for the perf impact of running as a module due to the extra TLB overhead - but Mike's recent data was showing that this doesn't matter nearly as much as data as it does for text. > > Also - module_memory_fill_type(), module_memory_invalidate_type() - I > > know these are for BPF, but could you explain why we need them in the > > external interface here? Could they perhaps be small helpers in the bpf > > code that use something like jitalloc_update()? > > These are used by all users, not just BPF. 1/3 uses them in > kernel/module/main.c. I didn't use them in 3/3 as it is arch code, but I can > use them instead of text_poke_* (and that is probably better code style). > As I am writing the email, I think I should use it in ftrace.c (2/3) to avoid > the __weak function. Ok. Could we make it clearer why they're needed and what they're for? I know bpf fills in invalid instructions initially; I didn't read through enough code to find the "why", and let's document that to save other people the same effort. And do they really need to be callbacks in mod_alloc_params...? Do we need the other things in mod_alloc_params/vmalloc_params? - your granularity field says it's for specifying PAGE/PMD size: we definitely do not want that. We've had way too much code along the lines of "implement hugepages for x", we need to stop doing that. It's an internal mm/ detail. - vmalloc_params: we need gfp_t and pgprot_t, but those should be normal arguments. start/end/vm_flags? They seem like historical module baggage, can we get rid of them? > > > OTOH, jit_alloc (as-is) solves a subset of the problem (only the text). > > > We can probably use it (with some updates) instead of the vmap_area > > > based allocator. But I guess we need to align the direction first. > > > > Let's see if we can get tglx to chime in again, since he was pretty > > vocal in the last discussion. > > > > It's also good practice to try to summarize all the relevant "whys" from > > the discussion that went into a feature and put that in at least the > > commit message - or even better, header file comments. > > > > Also, organization: the module code is a huge mess, let's _please_ do > > split this out and think about organization a bit more, not add to the > > mess :) > > I don't really think module code is very messy at the moment. If > necessary, we can put module_alloc_type related code in a > separate file. Hey, it's been organized since I last looked at it :) But I tihink this belongs in mm/, not module code.