From mboxrd@z Thu Jan 1 00:00:00 1970 From: ard.biesheuvel@linaro.org (Ard Biesheuvel) Date: Fri, 5 Oct 2018 19:28:12 +0200 Subject: [RFC PATCH 0/9] patchable function pointers for pluggable crypto routines In-Reply-To: References: <20181005081333.15018-1-ard.biesheuvel@linaro.org> <20181005133705.GA4588@zx2c4.com> Message-ID: To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On 5 October 2018 at 19:26, Andy Lutomirski wrote: > On Fri, Oct 5, 2018 at 10:15 AM Ard Biesheuvel > wrote: >> >> On 5 October 2018 at 15:37, Jason A. Donenfeld wrote: >> ... >> > Therefore, I think this patch goes in exactly the wrong direction. I >> > mean, if you want to introduce dynamic patching as a means for making >> > the crypto API's dynamic dispatch stuff not as slow in a post-spectre >> > world, sure, go for it; that may very well be a good idea. But >> > presenting it as an alternative to Zinc very widely misses the point and >> > serves to prolong a series of bad design choices, which are now able to >> > be rectified by putting energy into Zinc instead. >> > >> >> This series has nothing to do with dynamic dispatch: the call sites >> call crypto functions using ordinary function calls (although my >> example uses CRC-T10DIF), and these calls are redirected via what is >> essentially a PLT entry, so that we can supsersede those routines at >> runtime. > > If you really want to do it PLT-style, then just do: > > extern void whatever_func(args); > > Call it like: > whatever_func(args here); > > And rig up something to emit asm like: > > GLOBAL(whatever_func) > jmpq default_whatever_func > ENDPROC(whatever_func) > > Architectures without support can instead do: > > void whatever_func(args) > { > READ_ONCE(patchable_function_struct_for_whatever_func->ptr)(args); > } > > and patch the asm function for basic support. It will be slower than > necessary, but maybe the relocation trick could be used on top of this > to redirect the call to whatever_func directly to the target for > architectures that want to squeeze out the last bit of performance. > This might actually be the best of all worlds: easy implementation on > all architectures, no inline asm, and the totally non-magical version > works with okay performance. > > (Is this what your code is doing? I admit I didn't follow all the way > through all the macros.) Basically