From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marco Elver
Date: Mon, 5 Oct 2020 20:59:49 +0200
Subject: Re: [PATCH v4 01/11] mm: add Kernel Electric-Fence infrastructure
References: <20200929133814.2834621-1-elver@google.com> <20200929133814.2834621-2-elver@google.com>
To: Jann Horn
Cc: Dmitry Vyukov, Andrew Morton, Alexander Potapenko, "H. Peter Anvin",
McKenney" , Andrey Konovalov , Andrey Ryabinin , Andy Lutomirski , Borislav Petkov , Catalin Marinas , Christoph Lameter , Dave Hansen , David Rientjes , Eric Dumazet , Greg Kroah-Hartman , Hillf Danton , Ingo Molnar , Jonathan Cameron , Jonathan Corbet , Joonsoo Kim , Kees Cook , Mark Rutland , Pekka Enberg , Peter Zijlstra , SeongJae Park , Thomas Gleixner , Vlastimil Babka , Will Deacon , "the arch/x86 maintainers" , "open list:DOCUMENTATION" , kernel list , kasan-dev , Linux ARM , Linux-MM , SeongJae Park Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, 2 Oct 2020 at 20:28, Jann Horn wrote: [...] > > > > > > Do you have performance numbers or a description of why you believe > > > that this part of kfence is exceptionally performance-sensitive? If > > > not, it might be a good idea to remove this optimization, at least for > > > the initial version of this code. (And even if the optimization is > > > worthwhile, it might be a better idea to go for the generic version > > > immediately.) > > > > This check is very hot, it happens on every free. For every freed > > object we need to understand if it belongs to KFENCE or not. > > Ah, so the path you care about does not dereference __kfence_pool, it > just compares it to the supplied pointer? > > > First off: The way you've written is_kfence_address(), GCC 10.2 at -O3 > seems to generate *utterly* *terrible* code (and the newest clang > release isn't any better); something like this: > > kfree_inefficient: > mov rax, QWORD PTR __kfence_pool[rip] > cmp rax, rdi > jbe .L4 > .L2: > jmp kfree_not_kfence > .L4: > add rax, 0x200000 > cmp rax, rdi > jbe .L2 > jmp kfree_kfence > > So pointers to the left of the region and pointers to the right of the > region will take different branches, and so if you have a mix of > objects on both sides of the kfence region, you'll get tons of branch > mispredictions for no good reason. You'll want to rewrite that check > as "unlikely(ptr - base <= SIZE)" instead of "unlikely(ptr >= base && > ptr < base + SIZE" unless you know that all the objects will be on one > side. This would also reduce the performance impact of loading > __kfence_pool from the data section, because the branch prediction can > then speculate the branch that depends on the load properly and > doesn't have to go roll back everything that happened when the object > turns out to be on the opposite side of the kfence memory region - the > latency of the load will hopefully become almost irrelevant. Good point, implemented that. (It's "ptr - base < SIZE" I take it.) 
> So in x86 intel assembly (assuming that we want to ensure that we only
> do a single branch on the object type), the straightforward and
> non-terrible version would be:
>
> kfree_unoptimized:
>   mov    rax, rdi
>   sub    rax, QWORD PTR __kfence_pool[rip]
>   cmp    rax, 0x200000
>   jbe    1f
>   /* non-kfence case goes here */
> 1:
>   /* kfence case goes here */
>
> while the version you want is:
>
> kfree_static:
>   mov    rax, rdi
>   sub    rax, OFFSET FLAT:__kfence_pool
>   cmp    rax, 0x200000
>   jbe    1f
>   jmp    kfree_not_kfence
> 1:
>   jmp    kfree_kfence
>
> If we instead use something like
>
> #define STATIC_VARIABLE_LOAD(variable) \
> ({ \
>   typeof(variable) value; \
>   BUILD_BUG_ON(sizeof(variable) != sizeof(unsigned long)); \
>   asm( \
>     ".pushsection .static_variable_users\n\t" \
>     ".long " #variable " - .\n\t" \
>     ".long 123f - .\n\t" /* offset to end of constant */ \
>     ".popsection\n\t" \
>     "movabs $0x0123456789abcdef, %0" \
>     "123:\n\t" \
>     :"=r"(value) \
>   ); \
>   value; \
> })
>
> static __always_inline bool is_kfence_address(const void *addr)
> {
>   return unlikely((char*)addr - STATIC_VARIABLE_LOAD(__kfence_pool) <
>                   KFENCE_POOL_SIZE);
> }
>
> to locate the pool (which could again be normally allocated with
> alloc_pages()), we'd get code like this, which is like the previous
> except that we need an extra "movabs" because x86's "sub" can only use
> immediates up to 32 bits:
>
> kfree_hotpatchable_bigreloc:
>   mov    rax, rdi
>   movabs rdx, 0x0123456789abcdef
>   sub    rax, rdx
>   cmp    rax, 0x200000
>   jbe    1f
>   jmp    kfree_not_kfence
> 1:
>   jmp    kfree_kfence
>
> The arch-specific part of this could probably be packaged up pretty
> nicely into a generic interface. If it actually turns out to have a
> performance benefit, that is.

Something like this would certainly be nice, but we'll do the due
diligence and see if it's even worth it.

> If that one extra "movabs" is actually a problem, it would
> *theoretically* be possible to get rid of that by using module_alloc()
> to allocate virtual memory to which offsets from kernel text are 32
> bits, and using special-cased inline asm, but we probably shouldn't do
> that, because as Mark pointed out, we'd then risk getting extremely
> infrequent extra bugs when drivers use phys_to_virt() on allocations
> that were done through kfence. Adding new, extremely infrequent and
> sporadically occurring bugs to the kernel seems like the exact
> opposite of the goal of KFENCE. :P
>
> Overall my expectation would be that the MOVABS version should
> probably at worst be something like one cycle slower - it adds 5
> instruction bytes (and we pay 1 cycle in the frontend per 16 bytes of
> instructions, I think?) and 1 backend cycle (for the MOVABS - Agner
> Fog's tables seem to say that at least on Skylake, MOVABS is 1 cycle).
> But that backend cycle shouldn't even be on the critical path (and it
> has a wider choice of ports than e.g. a load, and I think typical
> kernel code isn't exactly highly parallelizable, so we can probably
> schedule on a port that would've been free otherwise?), and I think
> typical kernel code should be fairly light on the backend, so with the
> MOVABS version, compared to the version with __kfence_pool in the data
> section, we probably overall just pay a fraction of a cycle in
> execution cost? I'm not a professional performance engineer, but this
> sounds to me like the MOVABS version should probably perform roughly
> as well as your version.
>
> Anyway, I guess this is all pretty vague without actually having
> concrete benchmark results. :P
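As a starting point for such numbers, a rough user-space sketch along
these lines could compare the two shapes of the check; the names,
sizes, and pointer mix below are invented for illustration only and say
nothing about the real kfree() path, which would have to be measured in
the kernel itself:

  /* bench.c - toy comparison of two-branch vs. one-branch range check.
   * Build with: gcc -O2 bench.c
   */
  #include <stdbool.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define POOL_SIZE (2UL << 20)

  static char *pool;      /* stand-in for __kfence_pool */

  static bool check_two_branches(const void *addr)
  {
          return (const char *)addr >= pool &&
                 (const char *)addr < pool + POOL_SIZE;
  }

  static bool check_one_branch(const void *addr)
  {
          return (unsigned long)((const char *)addr - pool) < POOL_SIZE;
  }

  static long ns_between(struct timespec a, struct timespec b)
  {
          return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
  }

  int main(void)
  {
          enum { N = 1 << 20 };
          char *region = malloc(3 * POOL_SIZE);   /* pool sits inside a larger block */
          char **ptrs = malloc(N * sizeof(*ptrs));
          volatile unsigned long hits = 0;        /* keep a data-dependent branch alive */
          struct timespec t0, t1;
          int i;

          pool = region + POOL_SIZE;
          /* Random mix of non-pool objects on both sides: the branchy worst case. */
          for (i = 0; i < N; i++)
                  ptrs[i] = (rand() & 1) ? region : region + 3 * POOL_SIZE - 1;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (i = 0; i < N; i++)
                  if (check_two_branches(ptrs[i]))
                          hits++;
          clock_gettime(CLOCK_MONOTONIC, &t1);
          printf("two branches: %ld ns\n", ns_between(t0, t1));

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (i = 0; i < N; i++)
                  if (check_one_branch(ptrs[i]))
                          hits++;
          clock_gettime(CLOCK_MONOTONIC, &t1);
          printf("one branch:   %ld ns\n", ns_between(t0, t1));

          return 0;
  }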
>
> See for examples of actual code
> generation for different options of writing this check.

Thanks for the analysis!

There is also some (11-year-old) prior art that seems to never have
made it into the kernel:
https://lore.kernel.org/lkml/20090924132626.485545323@polymtl.ca/

Maybe we need to understand why that never made it. But I think, even
if we drop the static pool, a first version of KFENCE should not
depend on it.

Thanks,
-- Marco