From: Andrey Konovalov
Date: Sun, 22 May 2022 00:30:52 +0200
Subject: Re: [PATCH v3 0/3] kasan, arm64, scs: collect stack traces from Shadow Call Stack
To: Mark Rutland
Cc: andrey.konovalov@linux.dev, Marco Elver, Alexander Potapenko,
 Dmitry Vyukov, Andrey Ryabinin, kasan-dev, Catalin Marinas, Will Deacon,
 Vincenzo Frascino, Sami Tolvanen, Linux ARM, Peter Collingbourne,
 Evgenii Stepanov, Florian Mayer, Andrew Morton,
 Linux Memory Management List, LKML, Andrey Konovalov
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 14, 2022 at 2:37 PM Mark Rutland wrote:
>

Hi Mark,

Sorry for the delayed response; it took some time to get my hands on
hardware for testing these changes.

> Just to be clear: QEMU TCG mode is *in no way* representative of HW
> performance, and has drastically different performance characteristics
> compared to real HW. Please be very clear when you are quoting
> performance figures from QEMU TCG mode.
>
> Previously you said you were trying to optimize this so that some
> version of KASAN could be enabled in production builds, and the above is
> not a suitable benchmark system for that.

Understood. My expectation was that performance numbers from QEMU would
be close to hardware.
I knew that some instructions take longer to emulate, but I expected
them to be spread uniformly across the code. Your explanation showed
that this assumption is wrong: it doesn't hold when measuring the
performance of code with a different density of function calls.

Thank you for the detailed explanation! Those QEMU arguments will
definitely be handy when I need a faster QEMU setup.

> Is that *actually* what you're trying to enable, or are you just trying
> to speed up running instances under QEMU (e.g. for arm64 Syzkaller runs
> on GCE)?

No, I'm not trying to speed up QEMU. QEMU was just the only setup I had
access to at that moment. The goal is to allow enabling stack trace
collection in production on HW_TAGS-enabled devices once such devices
are available.

[...]

> While the SCS unwinder is still faster, the difference is nowhere near
> as pronounced. As I mentioned before, there are changes that we can make
> to the regular unwinder to close that gap somewhat, some of which I
> intend to make as part of ongoing cleanup/rework in that area.

I tried running the same experiments on a Pixel 6. Unfortunately, I was
only able to test the OUTLINE SW_TAGS mode (without STACK
instrumentation, as HW_TAGS doesn't support STACK at the moment). All
of the other modes either fail to flash or fail to boot with AOSP on the
Pixel 6 :(

The results are below (kernel log timestamps, in seconds, at which
"ALSA device list" was printed):

  sw_tags outline nostacks:             2.218
  sw_tags outline:                      2.516 (+13.4%)
  sw_tags outline nosanitize:           2.364 (+6.5%)
  sw_tags outline nosanitize __set_bit: 2.364 (+6.5%)
  sw_tags outline nosanitize scs:       2.236 (+0.8%)

Markings used:

  nostacks:   patch from master-no-stack-traces applied
  nosanitize: KASAN_SANITIZE_stacktrace.o := n
  __set_bit:  set_bit -> __set_bit change applied
  scs:        patches from up-scs-stacks-v3 applied

First, disabling instrumentation of stacktrace.c is indeed a great idea
for software KASAN modes!
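
To illustrate why collecting a trace from the Shadow Call Stack can be cheaper than using the regular unwinder, here is a simplified user-space C sketch of the two approaches. It is not the kernel's code: `struct frame_record`, `scs_collect()` and `fp_collect()` are hypothetical, the shadow stack is assumed to grow upwards, and a real unwinder also has to deal with multiple stacks and exception boundaries. The point it shows is that with SCS the return addresses already sit in a contiguous array, whereas the frame-pointer unwinder has to follow a chain of records loaded from memory and check each one before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an arm64 frame record: {previous FP, saved LR}. */
struct frame_record {
	const struct frame_record *fp;
	uintptr_t lr;
};

/*
 * SCS-style collection. The shadow stack is assumed to be a contiguous
 * array of return addresses growing upwards, so the newest entry is at
 * scs_sp[-1]. Collecting a trace is a bounded, reversed copy with no
 * pointer chasing and no per-frame validation.
 */
static size_t scs_collect(const uintptr_t *scs_base, const uintptr_t *scs_sp,
			  uintptr_t *out, size_t max)
{
	size_t n = 0;

	assert(scs_sp >= scs_base);
	while (scs_sp > scs_base && n < max)
		out[n++] = *--scs_sp;
	return n;
}

/*
 * Frame-pointer-style collection. Each step loads the next record's
 * address from memory and must check it before dereferencing; here the
 * record has to lie within [lo, hi) and the chain must move towards
 * older (higher-addressed) frames.
 */
static size_t fp_collect(const struct frame_record *fp,
			 const struct frame_record *lo,
			 const struct frame_record *hi,
			 uintptr_t *out, size_t max)
{
	size_t n = 0;

	while (fp && n < max) {
		if (fp < lo || fp >= hi)
			break;		/* Record outside the stack: stop. */
		out[n++] = fp->lr;
		if (fp->fp && fp->fp <= fp)
			break;		/* Chain does not progress: stop. */
		fp = fp->fp;
	}
	return n;
}
```

Both functions produce the same newest-first list of return addresses; the difference is the amount of work and the number of dependent memory loads per frame.
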

I will send a patch disabling instrumentation of stacktrace.c later.

Changing set_bit to __set_bit seems to make no difference on the Pixel 6.

The awesome part is that the overhead of collecting stack traces with
SCS, and even saving them into the stack depot, is less than 1%.
However, note once again that this is for OUTLINE SW_TAGS without STACK.

> I haven't bothered testing HW_TAGS, because the performance
> characteristics of emulated MTE are also nothing like that of a real HW
> implementation.
>
> So, given that and the problems I mentioned before, I don't think
> there's a justification for adding a separate SCS unwinder. As before,
> I'm still happy to try to make the regular unwinder faster (and I'm
> happy to make changes which benefit QEMU TCG mode if those don't harm
> the maintainability of the unwinder).
>
> NAK to adding an SCS-specific unwinder, regardless of where in the
> source tree that is placed.

I see. Perhaps it makes sense to wait until HW_TAGS-enabled hardware is
available before continuing to look into this. In the end, the
performance overhead for that setup is what matters.

I'll look into improving the performance of the existing unwinder a bit
more. However, I don't think I'll be able to bring its overhead below
1%, which means that we'll likely need a sample-based approach for
HW_TAGS stack collection to reduce the overhead.

Thank you!
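
The sample-based approach mentioned above could take many forms. One minimal possibility, sketched here in user-space C, is to collect a stack trace for only one out of every N events (e.g. allocations) and record nothing for the rest. `struct sample_state`, `sample_init()` and `should_collect_stack()` are hypothetical names, not an existing KASAN interface, and the single unsynchronized counter is a simplification: in the kernel it would have to be per-CPU or atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sampling state: collect a stack trace for one out of
 * every `interval` events. interval == 0 disables collection entirely;
 * interval == 1 collects on every event.
 */
struct sample_state {
	uint32_t interval;
	uint32_t countdown;	/* Events left until the next sample. */
};

static void sample_init(struct sample_state *s, uint32_t interval)
{
	s->interval = interval;
	s->countdown = interval;
}

/* Returns true when the current event should have its stack collected. */
static bool should_collect_stack(struct sample_state *s)
{
	if (s->interval == 0)
		return false;
	assert(s->countdown > 0);	/* Invariant: 1 <= countdown <= interval. */
	if (--s->countdown == 0) {
		s->countdown = s->interval;
		return true;
	}
	return false;
}
```

A fixed interval is the simplest choice; a randomized interval would avoid aliasing with periodic allocation patterns, at the cost of generating a random number for each sample.
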