From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIMWL_WL_MED,
        DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,
        MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS
        autolearn=unavailable autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
        by smtp.lore.kernel.org (Postfix) with ESMTP id 027ADC433DB
        for ; Thu, 4 Feb 2021 00:54:10 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by mail.kernel.org (Postfix) with ESMTP id AF56E64F43
        for ; Thu, 4 Feb 2021 00:54:09 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S234267AbhBDAxw (ORCPT ); Wed, 3 Feb 2021 19:53:52 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48802 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S232478AbhBDAxp (ORCPT );
        Wed, 3 Feb 2021 19:53:45 -0500
Received: from mail-lf1-x129.google.com (mail-lf1-x129.google.com [IPv6:2a00:1450:4864:20::129])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3F569C061788
        for ; Wed, 3 Feb 2021 16:52:55 -0800 (PST)
Received: by mail-lf1-x129.google.com with SMTP id f1so2021422lfu.3
        for ; Wed, 03 Feb 2021 16:52:55 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=cloudflare.com; s=google;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=z0j8u/cgjNk63BrE1ROiiO7WIMH6hW9ZaraaxN/GjSE=;
        b=gVr/bTGSubf6YnOvvpVG//iNim87v02Skx3jR2rMxXBTcD+HF8G3xEAnQQx+0Q+npP
         pqDCNmdgNk9JsEEyyZ8pGBqPuhB0LWAtXIXpCnbR2QYDJ+XTkBpqezeM0VhEQL870N4F
         uJkM0MIlJHoRCuuzrpOOrJtOOSiKWEnnBpQbw=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=z0j8u/cgjNk63BrE1ROiiO7WIMH6hW9ZaraaxN/GjSE=;
        b=I4aUqJpYVJNZdxbkuzm0q4vuNvZzmQLs7qgFEJ1KFV37KH/EhgZ8PGZdGKYerPfuJn
         brB/yQow+4YregnXlqatt9g1Nm2KSyY4UUexNTsovjaQxKsbSgveuu0tjnBQnibWjN2b
         GHjPpOTEVhk9Va2EMl0flVb3taHMsaWJHTVisXXMF0Cf7yxhMVJZK1Je22o2OXAYhStW
         9v7vH1JbaaXjljhx67er10OsxcIDcGGqzwFGlTSzL3AH1+ubsj88TiThfcQNLGKcDqdw
         SHkc1OAQmJwGzbBXKMevZdh8E/lPbEg4oIJdx7Ut+DE45xEVLGjXgILN5F+3Iw9pXwsI
         q3vg==
X-Gm-Message-State: AOAM532Wg9rED/tBE6s2udV+HFtvImMfma+Y2T2x8F3taXGBzjxi+xjU
        HnQd5PSm7Sa2OgdD8uwGsncT6znopmCAYyTVxw+uqg==
X-Google-Smtp-Source: ABdhPJzSprHSC/hLTMNA1akLup57Be6R1Xs+6lZxUgJvmJt7gToshQ1KQ2TSXBrV4DDz+jJi75zB2VS7Nf0bWHH7qGc=
X-Received: by 2002:a05:6512:3190:: with SMTP id i16mr3254379lfe.200.1612399973566;
        Wed, 03 Feb 2021 16:52:53 -0800 (PST)
MIME-Version: 1.0
References: <20210203190518.nlwghesq75enas6n@treble>
        <20210203232735.nw73kugja56jp4ls@treble>
        <20210204001700.ry6dpqvavcswyvy7@treble>
In-Reply-To: <20210204001700.ry6dpqvavcswyvy7@treble>
From: Ivan Babrou
Date: Wed, 3 Feb 2021 16:52:42 -0800
Message-ID:
Subject: Re: BUG: KASAN: stack-out-of-bounds in unwind_next_frame+0x1df5/0x2650
To: Josh Poimboeuf
Cc: Peter Zijlstra, kernel-team, Ignat Korchagin, Hailong liu,
        Andrey Ryabinin, Alexander Potapenko, Dmitry Vyukov, Andrew Morton,
        Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org, "H.
        Peter Anvin", Miroslav Benes, Julien Thierry, Jiri Slaby,
        kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel,
        Alasdair Kergon, Mike Snitzer, dm-devel@redhat.com,
        "Steven Rostedt (VMware)", Alexei Starovoitov, Daniel Borkmann,
        Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
        John Fastabend, KP Singh, Robert Richter, "Joel Fernandes (Google)",
        Mathieu Desnoyers, Linux Kernel Network Developers,
        bpf@vger.kernel.org, Alexey Kardashevskiy
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 3, 2021 at 4:17 PM Josh Poimboeuf wrote:
>
> On Wed, Feb 03, 2021 at 03:30:35PM -0800, Ivan Babrou wrote:
> > > > > Can you recreate with this patch, and add "unwind_debug" to the cmdline?
> > > > > It will spit out a bunch of stack data.
> > > >
> > > > Here's the tree I'm building:
> > > >
> > > > * https://github.com/bobrik/linux/tree/ivan/static-call-5.9
> > > >
> > > > It contains:
> > > >
> > > > * v5.9 tag as the base
> > > > * static_call-2020-10-12 tag
> > > > * dm-crypt patches to reproduce the issue with KASAN
> > > > * x86/unwind: Add 'unwind_debug' cmdline option
> > > > * tracepoint: Fix race between tracing and removing tracepoint
> > > >
> > > > The very same issue can be reproduced on 5.10.11 with no patches,
> > > > but I'm going with 5.9, since it boils down to static call changes.
> > > >
> > > > Here's the decoded stack from the kernel with unwind debug enabled:
> > > >
> > > > * https://gist.github.com/bobrik/ed052ac0ae44c880f3170299ad4af56b
> > > >
> > > > See my first email for the exact commands that trigger this.
> > >
> > > Thanks.  Do you happen to have the original dmesg, before running it
> > > through the post-processing script?
> >
> > Yes, here it is:
> >
> > * https://gist.github.com/bobrik/8c13e6a02555fb21cadabb74cdd6f9ab
>
> It appears the unwinder is getting lost in crypto code.  No idea what
> this has to do with static calls though.
> Or maybe you're seeing multiple issues.
>
> Does this fix it?

It does for the dm-crypt case! But so does the following commit in
5.11 (and 5.10.12):

* https://github.com/torvalds/linux/commit/ce8f86ee94?w=1

I stuck with the dm-crypt reproduction because it triggers reliably.
We also have the following stack, which doesn't touch any crypto code:

* https://gist.github.com/bobrik/40e2559add2f0b26ae39da30dc451f1e

I cannot reproduce this one on demand; it took 2 days of uptime for it
to happen. Is there anything I can do to help diagnose it?

My goal is to enable multishot KASAN in our pre-production
environment. Currently, though, several reports about
unwind_next_frame in a row from interrupt context sometimes starve
the TX queues on the NIC, which disables the network interface, and
that is not something we can tolerate.