From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-18.3 required=3.0 tests=BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 723C4C433B4
	for <linux-kernel@archiver.kernel.org>; Wed, 12 May 2021 00:23:44 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 4292661278
	for <linux-kernel@archiver.kernel.org>; Wed, 12 May 2021 00:23:44 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230114AbhELAYt (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Tue, 11 May 2021 20:24:49 -0400
Received: from linux.microsoft.com ([13.77.154.182]:50184 "EHLO
        linux.microsoft.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229964AbhELAYs (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Tue, 11 May 2021 20:24:48 -0400
Received: from sequoia (162-237-133-238.lightspeed.rcsntx.sbcglobal.net [162.237.133.238])
        by linux.microsoft.com (Postfix) with ESMTPSA id F3AAA20B7178;
        Tue, 11 May 2021 17:23:40 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com F3AAA20B7178
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com;
        s=default; t=1620779021;
        bh=PCcqHKfcC40XETopB6mFbp3LGmM21EL73MZsy4vVBJ0=;
        h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
        b=QgJ6uyMu3jjpjMs7Ydf+9sedTXy1H9EXs/aAtC0eNIxuA5OwAcWVr7EQPrsG0JzOe
         WPILUmjlHq17uzKwrHVkLZXKJG3/dI9wUr/RWQGQlAEwEViwaxLs0Au2j2NDDBkKJK
         4n0wvdCqG18QjTMZu46cGFqcFixh+h9irJ6Yx0pY=
Date:   Tue, 11 May 2021 19:23:38 -0500
From:   Tyler Hicks <tyhicks@linux.microsoft.com>
To:     Jens Wiklander <jens.wiklander@linaro.org>
Cc:     Allen Pais <apais@linux.microsoft.com>, zajec5@gmail.com,
        Allen Pais <allen.lkml@gmail.com>,
        bcm-kernel-feedback-list@broadcom.com,
        Linux ARM <linux-arm-kernel@lists.infradead.org>,
        Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
        OP-TEE TrustedFirmware <op-tee@lists.trustedfirmware.org>
Subject: Re: [PATCH] optee: Disable shm cache when booting the crash kernel
Message-ID: <20210512002338.GJ4967@sequoia>
References: <20210225090610.242623-1-allen.lkml@gmail.com>
 <20210507035816.426585-1-tyhicks@linux.microsoft.com>
 <720CDF03-42F9-43C3-B3B3-999E4A5E2864@linux.microsoft.com>
 <CAHUa44FHo2_EUzFzHnakkm3o7H-Nn+k4hgqT2WNFezZO6D8mxA@mail.gmail.com>
 <20210507131722.GI4967@sequoia>
 <CAHUa44F7AzRQ0ZUBtJV2Y39tk1mPGAbetn7i8-DVAsSFLbFgEg@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CAHUa44F7AzRQ0ZUBtJV2Y39tk1mPGAbetn7i8-DVAsSFLbFgEg@mail.gmail.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021-05-10 09:31:51, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > >
> > > >
> > > >
> > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > >
> > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > quickly as possible without allowing drivers to clean up.
> > > > >
> > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > booted. As the kdump kernel is shutdown, the .shutdown hook is called,
> > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > the address results in a fault that cannot be handled:
> > > > >
> > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > Mem abort info:
> > > > >   ESR = 0x96000004
> > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > >   SET = 0, FnV = 0
> > > > >   EA = 0, S1PTW = 0
> > > > > Data abort info:
> > > > >   ISV = 0, ISS = 0x00000004
> > > > >   CM = 0, WnR = 0
> > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > Hardware name: Redacted (DT)
> > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > sp : ffff80001005bb70
> > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > Call trace:
> > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > >
> > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > careful to not translate the addresses returned from
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > kernel.
> > > > >
> > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > ---
> > > > >
> > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > original two patch series[2] plus this patch is the full solution to
> > > > > properly handling OP-TEE shared memory across kexec.
> > > > >
> > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > became unresponsive during boot while steadily streaming the following
> > > > > errors to the serial console:
> > > > >
> > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > >
> > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use those
> > > > > addresses?
> >
> > @Jens did you have any thoughts on what could be happening here with the
> > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > addresses when booting the kdump kernel, rather than just disabling the
> > caches?
> 
> No idea. There's no support for SMMU in upstream OP-TEE. Just
> disabling the caches should be good enough. You could try to never
> enable the cache so see if it makes any difference.

I think this is unrelated to OP-TEE and more to do with ongoing DMA
activity when the kernel has crashed and we've done an emergency kexec
into the kdump kernel which didn't shutdown the SMMU. The SoC I'm using
has a v2 SMMU and I think something similar to commit 3f54c447df34
("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
good so far. I'll send that out as a separate patch after a little more
testing.

However, with that change and my earlier change to disable the shm cache
during boot, I'm periodically seeing a different issue while the kdump
kernel is coming up. I'm pretty certain it was already there before but
I wasn't seeing it as often since the SMMU warnings were so "loud".

The kernel waits indefinitely for a secure world thread and boot hangs
completely:

[  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  243.366141]       Not tainted 5.4.83-microsoft-standard #1
[  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.379882] swapper/0       D    0     1      0 0x00000028
[  243.385543] Call trace:
[  243.388080]  __switch_to+0xc8/0x118
[  243.391683]  __schedule+0x2e0/0x700
[  243.395280]  schedule+0x38/0xb8
[  243.398522]  schedule_timeout+0x258/0x388
[  243.402659]  wait_for_completion+0x16c/0x4b8
[  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
[  243.412100]  optee_disable_shm_cache+0xb8/0xf8
[  243.416685]  optee_probe+0x560/0x61c
[  243.420375]  platform_drv_probe+0x58/0xa8
[  243.424512]  really_probe+0xe0/0x338
[  243.428202]  driver_probe_device+0x5c/0xf0
[  243.432427]  device_driver_attach+0x74/0x80
[  243.436744]  __driver_attach+0x64/0xe0
[  243.440611]  bus_for_each_dev+0x84/0xd8
[  243.444570]  driver_attach+0x30/0x40
[  243.448258]  bus_add_driver+0x188/0x1e8
[  243.452215]  driver_register+0x64/0x110
[  243.456172]  __platform_driver_register+0x54/0x60
[  243.461027]  optee_driver_init+0x20/0x28
[  243.465075]  do_one_initcall+0x54/0x24c
[  243.469034]  kernel_init_freeable+0x1e8/0x2c0
[  243.473529]  kernel_init+0x18/0x118
[  243.477128]  ret_from_fork+0x10/0x18

I'm unable to trigger a sysrq over the serial console of this remote
machine so I don't yet know what the other threads on the system are
doing during this time. I'll hack something together tomorrow to get a
better idea.

The blocked task warning reminded me of when you said this earlier:

> Bear in mind that there are other times where we can't recover from a
> kernel crash. For instance if a thread is executing in OP-TEE in
> secure world. 

I suspect that it is related to what I'm seeing with this blocked task. Can you
expand on why we can't recover from a kernel crash if a thread is
executing in the secure world?

I appreciate your help!

Tyler

> 
> Cheers,
> Jens
> 

From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=piaZ=KH=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-9.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 34C84C433B4
	for <linux-arm-kernel@archiver.kernel.org>; Wed, 12 May 2021 00:25:43 +0000 (UTC)
Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id B0548616E8
	for <linux-arm-kernel@archiver.kernel.org>; Wed, 12 May 2021 00:25:42 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B0548616E8
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=linux.microsoft.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=desiato.20200630; h=Sender:Content-Transfer-Encoding
	:Content-Type:List-Subscribe:List-Help:List-Post:List-Archive:
	List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:Message-ID:
	Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:Resent-Date:
	Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner;
	 bh=PtEU7/x3Uq+lNF+AQPZS5W0UpGeVWbq1gh2g1qS9hbI=; b=UFw4swqtbH3+ZicmNTGPbZPNj
	6quAiIlHypLRyTOmVNOgNKXfYgFlkXXRSgustCj+RhxM5ga+HwkMwO3pYvZKk6E4Jt3x2B8qROS1/
	LEvw5z0fFCaxnwXWNRi6y89l8XlANFcvRWeiuxcgeJlqqfFyrmYSOhZOel2Dc/t4ckgC6nH9Z46t5
	LP68r5QNSjmPzaaLY2YhEZ5kIHhET3sAIccu3I/k2YUrU4G/HabLHLMa4PeSeWxPgaixtYwE4UiwI
	l65CBjAlbwErcXoPUEi8Y8y5Q33Nl3ym8Y127RfUEJWyYeVmjYxpiUz5FMg5V+80JoFjzXwI3f6nq
	owOlANoZg==;
Received: from localhost ([::1] helo=desiato.infradead.org)
	by desiato.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux))
	id 1lgcfI-001b5D-Rj; Wed, 12 May 2021 00:23:53 +0000
Received: from bombadil.infradead.org ([2607:7c80:54:e::133])
 by desiato.infradead.org with esmtps (Exim 4.94 #2 (Red Hat Linux))
 id 1lgcfE-001b4v-Km
 for linux-arm-kernel@desiato.infradead.org; Wed, 12 May 2021 00:23:48 +0000
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=infradead.org; s=bombadil.20210309; h=In-Reply-To:Content-Type:MIME-Version
 :References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:
 Content-Transfer-Encoding:Content-ID:Content-Description;
 bh=PCcqHKfcC40XETopB6mFbp3LGmM21EL73MZsy4vVBJ0=; b=qZbfYw2KBCXHxt8Ko0wIGNbFch
 Ms0G9cIl2NZYBFMoFAYYZCvtpJymwUMClYghpmZBmMMsFb0hfc3io/RAHd/E+PuOEzT957FULHj4T
 qjCAa8YEULL5CfmZ+at/ExhZANwr6z876hLNFA4ZCdf03C9UYqeFEZF2+c41ToeVHXHX+eHYTfU31
 Eo4121L/T8MOif1XlnXvlvL6Zs7R12unPDrtdLoIEJ966a6JdHVwM9OE82En3++rwN+7tQ21pgXNS
 3JZM4Mnr5TjITNRR+9T31SdRfrZt94IkF1oAjRLjVZy5k7Ov9Ln4sm6OqPL+V3XNEMaRDqMEsxTd0
 Eg0VyOnQ==;
Received: from linux.microsoft.com ([13.77.154.182])
 by bombadil.infradead.org with esmtp (Exim 4.94 #2 (Red Hat Linux))
 id 1lgcfB-00A0rj-Go
 for linux-arm-kernel@lists.infradead.org; Wed, 12 May 2021 00:23:47 +0000
Received: from sequoia (162-237-133-238.lightspeed.rcsntx.sbcglobal.net
 [162.237.133.238])
 by linux.microsoft.com (Postfix) with ESMTPSA id F3AAA20B7178;
 Tue, 11 May 2021 17:23:40 -0700 (PDT)
DKIM-Filter: OpenDKIM Filter v2.11.0 linux.microsoft.com F3AAA20B7178
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.microsoft.com;
 s=default; t=1620779021;
 bh=PCcqHKfcC40XETopB6mFbp3LGmM21EL73MZsy4vVBJ0=;
 h=Date:From:To:Cc:Subject:References:In-Reply-To:From;
 b=QgJ6uyMu3jjpjMs7Ydf+9sedTXy1H9EXs/aAtC0eNIxuA5OwAcWVr7EQPrsG0JzOe
 WPILUmjlHq17uzKwrHVkLZXKJG3/dI9wUr/RWQGQlAEwEViwaxLs0Au2j2NDDBkKJK
 4n0wvdCqG18QjTMZu46cGFqcFixh+h9irJ6Yx0pY=
Date: Tue, 11 May 2021 19:23:38 -0500
From: Tyler Hicks <tyhicks@linux.microsoft.com>
To: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Allen Pais <apais@linux.microsoft.com>, zajec5@gmail.com,
 Allen Pais <allen.lkml@gmail.com>, bcm-kernel-feedback-list@broadcom.com,
 Linux ARM <linux-arm-kernel@lists.infradead.org>,
 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
 OP-TEE TrustedFirmware <op-tee@lists.trustedfirmware.org>
Subject: Re: [PATCH] optee: Disable shm cache when booting the crash kernel
Message-ID: <20210512002338.GJ4967@sequoia>
References: <20210225090610.242623-1-allen.lkml@gmail.com>
 <20210507035816.426585-1-tyhicks@linux.microsoft.com>
 <720CDF03-42F9-43C3-B3B3-999E4A5E2864@linux.microsoft.com>
 <CAHUa44FHo2_EUzFzHnakkm3o7H-Nn+k4hgqT2WNFezZO6D8mxA@mail.gmail.com>
 <20210507131722.GI4967@sequoia>
 <CAHUa44F7AzRQ0ZUBtJV2Y39tk1mPGAbetn7i8-DVAsSFLbFgEg@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <CAHUa44F7AzRQ0ZUBtJV2Y39tk1mPGAbetn7i8-DVAsSFLbFgEg@mail.gmail.com>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20210511_172345_693314_F7067DA9 
X-CRM114-Status: GOOD (  45.67  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On 2021-05-10 09:31:51, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > >
> > > >
> > > >
> > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > >
> > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > quickly as possible without allowing drivers to clean up.
> > > > >
> > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > booted. As the kdump kernel is shutdown, the .shutdown hook is called,
> > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > the address results in a fault that cannot be handled:
> > > > >
> > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > Mem abort info:
> > > > >   ESR = 0x96000004
> > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > >   SET = 0, FnV = 0
> > > > >   EA = 0, S1PTW = 0
> > > > > Data abort info:
> > > > >   ISV = 0, ISS = 0x00000004
> > > > >   CM = 0, WnR = 0
> > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > Hardware name: Redacted (DT)
> > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > sp : ffff80001005bb70
> > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > Call trace:
> > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > >
> > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > careful to not translate the addresses returned from
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > kernel.
> > > > >
> > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > ---
> > > > >
> > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > original two patch series[2] plus this patch is the full solution to
> > > > > properly handling OP-TEE shared memory across kexec.
> > > > >
> > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > became unresponsive during boot while steadily streaming the following
> > > > > errors to the serial console:
> > > > >
> > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > >
> > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use those
> > > > > addresses?
> >
> > @Jens did you have any thoughts on what could be happening here with the
> > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > addresses when booting the kdump kernel, rather than just disabling the
> > caches?
> 
> No idea. There's no support for SMMU in upstream OP-TEE. Just
> disabling the caches should be good enough. You could try to never
> enable the cache so see if it makes any difference.

I think this is unrelated to OP-TEE and more to do with ongoing DMA
activity when the kernel has crashed and we've done an emergency kexec
into the kdump kernel which didn't shutdown the SMMU. The SoC I'm using
has a v2 SMMU and I think something similar to commit 3f54c447df34
("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
good so far. I'll send that out as a separate patch after a little more
testing.

However, with that change and my earlier change to disable the shm cache
during boot, I'm periodically seeing a different issue while the kdump
kernel is coming up. I'm pretty certain it was already there before but
I wasn't seeing it as often since the SMMU warnings were so "loud".

The kernel waits indefinitely for a secure world thread and boot hangs
completely:

[  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  243.366141]       Not tainted 5.4.83-microsoft-standard #1
[  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.379882] swapper/0       D    0     1      0 0x00000028
[  243.385543] Call trace:
[  243.388080]  __switch_to+0xc8/0x118
[  243.391683]  __schedule+0x2e0/0x700
[  243.395280]  schedule+0x38/0xb8
[  243.398522]  schedule_timeout+0x258/0x388
[  243.402659]  wait_for_completion+0x16c/0x4b8
[  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
[  243.412100]  optee_disable_shm_cache+0xb8/0xf8
[  243.416685]  optee_probe+0x560/0x61c
[  243.420375]  platform_drv_probe+0x58/0xa8
[  243.424512]  really_probe+0xe0/0x338
[  243.428202]  driver_probe_device+0x5c/0xf0
[  243.432427]  device_driver_attach+0x74/0x80
[  243.436744]  __driver_attach+0x64/0xe0
[  243.440611]  bus_for_each_dev+0x84/0xd8
[  243.444570]  driver_attach+0x30/0x40
[  243.448258]  bus_add_driver+0x188/0x1e8
[  243.452215]  driver_register+0x64/0x110
[  243.456172]  __platform_driver_register+0x54/0x60
[  243.461027]  optee_driver_init+0x20/0x28
[  243.465075]  do_one_initcall+0x54/0x24c
[  243.469034]  kernel_init_freeable+0x1e8/0x2c0
[  243.473529]  kernel_init+0x18/0x118
[  243.477128]  ret_from_fork+0x10/0x18

I'm unable to trigger a sysrq over the serial console of this remote
machine so I don't yet know what the other threads on the system are
doing during this time. I'll hack something together tomorrow to get a
better idea.

The blocked task warning reminded me of when you said this earlier:

> Bear in mind that there are other times where we can't recover from a
> kernel crash. For instance if a thread is executing in OP-TEE in
> secure world. 

I suspect that it is related to what I'm seeing with this blocked task. Can you
expand on why we can't recover from a kernel crash if a thread is
executing in the secure world?

I appreciate your help!

Tyler

> 
> Cheers,
> Jens
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel