Message-ID: <80145652-b9ca-57b5-ad95-ca12d6a25eea@arm.com>
Date: Fri, 10 Dec 2021 18:13:09 +0000
Subject: Re: [PATCH v2 01/11] iommu/iova: Fix race between FQ timeout and teardown
To: John Garry , joro@8bytes.org, will@kernel.org
Cc: linux-kernel@vger.kernel.org, willy@infradead.org, linux-mm@kvack.org, iommu@lists.linux-foundation.org, Xiongfeng Wang
References: <03cbd9c4-0f11-895b-8eb5-1b75bb74d37c@huawei.com>
From: Robin Murphy
In-Reply-To: <03cbd9c4-0f11-895b-8eb5-1b75bb74d37c@huawei.com>

On 2021-12-10 18:04, John Garry via iommu wrote:
> On 10/12/2021 17:54, Robin Murphy wrote:
>> From: Xiongfeng Wang
>>
>> It turns out to be possible for hotplugging out a device to reach the
>> stage of tearing down the device's group and default domain before the
>> domain's flush queue has drained naturally. At this point, it is then
>> possible for the timeout to expire just*before*  the del_timer() call
>
> super nit: "just*before*  the" - needs a whitespace before "before" :)

Weird... the original patch file here and the copy received by lore via
linux-iommu look fine, gremlins in your MUA or delivery path perhaps?

>> from free_iova_flush_queue(), such that we then proceed to free the FQ
>> resources while fq_flush_timeout() is still accessing them on another
>> CPU. Crashes due to this have been observed in the wild while removing
>> NVMe devices.
>>
>> Close the race window by using del_timer_sync() to safely wait for any
>> active timeout handler to finish before we start to free things. We
>> already avoid any locking in free_iova_flush_queue() since the FQ is
>> supposed to be inactive anyway, so the potential deadlock scenario does
>> not apply.
>>
>> Fixes: 9a005a800ae8 ("iommu/iova: Add flush timer")
>> Signed-off-by: Xiongfeng Wang
>> [ rm: rewrite commit message ]
>> Signed-off-by: Robin Murphy
>
> FWIW,
>
> Reviewed-by: John Garry

Thanks John!

Robin.
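
Editorial illustration (not part of the original message): the teardown
ordering described in the quoted commit message can be sketched in
userspace C. This is only an analogy under stated assumptions: a pthread
stands in for the FQ timer, and pthread_join() plays the role that
del_timer_sync() plays in the patch, i.e. waiting for a handler that may
already be running before any resources are freed. Every name here
(demo_fq, demo_timeout_handler, demo_teardown, demo_run) is hypothetical
and none of it is the kernel's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a flush queue; NOT the kernel's structure. */
struct demo_fq {
    pthread_t timer_thread;  /* plays the role of the FQ timer */
    atomic_int cancelled;    /* set by teardown to stop a pending "timer" */
    int *entries;            /* resources the timeout handler touches */
    size_t nr_entries;
    long flushed;            /* work done by the handler, read after join */
};

/* Analogue of a timeout handler: fires after a delay and walks the entries. */
static void *demo_timeout_handler(void *arg)
{
    struct demo_fq *fq = arg;
    struct timespec delay = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    nanosleep(&delay, NULL);
    if (atomic_load(&fq->cancelled))
        return NULL;                     /* "timer" was deactivated in time */
    for (size_t i = 0; i < fq->nr_entries; i++)
        fq->flushed += fq->entries[i];   /* touches memory teardown will free */
    return NULL;
}

/*
 * Analogue of the fixed teardown: deactivate the timer AND wait for any
 * handler that is already running, and only then free what it was using.
 * Freeing without the join would reintroduce the use-after-free window.
 */
static long demo_teardown(struct demo_fq *fq)
{
    long flushed;

    assert(fq != NULL);
    atomic_store(&fq->cancelled, 1);
    pthread_join(fq->timer_thread, NULL);  /* the "_sync" part */
    flushed = fq->flushed;
    free(fq->entries);
    free(fq);
    return flushed;
}

/* Returns 0 if the handler either never ran or ran to completion. */
int demo_run(size_t n)
{
    struct demo_fq *fq = calloc(1, sizeof(*fq));
    long flushed;

    if (!fq)
        return -1;
    fq->entries = calloc(n, sizeof(*fq->entries));
    if (!fq->entries) {
        free(fq);
        return -1;
    }
    fq->nr_entries = n;
    for (size_t i = 0; i < n; i++)
        fq->entries[i] = 1;
    atomic_init(&fq->cancelled, 0);
    if (pthread_create(&fq->timer_thread, NULL, demo_timeout_handler, fq)) {
        free(fq->entries);
        free(fq);
        return -1;
    }
    flushed = demo_teardown(fq);
    /* A partial walk would mean the handler was cut off mid-access. */
    return (flushed == 0 || flushed == (long)n) ? 0 : 1;
}
```

Calling demo_run(1024) from a test harness should return 0 whether the
handler was cancelled before firing or finished its walk; older
toolchains may need -pthread to build it. As in the commit message, the
wait is only safe because teardown holds no lock that the handler could
need.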