From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DAA3C43381 for ; Wed, 27 Mar 2019 19:32:30 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id D3D8D2075C for ; Wed, 27 Mar 2019 19:32:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1553715149; bh=uLV+b8csXIpp+5O77PUcn60dRftAUzlG52ayYezGYpI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=oqj5QV6ulUlYz2L1/AaKU1DrNhbNyjuIgrPing1EHo8ftQYHwYpaZ04lzRfDck1mY zo9CReg6D/BgFexg8jfr/dxIym3tRLGS+PhwWo0tkIN53bVFqXcpvmr7/ndMwyiyKL 4l3ptAPcBOmsJVIn+ojmhlw9rW7pygDmz1g3mHpA= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2387464AbfC0Tc2 (ORCPT ); Wed, 27 Mar 2019 15:32:28 -0400 Received: from mail.kernel.org ([198.145.29.99]:42604 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730973AbfC0SCl (ORCPT ); Wed, 27 Mar 2019 14:02:41 -0400 Received: from sasha-vm.mshome.net (c-73-47-72-35.hsd1.nh.comcast.net [73.47.72.35]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id DF25821738; Wed, 27 Mar 2019 18:02:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1553709760; bh=uLV+b8csXIpp+5O77PUcn60dRftAUzlG52ayYezGYpI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=PPKUOKOeouO9JCPkCkjFi0XlrDyqwHEQguJiaeqpNq2gsqAGEsSnq3ykfzxoGw02p QdxeaxUxFiQWyIbET9EEnk1IHVgqsgKATL3JHej55RQDH3Dp0NnatugQIEW8tk0L9C Nk8Hy7j2MAf9HtWemqWWXKxVXQ1NmP8DSrmuFhJc= From: Sasha Levin To: linux-kernel@vger.kernel.org, stable@vger.kernel.org Cc: Xiang Chen , John Garry , "Martin K . Petersen" , Sasha Levin , linux-scsi@vger.kernel.org Subject: [PATCH AUTOSEL 5.0 026/262] scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO Date: Wed, 27 Mar 2019 13:58:01 -0400 Message-Id: <20190327180158.10245-26-sashal@kernel.org> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20190327180158.10245-1-sashal@kernel.org> References: <20190327180158.10245-1-sashal@kernel.org> MIME-Version: 1.0 X-Patchwork-Hint: Ignore Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Xiang Chen [ Upstream commit 4790595723d4b833b18c994973d39f9efb842887 ] For internal IO and SMP IO, there is a time-out timer for them. In the timer handler, it checks whether IO is done according to the flag task->task_state_lock. There is an issue which may cause system suspended: internal IO or SMP IO is sent, but at that time because of hardware exception (such as inject 2Bit ECC error), so IO is not completed and also not timeout. But, at that time, the SAS controller reset occurs to recover system. It will release the resource and set the status of IO to be SAS_TASK_STATE_DONE, so when IO timeout, it will never complete the completion of IO and wait for ever. [ 729.123632] Call trace: [ 729.126791] [] __switch_to+0x94/0xa8 [ 729.133106] [] __schedule+0x1e8/0x7fc [ 729.138975] [] schedule+0x34/0x8c [ 729.144401] [] schedule_timeout+0x1d8/0x3cc [ 729.150690] [] wait_for_common+0xdc/0x1a0 [ 729.157101] [] wait_for_completion+0x28/0x34 [ 729.165973] [] hisi_sas_internal_task_abort+0x2a0/0x424 [hisi_sas_test_main] [ 729.176447] [] hisi_sas_abort_task+0x244/0x2d8 [hisi_sas_test_main] [ 729.185258] [] sas_eh_handle_sas_errors+0x1c8/0x7b8 [ 729.192391] [] sas_scsi_recover_host+0x130/0x398 [ 729.199237] [] scsi_error_handler+0x148/0x5c0 [ 729.206009] [] kthread+0x10c/0x138 [ 729.211563] [] ret_from_fork+0x10/0x18 To solve the issue, callback function task_done of those IOs need to be called when on SAS controller reset. Signed-off-by: Xiang Chen Signed-off-by: John Garry Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin --- drivers/scsi/hisi_sas/hisi_sas_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 88ae415e907a..62d158574281 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -873,7 +873,8 @@ static void hisi_sas_do_release_task(struct hisi_hba *hisi_hba, struct sas_task spin_lock_irqsave(&task->task_state_lock, flags); task->task_state_flags &= ~(SAS_TASK_STATE_PENDING | SAS_TASK_AT_INITIATOR); - task->task_state_flags |= SAS_TASK_STATE_DONE; + if (!slot->is_internal && task->task_proto != SAS_PROTOCOL_SMP) + task->task_state_flags |= SAS_TASK_STATE_DONE; spin_unlock_irqrestore(&task->task_state_lock, flags); } -- 2.19.1