Subject: Re: [PATCH v3 1/1] PCI/ERR: Fix reset logic in pcie_do_recovery() call
To: "Kuppuswamy, Sathyanarayanan", Bjorn Helgaas
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, ashok.raj@intel.com, Jay Vosburgh
From: Sinan Kaya
Message-ID: <9b295cad-7302-cf2c-d19d-d27fabcb48be@kernel.org>
Date: Mon, 28 Sep 2020 14:02:14 -0400
In-Reply-To: <8a3aeb3c-83c4-8626-601d-360946d55dd8@linux.intel.com>

On 9/28/2020 1:15 PM, Kuppuswamy, Sathyanarayanan wrote:
> Since there is no state restoration for FATAL errors, I am wondering
> whether
> calls to ->error_detected(), ->mmio_enabled() and ->slot_reset() are
> required?

Good question.

When we started, we were trying to handle both NON_FATAL and FATAL
errors in DPC, and we saw value in unifying AER's callback mechanism
with DPC. It looks like this no longer applies to DPC.

Some drivers want these indications so they can stop outgoing DMA and
timers, which lets the system recover quickly. There is value in calling
them with the existing AER-based design.

I agree it doesn't apply here anymore if we are going to remove the
device driver. Maybe you should stop calling pcie_do_recovery() in DPC
as well.

>
> Let me know your comments about following pseudo code.
>
> if (fatal error & hotplug_supported)
>    do nothing // if fatal triggered by DPC, clear DPC state.
>
> if (fatal error & no-hotplug)
>   perform slot_reset and renumerate affected devices.

LGTM. I apologize for calling this slot_reset; in the err.c code,
slot_reset is the post-recovery callback to endpoint drivers. Let's not
use the term here anymore, to avoid confusing ourselves.

What we want is remove device + rescan, similar to what the hotplug
remove + hotplug insertion notifications eventually do.

All of this is to be done in the DPC driver without any err.c
involvement.

Bjorn,

What do you think? Is this a good direction?

Sinan
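
For reference, the decision in the quoted pseudo code can be sketched as
a small standalone C model. This is only an illustration of the proposal
as discussed above, not kernel code: the enum, the function names and
the two boolean inputs are all hypothetical and do not exist in the DPC
driver or in err.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the proposed policy; not real kernel symbols. */
enum dpc_fatal_action {
	DPC_ACT_NOT_FATAL,      /* not a fatal error: outside this proposal */
	DPC_ACT_CLEAR_STATE,    /* hotplug will handle it: only clear DPC state */
	DPC_ACT_REMOVE_RESCAN,  /* no hotplug: remove devices, then rescan */
};

/*
 * Hypothetical helper mirroring the two "if" branches of the pseudo code.
 * Both inputs are assumptions about what the DPC driver would know.
 */
static enum dpc_fatal_action dpc_pick_action(bool fatal, bool hotplug_supported)
{
	if (!fatal)
		return DPC_ACT_NOT_FATAL;
	if (hotplug_supported)
		return DPC_ACT_CLEAR_STATE;
	return DPC_ACT_REMOVE_RESCAN;
}

/* Human-readable label for each hypothetical action. */
static const char *dpc_action_name(enum dpc_fatal_action act)
{
	switch (act) {
	case DPC_ACT_CLEAR_STATE:
		return "clear DPC state";
	case DPC_ACT_REMOVE_RESCAN:
		return "remove + rescan";
	default:
		return "not fatal";
	}
}

/* Small self-check covering the two branches of the pseudo code. */
static void dpc_self_check(void)
{
	assert(dpc_pick_action(true, true) == DPC_ACT_CLEAR_STATE);
	assert(dpc_pick_action(true, false) == DPC_ACT_REMOVE_RESCAN);
}
```

Note that in this model no branch goes through err.c callbacks such as
->error_detected() or ->slot_reset(), which is the point of the proposal:
for fatal errors the DPC driver would either leave the work to hotplug
or do the remove + rescan itself.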