Subject: Re: [PATCH] vfio/pci: Support error recovery
To: Alex Williamson
CC: "Michael S. Tsirkin"
From: Cao jin
Date: Thu, 8 Dec 2016 22:46:59 +0800
Message-ID: <58497263.7080500@cn.fujitsu.com>
In-Reply-To: <20161206083556.23be6ee5@t450s.home>
References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com>
 <20161130210413.5161aab1@t450s.home>
 <58402830.3060606@cn.fujitsu.com>
 <20161201075541.756f6332@t450s.home>
 <5844092A.30204@cn.fujitsu.com>
 <20161204083047.7e715b09@t450s.home>
 <58450083.9010201@cn.fujitsu.com>
 <20161205091730.568e5079@t450s.home>
 <20161206054642-mutt-send-email-mst@kernel.org>
 <20161205215949.6b09bc0f@t450s.home>
 <584696EC.1080004@cn.fujitsu.com>
 <20161206083556.23be6ee5@t450s.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On 12/06/2016 11:35 PM, Alex Williamson wrote:
> On Tue, 6 Dec 2016 18:46:04 +0800
> Cao jin wrote:
>
>> On 12/06/2016 12:59 PM, Alex Williamson wrote:
>>> On Tue, 6 Dec 2016 05:55:28 +0200
>>> "Michael S.
Tsirkin" wrote:
>>>
>>>> On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
>>>>> If you're going to take the lead for these AER patches, I would
>>>>> certainly suggest that understanding the reasoning behind the bus reset
>>>>> behavior is a central aspect to this series. This effort has dragged
>>>>> out for nearly two years and I apologize, but I don't really have a lot
>>>>> of patience for rehashing some of these issues if you're not going to
>>>>> read the previous discussions or consult with your colleagues to
>>>>> understand how we got to this point. If you want to challenge some of
>>>>> the design points, that's great, it could use some new eyes, but please
>>>>> understand how we got here first.
>>>>
>>>> Well I'm guessing Cao jin here isn't the only one not
>>>> willing to plough through all historical versions of the patchset
>>>> just to figure out the motivation for some code.
>>>>
>>>> Including a summary of a high level architecture couldn't hurt.
>>>>
>>>> Any chance of writing such? Alternatively, we can try to build it as
>>>> part of this thread. Shouldn't be hard as it seems somewhat
>>>> straight-forward on the surface:
>>>>
>>>> - detect link error on the host, don't reset link as we would normally do
>>>
>>> This is actually a new approach that I'm not sure I agree with. By
>>> skipping the host directed link reset, vfio is taking responsibility
>>> for doing this, but then we just assume the user will do it. I have
>>> issues with this.
>>>
>>> The previous approach was to use the error detected notifier to block
>>> access to the device, allowing the host to perform the link reset. A
>>> subsequent notification in the AER process released the user access
>>> which allowed the user AER process to proceed. This did result in both
>>> a host directed and a guest directed link reset, but other than
>>> coordinating the blocking of the user process during host reset, that
>>> hasn't been brought up as an issue previously.
>>>
>>
>> Tests on previous versions didn't bring up issues as I find, I think
>> that is because we didn't test it completely. As I know, before August
>> of this year, we didn't have cable connected to NIC, let alone
>> connecting NIC to gateway.
>
> Lack of testing has been a significant issue throughout the development
> of this series.
>
>> Even if I fixed the guest oops issue in igb driver that Alex found in
>> v9, v9 still cannot work in my test. And in my test, disable link
>> reset(in host) in aer core for vfio-pci is the most significant step to
>> get my test passed.
>
> But is it the correct step? I'm not convinced. Why did blocking guest
> access not work? How do you plan to manage vfio taking the
> responsibility to perform a bus reset when you don't know whether QEMU
> is the user of the device or whether the user supports AER recovery?
>

Maybe we don't yet have enough evidence to prove correctness, but I
think I did find some facts showing that the link reset in the host
causes serious trouble, and they answer part of the questions above.

1st, some thoughts:

From pci-error-recovery.txt and do_recovery() in the kernel tree, we can
see that a recovery consists of several steps (callbacks). Link reset is
one of them, and apart from link reset, the others seem to be device
specific. In our case, both host and guest perform recovery. I think the
host recovery is actually a kind of fake recovery: see the vfio-pci
driver's error_detected and resume callbacks. They don't do anything
special, they mainly signal the error to the user. But the link reset in
the host's "fake" recovery does some serious work. In other words, I
think the host does the recovery incompletely. So I was thinking: why
not just drop the incomplete host recovery (drop the link reset) for
vfio-pci, and let the guest take care of the whole recovery? This is
part of the reason why my version looks like this.
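
To make the ordering above a bit more concrete, here is a minimal
user-space sketch in C. It is only a toy model, not kernel code: every
identifier in it (toy_log, toy_host_recovery and so on) is made up for
illustration, and the mmio_enabled and slot_reset steps are left out on
purpose. It just records which steps the host side runs, with and
without the link reset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy user-space model of the step ordering only; not kernel code.
 * Every identifier here is hypothetical. */

#define TOY_LOG_SIZE 128

struct toy_log {
    char buf[TOY_LOG_SIZE];   /* must start out as an empty string */
};

static void toy_log_add(struct toy_log *log, const char *step)
{
    size_t used;

    assert(log != NULL && step != NULL);
    used = strlen(log->buf);
    /* Append "step;"; snprintf truncates if the buffer is full. */
    snprintf(log->buf + used, sizeof(log->buf) - used, "%s;", step);
}

/* Host-side callbacks, modelled on the description above: nothing
 * device specific, they only signal the user. */
static void toy_vfio_error_detected(struct toy_log *log)
{
    toy_log_add(log, "host:notify-user");
}

static void toy_vfio_resume(struct toy_log *log)
{
    toy_log_add(log, "host:resume");
}

/* Simplified host recovery walk. The link reset is the only step that
 * touches the hardware, and only for a fatal error. */
void toy_host_recovery(struct toy_log *log, int fatal, int skip_link_reset)
{
    toy_vfio_error_detected(log);
    if (fatal && !skip_link_reset)
        toy_log_add(log, "host:link-reset");
    toy_vfio_resume(log);
}
```

With skip_link_reset set, the host log contains only the two
notification steps, and the whole reset is left to the guest. That is
the behaviour I am arguing for, under the assumption that the user
really performs the reset.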
But yes, I admit the issue Alex mentioned: vfio can't guarantee that the
user will do a bus reset. This is an issue I will keep looking for a
solution to.

2nd, some facts and analysis from testing:

In the host, the relationship between time and behaviour in each
component looks roughly as follows:

      +    HW     +   host kernel    +      qemu      +  guest kernel  +
      |           | (error recovery) |                |                |
      |           |                  |                |                |
      |           | vfio-pci's       |                |                |
      |           | error_detected   |                |                |
      |           |  +               |                |                |
      |           |  | error notify  |                |                |
      |           |  | via eventfd   |                |                |
      |           |  +---------------->+------------+ |                |
      |           |                  | |vfio_err_   | |                |
      |           |                  | |notifier_   | |                |
      | +-------+<--+ link reset     | |handler     | |                |
      | |  HW   | |                  | |            | |                |
      | |       | | vfio-pci's       | |pass aer    | |                |
      | |   r   | | resume           | |to guest    | |                |
      | |   e   | | (blocking end)   | |            | |                |
      | |   s   | |                  | |    *2*     | |                |
      | |   e   | |                  | +-----+------+ |                |
      | |   t   | |                  |       |        |                |
      | |   t   | |                  |       +--------->+------------+ |
      | |   i   | |                  |                | |guest       | |
      | |   n   | |                  |                | |recovery    | |
      | |   g   | |                  |                | |process     | |
      | |       | |                  |                | |(includes   | |
      | |  *1*  | |                  |                | |register    | |
      | |       | |                  |                | |access)     | |
      | |       | |                  |                | |            | |
      | |       | |                  |                | |    *3*     | |
      | |       | |                  |                | +------------+ |
 Time | |       | |                  |                |                |
  |   | |       | |                  |                |                |
  v   | |       | |                  |                |                |

Now let me try to answer: Why did blocking guest access not work?

Some important factors:
1. Host recovery doesn't do anything special except error notification,
   so it may finish very quickly.
2. The hardware reset time is uncertain. From PCIe spec 6.6.1, I guess
   it needs many ms, which is pretty long?

Some facts found in the v9 test (v9 blocks config writes, not reads,
during host recovery):
1. Reading the uncorrectable error register in
   vfio_err_notifier_handler sometimes returns a correct value and
   sometimes returns an invalid value (all F's).

So I am thinking: if the blocking on the host side ends early, and *2*
and *3* run in parallel with *1*, then the way v9 blocks guest access
may not work.

--
Sincerely,
Cao jin