From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752269AbcLFDze (ORCPT );
	Mon, 5 Dec 2016 22:55:34 -0500
Received: from mx1.redhat.com ([209.132.183.28]:33936 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751201AbcLFDzb (ORCPT );
	Mon, 5 Dec 2016 22:55:31 -0500
Date: Tue, 6 Dec 2016 05:55:28 +0200
From: "Michael S. Tsirkin"
To: Alex Williamson
Cc: Cao jin , linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Message-ID: <20161206054642-mutt-send-email-mst@kernel.org>
References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com>
	<20161130210413.5161aab1@t450s.home>
	<58402830.3060606@cn.fujitsu.com>
	<20161201075541.756f6332@t450s.home>
	<5844092A.30204@cn.fujitsu.com>
	<20161204083047.7e715b09@t450s.home>
	<58450083.9010201@cn.fujitsu.com>
	<20161205091730.568e5079@t450s.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20161205091730.568e5079@t450s.home>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.39]); Tue, 06 Dec 2016 03:55:31 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 05, 2016 at 09:17:30AM -0700, Alex Williamson wrote:
> If you're going to take the lead for these AER patches, I would
> certainly suggest that understanding the reasoning behind the bus reset
> behavior is a central aspect to this series.  This effort has dragged
> out for nearly two years and I apologize, but I don't really have a lot
> of patience for rehashing some of these issues if you're not going to
> read the previous discussions or consult with your colleagues to
> understand how we got to this point.
> If you want to challenge some of
> the design points, that's great, it could use some new eyes, but please
> understand how we got here first.

Well I'm guessing Cao jin here isn't the only one not willing to plough
through all historical versions of the patchset just to figure out the
motivation for some code.

Including a summary of a high level architecture couldn't hurt.

Any chance of writing such?  Alternatively, we can try to build it as
part of this thread.  Shouldn't be hard as it seems somewhat
straight-forward on the surface:

- detect link error on the host, don't reset link as we would normally do
- report link error to guest
- detect link reset request from guest
- reset link on host

Since link reset will reset all devices behind it, for this to work we
need same set of devices behind the link in host and guest.  Enforcing
this would be nice to have.

- as link now might end up in bad state, reset it when device is
  unassigned

Any details I missed?

--
MST
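
The flow listed above can be sketched as a toy state machine.  This is
only an illustration of the ordering of the steps, not vfio or QEMU
code: the class, method and log-entry names are all made up, and
refusing the guest's reset when the host and guest device sets differ
is just one assumed way of "enforcing" the same-devices rule.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class LinkState(Enum):
    OK = auto()
    ERROR_REPORTED = auto()  # host saw the error and told the guest; no reset yet


@dataclass
class AssignedLink:
    """Hypothetical model of one link whose devices are assigned to a guest."""
    host_devices: frozenset   # devices behind the link on the host
    guest_devices: frozenset  # devices behind the corresponding link in the guest
    state: LinkState = LinkState.OK
    log: list = field(default_factory=list)

    def same_devices(self):
        # A link reset hits every device behind it, so both sides must match.
        return self.host_devices == self.guest_devices

    def host_link_error(self):
        # Detect the error on the host, skip the usual host-side reset,
        # and report the error to the guest instead.
        self.state = LinkState.ERROR_REPORTED
        self.log.append("report-to-guest")

    def guest_reset_request(self):
        # Guest asked for a link reset: perform it on the host, but only
        # if the device sets match (assumed enforcement policy).
        if not self.same_devices():
            self.log.append("reset-refused")
            return False
        self._reset()
        return True

    def unassign(self):
        # The link may have been left in a bad state; reset it on unassign.
        if self.state is not LinkState.OK:
            self._reset()
        self.log.append("unassigned")

    def _reset(self):
        self.log.append("host-link-reset")
        self.state = LinkState.OK


if __name__ == "__main__":
    devs = frozenset({"0000:01:00.0", "0000:01:00.1"})
    link = AssignedLink(host_devices=devs, guest_devices=devs)
    link.host_link_error()
    link.guest_reset_request()
    link.unassign()
    print(link.log)
```

With mismatched device sets the guest's request is refused in this
sketch, and the reset only happens once the device is unassigned.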