From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932378AbcLMDjx (ORCPT ); Mon, 12 Dec 2016 22:39:53 -0500
Received: from mx1.redhat.com ([209.132.183.28]:47544 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S932199AbcLMDjt (ORCPT ); Mon, 12 Dec 2016 22:39:49 -0500
Date: Mon, 12 Dec 2016 20:39:48 -0700
From: Alex Williamson
To: "Michael S. Tsirkin"
Cc: Cao jin , linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Message-ID: <20161212203948.41ba48d2@t450s.home>
In-Reply-To: <20161213050950-mutt-send-email-mst@kernel.org>
References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com>
	<584EAACD.9070800@cn.fujitsu.com>
	<20161212121216.1c385d65@t450s.home>
	<20161213002810-mutt-send-email-mst@kernel.org>
	<20161212154313.2ffdf4ab@t450s.home>
	<20161213050950-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
	(mx1.redhat.com [10.5.110.38]); Tue, 13 Dec 2016 03:39:49 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Dec 2016 05:15:13 +0200
"Michael S. Tsirkin" wrote:

> On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:
> > > So just don't do it then. Topology must match between host and guest,
> > > except maybe for the case of devices with host driver (e.g. PF)
> > > which we might be able to synchronize against.
> >
> > We're talking about host kernel level handling here. The host kernel
> > cannot defer the link reset to the user under the assumption that the
> > user is handling the devices in a very specific way. The moment we do
> > that, we've lost.
>
> The way is same as baremetal though, so why not?

How do we know this?
What if the user is dpdk? The kernel is
responsible for maintaining the integrity of the system and devices,
not the user.

> And if user doesn't do what's expected, we can
> do the full link reset on close.

That's exactly my point, if we're talking about multiple devices,
there's no guarantee that the close() for each is simultaneous. If one
function is released before the other we cannot do a bus reset. If
that device is then opened by another user before its sibling is
released, then we once again cannot perform a link reset. I don't
think it would be reasonable to mark the released device quarantined
until the sibling is released, that would be a terrible user experience.
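
The close() ordering problem above can be shown with a toy model. This
is only a sketch and not vfio code: the Bus class, the "fn0"/"fn1"
function names, the user names, and the needs_reset flag are all made
up for illustration. The one rule it assumes is that a bus/link reset
affects every function on the bus, so it can only run when no function
is still in use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Bus:
    """Toy model of one bus whose functions share a single link reset."""
    # function name -> current owner (None means released); hypothetical names
    owners: Dict[str, Optional[str]] = field(default_factory=dict)
    # set when an error occurred and the reset was deferred to close()
    needs_reset: bool = False

    def open(self, func: str, user: str) -> None:
        if self.owners.get(func) is not None:
            raise RuntimeError(f"{func} is already open")
        self.owners[func] = user

    def close(self, func: str) -> bool:
        """Release func; return True if the deferred reset could be done."""
        self.owners[func] = None
        return self.try_reset()

    def try_reset(self) -> bool:
        # Assumption of this model: the reset hits every function on the
        # bus, so it is only safe when none of them is still open.
        if self.needs_reset and all(o is None for o in self.owners.values()):
            self.needs_reset = False
            return True
        return False


def scenario() -> List[bool]:
    """Replay the sequence from the mail: staggered close, then a re-open."""
    bus = Bus(owners={"fn0": None, "fn1": None})
    bus.open("fn0", "user_a")
    bus.open("fn1", "user_a")
    bus.needs_reset = True            # error seen, reset deferred to close

    results = []
    results.append(bus.close("fn0"))  # sibling fn1 still open: no reset
    bus.open("fn0", "user_b")         # another user opens the released function
    results.append(bus.close("fn1"))  # fn0 now held by user_b: still no reset
    results.append(bus.needs_reset)   # the reset is still pending
    return results
```

In this model scenario() never gets to perform the reset: each close()
finds the sibling function in use, and the reset stays pending unless
the released function is kept away from new users, which is the
quarantine case the mail calls unreasonable.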