Date: Tue, 13 Dec 2016 18:12:34 +0200
From: "Michael S. Tsirkin"
To: Alex Williamson
Cc: Cao jin, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Message-ID: <20161213181116-mutt-send-email-mst@kernel.org>

On Mon, Dec 12, 2016 at 08:39:48PM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 05:15:13 +0200
> "Michael S. Tsirkin" wrote:
>
> > On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:
> > > > So just don't do it then. Topology must match between host and guest,
> > > > except maybe for the case of devices with host driver (e.g. PF)
> > > > which we might be able to synchronize against.
> > >
> > > We're talking about host kernel level handling here. The host kernel
> > > cannot defer the link reset to the user under the assumption that the
> > > user is handling the devices in a very specific way. The moment we do
> > > that, we've lost.
> >
> > The way is same as baremetal though, so why not?
>
> How do we know this? What if the user is dpdk? The kernel is
> responsible for maintaining the integrity of the system and devices,
> not the user.
>
> > And if user doesn't do what's expected, we can
> > do the full link reset on close.
>
> That's exactly my point, if we're talking about multiple devices,
> there's no guarantee that the close() for each is simultaneous. If one
> function is released before the other we cannot do a bus reset. If
> that device is then opened by another user before its sibling is
> released, then we once again cannot perform a link reset. I don't
> think it would be reasonable to mark the released device quarantined
> until the sibling is released, that would be a terrible user experience.

Not sure why you find it so terrible, and I don't think there's another way.

-- 
MST