From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55972) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpb7g-0002G8-QH for qemu-devel@nongnu.org; Thu, 29 Sep 2016 09:11:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bpb7c-00043Y-SZ for qemu-devel@nongnu.org; Thu, 29 Sep 2016 09:11:36 -0400 Received: from mx1.redhat.com ([209.132.183.28]:59654) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpb7c-00043K-JK for qemu-devel@nongnu.org; Thu, 29 Sep 2016 09:11:32 -0400 From: Markus Armbruster References: <1471944454-13895-1-git-send-email-caoj.fnst@cn.fujitsu.com> <1471944454-13895-2-git-send-email-caoj.fnst@cn.fujitsu.com> <87y42xw8jk.fsf@dusky.pond.sub.org> <57D769C7.4000503@cn.fujitsu.com> <87fup4iatn.fsf@dusky.pond.sub.org> <20160913084911.5fd310ee@t450s.home> Date: Thu, 29 Sep 2016 15:11:27 +0200 In-Reply-To: <20160913084911.5fd310ee@t450s.home> (Alex Williamson's message of "Tue, 13 Sep 2016 08:49:11 -0600") Message-ID: <87oa3697gg.fsf@dusky.pond.sub.org> MIME-Version: 1.0 Content-Type: text/plain Subject: Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alex Williamson Cc: Marcel Apfelbaum , Cao jin , qemu-devel@nongnu.org, "Michael S. Tsirkin" Alex Williamson writes: > On Tue, 13 Sep 2016 08:16:20 +0200 > Markus Armbruster wrote: > >> Cc: Alex for device assignment expertise. >> >> Cao jin writes: >> >> > On 09/12/2016 09:29 PM, Markus Armbruster wrote: >> >> Cao jin writes: >> >> >> >>> The input parameters is used for creating the msix capable device, so >> >>> they must obey the PCI spec, or else, it should be programming error. >> >> >> >> True when the the parameters come from a device model attempting to >> >> define a PCI device violating the spec. But what if the parameters come >> >> from an actual PCI device violating the spec, via device assignment? >> > >> > Before the patch, on invalid param, the vfio behaviour is: >> > error_report("vfio: msix_init failed"); >> > then, device create fail. >> > >> > After the patch, its behaviour is: >> > asserted. >> > >> > Do you mean we should still report some useful info to user on invalid >> > params? >> >> In the normal case, asking msix_init() to create MSI-X that are out of >> spec is a programming error: the code that does it is broken and needs >> fixing. >> >> Device assignment might be the exception: there, the parameters for >> msix_init() come from the assigned device, not the program. If they >> violate the spec, the device is broken. This wouldn't be a programming >> error. Alex, can this happen? >> >> If yes, we may want to handle it by failing device assignment. > > > Generally, I think the entire premise of these sorts of patches is > flawed. We take a working error path that allows a driver to robustly > abort on unexpected date and turn it into a time bomb. Often the > excuse for this is that "error handling is hard". Tough. Now a > hot-add of a device that triggers this changes from a simple failure to > a denial of service event. Furthermore, we base that time bomb on our > interpretation of the spec, which we can only validate against in-tree > devices. > > We have actually had assigned devices that fail the sanity test here, > there's a quirk in vfio_msix_early_setup() for a Chelsio device with > this bug. Do we really want user experiencing aborts when a simple > device initialization failure is sufficient? > > Generally abort code paths like this cause me to do my own sanity > testing, which is really poor practice since we should have that sanity > testing in the common code. Thanks, I prefer to assert on programming error, because 1. it does double duty as documentation, 2. error handling of impossible conditions is commonly wrong, and 3. assertion failures have a much better chance to get the program fixed. Even when presence of a working error path kills 2., the other two make me stick to assertions. However, input out-of-spec is not a programming error. For most users of msix_init(), the arguments are hard-coded, thus invalid arguments are a programming error. For device assignment, they come from a physical device, thus invalid arguments can either be a programming error (our idea of "invalid" is invalid) or bad input (the physical device is out-of-spec). Since we can't know, we better handle it rather than assert. Bottom line: you convinced me msix_init() should stay as it is. But now msi_init() looks like it needs a change: it asserts on invalid nr_vectors parameter. Does that need fixing, Alex?