From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753403AbcHJS25 (ORCPT ); Wed, 10 Aug 2016 14:28:57 -0400 Received: from mail-it0-f45.google.com ([209.85.214.45]:37624 "EHLO mail-it0-f45.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753358AbcHJS2s (ORCPT ); Wed, 10 Aug 2016 14:28:48 -0400 MIME-Version: 1.0 In-Reply-To: <1470787409.3015.81.camel@kernel.crashing.org> References: <1452684335-46107-1-git-send-email-hare@suse.de> <1452684335-46107-4-git-send-email-hare@suse.de> <20160209210458.GB32530@localhost> <1470787409.3015.81.camel@kernel.crashing.org> From: Alexander Duyck Date: Wed, 10 Aug 2016 08:47:41 -0700 Message-ID: Subject: Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access To: Benjamin Herrenschmidt Cc: Alexey Kardashevskiy , Bjorn Helgaas , Hannes Reinecke , "linux-pci@vger.kernel.org" , "linux-kernel@vger.kernel.org" , Babu Moger , Paul Mackerras , Alex Williamson , santosh@chelsio.com, Netdev Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 9, 2016 at 5:03 PM, Benjamin Herrenschmidt wrote: > On Tue, 2016-08-09 at 11:12 -0700, Alexander Duyck wrote: >> >> The PCI spec is what essentially assumes that there is only one block. >> If I am not mistaken in the case of this device the second block here >> actually contains device configuration data, not actual VPD data. The >> issue here is that the second block is being accessed as VPD when it >> isn't. > > Devices do funny things with config space, film at 11. VFIO trying to > be the middle man and intercept/interpret things is broken, cannot work, > will never work, will just results in lots and lots of useless code, but > I've been singing that song for too long and nobody seems to care... > >> > > #0000 Large item 42 bytes; name 0x2 Identifier String >> > #002d Large item 74 bytes; name 0x10 >> > #007a Small item 1 bytes; name 0xf End Tag >> > --- >> > #0c00 Large item 16 bytes; name 0x2 Identifier String >> > #0c13 Large item 234 bytes; name 0x10 >> > #0d00 Large item 252 bytes; name 0x11 >> > #0dff Small item 0 bytes; name 0xf End Tag >> >> The second block here is driver proprietary setup bits. > > Right. They happen to be in VPD on this device. They an be elsewhere on > other devices. In between capabilities on some, in vendor caps on others... > >> > > The cxgb3 driver is reading the second bit starting from 0xc00 but since >> > the size is wrongly detected as 0x7c, VFIO blocks access beyond it and the >> > guest driver fails to probe. >> > >> > I also cannot find a clause in the PCI 3.0 spec saying that there must be >> > just a single block, is it there? >> >> > The problem is we need to be able to parse it. > > We can parse the standard part for generic stuff like inventory tools > or lsvpd, but we shouldn't get in the way of the driver poking at its > device. If we add the quirk to the kernel that reports the VPD for this device is the actual size of both blocks then we wouldn't be blocking the VPD access like we currently are. The problem is if we don't do this it becomes possible for a guest to essentially cripple a device on the host by just accessing VPD regions that aren't actually viable on many devices. We are much better off in terms of security and stability if we restrict access to what should be accessible. In this case what has happened is that the vendor threw in an extra out-of-spec block and just expected it to work. In order to work around it we just need to add a small function to drivers/pci/quirks.c that would update the VPD size reported so that it matches what the hardware is actually providing instead of what we can determine based on the VPD layout. Really working around something like this is not much different than what we would have to do if the vendor had stuffed the data in some reserved section of their PCI configuration space. We end up needing to add special quirks any time a vendor goes out-of-spec for some one-off configuration interface that only they are ever going to use. - Alex