From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36417) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X1ufG-0007EQ-Uq for qemu-devel@nongnu.org; Tue, 01 Jul 2014 05:47:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1X1ufB-00018n-3z for qemu-devel@nongnu.org; Tue, 01 Jul 2014 05:47:50 -0400 Received: from mga02.intel.com ([134.134.136.20]:54644) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1X1ufA-00017B-Fb for qemu-devel@nongnu.org; Tue, 01 Jul 2014 05:47:45 -0400 Message-ID: <53B28392.3040604@intel.com> Date: Tue, 01 Jul 2014 17:46:58 +0800 From: "Chen, Tiejun" MIME-Version: 1.0 References: <53ABF00E.6000309@redhat.com> <53B0D0C5.60000@intel.com> <20140630064822.GB14491@redhat.com> <53B110CA.6070606@intel.com> <20140630090511.GB15777@redhat.com> <53B1300D.10001@intel.com> <20140630095509.GA17700@redhat.com> <53B139E6.1020607@intel.com> <20140630112845.GA29477@redhat.com> <53B21FAA.1050207@intel.com> <20140701091256.GB25897@redhat.com> In-Reply-To: <20140701091256.GB25897@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Michael S. Tsirkin" Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, qemu-devel@nongnu.org, Kelly.Zytaruk@amd.com, anthony.perard@citrix.com, anthony@codemonkey.ws, yang.z.zhang@intel.com, Paolo Bonzini On 2014/7/1 17:12, Michael S. Tsirkin wrote: > On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote: >> On 2014/6/30 19:28, Michael S. Tsirkin wrote: >>> On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote: >>>> On 2014/6/30 17:55, Michael S. Tsirkin wrote: >>>>> On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: >>>>>> On 2014/6/30 17:05, Michael S. Tsirkin wrote: >>>>>>> On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: >>>>>>>> On 2014/6/30 14:48, Michael S. Tsirkin wrote: >>>>>>>>> On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: >>>>>>>>>> On 2014/6/26 18:03, Paolo Bonzini wrote: >>>>>>>>>>> Il 26/06/2014 11:18, Chen, Tiejun ha scritto: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> - offsets 0x0000..0x0fff map to configuration space of the host MCH >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Are you saying the config space in the video device? >>>>>>>>>>> >>>>>>>>>>> No, I am saying in a new BAR, or at some magic offset of an existing >>>>>>>>>>> MMIO BAR. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> As I mentioned previously, the IGD guy told me we have no any unused a >>>>>>>>>> offset or BAR in the config space. >>>>>>>>>> >>>>>>>>>> And guy who are responsible for the native driver seems not be accept to >>>>>>>>>> extend some magic offset of an existing MMIO BAR. >>>>>>>>>> >>>>>>>>>> In addition I think in a short time its not possible to migrate i440fx to >>>>>>>>>> q35 as a PCIe machine of xen. >>>>>>>>> >>>>>>>>> That seems like a weak motivation. I don't see a need to get something >>>>>>>>> merged upstream in a short time: this seems sure to miss 2.1, >>>>>>>>> so you have the time to make it architecturally sound. >>>>>>>>> "Making existing guests work" would be a better motivation. >>>>>>>> >>>>>>>> Yes. >>>>>>> >>>>>>> So focus on this then. Existing guests will probably work >>>>>>> fine on a newer chipset - likely better than on i440fx. >>>>>>> xen management tools need to do some work to support this? >>>>>>> That will just give everyone more long term benefits. >>>>>>> >>>>>>> If instead we create a hack that does not resemble >>>>>>> any existing hardware even remotely, what's the >>>>>>> chance that it will not break with any future >>>>>>> guest modification? >>>>>>> >>>>>>> >>>>>>>>> Isn't this possible with an mch chipset? >>>>>>>> >>>>>>>> If you're saying q35, I mean AFAIK we have no any plan to migrate to this >>>>>>>> MCH in xen case. >>>>>>> >>>>>>> q35 or a newer chipset that's closer to what guests expect. >>>>>>> >>>>>>>> Additionally, I think I should follow this feature after >>>>>>>> q35 can work for xen scenario. >>>>>>> >>>>>>> What's stopping you? >>>>>> >>>>>> I mean if you want create an new machine based on q35, actually this is >>>>>> equal to start making xen to migrate to q35 now. Right? I can't image how >>>>>> much effort should be done. >>>>> >>>>> I don't see why you don't try. Sounds like a more robust solution to me. >>>> >>>> As I think this is another requirement to us. I'm not sure if I have enough >>>> time to touch this currently. >>>> >>>>> >>>>>> So this is a reason why I'm saying I'd like to follow this feature after q35 >>>>>> can work with xen completely. >>>>> >>>>> Then we'll end up with more configurations to support, and to what end? >>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> So could we do this step by step: >>>>>>>>>> >>>>>>>>>> #1 phase: We just cover current qemu-xen implementation based on i44fx, so >>>>>>>>>> still provide that pseudo ISA bridge at 00:1f.0 as we already did. >>>>>>>>> >>>>>>>>> By the way there is no reason to put it at 00:1f.0 specifically I think. >>>>>>>>> So it seems simple: create a dummy device that gets device and >>>>>>>>> vendor id as properties. If you really like, add an option to get values >>>>>>>> >>>>>>>> Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx >>>>>>>> passthrough: create pseudo intel isa bridge. There, we fake this device just >>>>>>>> at 00:1f.0. >>>>>>>> But you guys don't like this, and shouldn't this be just this point we >>>>>>>> discussing now? >>>>>>>> >>>>>>>> If you guys agree that , everything is fine. >>>>>>> >>>>>>> Actually, this isn't what you did. >>>>>>> Don't tie it to xen, and don't tie it to 1f. >>>>>>> Just make it a simple stub pci device. >>>>>>> Whoever wants it, creates it. >>>>>>> >>>>>>> The thing I worry about, is the chance this will break going forward. >>>>>>> So you created a system with 2 isa bridges. >>>>>> >>>>>> I don't set class type to claim this is an ISA bridge, just a normal PCI >>>>>> device. >>>>>> +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice >>>>>> *hdev) >>>>>> +{ >>>>>> + struct PCIDevice *dev; >>>>>> + >>>>>> + char rid; >>>>>> + >>>>>> + /* We havt to use a simple PCI device to fake this ISA bridge >>>>>> + * to avoid making some confusion to BIOS and ACPI. >>>>>> + */ >>>>>> + dev = pci_create(bus, PCI_DEVFN(0x1f, 0), >>>>>> "pseudo-intel-pch-isa-bridge"); >>>>>> + >>>>>> + qdev_init_nofail(&dev->qdev); >>>>>> + >>>>>> + pci_config_set_vendor_id(dev->config, hdev->vendor_id); >>>>>> + pci_config_set_device_id(dev->config, hdev->device_id); >>>>>> + >>>>>> + xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1); >>>>>> + >>>>>> + pci_config_set_revision(dev->config, rid); >>>>>> + >>>>>> + XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n"); >>>>>> + return 0; >>>>>> +} >>>>> >>>>> Then I don't see how it's supposed to work. >>>>> Doesn't i915 look for an isa bridge? >>>>> >>>>> /* >>>>> * The reason to probe ISA bridge instead of Dev31:Fun0 is to >>>>> * make graphics device passthrough work easy for VMM, that only >>>>> * need to expose ISA bridge to let driver know the real hardware >>>>> * underneath. This is a requirement from virtualization team. >>>>> * >>>>> * In some virtualized environments (e.g. XEN), there is irrelevant >>>>> * ISA bridge in the system. To work reliably, we should scan trhough >>>>> * all the ISA bridge devices and check for the first match, instead >>>>> * of only checking the first one. >>>>> */ >>>>> while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) { >>>>> if (pch->vendor == PCI_VENDOR_ID_INTEL) { >>>>> unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK; >>>>> dev_priv->pch_id = id; >>>>> >>>>> if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) { >>>>> dev_priv->pch_type = PCH_IBX; >>>>> >>>>> etc >>>>> >>>> >>>> I already post this to mainline to change as follows: >>>> >>>> - while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) { >>>> + pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); >>>> + if (pch) { >>> >>> I see - responded on that mail - but I thought the point is to make >>> existing guests run? In fact you said so explicitly. >>> >>> >>>> Please refer to this, >>>> >>>> [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of >>>> check class type >>>> >>>> Linux Native guys would like to accept this. And actually Windows always use >>>> devfn to detect this. >>> >>> In fact I see this: >>> >>> linux 2.6.35-3.9 probes the 1st IA bridge >>> >>> no idea how would you fix this. >>> try changing default class for the main bridge? >>> >>> linux 3.10 probes all ISA bridges >>> >>> requires your stub to be the isa bridge? >>> >>> >>> I don't see why are driver guys so willing to do crazy things. They >>> want to match specific device/vendor id pairs, why don't they do just >>> that? Why poke at class, random slot numbers etc etc? >> >> AFAIK what they did is from our previous incorrect requirement as I >> understand. So we need to correct this. >> >> Thanks >> Tiejun > > Since we can't fix existing guests, I would say This shouldn't be a fix existing guest, and this is why I can send this before you guys accept GFX passthrough for qemu ML. I think you can re-read that patch head description. 1f.0 can work under all scenarios including qemu-xen-traditonal. And this is also expected by native Linux organically, and especially Windows always use devfn to detect PCH, this is not like current Linux. So here I just sync this to make sense. Unless you'd like to make Linux specific to this point. Tiejun > get things working first, find a clean way for > new driver to work next. > >>> >>> >>> >>>>> >>>>> >>>>>>> This is already not something that exists on real hardware. >>>>>>> So it might break some guests that will get confused. >>>>>>> Maybe we are lucky and most guests see an unfamiliar device >>>>>>> and ignore it. It seems believable. >>>>>>> >>>>>>> But your MCH hack overrides registers in the pci host. >>>>>> >>>>>> We just try to write *one* register we already confirm this is safe enough. >>>>> >>>>> This should go in code in form of comments: >>>>> document what this register does on 440fx >>>>> and why it's safe to override. >>>>> We don't see what you >>>>> confirmed off-line. >>>> >>>> That offset is one specific to IGD usage, not for i440fx common. This is why >>>> we need to expose something in the host bridge. They're just introduced to >>>> support IGD. >>>> >>>>> >>>>>> Other register are read-only. >>>>> >>>>> Doesn't matter, need to document these as well. >>>> >>>> I think everything are covered in igd_pci_write()/igd_pci_write(). >>>> >>>>> >>>>>>> Are we lucky and there's nothing in these registers >>>>>>> of interest to guests? This seems much more fragile. >>>>>>> So please poke at the spec, and compile the list >>>>>>> of registers you want to touch, figure out why they are >>>>>>> safe to override, and put this all in code comments. >>>>>>> >>>>>>> And the same thing that applies to the isa bridge >>>>>> >>>>>> We just expose its own vendor/device ids here. We don't access to change >>>>>> anything in the isa bridge. >>>>>> >>>>>>> applies here too. It should work without QEMU touching >>>>>>> hosts' hardware. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >from sysfs: device and vendor id are world readable, so just get them >>>>>>>>> directly and not through xen wrappers, this way you can open the files >>>>>>>>> RO and not RW. >>>>>>>>> You seem to poke at revision as well, I don't see >>>>>>>>> driver looking at that - strictly necessary? >>>>>>>>> If yes please patch host kernel to expose that info in sysfs, >>>>>>>>> though we can fall back on pci config if not there. >>>>>>>>> >>>>>>>>> MCH (bridge_dev) hacks in i915 are nastier. >>>>>>>>> To clean them up, we really have to have an explicit driver for this >>>>>>>>> bridge, not a pass-through device. Long term, the right thing to do is >>>>>>>>> likely to extend host driver and expose the necessary information in >>>>>>>>> sysfs on host kernel. >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> I'm a bit confused. Any sysfs should be filled by the associated PCIe >>>>>>>> device, shouldn't it? So qemu still need to emulate this PCIe device >>>>>>>> firstly, then set properties into sysfs. >>>>>>> >>>>>>> I am talking about getting host properties into qemu. >>>>>>> You don't want to give qemu R/W root access to host sysfs system >>>>>>> of the root bridge, that's not secure. >>>>>> >>>>>> igd_pci_read() >>>>>> | >>>>>> + xen_host_pci_get_block() >>>>>> | >>>>>> + xen_host_pci_config_read(() >>>>>> | >>>>>> + pread() >>>>>> >>>>>> So shouldn't we already expose these information via sysfs? >>>>> >>>>> That's poking at sysfs of a real device, and after opening it RW. >>>> >>>> I don't think we can really write anything to those read-only sysfs >>>> interface even we open them as RW. >>>> >>>> Tiejun >>>> >>>>> >>>>>>> Avoiding read only access to filesystem is a good idea too, so it >>>>>>> should be possible to pass all parameters in as >>>>>>> device properties, and let whoever starts qemu >>>>>>> figure out what are reasonable values. >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> #2 phase: Now, we will choose a capability ID that won't be conflicting with >>>>>>>>>> others. To do this properly, we need to get one from PCI SIG group. To have >>>>>>>>>> this workable and consistently validated, this method shouldn't be virt >>>>>>>>>> specific. Then native driver should use the same method. >>>>>>>>> >>>>>>>>> You mean you will be able to talk sense into hardware guys? >>>>>>>>> I doubt that. If they could be convinced to make e.g. i915 base a >>>>>>>> >>>>>>>> We're negotiating this, so this is just our long term solution we figure out >>>>>>>> currently. >>>>>>>> >>>>>>>>> proper BAR, why didn't they? >>>>>>>> >>>>>>>> We already have no any free BAR as we mentioned previously. >>>>>>> >>>>>>> I thought you were talking about modifying hardware? >>>>>> >>>>>> Yes. >>>>>> >>>>>> Tiejun >>>>> >>>>> And they can't figure out how to stick extra info in an existing BAR? >>>>> Oh well, that's hardware for you. >>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> So when xen work on >>>>>>>>>> q35 PCIe machine, we can walk this way. >>>>>>>>> >>>>>>>>> If you are emulating MCH anyway, pick one that is close >>>>>>>>> to what i915 driver expects. It would then work with existing >>>>>>>> >>>>>>>> Looks you guys prefer we create a new MCH anyway, right? But is it necessary >>>>>>>> to create a new based on i440fx just for a little change? >>>>>>>> >>>>>>>> Thanks >>>>>>>> Tiejun >>>>>>> >>>>>>> You can inherit it. Maybe you are lucky and this happens to >>>>>>> work without conflicting with whatever other guests want to do. >>>>>>> But if you ask me, you are really just piling up hacks. >>>>>>> If some guest does not work on i440, you should just work on >>>>>>> emulating whatever it does work on. >>>>>>> That would have real value. >>>>>>> >>>>>>>>> devices, without new capability IDs. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Anthony, >>>>>>>>>> >>>>>>>>>> Any comments to address this in xen case? >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Tiejun >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xen.org >>>>> http://lists.xen.org/xen-devel >>>>> >>>>> >>> > > From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Chen, Tiejun" Subject: Re: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD passthrough support Date: Tue, 01 Jul 2014 17:46:58 +0800 Message-ID: <53B28392.3040604@intel.com> References: <53ABF00E.6000309@redhat.com> <53B0D0C5.60000@intel.com> <20140630064822.GB14491@redhat.com> <53B110CA.6070606@intel.com> <20140630090511.GB15777@redhat.com> <53B1300D.10001@intel.com> <20140630095509.GA17700@redhat.com> <53B139E6.1020607@intel.com> <20140630112845.GA29477@redhat.com> <53B21FAA.1050207@intel.com> <20140701091256.GB25897@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <20140701091256.GB25897@redhat.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org To: "Michael S. Tsirkin" Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, qemu-devel@nongnu.org, Kelly.Zytaruk@amd.com, anthony.perard@citrix.com, anthony@codemonkey.ws, yang.z.zhang@intel.com, Paolo Bonzini List-Id: xen-devel@lists.xenproject.org On 2014/7/1 17:12, Michael S. Tsirkin wrote: > On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote: >> On 2014/6/30 19:28, Michael S. Tsirkin wrote: >>> On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote: >>>> On 2014/6/30 17:55, Michael S. Tsirkin wrote: >>>>> On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote: >>>>>> On 2014/6/30 17:05, Michael S. Tsirkin wrote: >>>>>>> On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote: >>>>>>>> On 2014/6/30 14:48, Michael S. Tsirkin wrote: >>>>>>>>> On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote: >>>>>>>>>> On 2014/6/26 18:03, Paolo Bonzini wrote: >>>>>>>>>>> Il 26/06/2014 11:18, Chen, Tiejun ha scritto: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> - offsets 0x0000..0x0fff map to configuration space of the host MCH >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Are you saying the config space in the video device? >>>>>>>>>>> >>>>>>>>>>> No, I am saying in a new BAR, or at some magic offset of an existing >>>>>>>>>>> MMIO BAR. >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> As I mentioned previously, the IGD guy told me we have no any unused a >>>>>>>>>> offset or BAR in the config space. >>>>>>>>>> >>>>>>>>>> And guy who are responsible for the native driver seems not be accept to >>>>>>>>>> extend some magic offset of an existing MMIO BAR. >>>>>>>>>> >>>>>>>>>> In addition I think in a short time its not possible to migrate i440fx to >>>>>>>>>> q35 as a PCIe machine of xen. >>>>>>>>> >>>>>>>>> That seems like a weak motivation. I don't see a need to get something >>>>>>>>> merged upstream in a short time: this seems sure to miss 2.1, >>>>>>>>> so you have the time to make it architecturally sound. >>>>>>>>> "Making existing guests work" would be a better motivation. >>>>>>>> >>>>>>>> Yes. >>>>>>> >>>>>>> So focus on this then. Existing guests will probably work >>>>>>> fine on a newer chipset - likely better than on i440fx. >>>>>>> xen management tools need to do some work to support this? >>>>>>> That will just give everyone more long term benefits. >>>>>>> >>>>>>> If instead we create a hack that does not resemble >>>>>>> any existing hardware even remotely, what's the >>>>>>> chance that it will not break with any future >>>>>>> guest modification? >>>>>>> >>>>>>> >>>>>>>>> Isn't this possible with an mch chipset? >>>>>>>> >>>>>>>> If you're saying q35, I mean AFAIK we have no any plan to migrate to this >>>>>>>> MCH in xen case. >>>>>>> >>>>>>> q35 or a newer chipset that's closer to what guests expect. >>>>>>> >>>>>>>> Additionally, I think I should follow this feature after >>>>>>>> q35 can work for xen scenario. >>>>>>> >>>>>>> What's stopping you? >>>>>> >>>>>> I mean if you want create an new machine based on q35, actually this is >>>>>> equal to start making xen to migrate to q35 now. Right? I can't image how >>>>>> much effort should be done. >>>>> >>>>> I don't see why you don't try. Sounds like a more robust solution to me. >>>> >>>> As I think this is another requirement to us. I'm not sure if I have enough >>>> time to touch this currently. >>>> >>>>> >>>>>> So this is a reason why I'm saying I'd like to follow this feature after q35 >>>>>> can work with xen completely. >>>>> >>>>> Then we'll end up with more configurations to support, and to what end? >>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> So could we do this step by step: >>>>>>>>>> >>>>>>>>>> #1 phase: We just cover current qemu-xen implementation based on i44fx, so >>>>>>>>>> still provide that pseudo ISA bridge at 00:1f.0 as we already did. >>>>>>>>> >>>>>>>>> By the way there is no reason to put it at 00:1f.0 specifically I think. >>>>>>>>> So it seems simple: create a dummy device that gets device and >>>>>>>>> vendor id as properties. If you really like, add an option to get values >>>>>>>> >>>>>>>> Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx >>>>>>>> passthrough: create pseudo intel isa bridge. There, we fake this device just >>>>>>>> at 00:1f.0. >>>>>>>> But you guys don't like this, and shouldn't this be just this point we >>>>>>>> discussing now? >>>>>>>> >>>>>>>> If you guys agree that , everything is fine. >>>>>>> >>>>>>> Actually, this isn't what you did. >>>>>>> Don't tie it to xen, and don't tie it to 1f. >>>>>>> Just make it a simple stub pci device. >>>>>>> Whoever wants it, creates it. >>>>>>> >>>>>>> The thing I worry about, is the chance this will break going forward. >>>>>>> So you created a system with 2 isa bridges. >>>>>> >>>>>> I don't set class type to claim this is an ISA bridge, just a normal PCI >>>>>> device. >>>>>> +static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice >>>>>> *hdev) >>>>>> +{ >>>>>> + struct PCIDevice *dev; >>>>>> + >>>>>> + char rid; >>>>>> + >>>>>> + /* We havt to use a simple PCI device to fake this ISA bridge >>>>>> + * to avoid making some confusion to BIOS and ACPI. >>>>>> + */ >>>>>> + dev = pci_create(bus, PCI_DEVFN(0x1f, 0), >>>>>> "pseudo-intel-pch-isa-bridge"); >>>>>> + >>>>>> + qdev_init_nofail(&dev->qdev); >>>>>> + >>>>>> + pci_config_set_vendor_id(dev->config, hdev->vendor_id); >>>>>> + pci_config_set_device_id(dev->config, hdev->device_id); >>>>>> + >>>>>> + xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1); >>>>>> + >>>>>> + pci_config_set_revision(dev->config, rid); >>>>>> + >>>>>> + XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n"); >>>>>> + return 0; >>>>>> +} >>>>> >>>>> Then I don't see how it's supposed to work. >>>>> Doesn't i915 look for an isa bridge? >>>>> >>>>> /* >>>>> * The reason to probe ISA bridge instead of Dev31:Fun0 is to >>>>> * make graphics device passthrough work easy for VMM, that only >>>>> * need to expose ISA bridge to let driver know the real hardware >>>>> * underneath. This is a requirement from virtualization team. >>>>> * >>>>> * In some virtualized environments (e.g. XEN), there is irrelevant >>>>> * ISA bridge in the system. To work reliably, we should scan trhough >>>>> * all the ISA bridge devices and check for the first match, instead >>>>> * of only checking the first one. >>>>> */ >>>>> while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) { >>>>> if (pch->vendor == PCI_VENDOR_ID_INTEL) { >>>>> unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK; >>>>> dev_priv->pch_id = id; >>>>> >>>>> if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) { >>>>> dev_priv->pch_type = PCH_IBX; >>>>> >>>>> etc >>>>> >>>> >>>> I already post this to mainline to change as follows: >>>> >>>> - while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) { >>>> + pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0)); >>>> + if (pch) { >>> >>> I see - responded on that mail - but I thought the point is to make >>> existing guests run? In fact you said so explicitly. >>> >>> >>>> Please refer to this, >>>> >>>> [RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of >>>> check class type >>>> >>>> Linux Native guys would like to accept this. And actually Windows always use >>>> devfn to detect this. >>> >>> In fact I see this: >>> >>> linux 2.6.35-3.9 probes the 1st IA bridge >>> >>> no idea how would you fix this. >>> try changing default class for the main bridge? >>> >>> linux 3.10 probes all ISA bridges >>> >>> requires your stub to be the isa bridge? >>> >>> >>> I don't see why are driver guys so willing to do crazy things. They >>> want to match specific device/vendor id pairs, why don't they do just >>> that? Why poke at class, random slot numbers etc etc? >> >> AFAIK what they did is from our previous incorrect requirement as I >> understand. So we need to correct this. >> >> Thanks >> Tiejun > > Since we can't fix existing guests, I would say This shouldn't be a fix existing guest, and this is why I can send this before you guys accept GFX passthrough for qemu ML. I think you can re-read that patch head description. 1f.0 can work under all scenarios including qemu-xen-traditonal. And this is also expected by native Linux organically, and especially Windows always use devfn to detect PCH, this is not like current Linux. So here I just sync this to make sense. Unless you'd like to make Linux specific to this point. Tiejun > get things working first, find a clean way for > new driver to work next. > >>> >>> >>> >>>>> >>>>> >>>>>>> This is already not something that exists on real hardware. >>>>>>> So it might break some guests that will get confused. >>>>>>> Maybe we are lucky and most guests see an unfamiliar device >>>>>>> and ignore it. It seems believable. >>>>>>> >>>>>>> But your MCH hack overrides registers in the pci host. >>>>>> >>>>>> We just try to write *one* register we already confirm this is safe enough. >>>>> >>>>> This should go in code in form of comments: >>>>> document what this register does on 440fx >>>>> and why it's safe to override. >>>>> We don't see what you >>>>> confirmed off-line. >>>> >>>> That offset is one specific to IGD usage, not for i440fx common. This is why >>>> we need to expose something in the host bridge. They're just introduced to >>>> support IGD. >>>> >>>>> >>>>>> Other register are read-only. >>>>> >>>>> Doesn't matter, need to document these as well. >>>> >>>> I think everything are covered in igd_pci_write()/igd_pci_write(). >>>> >>>>> >>>>>>> Are we lucky and there's nothing in these registers >>>>>>> of interest to guests? This seems much more fragile. >>>>>>> So please poke at the spec, and compile the list >>>>>>> of registers you want to touch, figure out why they are >>>>>>> safe to override, and put this all in code comments. >>>>>>> >>>>>>> And the same thing that applies to the isa bridge >>>>>> >>>>>> We just expose its own vendor/device ids here. We don't access to change >>>>>> anything in the isa bridge. >>>>>> >>>>>>> applies here too. It should work without QEMU touching >>>>>>> hosts' hardware. >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >from sysfs: device and vendor id are world readable, so just get them >>>>>>>>> directly and not through xen wrappers, this way you can open the files >>>>>>>>> RO and not RW. >>>>>>>>> You seem to poke at revision as well, I don't see >>>>>>>>> driver looking at that - strictly necessary? >>>>>>>>> If yes please patch host kernel to expose that info in sysfs, >>>>>>>>> though we can fall back on pci config if not there. >>>>>>>>> >>>>>>>>> MCH (bridge_dev) hacks in i915 are nastier. >>>>>>>>> To clean them up, we really have to have an explicit driver for this >>>>>>>>> bridge, not a pass-through device. Long term, the right thing to do is >>>>>>>>> likely to extend host driver and expose the necessary information in >>>>>>>>> sysfs on host kernel. >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> I'm a bit confused. Any sysfs should be filled by the associated PCIe >>>>>>>> device, shouldn't it? So qemu still need to emulate this PCIe device >>>>>>>> firstly, then set properties into sysfs. >>>>>>> >>>>>>> I am talking about getting host properties into qemu. >>>>>>> You don't want to give qemu R/W root access to host sysfs system >>>>>>> of the root bridge, that's not secure. >>>>>> >>>>>> igd_pci_read() >>>>>> | >>>>>> + xen_host_pci_get_block() >>>>>> | >>>>>> + xen_host_pci_config_read(() >>>>>> | >>>>>> + pread() >>>>>> >>>>>> So shouldn't we already expose these information via sysfs? >>>>> >>>>> That's poking at sysfs of a real device, and after opening it RW. >>>> >>>> I don't think we can really write anything to those read-only sysfs >>>> interface even we open them as RW. >>>> >>>> Tiejun >>>> >>>>> >>>>>>> Avoiding read only access to filesystem is a good idea too, so it >>>>>>> should be possible to pass all parameters in as >>>>>>> device properties, and let whoever starts qemu >>>>>>> figure out what are reasonable values. >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> #2 phase: Now, we will choose a capability ID that won't be conflicting with >>>>>>>>>> others. To do this properly, we need to get one from PCI SIG group. To have >>>>>>>>>> this workable and consistently validated, this method shouldn't be virt >>>>>>>>>> specific. Then native driver should use the same method. >>>>>>>>> >>>>>>>>> You mean you will be able to talk sense into hardware guys? >>>>>>>>> I doubt that. If they could be convinced to make e.g. i915 base a >>>>>>>> >>>>>>>> We're negotiating this, so this is just our long term solution we figure out >>>>>>>> currently. >>>>>>>> >>>>>>>>> proper BAR, why didn't they? >>>>>>>> >>>>>>>> We already have no any free BAR as we mentioned previously. >>>>>>> >>>>>>> I thought you were talking about modifying hardware? >>>>>> >>>>>> Yes. >>>>>> >>>>>> Tiejun >>>>> >>>>> And they can't figure out how to stick extra info in an existing BAR? >>>>> Oh well, that's hardware for you. >>>>> >>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> So when xen work on >>>>>>>>>> q35 PCIe machine, we can walk this way. >>>>>>>>> >>>>>>>>> If you are emulating MCH anyway, pick one that is close >>>>>>>>> to what i915 driver expects. It would then work with existing >>>>>>>> >>>>>>>> Looks you guys prefer we create a new MCH anyway, right? But is it necessary >>>>>>>> to create a new based on i440fx just for a little change? >>>>>>>> >>>>>>>> Thanks >>>>>>>> Tiejun >>>>>>> >>>>>>> You can inherit it. Maybe you are lucky and this happens to >>>>>>> work without conflicting with whatever other guests want to do. >>>>>>> But if you ask me, you are really just piling up hacks. >>>>>>> If some guest does not work on i440, you should just work on >>>>>>> emulating whatever it does work on. >>>>>>> That would have real value. >>>>>>> >>>>>>>>> devices, without new capability IDs. >>>>>>>>> >>>>>>>>> >>>>>>>>>> Anthony, >>>>>>>>>> >>>>>>>>>> Any comments to address this in xen case? >>>>>>>>>> >>>>>>>>>> Thanks >>>>>>>>>> Tiejun >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> _______________________________________________ >>>>> Xen-devel mailing list >>>>> Xen-devel@lists.xen.org >>>>> http://lists.xen.org/xen-devel >>>>> >>>>> >>> > >