From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:49262)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1X2EqB-000854-95
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 03:20:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1X2Eq6-0005UC-7S
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 03:20:27 -0400
Received: from mx1.redhat.com ([209.132.183.28]:63088)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1X2Eq5-0005U2-Rs
	for qemu-devel@nongnu.org; Wed, 02 Jul 2014 03:20:22 -0400
Date: Wed, 2 Jul 2014 09:22:04 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20140702062204.GG3773@redhat.com>
References: <20140630090511.GB15777@redhat.com> <53B1300D.10001@intel.com>
	<20140630095509.GA17700@redhat.com> <53B139E6.1020607@intel.com>
	<20140630112845.GA29477@redhat.com> <53B21FAA.1050207@intel.com>
	<20140701091256.GB25897@redhat.com> <53B28392.3040604@intel.com>
	<20140701123327.GA4715@redhat.com> <53B3597A.5090908@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <53B3597A.5090908@intel.com>
Subject: Re: [Qemu-devel] [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD
 passthrough support
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Chen, Tiejun" <tiejun.chen@intel.com>
Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, qemu-devel@nongnu.org, Kelly.Zytaruk@amd.com, anthony.perard@citrix.com, anthony@codemonkey.ws, yang.z.zhang@intel.com, Paolo Bonzini <pbonzini@redhat.com>

On Wed, Jul 02, 2014 at 08:59:38AM +0800, Chen, Tiejun wrote:
> On 2014/7/1 20:33, Michael S. Tsirkin wrote:
> >On Tue, Jul 01, 2014 at 05:46:58PM +0800, Chen, Tiejun wrote:
> >>On 2014/7/1 17:12, Michael S. Tsirkin wrote:
> >>>On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote:
> >>>>On 2014/6/30 19:28, Michael S. Tsirkin wrote:
> >>>>>On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
> >>>>>>On 2014/6/30 17:55, Michael S. Tsirkin wrote:
> >>>>>>>On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
> >>>>>>>>On 2014/6/30 17:05, Michael S. Tsirkin wrote:
> >>>>>>>>>On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
> >>>>>>>>>>On 2014/6/30 14:48, Michael S. Tsirkin wrote:
> >>>>>>>>>>>On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
> >>>>>>>>>>>>On 2014/6/26 18:03, Paolo Bonzini wrote:
> >>>>>>>>>>>>>Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>- offsets 0x0000..0x0fff map to configuration space of the host MCH
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Are you saying the config space in the video device?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>No, I am saying in a new BAR, or at some magic offset of an existing
> >>>>>>>>>>>>>MMIO BAR.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>As I mentioned previously, the IGD guy told me we have no any unused a
> >>>>>>>>>>>>offset or BAR in the config space.
> >>>>>>>>>>>>
> >>>>>>>>>>>>And guy who are responsible for the native driver seems not be accept to
> >>>>>>>>>>>>extend some magic offset of an existing MMIO BAR.
> >>>>>>>>>>>>
> >>>>>>>>>>>>In addition I think in a short time its not possible to migrate i440fx to
> >>>>>>>>>>>>q35 as a PCIe machine of xen.
> >>>>>>>>>>>
> >>>>>>>>>>>That seems like a weak motivation.  I don't see a need to get something
> >>>>>>>>>>>merged upstream in a short time: this seems sure to miss 2.1,
> >>>>>>>>>>>so you have the time to make it architecturally sound.
> >>>>>>>>>>>"Making existing guests work" would be a better motivation.
> >>>>>>>>>>
> >>>>>>>>>>Yes.
> >>>>>>>>>
> >>>>>>>>>So focus on this then. Existing guests will probably work
> >>>>>>>>>fine on a newer chipset - likely better than on i440fx.
> >>>>>>>>>xen management tools need to do some work to support this?
> >>>>>>>>>That will just give everyone more long term benefits.
> >>>>>>>>>
> >>>>>>>>>If instead we create a hack that does not resemble
> >>>>>>>>>any existing hardware even remotely, what's the
> >>>>>>>>>chance that it will not break with any future
> >>>>>>>>>guest modification?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>Isn't this possible with an mch chipset?
> >>>>>>>>>>
> >>>>>>>>>>If you're saying q35, I mean AFAIK we have no any plan to migrate to this
> >>>>>>>>>>MCH in xen case.
> >>>>>>>>>
> >>>>>>>>>q35 or a newer chipset that's closer to what guests expect.
> >>>>>>>>>
> >>>>>>>>>>Additionally, I think I should follow this feature after
> >>>>>>>>>>q35 can work for xen scenario.
> >>>>>>>>>
> >>>>>>>>>What's stopping you?
> >>>>>>>>
> >>>>>>>>I mean if you want create an new machine based on q35, actually this is
> >>>>>>>>equal to start making xen to migrate to q35 now. Right? I can't image how
> >>>>>>>>much effort should be done.
> >>>>>>>
> >>>>>>>I don't see why you don't try. Sounds like a more robust solution to me.
> >>>>>>
> >>>>>>As I think this is another requirement to us. I'm not sure if I have enough
> >>>>>>time to touch this currently.
> >>>>>>
> >>>>>>>
> >>>>>>>>So this is a reason why I'm saying I'd like to follow this feature after q35
> >>>>>>>>can work with xen completely.
> >>>>>>>
> >>>>>>>Then we'll end up with more configurations to support, and to what end?
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>So could we do this step by step:
> >>>>>>>>>>>>
> >>>>>>>>>>>>#1 phase: We just cover current qemu-xen implementation based on i44fx, so
> >>>>>>>>>>>>still provide that pseudo ISA bridge at 00:1f.0 as we already did.
> >>>>>>>>>>>
> >>>>>>>>>>>By the way there is no reason to put it at 00:1f.0 specifically I think.
> >>>>>>>>>>>So it seems simple: create a dummy device that gets device and
> >>>>>>>>>>>vendor id as properties. If you really like, add an option to get values
> >>>>>>>>>>
> >>>>>>>>>>Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
> >>>>>>>>>>passthrough: create pseudo intel isa bridge. There, we fake this device just
> >>>>>>>>>>at 00:1f.0.
> >>>>>>>>>>But you guys don't like this, and shouldn't this be just this point we
> >>>>>>>>>>discussing now?
> >>>>>>>>>>
> >>>>>>>>>>If you guys agree that , everything is fine.
> >>>>>>>>>
> >>>>>>>>>Actually, this isn't what you did.
> >>>>>>>>>Don't tie it to xen, and don't tie it to 1f.
> >>>>>>>>>Just make it a simple stub pci device.
> >>>>>>>>>Whoever wants it, creates it.
> >>>>>>>>>
> >>>>>>>>>The thing I worry about, is the chance this will break going forward.
> >>>>>>>>>So you created a system with 2 isa bridges.
> >>>>>>>>
> >>>>>>>>I don't set class type to claim this is an ISA bridge, just a normal PCI
> >>>>>>>>device.
> >>>>>>>>+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
> >>>>>>>>*hdev)
> >>>>>>>>+{
> >>>>>>>>+    struct PCIDevice *dev;
> >>>>>>>>+
> >>>>>>>>+    char rid;
> >>>>>>>>+
> >>>>>>>>+    /* We havt to use a simple PCI device to fake this ISA bridge
> >>>>>>>>+     * to avoid making some confusion to BIOS and ACPI.
> >>>>>>>>+     */
> >>>>>>>>+    dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
> >>>>>>>>"pseudo-intel-pch-isa-bridge");
> >>>>>>>>+
> >>>>>>>>+    qdev_init_nofail(&dev->qdev);
> >>>>>>>>+
> >>>>>>>>+    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
> >>>>>>>>+    pci_config_set_device_id(dev->config, hdev->device_id);
> >>>>>>>>+
> >>>>>>>>+    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
> >>>>>>>>+
> >>>>>>>>+    pci_config_set_revision(dev->config, rid);
> >>>>>>>>+
> >>>>>>>>+    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
> >>>>>>>>+    return 0;
> >>>>>>>>+}
> >>>>>>>
> >>>>>>>Then I don't see how it's supposed to work.
> >>>>>>>Doesn't i915 look for an isa bridge?
> >>>>>>>
> >>>>>>>         /*
> >>>>>>>          * The reason to probe ISA bridge instead of Dev31:Fun0 is to
> >>>>>>>          * make graphics device passthrough work easy for VMM, that only
> >>>>>>>          * need to expose ISA bridge to let driver know the real hardware
> >>>>>>>          * underneath. This is a requirement from virtualization team.
> >>>>>>>          *
> >>>>>>>          * In some virtualized environments (e.g. XEN), there is irrelevant
> >>>>>>>          * ISA bridge in the system. To work reliably, we should scan trhough
> >>>>>>>          * all the ISA bridge devices and check for the first match, instead
> >>>>>>>          * of only checking the first one.
> >>>>>>>          */
> >>>>>>>         while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>>>>                 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
> >>>>>>>                         unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
> >>>>>>>                         dev_priv->pch_id = id;
> >>>>>>>
> >>>>>>>                         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
> >>>>>>>                                 dev_priv->pch_type = PCH_IBX;
> >>>>>>>
> >>>>>>>etc
> >>>>>>>
> >>>>>>
> >>>>>>I already post this to mainline to change as follows:
> >>>>>>
> >>>>>>-	while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>>>+	pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
> >>>>>>+	if (pch) {
> >>>>>
> >>>>>I see - responded on that mail - but I thought the point is to make
> >>>>>existing guests run? In fact you said so explicitly.
> >>>>>
> >>>>>
> >>>>>>Please refer to this,
> >>>>>>
> >>>>>>[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of
> >>>>>>check class type
> >>>>>>
> >>>>>>Linux Native guys would like to accept this. And actually Windows always use
> >>>>>>devfn to detect this.
> >>>>>
> >>>>>In fact I see this:
> >>>>>
> >>>>>     linux 2.6.35-3.9 probes the 1st IA bridge
> >>>>>
> >>>>>         no idea how would you fix this.
> >>>>>         try changing default class for the main bridge?
> >>>>>
> >>>>>     linux 3.10 probes all ISA bridges
> >>>>>
> >>>>>         requires your stub to be the isa bridge?
> >>>>>
> >>>>>
> >>>>>I don't see why are driver guys so willing to do crazy things.  They
> >>>>>want to match specific device/vendor id pairs, why don't they do just
> >>>>>that? Why poke at class, random slot numbers etc etc?
> >>>>
> >>>>AFAIK what they did is from our previous incorrect requirement as I
> >>>>understand. So we need to correct this.
> >>>>
> >>>>Thanks
> >>>>Tiejun
> >>>
> >>>Since we can't fix existing guests, I would say
> >>
> >>This shouldn't be a fix existing guest, and this is why I can send this
> >>before you guys accept GFX passthrough for qemu ML.
> >>
> >>I think you can re-read that patch head description. 1f.0 can work under all
> >>scenarios including qemu-xen-traditonal.
> >>
> >>And this is also expected by native Linux organically, and especially
> >>Windows always use devfn to detect PCH, this is not like current Linux. So
> >>here I just sync this to make sense.
> >>
> >>Unless you'd like to make Linux specific to this point.
> >>
> >>Tiejun
> >
> >Why don't both windows and linux drivers look device up
> >by device and vendor id?
> 
> Its easy to understand because the same video device can work under
> different PCH, but we need to do different initialization based on different
> PCH.
> 
> Thanks
> Tiejun

Right so why don't they look up the PCH by device and vendor id?
For example, on linux use pci_get_device to iterate over all intel
devices.
I'm sure windows has a similar API as well.

This isn't a suggestion to make the change in guests right now!
Just a question.


> >Same applies to MCH really.
> >
> >
> >>>get things working first, find a clean way for
> >>>new driver to work next.
> >>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>This is already not something that exists on real hardware.
> >>>>>>>>>So it might break some guests that will get confused.
> >>>>>>>>>Maybe we are lucky and most guests see an unfamiliar device
> >>>>>>>>>and ignore it. It seems believable.
> >>>>>>>>>
> >>>>>>>>>But your MCH hack overrides registers in the pci host.
> >>>>>>>>
> >>>>>>>>We just try to write *one* register we already confirm this is safe enough.
> >>>>>>>
> >>>>>>>This should go in code in form of comments:
> >>>>>>>document what this register does on 440fx
> >>>>>>>and why it's safe to override.
> >>>>>>>We don't see what you
> >>>>>>>confirmed off-line.
> >>>>>>
> >>>>>>That offset is one specific to IGD usage, not for i440fx common. This is why
> >>>>>>we need to expose something in the host bridge. They're just introduced to
> >>>>>>support IGD.
> >>>>>>
> >>>>>>>
> >>>>>>>>Other register are read-only.
> >>>>>>>
> >>>>>>>Doesn't matter, need to document these as well.
> >>>>>>
> >>>>>>I think everything are covered in igd_pci_write()/igd_pci_write().
> >>>>>>
> >>>>>>>
> >>>>>>>>>Are we lucky and there's nothing in these registers
> >>>>>>>>>of interest to guests? This seems much more fragile.
> >>>>>>>>>So please poke at the spec, and compile the list
> >>>>>>>>>of registers you want to touch, figure out why they are
> >>>>>>>>>safe to override, and put this all in code comments.
> >>>>>>>>>
> >>>>>>>>>And the same thing that applies to the isa bridge
> >>>>>>>>
> >>>>>>>>We just expose its own vendor/device ids here. We don't access to change
> >>>>>>>>anything in the isa bridge.
> >>>>>>>>
> >>>>>>>>>applies here too. It should work without QEMU touching
> >>>>>>>>>hosts' hardware.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>from sysfs: device and vendor id are world readable, so just get them
> >>>>>>>>>>>directly and not through xen wrappers, this way you can open the files
> >>>>>>>>>>>RO and not RW.
> >>>>>>>>>>>You seem to poke at revision as well, I don't see
> >>>>>>>>>>>driver looking at that - strictly necessary?
> >>>>>>>>>>>If yes please patch host kernel to expose that info in sysfs,
> >>>>>>>>>>>though we can fall back on pci config if not there.
> >>>>>>>>>>>
> >>>>>>>>>>>MCH (bridge_dev) hacks in i915 are nastier.
> >>>>>>>>>>>To clean them up, we really have to have an explicit driver for this
> >>>>>>>>>>>bridge, not a pass-through device.  Long term, the right thing to do is
> >>>>>>>>>>>likely to extend host driver and expose the necessary information in
> >>>>>>>>>>>sysfs on host kernel.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>I'm a bit confused. Any sysfs should be filled by the associated PCIe
> >>>>>>>>>>device, shouldn't it? So qemu still need to emulate this PCIe device
> >>>>>>>>>>firstly, then set properties into sysfs.
> >>>>>>>>>
> >>>>>>>>>I am talking about getting host properties into qemu.
> >>>>>>>>>You don't want to give qemu R/W root access to host sysfs system
> >>>>>>>>>of the root bridge, that's not secure.
> >>>>>>>>
> >>>>>>>>igd_pci_read()
> >>>>>>>>	|
> >>>>>>>>	+ xen_host_pci_get_block()
> >>>>>>>>		|
> >>>>>>>>		+ xen_host_pci_config_read(()
> >>>>>>>>			|
> >>>>>>>>			+  pread()
> >>>>>>>>
> >>>>>>>>So shouldn't we already expose these information via sysfs?
> >>>>>>>
> >>>>>>>That's poking at sysfs of a real device, and after opening it RW.
> >>>>>>
> >>>>>>I don't think we can really write anything to those read-only sysfs
> >>>>>>interface even we open them as RW.
> >>>>>>
> >>>>>>Tiejun
> >>>>>>
> >>>>>>>
> >>>>>>>>>Avoiding read only access to filesystem is a good idea too, so it
> >>>>>>>>>should be possible to pass all parameters in as
> >>>>>>>>>device properties, and let whoever starts qemu
> >>>>>>>>>figure out what are reasonable values.
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>#2 phase: Now, we will choose a capability ID that won't be conflicting with
> >>>>>>>>>>>>others. To do this properly, we need to get one from PCI SIG group. To have
> >>>>>>>>>>>>this workable and consistently validated, this method shouldn't be virt
> >>>>>>>>>>>>specific. Then native driver should use the same method.
> >>>>>>>>>>>
> >>>>>>>>>>>You mean you will be able to talk sense into hardware guys?
> >>>>>>>>>>>I doubt that. If they could be convinced to make e.g. i915 base a
> >>>>>>>>>>
> >>>>>>>>>>We're negotiating this, so this is just our long term solution we figure out
> >>>>>>>>>>currently.
> >>>>>>>>>>
> >>>>>>>>>>>proper BAR, why didn't they?
> >>>>>>>>>>
> >>>>>>>>>>We already have no any free BAR as we mentioned previously.
> >>>>>>>>>
> >>>>>>>>>I thought you were talking about modifying hardware?
> >>>>>>>>
> >>>>>>>>Yes.
> >>>>>>>>
> >>>>>>>>Tiejun
> >>>>>>>
> >>>>>>>And they can't figure out how to stick extra info in an existing BAR?
> >>>>>>>Oh well, that's hardware for you.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>So when xen work on
> >>>>>>>>>>>>q35 PCIe machine, we can walk this way.
> >>>>>>>>>>>
> >>>>>>>>>>>If you are emulating MCH anyway, pick one that is close
> >>>>>>>>>>>to what i915 driver expects. It would then work with existing
> >>>>>>>>>>
> >>>>>>>>>>Looks you guys prefer we create a new MCH anyway, right? But is it necessary
> >>>>>>>>>>to create a new based on i440fx just for a little change?
> >>>>>>>>>>
> >>>>>>>>>>Thanks
> >>>>>>>>>>Tiejun
> >>>>>>>>>
> >>>>>>>>>You can inherit it. Maybe you are lucky and this happens to
> >>>>>>>>>work without conflicting with whatever other guests want to do.
> >>>>>>>>>But if you ask me, you are really just piling up hacks.
> >>>>>>>>>If some guest does not work on i440, you should just work on
> >>>>>>>>>emulating whatever it does work on.
> >>>>>>>>>That would have real value.
> >>>>>>>>>
> >>>>>>>>>>>devices, without new capability IDs.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Anthony,
> >>>>>>>>>>>>
> >>>>>>>>>>>>Any comments to address this in xen case?
> >>>>>>>>>>>>
> >>>>>>>>>>>>Thanks
> >>>>>>>>>>>>Tiejun
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>Xen-devel mailing list
> >>>>>>>Xen-devel@lists.xen.org
> >>>>>>>http://lists.xen.org/xen-devel
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>>
> >

From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Xen-devel] [v5][PATCH 0/5] xen: add Intel IGD
 passthrough support
Date: Wed, 2 Jul 2014 09:22:04 +0300
Message-ID: <20140702062204.GG3773@redhat.com>
References: <20140630090511.GB15777@redhat.com> <53B1300D.10001@intel.com>
	<20140630095509.GA17700@redhat.com> <53B139E6.1020607@intel.com>
	<20140630112845.GA29477@redhat.com> <53B21FAA.1050207@intel.com>
	<20140701091256.GB25897@redhat.com> <53B28392.3040604@intel.com>
	<20140701123327.GA4715@redhat.com> <53B3597A.5090908@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path: <qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org>
Content-Disposition: inline
In-Reply-To: <53B3597A.5090908@intel.com>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
To: "Chen, Tiejun" <tiejun.chen@intel.com>
Cc: peter.maydell@linaro.org, xen-devel@lists.xensource.com, stefano.stabellini@eu.citrix.com, allen.m.kay@intel.com, qemu-devel@nongnu.org, Kelly.Zytaruk@amd.com, anthony.perard@citrix.com, anthony@codemonkey.ws, yang.z.zhang@intel.com, Paolo Bonzini <pbonzini@redhat.com>
List-Id: xen-devel@lists.xenproject.org

On Wed, Jul 02, 2014 at 08:59:38AM +0800, Chen, Tiejun wrote:
> On 2014/7/1 20:33, Michael S. Tsirkin wrote:
> >On Tue, Jul 01, 2014 at 05:46:58PM +0800, Chen, Tiejun wrote:
> >>On 2014/7/1 17:12, Michael S. Tsirkin wrote:
> >>>On Tue, Jul 01, 2014 at 10:40:42AM +0800, Chen, Tiejun wrote:
> >>>>On 2014/6/30 19:28, Michael S. Tsirkin wrote:
> >>>>>On Mon, Jun 30, 2014 at 06:20:22PM +0800, Chen, Tiejun wrote:
> >>>>>>On 2014/6/30 17:55, Michael S. Tsirkin wrote:
> >>>>>>>On Mon, Jun 30, 2014 at 05:38:21PM +0800, Chen, Tiejun wrote:
> >>>>>>>>On 2014/6/30 17:05, Michael S. Tsirkin wrote:
> >>>>>>>>>On Mon, Jun 30, 2014 at 03:24:58PM +0800, Chen, Tiejun wrote:
> >>>>>>>>>>On 2014/6/30 14:48, Michael S. Tsirkin wrote:
> >>>>>>>>>>>On Mon, Jun 30, 2014 at 10:51:49AM +0800, Chen, Tiejun wrote:
> >>>>>>>>>>>>On 2014/6/26 18:03, Paolo Bonzini wrote:
> >>>>>>>>>>>>>Il 26/06/2014 11:18, Chen, Tiejun ha scritto:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>- offsets 0x0000..0x0fff map to configuration space of the host MCH
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>Are you saying the config space in the video device?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>No, I am saying in a new BAR, or at some magic offset of an existing
> >>>>>>>>>>>>>MMIO BAR.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>As I mentioned previously, the IGD guy told me we have no any unused a
> >>>>>>>>>>>>offset or BAR in the config space.
> >>>>>>>>>>>>
> >>>>>>>>>>>>And guy who are responsible for the native driver seems not be accept to
> >>>>>>>>>>>>extend some magic offset of an existing MMIO BAR.
> >>>>>>>>>>>>
> >>>>>>>>>>>>In addition I think in a short time its not possible to migrate i440fx to
> >>>>>>>>>>>>q35 as a PCIe machine of xen.
> >>>>>>>>>>>
> >>>>>>>>>>>That seems like a weak motivation.  I don't see a need to get something
> >>>>>>>>>>>merged upstream in a short time: this seems sure to miss 2.1,
> >>>>>>>>>>>so you have the time to make it architecturally sound.
> >>>>>>>>>>>"Making existing guests work" would be a better motivation.
> >>>>>>>>>>
> >>>>>>>>>>Yes.
> >>>>>>>>>
> >>>>>>>>>So focus on this then. Existing guests will probably work
> >>>>>>>>>fine on a newer chipset - likely better than on i440fx.
> >>>>>>>>>xen management tools need to do some work to support this?
> >>>>>>>>>That will just give everyone more long term benefits.
> >>>>>>>>>
> >>>>>>>>>If instead we create a hack that does not resemble
> >>>>>>>>>any existing hardware even remotely, what's the
> >>>>>>>>>chance that it will not break with any future
> >>>>>>>>>guest modification?
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>Isn't this possible with an mch chipset?
> >>>>>>>>>>
> >>>>>>>>>>If you're saying q35, I mean AFAIK we have no any plan to migrate to this
> >>>>>>>>>>MCH in xen case.
> >>>>>>>>>
> >>>>>>>>>q35 or a newer chipset that's closer to what guests expect.
> >>>>>>>>>
> >>>>>>>>>>Additionally, I think I should follow this feature after
> >>>>>>>>>>q35 can work for xen scenario.
> >>>>>>>>>
> >>>>>>>>>What's stopping you?
> >>>>>>>>
> >>>>>>>>I mean if you want create an new machine based on q35, actually this is
> >>>>>>>>equal to start making xen to migrate to q35 now. Right? I can't image how
> >>>>>>>>much effort should be done.
> >>>>>>>
> >>>>>>>I don't see why you don't try. Sounds like a more robust solution to me.
> >>>>>>
> >>>>>>As I think this is another requirement to us. I'm not sure if I have enough
> >>>>>>time to touch this currently.
> >>>>>>
> >>>>>>>
> >>>>>>>>So this is a reason why I'm saying I'd like to follow this feature after q35
> >>>>>>>>can work with xen completely.
> >>>>>>>
> >>>>>>>Then we'll end up with more configurations to support, and to what end?
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>So could we do this step by step:
> >>>>>>>>>>>>
> >>>>>>>>>>>>#1 phase: We just cover current qemu-xen implementation based on i44fx, so
> >>>>>>>>>>>>still provide that pseudo ISA bridge at 00:1f.0 as we already did.
> >>>>>>>>>>>
> >>>>>>>>>>>By the way there is no reason to put it at 00:1f.0 specifically I think.
> >>>>>>>>>>>So it seems simple: create a dummy device that gets device and
> >>>>>>>>>>>vendor id as properties. If you really like, add an option to get values
> >>>>>>>>>>
> >>>>>>>>>>Yes, this is just what we did in [Xen-devel] [v5][PATCH 2/5] xen, gfx
> >>>>>>>>>>passthrough: create pseudo intel isa bridge. There, we fake this device just
> >>>>>>>>>>at 00:1f.0.
> >>>>>>>>>>But you guys don't like this, and shouldn't this be just this point we
> >>>>>>>>>>discussing now?
> >>>>>>>>>>
> >>>>>>>>>>If you guys agree that , everything is fine.
> >>>>>>>>>
> >>>>>>>>>Actually, this isn't what you did.
> >>>>>>>>>Don't tie it to xen, and don't tie it to 1f.
> >>>>>>>>>Just make it a simple stub pci device.
> >>>>>>>>>Whoever wants it, creates it.
> >>>>>>>>>
> >>>>>>>>>The thing I worry about, is the chance this will break going forward.
> >>>>>>>>>So you created a system with 2 isa bridges.
> >>>>>>>>
> >>>>>>>>I don't set class type to claim this is an ISA bridge, just a normal PCI
> >>>>>>>>device.
> >>>>>>>>+static int create_pseudo_pch_isa_bridge(PCIBus *bus, XenHostPCIDevice
> >>>>>>>>*hdev)
> >>>>>>>>+{
> >>>>>>>>+    struct PCIDevice *dev;
> >>>>>>>>+
> >>>>>>>>+    char rid;
> >>>>>>>>+
> >>>>>>>>+    /* We havt to use a simple PCI device to fake this ISA bridge
> >>>>>>>>+     * to avoid making some confusion to BIOS and ACPI.
> >>>>>>>>+     */
> >>>>>>>>+    dev = pci_create(bus, PCI_DEVFN(0x1f, 0),
> >>>>>>>>"pseudo-intel-pch-isa-bridge");
> >>>>>>>>+
> >>>>>>>>+    qdev_init_nofail(&dev->qdev);
> >>>>>>>>+
> >>>>>>>>+    pci_config_set_vendor_id(dev->config, hdev->vendor_id);
> >>>>>>>>+    pci_config_set_device_id(dev->config, hdev->device_id);
> >>>>>>>>+
> >>>>>>>>+    xen_host_pci_get_block(hdev, PCI_REVISION_ID, (uint8_t *)&rid, 1);
> >>>>>>>>+
> >>>>>>>>+    pci_config_set_revision(dev->config, rid);
> >>>>>>>>+
> >>>>>>>>+    XEN_PT_LOG(dev, "The pseudo Intel PCH ISA bridge created.\n");
> >>>>>>>>+    return 0;
> >>>>>>>>+}
> >>>>>>>
> >>>>>>>Then I don't see how it's supposed to work.
> >>>>>>>Doesn't i915 look for an isa bridge?
> >>>>>>>
> >>>>>>>         /*
> >>>>>>>          * The reason to probe ISA bridge instead of Dev31:Fun0 is to
> >>>>>>>          * make graphics device passthrough work easy for VMM, that only
> >>>>>>>          * need to expose ISA bridge to let driver know the real hardware
> >>>>>>>          * underneath. This is a requirement from virtualization team.
> >>>>>>>          *
> >>>>>>>          * In some virtualized environments (e.g. XEN), there is irrelevant
> >>>>>>>          * ISA bridge in the system. To work reliably, we should scan trhough
> >>>>>>>          * all the ISA bridge devices and check for the first match, instead
> >>>>>>>          * of only checking the first one.
> >>>>>>>          */
> >>>>>>>         while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>>>>                 if (pch->vendor == PCI_VENDOR_ID_INTEL) {
> >>>>>>>                         unsigned short id = pch->device & INTEL_PCH_DEVICE_ID_MASK;
> >>>>>>>                         dev_priv->pch_id = id;
> >>>>>>>
> >>>>>>>                         if (id == INTEL_PCH_IBX_DEVICE_ID_TYPE) {
> >>>>>>>                                 dev_priv->pch_type = PCH_IBX;
> >>>>>>>
> >>>>>>>etc
> >>>>>>>
> >>>>>>
> >>>>>>I already post this to mainline to change as follows:
> >>>>>>
> >>>>>>-	while ((pch = pci_get_class(PCI_CLASS_BRIDGE_ISA << 8, pch))) {
> >>>>>>+	pch = pci_get_bus_and_slot(0, PCI_DEVFN(0x1f, 0));
> >>>>>>+	if (pch) {
> >>>>>
> >>>>>I see - responded on that mail - but I thought the point is to make
> >>>>>existing guests run? In fact you said so explicitly.
> >>>>>
> >>>>>
> >>>>>>Please refer to this,
> >>>>>>
> >>>>>>[RFC][PATCH] gpu:drm:i915:intel_detect_pch: back to check devfn instead of
> >>>>>>check class type
> >>>>>>
> >>>>>>Linux Native guys would like to accept this. And actually Windows always use
> >>>>>>devfn to detect this.
> >>>>>
> >>>>>In fact I see this:
> >>>>>
> >>>>>     linux 2.6.35-3.9 probes the 1st IA bridge
> >>>>>
> >>>>>         no idea how would you fix this.
> >>>>>         try changing default class for the main bridge?
> >>>>>
> >>>>>     linux 3.10 probes all ISA bridges
> >>>>>
> >>>>>         requires your stub to be the isa bridge?
> >>>>>
> >>>>>
> >>>>>I don't see why are driver guys so willing to do crazy things.  They
> >>>>>want to match specific device/vendor id pairs, why don't they do just
> >>>>>that? Why poke at class, random slot numbers etc etc?
> >>>>
> >>>>AFAIK what they did is from our previous incorrect requirement as I
> >>>>understand. So we need to correct this.
> >>>>
> >>>>Thanks
> >>>>Tiejun
> >>>
> >>>Since we can't fix existing guests, I would say
> >>
> >>This shouldn't be a fix existing guest, and this is why I can send this
> >>before you guys accept GFX passthrough for qemu ML.
> >>
> >>I think you can re-read that patch head description. 1f.0 can work under all
> >>scenarios including qemu-xen-traditonal.
> >>
> >>And this is also expected by native Linux organically, and especially
> >>Windows always use devfn to detect PCH, this is not like current Linux. So
> >>here I just sync this to make sense.
> >>
> >>Unless you'd like to make Linux specific to this point.
> >>
> >>Tiejun
> >
> >Why don't both windows and linux drivers look device up
> >by device and vendor id?
> 
> Its easy to understand because the same video device can work under
> different PCH, but we need to do different initialization based on different
> PCH.
> 
> Thanks
> Tiejun

Right so why don't they look up the PCH by device and vendor id?
For example, on linux use pci_get_device to iterate over all intel
devices.
I'm sure windows has a similar API as well.

This isn't a suggestion to make the change in guests right now!
Just a question.


> >Same applies to MCH really.
> >
> >
> >>>get things working first, find a clean way for
> >>>new driver to work next.
> >>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>>This is already not something that exists on real hardware.
> >>>>>>>>>So it might break some guests that will get confused.
> >>>>>>>>>Maybe we are lucky and most guests see an unfamiliar device
> >>>>>>>>>and ignore it. It seems believable.
> >>>>>>>>>
> >>>>>>>>>But your MCH hack overrides registers in the pci host.
> >>>>>>>>
> >>>>>>>>We just try to write *one* register we already confirm this is safe enough.
> >>>>>>>
> >>>>>>>This should go in code in form of comments:
> >>>>>>>document what this register does on 440fx
> >>>>>>>and why it's safe to override.
> >>>>>>>We don't see what you
> >>>>>>>confirmed off-line.
> >>>>>>
> >>>>>>That offset is one specific to IGD usage, not for i440fx common. This is why
> >>>>>>we need to expose something in the host bridge. They're just introduced to
> >>>>>>support IGD.
> >>>>>>
> >>>>>>>
> >>>>>>>>Other register are read-only.
> >>>>>>>
> >>>>>>>Doesn't matter, need to document these as well.
> >>>>>>
> >>>>>>I think everything are covered in igd_pci_write()/igd_pci_write().
> >>>>>>
> >>>>>>>
> >>>>>>>>>Are we lucky and there's nothing in these registers
> >>>>>>>>>of interest to guests? This seems much more fragile.
> >>>>>>>>>So please poke at the spec, and compile the list
> >>>>>>>>>of registers you want to touch, figure out why they are
> >>>>>>>>>safe to override, and put this all in code comments.
> >>>>>>>>>
> >>>>>>>>>And the same thing that applies to the isa bridge
> >>>>>>>>
> >>>>>>>>We just expose its own vendor/device ids here. We don't access to change
> >>>>>>>>anything in the isa bridge.
> >>>>>>>>
> >>>>>>>>>applies here too. It should work without QEMU touching
> >>>>>>>>>hosts' hardware.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>from sysfs: device and vendor id are world readable, so just get them
> >>>>>>>>>>>directly and not through xen wrappers, this way you can open the files
> >>>>>>>>>>>RO and not RW.
> >>>>>>>>>>>You seem to poke at revision as well, I don't see
> >>>>>>>>>>>driver looking at that - strictly necessary?
> >>>>>>>>>>>If yes please patch host kernel to expose that info in sysfs,
> >>>>>>>>>>>though we can fall back on pci config if not there.
> >>>>>>>>>>>
> >>>>>>>>>>>MCH (bridge_dev) hacks in i915 are nastier.
> >>>>>>>>>>>To clean them up, we really have to have an explicit driver for this
> >>>>>>>>>>>bridge, not a pass-through device.  Long term, the right thing to do is
> >>>>>>>>>>>likely to extend host driver and expose the necessary information in
> >>>>>>>>>>>sysfs on host kernel.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>I'm a bit confused. Any sysfs should be filled by the associated PCIe
> >>>>>>>>>>device, shouldn't it? So qemu still need to emulate this PCIe device
> >>>>>>>>>>firstly, then set properties into sysfs.
> >>>>>>>>>
> >>>>>>>>>I am talking about getting host properties into qemu.
> >>>>>>>>>You don't want to give qemu R/W root access to host sysfs system
> >>>>>>>>>of the root bridge, that's not secure.
> >>>>>>>>
> >>>>>>>>igd_pci_read()
> >>>>>>>>	|
> >>>>>>>>	+ xen_host_pci_get_block()
> >>>>>>>>		|
> >>>>>>>>		+ xen_host_pci_config_read(()
> >>>>>>>>			|
> >>>>>>>>			+  pread()
> >>>>>>>>
> >>>>>>>>So shouldn't we already expose these information via sysfs?
> >>>>>>>
> >>>>>>>That's poking at sysfs of a real device, and after opening it RW.
> >>>>>>
> >>>>>>I don't think we can really write anything to those read-only sysfs
> >>>>>>interface even we open them as RW.
> >>>>>>
> >>>>>>Tiejun
> >>>>>>
> >>>>>>>
> >>>>>>>>>Avoiding read only access to filesystem is a good idea too, so it
> >>>>>>>>>should be possible to pass all parameters in as
> >>>>>>>>>device properties, and let whoever starts qemu
> >>>>>>>>>figure out what are reasonable values.
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>#2 phase: Now, we will choose a capability ID that won't be conflicting with
> >>>>>>>>>>>>others. To do this properly, we need to get one from PCI SIG group. To have
> >>>>>>>>>>>>this workable and consistently validated, this method shouldn't be virt
> >>>>>>>>>>>>specific. Then native driver should use the same method.
> >>>>>>>>>>>
> >>>>>>>>>>>You mean you will be able to talk sense into hardware guys?
> >>>>>>>>>>>I doubt that. If they could be convinced to make e.g. i915 base a
> >>>>>>>>>>
> >>>>>>>>>>We're negotiating this, so this is just our long term solution we figure out
> >>>>>>>>>>currently.
> >>>>>>>>>>
> >>>>>>>>>>>proper BAR, why didn't they?
> >>>>>>>>>>
> >>>>>>>>>>We already have no any free BAR as we mentioned previously.
> >>>>>>>>>
> >>>>>>>>>I thought you were talking about modifying hardware?
> >>>>>>>>
> >>>>>>>>Yes.
> >>>>>>>>
> >>>>>>>>Tiejun
> >>>>>>>
> >>>>>>>And they can't figure out how to stick extra info in an existing BAR?
> >>>>>>>Oh well, that's hardware for you.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>So when xen work on
> >>>>>>>>>>>>q35 PCIe machine, we can walk this way.
> >>>>>>>>>>>
> >>>>>>>>>>>If you are emulating MCH anyway, pick one that is close
> >>>>>>>>>>>to what i915 driver expects. It would then work with existing
> >>>>>>>>>>
> >>>>>>>>>>Looks you guys prefer we create a new MCH anyway, right? But is it necessary
> >>>>>>>>>>to create a new based on i440fx just for a little change?
> >>>>>>>>>>
> >>>>>>>>>>Thanks
> >>>>>>>>>>Tiejun
> >>>>>>>>>
> >>>>>>>>>You can inherit it. Maybe you are lucky and this happens to
> >>>>>>>>>work without conflicting with whatever other guests want to do.
> >>>>>>>>>But if you ask me, you are really just piling up hacks.
> >>>>>>>>>If some guest does not work on i440, you should just work on
> >>>>>>>>>emulating whatever it does work on.
> >>>>>>>>>That would have real value.
> >>>>>>>>>
> >>>>>>>>>>>devices, without new capability IDs.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Anthony,
> >>>>>>>>>>>>
> >>>>>>>>>>>>Any comments to address this in xen case?
> >>>>>>>>>>>>
> >>>>>>>>>>>>Thanks
> >>>>>>>>>>>>Tiejun
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>Xen-devel mailing list
> >>>>>>>Xen-devel@lists.xen.org
> >>>>>>>http://lists.xen.org/xen-devel
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>>
> >