From: "Zhang, Xiong Y" <xiong.y.zhang@intel.com>
To: 'Joonas Lahtinen' <joonas.lahtinen@linux.intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Vetter, Daniel" <daniel.vetter@intel.com>,
	"zhenyuw@linux.intel.com" <zhenyuw@linux.intel.com>,
	"jani.nikula@linux.intel.com" <jani.nikula@linux.intel.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	David Woodhouse <dwmw2@infradead.org>,
	"Bloomfield, Jon" <jon.bloomfield@intel.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"intel-gvt-dev@lists.freedesktop.org"
	<intel-gvt-dev@lists.freedesktop.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>,
	"Zhang, Xiong Y" <xiong.y.zhang@intel.com>
Subject: RE: [PATCH V6] drm/i915: Disable stolen memory when i915 runs in guest vm
Date: Wed, 3 May 2017 09:22:22 +0000	[thread overview]
Message-ID: <8082FF9BCB2B054996454E47167FF4EC1C4D77EC@SHSMSX104.ccr.corp.intel.com> (raw)
In-Reply-To: <8082FF9BCB2B054996454E47167FF4EC1C4D0CAA@SHSMSX104.ccr.corp.intel.com>

> > + David and Jon
> >
> > On Tue, 2017-04-25 at 18:34 +0800, Xiong Zhang wrote:
> >
> > The blocking issue I see is that bisecting is still not pointing at
> > relevant commits. Neither of the bisected commits from Bugzilla is
> > related to changes in stolen memory usage behavior. I'd expect a
> > successful bisect to land on the patches where we start creating kernel
> > internal objects from stolen memory. Otherwise we could be ignoring a bug
> > elsewhere. If it consistently lands on those patches, then there might
> > be something wrong with them, in addition to the stolen memory problems.
> [Zhang, Xiong Y] I have only tried kernels 4.8 and 4.9. As the Bugzilla entry
> describes, a 4.8 guest kernel shows no GPU hang in guest dmesg, while a 4.9
> guest kernel does. From that point of view, we could run git bisect.
> However, the host dmesg shows a flood of IOMMU DMA R/W faults on stolen memory
> with both 4.8 and 4.9 guest kernels. This means the guest domain IOMMU table
> has no mapping for stolen memory, and the IGD fails to access stolen memory
> from both 4.8 and 4.9 guest kernels. From this point of view, the issue is not
> a regression and git bisect is not the right tool. You can find these host
> error messages in the Bugzilla attachment. This should be fixed first.
> Anyway, I will try my best to find the offending commit through git bisect,
> but I'm afraid the result will be the same as before, because we don't have a
> stable good point to start bisecting from.
[Zhang, Xiong Y] Hi Joonas,
As you said, the GPU hang exists because i915 creates the ring buffer from stolen memory.
I ran git bisect again, and the following commit is the first bad commit:
commit c58b735fc762e891481e92af7124b85cb0a51fce
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Aug 18 17:16:57 2016 +0100

    drm/i915: Allocate rings from stolen

    If we have stolen available, make use of it for ringbuffer allocation.
    Previously this was restricted to !llc platforms, as writing to stolen
    requires a GGTT mapping - but now that we have partial mappable support,
    the mappable aperture isn't quite so precious so we can use it more
    freely and ringbuffers are a good user for the otherwise wasted stolen.

After reverting this patch from drm-intel-nightly, I no longer see the GPU hang during guest boot.
So what is our next step?

thanks
> 
> > Disabling power saving makes many bugs go away, but we still don't
> > treat that as the resolution for such bugs; instead we root-cause and
> > fix the individual bugs.
> [Zhang, Xiong Y] I added i915.enable_rc6=0, i915.enable_dc=0, i915.enable_fbc=0,
> i915.enable_psr=0, i915.disable_power_well=0 and i915.enable_ips=0 to the
> kernel command line in grub, but the GPU hang still occurs in the guest and
> the DMA R/W errors still appear in the host.
> >
> > > Stolen memory isn't a standard PCI resource; it lives in an RMRR, which
> > > is identity-mapped in the IOMMU table when the host boots, so the IGD can
> > > access stolen memory in the host OS. However, according to commit
> > > c875d2c1b808 ("iommu/vt-d: Exclude devices using RMRRs from IOMMU API
> > > domains"), RMRRs aren't supported by KVM, so both the EPT and the guest
> > > IOMMU domain table lack a mapping for stolen memory in a KVM IGD
> > > passthrough environment.
> >
> > Commit message text still fails to address that an exclusion was added
> > by commit:
> >
> > commit 18436afdc11a00ac881990b454cfb2eae81d6003
> > Author: David Woodhouse <David.Woodhouse@intel.com>
> > Date:   Wed Mar 25 15:05:47 2015 +0000
> >
> >     iommu/vt-d: Allow RMRR on graphics devices too
> >
> >     Commit c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from
> >     IOMMU API domains") prevents certain options for devices with RMRRs.
> >     This even prevents those devices from getting a 1:1 mapping with
> >     'iommu=pt', because we don't have the code to handle *preserving* the
> >     RMRR regions when moving the device between domains.
> >
> > <SNIP>
> >
> > The quoted part of David's commit message leads me to believe that what's
> > missing is simply some kernel code for juggling the RMRRs when moving a
> > device between domains. Why is that not considered instead? With that
> > implemented, we would have more transparent passthrough, which should be
> > good.
> [Zhang, Xiong Y] c875d2c1 ("iommu/vt-d: Exclude devices using RMRRs from
> IOMMU API domains") prevents devices associated with RMRRs from being
> assigned to a guest. One of the reasons is that RMRRs aren't supported in the
> guest domain IOMMU table; if such a device's driver still accesses an RMRR
> from the guest, serious errors will happen.
> 18436afdc ("iommu/vt-d: Allow RMRR on graphics devices too") adds an
> exception to the above commit, so the IGD can be assigned to a guest. But this
> doesn't mean that a 1:1 mapping of the IGD's RMRR will be supported in the
> guest domain IOMMU table. 'iommu=pt' sets up a 1:1 mapping for all PCI
> devices in the host domain IOMMU table.
>
> When a device is assigned to a guest and the guest boots, the guest domain
> IOMMU table takes the place of the host domain IOMMU table in hardware. Our
> issue is that the guest domain IOMMU table doesn't have a 1:1 mapping for the
> RMRR. Setting up a 1:1 mapping for the RMRR in the guest domain IOMMU table
> would require modifying KVM and QEMU, and the QEMU/KVM community has
> declined this.
> >
> > Also, was fixing the IGD driver loading with zero stolen memory
> > considered instead? All this information should exist in the commit
> > message.
> [Zhang, Xiong Y] The IGD and the i915 driver read PCI config register 0x50 to
> get the size of stolen memory. When the guest reads this register, QEMU could
> trap the access and return a chosen value to the guest.
> So in order to "fix the IGD driver loading with zero stolen memory", we would
> have to modify both QEMU and the IGD driver:
> 1) QEMU: trap reads of PCI config register 0x50 and return zero to the guest.
> 2) IGD driver: when the IGD driver sees a zero stolen memory size, don't abort
> loading; continue instead.
> This doesn't give any benefit to i915: i915 would still disable stolen memory,
> because it sees a zero-size stolen region. So I prefer to disable stolen memory
> in i915 directly and keep QEMU and the IGD driver unchanged.
> >
> > Once the bisecting is properly done, and there is agreement that the
> > suggested RMRR preservation is absolutely a no-go and that the other
> > options are not viable, the commit message should be updated to reflect
> > all that. Then we should look in more detail at how to detect the
> > scenarios where we're running in a virtual machine that doesn't set up
> > the 1:1 mapping for RMRRs.
> [Zhang, Xiong Y] Sure, I will do this once we have agreement.
> I really need help from others who can correct me if I am wrong.
> >
> > Regards, Joonas
> > --
> > Joonas Lahtinen
> > Open Source Technology Center
> > Intel Corporation
