* Radeon DRM dom0 issues
@ 2014-01-20 14:58 Michael D Labriola
2014-01-20 15:14 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael D Labriola @ 2014-01-20 14:58 UTC (permalink / raw)
To: xen-devel; +Cc: michael.d.labriola
Anyone here running a dom0 w/ Radeon DRM? I'm having consistent crashes
with multiple older R600 series (HD 6470 and HD 6570) and unusably slow
graphics with a newer HD7000 (can see each line refresh individually on
radeonfb tty). All 3 systems seem to work fine bare metal.
The R600 crashes happen seemingly randomly when using OpenGL Compositor in
Enlightenment 0.17. My dom0 need not even have any domUs running.
Sometimes it happens within a few minutes, sometimes it will run OK for an
afternoon or so. Eventually TTM issues an "unable to get page 0" error
message, the radeon driver follows that up with a
"radeon_gem_object_create failed to allocate gem" error message. Then the
radeon driver starts spamming that gem failure message until I'm forced to
reboot.
Behavior is identical with kernel versions 3.8, 3.10, and 3.13-rc8.
I'm using Xen 4.2.1 32bit.
xen command line is: vga=mode=0x314
kernel command line is: root=/dev/md0 quiet pci=realloc console=ttyS0,115200n8 console=tty0
Fingers crossed that there's some magic boot parameter I'm missing. It's
been a while since I dabbled with this stuff. ;-)
Thanks!
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-20 14:58 Radeon DRM dom0 issues Michael D Labriola
@ 2014-01-20 15:14 ` Konrad Rzeszutek Wilk
2014-01-20 15:26 ` Michael D Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-20 15:14 UTC (permalink / raw)
To: Michael D Labriola; +Cc: michael.d.labriola, xen-devel
On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola wrote:
> Anyone here running a dom0 w/ Radeon DRM? I'm having consistent crashes
> with multiple older R600 series (HD 6470 and HD 6570) and unusably slow
> graphics with a newer HD7000 (can see each line refresh individually on
> radeonfb tty). All 3 systems seem to work fine bare metal.
I haven't been using DRM, just the X server. Is that what you mean?
>
> The R600 crashes happen seemingly randomly when using OpenGL Compositor in
> Enlightenment 0.17. My dom0 need not even have any domUs running.
> Sometimes it happens within a few minutes, sometimes it will run OK for an
> afternoon or so. Eventually TTM issues an "unable to get page 0" error
> message, the radeon driver follows that up with a
> "radeon_gem_object_create failed to allocate gem" error message. Then the
> radeon driver starts spamming that gem failure message until I'm forced to
> reboot.
>
> Behavior is identical with kernel versions 3.8, 3.10, and 3.13-rc8.
>
> I'm using Xen 4.2.1 32bit.
>
> xen command line is: vga=mode=0x314
> kernel command line is: root=/dev/md0 quiet pci=realloc
> console=ttyS0,115200n8 console=tty0
>
> Fingers crossed that there's some magic boot parameter I'm missing. It's
> been a while since I dabbled with this stuff. ;-)
That should work. I have been using the X server for years now.
Just to make sure: you aren't referring to running with X, right? Just
the simple framebuffer?
>
> Thanks!
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-20 15:14 ` Konrad Rzeszutek Wilk
@ 2014-01-20 15:26 ` Michael D Labriola
2014-01-20 15:38 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael D Labriola @ 2014-01-20 15:26 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk; +Cc: michael.d.labriola, xen-devel
Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014 10:14:36 AM:
> On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola wrote:
> > Anyone here running a dom0 w/ Radeon DRM? I'm having consistent crashes
> > with multiple older R600 series (HD 6470 and HD 6570) and unusably slow
> > graphics with a newer HD7000 (can see each line refresh individually on
> > radeonfb tty). All 3 systems seem to work fine bare metal.
>
> I hadn't been using DRM, just Xserver. Is that what you mean?
The R600 problems happen when in X, using OpenGL, on my dom0. The
RadeonSI sluggishness is when using the KMS framebuffer device for a plain
text console login.
> That should have been working. I had been using Xserver for years now.
> Just to make sure - you aren't referring to running with X right? Just
> simple framebuffer?
I'm not sure I understand the delineation between Xserver and X... I am
indeed in X, using xf86-video-ati and Mesa for OpenGL support. I've been
doing this for years as well, but with nouveau and w/out 3d support. Was
kinda hoping that the Radeon cards I got a hold of would allow for
hardware accelerated 3d on my dom0.
I just tried adding 'nopat' to the kernel command line. I remember doing
that a year ago... don't recall why. Any chance that helps?
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-20 15:26 ` Michael D Labriola
@ 2014-01-20 15:38 ` Konrad Rzeszutek Wilk
2014-01-20 20:15 ` Michael D Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-20 15:38 UTC (permalink / raw)
To: Michael D Labriola; +Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel
On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola wrote:
> Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014 10:14:36 AM:
>
> > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola wrote:
> > > Anyone here running a dom0 w/ Radeon DRM? I'm having consistent crashes
> > > with multiple older R600 series (HD 6470 and HD 6570) and unusably slow
> > > graphics with a newer HD7000 (can see each line refresh individually on
> > > radeonfb tty). All 3 systems seem to work fine bare metal.
> >
> > I hadn't been using DRM, just Xserver. Is that what you mean?
>
> The R600 problems happen when in X, using OpenGL, on my dom0. The
> RadeonSI sluggishness is when using the KMS framebuffer device for a plain
> text console login.
The sluggishness is probably due to PAT not being enabled. This patch
should be applied:
lkml.org/lkml/2011/11/8/406
(or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
and these two reverted:
"xen/pat: Disable PAT support for now."
"xen/pat: Disable PAT using pat_enabled value."
Which is to say do:
git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
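
For anyone scripting the two reverts above, here is a minimal Python sketch. The commit IDs are the ones quoted in this message; the function names, the `--no-edit` choice, and the dry-run default are assumptions made for illustration. Applying the lkml patch is a separate manual step that this sketch does not cover.

```python
import subprocess

# Commit IDs quoted in the message above ("xen/pat: Disable PAT ..." commits).
PAT_DISABLE_COMMITS = [
    "c79c49826270b8b0061b2fca840fc3f013c8a78a",
    "8eaffa67b43e99ae581622c5133e20b0f48bcef1",
]


def revert_commands(commits):
    """Build one `git revert` argv per commit (hypothetical helper)."""
    # --no-edit keeps git from opening an editor for each revert message.
    return [["git", "revert", "--no-edit", commit] for commit in commits]


def run_reverts(commands, repo_dir, dry_run=True):
    """Print the commands, or run them inside repo_dir when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first revert that fails (e.g. a conflict).
            subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # "." is a placeholder; point it at the kernel tree when running for real.
    run_reverts(revert_commands(PAT_DISABLE_COMMITS), repo_dir=".")
```

The dry run is the default so that nothing touches a working tree until the printed commands have been reviewed.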
> I'm not sure I understand the delineation between Xserver and X... I am
> indeed in X, using xf86-video-ati and Mesa for OpenGL support. I've been
> doing this for years as well, but with nouveau and w/out 3d support. Was
> kinda hoping that the Radeon cards I got a hold of would allow for
> hardware accelerated 3d on my dom0.
You should be able to use 3D as well - with those said patches.
>
> I just tried adding 'nopat' to the kernel command line. I remember doing
> that a year ago... don't recall why. Any chance that helps?
Heh. You want the inverse: PAT enabled, plus a patch that stops PAT from
wreaking havoc.
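
Since `nopat` switches PAT off, it is worth confirming that it is not still on the command line of the running kernel. A small Python sketch, assuming a Linux system where `/proc/cmdline` is readable; the helper names are hypothetical:

```python
def read_cmdline(path="/proc/cmdline"):
    """Return the kernel command line of the running system as one string."""
    with open(path) as f:
        return f.read().strip()


def pat_disabled_on_cmdline(cmdline):
    """True if the bare `nopat` token appears on the given command line."""
    # Splitting on whitespace avoids matching `nopat` inside another token.
    return "nopat" in cmdline.split()


if __name__ == "__main__":
    if pat_disabled_on_cmdline(read_cmdline()):
        print("nopat is set: PAT is disabled for this boot")
    else:
        print("nopat is not set on the kernel command line")
```

This only inspects the boot parameter. It does not tell you whether the kernel itself has PAT disabled under Xen by the two commits discussed in this thread.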
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-20 15:38 ` Konrad Rzeszutek Wilk
@ 2014-01-20 20:15 ` Michael D Labriola
2014-01-21 21:59 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael D Labriola @ 2014-01-20 20:15 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014 10:38:27 AM:
> So sluggish is probably due to the PAT not being enabled. This patch
> should be applied:
>
> lkml.org/lkml/2011/11/8/406
>
> (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
>
> and these two reverted:
>
> "xen/pat: Disable PAT support for now."
> "xen/pat: Disable PAT using pat_enabled value."
>
> Which is to say do:
>
> git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
Thanks! I cherry-picked that patch out of your testing tree, reverted
those 2 commits, recompiled and installed. Definitely fixed the HD 7000
sluggishness and appears to have fixed the R600 crashes (although it's
only been running a few hours).
How come that patch didn't get into mainline? It looks pretty innocuous
to me...
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-20 20:15 ` Michael D Labriola
@ 2014-01-21 21:59 ` Konrad Rzeszutek Wilk
2014-01-23 16:54 ` Michael D Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-21 21:59 UTC (permalink / raw)
To: Michael D Labriola; +Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel
On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
> Thanks! I cherry-picked that patch out of your testing tree, reverted
> those 2 commits, recompiled and installed. Definitely fixed the HD 7000
> sluggishness and appears to have fixed the R600 crashes (although it's
> only been running a few hours).
>
> How come that patch didn't get into mainline? It looks pretty innocuous
> to me...
<Sigh> The x86 maintainers wanted a different route, and I haven't had
the chance or the time to implement it.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-21 21:59 ` Konrad Rzeszutek Wilk
@ 2014-01-23 16:54 ` Michael D Labriola
2014-01-24 14:49 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael D Labriola @ 2014-01-23 16:54 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel-bounces, xen-devel
xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
> <Sigh> the x86 maintainers wanted a different route. And I hadn't had
> the chance nor time to implement it.
I see. Well, I've got a handful of boxes in my lab that need that patch
to be usable. If you do come up with a more mainline-able solution, I'd
gladly test it for you. ;-)
Thanks again, by the way.
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-23 16:54 ` Michael D Labriola
@ 2014-01-24 14:49 ` Konrad Rzeszutek Wilk
2014-02-11 15:35 ` Michael D Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-01-24 14:49 UTC (permalink / raw)
To: Michael D Labriola
Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel-bounces, xen-devel
On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
> I see. Well, I've got a handful of boxes in my lab that need that patch
> to be usable. If you do come up with a more mainline-able solution, I'd
> gladly test it for you. ;-)
Thank you!
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-01-24 14:49 ` Konrad Rzeszutek Wilk
@ 2014-02-11 15:35 ` Michael D Labriola
2014-02-19 17:04 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael D Labriola @ 2014-02-11 15:35 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel-bounces, xen-devel
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014 09:49:38 AM:
> Thank you!
Uh, oh. Looks like those reverts and patches didn't entirely fix my
problem. My box with the HD5450 (r600 gallium3d) started going bonkers
again yesterday. After being solid as a rock for 2 weeks as my primary
workstation, X has crashed a half dozen or so times so far this week. I've
been in Xen with 2 paravirtual linux guests running almost constantly for
this whole period. I don't understand what's changed, but my system has
been entirely unstable now. I did recompile my kernel... but all I did
was merge the v3.13.1 stable commit into my working tree and turn a few
things on (netfilter, wifi, a couple drivers turned on here and there). I
just went and verified that those patches are still applied in my tree
(i.e., I didn't accidentally undo them). I'm scratching my head (and
staring at a TTY login).
When X crashes, my kernel log prints a couple dozen iterations of this. 3d
acceleration no longer functions unless I reboot. If memory serves, the
unpatched behavior upon X crash was that the kernel continued to spew
these errors until the whole box locked up. At least that's not happening
any more... ;-)
[ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
[ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
[ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
and here's a slightly different variant that happened while I was typing
this email (on a different machine, luckily):
[ 3107.713039] sdf: detected capacity change from 31625052160 to 0
[ 3114.491717] usb 9-1: USB disconnect, device number 2
[64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
[64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
[64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[64348.297561] [TTM] Buffer eviction failed
[64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
[64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
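
To tally these failures across a long kernel log, a short Python sketch along the following lines may help. It expects each log entry on one line (wrapped lines joined first). The `-12` return code corresponds to ENOMEM on Linux. The field names given to the four numbers in the GEM message (size, domain, alignment, return code) are an assumption; the log itself does not label them.

```python
import errno
import re
from collections import Counter

# "[TTM] radeon <pci-address>: Unable to get page <n>"
TTM_PAGE_RE = re.compile(
    r"\[TTM\] radeon (?P<dev>\S+): Unable to get page (?P<page>\d+)"
)
# Field names below are assumed for illustration, not taken from the log.
GEM_RE = re.compile(
    r"\[drm:radeon_gem_object_create\] \*ERROR\* Failed to allocate GEM object "
    r"\((?P<size>\d+), (?P<domain>\d+), (?P<alignment>\d+), (?P<ret>-?\d+)\)"
)

SAMPLE = """\
[64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
[64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
[64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
"""


def summarize(log_text):
    """Count TTM page failures per device and decode GEM error codes."""
    page_failures = Counter()
    gem_failures = []
    for line in log_text.splitlines():
        match = TTM_PAGE_RE.search(line)
        if match:
            page_failures[match.group("dev")] += 1
            continue
        match = GEM_RE.search(line)
        if match:
            ret = int(match.group("ret"))
            gem_failures.append({
                "size": int(match.group("size")),
                # The kernel reports negative errno values, so flip the sign.
                "errno": errno.errorcode.get(-ret, "unknown"),
            })
    return page_failures, gem_failures


if __name__ == "__main__":
    pages, gems = summarize(SAMPLE)
    print(dict(pages), gems)
```

Feeding it the output of `dmesg` shows how many page allocation failures precede each GEM failure, which may help correlate the crashes with memory pressure in dom0.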
Any ideas?
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-02-11 15:35 ` Michael D Labriola
@ 2014-02-19 17:04 ` Konrad Rzeszutek Wilk
2014-02-19 19:33 ` Michael Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-19 17:04 UTC (permalink / raw)
To: Michael D Labriola
Cc: Konrad Rzeszutek Wilk, michael.d.labriola, xen-devel-bounces, xen-devel
On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014
> 09:49:38 AM:
>
> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > To: Michael D Labriola <mlabriol@gdeb.com>,
> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org, xen-devel-
> > bounces@lists.xen.org
> > Date: 01/24/2014 09:50 AM
> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >
> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
> > >
> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> > > > Date: 01/21/2014 04:59 PM
> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > Sent by: xen-devel-bounces@lists.xen.org
> > > >
> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014
>
> > > > > 10:38:27 AM:
> > > > >
> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> > > > > > Date: 01/20/2014 10:38 AM
> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > > >
> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola
> wrote:
> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014
> > > 10:14:36
> > > > > AM:
> > > > > > >
> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > > > > > > > Cc: xen-devel@lists.xen.org, michael.d.labriola@gmail.com
> > > > > > > > Date: 01/20/2014 10:14 AM
> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > > > > > > >
> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola
>
> > > wrote:
> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having
> > > consistent
> > > > > > > crashes
> > > > > > > > > with multiple older R600 series (HD 6470 and HD 6570) and
> > > unusably
> > > > >
> > > > > > > slow
> > > > > > > > > graphics with a newer HD7000 (can see each line refresh
> > > > > indiviually on
> > > > > > >
> > > > > > > > > radeonfb tty). All 3 systems seem to work fine bare
> metal.
> > > > > > > >
> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you
> mean?
> > > > > > >
> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.
> The
> > >
> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device
> for
> > > a
> > > > > plain
> > > > > > > text console login.
> > > > > >
> > > > > > So sluggish is probably due to the PAT not being enabled. This
> patch
> > > > > > should be applied:
> > > > > >
> > > > > > lkml.org/lkml/2011/11/8/406
> > > > > >
> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
> > > > > >
> > > > > > and these two reverted:
> > > > > >
> > > > > > "xen/pat: Disable PAT support for now."
> > > > > > "xen/pat: Disable PAT using pat_enabled value."
> > > > > >
> > > > > > Which is to say do:
> > > > > >
> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> > > > >
> > > > > Thanks! I cherry-picked that patch out of your testing tree,
> reverted
> > >
> > > > > those 2 commits, recompiled and installed. Definitely fixed the
> HD
> > > 7000
> > > > > sluggishness and appears to have fixed the R600 crashes (although
> it's
> > >
> > > > > only been running a few hours).
> > > > >
> > > > > How come that patch didn't get into mainline? It looks pretty
> > > innocuous
> > > > > to me...
> > > >
> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't
> had
> > > > the chance nor time to implement it.
> > >
> > > I see. Well, I've got a handful of boxes in my lab that need that
> patch
> > > to be usable. If you do come up with a more mainline-able solution,
> I'd
> > > gladly test it for you. ;-)
> >
> > Thank you!
>
> Uh, oh. Looks like those reverts and patches didn't entirely fix my
> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
> again yeserday. After being solid as a rock for 2 weeks as my primary
> workstation, X has crashed a half dozen or so times so far this week. I've
> been in Xen with 2 paravirtual linux guests running almost constantly for
> this whole period. I don't understand what's changed, but my system has
> been entirely unstable now. I did recompile my kernel... but I all did
> was merge the v3.13.1 stable commit into my working tree and turn a few
> things on (netfilter, wifi, a couple drivers turned on here and there). I
> just went and verified that those patches are still applied in my tree
> (i.e., I didn't accidentally undo them). I'm scratching my head (and
> staring at a TTY login).
>
> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
> acceleration no longer functions unless I reboot. If memory serves, the
> unpatched behavior upon X crash was that the kernel continued to spew
> these errors until the whole box locked up. At least that's not happening
> any more... ;-)
>
> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> (r:-12)!
> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> (r:-12)!
> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
> GEM object (8192, 2, 4096, -12)
>
> and here's a slightly different variant that happened while I was typing
> this email (on a different machine, luckily):
>
> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
> [ 3114.491717] usb 9-1: USB disconnect, device number 2
> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> (r:-12)!
> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> (r:-12)!
> [64348.297561] [TTM] Buffer eviction failed
> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> (r:-12)!
> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
> GEM object (16384, 2, 4096, -12)
>
> Any ideas?
Yes. I believe you have a memory leak. That is, some driver (or X) is
eating up memory and not releasing enough of it, so the TTM layer is
hitting its ceiling on how much memory it can allocate.

Finding the culprit is going to be a bit hard.

You could use:

[root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
pool     refills   pages freed   inuse   available   name
wc           259           224     808           4   nouveau 0000:05:00.0
cached   3403058      13561071   51158           3   radeon 0000:01:00.0
cached        25             0      96           4   nouveau 0000:05:00.0

to check whether my thinking is right. You should see a huge 'inuse'
count and almost no 'available'.

That will only confirm that yes, you have a big usage of memory and it
is hitting the ceiling.

As for figuring out which application is holding on to these pages, I am
not sure. I think there is some DRM info tool that shows how many pages
each application is using; you could leave it running and see which app
is gulping up the memory. But I don't know which tool that is (if there
is one).

Well, let's take it one step at a time and see if my theory is correct first.
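The check described above can be scripted. The following is a minimal sketch that parses the text of ttm_dma_page_pool into rows, assuming the column layout shown in the sample output (pool, refills, pages freed, inuse, available, then a two-word device name). The function name parse_pool_stats and the dictionary keys are illustrative, and the format may differ between kernel versions.

```python
from typing import Dict, List, Union

# Sample output copied from the message above.
SAMPLE = """\
pool refills pages freed inuse available name
wc 259 224 808 4 nouveau 0000:05:00.0
cached 3403058 13561071 51158 3 radeon 0000:01:00.0
cached 25 0 96 4 nouveau 0000:05:00.0
"""

Row = Dict[str, Union[str, int]]


def parse_pool_stats(text: str) -> List[Row]:
    """Parse ttm_dma_page_pool text into one dict per pool line."""
    rows: List[Row] = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a data row.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        rows.append({
            "pool": fields[0],
            "refills": int(fields[1]),
            "pages_freed": int(fields[2]),
            "inuse": int(fields[3]),
            "available": int(fields[4]),
            # The name column is "driver PCI-address", so rejoin the rest.
            "name": " ".join(fields[5:]),
        })
    return rows


for row in parse_pool_stats(SAMPLE):
    print(row["name"], row["pool"], "inuse:", row["inuse"],
          "available:", row["available"])
```

On a live system the text would come from reading the debugfs file as root, for example open("/sys/kernel/debug/dri/1/ttm_dma_page_pool").read(); the DRI minor number in that path varies per machine.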
>
>
> ---
> Michael D Labriola
> Electric Boat
> mlabriol@gdeb.com
> 401-848-8871 (desk)
> 401-848-8513 (lab)
> 401-316-9844 (cell)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-02-19 17:04 ` Konrad Rzeszutek Wilk
@ 2014-02-19 19:33 ` Michael Labriola
2014-02-19 19:57 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael Labriola @ 2014-02-19 19:33 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, xen-devel-bounces, Michael D Labriola, xen-devel
On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014
>> 09:49:38 AM:
>>
>> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > To: Michael D Labriola <mlabriol@gdeb.com>,
>> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org, xen-devel-
>> > bounces@lists.xen.org
>> > Date: 01/24/2014 09:50 AM
>> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >
>> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
>> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
>> > >
>> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
>> > > > Date: 01/21/2014 04:59 PM
>> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > Sent by: xen-devel-bounces@lists.xen.org
>> > > >
>> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
>> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014
>>
>> > > > > 10:38:27 AM:
>> > > > >
>> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
>> > > > > > Date: 01/20/2014 10:38 AM
>> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > > >
>> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola
>> wrote:
>> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014
>> > > 10:14:36
>> > > > > AM:
>> > > > > > >
>> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
>> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> > > > > > > > Cc: xen-devel@lists.xen.org, michael.d.labriola@gmail.com
>> > > > > > > > Date: 01/20/2014 10:14 AM
>> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> > > > > > > >
>> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola
>>
>> > > wrote:
>> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having
>> > > consistent
>> > > > > > > crashes
>> > > > > > > > > with multiple older R600 series (HD 6470 and HD 6570) and
>> > > unusably
>> > > > >
>> > > > > > > slow
>> > > > > > > > > graphics with a newer HD7000 (can see each line refresh
>> > > > > indiviually on
>> > > > > > >
>> > > > > > > > > radeonfb tty). All 3 systems seem to work fine bare
>> metal.
>> > > > > > > >
>> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you
>> mean?
>> > > > > > >
>> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.
>> The
>> > >
>> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device
>> for
>> > > a
>> > > > > plain
>> > > > > > > text console login.
>> > > > > >
>> > > > > > So sluggish is probably due to the PAT not being enabled. This
>> patch
>> > > > > > should be applied:
>> > > > > >
>> > > > > > lkml.org/lkml/2011/11/8/406
>> > > > > >
>> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
>> > > > > >
>> > > > > > and these two reverted:
>> > > > > >
>> > > > > > "xen/pat: Disable PAT support for now."
>> > > > > > "xen/pat: Disable PAT using pat_enabled value."
>> > > > > >
>> > > > > > Which is to say do:
>> > > > > >
>> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
>> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>> > > > >
>> > > > > Thanks! I cherry-picked that patch out of your testing tree,
>> reverted
>> > >
>> > > > > those 2 commits, recompiled and installed. Definitely fixed the
>> HD
>> > > 7000
>> > > > > sluggishness and appears to have fixed the R600 crashes (although
>> it's
>> > >
>> > > > > only been running a few hours).
>> > > > >
>> > > > > How come that patch didn't get into mainline? It looks pretty
>> > > innocuous
>> > > > > to me...
>> > > >
>> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't
>> had
>> > > > the chance nor time to implement it.
>> > >
>> > > I see. Well, I've got a handful of boxes in my lab that need that
>> patch
>> > > to be usable. If you do come up with a more mainline-able solution,
>> I'd
>> > > gladly test it for you. ;-)
>> >
>> > Thank you!
>>
>> Uh, oh. Looks like those reverts and patches didn't entirely fix my
>> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
>> again yeserday. After being solid as a rock for 2 weeks as my primary
>> workstation, X has crashed a half dozen or so times so far this week. I've
>> been in Xen with 2 paravirtual linux guests running almost constantly for
>> this whole period. I don't understand what's changed, but my system has
>> been entirely unstable now. I did recompile my kernel... but I all did
>> was merge the v3.13.1 stable commit into my working tree and turn a few
>> things on (netfilter, wifi, a couple drivers turned on here and there). I
>> just went and verified that those patches are still applied in my tree
>> (i.e., I didn't accidentally undo them). I'm scratching my head (and
>> staring at a TTY login).
>>
>> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
>> acceleration no longer functions unless I reboot. If memory serves, the
>> unpatched behavior upon X crash was that the kernel continued to spew
>> these errors until the whole box locked up. At least that's not happening
>> any more... ;-)
>>
>> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
>> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
>> (r:-12)!
>> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
>> (r:-12)!
>> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
>> GEM object (8192, 2, 4096, -12)
>>
>> and here's a slightly different variant that happened while I was typing
>> this email (on a different machine, luckily):
>>
>> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
>> [ 3114.491717] usb 9-1: USB disconnect, device number 2
>> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
>> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
>> (r:-12)!
>> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
>> (r:-12)!
>> [64348.297561] [TTM] Buffer eviction failed
>> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
>> (r:-12)!
>> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
>> GEM object (16384, 2, 4096, -12)
>>
>> Any ideas?
>
> yes. I believe you have a memory leak. As in, some driver (or X) is
> eating up the memory and not giving up enough. That means the TTM
> layer is hitting its ceiling of how much memory it can allocate.
>
> Now finding the culprit is going to be a bit hard.
>
> You could use:
>
> [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
> pool refills pages freed inuse available name
> wc 259 224 808 4 nouveau 0000:05:00.0
> cached 3403058 13561071 51158 3 radeon 0000:01:00.0
> cached 25 0 96 4 nouveau 0000:05:00.0
>
> to figure out if my thinking is really true. You should have a huge
> 'inuse' count and almost no 'available'.
My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which appear to
always have the same contents. Is that normal?

My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist on bare
metal... only in Xen. Is that normal?

pool     refills   pages freed   inuse   available   name
cached     15190         59551    1205           4   radeon 0000:01:00.0

If I watch that file while creating xterms, moving them around, etc., I can
see the number available fluctuate between 3 and 6. This is true even on
my box w/ the newer R7 card in it, which hasn't gotten that GEM error
message (yet?).
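Watching the file by hand, as described above, can be automated. This is a rough sketch that samples the pool file repeatedly and records the minimum and maximum 'available' value per pool. The helper names (available_by_device, watch_available) are made up for illustration, the path is the one quoted in this message, and the column layout is assumed to match the output shown.

```python
import time
from typing import Callable, Dict, Tuple

# Path taken from the message above; the DRI minor number varies per machine.
POOL_FILE = "/sys/kernel/debug/dri/0/ttm_dma_page_pool"


def read_pool_file(path: str = POOL_FILE) -> str:
    """Read the raw pool statistics (needs root and a mounted debugfs)."""
    with open(path) as handle:
        return handle.read()


def available_by_device(text: str) -> Dict[str, int]:
    """Map 'pool driver pci-address' to its current 'available' count."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have a numeric second column; the header does not.
        if len(fields) >= 6 and fields[1].isdigit():
            key = fields[0] + " " + " ".join(fields[5:])
            result[key] = int(fields[4])
    return result


def watch_available(read: Callable[[], str], samples: int,
                    interval: float = 1.0) -> Dict[str, Tuple[int, int]]:
    """Sample the pool text 'samples' times; return (min, max) available."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for i in range(samples):
        for key, avail in available_by_device(read()).items():
            low, high = ranges.get(key, (avail, avail))
            ranges[key] = (min(low, avail), max(high, avail))
        if i + 1 < samples:
            time.sleep(interval)
    return ranges
```

A call such as watch_available(read_pool_file, samples=60) would cover roughly a minute; passing the reader as a parameter keeps the sampling logic testable without debugfs access.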
>
> But that will get us just to confirm that yes - you have a big usage
> of memory and it is hitting the ceiling.
>
> Now to actually figure out which application is hanging on these - that
> I am not sure about. I think there is some drm info tool to investigate
> how many pages each application is using. You can leave it running and
> see which app is gulping up the memory. But I am not sure which
> tool that is (if there was one).
>
> Well, lets do one step at a time - see if my theory is correct first.
--
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-02-19 19:33 ` Michael Labriola
@ 2014-02-19 19:57 ` Konrad Rzeszutek Wilk
2014-02-19 20:08 ` Michael Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-19 19:57 UTC (permalink / raw)
To: Michael Labriola
Cc: Konrad Rzeszutek Wilk, xen-devel-bounces, Michael D Labriola, xen-devel
On Wed, Feb 19, 2014 at 02:33:26PM -0500, Michael Labriola wrote:
> On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014
> >> 09:49:38 AM:
> >>
> >> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org, xen-devel-
> >> > bounces@lists.xen.org
> >> > Date: 01/24/2014 09:50 AM
> >> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> >
> >> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
> >> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
> >> > >
> >> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> >> > > > Date: 01/21/2014 04:59 PM
> >> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> > > > Sent by: xen-devel-bounces@lists.xen.org
> >> > > >
> >> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
> >> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014
> >>
> >> > > > > 10:38:27 AM:
> >> > > > >
> >> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> >> > > > > > Date: 01/20/2014 10:38 AM
> >> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> > > > > >
> >> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola
> >> wrote:
> >> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014
> >> > > 10:14:36
> >> > > > > AM:
> >> > > > > > >
> >> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
> >> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> > > > > > > > Cc: xen-devel@lists.xen.org, michael.d.labriola@gmail.com
> >> > > > > > > > Date: 01/20/2014 10:14 AM
> >> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> > > > > > > >
> >> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola
> >>
> >> > > wrote:
> >> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having
> >> > > consistent
> >> > > > > > > crashes
> >> > > > > > > > > with multiple older R600 series (HD 6470 and HD 6570) and
> >> > > unusably
> >> > > > >
> >> > > > > > > slow
> >> > > > > > > > > graphics with a newer HD7000 (can see each line refresh
> >> > > > > indiviually on
> >> > > > > > >
> >> > > > > > > > > radeonfb tty). All 3 systems seem to work fine bare
> >> metal.
> >> > > > > > > >
> >> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you
> >> mean?
> >> > > > > > >
> >> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.
> >> The
> >> > >
> >> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device
> >> for
> >> > > a
> >> > > > > plain
> >> > > > > > > text console login.
> >> > > > > >
> >> > > > > > So sluggish is probably due to the PAT not being enabled. This
> >> patch
> >> > > > > > should be applied:
> >> > > > > >
> >> > > > > > lkml.org/lkml/2011/11/8/406
> >> > > > > >
> >> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
> >> > > > > >
> >> > > > > > and these two reverted:
> >> > > > > >
> >> > > > > > "xen/pat: Disable PAT support for now."
> >> > > > > > "xen/pat: Disable PAT using pat_enabled value."
> >> > > > > >
> >> > > > > > Which is to say do:
> >> > > > > >
> >> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> >> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> >> > > > >
> >> > > > > Thanks! I cherry-picked that patch out of your testing tree,
> >> reverted
> >> > >
> >> > > > > those 2 commits, recompiled and installed. Definitely fixed the
> >> HD
> >> > > 7000
> >> > > > > sluggishness and appears to have fixed the R600 crashes (although
> >> it's
> >> > >
> >> > > > > only been running a few hours).
> >> > > > >
> >> > > > > How come that patch didn't get into mainline? It looks pretty
> >> > > innocuous
> >> > > > > to me...
> >> > > >
> >> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't
> >> had
> >> > > > the chance nor time to implement it.
> >> > >
> >> > > I see. Well, I've got a handful of boxes in my lab that need that
> >> patch
> >> > > to be usable. If you do come up with a more mainline-able solution,
> >> I'd
> >> > > gladly test it for you. ;-)
> >> >
> >> > Thank you!
> >>
> >> Uh, oh. Looks like those reverts and patches didn't entirely fix my
> >> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
> >> again yeserday. After being solid as a rock for 2 weeks as my primary
> >> workstation, X has crashed a half dozen or so times so far this week. I've
> >> been in Xen with 2 paravirtual linux guests running almost constantly for
> >> this whole period. I don't understand what's changed, but my system has
> >> been entirely unstable now. I did recompile my kernel... but I all did
> >> was merge the v3.13.1 stable commit into my working tree and turn a few
> >> things on (netfilter, wifi, a couple drivers turned on here and there). I
> >> just went and verified that those patches are still applied in my tree
> >> (i.e., I didn't accidentally undo them). I'm scratching my head (and
> >> staring at a TTY login).
> >>
> >> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
> >> acceleration no longer functions unless I reboot. If memory serves, the
> >> unpatched behavior upon X crash was that the kernel continued to spew
> >> these errors until the whole box locked up. At least that's not happening
> >> any more... ;-)
> >>
> >> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
> >> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> >> (r:-12)!
> >> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> >> (r:-12)!
> >> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
> >> GEM object (8192, 2, 4096, -12)
> >>
> >> and here's a slightly different variant that happened while I was typing
> >> this email (on a different machine, luckily):
> >>
> >> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
> >> [ 3114.491717] usb 9-1: USB disconnect, device number 2
> >> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
> >> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> >> (r:-12)!
> >> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> >> (r:-12)!
> >> [64348.297561] [TTM] Buffer eviction failed
> >> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool
> >> (r:-12)!
> >> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate
> >> GEM object (16384, 2, 4096, -12)
> >>
> >> Any ideas?
> >
> > yes. I believe you have a memory leak. As in, some driver (or X) is
> > eating up the memory and not giving up enough. That means the TTM
> > layer is hitting its ceiling of how much memory it can allocate.
> >
> > Now finding the culprit is going to be a bit hard.
> >
> > You could use:
> >
> > [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
> > pool refills pages freed inuse available name
> > wc 259 224 808 4 nouveau 0000:05:00.0
> > cached 3403058 13561071 51158 3 radeon 0000:01:00.0
> > cached 25 0 96 4 nouveau 0000:05:00.0
> >
> > to figure out if my thinking is really true. You should have a huge
> > 'inuse' count and almost no 'available'.
>
> My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which appear to
> always have the same contents. Is that normal?
Yes.
>
> My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist bare
> metal... only in Xen. Is that normal?
It would show up on bare metal if you boot with 'iommu=soft'.
>
> pool refills pages freed inuse available name
> cached 15190 59551 1205 4 radeon 0000:01:00.0
>
> If I watch that file while creating xterms, moving them around, etc, I can
> see the number available fluctuate between 3 and 6. This is true, even on
> my box w/ the newer R7 card in it, which hasn't gotten that GEM error
> message (yet?).
OK, so let's see what happens when the error shows up. Incidentally, how much
memory does your initial domain have? And is it different from when you
boot on bare metal?
Thank you.
>
>
> >
> > But that will get us just to confirm that yes - you have a big usage
> > of memory and it is hitting the ceiling.
> >
> > Now to actually figure out which application is hanging on these - that
> > I am not sure about. I think there is some drm info tool to investigate
> > how many pages each application is using. You can leave it running and
> > see which app is gulping up the memory. But I am not sure which
> > tool that is (if there was one).
> >
> > Well, lets do one step at a time - see if my theory is correct first.
>
>
>
> --
> Michael D Labriola
> 21 Rip Van Winkle Cir
> Warwick, RI 02886
> 401-316-9844 (cell)
> 401-848-8871 (work)
> 401-234-1306 (home)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Radeon DRM dom0 issues
2014-02-19 19:57 ` Konrad Rzeszutek Wilk
@ 2014-02-19 20:08 ` Michael Labriola
2014-02-19 20:30 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 15+ messages in thread
From: Michael Labriola @ 2014-02-19 20:08 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, xen-devel-bounces, Michael D Labriola, xen-devel
On Wed, Feb 19, 2014 at 2:57 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Wed, Feb 19, 2014 at 02:33:26PM -0500, Michael Labriola wrote:
>> On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
>> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014
>> >> 09:49:38 AM:
>> >>
>> >> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> > To: Michael D Labriola <mlabriol@gdeb.com>,
>> >> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> >> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org, xen-devel-
>> >> > bounces@lists.xen.org
>> >> > Date: 01/24/2014 09:50 AM
>> >> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >> >
>> >> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
>> >> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
>> >> > >
>> >> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> >> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> >> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
>> >> > > > Date: 01/21/2014 04:59 PM
>> >> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >> > > > Sent by: xen-devel-bounces@lists.xen.org
>> >> > > >
>> >> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
>> >> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014
>> >>
>> >> > > > > 10:38:27 AM:
>> >> > > > >
>> >> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> >> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> >> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
>> >> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
>> >> > > > > > Date: 01/20/2014 10:38 AM
>> >> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >> > > > > >
>> >> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola
>> >> wrote:
>> >> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014
>> >> > > 10:14:36
>> >> > > > > AM:
>> >> > > > > > >
>> >> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
>> >> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
>> >> > > > > > > > Cc: xen-devel@lists.xen.org, michael.d.labriola@gmail.com
>> >> > > > > > > > Date: 01/20/2014 10:14 AM
>> >> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>> >> > > > > > > >
>> >> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola
>> >>
>> >> > > wrote:
>> >> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having
>> >> > > consistent
>> >> > > > > > > crashes
>> >> > > > > > > > > with multiple older R600 series (HD 6470 and HD 6570) and
>> >> > > unusably
>> >> > > > >
>> >> > > > > > > slow
>> >> > > > > > > > > graphics with a newer HD7000 (can see each line refresh
>> >> > > > > indiviually on
>> >> > > > > > >
>> >> > > > > > > > > radeonfb tty). All 3 systems seem to work fine bare
>> >> metal.
>> >> > > > > > > >
>> >> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you
>> >> mean?
>> >> > > > > > >
>> >> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.
>> >> The
>> >> > >
>> >> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device
>> >> for
>> >> > > a
>> >> > > > > plain
>> >> > > > > > > text console login.
>> >> > > > > >
>> >> > > > > > So sluggish is probably due to the PAT not being enabled. This
>> >> patch
>> >> > > > > > should be applied:
>> >> > > > > >
>> >> > > > > > lkml.org/lkml/2011/11/8/406
>> >> > > > > >
>> >> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
>> >> > > > > >
>> >> > > > > > and these two reverted:
>> >> > > > > >
>> >> > > > > > "xen/pat: Disable PAT support for now."
>> >> > > > > > "xen/pat: Disable PAT using pat_enabled value."
>> >> > > > > >
>> >> > > > > > Which is to say do:
>> >> > > > > >
>> >> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
>> >> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>> >> > > > >
>> >> > > > > Thanks! I cherry-picked that patch out of your testing tree,
>> >> reverted
>> >> > >
>> >> > > > > those 2 commits, recompiled and installed. Definitely fixed the
>> >> HD
>> >> > > 7000
>> >> > > > > sluggishness and appears to have fixed the R600 crashes (although
>> >> it's
>> >> > >
>> >> > > > > only been running a few hours).
>> >> > > > >
>> >> > > > > How come that patch didn't get into mainline? It looks pretty
>> >> > > innocuous
>> >> > > > > to me...
>> >> > > >
>> >> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't
>> >> had
>> >> > > > the chance nor time to implement it.
>> >> > >
>> >> > > I see. Well, I've got a handful of boxes in my lab that need that
>> >> patch
>> >> > > to be usable. If you do come up with a more mainline-able solution,
>> >> I'd
>> >> > > gladly test it for you. ;-)
>> >> >
>> >> > Thank you!
>> >>
>> >> Uh, oh. Looks like those reverts and patches didn't entirely fix my
>> >> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
>> >> again yesterday. After being solid as a rock for 2 weeks as my primary
>> >> workstation, X has crashed a half dozen or so times so far this week. I've
>> >> been in Xen with 2 paravirtual linux guests running almost constantly for
>> >> this whole period. I don't understand what's changed, but my system has
>> >> been entirely unstable now. I did recompile my kernel... but all I did
>> >> was merge the v3.13.1 stable commit into my working tree and turn a few
>> >> things on (netfilter, wifi, a couple drivers turned on here and there). I
>> >> just went and verified that those patches are still applied in my tree
>> >> (i.e., I didn't accidentally undo them). I'm scratching my head (and
>> >> staring at a TTY login).
>> >>
>> >> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
>> >> acceleration no longer functions unless I reboot. If memory serves, the
>> >> unpatched behavior upon X crash was that the kernel continued to spew
>> >> these errors until the whole box locked up. At least that's not happening
>> >> any more... ;-)
>> >>
>> >> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
>> >> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> >> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> >> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> >> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
>> >>
>> >> and here's a slightly different variant that happened while I was typing
>> >> this email (on a different machine, luckily):
>> >>
>> >> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
>> >> [ 3114.491717] usb 9-1: USB disconnect, device number 2
>> >> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
>> >> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> >> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> >> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> >> [64348.297561] [TTM] Buffer eviction failed
>> >> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
>> >> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
>> >> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
>> >>
>> >> Any ideas?
>> >
>> > yes. I believe you have a memory leak. As in, some driver (or X) is
>> > eating up the memory and not giving up enough. That means the TTM
>> > layer is hitting its ceiling of how much memory it can allocate.
>> >
>> > Now finding the culprit is going to be a bit hard.
>> >
>> > You could use:
>> >
>> > [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
>> > pool      refills  pages freed   inuse available name
>> > wc            259          224     808         4 nouveau 0000:05:00.0
>> > cached    3403058     13561071   51158         3 radeon 0000:01:00.0
>> > cached         25            0      96         4 nouveau 0000:05:00.0
>> >
>> > to figure out if my thinking is really true. You should have a huge
>> > 'inuse' count and almost no 'available'.
>>
>> My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which appear to
>> always have the same contents. Is that normal?
>
> Yes.
>>
>> My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist bare
>> metal... only in Xen. Is that normal?
>
> It would show up on baremetal if you boot with 'iommu=soft'
>
>>
>> pool      refills  pages freed   inuse available name
>> cached      15190        59551    1205         4 radeon 0000:01:00.0
>>
>> If I watch that file while creating xterms, moving them around, etc, I can
>> see the number available fluctuate between 3 and 6. This is true, even on
>> my box w/ the newer R7 card in it, which hasn't gotten that GEM error
>> message (yet?).
>
> OK, so let's see what happens when the error shows. Incidentally - what amount of
> memory does your initial domain have? And is it different than when you
> boot it as baremetal?

I've got the problem very reproducible on 3 boxes. All three are
booting the dom0 with as much RAM as Xen will give them, then giving
up some of their RAM as needed when I create domUs. The 3 boxes have
4G, 8G, and 16G if memory serves.

Does the amount of RAM on the actual video cards matter? All the
older cards (that crash all the time) have 2G, whereas the R7 that
hasn't crashed yet only has 1G.

I've been reproducing the crash by just logging in and out of fluxbox
via XDM over and over again right after booting my dom0 in Xen w/ no
guests running. That makes it happen within a few minutes. Otherwise
it randomly crashes while I'm in the middle of trying to work... ;-)
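
While repeating that login/logout loop, the pool counters could be sampled between iterations. The following is only a minimal sketch: it assumes the whitespace-separated column layout shown in the quoted ttm_dma_page_pool output (pool, refills, pages freed, inuse, available, name), and the names PoolRow, parse_pool and read_pool are made up for illustration rather than taken from any DRM tool. The sample text is the radeon row quoted above.

```python
from typing import List, NamedTuple

# Path taken from the thread; reading it needs root and a mounted debugfs.
POOL_FILE = "/sys/kernel/debug/dri/0/ttm_dma_page_pool"


class PoolRow(NamedTuple):
    pool: str
    refills: int
    pages_freed: int
    inuse: int
    available: int
    name: str


def parse_pool(text: str) -> List[PoolRow]:
    """Parse the whitespace-separated pool table, skipping non-data lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6:
            continue
        try:
            counters = [int(p) for p in parts[1:5]]
        except ValueError:
            continue  # header line or anything else that is not a data row
        rows.append(PoolRow(parts[0], *counters, " ".join(parts[5:])))
    return rows


def read_pool(path: str = POOL_FILE) -> List[PoolRow]:
    """Read and parse the live debugfs file (hypothetical helper)."""
    with open(path) as f:
        return parse_pool(f.read())


# Sample copied from the thread so the sketch runs without debugfs access.
SAMPLE = """\
pool refills pages freed inuse available name
cached 15190 59551 1205 4 radeon 0000:01:00.0
"""

for row in parse_pool(SAMPLE):
    print(f"{row.name}: inuse={row.inuse} available={row.available}")
```

Calling read_pool() once per login/logout cycle and logging the inuse column would show whether it keeps growing until the GEM allocation error appears.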
>
> Thank you.
>
>>
>>
>> >
>> > But that will get us just to confirm that yes - you have a big usage
>> > of memory and it is hitting the ceiling.
>> >
>> > Now to actually figure out which application is hanging on these - that
>> > I am not sure about. I think there is some drm info tool to investigate
>> > how many pages each application is using. You can leave it running and
>> > see which app is gulping up the memory. But I am not sure which
>> > tool that is (if there was one).
>> >
>> > Well, lets do one step at a time - see if my theory is correct first.
--
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
* Re: Radeon DRM dom0 issues
2014-02-19 20:08 ` Michael Labriola
@ 2014-02-19 20:30 ` Konrad Rzeszutek Wilk
2014-02-19 21:02 ` Michael D Labriola
0 siblings, 1 reply; 15+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-02-19 20:30 UTC (permalink / raw)
To: Michael Labriola
Cc: Konrad Rzeszutek Wilk, xen-devel-bounces, Michael D Labriola, xen-devel
On Wed, Feb 19, 2014 at 03:08:08PM -0500, Michael Labriola wrote:
> On Wed, Feb 19, 2014 at 2:57 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Wed, Feb 19, 2014 at 02:33:26PM -0500, Michael Labriola wrote:
> >> On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >> > On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola wrote:
> >> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/24/2014
> >> >> 09:49:38 AM:
> >> >>
> >> >> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> >> > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> >> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> >> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org, xen-devel-
> >> >> > bounces@lists.xen.org
> >> >> > Date: 01/24/2014 09:50 AM
> >> >> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> >> >
> >> >> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola wrote:
> >> >> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05 PM:
> >> >> > >
> >> >> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> >> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> >> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> >> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> >> >> > > > Date: 01/21/2014 04:59 PM
> >> >> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> >> > > > Sent by: xen-devel-bounces@lists.xen.org
> >> >> > > >
> >> >> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D Labriola wrote:
> >> >> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 01/20/2014
> >> >>
> >> >> > > > > 10:38:27 AM:
> >> >> > > > >
> >> >> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >> >> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> >> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> >> >> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> >> >> > > > > > Date: 01/20/2014 10:38 AM
> >> >> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> >> > > > > >
> >> >> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D Labriola
> >> >> wrote:
> >> >> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014
> >> >> > > 10:14:36
> >> >> > > > > AM:
> >> >> > > > > > >
> >> >> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
> >> >> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> >> >> > > > > > > > Cc: xen-devel@lists.xen.org, michael.d.labriola@gmail.com
> >> >> > > > > > > > Date: 01/20/2014 10:14 AM
> >> >> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> >> >> > > > > > > >
> >> >> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500, Michael D Labriola
> >> >>
> >> >> > > wrote:
> >> >> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm having
> >> >> > > consistent
> >> >> > > > > > > crashes
> >> >> > > > > > > > > with multiple older R600 series (HD 6470 and HD 6570) and
> >> >> > > unusably
> >> >> > > > >
> >> >> > > > > > > slow
> >> >> > > > > > > > > graphics with a newer HD7000 (can see each line refresh
> >> >> > > > > individually on
> >> >> > > > > > >
> >> >> > > > > > > > > radeonfb tty). All 3 systems seem to work fine bare
> >> >> metal.
> >> >> > > > > > > >
> >> >> > > > > > > > I hadn't been using DRM, just Xserver. Is that what you
> >> >> mean?
> >> >> > > > > > >
> >> >> > > > > > > The R600 problems happen when in X, using OpenGL, on my dom0.
> >> >> The
> >> >> > >
> >> >> > > > > > > RadeonSI sluggishness is when using the KMS framebuffer device
> >> >> for
> >> >> > > a
> >> >> > > > > plain
> >> >> > > > > > > text console login.
> >> >> > > > > >
> >> >> > > > > > So sluggish is probably due to the PAT not being enabled. This
> >> >> patch
> >> >> > > > > > should be applied:
> >> >> > > > > >
> >> >> > > > > > lkml.org/lkml/2011/11/8/406
> >> >> > > > > >
> >> >> > > > > > (or http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
> >> >> > > > > >
> >> >> > > > > > and these two reverted:
> >> >> > > > > >
> >> >> > > > > > "xen/pat: Disable PAT support for now."
> >> >> > > > > > "xen/pat: Disable PAT using pat_enabled value."
> >> >> > > > > >
> >> >> > > > > > Which is to say do:
> >> >> > > > > >
> >> >> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> >> >> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> >> >> > > > >
> >> >> > > > > Thanks! I cherry-picked that patch out of your testing tree,
> >> >> reverted
> >> >> > >
> >> >> > > > > those 2 commits, recompiled and installed. Definitely fixed the
> >> >> HD
> >> >> > > 7000
> >> >> > > > > sluggishness and appears to have fixed the R600 crashes (although
> >> >> it's
> >> >> > >
> >> >> > > > > only been running a few hours).
> >> >> > > > >
> >> >> > > > > How come that patch didn't get into mainline? It looks pretty
> >> >> > > innocuous
> >> >> > > > > to me...
> >> >> > > >
> >> >> > > > <Sigh> the x86 maintainers wanted a different route. And I hadn't
> >> >> had
> >> >> > > > the chance nor time to implement it.
> >> >> > >
> >> >> > > I see. Well, I've got a handful of boxes in my lab that need that
> >> >> patch
> >> >> > > to be usable. If you do come up with a more mainline-able solution,
> >> >> I'd
> >> >> > > gladly test it for you. ;-)
> >> >> >
> >> >> > Thank you!
> >> >>
> >> >> Uh, oh. Looks like those reverts and patches didn't entirely fix my
> >> >> problem. My box with the HD5450 (r600 gallium3d) started going bonkers
> >> >> again yesterday. After being solid as a rock for 2 weeks as my primary
> >> >> workstation, X has crashed a half dozen or so times so far this week. I've
> >> >> been in Xen with 2 paravirtual linux guests running almost constantly for
> >> >> this whole period. I don't understand what's changed, but my system has
> >> >> been entirely unstable now. I did recompile my kernel... but all I did
> >> >> was merge the v3.13.1 stable commit into my working tree and turn a few
> >> >> things on (netfilter, wifi, a couple drivers turned on here and there). I
> >> >> just went and verified that those patches are still applied in my tree
> >> >> (i.e., I didn't accidentally undo them). I'm scratching my head (and
> >> >> staring at a TTY login).
> >> >>
> >> >> When X crashes, my kernel log prints a couple dozen iterations of this. 3d
> >> >> acceleration no longer functions unless I reboot. If memory serves, the
> >> >> unpatched behavior upon X crash was that the kernel continued to spew
> >> >> these errors until the whole box locked up. At least that's not happening
> >> >> any more... ;-)
> >> >>
> >> >> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
> >> >> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> >> >> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> >> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> >> >> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
> >> >>
> >> >> and here's a slightly different variant that happened while I was typing
> >> >> this email (on a different machine, luckily):
> >> >>
> >> >> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
> >> >> [ 3114.491717] usb 9-1: USB disconnect, device number 2
> >> >> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
> >> >> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> >> >> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> >> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> >> >> [64348.297561] [TTM] Buffer eviction failed
> >> >> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
> >> >> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> >> >> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
> >> >>
> >> >> Any ideas?
> >> >
> >> > yes. I believe you have a memory leak. As in, some driver (or X) is
> >> > eating up the memory and not giving up enough. That means the TTM
> >> > layer is hitting its ceiling of how much memory it can allocate.
> >> >
> >> > Now finding the culprit is going to be a bit hard.
> >> >
> >> > You could use:
> >> >
> >> > [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
> >> > pool      refills  pages freed   inuse available name
> >> > wc            259          224     808         4 nouveau 0000:05:00.0
> >> > cached    3403058     13561071   51158         3 radeon 0000:01:00.0
> >> > cached         25            0      96         4 nouveau 0000:05:00.0
> >> >
> >> > to figure out if my thinking is really true. You should have a huge
> >> > 'inuse' count and almost no 'available'.
> >>
> >> My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which appear to
> >> always have the same contents. Is that normal?
> >
> > Yes.
> >>
> >> My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist bare
> >> metal... only in Xen. Is that normal?
> >
> > It would show up on baremetal if you boot with 'iommu=soft'
> >
> >>
> >> pool      refills  pages freed   inuse available name
> >> cached      15190        59551    1205         4 radeon 0000:01:00.0
> >>
> >> If I watch that file while creating xterms, moving them around, etc, I can
> >> see the number available fluctuate between 3 and 6. This is true, even on
> >> my box w/ the newer R7 card in it, which hasn't gotten that GEM error
> >> message (yet?).
> >
> > OK, so let's see what happens when the error shows. Incidentally - what amount of
> > memory does your initial domain have? And is it different than when you
> > boot it as baremetal?
>
> I've got the problem very reproducible on 3 boxes. All three are
> booting the dom0 with as much RAM as Xen will give them, then giving
> up some of their RAM as needed when I create domUs. The 3 boxes have
> 4G, 8G, and 16G if memory serves.
>
> Does the amount of RAM on the actual video cards matter? All the
> older cards (that crash all the time) have 2G, whereas the R7 that
> hasn't crashed yet only has 1G.

The TTM pool has a limit (a hard one). It is pretty simple:

        pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
                zone->name, (unsigned long long)zone->max_mem >> 10);
    }
    ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
    ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));

so 1/4 of your memory. Which means that when you boot dom0 with as much
memory as possible and then balloon down you might confuse it
(as the initial memory assumption is made during bootup).

If you boot the troubled dom0s with 'dom0_mem' (with a 'max:' value) set
to some good number - that might shed some light on this.
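
To put rough numbers on that, here is a small sketch of the arithmetic. It assumes 4 KiB pages and that zone_kernel->max_mem is half of the memory the kernel sees at boot (which is what the "1/4" above implies); a 32-bit highmem kernel may size that zone differently. The function names are made up for illustration and are not kernel APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages
GIB = 1 << 30


def ttm_pool_limit_pages(boot_mem_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Pool limit in pages, mirroring max_mem/(2*PAGE_SIZE) from the snippet."""
    max_mem = boot_mem_bytes // 2      # assumed value of zone_kernel->max_mem
    return max_mem // (2 * page_size)


def limit_share(boot_gib: int, current_gib: int) -> float:
    """Fraction of the *current* dom0 memory that the boot-time limit allows."""
    limit_bytes = ttm_pool_limit_pages(boot_gib * GIB) * PAGE_SIZE
    return limit_bytes / (current_gib * GIB)


# dom0 booted with 16 GiB, then ballooned down to 4 GiB: the limit was
# computed from 16 GiB, so it covers all of the memory that is left.
print(ttm_pool_limit_pages(16 * GIB), limit_share(16, 4))

# dom0 capped at 4 GiB from boot: the limit stays at a quarter of its memory.
print(ttm_pool_limit_pages(4 * GIB), limit_share(4, 4))
```

Under these assumptions the limit never shrinks when dom0 balloons down, which is why capping dom0 memory at boot is a useful experiment.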
>
> I've been reproducing the crash by just logging in and out of fluxbox
> via XDM over and over again right after booting my dom0 in Xen w/ no
> guests running. That makes it happen within a few minutes. Otherwise
> it randomly crashes while I'm in the middle of trying to work... ;-)

HA!

Does fluxbox use a lot of graphics? I mean, does it do a lot of fancy
things when it starts up and shuts down?
>
> >
> > Thank you.
> >
> >>
> >>
> >> >
> >> > But that will get us just to confirm that yes - you have a big usage
> >> > of memory and it is hitting the ceiling.
> >> >
> >> > Now to actually figure out which application is hanging on these - that
> >> > I am not sure about. I think there is some drm info tool to investigate
> >> > how many pages each application is using. You can leave it running and
> >> > see which app is gulping up the memory. But I am not sure which
> >> > tool that is (if there was one).
> >> >
> >> > Well, lets do one step at a time - see if my theory is correct first.
>
> --
> Michael D Labriola
> 21 Rip Van Winkle Cir
> Warwick, RI 02886
> 401-316-9844 (cell)
> 401-848-8871 (work)
> 401-234-1306 (home)
* Re: Radeon DRM dom0 issues
2014-02-19 20:30 ` Konrad Rzeszutek Wilk
@ 2014-02-19 21:02 ` Michael D Labriola
0 siblings, 0 replies; 15+ messages in thread
From: Michael D Labriola @ 2014-02-19 21:02 UTC (permalink / raw)
To: Konrad Rzeszutek Wilk
Cc: Konrad Rzeszutek Wilk, Michael Labriola, xen-devel-bounces, xen-devel
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on 02/19/2014
03:30:07 PM:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> To: Michael Labriola <michael.d.labriola@gmail.com>,
> Cc: Michael D Labriola <mlabriol@gdeb.com>, Konrad Rzeszutek Wilk
> <konrad@darnok.org>, xen-devel@lists.xen.org,
xen-devel-bounces@lists.xen.org
> Date: 02/19/2014 03:30 PM
> Subject: Re: [Xen-devel] Radeon DRM dom0 issues
>
> On Wed, Feb 19, 2014 at 03:08:08PM -0500, Michael Labriola wrote:
> > On Wed, Feb 19, 2014 at 2:57 PM, Konrad Rzeszutek Wilk
> > <konrad.wilk@oracle.com> wrote:
> > > On Wed, Feb 19, 2014 at 02:33:26PM -0500, Michael Labriola wrote:
> > >> On Wed, Feb 19, 2014 at 12:04 PM, Konrad Rzeszutek Wilk
> > >> <konrad.wilk@oracle.com> wrote:
> > >> > On Tue, Feb 11, 2014 at 10:35:18AM -0500, Michael D Labriola
wrote:
> > >> >> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote on
01/24/2014
> > >> >> 09:49:38 AM:
> > >> >>
> > >> >> > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > >> >> > To: Michael D Labriola <mlabriol@gdeb.com>,
> > >> >> > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > >> >> > michael.d.labriola@gmail.com, xen-devel@lists.xen.org,
xen-devel-
> > >> >> > bounces@lists.xen.org
> > >> >> > Date: 01/24/2014 09:50 AM
> > >> >> > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > >> >> >
> > >> >> > On Thu, Jan 23, 2014 at 11:54:37AM -0500, Michael D Labriola
wrote:
> > >> >> > > xen-devel-bounces@lists.xen.org wrote on 01/21/2014 04:59:05
PM:
> > >> >> > >
> > >> >> > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > >> >> > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > >> >> > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > >> >> > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> > >> >> > > > Date: 01/21/2014 04:59 PM
> > >> >> > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > >> >> > > > Sent by: xen-devel-bounces@lists.xen.org
> > >> >> > > >
> > >> >> > > > On Mon, Jan 20, 2014 at 03:15:24PM -0500, Michael D
> Labriola wrote:
> > >> >> > > > > Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote
> on 01/20/2014
> > >> >>
> > >> >> > > > > 10:38:27 AM:
> > >> >> > > > >
> > >> >> > > > > > From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > >> >> > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > >> >> > > > > > Cc: Konrad Rzeszutek Wilk <konrad@darnok.org>,
> > >> >> > > > > > michael.d.labriola@gmail.com, xen-devel@lists.xen.org
> > >> >> > > > > > Date: 01/20/2014 10:38 AM
> > >> >> > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > >> >> > > > > >
> > >> >> > > > > > On Mon, Jan 20, 2014 at 10:26:22AM -0500, Michael D
Labriola
> > >> >> wrote:
> > >> >> > > > > > > Konrad Rzeszutek Wilk <konrad@darnok.org> wrote on 01/20/2014 10:14:36 AM:
> > >> >> > > > > > >
> > >> >> > > > > > > > From: Konrad Rzeszutek Wilk <konrad@darnok.org>
> > >> >> > > > > > > > To: Michael D Labriola <mlabriol@gdeb.com>,
> > >> >> > > > > > > > Cc: xen-devel@lists.xen.org,
michael.d.labriola@gmail.com
> > >> >> > > > > > > > Date: 01/20/2014 10:14 AM
> > >> >> > > > > > > > Subject: Re: [Xen-devel] Radeon DRM dom0 issues
> > >> >> > > > > > > >
> > >> >> > > > > > > > On Mon, Jan 20, 2014 at 09:58:32AM -0500,
> Michael D Labriola
> > >> >>
> > >> >> > > wrote:
> > >> >> > > > > > > > > Anyone here running a dom0 w/ Radeon DRM? I'm
having
> > >> >> > > consistent
> > >> >> > > > > > > crashes
> > >> >> > > > > > > > > with multiple older R600 series (HD 6470 and
> HD 6570) and
> > >> >> > > unusably
> > >> >> > > > >
> > >> >> > > > > > > slow
> > >> >> > > > > > > > > graphics with a newer HD7000 (can see each line
refresh
> > >> >> > > > > individually on
> > >> >> > > > > > >
> > >> >> > > > > > > > > radeonfb tty). All 3 systems seem to work fine
bare
> > >> >> metal.
> > >> >> > > > > > > >
> > >> >> > > > > > > > I hadn't been using DRM, just Xserver. Is that
what you
> > >> >> mean?
> > >> >> > > > > > >
> > >> >> > > > > > > The R600 problems happen when in X, using OpenGL,
> on my dom0.
> > >> >> The
> > >> >> > >
> > >> >> > > > > > > RadeonSI sluggishness is when using the KMS
> framebuffer device
> > >> >> for
> > >> >> > > a
> > >> >> > > > > plain
> > >> >> > > > > > > text console login.
> > >> >> > > > > >
> > >> >> > > > > > So sluggish is probably due to the PAT not being
enabled. This
> > >> >> patch
> > >> >> > > > > > should be applied:
> > >> >> > > > > >
> > >> >> > > > > > lkml.org/lkml/2011/11/8/406
> > >> >> > > > > >
> > >> >> > > > > > (or
http://marc.info/?l=linux-kernel&m=132888833209874&w=2)
> > >> >> > > > > >
> > >> >> > > > > > and these two reverted:
> > >> >> > > > > >
> > >> >> > > > > > "xen/pat: Disable PAT support for now."
> > >> >> > > > > > "xen/pat: Disable PAT using pat_enabled value."
> > >> >> > > > > >
> > >> >> > > > > > Which is to say do:
> > >> >> > > > > >
> > >> >> > > > > > git revert c79c49826270b8b0061b2fca840fc3f013c8a78a
> > >> >> > > > > > git revert 8eaffa67b43e99ae581622c5133e20b0f48bcef1
> > >> >> > > > >
> > >> >> > > > > Thanks! I cherry-picked that patch out of your testing
tree,
> > >> >> reverted
> > >> >> > >
> > >> >> > > > > those 2 commits, recompiled and installed. Definitely fixed the HD 7000
> > >> >> > > > > sluggishness and appears to have fixed the R600
> crashes (although
> > >> >> it's
> > >> >> > >
> > >> >> > > > > only been running a few hours).
> > >> >> > > > >
> > >> >> > > > > How come that patch didn't get into mainline? It looks
pretty
> > >> >> > > innocuous
> > >> >> > > > > to me...
> > >> >> > > >
> > >> >> > > > <Sigh> the x86 maintainers wanted a different route. And I
hadn't
> > >> >> had
> > >> >> > > > the chance nor time to implement it.
> > >> >> > >
> > >> >> > > I see. Well, I've got a handful of boxes in my lab that
need that
> > >> >> patch
> > >> >> > > to be usable. If you do come up with a more mainline-able solution, I'd
> > >> >> > > gladly test it for you. ;-)
> > >> >> >
> > >> >> > Thank you!
> > >> >>
> > >> >> Uh, oh. Looks like those reverts and patches didn't entirely
fix my
> > >> >> problem. My box with the HD5450 (r600 gallium3d) started going
bonkers
> > >> >> again yesterday. After being solid as a rock for 2 weeks as my
primary
> > >> >> workstation, X has crashed a half dozen or so times so far
> this week. I've
> > >> >> been in Xen with 2 paravirtual linux guests running almost
> constantly for
> > >> >> this whole period. I don't understand what's changed, but my
system has
> > >> >> been entirely unstable now. I did recompile my kernel... but all I did
> > >> >> was merge the v3.13.1 stable commit into my working tree and
turn a few
> > >> >> things on (netfilter, wifi, a couple drivers turned on here
> and there). I
> > >> >> just went and verified that those patches are still applied in
my tree
> > >> >> (i.e., I didn't accidentally undo them). I'm scratching my head
(and
> > >> >> staring at a TTY login).
> > >> >>
> > >> >> When X crashes, my kernel log prints a couple dozen iterations
> of this. 3d
> > >> >> acceleration no longer functions unless I reboot. If memory
serves, the
> > >> >> unpatched behavior upon X crash was that the kernel continued to
spew
> > >> >> these errors until the whole box locked up. At least that's
> not happening
> > >> >> any more... ;-)
> > >> >>
> > >> >> [ 702.070084] [TTM] radeon 0000:01:00.0: Unable to get page 2
> > >> >> [ 702.075971] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> > >> >> [ 704.720699] [TTM] radeon 0000:01:00.0: Unable to get page 0
> > >> >> [ 704.726635] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> > >> >> [ 704.733910] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (8192, 2, 4096, -12)
> > >> >>
> > >> >> and here's a slightly different variant that happened while I
was typing
> > >> >> this email (on a different machine, luckily):
> > >> >>
> > >> >> [ 3107.713039] sdf: detected capacity change from 31625052160 to 0
> > >> >> [ 3114.491717] usb 9-1: USB disconnect, device number 2
> > >> >> [64348.271534] [TTM] radeon 0000:01:00.0: Unable to get page 3
> > >> >> [64348.277312] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> > >> >> [64348.284470] [TTM] radeon 0000:01:00.0: Unable to get page 0
> > >> >> [64348.290257] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> > >> >> [64348.297561] [TTM] Buffer eviction failed
> > >> >> [64349.550518] [TTM] radeon 0000:01:00.0: Unable to get page 0
> > >> >> [64349.556417] [TTM] radeon 0000:01:00.0: Failed to fill cached pool (r:-12)!
> > >> >> [64349.563714] [drm:radeon_gem_object_create] *ERROR* Failed to allocate GEM object (16384, 2, 4096, -12)
> > >> >>
> > >> >> Any ideas?
> > >> >
> > >> > yes. I believe you have a memory leak. As in, some driver (or X)
is
> > >> > eating up the memory and not giving up enough. That means the TTM
> > >> > layer is hitting its ceiling of how much memory it can allocate.
> > >> >
> > >> > Now finding the culprit is going to be a bit hard.
> > >> >
> > >> > You could use:
> > >> >
> > >> > [root@phenom 1]# cat /sys/kernel/debug/dri/1/ttm_dma_page_pool
> > >> > pool      refills  pages freed   inuse available name
> > >> > wc            259          224     808         4 nouveau 0000:05:00.0
> > >> > cached    3403058     13561071   51158         3 radeon 0000:01:00.0
> > >> > cached         25            0      96         4 nouveau 0000:05:00.0
> > >> >
> > >> > to figure out if my thinking is really true. You should have a
huge
> > >> > 'inuse' count and almost no 'available'.
> > >>
> > >> My /sys/kernel/debug/dri directory has a 0 and a 64 entry, which
appear to
> > >> always have the same contents. Is that normal?
> > >
> > > Yes.
> > >>
> > >> My /sys/kernel/debug/dri/0/ttm_dma_page_pool file doesn't exist
bare
> > >> metal... only in Xen. Is that normal?
> > >
> > > It would show up on baremetal if you boot with 'iommu=soft'
> > >
> > >>
> > >> pool      refills  pages freed   inuse available name
> > >> cached      15190        59551    1205         4 radeon 0000:01:00.0
> > >>
> > >> If I watch that file while creating xterms, moving them around,
etc, I can
> > >> see the number available fluctuate between 3 and 6. This is true,
even on
> > >> my box w/ the newer R7 card in it, which hasn't gotten that GEM
error
> > >> message (yet?).
> > >
> > > OK, so let's see what happens when the error shows. Incidentally - what
> > > amount of memory does your initial domain have? And is it different
> > > than when you boot it as baremetal?
> >
> > I've got the problem very reproducible on 3 boxes. All three are
> > booting the dom0 with as much RAM as Xen will give them, then giving
> > up some of their RAM as needed when I create domUs. The 3 boxes have
> > 4G, 8G, and 16G if memory serves.
Actually, they're 6G, 8G, and 16G... and I've got a box that I can't
reproduce the problem on even though it's got the same video card... and
it only has 2G of RAM. Could this be a PAE/HIGHMEM issue? I'm running
32bit with CONFIG_HIGHMEM64G on all my boxes.
> >
> > Does the amount of RAM on the actual video cards matter? All the
> > older cards (that crash all the time) have 2G, whereas the R7 that
> > hasn't crashed yet only has 1G.
>
> The TTM pool has a limit (a hard one). It is pretty simple:
>
>
>         pr_info("Zone %7s: Available graphics memory: %llu kiB\n",
>                 zone->name, (unsigned long long)zone->max_mem >> 10);
>     }
>     ttm_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
>     ttm_dma_page_alloc_init(glob, glob->zone_kernel->max_mem/(2*PAGE_SIZE));
>
> so 1/4 of your memory. Which means that when boot dom0 with as much
> memory as possible and then balloon down you might confuse it
> (as the initial memory assumption is done during bootup).
>
> If you boot the troubled dom0s with 'dom0_mem_max' set to some good
> number - that might shed some light on this.
Ok, I've got one of the problematic boxes booted with dom0_mem=5G and it
doesn't seem to be crashing. Fingers crossed!
>
>
> >
> > I've been reproducing the crash by just logging in and out of fluxbox
> > via XDM over and over again right after booting my dom0 in Xen w/ no
> > guests running. That makes it happen within a few minutes. Otherwise
> > it randomly crashes while I'm in the middle of trying to work... ;-)
>
> HA!
>
> Does fluxbox use a lot of graphic? I mean does it do a lot of fancy
> things when it starts and shuts itself?
Negative. It does next to nothing. Super lightweight, pretty much just
gets rid of the login box and puts a taskbar-type-thing on the bottom of
the screen. I'd say the majority of my crashes have happened in
Enlightenment (with plenty of extra fancy things), but it HAS happened in
fluxbox doing next to nothing. Which was pretty surprising.
---
Michael D Labriola
Electric Boat
mlabriol@gdeb.com
401-848-8871 (desk)
401-848-8513 (lab)
401-316-9844 (cell)
Thread overview: 15+ messages
2014-01-20 14:58 Radeon DRM dom0 issues Michael D Labriola
2014-01-20 15:14 ` Konrad Rzeszutek Wilk
2014-01-20 15:26 ` Michael D Labriola
2014-01-20 15:38 ` Konrad Rzeszutek Wilk
2014-01-20 20:15 ` Michael D Labriola
2014-01-21 21:59 ` Konrad Rzeszutek Wilk
2014-01-23 16:54 ` Michael D Labriola
2014-01-24 14:49 ` Konrad Rzeszutek Wilk
2014-02-11 15:35 ` Michael D Labriola
2014-02-19 17:04 ` Konrad Rzeszutek Wilk
2014-02-19 19:33 ` Michael Labriola
2014-02-19 19:57 ` Konrad Rzeszutek Wilk
2014-02-19 20:08 ` Michael Labriola
2014-02-19 20:30 ` Konrad Rzeszutek Wilk
2014-02-19 21:02 ` Michael D Labriola