* Nouveau on dom0
@ 2010-02-25  8:46 Arvind R
  2010-02-25 12:55 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-02-25  8:46 UTC (permalink / raw)
  To: xen-devel

Hi all,
I merged the drm-tree from 2.6.33-rc8 into Jeremy's 2.6.31.6 master and
got X working on dom0 - but only with option "ShadowFB" set. Using Xorg-7.5,
mesa-git, libdrm-git, xf86-video-nouveau-git, xen-testing-git and
qemu-dm-git. Ensured dependencies by deb-packaging everything. The kernel
was built without drm, and the nouveau driver was built as a separate
out-of-tree modules package - to ensure the correct ttm modules were used.
Tried WinXP and Debian etch domUs - which worked fine.

On bare-metal boot, everything works - even 3D accelerated rendering. But
when booted on Xen, X works ONLY - as mentioned - with ShadowFB set,
which, in turn, turns off even 2D acceleration. The only difference between the boots
is that the bare-metal boot has 2GB RAM whereas dom0 has 512M. The graphics
card is an nVidia GeForce 9400GT, and the distro is basically Debian lenny.

Turned debug on in the nouveau driver, patched some debug output into libdrm, and
compared the outputs on bare-metal and Xen boots. The output is identical up to the
problem point - the only differing fields were the time-stamp, process pid, and
grobj allocation addresses.

Problem Point:
libdrm has an inlined function OUT_RING, defined in
nouveau/nouveau_pushbuf.h.
static __inline__ void
OUT_RING(struct nouveau_channel *chan, unsigned data)
{
    *(chan->cur++) = (data);
}
- chan->cur is a uint32_t *
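For context, the DDX drives this through libdrm with a pattern roughly like the
following (a simplified sketch, not the actual nouveau source; the grobj, the
method number and the value are placeholders):

    BEGIN_RING(chan, grobj, 0x0100, 1); /* placeholder method; reserves space and
                                           writes the method header into the pushbuf */
    OUT_RING (chan, some_value);        /* stores one 32-bit word at chan->cur++ */
    FIRE_RING(chan);                    /* submits the new pushbuf contents to the card */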

The function is entered by X through ScrnInit in the DDX driver.
A patched-in log message on entry is written to syslog, and then
X seems to get suspended. chan->cur can be read on entry,
so the (assumed) suspension is on the write. The system loses its consoles,
but can be ssh'ed into - no killed processes, no segfault.

The area pointed to is the pushbuf - which is apparently the
PRAMIN area on the graphics card. Modern graphics is not
my forte - so I am seeking some pointers from anyone to resolve
this. I think that if this is solved, Xen would have open-source
3D-acceleration support! Am game for testing, patching, etc.

I am basically interested in having a development domU and
another testing domU without devel-packages.

Arvind R.


* Re: Nouveau on dom0
  2010-02-25  8:46 Nouveau on dom0 Arvind R
@ 2010-02-25 12:55 ` Konrad Rzeszutek Wilk
  2010-02-25 17:01   ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-02-25 12:55 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
> Hi all,
> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
> got X working on dom0 - but only with option "ShadowFB" set. Using Xorg-7.5,
> mesa-git, libdrm-git xf86-video-nouveau-git, xen-testing-git and
> qemu-dm-git. Ensured dependencies by deb-packaging everything. Kernel
> built without drm and the nouveau driver built as a separate
> out-of-tree modules package- to ensure
> correct ttm modules. Tried WinXP and debianetch domUs - which worked fine.
> 
> On bare-metal boot, everything works - even 3D accelerated rendering. But
> when booted on Xen, X works ONLY - as mentioned - with ShadowFB set,
> which in turn, turns off even 2D acceleration. The only difference in the boots
> is that bare-metal boot has 2GB RAM whereas dom0 has 512M. The graphics
> card is a nVidia GeForce 9400GT, and the distro is basically debian lenny.
> 
> Turned debug on in the nouveau driver and patched some into libdrm and
> compared the outputs on bare-metal and xen boot. Identical output upto
> problem point - only differing fields were time-stamp, process pid, and
> grobj allocation addresses.
> 
> Problem Point:
> libdrm has an inlined function OUT_RING, defined in
> nouveau/nouveau_pushbuf.h.
> static __inline__ void
> OUT_RING(struct nouveau_channel *chan, unsigned data)
> {
>     *(chan->cur++) = (data);
> }
> - chan->cur is a uint32_t *
> 
> The function is entered by X through ScrnInit in the DDX driver.
> Patched log-message on entry is written to syslog, and then -
> X seems to get suspended. chan->cur can be read on entry,
> so (assumed) suspension is on write. System loses consoles,
> but can be ssh'ed into - no killed processes, no segfault.
> 
> The area pointed to be the pushbuf - which is apparently the
> PRAMIN area on the graphics card. Modern graphics is not
> my forte - so I am seeking some pointers to resolve this from

So this looks to assume that the ring is contiguous, which it probably
is not. Would it be possible to trace down who allocates that *chan? You
say it is 'PRAMIN' - is that allocated via a pci_alloc_* call?

Or is the address retrieved from an ioctl call made in user-space?
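
(By a pci_alloc_* call I mean something of this shape - just a sketch, with
pdev and ring_size as placeholders:)

    /* Sketch: a coherent DMA buffer; the returned bus_addr is what the
     * device would be programmed with. */
    dma_addr_t bus_addr;
    void *ring = pci_alloc_consistent(pdev, ring_size, &bus_addr);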

> anyone. I think that if this is solved, Xen would have open-source
> 3D-acceleration support! Am game for testing, patching, etc.

Neat!
> 
> I am basically interested in having a develepment domU and
> another testing domU without devel-packages.

You lost me here. Don't you mean Dom0?


* Re: Nouveau on dom0
  2010-02-25 12:55 ` Konrad Rzeszutek Wilk
@ 2010-02-25 17:01   ` Arvind R
  2010-02-25 17:44     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-02-25 17:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
>> Hi all,
>> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
======= snip =======
> is not. Would it be possible to trace down who allocates that *chan? You
> say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
>
> Or is the address retrieved from an ioctl call made in user-space?
Both true, I guess.

chan is GFP_KERNEL allocated. My current understanding is that
chan->cur, at the end of a lot of initialization, points to specific
areas of card memory which form a command ring. What gets written is
32-bit words which encode pointers to contexts and methods already
associated with that specific channel. Each of the possibly many channels
has its own independent command FIFO (ring) and associations.

So, there must be a mmap call somewhere to map the area to user-space
for that problem write to work on non-Xen boots. Will try to track it down
some more and post. With mmaps and PCIGARTs - it will be some hunt!
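For instance, somewhere in libdrm there should be something of this shape (a
sketch only - the offset name is made up; the real offset comes back from the
channel-alloc ioctl):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Sketch: how a per-channel pushbuf would typically be exposed to user-space. */
    static uint32_t *map_pushbuf(int drm_fd, off_t chan_pushbuf_offset, size_t size)
    {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       drm_fd, chan_pushbuf_offset);
        /* chan->cur then walks over this mapping; the first write into it is
           what faults into the kernel on the TTM side. */
        return (p == MAP_FAILED) ? NULL : (uint32_t *)p;
    }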

>> another testing domU without devel-packages.
>
> You lost me here. Don't you mean Dom0?
>
Let's say virtual appliances - for which one needs dom0!


* Re: Nouveau on dom0
  2010-02-25 17:01   ` Arvind R
@ 2010-02-25 17:44     ` Konrad Rzeszutek Wilk
  2010-02-26 15:34       ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-02-25 17:44 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
> >> Hi all,
> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
> ======= snip =======
> > is not. Would it be possible to trace down who allocates that *chan? You
> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
> >
> > Or is the address retrieved from an ioctl call made in user-space?
> Both true, I guess.
> 
> chan is GFP_KERNEL allocated. My current understanding is that
> chan->cur, at the end of a lot of initialization, points to specific
> areas of card
> memory which forms a command ring. What gets written is 32-bits which
> encode pointers to contexts and methods already associated with that
> specific channel. Each of possibly many channels have their own independent
> Command FIFOs (RINGS) and associations.
> 
> So, there must be a mmap call somewhere to map the area to user-space
> for that problem write to work on non-Xen boots. Will try track down some more
> and post. With mmaps and PCIGARTs - it will be some hunt!


You might want to look also at the source code of the nouveau X driver.
I remember looking at the radeon one, where it made a drmScatterMap
call, saved it, and then later submitted that address via an ioctl call
to the drm_radeon driver, which used it as a ring buffer. Took a bit of
hopping around to find who allocated it in the first place.
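From memory, the user-space side was shaped roughly like this (a sketch with
names from old libdrm; possibly not exactly what the radeon DDX does):

    #include <xf86drm.h>

    static int alloc_ring_sketch(int drm_fd, unsigned long sg_size)
    {
        drm_handle_t sg_handle, ring_handle;

        /* PCI-GART backed scatter/gather memory */
        if (drmScatterGatherAlloc(drm_fd, sg_size, &sg_handle))
            return -1;
        /* expose it as a mappable region */
        if (drmAddMap(drm_fd, sg_handle, sg_size, DRM_SCATTER_GATHER,
                      DRM_WRITE_COMBINING, &ring_handle))
            return -1;
        /* ...the handle/offset is then handed to the kernel driver via an
           ioctl, which treats it as the ring buffer... */
        return 0;
    }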

> 
> >> another testing domU without devel-packages.
> >
> > You lost me here. Don't you mean Dom0?
> >
> Let's say virtual appliances - for which one needs dom0!
Ah yes.


* Re: Nouveau on dom0
  2010-02-25 17:44     ` Konrad Rzeszutek Wilk
@ 2010-02-26 15:34       ` Arvind R
  2010-03-01 16:01         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-02-26 15:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Thu, Feb 25, 2010 at 11:14 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
>> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
>> >> Hi all,
>> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
>> ======= snip =======
>> > is not. Would it be possible to trace down who allocates that *chan? You
>> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
======= snip =======
>> So, there must be a mmap call somewhere to map the area to user-space
>> for that problem write to work on non-Xen boots. Will try track down some more
>> and post. With mmaps and PCIGARTs - it will be some hunt!
 ======= snip =======
> to the drm_radeon driver which used it as a ring buffer. Took a bit of
> hoping around to find who allocated it in the first place.
>
After a lot of reboots and log viewing:
The pushbuf (FIFO/RING) is the only means of programming the card's DMA
activity. It is exposed to user-space by mmap of the drm_device (PCI) handle
with a different offset for each channel. Parameters are associated with the DMA
command using ioctls to bind channels/sub-channels/contexts. This mmap is
in the libdrm2 library. libdrm handles the channel/accelerator initialization and
setup chores, and the DDX driver (xf86-video-nouveau) more-or-less acts thro' libdrm.

My suspicion is that Xen has some problems with mmap of PCI(E) device
memory. How is iomem handled in a mmap?
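The only pattern I know of for plain iomem is remap_pfn_range from a driver's
mmap handler - a rough sketch (not what TTM actually does; the BAR values are
placeholders):

    #include <linux/fs.h>
    #include <linux/mm.h>

    static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
    {
        unsigned long bar_phys = 0; /* placeholder: e.g. pci_resource_start(pdev, 1) */
        unsigned long len = vma->vm_end - vma->vm_start;

        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
        /* map the device pages straight into the user VMA */
        return io_remap_pfn_range(vma, vma->vm_start, bar_phys >> PAGE_SHIFT,
                                  len, vma->vm_page_prot);
    }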

As of now, the accelerator on Xen stops right at the initialisation stage - when
libdrm tries to set up the accelerator engine in the course of ScreenInit, it
cannot write the commands to set up the basic 2D engine.

Suggestions?


* Re: Nouveau on dom0
  2010-02-26 15:34       ` Arvind R
@ 2010-03-01 16:01         ` Konrad Rzeszutek Wilk
  2010-03-02 21:34           ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-01 16:01 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

On Fri, Feb 26, 2010 at 09:04:33PM +0530, Arvind R wrote:
> On Thu, Feb 25, 2010 at 11:14 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
> >> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
> >> >> Hi all,
> >> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
> >> ======= snip =======
> >> > is not. Would it be possible to trace down who allocates that *chan? You
> >> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
> ======= snip =======
> >> So, there must be a mmap call somewhere to map the area to user-space
> >> for that problem write to work on non-Xen boots. Will try track down some more
> >> and post. With mmaps and PCIGARTs - it will be some hunt!
>  ======= snip =======
> > to the drm_radeon driver which used it as a ring buffer. Took a bit of
> > hoping around to find who allocated it in the first place.
> >
> After a lot of reboots and log viewing:
> The pushbuf (FIFO/RING) is the only means of programming the card DMA
> activity. It is exposed to user-space by mmap of the drm_device (PCI) handle
> with different offsets for each channel. Parameters are associated to the DMA
> command using ioctls to bind channels/sub-channels/contexts. This mmap is
> in the libdrm2 library. Libdrm channel/accelerator  initialization and
> setup chores
>  and the DDX driver (xf86-video-nouveau) more-or-less acts thro' libdrm.

Ok, that is the DRM_NOUVEAU_CHANNEL_ALLOC ioctl, which ends up calling
the 'ttm_bo_init'. I remember Pasi having an issue with this on Radeon
and I provided a hack to see if it would work. Take a look at this
e-mail:

http://lists.xensource.com/archives/cgi-bin/extract-mesg.cgi?a=xen-devel&m=2010-01&i=20100115071856.GD17978%40reaktio.net

> 
> My suspicion is that Xen has some problems with mmap of PCI(E) device
> memory. How is iomem handled in a mmap?

It looks to be using 'ioremap', which is Xen-safe. Unless your card has
an AGP bridge on it, at which point it would end up using
dma_alloc_coherent in all likelihood.

> 
> As of now, accelerator on Xen stops right at the initialisation stage - when
> libdrm tries to set up the accelerator-engine in the course of ScreenInit. And
> to do that, it cannot write the command to setup the basic 2D engine.

I think that the ttm_bo calls set up pages of the 4KB size, but the
initial channel requests a 64KB one. I think it also sets up a
page-table directory so that when the GPU accesses the addresses, it
gets the real bus address. I wonder if it fails at that, though -
meaning that the addresses that are written to the page table are
actually the guest page numbers (gpfn) instead of the machine page numbers (mfn).
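I.e. whatever writes the GPU page-table entry would have to use the machine
frame under Xen - roughly this shape (a sketch with a made-up gpu_write_pte()
helper, not actual nouveau code):

    #include <asm/xen/page.h>   /* pfn_to_mfn() on the pvops kernels */

    static void example_program_gpu_pte(struct page *page)
    {
        unsigned long pfn = page_to_pfn(page);
        unsigned long mfn = pfn_to_mfn(pfn);  /* the machine frame under Xen PV */

        /* If the gpfn (pfn) goes in here instead of the mfn, the GPU DMAs
           into the wrong machine pages. gpu_write_pte() is hypothetical. */
        gpu_write_pte(mfn << PAGE_SHIFT);
    }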

The other issue might be that your back-port broke the AGP allocation.
It needs to be:
#define alloc_gatt_pages(order) ({                                          \
        char *_t; dma_addr_t _d;                                            \
        _t = dma_alloc_coherent(NULL, PAGE_SIZE<<(order), &_d, GFP_KERNEL); \
        _t; })

But that is less likely.


* Re: Nouveau on dom0
  2010-03-01 16:01         ` Konrad Rzeszutek Wilk
@ 2010-03-02 21:34           ` Arvind R
  2010-03-03 17:11             ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-02 21:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Mon, Mar 1, 2010 at 9:31 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Feb 26, 2010 at 09:04:33PM +0530, Arvind R wrote:
>> On Thu, Feb 25, 2010 at 11:14 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
>> >> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
>> >> <konrad.wilk@oracle.com> wrote:
>> >> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
>> >> >> Hi all,
>> >> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
>> >> ======= snip =======
>> >> > is not. Would it be possible to trace down who allocates that *chan? You
>> >> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
>> ======= snip =======
>> >> So, there must be a mmap call somewhere to map the area to user-space
>> >> for that problem write to work on non-Xen boots. Will try track down some more
>> >> and post. With mmaps and PCIGARTs - it will be some hunt!
>>  ======= snip =======
>> > to the drm_radeon driver which used it as a ring buffer. Took a bit of
>> > hoping around to find who allocated it in the first place.
>> >
>> After a lot of reboots and log viewing:
>> The pushbuf (FIFO/RING) is the only means of programming the card DMA
>> activity. It is exposed to user-space by mmap of the drm_device (PCI) handle
>> with different offsets for each channel. Parameters are associated to the DMA
>> command using ioctls to bind channels/sub-channels/contexts. This mmap is
>> in the libdrm2 library. Libdrm channel/accelerator  initialization and
>> setup chores
>>  and the DDX driver (xf86-video-nouveau) more-or-less acts thro' libdrm.
>
> Ok, that is the DRM_NOUVEAU_CHANNEL_ALLOC ioctl, which ends up calling
> the 'ttm_bo_init'. I remember Pasi having an issue with this on Radeon
> and I provided a hack to see if it would work. Take a look at this
> e-mail:
>
> http://lists.xensource.com/archives/cgi-bin/extract-mesg.cgi?a=xen-devel&m=2010-01&i=20100115071856.GD17978%40reaktio.net
>
>>
>> My suspicion is that Xen has some problems with mmap of PCI(E) device
>> memory. How is iomem handled in a mmap?
>
> It looks to be using 'ioremap' which is Xen safe. Unless your card has
> an AGP bridge on it, at which point it would end up using
> dma_alloc_coherent in all likehood.
>
>>
>> As of now, accelerator on Xen stops right at the initialisation stage - when
>> libdrm tries to set up the accelerator-engine in the course of ScreenInit. And
>> to do that, it cannot write the command to setup the basic 2D engine.
>
> I think that the ttm_bo calls set up pages in the 4KB size, but the
> initial channel requests a 64KB one. I think it also sets up

Got that far; tried some dirty patches of mine, which broke the framebuffer.
Your ttm patch using dma_alloc_coherent instead of alloc_page resulted in
the same problem as in the Radeon report - leaking pages, erroneous page count.

> page-table directory so that when the GPU accesses the addresses, it
> gets the real bus address. I wonder if it fails at that thought -
> meaning that the addresses that are written to the page table are
> actually the guest page numbers (gpfn) instead of the machine page numbers (mfn).

No, I don't think that's how it works. The user-space write triggers an
aio-write - I got that in a trace that my patch caused - which page-faults and
leads to ttm_bo_vm_fault. I tried to alloc_pages in ttm_bo_vm_fault, but I think
I got the remap_pfn_range address parameter wrong. This patch crashed the same
way under bare boot as on Xen with or without the patch! So it is clearly the
mmap of the pushbuf that's the block. ttm_bo_vm_fault is the pivot for the
pushbuf_bo allocation.

My patch in ttm_bo_vm_fault:
if (io_mem) {
    /* retain the orig. speculative pre-fault code */
    ...
} else {
    /* ttm_bo_get_pages is a modified __ttm_tt_get_page using alloc_pages.
     * Irrespective of where the fault occurs, fault-in the whole buffer. */
    pages = ttm_bo_get_pages(ttm, get_order(bo->num_pages));
    pfn = page_to_pfn(page);
    remap_pfn_range(vma, bo->buffer_start, pfn, bo->num_pages << PAGE_SHIFT,
                    vma->vm_page_prot); /* Triggers Kernel BUG: invalid opcode */
}

BTW,  ttm_bo_vm_fault is the ONLY user of vm_insert_mixed in the kernel tree!
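For reference, the relevant part of the stock handler is shaped roughly like
this (simplified, not the exact tree code):

    /* simplified shape of ttm_bo_vm_fault */
    page = ttm_tt_get_page(ttm, page_offset);  /* allocate/fetch the backing page */
    if (!page)
        return VM_FAULT_OOM;
    pfn = page_to_pfn(page);
    ret = vm_insert_mixed(vma, address, pfn);  /* or a pfn-based insert for iomem */
    if (unlikely(ret != 0 && ret != -EBUSY))
        return VM_FAULT_SIGBUS;
    return VM_FAULT_NOPAGE;  /* "PTE fixed up - just retry the access" */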

Tried to use split_page() - resulted in undefined symbol!

> The other issue might be that your back-port broke the AGP allocation.
>
Nope - untouched and same.


* Re: Nouveau on dom0
  2010-03-02 21:34           ` Arvind R
@ 2010-03-03 17:11             ` Arvind R
  2010-03-03 18:13               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-03 17:11 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Wed, Mar 3, 2010 at 3:04 AM, Arvind R <arvino55@gmail.com> wrote:
> On Mon, Mar 1, 2010 at 9:31 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Fri, Feb 26, 2010 at 09:04:33PM +0530, Arvind R wrote:
>>> On Thu, Feb 25, 2010 at 11:14 PM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@oracle.com> wrote:
>>> > On Thu, Feb 25, 2010 at 09:01:48AM -0800, Arvind R wrote:
>>> >> On Thu, Feb 25, 2010 at 6:25 PM, Konrad Rzeszutek Wilk
>>> >> <konrad.wilk@oracle.com> wrote:
>>> >> > On Thu, Feb 25, 2010 at 02:16:07PM +0530, Arvind R wrote:
>>> >> >> I merged the drm-tree from 2.6.33-rc8 into jeremy's 2.6.31.6 master and
>>> >> ======= snip =======
>>> >> > is not. Would it be possible to trace down who allocates that *chan? You
>>> >> > say it is 'PRAMIN' - is that allocated via pci_alloc_* call?
>>> ======= snip =======
>>> >> So, there must be a mmap call somewhere to map the area to user-space
>>> >> for that problem write to work on non-Xen boots. Will try track down some more
>>> >> and post. With mmaps and PCIGARTs - it will be some hunt!
>>>  ======= snip =======
>>> > to the drm_radeon driver which used it as a ring buffer. Took a bit of
>>> > hoping around to find who allocated it in the first place.
>>> >
>>> The pushbuf (FIFO/RING) is the only means of programming the card DMA

>> the 'ttm_bo_init'. I remember Pasi having an issue with this on Radeon
>> and I provided a hack to see if it would work. Take a look at this
>> e-mail:
>>
>> http://lists.xensource.com/archives/cgi-bin/extract-mesg.cgi?a=xen-devel&m=2010-01&i=20100115071856.GD17978%40reaktio.net
>>
>>>

>> It looks to be using 'ioremap' which is Xen safe. Unless your card has
>> an AGP bridge on it, at which point it would end up using
>> dma_alloc_coherent in all likehood.

Can't do that - some later allocations are huge.

>>>
>>> As of now, accelerator on Xen stops right at the initialisation stage - when

>> I think that the ttm_bo calls set up pages in the 4KB size, but the
>> initial channel requests a 64KB one. I think it also sets up

> Your ttm patch using dma_alloc_coherent instead of alloc_page resulted in
> the same problem as with the Radeon report - leaking pages, erroneous page count

>> page-table directory so that when the GPU accesses the addresses, it
>> gets the real bus address. I wonder if it fails at that thought -
>> meaning that the addresses that are written to the page table are
>> actually the guest page numbers (gpfn) instead of the machine page numbers (mfn).
>
> No, I don't think thats how it works. The user-space write triggers an
> aio-write -

which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
and finally ttm_bo_vm_fault.
ttm_bo_vm_fault returns VM_FAULT_NOPAGE

 - but the Xen boot keeps on re-triggering the same fault.
When the fault handler calls ttm_tt_get_page, the page is already there, and
the handler does another vm_insert_page (I changed vm_insert_mixed to
vm_insert_page/pfn based on io_mem - now the only patch, and it works on
bare metal) on and on and on.

What can possibly cause the fault-handler to repeat endlessly?
If a wrong page is backing the user-address, it should create a bad_access or
some other subsequent event - but the system is running fine minus all local
consoles! If the insertion is at a wrong place, this can happen; but the
top-level trap is the only provider of the address - and the fault address and
vma address match, and the same code works fine on bare boot.

ttm_tt_get_page calls alloc in a loop - so it may allocate multiple pages from
the start/end depending on whether the memory is highmem or not - implying
asynchronous allocation and mapping.

All I want right now is for *ptr = (uint32_t)data to work!


* Re: Nouveau on dom0
  2010-03-03 17:11             ` Arvind R
@ 2010-03-03 18:13               ` Konrad Rzeszutek Wilk
  2010-03-04  9:17                 ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-03 18:13 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3051 bytes --]

> >> page-table directory so that when the GPU accesses the addresses, it
> >> gets the real bus address. I wonder if it fails at that thought -
> >> meaning that the addresses that are written to the page table are
> >> actually the guest page numbers (gpfn) instead of the machine page numbers (mfn).
> >
> > No, I don't think thats how it works. The user-space write triggers an
> > aio-write -
> 
> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
> and finally ttm_bo_vm_fault.
> ttm_bo_fault returns VM_FAULT_NOPAGE

VM_FAULT_NOPAGE means "retry the fault"; in other words, "I've fixed the
PTE to point to the right PFN."
> 
>  - but xen-boot keeps on re-triggering the same fault.

Which probably means that something is not OK with the PTE. What is the
vma->vm_page_prot value before the vm_insert_mixed? (and maybe even
after)

Try also reading the true value of the PTE and seeing what it shows
before and after the vm_insert_mixed.

I've attached a simple patch I wrote some time ago to get the real MFNs
and their page protection. I think you can adapt it (the print_data function, to be exact)
to peek at the PTE and its protection values.

There is an extra flag that the PTE can have when running under Xen: _PAGE_IOMAP.
This signifies that the PFN is actually the MFN. In this case, though,
it shouldn't be enabled b/c the memory is actually gathered from
alloc_page. But if it is, it might be the culprit.
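
Something along these lines jammed into ttm_bo_vm_fault would show it (a
sketch; 'address' here is the faulting address the handler gets):

    unsigned int level;
    pte_t *ptep = lookup_address(address, &level);

    if (ptep)
        printk(KERN_INFO "pte=0x%llx iomap=%d level=%u\n",
               (unsigned long long)pte_val(*ptep),
               !!(pte_flags(*ptep) & _PAGE_IOMAP), level);
    else
        printk(KERN_INFO "no pte found for %lx\n", address);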


> when vm_fault calls ttm_tt_get_page, the page is already there, and
> the handler does another vm_insert_page (i changed vm_insert_mixed
> vm_insert_page/pfn based on io_mem, now the only patch, and it works on
> bare machine) on and on and on.
> 
> What can possibly cause the fault-handler to repeat endlessly?

The VM_FAULT_NOPAGE short-circuits most of the fault-handler and makes it
return. The application is resumed and retries the operation that
caused the fault - in this case an attempt to write to an address that
was not present. Obviously the second attempt at writing to the address
should have worked without problems.

> If a wrong page is backed at the user-address, it should create bad_access or
> some other subsequent events - but the system is running fine minus all local
> consoles! If the insertion is to a wrong place, this can happen; but
> the top-level
> trap is the only provider of the address - and the fault addres and
> vma address match,
> and the same code works fine on bare-boot.

So you see this fault handler being called endlessly while the machine
is still running and other pieces of code work just fine, right?

> 
> ttm_tt_get_page calls alloc in a loop - so it may allocate multiple pages from
> start/end depending on Highmem memory or not - implying asynchronous allocation
> and mapping.

I thought it had some logic to figure out that it already handled this
page and would return an already allocated page?

> 
> All I want now is *ptr = (uint32_t)data to work as of now!

You are doing a great job at this head-spinning detective work. Much
appreciated!

[-- Attachment #2: debug-print-pte.patch --]
[-- Type: text/plain, Size: 23820 bytes --]

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 634c40a..bbd0c36 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -133,8 +133,8 @@ int set_memory_wc(unsigned long addr, int numpages);
 int set_memory_wb(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
-int set_memory_ro(unsigned long addr, int numpages);
-int set_memory_rw(unsigned long addr, int numpages);
+int set_memory_ro(unsigned long addr, int numpages, int debug);
+int set_memory_rw(unsigned long addr, int numpages, int debug);
 int set_memory_np(unsigned long addr, int numpages);
 int set_memory_4k(unsigned long addr, int numpages);
 
@@ -168,23 +168,23 @@ int set_pages_uc(struct page *page, int numpages);
 int set_pages_wb(struct page *page, int numpages);
 int set_pages_x(struct page *page, int numpages);
 int set_pages_nx(struct page *page, int numpages);
-int set_pages_ro(struct page *page, int numpages);
-int set_pages_rw(struct page *page, int numpages);
+int set_pages_ro(struct page *page, int numpages, int debug);
+int set_pages_rw(struct page *page, int numpages, int debug);
 
 
 void clflush_cache_range(void *addr, unsigned int size);
 
-#ifdef CONFIG_DEBUG_RODATA
+//#ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void);
 extern const int rodata_test_data;
 extern int kernel_set_to_readonly;
 void set_kernel_text_rw(void);
-void set_kernel_text_ro(void);
+void set_kernel_text_ro(void);/*
 #else
 static inline void set_kernel_text_rw(void) { }
 static inline void set_kernel_text_ro(void) { }
 #endif
-
+*/
 #ifdef CONFIG_DEBUG_RODATA_TEST
 int rodata_test(void);
 #else
diff --git a/arch/x86/include/asm/system.h b/arch/x86/include/asm/system.h
index ecb544e..ae78303 100644
--- a/arch/x86/include/asm/system.h
+++ b/arch/x86/include/asm/system.h
@@ -344,7 +344,7 @@ void cpu_idle_wait(void);
 
 extern unsigned long arch_align_stack(unsigned long sp);
 extern void free_init_pages(char *what, unsigned long begin, unsigned long end);
-
+extern void print_data(char *what, unsigned long addr);
 void default_idle(void);
 
 void stop_this_cpu(void *dummy);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index de7353c..11ee66f 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -472,11 +472,13 @@ void __init alternative_instructions(void)
 #endif
  	apply_paravirt(__parainstructions, __parainstructions_end);
 
-	if (smp_alt_once)
+	if (smp_alt_once) {
+		print_data("__smp_locks", (unsigned long)__smp_locks+1);
+		print_data("__smp_locks_end",(unsigned long)__smp_locks_end-1);
 		free_init_pages("SMP alternatives",
 				(unsigned long)__smp_locks,
 				(unsigned long)__smp_locks_end);
-
+	}
 	restart_nmi();
 }
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index d406c52..d03204f 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -13,7 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include <asm/proto.h>
-
+#include <asm/xen/page.h>
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
 unsigned long __initdata e820_table_start;
@@ -336,9 +336,81 @@ int devmem_is_allowed(unsigned long pagenr)
 	return 0;
 }
 
+void print_data(char *what, unsigned long addr)
+{
+	static const char * const level_name[] =
+	  { "NONE", "4K", "2M", "1G", "NUM" };
+	unsigned long pfn = virt_to_pfn(addr);
+	pte_t *pte;
+	pteval_t val;
+	unsigned int level;
+	unsigned offset;
+	unsigned long phys;
+	pgprotval_t prot;
+	char buf[40];
+	char *str;
+
+
+
+	// Gets the MFN.
+	pte  = lookup_address(addr, &level);
+	offset = addr & ~PAGE_MASK;
+
+	phys = (pte_mfn(*pte) << PAGE_SHIFT) + offset;		
+	val = pte_val_ma(*pte);
+	prot = pgprot_val(pte_pgprot(*pte));
+
+	str = buf;
+	if (!prot)
+		str += sprintf(str, "Not present.");
+	else  {
+		if (prot & _PAGE_USER)
+			str += sprintf(str, "USR ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_RW)
+			str += sprintf(str, "RW ");
+		else
+			str += sprintf(str, "ro ");
+		if (prot & _PAGE_PWT)
+			str += sprintf(str, "PWT ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_PCD)
+			str += sprintf(str, "PCD ");
+		else
+			str += sprintf(str, "    ");
+
+		/* Bit 9 has a different meaning on level 3 vs 4 */
+		if (level <= 3) {
+			if (prot & _PAGE_PSE)
+				str += sprintf(str, "PSE ");
+			else
+				str += sprintf(str, "    ");
+		} else {
+			if (prot & _PAGE_PAT)
+				str += sprintf(str, "pat ");
+			else
+				str += sprintf(str, "    ");
+		}
+		if (prot & _PAGE_GLOBAL)
+			str += sprintf(str, "GLB ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_NX)
+			str += sprintf(str, "NX ");
+		else
+			str += sprintf(str, "x  ");
+	}
+	printk(KERN_INFO "[%16s]PFN: 0x%lx PTE: 0x%lx: [%s] [%s]\n",
+			what, (unsigned long)pfn, (unsigned long)(pte->pte),
+			buf, level_name[level]);
+}
+
 void free_init_pages(char *what, unsigned long begin, unsigned long end)
 {
 	unsigned long addr = begin;
+	int debug = 0;
 
 	if (addr >= end)
 		return;
@@ -358,13 +430,16 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
 	 * we are going to free part of that, we need to make that
 	 * writeable first.
 	 */
-	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT);
+	//printk(KERN_INFO "Mark RW: memory %08lx..%08lx\n", begin, PAGE_ALIGN(end));
+
+	set_memory_rw(begin, (end - begin) >> PAGE_SHIFT, 1);
 
-	printk(KERN_INFO "Freeing %s: %luk freed\n", what, (end - begin) >> 10);
+	print_data("RW", begin);
 
 	for (; addr < end; addr += PAGE_SIZE) {
 		ClearPageReserved(virt_to_page(addr));
 		init_page_count(virt_to_page(addr));
+		/* crashes here with _RO page */
 		memset((void *)(addr & ~(PAGE_SIZE-1)),
 			POISON_FREE_INITMEM, PAGE_SIZE);
 		free_page(addr);
@@ -375,7 +450,9 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
 
 void free_initmem(void)
 {
-	free_init_pages("unused kernel memory",
+	print_data("__init_begin", (unsigned long)(&__init_begin)+1);
+	print_data("__init_end", (unsigned long)(&__init_end)-1);
+	free_init_pages("unused kernel memory (init_)",
 			(unsigned long)(&__init_begin),
 			(unsigned long)(&__init_end));
 }
@@ -383,6 +460,8 @@ void free_initmem(void)
 #ifdef CONFIG_BLK_DEV_INITRD
 void free_initrd_mem(unsigned long start, unsigned long end)
 {
+	print_data("initrd_start", start+1);
+	print_data("initrd_end", end-1);
 	free_init_pages("initrd memory", start, end);
 }
 #endif
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 9a0c258..97df56b 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -1006,10 +1006,10 @@ void set_kernel_text_rw(void)
 	if (!kernel_set_to_readonly)
 		return;
 
-	pr_debug("Set kernel text: %lx - %lx for read write\n",
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
 		 start, start+size);
 
-	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT);
+	set_pages_rw(virt_to_page(start), size >> PAGE_SHIFT, 1);
 }
 
 void set_kernel_text_ro(void)
@@ -1020,10 +1020,10 @@ void set_kernel_text_ro(void)
 	if (!kernel_set_to_readonly)
 		return;
 
-	pr_debug("Set kernel text: %lx - %lx for read only\n",
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
 		 start, start+size);
 
-	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT);
+	set_pages_ro(virt_to_page(start), size >> PAGE_SHIFT, 1);
 }
 
 void mark_rodata_ro(void)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 69ddfbd..81ca3d6 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -710,7 +710,7 @@ void __init mem_init(void)
 		initsize >> 10);
 }
 
-#ifdef CONFIG_DEBUG_RODATA
+//#ifdef CONFIG_DEBUG_RODATA
 const int rodata_test_data = 0xC3;
 EXPORT_SYMBOL_GPL(rodata_test_data);
 
@@ -724,7 +724,7 @@ void set_kernel_text_rw(void)
 	if (!kernel_set_to_readonly)
 		return;
 
-	pr_debug("Set kernel text: %lx - %lx for read write\n",
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read write\n",
 		 start, end);
 
 	/*
@@ -732,7 +732,7 @@ void set_kernel_text_rw(void)
 	 * mapping will always be RO. Refer to the comment in
 	 * static_protections() in pageattr.c
 	 */
-	set_memory_rw(start, (end - start) >> PAGE_SHIFT);
+	set_memory_rw(start, (end - start) >> PAGE_SHIFT, 1);
 }
 
 void set_kernel_text_ro(void)
@@ -743,13 +743,19 @@ void set_kernel_text_ro(void)
 	if (!kernel_set_to_readonly)
 		return;
 
-	pr_debug("Set kernel text: %lx - %lx for read only\n",
+	printk(KERN_INFO "Set kernel text: %lx - %lx for read only\n",
 		 start, end);
 
 	/*
 	 * Set the kernel identity mapping for text RO.
 	 */
-	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
+	set_memory_ro(start, (end - start) >> PAGE_SHIFT, 1);
+}
+
+static inline int
+within(unsigned long addr, unsigned long start, unsigned long end)
+{
+	return addr >= start && addr < end;
 }
 
 void mark_rodata_ro(void)
@@ -764,15 +770,47 @@ void mark_rodata_ro(void)
 
 	printk(KERN_INFO "Write protecting the kernel read-only data: %luk\n",
 	       (end - start) >> 10);
-	set_memory_ro(start, (end - start) >> PAGE_SHIFT);
 
-	kernel_set_to_readonly = 1;
+	printk(KERN_INFO "start: 0x%lx, end: 0x%lx (size: 0x%ld)\n" \
+			"rodata_start: 0x%lx, rodata_end: 0x%lx (size: 0x%ld)\n" \
+			"text_end: 0x%lx, data_start: 0x%lx (size: 0x%lx)\n",
+			start, end, (end-start) >> PAGE_SHIFT,
+			rodata_start, rodata_end, (rodata_end-rodata_start)>>PAGE_SHIFT,
+			text_end, data_start, (data_start-rodata_end)>>PAGE_SHIFT);
+	print_data("_text", start);
+	print_data("_end-1", end-1);
+	print_data("text_end+1", text_end+1);
+	print_data("text_end-1", text_end-1);
+	print_data("data_start", data_start);
+	print_data("rodata_end-1" ,rodata_end-1);
+	print_data("__stop_ex_t_1", (unsigned long) page_address(virt_to_page(text_end))+1);
+	print_data("rodata_start", (unsigned long) page_address(virt_to_page(rodata_start)));
+
+	set_memory_ro(start, (end - start) >> PAGE_SHIFT, 1);
+	print_data("RO_text", start);
+	print_data("RO_end01", end-1);
+	print_data("ROtext_end+1", text_end+1);
+	print_data("ROdata_start", data_start);
+	print_data("ROrodata_end-1" ,rodata_end-1);
+	print_data("RO__stop___ex_", (unsigned long) page_address(virt_to_page(text_end))+1);
+	print_data("ROrodata_start-1", (unsigned long) page_address(virt_to_page(rodata_start))-1);
+	print_data("ROrodata_start+1", (unsigned long) page_address(virt_to_page(rodata_start))+1);
 
+	kernel_set_to_readonly = 1;
 	/*
 	 * The rodata section (but not the kernel text!) should also be
 	 * not-executable.
 	 */
+	printk(KERN_INFO "NX: 0x%lx, #%ld\n", rodata_start, (end-rodata_start)>>PAGE_SHIFT);
 	set_memory_nx(rodata_start, (end - rodata_start) >> PAGE_SHIFT);
+	print_data("NX_text", start);
+	print_data("NX_end-1", end-1);
+	print_data("NXtext_end+1", text_end+1);
+	print_data("NXdata_start", data_start);
+	print_data("NXrodata_end-1" ,rodata_end-1);
+	print_data("NX__stop___ex+1", (unsigned long) page_address(virt_to_page(text_end))+1);
+	print_data("NXrodata_start-1", (unsigned long) page_address(virt_to_page(rodata_start))-1);
+	print_data("NXrodata_start+1", (unsigned long) page_address(virt_to_page(rodata_start))+1);
 
 	rodata_test();
 
@@ -784,16 +822,16 @@ void mark_rodata_ro(void)
 	set_memory_ro(start, (end-start) >> PAGE_SHIFT);
 #endif
 
-	free_init_pages("unused kernel memory",
+	free_init_pages("#1unused kernel memory",
 			(unsigned long) page_address(virt_to_page(text_end)),
-			(unsigned long)
-				 page_address(virt_to_page(rodata_start)));
-	free_init_pages("unused kernel memory",
+			(unsigned long) page_address(virt_to_page(rodata_start)));
+
+	free_init_pages("#2unused kernel memory",
 			(unsigned long) page_address(virt_to_page(rodata_end)),
 			(unsigned long) page_address(virt_to_page(data_start)));
 }
 
-#endif
+//#endif
 
 int __init reserve_bootmem_generic(unsigned long phys, unsigned long len,
 				   int flags)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1d4eb93..ab4d965 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -23,7 +23,7 @@
 #include <asm/pgalloc.h>
 #include <asm/proto.h>
 #include <asm/pat.h>
-
+#include <asm/xen/page.h>
 /*
  * The current flushing context - we pass it instead of 5 arguments:
  */
@@ -291,8 +291,33 @@ static inline pgprot_t static_protections(pgprot_t prot, unsigned long address,
 	 */
 	if (kernel_set_to_readonly &&
 	    within(address, (unsigned long)_text,
-		   (unsigned long)__end_rodata_hpage_align))
-		pgprot_val(forbidden) |= _PAGE_RW;
+		   (unsigned long)__end_rodata_hpage_align)) {
+
+		/* When kernel_set_to_readonly, two sections that are OK to be
+  		 * PAGE_RW are the ones that are going to be recycled by
+  		 * mark_rodata_ro */
+		if (!within(address, (unsigned long)&__stop___ex_table,
+				(unsigned long)&__start_rodata) &&
+		    !within(address, (unsigned long)&__end_rodata,
+				(unsigned long)&_sdata))
+			pgprot_val(forbidden) |= _PAGE_RW;
+/*
+		if (pgprot_val(prot) & _PAGE_RW) {
+			printk(KERN_INFO "PAGE_RW: 0x%lx 0x%lx,(text:0x%lx, 0x%lx, 0x%lx)\n "\
+				 "(end: 0x%lx, 0x%lx, 0x%lx)\n" \
+				"PGPROT: 0x%lx->0x%lx\n",
+				address, pfn,
+				(unsigned long)_text,
+				virt_to_pfn(_text),
+				(unsigned long) page_address(virt_to_page(_text)),
+				(unsigned long)__end_rodata_hpage_align,
+				virt_to_pfn(__end_rodata_hpage_align),
+				(unsigned long) page_address(virt_to_page(__end_rodata_hpage_align)),
+				(unsigned long)pgprot_val(prot),
+				(unsigned long)pgprot_val(__pgprot(pgprot_val(prot) & ~pgprot_val(forbidden)))); 
+		}
+	*/
+	}
 #endif
 
 	prot = __pgprot(pgprot_val(prot) & ~pgprot_val(forbidden));
@@ -602,7 +627,7 @@ static int __cpa_process_fault(struct cpa_data *cpa, unsigned long vaddr,
 	}
 }
 
-static int __change_page_attr(struct cpa_data *cpa, int primary)
+static int __change_page_attr(struct cpa_data *cpa, int primary, int debug)
 {
 	unsigned long address;
 	int do_split, err;
@@ -647,7 +672,12 @@ repeat:
 		/*
 		 * Do we really change anything ?
 		 */
-		if (pte_val(old_pte) != pte_val(new_pte)) {
+		if (pte_val(old_pte) != pte_val(new_pte)) { /*
+			if (debug) {
+				printk(KERN_INFO " 0x%lx -> 0x%lx\n",
+				(unsigned long)pgprot_val(pte_pgprot(old_pte)),
+				(unsigned long)pgprot_val(pte_pgprot(new_pte)));
+			} */
 			set_pte_atomic(kpte, new_pte);
 			cpa->flags |= CPA_FLUSHTLB;
 		}
@@ -698,7 +728,7 @@ repeat:
 	return err;
 }
 
-static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias);
+static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias, int debug);
 
 static int cpa_process_alias(struct cpa_data *cpa)
 {
@@ -735,7 +765,7 @@ static int cpa_process_alias(struct cpa_data *cpa)
 		alias_cpa.vaddr = &laddr;
 		alias_cpa.flags &= ~(CPA_PAGES_ARRAY | CPA_ARRAY);
 
-		ret = __change_page_attr_set_clr(&alias_cpa, 0);
+		ret = __change_page_attr_set_clr(&alias_cpa, 0, 0);
 		if (ret)
 			return ret;
 	}
@@ -758,14 +788,14 @@ static int cpa_process_alias(struct cpa_data *cpa)
 		 * The high mapping range is imprecise, so ignore the
 		 * return value.
 		 */
-		__change_page_attr_set_clr(&alias_cpa, 0);
+		__change_page_attr_set_clr(&alias_cpa, 0, 0);
 	}
 #endif
 
 	return 0;
 }
 
-static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
+static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias, int debug)
 {
 	int ret, numpages = cpa->numpages;
 
@@ -781,7 +811,7 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int checkalias)
 
 		if (!debug_pagealloc)
 			spin_lock(&cpa_lock);
-		ret = __change_page_attr(cpa, checkalias);
+		ret = __change_page_attr(cpa, checkalias, debug);
 		if (!debug_pagealloc)
 			spin_unlock(&cpa_lock);
 		if (ret)
@@ -818,7 +848,7 @@ static inline int cache_attr(pgprot_t attr)
 static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 				    pgprot_t mask_set, pgprot_t mask_clr,
 				    int force_split, int in_flag,
-				    struct page **pages)
+				    struct page **pages, int debug)
 {
 	struct cpa_data cpa;
 	int ret, cache, checkalias;
@@ -830,9 +860,16 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 	 */
 	mask_set = canon_pgprot(mask_set);
 	mask_clr = canon_pgprot(mask_clr);
-	if (!pgprot_val(mask_set) && !pgprot_val(mask_clr) && !force_split)
+	if (!pgprot_val(mask_set) && !pgprot_val(mask_clr) && !force_split) {
+		/*
+		if (debug) {
+			printk(KERN_INFO "0x%lx [!0x%lx && !0x%lx]\n",
+				(unsigned long)addr,
+				(unsigned long)pgprot_val(mask_set),
+				(unsigned long)pgprot_val(mask_clr));
+		} */
 		return 0;
-
+	}
 	/* Ensure we are PAGE_SIZE aligned */
 	if (in_flag & CPA_ARRAY) {
 		int i;
@@ -880,8 +917,15 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
 
 	/* No alias checking for _NX bit modifications */
 	checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
-
-	ret = __change_page_attr_set_clr(&cpa, checkalias);
+	/*
+	if (debug)
+		printk(KERN_INFO "0x%lx Mask_set: 0x%lx, Mask_clr: 0x%lx, %s\n",
+			(unsigned long)addr,
+			(unsigned long)pgprot_val(mask_set),
+			(unsigned long)pgprot_val(mask_clr),
+			(checkalias) ? "! PAGE_NX" : "NX");	
+	*/
+	ret = __change_page_attr_set_clr(&cpa, checkalias, debug);
 
 	/*
 	 * Check whether we really changed something:
@@ -915,31 +959,31 @@ out:
 }
 
 static inline int change_page_attr_set(unsigned long *addr, int numpages,
-				       pgprot_t mask, int array)
+				       pgprot_t mask, int array, int debug)
 {
 	return change_page_attr_set_clr(addr, numpages, mask, __pgprot(0), 0,
-		(array ? CPA_ARRAY : 0), NULL);
+		(array ? CPA_ARRAY : 0), NULL, debug);
 }
 
 static inline int change_page_attr_clear(unsigned long *addr, int numpages,
-					 pgprot_t mask, int array)
+					 pgprot_t mask, int array, int debug)
 {
 	return change_page_attr_set_clr(addr, numpages, __pgprot(0), mask, 0,
-		(array ? CPA_ARRAY : 0), NULL);
+		(array ? CPA_ARRAY : 0), NULL, 0);
 }
 
 static inline int cpa_set_pages_array(struct page **pages, int numpages,
 				       pgprot_t mask)
 {
 	return change_page_attr_set_clr(NULL, numpages, mask, __pgprot(0), 0,
-		CPA_PAGES_ARRAY, pages);
+		CPA_PAGES_ARRAY, pages, 0);
 }
 
 static inline int cpa_clear_pages_array(struct page **pages, int numpages,
 					 pgprot_t mask)
 {
 	return change_page_attr_set_clr(NULL, numpages, __pgprot(0), mask, 0,
-		CPA_PAGES_ARRAY, pages);
+		CPA_PAGES_ARRAY, pages, 0);
 }
 
 int _set_memory_uc(unsigned long addr, int numpages)
@@ -948,7 +992,7 @@ int _set_memory_uc(unsigned long addr, int numpages)
 	 * for now UC MINUS. see comments in ioremap_nocache()
 	 */
 	return change_page_attr_set(&addr, numpages,
-				    __pgprot(_PAGE_CACHE_UC_MINUS), 0);
+				    __pgprot(_PAGE_CACHE_UC_MINUS), 0, 0);
 }
 
 int set_memory_uc(unsigned long addr, int numpages)
@@ -992,7 +1036,7 @@ int set_memory_array_uc(unsigned long *addr, int addrinarray)
 	}
 
 	ret = change_page_attr_set(addr, addrinarray,
-				    __pgprot(_PAGE_CACHE_UC_MINUS), 1);
+				    __pgprot(_PAGE_CACHE_UC_MINUS), 1, 0);
 	if (ret)
 		goto out_free;
 
@@ -1012,12 +1056,12 @@ int _set_memory_wc(unsigned long addr, int numpages)
 	unsigned long addr_copy = addr;
 
 	ret = change_page_attr_set(&addr, numpages,
-				    __pgprot(_PAGE_CACHE_UC_MINUS), 0);
+				    __pgprot(_PAGE_CACHE_UC_MINUS), 0, 0);
 	if (!ret) {
 		ret = change_page_attr_set_clr(&addr_copy, numpages,
 					       __pgprot(_PAGE_CACHE_WC),
 					       __pgprot(_PAGE_CACHE_MASK),
-					       0, 0, NULL);
+					       0, 0, NULL, 0);
 	}
 	return ret;
 }
@@ -1050,7 +1094,7 @@ EXPORT_SYMBOL(set_memory_wc);
 int _set_memory_wb(unsigned long addr, int numpages)
 {
 	return change_page_attr_clear(&addr, numpages,
-				      __pgprot(_PAGE_CACHE_MASK), 0);
+				      __pgprot(_PAGE_CACHE_MASK), 0, 0);
 }
 
 int set_memory_wb(unsigned long addr, int numpages)
@@ -1072,7 +1116,7 @@ int set_memory_array_wb(unsigned long *addr, int addrinarray)
 	int ret;
 
 	ret = change_page_attr_clear(addr, addrinarray,
-				      __pgprot(_PAGE_CACHE_MASK), 1);
+				      __pgprot(_PAGE_CACHE_MASK), 1, 0);
 	if (ret)
 		return ret;
 
@@ -1088,7 +1132,7 @@ int set_memory_x(unsigned long addr, int numpages)
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_NX), 0, 0);
 }
 EXPORT_SYMBOL(set_memory_x);
 
@@ -1097,31 +1141,33 @@ int set_memory_nx(unsigned long addr, int numpages)
 	if (!(__supported_pte_mask & _PAGE_NX))
 		return 0;
 
-	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0);
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_NX), 0, 0);
 }
 EXPORT_SYMBOL(set_memory_nx);
 
-int set_memory_ro(unsigned long addr, int numpages)
+int set_memory_ro(unsigned long addr, int numpages, int debug)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	//if (debug) printk(KERN_INFO "RO: 0x%lx (%d)\n", addr, numpages);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_RW), 0, debug);
 }
 EXPORT_SYMBOL_GPL(set_memory_ro);
 
-int set_memory_rw(unsigned long addr, int numpages)
+int set_memory_rw(unsigned long addr, int numpages, int debug)
 {
-	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0);
+	//if (debug) printk(KERN_INFO "RW: 0x%lx (%d)\n", addr, numpages);
+	return change_page_attr_set(&addr, numpages, __pgprot(_PAGE_RW), 0, debug);
 }
 EXPORT_SYMBOL_GPL(set_memory_rw);
 
 int set_memory_np(unsigned long addr, int numpages)
 {
-	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
+	return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0, 0);
 }
 
 int set_memory_4k(unsigned long addr, int numpages)
 {
 	return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
-					__pgprot(0), 1, 0, NULL);
+					__pgprot(0), 1, 0, NULL, 0);
 }
 
 int set_pages_uc(struct page *page, int numpages)
@@ -1213,18 +1259,18 @@ int set_pages_nx(struct page *page, int numpages)
 }
 EXPORT_SYMBOL(set_pages_nx);
 
-int set_pages_ro(struct page *page, int numpages)
+int set_pages_ro(struct page *page, int numpages, int debug)
 {
 	unsigned long addr = (unsigned long)page_address(page);
 
-	return set_memory_ro(addr, numpages);
+	return set_memory_ro(addr, numpages, debug);
 }
 
-int set_pages_rw(struct page *page, int numpages)
+int set_pages_rw(struct page *page, int numpages, int debug)
 {
 	unsigned long addr = (unsigned long)page_address(page);
 
-	return set_memory_rw(addr, numpages);
+	return set_memory_rw(addr, numpages, debug);
 }
 
 #ifdef CONFIG_DEBUG_PAGEALLOC
@@ -1244,7 +1290,7 @@ static int __set_pages_p(struct page *page, int numpages)
 	 * mappings (this adds to complexity if we want to do this from
 	 * atomic context especially). Let's keep it simple!
 	 */
-	return __change_page_attr_set_clr(&cpa, 0);
+	return __change_page_attr_set_clr(&cpa, 0, 0);
 }
 
 static int __set_pages_np(struct page *page, int numpages)
@@ -1262,7 +1308,7 @@ static int __set_pages_np(struct page *page, int numpages)
 	 * mappings (this adds to complexity if we want to do this from
 	 * atomic context especially). Let's keep it simple!
 	 */
-	return __change_page_attr_set_clr(&cpa, 0);
+	return __change_page_attr_set_clr(&cpa, 0, 0);
 }
 
 void kernel_map_pages(struct page *page, int numpages, int enable)
diff --git a/init/main.c b/init/main.c
index 4cb47a1..6fba891 100644
--- a/init/main.c
+++ b/init/main.c
@@ -91,7 +91,7 @@ extern void prio_tree_init(void);
 extern void radix_tree_init(void);
 extern void free_initmem(void);
 #ifndef CONFIG_DEBUG_RODATA
-static inline void mark_rodata_ro(void) { }
+//static inline void mark_rodata_ro(void) { }
 #endif
 
 #ifdef CONFIG_TC


* Re: Nouveau on dom0
  2010-03-03 18:13               ` Konrad Rzeszutek Wilk
@ 2010-03-04  9:17                 ` Arvind R
  2010-03-04 18:25                   ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-04  9:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> > aio-write -
>>
>> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
>> and finally ttm_bo_vm_fault.

> I've attached a simple patch I wrote some time ago to get the real MFNs
> and its page protection. I think you can adapt it (print_data function to be exact)
> to peet at the PTE and its protection values.
Have patched - it did not apply cleanly. Will compile and get some info.

> There is an extra flag that the PTE can have when running under Xen: _PAGE_IOMAP.
> This signifies that the PFN is actually the MFN. In this case thought
> it sholdn't be enabled b/c the memory is actually gathered from
> alloc_page. But if it is, it might be the culprit.

>> What can possibly cause the fault-handler to repeat endlessly?

FYI: about 2000 times a second - slowed by printk

>> If a wrong page is backed at the user-address, it should create bad_access or
>> some other subsequent events - but the system is running fine minus all local
> So  you see this fault handler being called endlessly while the machine
> is still running and other pieces of code work just fine, right?
Right. Can ssh in - but no local console

>> ttm_tt_get_page calls alloc in a loop - so it may allocate multiple pages from
>> start/end depending on Highmem memory or not - implying asynchronous allocation
>> and mapping.
>
> I thought it had some logic to figure out that it already handled this
> page and would return an already allocate page?
Right.

I think the problem lies in the vm_insert_pfn/page/mixed family of functions.
These are only used (grep'ed the kernel tree) for mmapping, invariably:
scsi-tgt, mspec, some media/video, poch and android in staging, and ttm
- and, surprise, xen/blktap/ring.c and device.c
- which both check XENFEAT_auto_translated_physmap.

Pls. look at xen/blktap/ring.c - it looks to be what we need.
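What I mean is roughly this shape of check (my sketch of the pattern, not a
quote of the blktap code):

    /* xen_feature() is from <xen/features.h> */
    if (xen_feature(XENFEAT_auto_translated_physmap)) {
        /* auto-translated guest: pfns can be inserted as normal pages */
        err = vm_insert_page(vma, address, page);
    } else {
        /* PV domain: the hardware PTE has to end up with the machine frame */
        err = vm_insert_pfn(vma, address, pfn);
    }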


* Re: Nouveau on dom0
  2010-03-04  9:17                 ` Arvind R
@ 2010-03-04 18:25                   ` Konrad Rzeszutek Wilk
  2010-03-05  7:46                     ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-04 18:25 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> >> > aio-write -
> >>
> >> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
> >> and finally ttm_bo_vm_fault.
> 
> > I've attached a simple patch I wrote some time ago to get the real MFNs
> > and its page protection. I think you can adapt it (print_data function to be exact)
> > to peet at the PTE and its protection values.
> Have patched - did not apply clean. Will compile and get some info.

Right. I don't think it would help you immediately - I was thinking you could
take the print_data function and just jam it into the ttm_bo_vm_fault code
and use it to print the PTE data.
> 
> > There is an extra flag that the PTE can have when running under Xen: _PAGE_IOMAP.
> > This signifies that the PFN is actually the MFN. In this case thought
> > it sholdn't be enabled b/c the memory is actually gathered from
> > alloc_page. But if it is, it might be the culprit.
> 
> >> What can possibly cause the fault-handler to repeat endlessly?
> 
> FYI: about 2000 times a second - slowed by printk
> 
> >> If a wrong page is backed at the user-address, it should create bad_access or
> >> some other subsequent events - but the system is running fine minus all local
> > So  you see this fault handler being called endlessly while the machine
> > is still running and other pieces of code work just fine, right?
> Right. Can ssh in - but no local console
> 
> >> ttm_tt_get_page calls alloc in a loop - so it may allocate multiple pages from
> >> start/end depending on Highmem memory or not - implying asynchronous allocation
> >> and mapping.
> >
> > I thought it had some logic to figure out that it already handled this
> > page and would return an already allocate page?
> Right.
> 
> I think the problem lies in the vm_insert_pfn/page/mixed family of functions.
> These are only used (grep'ed kernel tree) and invariably for mmaping.
> Scsi-tgt, mspec, some media/video, poch,android in staging and ttm
> - and, surprise - xen/blktap/ring.c and device.c
> - which both check XENFEAT_auto_translated_physmap
> 
> Pls. look at xen/blktap/ring.c - it looks to be what we need

Let me take a look at it tomorrow. Bit swamped.


* Re: Nouveau on dom0
  2010-03-04 18:25                   ` Konrad Rzeszutek Wilk
@ 2010-03-05  7:46                     ` Arvind R
  2010-03-05 20:23                       ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-05  7:46 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> >> > aio-write -
>> >>
>> >> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
>> >> and finally ttm_bo_vm_fault.
>>
>> > I've attached a simple patch I wrote some time ago to get the real MFNs
>> Have patched - did not apply clean. Will compile and get some info.
> take the print_data function and just jam it in the ttm_bo_vm_fault code
Linking problems, but it compiled and ran.
!!! CANNOT lookup_address() !!! It returns NULL on bare metal AND Xen,
before AND after vm_insert/remap_pfn. The address looked up is what the
fault handler passes in. Had to add a NULL check in print_data.

Bare-boot log.
 [TTM] ttm_bo_vm_fault: faulting-in pages, TTM_PAGE_FLAGS=0x0
 [         Before:]PFN: Failed lookup_address of 0x7fd82e9aa000
 [            After :]PFN: Failed lookup_address of 0x7fd82e9aa000

 Ring any bells?

>> > There is an extra flag that the PTE can have when running under Xen: _PAGE_IOMAP.
>> > This signifies that the PFN is actually the MFN. In this case though
>> > it shouldn't be enabled b/c the memory is actually gathered from
>> > alloc_page. But if it is, it might be the culprit.

>> I think the problem lies in the vm_insert_pfn/page/mixed family of functions.
>> These are only used (grep'ed kernel tree) and invariably for mmaping.
>> Scsi-tgt, mspec, some media/video, poch,android in staging and ttm
>> - and, surprise - xen/blktap/ring.c and device.c
>> - which both check XENFEAT_auto_translated_physmap
>>
>> Pls. look at xen/blktap/ring.c - it looks to be what we need
>
> Let me take a look at it tomorrow. Bit swamped.
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nouveau on dom0
  2010-03-05  7:46                     ` Arvind R
@ 2010-03-05 20:23                       ` Konrad Rzeszutek Wilk
  2010-03-06  8:16                         ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-05 20:23 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 3712 bytes --]

On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >> >> > aio-write -
> >> >>
> >> >> which triggers do_page_fault, handle_mm_fault, do_linear_fault, __do_fault
> >> >> and finally ttm_bo_vm_fault.
> >>
> >> > I've attached a simple patch I wrote some time ago to get the real MFNs
> >> Have patched - did not apply clean. Will compile and get some info.
> > take the print_data function and just jam it in the ttm_bo_vm_fault code
> Linking problems. But compiled and run
> !!! CANNOT lookup_address()!!! Returns NULL on bare AND Xen
> Before AND After vm_insert/remap_pfn. Address looked_up is what

The "after" is a bit surprise. I would have thought it would would have
update the page-table with the new PFN. But maybe it did, but for a
different address (since it does not actually use the 'address' field
but __va(pfn)<< PAGE_SHIFT as the address).
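
For what it's worth, lookup_address() only walks the kernel's own page
tables, which would explain the NULL on the user-space 0x7f... address;
to peek at the PTE the fault handler just installed you would have to
walk the faulting task's mm instead. A hypothetical helper along those
lines (sketch only - no locking, no huge-page handling):

/*
 * Hypothetical sketch: walk a task's page tables for a user-space
 * address.  lookup_address() consults only the kernel page tables,
 * so it always misses the 0x7f... addresses the fault handler sees.
 */
static pte_t *user_lookup_pte(struct mm_struct *mm, unsigned long address)
{
	pgd_t *pgd = pgd_offset(mm, address);
	pud_t *pud;
	pmd_t *pmd;

	if (pgd_none(*pgd) || pgd_bad(*pgd))
		return NULL;
	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || pud_bad(*pud))
		return NULL;
	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd) || pmd_bad(*pmd))	/* also skips 2M mappings */
		return NULL;
	return pte_offset_map(pmd, address);	/* caller must pte_unmap() */
}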

> fault_handler passes in. Had to add a NULL check in print_data.
> 
> Bare-boot log.
>  [TTM] ttm_bo_vm_fault: faulting-in pages, TTM_PAGE_FLAGS=0x0
>  [         Before:]PFN: Failed lookup_address of 0x7fd82e9aa000
>  [            After :]PFN: Failed lookup_address of 0x7fd82e9aa000
> 
>  Ring any bells?

Yeah... Can you also instrument the code to print the PFN? The code goes
through insert_pfn->pfn_pte, which calls xen_make_pte, which ends up
doing pte_pfn_to_mfn. That routine does a pfn_to_mfn which does a
get_phys_to_machine(pfn). The last routine looks up the PFN->MFN lookup
table and finds a MFN that corresponds to this PFN. Since the memory
was allocated from ... well this is the big question.
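
As a much-simplified sketch (NOT the real pvops code, just the shape of
what that chain ends up doing):

/* Conceptual sketch of the path described above: under Xen the PFN
 * handed to vm_insert_mixed() is translated to a machine frame before
 * the raw PTE value is formed. */
static pteval_t sketch_xen_pte_val(unsigned long pfn, pgprot_t prot)
{
	unsigned long mfn = get_phys_to_machine(pfn);	/* P2M lookup */

	if (mfn == INVALID_P2M_ENTRY)
		return 0;	/* no machine frame known for this PFN */

	return ((pteval_t)mfn << PAGE_SHIFT) | pgprot_val(prot);
}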

Is the memory allocated from normal kernel space, or is it really backed by
the video card? In your previous e-mails you mentioned that is_iomem is
set to zero, which implies that the memory for these functions is NOT
backed by I/O (video) memory.


> 
> >> > There is an extra flag that the PTE can have when running under Xen: _PAGE_IOMAP.
> >> > This signifies that the PFN is actually the MFN. In this case though
> >> > it shouldn't be enabled b/c the memory is actually gathered from
> >> > alloc_page. But if it is, it might be the culprit.
> 
> >> I think the problem lies in the vm_insert_pfn/page/mixed family of functions.
> >> These are only used (grep'ed kernel tree) and invariably for mmaping.
> >> Scsi-tgt, mspec, some media/video, poch,android in staging and ttm
> >> - and, surprise - xen/blktap/ring.c and device.c
> >> - which both check XENFEAT_auto_translated_physmap
> >>
> >> Pls. look at xen/blktap/ring.c - it looks to be what we need
> >
> > Let me take a look at it tomorrow. Bit swamped.

I started going through the function allocations that were done and
found this in ttm_bo_mmap:

vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;

the VM_IO is OK if the memory that is being referenced is the video
driver memory. _BUT_ if the memory is being allocated through the
alloc_page (ttm_tt_alloc_page) , or kmalloc, then this will cause us
headaches. You might want to check in  ttm_bo_vm_fault what the
vma->vm_flags are and if VM_IO is set.
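
Something crude like this at the top of ttm_bo_vm_fault would show it
(sketch only):

	/* Crude diagnostic: dump the VMA flags on every fault. */
	printk(KERN_INFO "[TTM] ttm_bo_vm_fault: vm_flags=0x%lx%s%s\n",
	       vma->vm_flags,
	       (vma->vm_flags & VM_IO) ? " VM_IO" : "",
	       (vma->vm_flags & VM_MIXEDMAP) ? " VM_MIXEDMAP" : "");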

(FYI, look at
http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)

If the VM_IO is set, change that ttm_bo_mmap to not have VM_IO and see
how that works.


Though I am not sure if the ttm_bo_mmap is used by the nvidia driver.

Attached is a re-write of the debug patch I sent earlier. I compile
tested it but haven't yet run it (just doing that now).


[-- Attachment #2: debug-print-pte.patch --]
[-- Type: text/plain, Size: 4520 bytes --]

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 3dc8d6b..6b02a03 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -35,6 +35,10 @@
 #include <linux/rbtree.h>
 #include <linux/module.h>
 #include <linux/uaccess.h>
+#include <asm/page.h>
+#include <asm/pgtable_types.h>
+#include <xen/xen.h>
+#include <asm/xen/page.h>
 
 #define TTM_BO_VM_NUM_PREFAULT 16
 
@@ -69,6 +73,137 @@ static struct ttm_buffer_object *ttm_bo_vm_lookup_rb(struct ttm_bo_device *bdev,
 	return best_bo;
 }
 
+void print_pte(char *what, struct page *page, unsigned int pfn, unsigned long address)
+{
+	static const char * const level_name[] =
+	  { "NONE", "4K", "2M", "1G", "NUM" };
+	unsigned long addr = 0;
+	pte_t *pte = NULL;
+	pteval_t val = (pteval_t)0;
+	unsigned int level = 0;
+	unsigned offset;
+	unsigned long phys;
+	pgprotval_t prot;
+	char buf[80];
+	char *str;
+
+	str = buf;
+	// Figure out if the address is pagetable.
+	if (address == 0 && page)
+		addr = (u64)page_address(page);
+
+	if (address && !page)
+		addr = address;
+
+	if (address && page) {
+		addr = (u64)page_address(page);
+		if (address != addr) {
+			if (addr == 0) {
+				str += sprintf(str, "addr==0");
+				addr = address;
+			}
+		}
+	}
+	if (!virt_addr_valid(addr))
+		str += sprintf(str, "!addr");
+
+	if (pfn != 0 && page && pfn_valid(pfn)) {
+		if (pfn != page_to_pfn(page)) // Gosh!?
+			str += sprintf(str, "pfn!=pfn(%ld)", page_to_pfn(page));
+	}
+	if (pfn != 0 && addr != 0 && pfn_valid(pfn)) {
+		if (pfn != virt_to_pfn(addr))
+			str += sprintf(str,"pfn(addr)!=pfn");
+	}
+	if (!pfn_valid(pfn))
+		str += sprintf(str,"!pfn");
+
+	if (pfn_valid(pfn) && xen_initial_domain()) {
+		str += sprintf(str,"mfn:%ld->%lx",
+			pfn_to_mfn(pfn),
+			(unsigned long)mfn_to_virt(pfn_to_mfn(pfn)));
+	}
+	// Fixup, last attempt
+	if (!virt_addr_valid(addr) && pfn_valid(pfn)) {
+		str += sprintf(str,"(%lx)", addr); // old address.
+		addr = (u64)pfn_to_kaddr(pfn);
+	}
+	pte = lookup_address(addr, &level);
+	if (!pte) {
+		str += sprintf(str,"!pte(addr)");
+		goto print;
+	}
+	offset = addr & ~PAGE_MASK;
+
+	if (xen_domain()) {
+		phys = (pte_mfn(*pte) << PAGE_SHIFT) + offset;		
+		val = pte_val_ma(*pte);
+	} else {
+		phys = (pte_pfn(*pte) << PAGE_SHIFT) + offset;
+		val = pte_val(*pte);
+	}	
+	prot = pgprot_val(pte_pgprot(*pte));
+
+	if (!prot)
+		str += sprintf(str, "Not present.");
+	else  {
+		if (prot & _PAGE_USER)
+			str += sprintf(str, "USR ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_RW)
+			str += sprintf(str, "RW ");
+		else
+			str += sprintf(str, "ro ");
+		if (prot & _PAGE_PWT)
+			str += sprintf(str, "PWT ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_PCD)
+			str += sprintf(str, "PCD ");
+		else
+			str += sprintf(str, "    ");
+
+		/* Bit 9 has a different meaning on level 3 vs 4 */
+		if (level <= 3) {
+			if (prot & _PAGE_PSE)
+				str += sprintf(str, "PSE ");
+			else
+				str += sprintf(str, "    ");
+		} else {
+			if (prot & _PAGE_PAT)
+				str += sprintf(str, "pat ");
+			else
+				str += sprintf(str, "    ");
+		}
+		if (prot & _PAGE_GLOBAL)
+			str += sprintf(str, "GLB ");
+		else
+			str += sprintf(str, "    ");
+		if (prot & _PAGE_NX)
+			str += sprintf(str, "NX ");
+		else
+			str += sprintf(str, "x  ");
+#ifdef _PAGE_IOMEM
+		if (prot & _PAGE_IOMEM)
+			str += sprintf(str, "IO ");
+		else
+			str += sprintf(str, "   ");
+#endif
+		if (prot & _PAGE_SPECIAL)
+			str += sprintf(str, "SP ");
+	}
+
+print:
+	printk(KERN_INFO "[%16s]PFN: 0x%lx PTE: 0x%lx (val:%lx): [%s] [%s]\n",
+			what,
+			(unsigned long)pfn,
+			(pte) ? (unsigned long)(pte->pte) : 0,
+			(unsigned long)val,
+			buf,
+			level_name[level]);
+}
+
 static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct ttm_buffer_object *bo = (struct ttm_buffer_object *)
@@ -81,7 +216,7 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	unsigned long page_last;
 	unsigned long pfn;
 	struct ttm_tt *ttm = NULL;
-	struct page *page;
+	struct page *page = NULL;
 	int ret;
 	int i;
 	bool is_iomem;
@@ -186,7 +321,9 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 			pfn = page_to_pfn(page);
 		}
 
+		print_pte("BEFORE",page, pfn, address);
 		ret = vm_insert_mixed(vma, address, pfn);
+		print_pte("AFTER",page, pfn, address);
 		/*
 		 * Somebody beat us to this PTE or prefaulting to
 		 * an already populated PTE, or prefaulting error.


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Nouveau on dom0
  2010-03-05 20:23                       ` Konrad Rzeszutek Wilk
@ 2010-03-06  8:16                         ` Arvind R
  2010-03-06 20:59                           ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-06  8:16 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>> >> <konrad.wilk@oracle.com> wrote:
> Yeah... Can you also instrument the code to print the PFN? The code goes
> through insert_pfn->pfn_pte, which calls xen_make_pte, which ends up
> doing pte_pfn_to_mfn. That routine does a pfn_to_mfn which does a
> get_phys_to_machine(pfn). The last routine looks up the PFN->MFN lookup
> table and finds a MFN that corresponds to this PFN. Since the memory
> was allocated from ... well this is the big question.
>
> Is the memory allocated from normal kernel space or is really backed by
> the video card. In your previous e-mails you mentioned that is_iomem is
> set to zero, which implies that the memory for these functions is NOT
> memory backed.
>
> the VM_IO is OK if the memory that is being referenced is the video
> driver memory. _BUT_ if the memory is being allocated through the
> alloc_page (ttm_tt_alloc_page) , or kmalloc, then this will cause us
> headaches. You might want to check in  ttm_bo_vm_fault what the
> vma->vm_flags are and if VM_IO is set.
>
> (FYI, look at
> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
How do you remember these refs?!

>
> Though I am not sure if the ttm_bo_mmap is used by the nvidia driver.
You mean nouveau? Only for accelerated graphics.

> Attached is a re-write of the debug patch I sent earlier. I compile
> tested it but haven't yet run it (just doing that now).
>
Output: (snipped/cut/pasted for easier association)

Trace of Pushbuf Memory Access, Bare-BOOT:
X: OUT_RING: Enter: chan=0x8170a0, id=2, data=0x48000, chan->cur=0x7f0aa3594054
kernel: [TTM] FAULTing-in address=0x7f0aa3594000, bo->buffer_start=0x0
kernel: [  BEFORE]PFN: 0x7513f PTE: 0x750001e3 (val:750001e3): [    RW         PSE GLB x  ] [2M]
kernel: [   AFTER]PFN: 0x7513f PTE: 0x750001e3 (val:750001e3): [    RW         PSE GLB x  ] [2M]
kernel: [  BEFORE]PFN: 0x75144 PTE: 0x750001e3 (val:750001e3): [    RW         PSE GLB x  ] [2M]
kernel: [   AFTER]PFN: 0x75144 PTE: 0x750001e3 (val:750001e3): [    RW         PSE GLB x  ] [2M]
< --- and so on for 14 more pages --->
X: OUT_RING: updated data
X: OUT_RING: Exit

Trace of Pushbuf Memory Access, Xen-BOOT:
X: OUT_RING: Enter: chan=0x8170a0, id=2, data=0x44000, chan->cur=0x7f98838df000
kernel: [TTM] FAULTing-in address=0x7f98838df000, bo->buffer_start=0x0
kernel: [  BEFORE]PFN: 0x16042 PTE: 0x10000068042067 (val:10000068042067): [mfn:426050->ffff880016042000 USR RW    x  ] [4K]
kernel: [   AFTER]PFN: 0x16042 PTE: 0x10000068042067 (val:10000068042067): [mfn:426050->ffff880016042000 USR RW    x  ] [4K]
kernel: [  BEFORE]PFN: 0x16043 PTE: 0x10000068043067 (val:10000068043067): [mfn:426051->ffff880016043000 USR RW    x  ] [4K]
kernel: [   AFTER]PFN: 0x16043 PTE: 0x10000068043067 (val:10000068043067): [mfn:426051->ffff880016043000 USR RW    x  ] [4K]
< --- and so on for 14 more pages --->
< --- and repeat fault --->
kernel: [TTM] FAULTing-in address=0x7f98838df000, bo->buffer_start=0x0

Do you know what is happening? Is a solution feasible?

Sequence of nouveau operation as I understand it:
1. prepare for user pushbuf write by grabbing memory access rights
(exclude GPU access)
2. Do the write
3. finish and release grab

The memory may or may not be on the video card. There is a vram_pushbuf
module option which would probably complicate things more. The GPU is
informed about the address, I suppose, in the prepare and finish
pre/postamble to RING access.
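
In old libdrm_nouveau terms the user side of that sequence is roughly
(illustrative fragment, macro names from the nouveau_pushbuf.h helpers,
not the actual DDX code):

/* Illustrative fragment only; the real DDX code also validates and
 * relocates buffers around this. */
BEGIN_RING(chan, grobj, method, 1);	/* steps 1-2: write into the pushbuf */
OUT_RING(chan, data);			/* (this write is what faults it in) */
FIRE_RING(chan);			/* step 3: flush ioctl, GPU kicks off */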

and, THANKS hugely for your help.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nouveau on dom0
  2010-03-06  8:16                         ` Arvind R
@ 2010-03-06 20:59                           ` Arvind R
  2010-03-06 23:56                             ` Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-06 20:59 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Sat, Mar 6, 2010 at 1:46 PM, Arvind R <arvino55@gmail.com> wrote:
> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
>>> <konrad.wilk@oracle.com> wrote:
>>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>>> >> <konrad.wilk@oracle.com> wrote:
>> Yeah... Can you also instrument the code to print the PFN? The code goes
>> through insert_pfn->pfn_pte, which calls xen_make_pte, which ends up
>> doing pte_pfn_to_mfn. That routine does a pfn_to_mfn which does a
>> get_phys_to_machine(pfn). The last routine looks up the PFN->MFN lookup
>> table and finds a MFN that corresponds to this PFN. Since the memory
>> was allocated from ... well this is the big question.
by ttm_tt_page_alloc, and it is not backed by video memory. The nouveau
folks have just added a patch that disables pushbuf in video memory.
>>
>> Is the memory allocated from normal kernel space or is really backed by
>> the video card. In your previous e-mails you mentioned that is_iomem is
>> set to zero, which implies that the memory for these functions is NOT
>> memory backed.
Right. See continuation below.
>>
>> the VM_IO is OK if the memory that is being referenced is the video
>> driver memory. _BUT_ if the memory is being allocated through the
>> alloc_page (ttm_tt_alloc_page) , or kmalloc, then this will cause us
>> headaches. You might want to check in  ttm_bo_vm_fault what the
>> vma->vm_flags are and if VM_IO is set.
>>
>> (FYI, look at
>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
> How do you remember these refs?!
Sadly, VM_IO is set. Tried not setting it (in ttm_bo_mmap) - works on bare
boot, but crashes (very) hard on Xen. Tried setting it conditional on
io_mem; same result.
No logs even, so I don't know what happened.

>> Though I am not sure if the ttm_bo_mmap is used by the nvidia driver.
> You mean nouveau? Only for accelerated graphics.
>
>> Attached is a re-write of the debug patch I sent earlier. I compile
>> tested it but haven't yet run it (just doing that now).
>>
> Output: (snipped/cut/pasted for easier association)
>
> Trace of Pushbuf Memory Access, Bare-BOOT:
> X: OUT_RING: Enter: chan=0x8170a0, id=2, data=0x48000, chan->cur=0x7f0aa3594054
> kernel: [TTM] FAULTing-in address=0x7f0aa3594000, bo->buffer_start=0x0
> kernel: [ BEFORE]PFN: 0x7513f PTE: 0x750001e3 (val:750001e3): [    RW
>       PSE GLB x  ] [2M]
> kernel: [    AFTER]PFN: 0x7513f PTE: 0x750001e3 (val:750001e3): [
> RW         PSE GLB x  ] [2M]
> kernel: [BEFORE]PFN: 0x75144 PTE: 0x750001e3 (val:750001e3): [    RW
>      PSE GLB x  ][2M]
> kernel: [   AFTER]PFN: 0x75144 PTE: 0x750001e3 (val:750001e3): [    RW
>        PSE GLB x  ] [2M]
> < --- and so on for 14 more pages --->
> X: OUT_RING: updated data
> X: OUT_RING: Exit
>
> Trace of Pushbuf Memory Access, Xen-BOOT:
> X: OUT_RING: Enter: chan=0x8170a0, id=2, data=0x44000, chan->cur=0x7f98838df000
> kernel: [TTM] FAULTing-in address=0x7f98838df000, bo->buffer_start=0x0
> kernel: [BEFORE]PFN: 0x16042 PTE: 0x10000068042067
> (val:10000068042067): [mfn:426050->ffff880016042000USR RW
>   x  ] [4K]
> kernel: [   AFTER]PFN: 0x16042 PTE: 0x10000068042067
> (val:10000068042067): [mfn:426050->ffff880016042000USR RW
>   x  ] [4K]
> kernel: [BEFORE]PFN: 0x16043 PTE: 0x10000068043067
> (val:10000068043067): [mfn:426051->ffff880016043000USR RW
>   x  ] [4K]
> kernel: [   AFTER]PFN: 0x16043 PTE: 0x10000068043067
> (val:10000068043067): [mfn:426051->ffff880016043000USR RW
>   x  ] [4K]
> < --- and so on for 14 more pages --->
> < --- and repeat fault --->
> kernel: [TTM] FAULTing-in address=0x7f98838df000, bo->buffer_start=0x0
>
Note that the patch now effectively looks up page_address(address)
> Do you know what is happening? Is a solution feasible?
>
> Sequence of nouveau operation as I understand it:
> 1. prepare for user pushbuf write by grabbing memory access rights
> (exclude GPU access)
> 2. Do the write
> 3. finish and release grab
>
> The memory may/maynot be on the video card. There is a vram_pushbuf
> module option which would probably complicate things more.
The option has now been made ineffective by the nouveau folks.

Continuation, after some code reading:
The pushbuf needs to be backed by some memory - any memory. The
memory is allocated after the mmap call (which sets up the VM), by the
fault handler.
The user-space program (X through libdrm-nouveau) issues ioctls that
are effectively sync_to_cpu/device and writes to the buffer - thereby
invoking the fault handler. After writing, ioctls are issued to sync, and
it ends up in nouveau_pushbuf_flush - which treats the pushbuf memory
as __force user-space memory and does a DRM_COPY_FROM_USER,
which is in fact a copy_from_user into a locally allocated (GFP_KERNEL)
buffer, and writes it out to the video card (basically iomem writes). What
gets written triggers activity on the GPU (the parameters having been
set and associated with some other buffers) and the caller waits
on an event queue for notification. A strange way of doing things, but I
guess this is the groundwork for the future of GEM. All that is achieved
is hiding buffer allocations from the user - this may be important -
because if the fault handler installs only one page and leaves the rest
to be allocated by future faults, the video card hangs with PFIFO errors!
Hence the special value of SPECULATIVE_PRE_INSTALL - 16 pages.
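
That pre-install corresponds to the TTM_BO_VM_NUM_PREFAULT loop in
ttm_bo_vm_fault (the constant, 16, is visible in the debug patch above).
A very stripped-down sketch of what it does - error handling and the
iomem case omitted, with address/page_offset as in the real function:

/* Stripped-down sketch of the prefault loop in ttm_bo_vm_fault;
 * TTM_BO_VM_NUM_PREFAULT is 16, as in the debug patch above. */
for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
	page = ttm_tt_get_page(ttm, page_offset); /* allocate/find backing page */
	if (!page)
		break;
	pfn = page_to_pfn(page);
	if (vm_insert_mixed(vma, address, pfn))   /* install PTE at user address */
		break;
	address += PAGE_SIZE;
	page_offset++;
}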

I suppose that in the Xen boot, the pages are installed at the wrong address.
The installed pages need NOT be contiguous - the contiguous pages
happen only at the first X invocation on a bare boot. So choosing a
different allocator is possible in the fault handler - the problem is in
the freeing of the allocation.

Sorry for the messed-up postings

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nouveau on dom0
  2010-03-06 20:59                           ` Arvind R
@ 2010-03-06 23:56                             ` Arvind R
  2010-03-08 17:51                               ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 21+ messages in thread
From: Arvind R @ 2010-03-06 23:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Sun, Mar 7, 2010 at 2:29 AM, Arvind R <arvino55@gmail.com> wrote:
> On Sat, Mar 6, 2010 at 1:46 PM, Arvind R <arvino55@gmail.com> wrote:
>> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
>> <konrad.wilk@oracle.com> wrote:
>>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
>>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
>>>> <konrad.wilk@oracle.com> wrote:
>>>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>>>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>>>> >> <konrad.wilk@oracle.com> wrote:

>>> (FYI, look at
>>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)

THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.

My typo and testing mistakes.
Patched ttm_bo_mmap
vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
if (bo->type != ttm_bo_type_device)
     vma->vm_flags |= VM_IO;

Then, put sleep and exit in libdrm OUT_RING.
The fault-handler worked fine!

One question - How to get DMA addresses for user-buffers under Xen.
Will work on that.

HUGE THANKS!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Nouveau on dom0
  2010-03-06 23:56                             ` Arvind R
@ 2010-03-08 17:51                               ` Konrad Rzeszutek Wilk
  2010-03-10 12:50                                 ` [Solved] " Arvind R
  0 siblings, 1 reply; 21+ messages in thread
From: Konrad Rzeszutek Wilk @ 2010-03-08 17:51 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel

On Sun, Mar 07, 2010 at 05:26:12AM +0530, Arvind R wrote:
> On Sun, Mar 7, 2010 at 2:29 AM, Arvind R <arvino55@gmail.com> wrote:
> > On Sat, Mar 6, 2010 at 1:46 PM, Arvind R <arvino55@gmail.com> wrote:
> >> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
> >> <konrad.wilk@oracle.com> wrote:
> >>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
> >>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
> >>>> <konrad.wilk@oracle.com> wrote:
> >>>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
> >>>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
> >>>> >> <konrad.wilk@oracle.com> wrote:
> 
> >>> (FYI, look at
> >>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
> 
> THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.

That is great! Thanks for doing all the hard-work in digging through the
code.

> 
> My typo and testing mistakes.
> Patched ttm_bo_mmap
> vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
> if (bo->type != ttm_bo_type_device)
>      vma->vm_flags |= VM_IO;
> 
> Then, put sleep and exit in libdrm OUT_RING.
> The fault-handler worked fine!

So this means you got graphics on the screen? Or at least that Kernel
Mode Setting and the DRM parts show fancy graphics during boot?

> 
> One question - How to get DMA addresses for user-buffers under Xen.

This is the X part, right? Where the X driver takes control of the GPU
and starts having fun? I am not that familiar with how the drm_nouveau
module hands over the pointers and such to the X driver. Does it reset
it and start from scratch (as if you had no KMS enabled)? Or does it use
the allocated buffers and such and then ask for more using an ioctl such
as DRM_ALLOCATE_SCATTER_GATHER (don't remember if that was the right
name).


But to answer your question, the DMA address is actually the MFN
(machine frame number) which is bitshifted by twelve and an offset
added. The debug patch I provided gets that from the

PTE value:

	if (xen_domain()) {
+		phys = (pte_mfn(*pte) << PAGE_SHIFT) + offset;		

The 'phys' now has the physical address that the PCI bus (and the video
card) would use to request data. Please keep in mind that
'pte_mfn' is a special Xen function. Normally one would use 'pte_pfn'.

There is a layer of indirection in the Linux pvops kernel that makes
this a bit funny. Mainly, most of the time you get something called a GPFN,
which is a pseudo-physical frame number. Then there is a translation of PFN
to MFN (or vice-versa). For pages that are being utilized for PCI devices
(and that have the _PAGE_IOMAP PTE flag set), the GPFN is actually the MFN,
while for the rest (like the pages allocated by the mmap and then
stitched up in the ttm_bo_vm_fault handler), it is the PFN.
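
In code, the distinction boils down to roughly this (sketch only):

/* Sketch of the distinction described above: a PTE with _PAGE_IOMAP
 * already holds a machine frame, otherwise the PFN still needs the
 * P2M translation. */
static unsigned long sketch_frame_for_dma(pte_t pte)
{
	if (pte_flags(pte) & _PAGE_IOMAP)
		return pte_mfn(pte);		/* already an MFN */

	return pfn_to_mfn(pte_pfn(pte));	/* PFN -> MFN via the p2m table */
}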

.. back to the DMA part. When kernel subsystems do DMA they go through the
PCI DMA API. This API has things such as 'dma_map_page', which through
layers of indirection calls the Xen SWIOTLB layer. The Xen SWIOTLB is
smart enough (actually, enlighten.c is) to distinguish whether the page has
_PAGE_IOMAP set or not and to figure out whether the PTE has an MFN or a PFN.

Either way, the PCI DMA API _always_ returns the DMA address for pages.

So as long as a user buffer has a 'struct page' backing it, it should be
possible to get the DMA address.
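
As a minimal sketch (assuming 'dev' is the GPU's struct device and 'page'
one of the pages backing the user buffer; linux/dma-mapping.h):

/* Minimal sketch: map one backing page for DMA.  The returned bus
 * address is what the card gets programmed with; under Xen it already
 * contains the MFN, courtesy of swiotlb-xen. */
static dma_addr_t sketch_map_user_page(struct device *dev, struct page *page)
{
	dma_addr_t bus = dma_map_page(dev, page, 0, PAGE_SIZE,
				      DMA_BIDIRECTIONAL);

	if (dma_mapping_error(dev, bus))
		return 0;	/* mapping failed */

	/* ... hand 'bus' to the hardware; dma_unmap_page(dev, bus,
	 *     PAGE_SIZE, DMA_BIDIRECTIONAL) when it is done ... */
	return bus;
}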

Hopefully I've not confused this matter :-(

> Will work on that.
> 
> HUGE THANKS!

Oh, thank you!

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Solved] Nouveau on dom0
  2010-03-08 17:51                               ` Konrad Rzeszutek Wilk
@ 2010-03-10 12:50                                 ` Arvind R
  2010-03-10 14:00                                   ` Pasi Kärkkäinen
                                                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Arvind R @ 2010-03-10 12:50 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

On Mon, Mar 8, 2010 at 11:21 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Sun, Mar 07, 2010 at 05:26:12AM +0530, Arvind R wrote:
>> On Sun, Mar 7, 2010 at 2:29 AM, Arvind R <arvino55@gmail.com> wrote:
>> > On Sat, Mar 6, 2010 at 1:46 PM, Arvind R <arvino55@gmail.com> wrote:
>> >> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
>> >> <konrad.wilk@oracle.com> wrote:
>> >>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
>> >>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
>> >>>> <konrad.wilk@oracle.com> wrote:
>> >>>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>> >>>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>> >>>> >> <konrad.wilk@oracle.com> wrote:
>>
>> >>> (FYI, look at
>> >>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
>>
>> THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.
>
> That is great! Thanks for doing all the hard-work in digging through the
> code.
>
>
> So this means you got graphics on the screen? Or at least that Kernel
> Mode Setting and the DRM parts show fancy graphics during boot?

AT LAST, yes! Patch: (after about 600 reboots!)

diff -Naur nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c
nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c
--- nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c 2010-01-27
10:19:28.000000000 +0530
+++ nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c  2010-03-10
17:28:59.000000000 +0530
@@ -271,7 +271,10 @@
         */

        vma->vm_private_data = bo;
-       vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
+       vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
+       if (!((bo->mem.placement & TTM_PL_MASK_MEM) & TTM_PL_FLAG_TT))
+               vma->vm_flags |= VM_IO;
+       vma->vm_page_prot = vma_get_vm_prot(vma->vm_flags);
        return 0;
 out_unref:
        ttm_bo_unref(&bo);

The previous patch worked for memory-space exported to user-space via
mmap. That worked for the pushbuf, but not for mode-setting (I guess).
The ensuing crashes were hard - no logs, nothing. So I had to devise
ways of forcing log-writing before crashing (and praying). That located
the iomem problem, and I had to search the code for the appropriate
condition. And setting the vm_page_prot IS important!

Nouveau does kernel-modesetting only. The framebuffer device uses
channel 1 and is as regular a framebuffer as any other. 2D graphics
operations use channel 2 (xf86-video-nouveau). 3D graphics (gallium)
use a channel for every 3D window. There are 128 channels, 0 and 127
being reserved. Every channel has a dma-engine which is user-triggered
thro' pushbuffer rings. Every DMA has a 1MiB VRAM space which forms one
of the targets of DMA ops - the other being in the opaque GPU-space. The
BO encapsulates the virtual-address space of the user VM, and the GPU-DMA
is provided a constructed PageTable that is consistent with the kernel view of
that space. The GEM_NEW ioctl sets up the whole space-management machinery,
the user-space is mmapped out, and the operations are triggered thro' the pushbuf.

> But to answer your question, the DMA address is actually the MFN
> (machine frame number) which is bitshifted by twelve and an offset
> added. The debug patch I provided gets that from the
>
> PTE value:
>
>        if (xen_domain()) {
> +               phys = (pte_mfn(*pte) << PAGE_SHIFT) + offset;
>
> The 'phys' now has the physical address that PCI bus (and the video
> card) would utilize to request data to. Please keep in mind that the
> 'pte_mfn' is a special Xen function. Normally one would do 'pte'.
>
> There is a layer of indirection in the Linux pvops kernel that makes
> this a bit funny. Mainly most of the time you get something called GPFN
> which is a psedu-physical MFN. Then there is a translation of PFN to
> MFN (or vice-versa). For pages that are being utilized for PCI devices
> (and that have _PAGE_IOMAP PTE flag set), the GPFN is actually the MFN,
> while for the rest (like the pages allocated by the mmap and then
> stitched up in the ttm_bo_fault handler), it is the PFN.
>
> .. back to the DMA part. When kernel subsystems do DMA they go through a
> PCI DMA API. This API has things such as 'dma_map_page', which through
> layers of indirection calls the Xen SWIOTLB layer. The Xen SWIOTLB is
> smart enough (actually, the enligthen.c) to distinguish if the page has
> _PAGE_IOMAP set or not and to figure out if the PTE has a MFN or PFN.
>
> Hopefully I've not confused this matter :-(
On the contrary, a neat essence of the matter - I only wish it had been
clear to me a month ago :-(

YAHOO! (just a simple shout)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Solved] Nouveau on dom0
  2010-03-10 12:50                                 ` [Solved] " Arvind R
@ 2010-03-10 14:00                                   ` Pasi Kärkkäinen
  2010-03-10 19:37                                   ` Jeremy Fitzhardinge
       [not found]                                   ` <20100311201536.GA22182@phenom.dumpdata.com>
  2 siblings, 0 replies; 21+ messages in thread
From: Pasi Kärkkäinen @ 2010-03-10 14:00 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel, Konrad Rzeszutek Wilk

On Wed, Mar 10, 2010 at 06:20:42PM +0530, Arvind R wrote:
> On Mon, Mar 8, 2010 at 11:21 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com> wrote:
> > On Sun, Mar 07, 2010 at 05:26:12AM +0530, Arvind R wrote:
> >> On Sun, Mar 7, 2010 at 2:29 AM, Arvind R <arvino55@gmail.com> wrote:
> >> > On Sat, Mar 6, 2010 at 1:46 PM, Arvind R <arvino55@gmail.com> wrote:
> >> >> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
> >> >> <konrad.wilk@oracle.com> wrote:
> >> >>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
> >> >>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
> >> >>>> <konrad.wilk@oracle.com> wrote:
> >> >>>> > On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
> >> >>>> >> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
> >> >>>> >> <konrad.wilk@oracle.com> wrote:
> >>
> >> >>> (FYI, look at
> >> >>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
> >>
> >> THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.
> >
> > That is great! Thanks for doing all the hard-work in digging through the
> > code.
> >
> >
> > So this means you got graphics on the screen? Or at least that Kernel
> > Mode Setting and the DRM parts show fancy graphics during boot?
> 
> AT LAST, yes! Patch: (after aboout 600 reboots!)
> 

Cool, congratulations!

-- Pasi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Solved] Nouveau on dom0
  2010-03-10 12:50                                 ` [Solved] " Arvind R
  2010-03-10 14:00                                   ` Pasi Kärkkäinen
@ 2010-03-10 19:37                                   ` Jeremy Fitzhardinge
       [not found]                                   ` <20100311201536.GA22182@phenom.dumpdata.com>
  2 siblings, 0 replies; 21+ messages in thread
From: Jeremy Fitzhardinge @ 2010-03-10 19:37 UTC (permalink / raw)
  To: Arvind R; +Cc: xen-devel, Konrad Rzeszutek Wilk

On 03/10/2010 04:50 AM, Arvind R wrote:
> On Mon, Mar 8, 2010 at 11:21 PM, Konrad Rzeszutek Wilk
> <konrad.wilk@oracle.com>  wrote:
>    
>> On Sun, Mar 07, 2010 at 05:26:12AM +0530, Arvind R wrote:
>>      
>>> On Sun, Mar 7, 2010 at 2:29 AM, Arvind R<arvino55@gmail.com>  wrote:
>>>        
>>>> On Sat, Mar 6, 2010 at 1:46 PM, Arvind R<arvino55@gmail.com>  wrote:
>>>>          
>>>>> On Sat, Mar 6, 2010 at 1:53 AM, Konrad Rzeszutek Wilk
>>>>> <konrad.wilk@oracle.com>  wrote:
>>>>>            
>>>>>> On Fri, Mar 05, 2010 at 01:16:13PM +0530, Arvind R wrote:
>>>>>>              
>>>>>>> On Thu, Mar 4, 2010 at 11:55 PM, Konrad Rzeszutek Wilk
>>>>>>> <konrad.wilk@oracle.com>  wrote:
>>>>>>>                
>>>>>>>> On Thu, Mar 04, 2010 at 02:47:58PM +0530, Arvind R wrote:
>>>>>>>>                  
>>>>>>>>> On Wed, Mar 3, 2010 at 11:43 PM, Konrad Rzeszutek Wilk
>>>>>>>>> <konrad.wilk@oracle.com>  wrote:
>>>>>>>>>                    
>>>        
>>>>>> (FYI, look at
>>>>>> http://git.kernel.org/?p=linux/kernel/git/konrad/xen.git;a=commit;h=e84db8b7136d1b4a393dbd982201d0c5a3794333)
>>>>>>              
>>> THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.
>>>        
>> That is great! Thanks for doing all the hard-work in digging through the
>> code.
>>
>>
>> So this means you got graphics on the screen? Or at least that Kernel
>> Mode Setting and the DRM parts show fancy graphics during boot?
>>      
> AT LAST, yes! Patch: (after aboout 600 reboots!)
>
> diff -Naur nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c
> nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c
> --- nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c 2010-01-27
> 10:19:28.000000000 +0530
> +++ nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c  2010-03-10
> 17:28:59.000000000 +0530
> @@ -271,7 +271,10 @@
>           */
>
>          vma->vm_private_data = bo;
> -       vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
> +       vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
> +       if (!((bo->mem.placement&  TTM_PL_MASK_MEM)&  TTM_PL_FLAG_TT))
> +               vma->vm_flags |= VM_IO;
> +       vma->vm_page_prot = vma_get_vm_prot(vma->vm_flags);
>          return 0;
>   out_unref:
>          ttm_bo_unref(&bo);
>
>    

Cool, nice and simple.  Can you write it up as a proper patch for 
submission to upstream?

Thanks,
     J

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Solved] Nouveau on dom0
       [not found]                                   ` <20100311201536.GA22182@phenom.dumpdata.com>
@ 2010-03-12  6:12                                     ` Arvind R
  0 siblings, 0 replies; 21+ messages in thread
From: Arvind R @ 2010-03-12  6:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel

[-- Attachment #1: Type: text/plain, Size: 5297 bytes --]

On Fri, Mar 12, 2010 at 1:45 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> >> THAT SOLVED THE FAULTING; OUT_RING now completes under Xen.
>
> :-)
>
>> >
>> > That is great! Thanks for doing all the hard-work in digging through the
>> > code.
>> >
>> >
>> > So this means you got graphics on the screen? Or at least that Kernel
>> > Mode Setting and the DRM parts show fancy graphics during boot?
>>
>> AT LAST, yes! Patch: (after aboout 600 reboots!)
>>
>> diff -Naur nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> --- nouveau-kernel.orig/drivers/gpu/drm/ttm/ttm_bo_vm.c 2010-01-27
>> 10:19:28.000000000 +0530
>> +++ nouveau-kernel.new/drivers/gpu/drm/ttm/ttm_bo_vm.c  2010-03-10
>> 17:28:59.000000000 +0530
>> @@ -271,7 +271,10 @@
>>          */
>>
>>         vma->vm_private_data = bo;
>> -       vma->vm_flags |= VM_RESERVED | VM_IO | VM_MIXEDMAP | VM_DONTEXPAND;
>> +       vma->vm_flags |= VM_RESERVED | VM_MIXEDMAP | VM_DONTEXPAND;
>> +       if (!((bo->mem.placement & TTM_PL_MASK_MEM) & TTM_PL_FLAG_TT))
>> +               vma->vm_flags |= VM_IO;
>> +       vma->vm_page_prot = vma_get_vm_prot(vma->vm_flags);
>>         return 0;
>>  out_unref:
>>         ttm_bo_unref(&bo);
>>
Sorry for the typo:
vma_get_vm_prot in the last added line should be vm_get_page_prot.

>> The previous patch worked for memory-space exported to user via
>> mmap. That worked for the pushbuf, but not for mode-setting (I guess).
>> The ensuing crashes were hard - no logs, nothing. So had to devise
>> ways of forcing log-writing before crashing (and praying). The located
>> iomem problem and had search code for appropriate condition.
>
> Aaah.
>> And setting the vm_page_prot IS important!
>>
>> Nouveau does kernel-modesetting only. The framebuffer device uses
>> channel 1 and is as regular a framebuffer as any other. 2D graphics
>> operations use channel 2 (xf86-video-nouveau). 3D graphics (gallium)
>> use a channel for every 3D window. There are 128 channels, 0 and 127
>> being reserved. Every channel has a dma-engine which is user triggered
>
> What happens if you use only one channel? Does it grow to accommodate
> more of the writes to the ring?
One channel per composition. So channel 1 for the consolefb device: if X is
set to omit acceleration, it works through the consolefb. Channel 2 is set
up for 2D graphics, which is all that xf86-video-nouveau (the DDX component)
supports. Channel 3 is set up for 3D acceleration provided by gallium (Mesa)
- sort of tunneling through the DDX layer. If you run glxgears in a window,
Channel 4 will be set up for the application.

Each 'Channel' is self-contained with pushbufs, dma, bo, gpuobject ....
>
>> thro' pushbuffer rings. Every DMA has a 1MiB VRAM space which forms one
>> of the targets of DMA ops - the other being in the opaque GPU-space. The
>
> So 1MiB per channel? Is this how the textures get loaded via this 1MiB
> VRAM?
Yes.

>> BO encapsualtes the virtual-address space of the user VM. and the GPU-DMA
>> is provided a constructed PageTable that is consistent with the kernel view of
>> that space. The GEM_NEW ioctl sets up the whole space-management machinery,
>> the user-space is mmaped out, and the operations triggered thro the pushbuf.
>
> So when the write to the RING is done, the GPU accesses the System RAM memory.
> What is then the deal with the 512MB or so video cards? Is that only
> used for putting textures on it?
Half the memory is used as the viewport to the system CPU and the other
half is the GPU's own. The system/user transfers to the viewport (and
controls the card via the iomem space). The DMA is NOT programmed in the
conventional way - it has the lowest-level pagetable created for the
instance's 1MiB space (to which it is bound), and the other end is managed
by the GPU intelligence.

>> YAHOO! (just a simple shout)
>
> <grins> Thank you for solving this problem! If you ever are in the
> Boston area give me a ring and the beers (or your favorite liquid) is on
> me!
>
vice-versa if you are around Chennai, India.

Actually, I'm neither an expert on the deep internals of the kernel (though
I'm getting to know more about it) nor on the new generation of graphics.
Just reading about the graphics devices of today got me
frustrated because I could not get it up on Xen, and the debates on
TTM and GEM made me think that either something was drastically wrong
or something stupid was being missed. Hence this mission.

Video cards have loads of specialised processors - often dozens of them.
CUDA is an environment/architecture that allows normal C programs to
use these cores - someday graphics cards will also do graphics!
So somebody will have to ensure that Xen in the future is enabled for it
- it doesn't stop with Direct Rendering - which also needs enhancements.
Having X accelerated brought dom0 CPU usage down from 15-30% on
my system to 1-5% when running a sample WinXP (with meadowcourt
PVops drivers) domU!

PS. I sent a mail to your personal address with all the patches I used in
the workout - am attaching it here too - in case it is of interest to somebody.
You should look at the correct_section_mismatch.patch for what it is worth.

[-- Attachment #2: xen-2.6.31.6.patches.tar.gz --]
[-- Type: application/x-gzip, Size: 16478 bytes --]


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-03-12  6:12 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-02-25  8:46 Nouveau on dom0 Arvind R
2010-02-25 12:55 ` Konrad Rzeszutek Wilk
2010-02-25 17:01   ` Arvind R
2010-02-25 17:44     ` Konrad Rzeszutek Wilk
2010-02-26 15:34       ` Arvind R
2010-03-01 16:01         ` Konrad Rzeszutek Wilk
2010-03-02 21:34           ` Arvind R
2010-03-03 17:11             ` Arvind R
2010-03-03 18:13               ` Konrad Rzeszutek Wilk
2010-03-04  9:17                 ` Arvind R
2010-03-04 18:25                   ` Konrad Rzeszutek Wilk
2010-03-05  7:46                     ` Arvind R
2010-03-05 20:23                       ` Konrad Rzeszutek Wilk
2010-03-06  8:16                         ` Arvind R
2010-03-06 20:59                           ` Arvind R
2010-03-06 23:56                             ` Arvind R
2010-03-08 17:51                               ` Konrad Rzeszutek Wilk
2010-03-10 12:50                                 ` [Solved] " Arvind R
2010-03-10 14:00                                   ` Pasi Kärkkäinen
2010-03-10 19:37                                   ` Jeremy Fitzhardinge
     [not found]                                   ` <20100311201536.GA22182@phenom.dumpdata.com>
2010-03-12  6:12                                     ` Arvind R
