* drm + 4GB RAM + swiotlb = drm craps out
@ 2007-04-01 23:44 Dave Airlie
  2007-04-02  3:11 ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Airlie @ 2007-04-01 23:44 UTC (permalink / raw)
  To: Linux Kernel, dri-devel

Okay, I've had a bug reported before, and now again: with > 4GB of RAM,
radeon blows up the DRM... on Intel hw...

What the drm currently does for the PCI GART table is allocate a
chunk of memory (8MB) with vmalloc_32(); then, when it decides to use
it, it goes through every page of the vmalloc mapping, calls
pci_map_single() on each (with PCI_DMA_TODEVICE, which is probably
wrong...), and puts the bus addresses of the pages into the PCI GART
table on the GPU.
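
Roughly, as a sketch (pdev, gart_set_entry and the 8MB size are just
placeholders here, not the actual driver code), the current path looks
like this:

void *table = vmalloc_32(8 * 1024 * 1024);
unsigned long i, npages = (8 * 1024 * 1024) >> PAGE_SHIFT;

for (i = 0; i < npages; i++) {
        struct page *p = vmalloc_to_page(table + (i << PAGE_SHIFT));

        /* streaming mapping; note that no dma_sync_* calls ever follow */
        dma_addr_t bus = pci_map_single(pdev, page_address(p),
                                        PAGE_SIZE, PCI_DMA_TODEVICE);

        /* placeholder: write the entry into the on-GPU GART table */
        gart_set_entry(pdev, i, bus);
}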

So when swiotlb happens, as you can guess it all falls apart as the
drm never calls sync functions at any stage...

The main problem is the ring buffer and scratch write-back: these
values are read and written by both the CPU and GPU quite a lot, so
this leads me to think I should really just be using
dma_alloc_coherent for the whole lot. However, this is an 8MB mapping,
and it could get larger and more dynamic in the future as we do
dynamic PCIE GART support for the radeons...
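
i.e. something like this (sketch only; RING_SIZE is a made-up name,
and whether an 8MB coherent allocation even succeeds is exactly the
worry):

dma_addr_t ring_bus;
void *ring = dma_alloc_coherent(&pdev->dev, RING_SIZE,
                                &ring_bus, GFP_KERNEL);
if (!ring)
        return -ENOMEM;   /* large coherent allocations can easily fail */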

So I suppose I'm asking for ideas on the "correct" way to do this, and
perhaps any quick way to patch up the problem I'm seeing now by making
swiotlb not get involved ....

Regards,
Dave.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-01 23:44 drm + 4GB RAM + swiotlb = drm craps out Dave Airlie
@ 2007-04-02  3:11 ` David Miller
  2007-04-02  4:08   ` Dave Airlie
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2007-04-02  3:11 UTC (permalink / raw)
  To: airlied; +Cc: linux-kernel, dri-devel

From: "Dave Airlie" <airlied@gmail.com>
Date: Mon, 2 Apr 2007 09:44:41 +1000

> Okay, I've had a bug reported before, and now again: with > 4GB of RAM,
> radeon blows up the DRM... on Intel hw...
> 
> What the drm currently does for the PCI GART table is allocate a
> chunk of memory (8MB) with vmalloc_32(); then, when it decides to use
> it, it goes through every page of the vmalloc mapping, calls
> pci_map_single() on each (with PCI_DMA_TODEVICE, which is probably
> wrong...), and puts the bus addresses of the pages into the PCI GART
> table on the GPU.
> 
> So when swiotlb happens, as you can guess it all falls apart as the
> drm never calls sync functions at any stage...

You would have hit this on any platform that does caching
in the PCI controller as well.

> The main problem is the ring buffer and scratch write-back: these
> values are read and written by both the CPU and GPU quite a lot, so
> this leads me to think I should really just be using
> dma_alloc_coherent for the whole lot. However, this is an 8MB mapping,
> and it could get larger and more dynamic in the future as we do
> dynamic PCIE GART support for the radeons...
> 
> So I suppose I'm asking for ideas on the "correct" way to do this, and
> perhaps any quick way to patch up the problem I'm seeing now by making
> swiotlb not get involved ....

Coherent memory was created for precisely the case where the cpu
and the device frequently access the memory.

8MB is indeed a lot for the kind of allocation that the coherent
DMA implementation uses.

Does it really have to be all in one big 8MB chunk?  I doubt it.
Perhaps you can therefore create multiple DMA pools instead?  See
include/linux/dmapool.h
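
Something along these lines, say (sketch only; the pool name and sizes
are made up):

#include <linux/dmapool.h>

struct dma_pool *pool;
dma_addr_t bus;
void *buf;

/* a pool of fixed-size pieces carved out of coherent DMA memory */
pool = dma_pool_create("drm-gart", &pdev->dev, PAGE_SIZE, PAGE_SIZE, 0);

buf = dma_pool_alloc(pool, GFP_KERNEL, &bus);
/* ... hand bus to the hardware, use buf from the CPU ... */
dma_pool_free(pool, buf, bus);

dma_pool_destroy(pool);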


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  3:11 ` David Miller
@ 2007-04-02  4:08   ` Dave Airlie
  2007-04-02  5:08     ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Airlie @ 2007-04-02  4:08 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, dri-devel

> >
> > So when swiotlb happens, as you can guess it all falls apart as the
> > drm never calls sync functions at any stage...
>
> You would have hit this on any platform that does caching
> in the PCI controller as well.

We must not have a great intersection of radeon and such systems...

>
> Coherent memory was created for precisely the case where the cpu
> and the device frequently access the memory.
>
> 8MB is indeed a lot for the kind of allocation that the coherent
> DMA implementation uses.
>
> Does it really have to be all in one big 8MB chunk?  I doubt it.
> Perhaps you can therefore create multiple DMA pools instead?  See
> include/linux/dmapool.h

It currently is required to be in a big 8MB chunk as it gets chopped
up by the X server, not the kernel, so the kernel needs to allocate
pages to back it when X inits. Yes, this is ugly; no, it can't be
fixed without time-travelling and fixing deployed X servers...

Really we probably only need the ring buffer to be in coherent memory,
the rest of the stuff is used for DMA buffers which are mainly filled
by the CPU and read by the GPU. However I cannot change this without
breaking X, the solution is really to use TTM for this sort of
stuff.... I'm a bit worried as the AGP driver now uses vmalloc_32
which really is a meaningless interface on 64-bit systems..

Dave.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  4:08   ` Dave Airlie
@ 2007-04-02  5:08     ` David Miller
  2007-04-02  5:15       ` Dave Airlie
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2007-04-02  5:08 UTC (permalink / raw)
  To: airlied; +Cc: linux-kernel, dri-devel

From: "Dave Airlie" <airlied@gmail.com>
Date: Mon, 2 Apr 2007 14:08:13 +1000

> > >
> > > So when swiotlb happens, as you can guess it all falls apart as the
> > > drm never calls sync functions at any stage...
> >
> > You would have hit this on any platform that does caching
> > in the PCI controller as well.
> 
> We must not have a great intersection of radeon and such systems...

It might explain why my machine hung when I tried to use
radeon with DRM on my sparc64 workstation :-)  Investigating
that is on my todo list.

> It currently is required to be in a big 8MB chunk as it gets chopped
> up by the X server, not the kernel, so the kernel needs to allocate
> pages to back it when X inits. Yes, this is ugly; no, it can't be
> fixed without time-travelling and fixing deployed X servers...
>
> Really we probably only need the ring buffer to be in coherent memory,
> the rest of the stuff is used for DMA buffers which are mainly filled
> by the CPU and read by the GPU. However I cannot change this without
> breaking X, the solution is really to use TTM for this sort of
> stuff.... I'm a bit worried as the AGP driver now uses vmalloc_32
> which really is a meaningless interface on 64-bit systems..

I don't know what to recommend to you; getting 8MB of linear memory
really just isn't practical.

Perhaps we'll have to create something ugly like vmalloc_nobounce().

Remind me again why you're ending up with swiotlb'd pages?
vmalloc_32() uses GFP_KERNEL which should use entirely lowmem and thus
RAM below 4GB and not anything which should need bounce buffering.

You should only get swiotlb'd pages if __GFP_HIGHMEM were set in
the gfp flags.

Are you expecting to be able to virtually remap these pages in
PCI space as one huge 8MB chunk too and that's how swiotlb gets
involved?  That won't work, sorry...


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  5:08     ` David Miller
@ 2007-04-02  5:15       ` Dave Airlie
  2007-04-02  5:24         ` David Miller
  2007-04-02  6:27         ` Andi Kleen
  0 siblings, 2 replies; 11+ messages in thread
From: Dave Airlie @ 2007-04-02  5:15 UTC (permalink / raw)
  To: David Miller; +Cc: linux-kernel, dri-devel

> It might explain why my machine hung when I tried to use
> radeon with DRM on my sparc64 workstation :-)  Investigating
> that is on my todo list.

True, maybe the intersection is me + hw like that + radeon :-)

> I don't know what to recommend to you; getting 8MB of linear memory
> really just isn't practical.

This is the thing: it doesn't need to be linear. I have a GART onboard
the radeon that I can fill in; I just need to access it linearly
internally in the kernel and map it linearly in userspace, but it
doesn't need to be physically linear. vmalloc_32 + map_single should
in theory be possible if... see below...

>
> Perhaps we'll have to create something ugly like vmalloc_nobounce().
>
> Remind me again why you're ending up with swiotlb'd pages?
> vmalloc_32() uses GFP_KERNEL which should use entirely lowmem and thus
> RAM below 4GB and not anything which should need bounce buffering.

On a 64-bit machine GFP_KERNEL can give me any memory... it all works
fine on 32-bit highmem kernel as I don't get highmem... I really need
__GFP_DMA32 memory but we don't have a generic allocator that gives
this out that I can see..

> Are you expecting to be able to virtually remap these pages in
> PCI space as one huge 8MB chunk too and that's how swiotlb gets
> involved?  That won't work, sorry...

Well I feed the bus address for each page into a GART table in the GPU
and it does the linear stuff internally in the GPU memory
controller...

I suppose I want __GFP_I_D_RATHER_DIE_THAN_BOUNCE.

Dave.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  5:15       ` Dave Airlie
@ 2007-04-02  5:24         ` David Miller
  2007-04-02  6:27         ` Andi Kleen
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2007-04-02  5:24 UTC (permalink / raw)
  To: airlied; +Cc: linux-kernel, dri-devel

From: "Dave Airlie" <airlied@gmail.com>
Date: Mon, 2 Apr 2007 15:15:48 +1000

> > Perhaps we'll have to create something ugly like vmalloc_nobounce().
> >
> > Remind me again why you're ending up with swiotlb'd pages?
> > vmalloc_32() uses GFP_KERNEL which should use entirely lowmem and thus
> > RAM below 4GB and not anything which should need bounce buffering.
> 
> On a 64-bit machine GFP_KERNEL can give me any memory... it all works
> fine on 32-bit highmem kernel as I don't get highmem... I really need
> __GFP_DMA32 memory but we don't have a generic allocator that gives
> this out that I can see..

That clears things up, thanks.

Perhaps the other uses of vmalloc_32() want __GFP_DMA32 semantics too,
although I didn't check.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  6:27         ` Andi Kleen
@ 2007-04-02  5:38           ` Dave Airlie
  2007-04-02  6:40             ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Airlie @ 2007-04-02  5:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, linux-kernel, dri-devel

> >
> > On a 64-bit machine GFP_KERNEL can give me any memory... it all works
> > fine on 32-bit highmem kernel as I don't get highmem... I really need
> > __GFP_DMA32 memory but we don't have a generic allocator that gives
> > this out that I can see..
>
> __get_free_pages(..., __GFP_DMA32) on 64-bit, or GFP_KERNEL on i386
> (which only gives you ~900MB)

Doesn't __get_free_pages give me physically linear memory? That, while
nice, isn't essential for what I need, so if I can't get my full
allocation I could in theory just start falling back down the orders
and call it multiple times to actually get the amount of memory I
need, this just seems overly cumbersome when what I really want is
vmalloc_32 to just work correctly on 64-bit systems... (why doesn't
vmalloc_32 pass __GFP_DMA32 to the allocator????)
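
The fallback dance would be something like this hypothetical loop
(drm_add_chunk is made up, and none of this is tested):

size_t remaining = 8 * 1024 * 1024;
int order = get_order(remaining);

while (remaining) {
        unsigned long addr = 0;

        /* drop down the orders until something sticks */
        while (order >= 0 &&
               !(addr = __get_free_pages(GFP_KERNEL | __GFP_DMA32, order)))
                order--;
        if (!addr)
                return -ENOMEM;

        drm_add_chunk(addr, PAGE_SIZE << order);
        remaining -= min_t(size_t, remaining, PAGE_SIZE << order);
}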

> Not sure what you mean? __alloc_pages never bounces by itself.
> The nearest you can get is __GFP_DMA/__GFP_DMA32, but these have
> their own 16MB/4GB zones and don't use the swiotlb pools. And of course it
> only gives you plain memory, but doesn't remap or copy anything.

Yes, I want __GFP_DMA32, but I'd like it with vmalloc rather than
with __get_free_pages; I've no great need for physically linear page
allocations, and as I'm after quite a large order I can see this
failing... granted, with a 4GB system maybe not that quickly...

Dave.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  5:15       ` Dave Airlie
  2007-04-02  5:24         ` David Miller
@ 2007-04-02  6:27         ` Andi Kleen
  2007-04-02  5:38           ` Dave Airlie
  1 sibling, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2007-04-02  6:27 UTC (permalink / raw)
  To: Dave Airlie; +Cc: David Miller, linux-kernel, dri-devel

"Dave Airlie" <airlied@gmail.com> writes:
> 
> On a 64-bit machine GFP_KERNEL can give me any memory... it all works
> fine on 32-bit highmem kernel as I don't get highmem... I really need
> __GFP_DMA32 memory but we don't have a generic allocator that gives
> this out that I can see..

__get_free_pages(..., __GFP_DMA32) on 64-bit, or GFP_KERNEL on i386
(which only gives you ~900MB)
 
> > Are you expecting to be able to virtually remap these pages in
> > PCI space as one huge 8MB chunk too and that's how swiotlb gets
> > involved?  That won't work, sorry...
> 
> Well I feed the bus address for each page into a GART table in the GPU
> and it does the linear stuff internally in the GPU memory
> controller...
> 
> I suppose I want __GFP_I_D_RATHER_DIE_THAN_BOUNCE.

Not sure what you mean? __alloc_pages never bounces by itself.
The nearest you can get is __GFP_DMA/__GFP_DMA32, but these have
their own 16MB/4GB zones and don't use the swiotlb pools. And of course it
only gives you plain memory, but doesn't remap or copy anything.
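
i.e. (sketch):

/* a page with a bus address guaranteed below 4GB, straight from the
   DMA32 zone -- no bouncing involved */
struct page *page = alloc_page(GFP_KERNEL | __GFP_DMA32);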

I have some plans to unify this with swiotlb, but they're not done.

-Andi


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  5:38           ` Dave Airlie
@ 2007-04-02  6:40             ` Andi Kleen
  2007-04-02  6:52               ` Dave Airlie
  0 siblings, 1 reply; 11+ messages in thread
From: Andi Kleen @ 2007-04-02  6:40 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Andi Kleen, David Miller, linux-kernel

On Mon, Apr 02, 2007 at 03:38:48PM +1000, Dave Airlie wrote:
> Doesn't __get_free_pages give me physically linear memory? That, while
> nice, isn't essential for what I need, so if I can't get my full
> allocation I could in theory just start falling back down the orders
> and call it multiple times to actually get the amount of memory I

Yes, you get physically linear memory. Just get individual pages instead.

If you want to merge them you can use vmap(), but you should probably
just fix your in-kernel code to not require that and work with sg
lists instead.

If you get the pages piece by piece you can also more easily
remap them linearly into user space.
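
Roughly (just a sketch; npages is assumed and error handling is
omitted):

struct page **pages = kmalloc(npages * sizeof(*pages), GFP_KERNEL);
void *linear;
int i;

/* individual pages below 4GB; keep the array around for building
   sg lists, per-page GART entries, or a per-page mmap */
for (i = 0; i < npages; i++)
        pages[i] = alloc_page(GFP_KERNEL | __GFP_DMA32);

/* only if you really need a kernel-linear view of the lot */
linear = vmap(pages, npages, VM_MAP, PAGE_KERNEL);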

> need, this just seems overly cumbersome when what I really want is
> vmalloc_32 to just work correctly on 64-bit systems... (why doesn't
> vmalloc_32 pass __GFP_DMA32 to the allocator????)

It probably should, but see second part of sentence above.

And please never put closed lists in cc of l-k posts. Evil cc dropped.

-Andi



* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  6:40             ` Andi Kleen
@ 2007-04-02  6:52               ` Dave Airlie
  2007-04-02  6:55                 ` Andi Kleen
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Airlie @ 2007-04-02  6:52 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David Miller, linux-kernel

>
> > need, this just seems overly cumbersome when what I really want is
> > vmalloc_32 to just work correctly on 64-bit systems... (why doesn't
> > vmalloc_32 pass __GFP_DMA32 to the allocator????)
>
> It probably should, but see second part of sentence above.
>
> And please never put closed lists in cc of l-k posts. Evil cc dropped.

Ah okay I'll just do an allocator based on single pages and see if I
can fix the kernel side to have the sg knowledge....

btw it's not a closed list, it's a moderated list; all the posts do
get through to it and the people on it are probably interested...

Dave.


* Re: drm + 4GB RAM + swiotlb = drm craps out
  2007-04-02  6:52               ` Dave Airlie
@ 2007-04-02  6:55                 ` Andi Kleen
  0 siblings, 0 replies; 11+ messages in thread
From: Andi Kleen @ 2007-04-02  6:55 UTC (permalink / raw)
  To: Dave Airlie; +Cc: Andi Kleen, David Miller, linux-kernel

On Mon, Apr 02, 2007 at 04:52:04PM +1000, Dave Airlie wrote:
> >
> >> need, this just seems overly cumbersome when what I really want is
> >> vmalloc_32 to just work correctly on 64-bit systems... (why doesn't
> >> vmalloc_32 pass __GFP_DMA32 to the allocator????)
> >
> >It probably should, but see second part of sentence above.
> >
> >And please never put closed lists in cc of l-k posts. Evil cc dropped.
> 
> Ah okay I'll just do an allocator based on single pages and see if I
> can fix the kernel side to have the sg knowledge....

I fixed vmalloc_32 now, but it's probably better to do it this way
anyway.

> btw it's not a closed list, it's a moderated list; all the posts do
> get through to it and the people on it are probably interested...

Too late now. Anything that bounces emails is evil.

-Andi

