From: Thomas Hellstrom <thomas@shipmail.org>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: dri-devel@lists.freedesktop.org, airlied@linux.ie,
linux-kernel@vger.kernel.org, konrad@darnok.org
Subject: Re: [RFC PATCH v2] Utilize the PCI API in the TTM framework.
Date: Mon, 10 Jan 2011 21:50:03 +0100
Message-ID: <4D2B70FB.3000504@shipmail.org>
In-Reply-To: <20110110164519.GA27066@dumpdata.com>
On 01/10/2011 05:45 PM, Konrad Rzeszutek Wilk wrote:
> .. snip ..
>
>>>> 2) What about accounting? In a *non-Xen* environment, will the
>>>> number of coherent pages be less than the number of DMA32 pages, or
>>>> will dma_alloc_coherent just translate into an alloc_page(GFP_DMA32)?
>>>>
>>> The code in the IOMMUs ends up calling __get_free_pages, which ends up
>>> in alloc_pages. So the call does end up in alloc_page(flags).
>>>
>>>
>>> native SWIOTLB (so no IOMMU): GFP_DMA32
>>> GART (AMD's old IOMMU): GFP_DMA32
>>>
>>> For the hardware IOMMUs:
>>>
>>> AMD VI: if it is in Passthrough mode, it calls it with GFP_DMA32.
>>> If it is in DMA translation mode (normal mode) it clears
>>> (__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32), sets __GFP_ZERO, allocates
>>> a page and immediately translates the bus address.
>>>
>>> The flags change a bit:
>>> VT-d: if there is no identity mapping and the PCI device is not one of
>>> the special ones (GFX, Azalia), then it will pass it with GFP_DMA32.
>>> If it is in identity mapping state, and the device is a GFX or Azalia sound
>>> card, then it will clear (__GFP_DMA | GFP_DMA32) and immediately translate
>>> the bus address.
>>>
>>> However, the interesting thing is that I've passed in the 'NULL' as
>>> the struct device (not intentionally - did not want to add more changes
>>> to the API) so all of the IOMMUs end up doing GFP_DMA32.
>>>
>>> But it does mess up the accounting with the AMD-VI and VT-d as they strip
>>> the __GFP_DMA32 flag off. That is a big problem, I presume?
>>>
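To make the flag handling quoted above concrete, here is a toy Python model.
It is not kernel code: the bit values, the backend names and the function
coherent_alloc_flags() are all made up for illustration, and only the
set/clear pattern follows the description above.

```python
# Toy model of the flag handling described in the quoted text.
# The bit values are invented; they are NOT the kernel's real __GFP_* values.
GFP_DMA, GFP_HIGHMEM, GFP_DMA32, GFP_ZERO = 0x1, 0x2, 0x4, 0x8


def coherent_alloc_flags(backend, flags=0, dev_is_null=False,
                         identity_mapped=False, special_dev=False):
    """Return the flags a backend would hand to the page allocator."""
    # Per the quoted text: with a NULL struct device, every backend
    # ends up allocating with GFP_DMA32.
    if dev_is_null:
        return flags | GFP_DMA32
    if backend in ("swiotlb", "gart", "amd-vi-passthrough"):
        return flags | GFP_DMA32
    if backend == "amd-vi-translate":
        # Zone restrictions are dropped; the page is zeroed instead.
        return (flags & ~(GFP_DMA | GFP_HIGHMEM | GFP_DMA32)) | GFP_ZERO
    if backend == "vt-d":
        if identity_mapped and special_dev:
            return flags & ~(GFP_DMA | GFP_DMA32)
        return flags | GFP_DMA32
    raise ValueError("unknown backend: %s" % backend)


if __name__ == "__main__":
    for name in ("swiotlb", "gart", "amd-vi-passthrough",
                 "amd-vi-translate", "vt-d"):
        has_dma32 = bool(coherent_alloc_flags(name, GFP_DMA32) & GFP_DMA32)
        print(name, "keeps DMA32:", has_dma32)
```

The accounting concern is the "amd-vi-translate" case: the caller asked for
DMA32, but the page that comes back may live anywhere.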
>> Actually, I don't think it's a big problem. TTM allows a small
>> discrepancy between allocated pages and accounted pages to be able
>> to account on actual allocation result. IIRC, this means that a
>> DMA32 page will always be accounted as such, or at least we can make
>> it behave that way. As long as the device can always handle the
>> page, we should be fine.
>>
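A rough sketch of what "account on actual allocation result" could look
like, in Python rather than kernel C. This is not TTM's code: account_page()
and the two-bucket counter are hypothetical, and the only fact taken from
this thread is that the 4GB boundary sits at PFN 0x100000 with 4KB pages.

```python
PAGE_SIZE = 4096
# First page frame number at or above 4 GiB (0x100000 with 4 KiB pages).
DMA32_LIMIT_PFN = (4 << 30) // PAGE_SIZE


def account_page(pfn, counters):
    """Account a page by where it actually landed, not by what was asked for."""
    zone = "dma32" if pfn < DMA32_LIMIT_PFN else "kernel"
    counters[zone] = counters.get(zone, 0) + 1
    return zone


if __name__ == "__main__":
    counters = {}
    # Hypothetical PFNs: one page below 4 GiB, one above.
    for pfn in (0x0FFFFF, 0x100000):
        print(hex(pfn), "->", account_page(pfn, counters))
    print(counters)
```

With this scheme a page below 4GB is always counted as DMA32, whatever flags
the IOMMU layer ended up using for the allocation.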
> Excellent.
>
>>
>>>> 3) Same as above, but in a Xen environment, what will stop multiple
>>>> guests from exhausting the coherent pages? It seems that the TTM
>>>> accounting mechanisms will no longer be valid unless the number of
>>>> available coherent pages is split across the guests?
>>>>
>>> Say I pass in four ATI Radeon cards (wherein each is a 32-bit card) to
>>> four guests. Let's also assume that we are doing heavy operations in all
>>> of the guests. Since there is no communication between the TTM
>>> accounting in each guest, you could end up eating all of the 4GB physical
>>> memory that is available to each guest. It could end up that the first
>>> guest gets the lion's share of the 4GB memory, while the other ones get
>>> less.
>>>
>>> And if one was to do that on baremetal, with four ATI Radeon cards, the
>>> TTM accounting mechanism would realize it is nearing the watermark
>>> and do.. something, right? What would it do actually?
>>>
>>> I think the error path would be the same in both cases?
>>>
>> Not really. The really dangerous situation is if TTM is allowed to
>> exhaust all GFP_KERNEL memory. Then any application or kernel task
>>
> Ok, since GFP_KERNEL does not contain the GFP_DMA32 flag then
> this should be OK?
>
No. Unless I'm missing something, on a machine with 4GB or less, GFP_DMA32
and GFP_KERNEL are allocated from the same pool of pages?
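To illustrate what I mean, a toy Python model of zone fallback. The zone
names follow x86-64, but pick_zone() and the fallback lists are a
simplification of my own, and I am assuming that ZONE_NORMAL is empty on
such a machine (in practice a little memory may be remapped above 4GB).

```python
# Simplified fallback order: which zones each request type may draw from.
FALLBACK = {
    "GFP_KERNEL": ["NORMAL", "DMA32", "DMA"],
    "GFP_DMA32": ["DMA32", "DMA"],
}


def pick_zone(request, free_pages):
    """Take one page from the first zone in the fallback list that has any."""
    for zone in FALLBACK[request]:
        if free_pages.get(zone, 0) > 0:
            free_pages[zone] -= 1
            return zone
    return None  # allocation failure


if __name__ == "__main__":
    # Assumed small machine: nothing above 4 GiB, so NORMAL is empty.
    free = {"NORMAL": 0, "DMA32": 2, "DMA": 0}
    print(pick_zone("GFP_KERNEL", free))  # served from DMA32
    print(pick_zone("GFP_DMA32", free))   # also DMA32
    print(pick_zone("GFP_KERNEL", free))  # nothing left
```

Under that assumption every DMA32 page handed out is one less page for
GFP_KERNEL users, which is why the accounting matters.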
>
>> What *might* be possible, however, is that the GFP_KERNEL memory on
>> the host gets exhausted due to extensive TTM allocations in the
>> guest, but I guess that's a problem for XEN to resolve, not TTM.
>>
> Hmm. I think I am missing something here. The GFP_KERNEL is any memory
> and the GFP_DMA32 is memory from the ZONE_DMA32. When we do start
> using the PCI-API, what happens underneath (so under Linux) is that
> "real PFNs" (Machine Frame Numbers) which are above the 0x100000 mark
> get swizzled in for the guest's PFNs (this is for the PCI devices
> that have the dma_mask set to 32bit). However, that is a Xen MMU
> accounting issue.
>
So I was under the impression that when you allocate coherent memory in
the guest, the physical page comes from DMA32 memory in the host. On a
machine with 4GB or less, that would be the same as kernel memory. Now, if 4
guests think they can allocate 2GB of coherent memory each, you might
run out of kernel memory on the host?
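The arithmetic, as a toy Python simulation. simulate() is a hypothetical
helper, units are GB, and the round-robin allocation order is just an
assumption to keep the example short; the numbers (4 guests, 2GB limit each,
4GB host) are the ones above.

```python
def simulate(host_free, guest_limits, chunk=1):
    """Guests allocate round-robin until the host pool or their limit runs out."""
    used = [0] * len(guest_limits)
    progressed = True
    while host_free > 0 and progressed:
        progressed = False
        for i, limit in enumerate(guest_limits):
            # Each guest only checks its OWN limit; it cannot see the others.
            if host_free >= chunk and used[i] + chunk <= limit:
                used[i] += chunk
                host_free -= chunk
                progressed = True
    return used, host_free


if __name__ == "__main__":
    used, left = simulate(host_free=4, guest_limits=[2, 2, 2, 2])
    print("per-guest usage:", used)
    print("host memory left:", left)
```

The host pool is empty while every guest is still below its own 2GB limit,
so no per-guest accounting would ever have triggered.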
Another thing that I was thinking of is what happens if you have a huge
GART and allocate a lot of coherent memory. Could that potentially
exhaust IOMMU resources?
>> /Thomas
>>
>> *) I think gem's flink still is vulnerable to this, though, so it
>>
> Is there a good test-case for this?
>
Not put in code. What you can do (for example in an OpenGL app) is to
write some code that tries to flink with a guessed bo name until it
succeeds. Then repeatedly from within the app, try to flink the same
name until something crashes. I don't think the Linux OOM killer can
handle that situation. Should be fairly easy to put together.
/Thomas