Message-ID: <4D2B70FB.3000504@shipmail.org>
Date: Mon, 10 Jan 2011 21:50:03 +0100
From: Thomas Hellstrom
To: Konrad Rzeszutek Wilk
CC: dri-devel@lists.freedesktop.org, airlied@linux.ie,
 linux-kernel@vger.kernel.org, konrad@darnok.org
Subject: Re: [RFC PATCH v2] Utilize the PCI API in the TTM framework.
References: <1294420304-24811-1-git-send-email-konrad.wilk@oracle.com>
 <4D2B16F3.1070105@shipmail.org> <20110110152135.GA9732@dumpdata.com>
 <4D2B2CC1.2050203@shipmail.org> <20110110164519.GA27066@dumpdata.com>
In-Reply-To: <20110110164519.GA27066@dumpdata.com>
List-ID: <linux-kernel.vger.kernel.org>

On 01/10/2011 05:45 PM, Konrad Rzeszutek Wilk wrote:
> .. snip ..
>
>>>> 2) What about accounting? In a *non-Xen* environment, will the
>>>> number of coherent pages be less than the number of DMA32 pages, or
>>>> will dma_alloc_coherent just translate into an alloc_page(GFP_DMA32)?
>>>>
>>> The code in the IOMMUs ends up calling __get_free_pages, which ends up
>>> in alloc_pages. So the call does end up in alloc_page(flags).
>>>
>>> native SWIOTLB (so no IOMMU): GFP_DMA32
>>> GART (AMD's old IOMMU): GFP_DMA32
>>>
>>> For the hardware IOMMUs:
>>>
>>> AMD-Vi: if it is in passthrough mode, it calls it with GFP_DMA32.
>>> If it is in DMA translation mode (the normal mode) it allocates a page
>>> with GFP_ZERO | ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32) and
>>> immediately translates the bus address.
>>>
>>> The flags change a bit:
>>> VT-d: if there is no identity mapping, and the PCI device is not one of
>>> the special ones (GFX, Azalia), then it will pass it with GFP_DMA32.
>>> If it is in identity mapping state, and the device is a GFX or Azalia
>>> sound card, then it will ~(__GFP_DMA | GFP_DMA32) and immediately
>>> translate the bus address.
>>>
>>> However, the interesting thing is that I've passed in 'NULL' as
>>> the struct device (not intentionally - did not want to add more changes
>>> to the API) so all of the IOMMUs end up doing GFP_DMA32.
>>>
>>> But it does mess up the accounting with AMD-Vi and VT-d, as they strip
>>> off the __GFP_DMA32 flag. That is a big problem, I presume?
>>>
>> Actually, I don't think it's a big problem. TTM allows a small
>> discrepancy between allocated pages and accounted pages so that it can
>> account on the actual allocation result. IIRC, this means that a
>> DMA32 page will always be accounted as such, or at least we can make
>> it behave that way. As long as the device can always handle the
>> page, we should be fine.
>>
> Excellent.
>
>>>> 3) Same as above, but in a Xen environment, what will stop multiple
>>>> guests from exhausting the coherent pages? It seems that the TTM
>>>> accounting mechanisms will no longer be valid unless the number of
>>>> available coherent pages is split across the guests?
>>>>
>>> Say I pass four ATI Radeon cards (wherein each is a 32-bit card) to
>>> four guests. Let's also assume that we are doing heavy operations in
>>> all of the guests. Since there is no communication between the TTM
>>> accounting in each guest, you could end up eating all of the 4GB
>>> physical memory that is available to each guest.
>>> It could end up that the first guest gets the lion's share of the
>>> 4GB memory, while the other ones get less.
>>>
>>> And if one was to do that on bare metal, with four ATI Radeon cards,
>>> the TTM accounting mechanism would realize it is nearing the watermark
>>> and do... something, right? What would it do actually?
>>>
>>> I think the error path would be the same in both cases?
>>>
>> Not really. The really dangerous situation is if TTM is allowed to
>> exhaust all GFP_KERNEL memory. Then any application or kernel task
>>
> Ok, since GFP_KERNEL does not contain the GFP_DMA32 flag then
> this should be OK?
>

No. Unless I'm missing something, on a machine with 4GB or less,
GFP_DMA32 and GFP_KERNEL are allocated from the same pool of pages?

>
>> What *might* be possible, however, is that the GFP_KERNEL memory on
>> the host gets exhausted due to extensive TTM allocations in the
>> guest, but I guess that's a problem for Xen to resolve, not TTM.
>>
> Hmm. I think I am missing something here. GFP_KERNEL is any memory
> and GFP_DMA32 is memory from ZONE_DMA32. When we do start
> using the PCI API, what happens underneath (so under Linux) is that
> "real PFNs" (Machine Frame Numbers) which are above the 0x100000 mark
> get swizzled in for the guest's PFNs (this is for the PCI devices
> that have the dma_mask set to 32 bits). However, that is a Xen MMU
> accounting issue.
>

So I was under the impression that when you allocate coherent memory in
the guest, the physical page comes from DMA32 memory in the host. On a
4GB machine or less, that would be the same as kernel memory. Now, if 4
guests think they can allocate 2GB of coherent memory each, you might
run out of kernel memory on the host?

Another thing that I was thinking of is what happens if you have a huge
gart and allocate a lot of coherent memory. Could that potentially
exhaust IOMMU resources?
>> /Thomas
>>
>> *) I think gem's flink still is vulnerable to this, though, so it
>>
> Is there a good test-case for this?
>

Not put in code. What you can do (for example in an OpenGL app) is to
write some code that tries to flink with a guessed bo name until it
succeeds. Then repeatedly, from within the app, try to open the same
name until something crashes. I don't think the Linux OOM killer can
handle that situation.

Should be fairly easy to put together.

/Thomas