From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751814Ab1AHKl4 (ORCPT ); Sat, 8 Jan 2011 05:41:56 -0500 Received: from smtp-outbound-1.vmware.com ([65.115.85.69]:53453 "EHLO smtp-outbound-1.vmware.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750936Ab1AHKly (ORCPT ); Sat, 8 Jan 2011 05:41:54 -0500 Message-ID: <4D283F6E.2070301@shipmail.org> Date: Sat, 08 Jan 2011 11:41:50 +0100 From: Thomas Hellstrom User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.10) Gecko/20100624 Mandriva/3.0.5-0.1mdv2009.1 (2009.1) Thunderbird/3.0.5 MIME-Version: 1.0 To: Konrad Rzeszutek Wilk CC: dri-devel@lists.freedesktop.org, airlied@linux.ie, linux-kernel@vger.kernel.org, konrad@darnok.org Subject: Re: [RFC PATCH v2] Utilize the PCI API in the TTM framework. References: <1294420304-24811-1-git-send-email-konrad.wilk@oracle.com> In-Reply-To: <1294420304-24811-1-git-send-email-konrad.wilk@oracle.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, Konrad! Just back from vacation. I'll try to review this patch set in the upcoming week! Thanks, Thomas On 01/07/2011 06:11 PM, Konrad Rzeszutek Wilk wrote: > Attached is a set of patches that make it possible for drivers using TTM API > (nouveau and radeon graphic drivers) to work under Xen. The explanation > is a bit complex and I am not sure if I am explaining it that well..so if > something is unclear please do ping me. > > Changes since v1: [https://lkml.org/lkml/2010/12/6/516] > - Cleaned up commit message (forgot to add my SoB). > > Short explanation of problem: What we are hitting under Xen is that instead > of programming the GART with the physical DMA address of the TTM page, we > end up programming the bounce buffer DMA (bus) address! > > Long explanation: > The reason we end up doing this is that: > > 1). alloc_page with GFP_DMA32 does not allocate "real" (Machine Frame Numbers > - MFN) under the 4GB in Xen. That is b/c if actually made the pages underneath > 4GB available to the the Linux page allocator we would not be able to > give those to other guest devices. This would mean if we tried to pass > in a USB device to one guest and in another were running the Xorg server > we wouldn't be able to do so as we would run out of pages under 4GB. So > pages that we get from alloc_page have a PFN that is under 4GB but in > reality the real physical address (MFN) is above 4GB. Ugh.. > > 2). The backends for "struct ttm_backend_func" utilize the PCI API. When > they get a page allocated via alloc_page, the use 'pci_map_page' and > program the DMA (bus) address in the GART - which is correct. But then > the calls that kick off the graphic driver to process the pages do not > use the pci_page_sync_* calls. If the physical address of the page > is the same as the DMA bus address returned from pci_map_page then there > are no trouble. But if they are different: > virt_to_phys(page_address(p)) != pci_map_page(p,..) > then the graphic card fetches data from the DMA (bus) address (so the > value returned from pci_map_page). The data however that the user wrote > to (the page p) ends up being untouched. You are probably saying: > "What? Nonsense, we stitch the page in ttm_bo_vm_fault using the PFN > and .. and even if the virt_to_phys(page_address(p)) != pci_map_page(p) > the GART ends up with the bus (DMA) address of the PFN!" That is true. > But if you combine this with 1) where you end up with page that is > above the dma_mask (even if you called it with GFP_DMA32) and then > make a call on pci_map_page you would end up with a bounce buffer! > > > The problem above can be easily reproduced on bare-metal if you pass in > "swiotlb=force iommu=soft". > > > There are two ways of fixing this: > > 1). Use the 'dma_alloc_coherent' (or pci_alloc_consistent if there is > struct pcidev present), instead of alloc_page for GFP_DMA32. The > 'dma_alloc_coherent' guarantees that the allocated page fits > within the device dma_mask (or uses the default DMA32 if no device > is passed in). This also guarantees that any subsequent call > to the PCI API for this page will return the same DMA (bus) address > as the first call (so pci_alloc_consistent, and then pci_map_page > will give the same DMA bus address). > > 2). Use the pci_sync_range_* after sending a page to the graphics > engine. If the bounce buffer is used then we end up copying the > pages. > > 3). This one I really don't want to think about, but I should mention > it. Alter the alloc_page and its backend to know about Xen. > The pushback from the MM guys will be probably: "why not use the PCI API.." > > > So attached is a draft set of patches that use solution #1. I've tested > this on baremetal (and Xen) on PCIe Radeon and nouveau cards with success. > > 1) The 'NULL' when doing dma_alloc_coherent is unsightly. I was toying > with modifying the TTM API to pass in 'struct device' or 'struct pci_device' > but figured it would make first sense to get your guys input before heading that route. > > This patch-set is also available at: > > git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git devel/ttm.pci-api.v3 > > Konrad Rzeszutek Wilk (5): > ttm: Introduce a placeholder for DMA (bus) addresses. > tm: Utilize the dma_addr_t array for pages that are to in DMA32 pool. > ttm: Expand (*populate) to support an array of DMA addresses. > radeon/ttm/PCIe: Use dma_addr if TTM has set it. > nouveau/ttm/PCIe: Use dma_addr if TTM has set it. > > And diffstat: > > drivers/gpu/drm/nouveau/nouveau_sgdma.c | 31 +++++++++++++++++++------- > drivers/gpu/drm/radeon/radeon.h | 4 ++- > drivers/gpu/drm/radeon/radeon_gart.c | 36 ++++++++++++++++++++++-------- > drivers/gpu/drm/radeon/radeon_ttm.c | 8 +++++- > drivers/gpu/drm/ttm/ttm_agp_backend.c | 3 +- > drivers/gpu/drm/ttm/ttm_page_alloc.c | 35 +++++++++++++++++++++++++----- > drivers/gpu/drm/ttm/ttm_tt.c | 12 +++++++-- > drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 3 +- > include/drm/ttm/ttm_bo_driver.h | 6 ++++- > include/drm/ttm/ttm_page_alloc.h | 8 +++++- > 10 files changed, 111 insertions(+), 35 deletions(-) > > >