* [PATCH v3 0/6] nouveau/gk20a: RAM device removal & IOMMU support
From: Alexandre Courbot @ 2015-02-17  7:47 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Courbot

Thanks, Ilia, for the v2 review! Here is v3 of this series adding IOMMU support
for GK20A.

Changes since v2:
- Cleaner changes for ltc
- Fixed typos in gk20a instmem IOMMU comments

Changes since v1:
- Add missing else condition in ltc
- Remove extra flags that slipped into nouveau_display.c and nv84_fence.c.

Original cover letter:

Patches 1-3 make the presence of a RAM device optional, and remove GK20A's dummy
RAM driver we were using so far. On chips using shared memory, such a device
can confuse the driver into moving objects when there is no need to, and can
trick user-space into believing it can allocate "video" memory that does not
exist. By making it possible to run Nouveau without a RAM device and
systematically returning errors when VRAM allocations are attempted, we force
user-space to do the right thing and always employ the optimal path.
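To make "systematically returning errors" concrete, here is a minimal user-space sketch of a VRAM allocator that refuses every request when the chip has no RAM device. This mirrors what patch 1 does in nouveau_vram_manager_new(), which returns -ENOMEM when pfb->ram is NULL. The sketch_ram/sketch_fb types and sketch_vram_get() are hypothetical stand-ins, not the nvkm structures, and the simple bump allocation exists only to give the VRAM case something to do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the nvkm structures. */
struct sketch_ram {
	uint64_t size;	/* bytes of dedicated VRAM */
	uint64_t used;	/* bytes handed out so far (bump allocator) */
};

struct sketch_fb {
	struct sketch_ram *ram;	/* NULL on chips without VRAM, e.g. GK20A */
};

/*
 * With no RAM device every request fails with -ENOMEM, so callers can
 * never obtain "video" memory that does not exist and must fall back
 * to system memory instead.
 */
static int sketch_vram_get(struct sketch_fb *pfb, uint64_t bytes,
			   uint64_t *offset)
{
	assert(offset != NULL);

	if (!pfb->ram)
		return -ENOMEM;
	if (bytes > pfb->ram->size - pfb->ram->used)
		return -ENOMEM;

	*offset = pfb->ram->used;
	pfb->ram->used += bytes;
	return 0;
}
```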

Contiguous memory allocation for GK20A is now handled directly by a custom
instmem driver.

The remaining patches are not related to the RAM device removal, but since
they touch code that patch 2 moved, I took the liberty of including them in
this series.

Patch 4 is a small improvement to GK20A's instmem implementation. It
suppresses the permanent and unneeded CPU mapping created by the DMA API,
freeing up some CPU virtual address space.

Patches 5 and 6 implement initial IOMMU support for GK20A. On top of the GPU
MMU, GK20A also has an independent IOMMU that stands between the GPU and the
system RAM. Whether RAM accesses are performed directly or using the IOMMU is
determined by bit 34 of each address.
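A minimal sketch of how that bit could be handled, assuming only what is stated above: bit 34 of an address selects whether the access goes through the IOMMU. The GK20A_IOMMU_BIT macro and the helper names are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Per the cover letter, bit 34 selects IOMMU vs. direct RAM access. */
#define GK20A_IOMMU_BIT 34

/* Hypothetical helper: tag an IOMMU-space address for the GPU. */
static inline uint64_t gk20a_iommu_addr(uint64_t iova)
{
	return iova | (1ULL << GK20A_IOMMU_BIT);
}

/* Hypothetical helper: does this address go through the IOMMU? */
static inline bool gk20a_addr_uses_iommu(uint64_t addr)
{
	return (addr >> GK20A_IOMMU_BIT) & 1;
}

/* Hypothetical helper: recover the untagged address. */
static inline uint64_t gk20a_addr_strip(uint64_t addr)
{
	return addr & ~(1ULL << GK20A_IOMMU_BIT);
}
```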

If an IOMMU is present, GK20A's instmem takes advantage of it to make unrelated
pages of memory appear contiguous to the GPU instead of using the DMA API.
Another benefit of the IOMMU is that a custom VM implementation can use it
to make GPU objects allocated via TTM appear contiguous in the IOMMU space,
allowing us to maximize the use of large pages and improve performance, but that
part will come once the basic support is agreed on and merged.
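The toy model below illustrates what "making unrelated pages appear contiguous" means: scattered physical pages are entered into a page table at consecutive I/O virtual addresses (IOVAs), so the device sees one linear range. It is a user-space simulation under simplifying assumptions (4 KiB pages, a flat 16-entry table, IOVAs handed out from 0 upward); the real driver would go through the kernel IOMMU API, and none of these names come from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_MAX_PAGES  16

/* Toy IOMMU: slot i translates IOVA page i to an arbitrary physical page. */
struct toy_iommu {
	uint64_t phys[SKETCH_MAX_PAGES];
	size_t npages;
};

/*
 * Map scattered physical pages back-to-back, right after any pages
 * already mapped. Returns the IOVA of the first new page, or UINT64_MAX
 * if the table is full.
 */
static uint64_t toy_iommu_map(struct toy_iommu *mmu, const uint64_t *pages,
			      size_t n)
{
	uint64_t iova = (uint64_t)mmu->npages << SKETCH_PAGE_SHIFT;
	size_t i;

	if (n > SKETCH_MAX_PAGES - mmu->npages)
		return UINT64_MAX;
	for (i = 0; i < n; i++) {
		/* every physical page must be page-aligned */
		assert((pages[i] & (SKETCH_PAGE_SIZE - 1)) == 0);
		mmu->phys[mmu->npages++] = pages[i];
	}
	return iova;
}

/* Translate an IOVA the way the device would see it through the IOMMU. */
static uint64_t toy_iommu_translate(const struct toy_iommu *mmu, uint64_t iova)
{
	size_t idx = iova >> SKETCH_PAGE_SHIFT;

	assert(idx < mmu->npages);
	return mmu->phys[idx] | (iova & (SKETCH_PAGE_SIZE - 1));
}
```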

All in all this series should be largely unintrusive for non-Tegra GPUs, with
only patch 1 changing common code parts, in a way that looks safe.

Alexandre Courbot (6):
  make RAM device optional
  instmem/gk20a: move memory allocation to instmem
  gk20a: remove RAM device
  instmem/gk20a: use DMA attributes
  platform: probe IOMMU if present
  instmem/gk20a: add IOMMU support

 drm/nouveau/include/nvkm/subdev/instmem.h |   1 +
 drm/nouveau/nouveau_display.c             |   8 +-
 drm/nouveau/nouveau_platform.c            |  75 ++++-
 drm/nouveau/nouveau_platform.h            |  18 ++
 drm/nouveau/nouveau_ttm.c                 |   3 +
 drm/nouveau/nv84_fence.c                  |  14 +-
 drm/nouveau/nvkm/engine/device/base.c     |   9 +-
 drm/nouveau/nvkm/engine/device/gk104.c    |   2 +-
 drm/nouveau/nvkm/subdev/clk/base.c        |   2 +-
 drm/nouveau/nvkm/subdev/fb/Kbuild         |   1 -
 drm/nouveau/nvkm/subdev/fb/base.c         |  26 +-
 drm/nouveau/nvkm/subdev/fb/gk20a.c        |   1 -
 drm/nouveau/nvkm/subdev/fb/priv.h         |   1 -
 drm/nouveau/nvkm/subdev/fb/ramgk20a.c     | 149 ----------
 drm/nouveau/nvkm/subdev/instmem/Kbuild    |   1 +
 drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 438 ++++++++++++++++++++++++++++++
 drm/nouveau/nvkm/subdev/ltc/gf100.c       |  10 +-
 lib/include/nvif/os.h                     |  63 +++++
 18 files changed, 651 insertions(+), 171 deletions(-)
 delete mode 100644 drm/nouveau/nvkm/subdev/fb/ramgk20a.c
 create mode 100644 drm/nouveau/nvkm/subdev/instmem/gk20a.c

-- 
2.3.0

* [PATCH v3 1/6] make RAM device optional
From: Alexandre Courbot @ 2015-02-17  7:47 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Having a RAM device does not make sense for chips like GK20A which have
no dedicated video memory. The dummy RAM device that we used so far
works as a temporary band-aid, but in the long term it is desirable for
the driver to be able to work without any kind of VRAM.

This patch adds a few conditionals in places where a RAM device was
assumed to be present and allows some more objects to be allocated from
the TT domain, allowing Nouveau to handle GPUs for which
pfb->ram == NULL.
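The placement decision that this patch repeats in nouveau_display_dumb_create() and nv84_fence_create() can be summarized as follows. This is a sketch with hypothetical flag values standing in for NOUVEAU_GEM_DOMAIN_* and TTM_PL_FLAG_*; the cpu_polled parameter is an illustrative way to express the fence case, where the patch additionally requests TTM_PL_FLAG_UNCACHED so the CPU does not wait on stale cache lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, not the real NOUVEAU_GEM_DOMAIN_* / TTM_PL_FLAG_*. */
#define SKETCH_DOMAIN_VRAM   (1u << 0)
#define SKETCH_DOMAIN_TT     (1u << 1)
#define SKETCH_FLAG_UNCACHED (1u << 2)

/*
 * VRAM when the chip has a RAM device, system memory (TT/GART) otherwise.
 * Buffers the CPU polls, such as fences, must also be uncached in TT,
 * otherwise the CPU could keep reading an old cached value.
 */
static uint32_t pick_domain(bool has_ram, bool cpu_polled)
{
	if (has_ram)
		return SKETCH_DOMAIN_VRAM;
	return cpu_polled ? (SKETCH_DOMAIN_TT | SKETCH_FLAG_UNCACHED)
			  : SKETCH_DOMAIN_TT;
}
```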

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drm/nouveau/nouveau_display.c         |  8 +++++++-
 drm/nouveau/nouveau_ttm.c             |  3 +++
 drm/nouveau/nv84_fence.c              | 14 +++++++++++---
 drm/nouveau/nvkm/engine/device/base.c |  9 ++++++---
 drm/nouveau/nvkm/subdev/clk/base.c    |  2 +-
 drm/nouveau/nvkm/subdev/fb/base.c     | 26 ++++++++++++++++++--------
 drm/nouveau/nvkm/subdev/ltc/gf100.c   | 10 +++++++++-
 7 files changed, 55 insertions(+), 17 deletions(-)

diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c
index 860b0e2d4181..68ee0af22eea 100644
--- a/drm/nouveau/nouveau_display.c
+++ b/drm/nouveau/nouveau_display.c
@@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
 			    struct drm_mode_create_dumb *args)
 {
 	struct nouveau_bo *bo;
+	uint32_t domain;
 	int ret;
 
 	args->pitch = roundup(args->width * (args->bpp / 8), 256);
 	args->size = args->pitch * args->height;
 	args->size = roundup(args->size, PAGE_SIZE);
 
-	ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo);
+	if (nvxx_fb(&nouveau_drm(dev)->device)->ram)
+		domain = NOUVEAU_GEM_DOMAIN_VRAM;
+	else
+		domain = NOUVEAU_GEM_DOMAIN_GART;
+
+	ret = nouveau_gem_new(dev, args->size, 0, domain, 0, 0, &bo);
 	if (ret)
 		return ret;
 
diff --git a/drm/nouveau/nouveau_ttm.c b/drm/nouveau/nouveau_ttm.c
index 273e50110ec3..a3c2e9b4d937 100644
--- a/drm/nouveau/nouveau_ttm.c
+++ b/drm/nouveau/nouveau_ttm.c
@@ -85,6 +85,9 @@ nouveau_vram_manager_new(struct ttm_mem_type_manager *man,
 	if (nvbo->tile_flags & NOUVEAU_GEM_TILE_NONCONTIG)
 		size_nc = 1 << nvbo->page_shift;
 
+	if (!pfb->ram)
+		return -ENOMEM;
+
 	ret = pfb->ram->get(pfb, mem->num_pages << PAGE_SHIFT,
 			   mem->page_alignment << PAGE_SHIFT, size_nc,
 			   (nvbo->tile_flags >> 8) & 0x3ff, &node);
diff --git a/drm/nouveau/nv84_fence.c b/drm/nouveau/nv84_fence.c
index bf429cabbaa8..b981f85de888 100644
--- a/drm/nouveau/nv84_fence.c
+++ b/drm/nouveau/nv84_fence.c
@@ -215,6 +215,7 @@ nv84_fence_create(struct nouveau_drm *drm)
 {
 	struct nvkm_fifo *pfifo = nvxx_fifo(&drm->device);
 	struct nv84_fence_priv *priv;
+	u32 domain;
 	int ret;
 
 	priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
@@ -231,10 +232,17 @@ nv84_fence_create(struct nouveau_drm *drm)
 	priv->base.context_base = fence_context_alloc(priv->base.contexts);
 	priv->base.uevent = true;
 
-	ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0,
-			     TTM_PL_FLAG_VRAM, 0, 0, NULL, NULL, &priv->bo);
+	domain = nvxx_fb(&drm->device)->ram ?
+			 TTM_PL_FLAG_VRAM :
+			 /*
+			  * fences created in TT must be coherent or we will
+			  * wait on old CPU cache values!
+			  */
+			 TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED;
+	ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0, domain, 0,
+			     0, NULL, NULL, &priv->bo);
 	if (ret == 0) {
-		ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false);
+		ret = nouveau_bo_pin(priv->bo, domain, false);
 		if (ret == 0) {
 			ret = nouveau_bo_map(priv->bo);
 			if (ret)
diff --git a/drm/nouveau/nvkm/engine/device/base.c b/drm/nouveau/nvkm/engine/device/base.c
index 6efa8f38ff54..48f8537e83ac 100644
--- a/drm/nouveau/nvkm/engine/device/base.c
+++ b/drm/nouveau/nvkm/engine/device/base.c
@@ -139,9 +139,12 @@ nvkm_devobj_info(struct nvkm_object *object, void *data, u32 size)
 
 	args->v0.chipset  = device->chipset;
 	args->v0.revision = device->chiprev;
-	if (pfb)  args->v0.ram_size = args->v0.ram_user = pfb->ram->size;
-	else      args->v0.ram_size = args->v0.ram_user = 0;
-	if (imem) args->v0.ram_user = args->v0.ram_user - imem->reserved;
+	if (pfb  && pfb->ram)
+		args->v0.ram_size = args->v0.ram_user = pfb->ram->size;
+	else
+		args->v0.ram_size = args->v0.ram_user = 0;
+	if (imem)
+		args->v0.ram_user = args->v0.ram_user - imem->reserved;
 	return 0;
 }
 
diff --git a/drm/nouveau/nvkm/subdev/clk/base.c b/drm/nouveau/nvkm/subdev/clk/base.c
index b24a9cc04b73..39a83d82e0cd 100644
--- a/drm/nouveau/nvkm/subdev/clk/base.c
+++ b/drm/nouveau/nvkm/subdev/clk/base.c
@@ -184,7 +184,7 @@ nvkm_pstate_prog(struct nvkm_clk *clk, int pstatei)
 	nv_debug(clk, "setting performance state %d\n", pstatei);
 	clk->pstate = pstatei;
 
-	if (pfb->ram->calc) {
+	if (pfb->ram && pfb->ram->calc) {
 		int khz = pstate->base.domain[nv_clk_src_mem];
 		do {
 			ret = pfb->ram->calc(pfb, khz);
diff --git a/drm/nouveau/nvkm/subdev/fb/base.c b/drm/nouveau/nvkm/subdev/fb/base.c
index 16589fa613cd..61fde43dab71 100644
--- a/drm/nouveau/nvkm/subdev/fb/base.c
+++ b/drm/nouveau/nvkm/subdev/fb/base.c
@@ -55,9 +55,11 @@ _nvkm_fb_fini(struct nvkm_object *object, bool suspend)
 	struct nvkm_fb *pfb = (void *)object;
 	int ret;
 
-	ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend);
-	if (ret && suspend)
-		return ret;
+	if (pfb->ram) {
+		ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend);
+		if (ret && suspend)
+			return ret;
+	}
 
 	return nvkm_subdev_fini(&pfb->base, suspend);
 }
@@ -72,9 +74,11 @@ _nvkm_fb_init(struct nvkm_object *object)
 	if (ret)
 		return ret;
 
-	ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram));
-	if (ret)
-		return ret;
+	if (pfb->ram) {
+		ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram));
+		if (ret)
+			return ret;
+	}
 
 	for (i = 0; i < pfb->tile.regions; i++)
 		pfb->tile.prog(pfb, i, &pfb->tile.region[i]);
@@ -91,9 +95,12 @@ _nvkm_fb_dtor(struct nvkm_object *object)
 	for (i = 0; i < pfb->tile.regions; i++)
 		pfb->tile.fini(pfb, i, &pfb->tile.region[i]);
 	nvkm_mm_fini(&pfb->tags);
-	nvkm_mm_fini(&pfb->vram);
 
-	nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram);
+	if (pfb->ram) {
+		nvkm_mm_fini(&pfb->vram);
+		nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram);
+	}
+
 	nvkm_subdev_destroy(&pfb->base);
 }
 
@@ -127,6 +134,9 @@ nvkm_fb_create_(struct nvkm_object *parent, struct nvkm_object *engine,
 
 	pfb->memtype_valid = impl->memtype;
 
+	if (!impl->ram)
+		return 0;
+
 	ret = nvkm_object_ctor(nv_object(pfb), NULL, impl->ram, NULL, 0, &ram);
 	if (ret) {
 		nv_fatal(pfb, "error detecting memory configuration!!\n");
diff --git a/drm/nouveau/nvkm/subdev/ltc/gf100.c b/drm/nouveau/nvkm/subdev/ltc/gf100.c
index 8e7cc6200d60..7fb5ea0314cb 100644
--- a/drm/nouveau/nvkm/subdev/ltc/gf100.c
+++ b/drm/nouveau/nvkm/subdev/ltc/gf100.c
@@ -136,7 +136,8 @@ gf100_ltc_dtor(struct nvkm_object *object)
 	struct nvkm_ltc_priv *priv = (void *)object;
 
 	nvkm_mm_fini(&priv->tags);
-	nvkm_mm_free(&pfb->vram, &priv->tag_ram);
+	if (pfb->ram)
+		nvkm_mm_free(&pfb->vram, &priv->tag_ram);
 
 	nvkm_ltc_destroy(priv);
 }
@@ -149,6 +150,12 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
 	u32 tag_size, tag_margin, tag_align;
 	int ret;
 
+	/* No VRAM, no tags for now. */
+	if (!pfb->ram) {
+		priv->num_tags = 0;
+		goto mm_init;
+	}
+
 	/* tags for 1/4 of VRAM should be enough (8192/4 per GiB of VRAM) */
 	priv->num_tags = (pfb->ram->size >> 17) / 4;
 	if (priv->num_tags > (1 << 17))
@@ -183,6 +190,7 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
 		priv->tag_base = tag_base;
 	}
 
+mm_init:
 	ret = nvkm_mm_init(&priv->tags, 0, priv->num_tags, 1);
 	return ret;
 }
-- 
2.3.0

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

* [PATCH v3 2/6] instmem/gk20a: move memory allocation to instmem
From: Alexandre Courbot @ 2015-02-17  7:48 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

GK20A does not have dedicated RAM, thus having a RAM device for it does
not make sense. Move the contiguous physical memory allocation to
instmem.
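For reference, the size and alignment handling in the new gk20a_instobj_ctor() boils down to the arithmetic below. Both values are rounded up to a multiple of 4 KiB with a one-page minimum, and after allocation the DMA handle is checked against the alignment with a mask (the driver only warns on a mismatch). This is a user-space sketch with hypothetical function names; the mask test is only meaningful for power-of-two alignments, which the sketch asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE 4096u

/* Round up to a whole number of 4 KiB pages, with a one-page minimum. */
static uint32_t round_to_pages(uint32_t v)
{
	uint32_t r = (v + SKETCH_PAGE - 1) & ~(SKETCH_PAGE - 1);

	return r < SKETCH_PAGE ? SKETCH_PAGE : r;
}

/* Mask-based alignment check, as done on the DMA handle after allocation. */
static bool is_aligned(uint64_t addr, uint32_t align)
{
	/* the mask trick only works for power-of-two alignments */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr & (align - 1)) == 0;
}
```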

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drm/nouveau/include/nvkm/subdev/instmem.h |   1 +
 drm/nouveau/nvkm/engine/device/gk104.c    |   2 +-
 drm/nouveau/nvkm/subdev/fb/ramgk20a.c     |  86 +-----------
 drm/nouveau/nvkm/subdev/instmem/Kbuild    |   1 +
 drm/nouveau/nvkm/subdev/instmem/gk20a.c   | 212 ++++++++++++++++++++++++++++++
 5 files changed, 217 insertions(+), 85 deletions(-)
 create mode 100644 drm/nouveau/nvkm/subdev/instmem/gk20a.c

diff --git a/drm/nouveau/include/nvkm/subdev/instmem.h b/drm/nouveau/include/nvkm/subdev/instmem.h
index d104c1aac807..1bcb763cfca0 100644
--- a/drm/nouveau/include/nvkm/subdev/instmem.h
+++ b/drm/nouveau/include/nvkm/subdev/instmem.h
@@ -45,4 +45,5 @@ nvkm_instmem(void *obj)
 extern struct nvkm_oclass *nv04_instmem_oclass;
 extern struct nvkm_oclass *nv40_instmem_oclass;
 extern struct nvkm_oclass *nv50_instmem_oclass;
+extern struct nvkm_oclass *gk20a_instmem_oclass;
 #endif
diff --git a/drm/nouveau/nvkm/engine/device/gk104.c b/drm/nouveau/nvkm/engine/device/gk104.c
index bf5893458a47..8f266a9a34a6 100644
--- a/drm/nouveau/nvkm/engine/device/gk104.c
+++ b/drm/nouveau/nvkm/engine/device/gk104.c
@@ -171,7 +171,7 @@ gk104_identify(struct nvkm_device *device)
 		device->oclass[NVDEV_SUBDEV_FB     ] =  gk20a_fb_oclass;
 		device->oclass[NVDEV_SUBDEV_LTC    ] =  gk104_ltc_oclass;
 		device->oclass[NVDEV_SUBDEV_IBUS   ] = &gk20a_ibus_oclass;
-		device->oclass[NVDEV_SUBDEV_INSTMEM] = nv50_instmem_oclass;
+		device->oclass[NVDEV_SUBDEV_INSTMEM] = gk20a_instmem_oclass;
 		device->oclass[NVDEV_SUBDEV_MMU    ] = &gf100_mmu_oclass;
 		device->oclass[NVDEV_SUBDEV_BAR    ] = &gk20a_bar_oclass;
 		device->oclass[NVDEV_ENGINE_DMAOBJ ] =  gf110_dmaeng_oclass;
diff --git a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c
index 5f30db140b47..60d8e1cead61 100644
--- a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c
+++ b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c
@@ -23,99 +23,17 @@
 
 #include <core/device.h>
 
-struct gk20a_mem {
-	struct nvkm_mem base;
-	void *cpuaddr;
-	dma_addr_t handle;
-};
-#define to_gk20a_mem(m) container_of(m, struct gk20a_mem, base)
-
 static void
 gk20a_ram_put(struct nvkm_fb *pfb, struct nvkm_mem **pmem)
 {
-	struct device *dev = nv_device_base(nv_device(pfb));
-	struct gk20a_mem *mem = to_gk20a_mem(*pmem);
-
-	*pmem = NULL;
-	if (unlikely(mem == NULL))
-		return;
-
-	if (likely(mem->cpuaddr))
-		dma_free_coherent(dev, mem->base.size << PAGE_SHIFT,
-				  mem->cpuaddr, mem->handle);
-
-	kfree(mem->base.pages);
-	kfree(mem);
+	BUG();
 }
 
 static int
 gk20a_ram_get(struct nvkm_fb *pfb, u64 size, u32 align, u32 ncmin,
 	     u32 memtype, struct nvkm_mem **pmem)
 {
-	struct device *dev = nv_device_base(nv_device(pfb));
-	struct gk20a_mem *mem;
-	u32 type = memtype & 0xff;
-	u32 npages, order;
-	int i;
-
-	nv_debug(pfb, "%s: size: %llx align: %x, ncmin: %x\n", __func__, size,
-		 align, ncmin);
-
-	npages = size >> PAGE_SHIFT;
-	if (npages == 0)
-		npages = 1;
-
-	if (align == 0)
-		align = PAGE_SIZE;
-	align >>= PAGE_SHIFT;
-
-	/* round alignment to the next power of 2, if needed */
-	order = fls(align);
-	if ((align & (align - 1)) == 0)
-		order--;
-	align = BIT(order);
-
-	/* ensure returned address is correctly aligned */
-	npages = max(align, npages);
-
-	mem = kzalloc(sizeof(*mem), GFP_KERNEL);
-	if (!mem)
-		return -ENOMEM;
-
-	mem->base.size = npages;
-	mem->base.memtype = type;
-
-	mem->base.pages = kzalloc(sizeof(dma_addr_t) * npages, GFP_KERNEL);
-	if (!mem->base.pages) {
-		kfree(mem);
-		return -ENOMEM;
-	}
-
-	*pmem = &mem->base;
-
-	mem->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
-					  &mem->handle, GFP_KERNEL);
-	if (!mem->cpuaddr) {
-		nv_error(pfb, "%s: cannot allocate memory!\n", __func__);
-		gk20a_ram_put(pfb, pmem);
-		return -ENOMEM;
-	}
-
-	align <<= PAGE_SHIFT;
-
-	/* alignment check */
-	if (unlikely(mem->handle & (align - 1)))
-		nv_warn(pfb, "memory not aligned as requested: %pad (0x%x)\n",
-			&mem->handle, align);
-
-	nv_debug(pfb, "alloc size: 0x%x, align: 0x%x, paddr: %pad, vaddr: %p\n",
-		 npages << PAGE_SHIFT, align, &mem->handle, mem->cpuaddr);
-
-	for (i = 0; i < npages; i++)
-		mem->base.pages[i] = mem->handle + (PAGE_SIZE * i);
-
-	mem->base.offset = (u64)mem->base.pages[0];
-	return 0;
+	BUG();
 }
 
 static int
diff --git a/drm/nouveau/nvkm/subdev/instmem/Kbuild b/drm/nouveau/nvkm/subdev/instmem/Kbuild
index e6f35abe7879..13bb7fc0a569 100644
--- a/drm/nouveau/nvkm/subdev/instmem/Kbuild
+++ b/drm/nouveau/nvkm/subdev/instmem/Kbuild
@@ -2,3 +2,4 @@ nvkm-y += nvkm/subdev/instmem/base.o
 nvkm-y += nvkm/subdev/instmem/nv04.o
 nvkm-y += nvkm/subdev/instmem/nv40.o
 nvkm-y += nvkm/subdev/instmem/nv50.o
+nvkm-y += nvkm/subdev/instmem/gk20a.o
diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
new file mode 100644
index 000000000000..6176f5072496
--- /dev/null
+++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -0,0 +1,212 @@
+/*
+ * Copyright (c) 2015, NVIDIA CORPORATION. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+
+#include <subdev/fb.h>
+#include <core/mm.h>
+#include <core/device.h>
+
+#include "priv.h"
+
+struct gk20a_instobj_priv {
+	struct nvkm_instobj base;
+	/* Must be second member here - see nouveau_gpuobj_map_vm() */
+	struct nvkm_mem *mem;
+	/* Pointed by mem */
+	struct nvkm_mem _mem;
+	void *cpuaddr;
+	dma_addr_t handle;
+	struct nvkm_mm_node r;
+};
+
+struct gk20a_instmem_priv {
+	struct nvkm_instmem base;
+	spinlock_t lock;
+	u64 addr;
+};
+
+static u32
+gk20a_instobj_rd32(struct nvkm_object *object, u64 offset)
+{
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(object);
+	struct gk20a_instobj_priv *node = (void *)object;
+	unsigned long flags;
+	u64 base = (node->mem->offset + offset) & 0xffffff00000ULL;
+	u64 addr = (node->mem->offset + offset) & 0x000000fffffULL;
+	u32 data;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	if (unlikely(priv->addr != base)) {
+		nv_wr32(priv, 0x001700, base >> 16);
+		priv->addr = base;
+	}
+	data = nv_rd32(priv, 0x700000 + addr);
+	spin_unlock_irqrestore(&priv->lock, flags);
+	return data;
+}
+
+static void
+gk20a_instobj_wr32(struct nvkm_object *object, u64 offset, u32 data)
+{
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(object);
+	struct gk20a_instobj_priv *node = (void *)object;
+	unsigned long flags;
+	u64 base = (node->mem->offset + offset) & 0xffffff00000ULL;
+	u64 addr = (node->mem->offset + offset) & 0x000000fffffULL;
+
+	spin_lock_irqsave(&priv->lock, flags);
+	if (unlikely(priv->addr != base)) {
+		nv_wr32(priv, 0x001700, base >> 16);
+		priv->addr = base;
+	}
+	nv_wr32(priv, 0x700000 + addr, data);
+	spin_unlock_irqrestore(&priv->lock, flags);
+}
+
+static void
+gk20a_instobj_dtor(struct nvkm_object *object)
+{
+	struct gk20a_instobj_priv *node = (void *)object;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node);
+	struct device *dev = nv_device_base(nv_device(priv));
+
+	if (unlikely(!node->handle))
+		return;
+
+	dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
+			  node->handle);
+
+	nvkm_instobj_destroy(&node->base);
+}
+
+static int
+gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
+		   struct nvkm_oclass *oclass, void *data, u32 _size,
+		   struct nvkm_object **pobject)
+{
+	struct nvkm_instobj_args *args = data;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent);
+	struct device *dev = nv_device_base(nv_device(priv));
+	struct gk20a_instobj_priv *node;
+	u32 size, align;
+	u32 npages;
+	int ret;
+
+	nv_debug(parent, "%s: size: %x align: %x\n", __func__,
+		 args->size, args->align);
+
+	size  = max((args->size  + 4095) & ~4095, (u32)4096);
+	align = max((args->align + 4095) & ~4095, (u32)4096);
+
+	npages = size >> PAGE_SHIFT;
+
+	ret = nvkm_instobj_create_(parent, engine, oclass, sizeof(*node),
+				      (void **)&node);
+	*pobject = nv_object(node);
+	if (ret)
+		return ret;
+
+	node->mem = &node->_mem;
+
+	node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
+					   &node->handle, GFP_KERNEL);
+	if (!node->cpuaddr) {
+		nv_error(priv, "cannot allocate DMA memory\n");
+		return -ENOMEM;
+	}
+
+	/* alignment check */
+	if (unlikely(node->handle & (align - 1)))
+		nv_warn(priv, "memory not aligned as requested: %pad (0x%x)\n",
+			&node->handle, align);
+
+	node->mem->offset = node->handle;
+	node->mem->size = size >> 12;
+	node->mem->memtype = 0;
+	node->mem->page_shift = 12;
+	INIT_LIST_HEAD(&node->mem->regions);
+
+	node->r.type = 12;
+	node->r.offset = node->handle >> 12;
+	node->r.length = npages;
+	list_add_tail(&node->r.rl_entry, &node->mem->regions);
+
+	node->base.addr = node->mem->offset;
+	node->base.size = size;
+
+	nv_debug(parent, "alloc size: 0x%x, align: 0x%x, gaddr: 0x%llx\n",
+		 size, align, node->mem->offset);
+
+	return 0;
+}
+
+static struct nvkm_instobj_impl
+gk20a_instobj_oclass = {
+	.base.ofuncs = &(struct nvkm_ofuncs) {
+		.ctor = gk20a_instobj_ctor,
+		.dtor = gk20a_instobj_dtor,
+		.init = _nvkm_instobj_init,
+		.fini = _nvkm_instobj_fini,
+		.rd32 = gk20a_instobj_rd32,
+		.wr32 = gk20a_instobj_wr32,
+	},
+};
+
+
+
+static int
+gk20a_instmem_fini(struct nvkm_object *object, bool suspend)
+{
+	struct gk20a_instmem_priv *priv = (void *)object;
+	priv->addr = ~0ULL;
+	return nvkm_instmem_fini(&priv->base, suspend);
+}
+
+static int
+gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
+		   struct nvkm_oclass *oclass, void *data, u32 size,
+		   struct nvkm_object **pobject)
+{
+	struct gk20a_instmem_priv *priv;
+	int ret;
+
+	ret = nvkm_instmem_create(parent, engine, oclass, &priv);
+	*pobject = nv_object(priv);
+	if (ret)
+		return ret;
+
+	spin_lock_init(&priv->lock);
+
+	return 0;
+}
+
+struct nvkm_oclass *
+gk20a_instmem_oclass = &(struct nvkm_instmem_impl) {
+	.base.handle = NV_SUBDEV(INSTMEM, 0xea),
+	.base.ofuncs = &(struct nvkm_ofuncs) {
+		.ctor = gk20a_instmem_ctor,
+		.dtor = _nvkm_instmem_dtor,
+		.init = _nvkm_instmem_init,
+		.fini = gk20a_instmem_fini,
+	},
+	.instobj = &gk20a_instobj_oclass.base,
+}.base;
+
-- 
2.3.0

* [PATCH v3 3/6] gk20a: remove RAM device
From: Alexandre Courbot @ 2015-02-17  7:48 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Courbot

Now that Nouveau can operate even when there is no RAM device, remove
the dummy one used by GK20A.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drm/nouveau/nvkm/subdev/fb/Kbuild     |  1 -
 drm/nouveau/nvkm/subdev/fb/gk20a.c    |  1 -
 drm/nouveau/nvkm/subdev/fb/priv.h     |  1 -
 drm/nouveau/nvkm/subdev/fb/ramgk20a.c | 67 -----------------------------------
 4 files changed, 70 deletions(-)
 delete mode 100644 drm/nouveau/nvkm/subdev/fb/ramgk20a.c

diff --git a/drm/nouveau/nvkm/subdev/fb/Kbuild b/drm/nouveau/nvkm/subdev/fb/Kbuild
index 904d601e8a50..d6be4c6c5408 100644
--- a/drm/nouveau/nvkm/subdev/fb/Kbuild
+++ b/drm/nouveau/nvkm/subdev/fb/Kbuild
@@ -37,7 +37,6 @@ nvkm-y += nvkm/subdev/fb/ramgt215.o
 nvkm-y += nvkm/subdev/fb/rammcp77.o
 nvkm-y += nvkm/subdev/fb/ramgf100.o
 nvkm-y += nvkm/subdev/fb/ramgk104.o
-nvkm-y += nvkm/subdev/fb/ramgk20a.o
 nvkm-y += nvkm/subdev/fb/ramgm107.o
 nvkm-y += nvkm/subdev/fb/sddr2.o
 nvkm-y += nvkm/subdev/fb/sddr3.o
diff --git a/drm/nouveau/nvkm/subdev/fb/gk20a.c b/drm/nouveau/nvkm/subdev/fb/gk20a.c
index 6762847c05e8..a5d7857d3898 100644
--- a/drm/nouveau/nvkm/subdev/fb/gk20a.c
+++ b/drm/nouveau/nvkm/subdev/fb/gk20a.c
@@ -65,5 +65,4 @@ gk20a_fb_oclass = &(struct nvkm_fb_impl) {
 		.fini = _nvkm_fb_fini,
 	},
 	.memtype = gf100_fb_memtype_valid,
-	.ram = &gk20a_ram_oclass,
 }.base;
diff --git a/drm/nouveau/nvkm/subdev/fb/priv.h b/drm/nouveau/nvkm/subdev/fb/priv.h
index d82da02daa1f..485c4b64819a 100644
--- a/drm/nouveau/nvkm/subdev/fb/priv.h
+++ b/drm/nouveau/nvkm/subdev/fb/priv.h
@@ -32,7 +32,6 @@ extern struct nvkm_oclass gt215_ram_oclass;
 extern struct nvkm_oclass mcp77_ram_oclass;
 extern struct nvkm_oclass gf100_ram_oclass;
 extern struct nvkm_oclass gk104_ram_oclass;
-extern struct nvkm_oclass gk20a_ram_oclass;
 extern struct nvkm_oclass gm107_ram_oclass;
 
 int nvkm_sddr2_calc(struct nvkm_ram *ram);
diff --git a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c b/drm/nouveau/nvkm/subdev/fb/ramgk20a.c
deleted file mode 100644
index 60d8e1cead61..000000000000
--- a/drm/nouveau/nvkm/subdev/fb/ramgk20a.c
+++ /dev/null
@@ -1,67 +0,0 @@
-/*
- * Copyright (c) 2014, NVIDIA CORPORATION. All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
- * DEALINGS IN THE SOFTWARE.
- */
-#include "priv.h"
-
-#include <core/device.h>
-
-static void
-gk20a_ram_put(struct nvkm_fb *pfb, struct nvkm_mem **pmem)
-{
-	BUG();
-}
-
-static int
-gk20a_ram_get(struct nvkm_fb *pfb, u64 size, u32 align, u32 ncmin,
-	     u32 memtype, struct nvkm_mem **pmem)
-{
-	BUG();
-}
-
-static int
-gk20a_ram_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
-	       struct nvkm_oclass *oclass, void *data, u32 datasize,
-	       struct nvkm_object **pobject)
-{
-	struct nvkm_ram *ram;
-	int ret;
-
-	ret = nvkm_ram_create(parent, engine, oclass, &ram);
-	*pobject = nv_object(ram);
-	if (ret)
-		return ret;
-	ram->type = NV_MEM_TYPE_STOLEN;
-	ram->size = get_num_physpages() << PAGE_SHIFT;
-
-	ram->get = gk20a_ram_get;
-	ram->put = gk20a_ram_put;
-	return 0;
-}
-
-struct nvkm_oclass
-gk20a_ram_oclass = {
-	.ofuncs = &(struct nvkm_ofuncs) {
-		.ctor = gk20a_ram_ctor,
-		.dtor = _nvkm_ram_dtor,
-		.init = _nvkm_ram_init,
-		.fini = _nvkm_ram_fini,
-	},
-};
-- 
2.3.0

* [PATCH v3 4/6] instmem/gk20a: use DMA attributes
From: Alexandre Courbot @ 2015-02-17  7:48 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

instmem for GK20A is allocated using dma_alloc_coherent(), which
provides us with a coherent CPU mapping that we never use because
instmem objects are accessed through PRAMIN. Switch to
dma_alloc_attrs(), which gives us the option to dismiss that CPU mapping
and free up some CPU virtual address space.
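As the gk20a_instobj_rd32()/wr32() helpers added in patch 2 suggest, PRAMIN access goes through a 1 MiB window: the upper address bits (mask 0xffffff00000, shifted right by 16) are written to register 0x001700, and the low 20 bits select an offset from MMIO 0x700000. The driver caches the last programmed base in priv->addr so the window register is only rewritten when an access crosses into another 1 MiB region. The user-space sketch below reproduces that arithmetic, with hypothetical names and a counter in place of the real register write.

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirror gk20a_instobj_rd32()/wr32() in this series. */
#define PRAMIN_BAR_BASE  0x700000u
#define PRAMIN_BASE_MASK 0xffffff00000ULL /* window selector bits */
#define PRAMIN_OFF_MASK  0x000000fffffULL /* offset inside 1 MiB window */

struct pramin_state {
	uint64_t cur_base;		/* last programmed base, ~0 = none */
	unsigned int window_writes;	/* stands in for writes to 0x001700 */
};

/*
 * Return the MMIO offset to access for instance-memory address addr,
 * "reprogramming" the window only when the base changes.
 */
static uint32_t pramin_select(struct pramin_state *s, uint64_t addr)
{
	uint64_t base = addr & PRAMIN_BASE_MASK;

	if (s->cur_base != base) {
		/* real driver: nv_wr32(priv, 0x001700, base >> 16) */
		s->cur_base = base;
		s->window_writes++;
	}
	return PRAMIN_BAR_BASE + (uint32_t)(addr & PRAMIN_OFF_MASK);
}
```

Resetting cur_base to ~0 plays the same role as gk20a_instmem_fini() setting priv->addr = ~0ULL: it forces the next access to reprogram the window.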

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drm/nouveau/nvkm/subdev/instmem/gk20a.c | 24 ++++++++++++++++++++----
 lib/include/nvif/os.h                   | 31 +++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 6176f5072496..4c8af6e3677c 100644
--- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -24,6 +24,10 @@
 #include <core/mm.h>
 #include <core/device.h>
 
+#ifdef __KERNEL__
+#include <linux/dma-attrs.h>
+#endif
+
 #include "priv.h"
 
 struct gk20a_instobj_priv {
@@ -34,6 +38,7 @@ struct gk20a_instobj_priv {
 	struct nvkm_mem _mem;
 	void *cpuaddr;
 	dma_addr_t handle;
+	struct dma_attrs attrs;
 	struct nvkm_mm_node r;
 };
 
@@ -91,8 +96,8 @@ gk20a_instobj_dtor(struct nvkm_object *object)
 	if (unlikely(!node->handle))
 		return;
 
-	dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
-			  node->handle);
+	dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
+		       node->handle, &node->attrs);
 
 	nvkm_instobj_destroy(&node->base);
 }
@@ -126,8 +131,19 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 
 	node->mem = &node->_mem;
 
-	node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
-					   &node->handle, GFP_KERNEL);
+	init_dma_attrs(&node->attrs);
+	/*
+	 * We will access this memory through PRAMIN and thus do not need a
+	 * consistent CPU pointer
+	 */
+	dma_set_attr(DMA_ATTR_NON_CONSISTENT, &node->attrs);
+	dma_set_attr(DMA_ATTR_WEAK_ORDERING, &node->attrs);
+	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &node->attrs);
+	dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &node->attrs);
+
+	node->cpuaddr = dma_alloc_attrs(dev, npages << PAGE_SHIFT,
+					&node->handle, GFP_KERNEL,
+					&node->attrs);
 	if (!node->cpuaddr) {
 		nv_error(priv, "cannot allocate DMA memory\n");
 		return -ENOMEM;
diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h
index f6391a58fd11..b4d307e3ac44 100644
--- a/lib/include/nvif/os.h
+++ b/lib/include/nvif/os.h
@@ -683,6 +683,37 @@ dma_free_coherent(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus)
 {
 }
 
+enum dma_attr {
+	DMA_ATTR_WRITE_BARRIER,
+	DMA_ATTR_WEAK_ORDERING,
+	DMA_ATTR_WRITE_COMBINE,
+	DMA_ATTR_NON_CONSISTENT,
+	DMA_ATTR_NO_KERNEL_MAPPING,
+	DMA_ATTR_SKIP_CPU_SYNC,
+	DMA_ATTR_FORCE_CONTIGUOUS,
+	DMA_ATTR_MAX,
+};
+
+struct dma_attrs {
+};
+
+static inline void init_dma_attrs(struct dma_attrs *attrs) {}
+static inline void dma_set_attr(enum dma_attr attr, struct dma_attrs *attrs) {}
+
+static inline void *
+dma_alloc_attrs(struct device *dev, size_t sz, dma_addr_t *hdl, gfp_t gfp,
+		struct dma_attrs *attrs)
+{
+	return NULL;
+}
+
+static inline void
+dma_free_attrs(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus,
+	       struct dma_attrs *attrs)
+{
+}
+
+
 /******************************************************************************
  * PCI
  *****************************************************************************/
-- 
2.3.0

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

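The four attributes requested in gk20a_instobj_ctor() above are independent flags collected in a small set. As a rough userspace model (an assumption: this mirrors the one-bit-per-attribute bitmap that linux/dma-attrs.h used around this kernel version, but the sketch_* helpers are hypothetical and are not the kernel API), building and querying that set looks like this:

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Attribute list as declared by the nvif/os.h stub in this patch. */
enum dma_attr {
	DMA_ATTR_WRITE_BARRIER,
	DMA_ATTR_WEAK_ORDERING,
	DMA_ATTR_WRITE_COMBINE,
	DMA_ATTR_NON_CONSISTENT,
	DMA_ATTR_NO_KERNEL_MAPPING,
	DMA_ATTR_SKIP_CPU_SYNC,
	DMA_ATTR_FORCE_CONTIGUOUS,
	DMA_ATTR_MAX,
};

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_ATTR_LONGS \
	((DMA_ATTR_MAX + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical model of struct dma_attrs: one bit per attribute. */
struct sketch_dma_attrs {
	unsigned long flags[SKETCH_ATTR_LONGS];
};

static void sketch_init_dma_attrs(struct sketch_dma_attrs *attrs)
{
	memset(attrs->flags, 0, sizeof(attrs->flags));
}

static void sketch_dma_set_attr(enum dma_attr attr,
				struct sketch_dma_attrs *attrs)
{
	assert(attr < DMA_ATTR_MAX);
	attrs->flags[attr / SKETCH_BITS_PER_LONG] |=
		1UL << (attr % SKETCH_BITS_PER_LONG);
}

static int sketch_dma_get_attr(enum dma_attr attr,
			       const struct sketch_dma_attrs *attrs)
{
	assert(attr < DMA_ATTR_MAX);
	return (attrs->flags[attr / SKETCH_BITS_PER_LONG] >>
		(attr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Builds the same attribute set that gk20a_instobj_ctor() requests. */
static void sketch_gk20a_instobj_attrs(struct sketch_dma_attrs *attrs)
{
	sketch_init_dma_attrs(attrs);
	sketch_dma_set_attr(DMA_ATTR_NON_CONSISTENT, attrs);
	sketch_dma_set_attr(DMA_ATTR_WEAK_ORDERING, attrs);
	sketch_dma_set_attr(DMA_ATTR_WRITE_COMBINE, attrs);
	sketch_dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs);
}
```

With DMA_ATTR_NO_KERNEL_MAPPING set, the value returned by dma_alloc_attrs() is best treated as an opaque handle to pass back to dma_free_attrs(), which is all the patch does with node->cpuaddr. Platforms that do not implement a given attribute are expected to ignore it and fall back to default behavior.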

* [PATCH v3 5/6] platform: probe IOMMU if present
From: Alexandre Courbot @ 2015-02-17  7:48 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Courbot

Tegra SoCs have an IOMMU that can be used to present non-contiguous
physical memory as contiguous to the GPU and maximize the use of large
pages in the GPU MMU, leading to performance gains. This patch adds
support for probing such an IOMMU if present and makes its properties
available in the nouveau_platform_gpu structure so that subsystems can
take advantage of it.

Signed-off-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drm/nouveau/nouveau_platform.c | 75 +++++++++++++++++++++++++++++++++++++++++-
 drm/nouveau/nouveau_platform.h | 18 ++++++++++
 lib/include/nvif/os.h          | 32 ++++++++++++++++++
 3 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/drm/nouveau/nouveau_platform.c b/drm/nouveau/nouveau_platform.c
index dc5900bf54ff..3691982452a9 100644
--- a/drm/nouveau/nouveau_platform.c
+++ b/drm/nouveau/nouveau_platform.c
@@ -27,6 +27,7 @@
 #include <linux/of.h>
 #include <linux/reset.h>
 #include <linux/regulator/consumer.h>
+#include <linux/iommu.h>
 #include <soc/tegra/fuse.h>
 #include <soc/tegra/pmc.h>
 
@@ -91,6 +92,71 @@ static int nouveau_platform_power_down(struct nouveau_platform_gpu *gpu)
 	return 0;
 }
 
+static void nouveau_platform_probe_iommu(struct device *dev,
+					 struct nouveau_platform_gpu *gpu)
+{
+	int err;
+	unsigned long pgsize_bitmap;
+
+	mutex_init(&gpu->iommu.mutex);
+
+	if (iommu_present(&platform_bus_type)) {
+		gpu->iommu.domain = iommu_domain_alloc(&platform_bus_type);
+		if (IS_ERR(gpu->iommu.domain))
+			goto error;
+
+		/*
+		 * An IOMMU is only usable if it supports page sizes smaller
+		 * than or equal to the system's PAGE_SIZE, with a preference
+		 * for a page size equal to PAGE_SIZE.
+		 */
+		pgsize_bitmap = gpu->iommu.domain->ops->pgsize_bitmap;
+		if (pgsize_bitmap & PAGE_SIZE) {
+			gpu->iommu.pgshift = PAGE_SHIFT;
+		} else {
+			gpu->iommu.pgshift = fls(pgsize_bitmap & ~PAGE_MASK);
+			if (gpu->iommu.pgshift == 0) {
+				dev_warn(dev, "unsupported IOMMU page size\n");
+				goto free_domain;
+			}
+			gpu->iommu.pgshift -= 1;
+		}
+
+		err = iommu_attach_device(gpu->iommu.domain, dev);
+		if (err)
+			goto free_domain;
+
+		err = nvkm_mm_init(&gpu->iommu._mm, 0,
+				   (1ULL << 40) >> gpu->iommu.pgshift, 1);
+		if (err)
+			goto detach_device;
+
+		gpu->iommu.mm = &gpu->iommu._mm;
+	}
+
+	return;
+
+detach_device:
+	iommu_detach_device(gpu->iommu.domain, dev);
+
+free_domain:
+	iommu_domain_free(gpu->iommu.domain);
+
+error:
+	gpu->iommu.domain = NULL;
+	gpu->iommu.pgshift = 0;
+	dev_err(dev, "cannot initialize IOMMU MM\n");
+}
+
+static void nouveau_platform_remove_iommu(struct device *dev,
+					  struct nouveau_platform_gpu *gpu)
+{
+	if (gpu->iommu.domain) {
+		iommu_detach_device(gpu->iommu.domain, dev);
+		iommu_domain_free(gpu->iommu.domain);
+	}
+}
+
 static int nouveau_platform_probe(struct platform_device *pdev)
 {
 	struct nouveau_platform_gpu *gpu;
@@ -118,6 +184,8 @@ static int nouveau_platform_probe(struct platform_device *pdev)
 	if (IS_ERR(gpu->clk_pwr))
 		return PTR_ERR(gpu->clk_pwr);
 
+	nouveau_platform_probe_iommu(&pdev->dev, gpu);
+
 	err = nouveau_platform_power_up(gpu);
 	if (err)
 		return err;
@@ -154,10 +222,15 @@ static int nouveau_platform_remove(struct platform_device *pdev)
 	struct nouveau_drm *drm = nouveau_drm(drm_dev);
 	struct nvkm_device *device = nvxx_device(&drm->device);
 	struct nouveau_platform_gpu *gpu = nv_device_to_platform(device)->gpu;
+	int err;
 
 	nouveau_drm_device_remove(drm_dev);
 
-	return nouveau_platform_power_down(gpu);
+	err = nouveau_platform_power_down(gpu);
+
+	nouveau_platform_remove_iommu(&pdev->dev, gpu);
+
+	return err;
 }
 
 #if IS_ENABLED(CONFIG_OF)
diff --git a/drm/nouveau/nouveau_platform.h b/drm/nouveau/nouveau_platform.h
index 268bb7213681..392874cf4725 100644
--- a/drm/nouveau/nouveau_platform.h
+++ b/drm/nouveau/nouveau_platform.h
@@ -24,10 +24,12 @@
 #define __NOUVEAU_PLATFORM_H__
 
 #include "core/device.h"
+#include "core/mm.h"
 
 struct reset_control;
 struct clk;
 struct regulator;
+struct iommu_domain;
 struct platform_driver;
 
 struct nouveau_platform_gpu {
@@ -36,6 +38,22 @@ struct nouveau_platform_gpu {
 	struct clk *clk_pwr;
 
 	struct regulator *vdd;
+
+	struct {
+		/*
+		 * Protects accesses to mm from subsystems
+		 */
+		struct mutex mutex;
+
+		struct nvkm_mm _mm;
+		/*
+		 * Just points to _mm. We need this to avoid embedding
+		 * struct nvkm_mm in os.h
+		 */
+		struct nvkm_mm *mm;
+		struct iommu_domain *domain;
+		unsigned long pgshift;
+	} iommu;
 };
 
 struct nouveau_platform_device {
diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h
index b4d307e3ac44..275fa84ad003 100644
--- a/lib/include/nvif/os.h
+++ b/lib/include/nvif/os.h
@@ -715,6 +715,29 @@ dma_free_attrs(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus,
 
 
 /******************************************************************************
+ * IOMMU
+ *****************************************************************************/
+struct iommu_domain;
+
+#define IOMMU_READ     (1 << 0)
+#define IOMMU_WRITE    (1 << 1)
+#define IOMMU_CACHE    (1 << 2) /* DMA cache coherency */
+#define IOMMU_NOEXEC   (1 << 3)
+
+static inline int
+iommu_map(struct iommu_domain *domain, unsigned long iova, dma_addr_t paddr,
+	  size_t size, int prot)
+{
+	return 0;
+}
+
+static inline size_t
+iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
+{
+	return 0;
+}
+
+/******************************************************************************
  * PCI
  *****************************************************************************/
 #include <pciaccess.h>
@@ -1208,9 +1231,18 @@ regulator_get_voltage(struct regulator *regulator)
  * nouveau drm platform device
  *****************************************************************************/
 
+struct nvkm_mm;
+
 struct nouveau_platform_gpu {
 	struct clk *clk;
 	struct regulator *vdd;
+
+	struct {
+		struct mutex mutex;
+		struct nvkm_mm *mm;
+		struct iommu_domain *domain;
+		unsigned long pgshift;
+	} iommu;
 };
 
 struct nouveau_platform_device {
-- 
2.3.0

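The page-size selection in nouveau_platform_probe_iommu() can be checked in isolation. The sketch below reproduces that logic in userspace, assuming 4 KiB CPU pages; sketch_fls() is a hypothetical stand-in for the kernel's fls(), and the example bitmaps are invented for illustration rather than taken from real Tegra hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB CPU pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Stand-in for fls(): 1-based index of the highest set bit, 0 if none. */
static int sketch_fls(unsigned long x)
{
	int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

/*
 * Same decision as the patch: use PAGE_SIZE if the IOMMU supports it,
 * otherwise the largest supported size below PAGE_SIZE.
 * Returns -1 when no usable page size exists.
 */
static int sketch_pick_iommu_pgshift(unsigned long pgsize_bitmap)
{
	int shift;

	if (pgsize_bitmap & SKETCH_PAGE_SIZE)
		return SKETCH_PAGE_SHIFT;

	/* ~PAGE_MASK keeps only the sizes strictly below PAGE_SIZE. */
	shift = sketch_fls(pgsize_bitmap & ~SKETCH_PAGE_MASK);
	if (shift == 0)
		return -1;
	return shift - 1;
}

/* Allocator units covering the 40-bit space given to nvkm_mm_init(). */
static uint64_t sketch_mm_units(int pgshift)
{
	assert(pgshift >= 0 && pgshift < 40);
	return (1ULL << 40) >> pgshift;
}
```

For a bitmap containing 4 KiB, 64 KiB and 1 MiB pages the sketch picks a shift of 12; with only 2 KiB and 64 KiB available it falls back to 11; with only 64 KiB it reports the IOMMU as unusable. At a shift of 12, the 40-bit address space spans 2^28 allocator units.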

* [PATCH v3 6/6] instmem/gk20a: add IOMMU support
From: Alexandre Courbot @ 2015-02-17  7:48 UTC (permalink / raw)
  To: Ben Skeggs
  Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	gnurou-Re5JQEeQqe8AvxtiuMwx3w, Alexandre Courbot

Let GK20A's instmem take advantage of the IOMMU if it is present. When
an IOMMU is available, instmem is no longer allocated using the DMA API;
instead, individual pages are obtained through alloc_page() and made
contiguous to the GPU by IOMMU mappings.

Signed-off-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drm/nouveau/nvkm/subdev/instmem/gk20a.c | 272 ++++++++++++++++++++++++++++----
 1 file changed, 241 insertions(+), 31 deletions(-)

diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
index 4c8af6e3677c..67a89c7f7934 100644
--- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
+++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
@@ -20,12 +20,32 @@
  * DEALINGS IN THE SOFTWARE.
  */
 
+/*
+ * GK20A does not have dedicated video memory, and to accurately represent this
+ * fact Nouveau will not create a RAM device for it. Therefore its instmem
+ * implementation must be done directly on top of system memory, while providing
+ * coherent read and write operations.
+ *
+ * Instmem can be allocated through two means:
+ * 1) If an IOMMU mapping has been probed, the IOMMU API is used to make memory
+ *    pages contiguous to the GPU. This is the preferred way.
+ * 2) If no IOMMU mapping is probed, the DMA API is used to allocate physically
+ *    contiguous memory.
+ *
+ * In both cases CPU reads and writes are performed using PRAMIN (i.e. using the
+ * GPU path) to ensure these operations are coherent for the GPU. This allows us
+ * to use more "relaxed" allocation parameters when using the DMA API, since we
+ * never need a kernel mapping.
+ */
+
 #include <subdev/fb.h>
 #include <core/mm.h>
 #include <core/device.h>
 
 #ifdef __KERNEL__
 #include <linux/dma-attrs.h>
+#include <linux/iommu.h>
+#include <nouveau_platform.h>
 #endif
 
 #include "priv.h"
@@ -36,18 +56,51 @@ struct gk20a_instobj_priv {
 	struct nvkm_mem *mem;
 	/* Pointed by mem */
 	struct nvkm_mem _mem;
+};
+
+/*
+ * Used for objects allocated using the DMA API
+ */
+struct gk20a_instobj_dma {
+	struct gk20a_instobj_priv base;
+
 	void *cpuaddr;
 	dma_addr_t handle;
 	struct dma_attrs attrs;
 	struct nvkm_mm_node r;
 };
 
+/*
+ * Used for objects flattened using the IOMMU API
+ */
+struct gk20a_instobj_iommu {
+	struct gk20a_instobj_priv base;
+
+	/* array of base.mem->size pages */
+	struct page *pages[];
+};
+
 struct gk20a_instmem_priv {
 	struct nvkm_instmem base;
 	spinlock_t lock;
 	u64 addr;
+
+	/* Only used if an IOMMU is present */
+	struct mutex *mm_mutex;
+	struct nvkm_mm *mm;
+	struct iommu_domain *domain;
+	unsigned long iommu_pgshift;
 };
 
+/*
+ * Use PRAMIN to read/write data and avoid coherency issues.
+ * PRAMIN uses the GPU path and ensures data will always be coherent.
+ *
+ * A dynamic mapping based solution would be desirable in the future, but
+ * the issue remains of how to maintain coherency efficiently. On ARM it is
+ * not easy (if possible at all?) to create uncached temporary mappings.
+ */
+
 static u32
 gk20a_instobj_rd32(struct nvkm_object *object, u64 offset)
 {
@@ -87,50 +140,79 @@ gk20a_instobj_wr32(struct nvkm_object *object, u64 offset, u32 data)
 }
 
 static void
-gk20a_instobj_dtor(struct nvkm_object *object)
+gk20a_instobj_dtor_dma(struct gk20a_instobj_priv *_node)
 {
-	struct gk20a_instobj_priv *node = (void *)object;
+	struct gk20a_instobj_dma *node = (void *)_node;
 	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node);
 	struct device *dev = nv_device_base(nv_device(priv));
 
 	if (unlikely(!node->handle))
 		return;
 
-	dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
+	dma_free_attrs(dev, _node->mem->size << PAGE_SHIFT, node->cpuaddr,
 		       node->handle, &node->attrs);
+}
+
+static void
+gk20a_instobj_dtor_iommu(struct gk20a_instobj_priv *_node)
+{
+	struct gk20a_instobj_iommu *node = (void *)_node;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node);
+	struct nvkm_mm_node *r;
+	int i;
+
+	if (unlikely(list_empty(&_node->mem->regions)))
+		return;
+
+	r = list_first_entry(&_node->mem->regions, struct nvkm_mm_node,
+			     rl_entry);
+
+	/* Clear bit 34 to recover the IOMMU address before unmapping */
+	r->offset &= ~BIT(34 - priv->iommu_pgshift);
+
+	/* Unmap pages from GPU address space and free them */
+	for (i = 0; i < _node->mem->size; i++) {
+		iommu_unmap(priv->domain,
+			    (r->offset + i) << priv->iommu_pgshift, PAGE_SIZE);
+		__free_page(node->pages[i]);
+	}
+
+	/* Release area from GPU address space */
+	mutex_lock(priv->mm_mutex);
+	nvkm_mm_free(priv->mm, &r);
+	mutex_unlock(priv->mm_mutex);
+}
+
+static void
+gk20a_instobj_dtor(struct nvkm_object *object)
+{
+	struct gk20a_instobj_priv *node = (void *)object;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(node);
+
+	if (priv->domain)
+		gk20a_instobj_dtor_iommu(node);
+	else
+		gk20a_instobj_dtor_dma(node);
 
 	nvkm_instobj_destroy(&node->base);
 }
 
 static int
-gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
-		   struct nvkm_oclass *oclass, void *data, u32 _size,
-		   struct nvkm_object **pobject)
+gk20a_instobj_ctor_dma(struct nvkm_object *parent, struct nvkm_object *engine,
+		       struct nvkm_oclass *oclass, u32 npages, u32 align,
+		       struct gk20a_instobj_priv **_node)
 {
-	struct nvkm_instobj_args *args = data;
+	struct gk20a_instobj_dma *node;
 	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent);
-	struct device *dev = nv_device_base(nv_device(priv));
-	struct gk20a_instobj_priv *node;
-	u32 size, align;
-	u32 npages;
+	struct device *dev = nv_device_base(nv_device(parent));
 	int ret;
 
-	nv_debug(parent, "%s: size: %x align: %x\n", __func__,
-		 args->size, args->align);
-
-	size  = max((args->size  + 4095) & ~4095, (u32)4096);
-	align = max((args->align + 4095) & ~4095, (u32)4096);
-
-	npages = size >> PAGE_SHIFT;
-
 	ret = nvkm_instobj_create_(parent, engine, oclass, sizeof(*node),
-				      (void **)&node);
-	*pobject = nv_object(node);
+				   (void **)&node);
+	*_node = &node->base;
 	if (ret)
 		return ret;
 
-	node->mem = &node->_mem;
-
 	init_dma_attrs(&node->attrs);
 	/*
 	 * We will access this memory through PRAMIN and thus do not need a
@@ -154,16 +236,132 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 		nv_warn(priv, "memory not aligned as requested: %pad (0x%x)\n",
 			&node->handle, align);
 
-	node->mem->offset = node->handle;
+	/* present memory for being mapped using small pages */
+	node->r.type = 12;
+	node->r.offset = node->handle >> 12;
+	node->r.length = (npages << PAGE_SHIFT) >> 12;
+
+	node->base._mem.offset = node->handle;
+
+	INIT_LIST_HEAD(&node->base._mem.regions);
+	list_add_tail(&node->r.rl_entry, &node->base._mem.regions);
+
+	return 0;
+}
+
+static int
+gk20a_instobj_ctor_iommu(struct nvkm_object *parent, struct nvkm_object *engine,
+			 struct nvkm_oclass *oclass, u32 npages, u32 align,
+			 struct gk20a_instobj_priv **_node)
+{
+	struct gk20a_instobj_iommu *node;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent);
+	struct nvkm_mm_node *r;
+	int ret;
+	int i;
+
+	ret = nvkm_instobj_create_(parent, engine, oclass,
+				sizeof(*node) + sizeof(node->pages[0]) * npages,
+				(void **)&node);
+	*_node = &node->base;
+	if (ret)
+		return ret;
+
+	/* Allocate backing memory */
+	for (i = 0; i < npages; i++) {
+		struct page *p = alloc_page(GFP_KERNEL);
+
+		if (p == NULL) {
+			ret = -ENOMEM;
+			goto free_pages;
+		}
+		node->pages[i] = p;
+	}
+
+	mutex_lock(priv->mm_mutex);
+	/* Reserve area from GPU address space */
+	ret = nvkm_mm_head(priv->mm, 0, 1, npages, npages,
+			   align >> priv->iommu_pgshift, &r);
+	mutex_unlock(priv->mm_mutex);
+	if (ret) {
+		nv_error(priv, "virtual space is full!\n");
+		goto free_pages;
+	}
+
+	/* Map into GPU address space */
+	for (i = 0; i < npages; i++) {
+		struct page *p = node->pages[i];
+		u32 offset = (r->offset + i) << priv->iommu_pgshift;
+
+		ret = iommu_map(priv->domain, offset, page_to_phys(p),
+				PAGE_SIZE, IOMMU_READ | IOMMU_WRITE);
+		if (ret < 0) {
+			nv_error(priv, "IOMMU mapping failure: %d\n", ret);
+
+			while (i-- > 0) {
+				offset -= PAGE_SIZE;
+				iommu_unmap(priv->domain, offset, PAGE_SIZE);
+			}
+			goto release_area;
+		}
+	}
+
+	/* Bit 34 indicates that an address is to be resolved through the IOMMU */
+	r->offset |= BIT(34 - priv->iommu_pgshift);
+
+	node->base._mem.offset = ((u64)r->offset) << priv->iommu_pgshift;
+
+	INIT_LIST_HEAD(&node->base._mem.regions);
+	list_add_tail(&r->rl_entry, &node->base._mem.regions);
+
+	return 0;
+
+release_area:
+	mutex_lock(priv->mm_mutex);
+	nvkm_mm_free(priv->mm, &r);
+	mutex_unlock(priv->mm_mutex);
+
+free_pages:
+	for (i = 0; i < npages && node->pages[i] != NULL; i++)
+		__free_page(node->pages[i]);
+
+	return ret;
+}
+
+static int
+gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
+		   struct nvkm_oclass *oclass, void *data, u32 _size,
+		   struct nvkm_object **pobject)
+{
+	struct nvkm_instobj_args *args = data;
+	struct gk20a_instmem_priv *priv = (void *)nvkm_instmem(parent);
+	struct gk20a_instobj_priv *node;
+	u32 size, align;
+	int ret;
+
+	nv_debug(parent, "%s (%s): size: %x align: %x\n", __func__,
+		 priv->domain ? "IOMMU" : "DMA", args->size, args->align);
+
+	/* Round size and align to page bounds */
+	size = max((args->size  + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE);
+	align = max((args->align + ~PAGE_MASK) & PAGE_MASK, (u32)PAGE_SIZE);
+
+	if (priv->domain)
+		ret = gk20a_instobj_ctor_iommu(parent, engine, oclass,
+					      size >> PAGE_SHIFT, align, &node);
+	else
+		ret = gk20a_instobj_ctor_dma(parent, engine, oclass,
+					     size >> PAGE_SHIFT, align, &node);
+	*pobject = nv_object(node);
+	if (ret)
+		return ret;
+
+	node->mem = &node->_mem;
+
+	/* present memory for being mapped using small pages */
 	node->mem->size = size >> 12;
 	node->mem->memtype = 0;
 	node->mem->page_shift = 12;
-	INIT_LIST_HEAD(&node->mem->regions);
-
-	node->r.type = 12;
-	node->r.offset = node->handle >> 12;
-	node->r.length = npages;
-	list_add_tail(&node->r.rl_entry, &node->mem->regions);
 
 	node->base.addr = node->mem->offset;
 	node->base.size = size;
@@ -202,6 +400,7 @@ gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 		   struct nvkm_object **pobject)
 {
 	struct gk20a_instmem_priv *priv;
+	struct nouveau_platform_device *plat;
 	int ret;
 
 	ret = nvkm_instmem_create(parent, engine, oclass, &priv);
@@ -211,6 +410,18 @@ gk20a_instmem_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
 
 	spin_lock_init(&priv->lock);
 
+	plat = nv_device_to_platform(nv_device(parent));
+	if (plat->gpu->iommu.domain) {
+		priv->domain = plat->gpu->iommu.domain;
+		priv->mm = plat->gpu->iommu.mm;
+		priv->iommu_pgshift = plat->gpu->iommu.pgshift;
+		priv->mm_mutex = &plat->gpu->iommu.mutex;
+
+		nv_info(priv, "using IOMMU\n");
+	} else {
+		nv_info(priv, "using DMA API\n");
+	}
+
 	return 0;
 }
 
@@ -225,4 +436,3 @@ gk20a_instmem_oclass = &(struct nvkm_instmem_impl) {
 	},
 	.instobj = &gk20a_instobj_oclass.base,
 }.base;
-
-- 
2.3.0

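Two pieces of arithmetic in this patch are easy to misread: the page rounding in gk20a_instobj_ctor(), and the fact that r->offset is stored in IOMMU-page units, so marking an address as IOMMU-resolved means setting bit (34 - iommu_pgshift) of the page index rather than bit 34 itself. A minimal userspace sketch of both, assuming 4 KiB pages (the sketch_* names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1u))

/* Same expression as gk20a_instobj_ctor(): round up, minimum one page. */
static uint32_t sketch_round_to_page(uint32_t v)
{
	uint32_t r = (v + ~SKETCH_PAGE_MASK) & SKETCH_PAGE_MASK;

	return r > SKETCH_PAGE_SIZE ? r : SKETCH_PAGE_SIZE;
}

/*
 * page_index is in units of (1 << pgshift) bytes, so byte-address
 * bit 34 corresponds to bit (34 - pgshift) of the index.
 */
static uint64_t sketch_tag_iommu(uint64_t page_index, unsigned int pgshift)
{
	assert(pgshift < 34);
	return page_index | (1ULL << (34 - pgshift));
}

/* Inverse operation, as done in gk20a_instobj_dtor_iommu(). */
static uint64_t sketch_untag_iommu(uint64_t page_index, unsigned int pgshift)
{
	assert(pgshift < 34);
	return page_index & ~(1ULL << (34 - pgshift));
}
```

Shifting a tagged page index left by iommu_pgshift yields a byte address with bit 34 set, which is the value stored in node->base._mem.offset; the destructor clears the same bit so that r->offset is again a plain IOMMU page index for iommu_unmap() and nvkm_mm_free().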

* Re: [Nouveau] [PATCH v3 1/6] make RAM device optional
From: Ben Skeggs @ 2015-02-17 23:01 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Ben Skeggs, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Tue, Feb 17, 2015 at 5:47 PM, Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote:
> Having a RAM device does not make sense for chips like GK20A which have
> no dedicated video memory. The dummy RAM device that we used so far
> works as a temporary band-aid, but in the long-term it is desirable for
> the driver to be able to work without any kind of VRAM.
>
> This patch adds a few conditionals in places where a RAM device was
> assumed to be present and allows some more objects to be allocated from
> the TT domain, allowing Nouveau to handle GPUs for which
> pfb->ram == NULL.
>
> Signed-off-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
>  drm/nouveau/nouveau_display.c         |  8 +++++++-
>  drm/nouveau/nouveau_ttm.c             |  3 +++
>  drm/nouveau/nv84_fence.c              | 14 +++++++++++---
>  drm/nouveau/nvkm/engine/device/base.c |  9 ++++++---
>  drm/nouveau/nvkm/subdev/clk/base.c    |  2 +-
>  drm/nouveau/nvkm/subdev/fb/base.c     | 26 ++++++++++++++++++--------
>  drm/nouveau/nvkm/subdev/ltc/gf100.c   | 10 +++++++++-
>  7 files changed, 55 insertions(+), 17 deletions(-)
>
> diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c
> index 860b0e2d4181..68ee0af22eea 100644
> --- a/drm/nouveau/nouveau_display.c
> +++ b/drm/nouveau/nouveau_display.c
> @@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
>                             struct drm_mode_create_dumb *args)
>  {
>         struct nouveau_bo *bo;
> +       uint32_t domain;
>         int ret;
>
>         args->pitch = roundup(args->width * (args->bpp / 8), 256);
>         args->size = args->pitch * args->height;
>         args->size = roundup(args->size, PAGE_SIZE);
>
> -       ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo);
> +       if (nvxx_fb(&nouveau_drm(dev)->device)->ram)
For these checks in the drm, it's probably better to use
nouveau_drm(dev)->device.info.ram_size.

> +               domain = NOUVEAU_GEM_DOMAIN_VRAM;
> +       else
> +               domain = NOUVEAU_GEM_DOMAIN_GART;
> +
> +       ret = nouveau_gem_new(dev, args->size, 0, domain, 0, 0, &bo);
>         if (ret)
>                 return ret;
>
> diff --git a/drm/nouveau/nouveau_ttm.c b/drm/nouveau/nouveau_ttm.c
> index 273e50110ec3..a3c2e9b4d937 100644
> --- a/drm/nouveau/nouveau_ttm.c
> +++ b/drm/nouveau/nouveau_ttm.c
> @@ -85,6 +85,9 @@ nouveau_vram_manager_new(struct ttm_mem_type_manager *man,
>         if (nvbo->tile_flags & NOUVEAU_GEM_TILE_NONCONTIG)
>                 size_nc = 1 << nvbo->page_shift;
>
> +       if (!pfb->ram)
> +               return -ENOMEM;
> +
>         ret = pfb->ram->get(pfb, mem->num_pages << PAGE_SHIFT,
>                            mem->page_alignment << PAGE_SHIFT, size_nc,
>                            (nvbo->tile_flags >> 8) & 0x3ff, &node);
> diff --git a/drm/nouveau/nv84_fence.c b/drm/nouveau/nv84_fence.c
> index bf429cabbaa8..b981f85de888 100644
> --- a/drm/nouveau/nv84_fence.c
> +++ b/drm/nouveau/nv84_fence.c
> @@ -215,6 +215,7 @@ nv84_fence_create(struct nouveau_drm *drm)
>  {
>         struct nvkm_fifo *pfifo = nvxx_fifo(&drm->device);
>         struct nv84_fence_priv *priv;
> +       u32 domain;
>         int ret;
>
>         priv = drm->fence = kzalloc(sizeof(*priv), GFP_KERNEL);
> @@ -231,10 +232,17 @@ nv84_fence_create(struct nouveau_drm *drm)
>         priv->base.context_base = fence_context_alloc(priv->base.contexts);
>         priv->base.uevent = true;
>
> -       ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0,
> -                            TTM_PL_FLAG_VRAM, 0, 0, NULL, NULL, &priv->bo);
> +       domain = nvxx_fb(&drm->device)->ram ?
> +                        TTM_PL_FLAG_VRAM :
> +                        /*
> +                         * fences created in TT must be coherent or we will
> +                         * wait on old CPU cache values!
> +                         */
> +                        TTM_PL_FLAG_TT | TTM_PL_FLAG_UNCACHED;
> +       ret = nouveau_bo_new(drm->dev, 16 * priv->base.contexts, 0, domain, 0,
> +                            0, NULL, NULL, &priv->bo);
>         if (ret == 0) {
> -               ret = nouveau_bo_pin(priv->bo, TTM_PL_FLAG_VRAM, false);
> +               ret = nouveau_bo_pin(priv->bo, domain, false);
>                 if (ret == 0) {
>                         ret = nouveau_bo_map(priv->bo);
>                         if (ret)
> diff --git a/drm/nouveau/nvkm/engine/device/base.c b/drm/nouveau/nvkm/engine/device/base.c
> index 6efa8f38ff54..48f8537e83ac 100644
> --- a/drm/nouveau/nvkm/engine/device/base.c
> +++ b/drm/nouveau/nvkm/engine/device/base.c
> @@ -139,9 +139,12 @@ nvkm_devobj_info(struct nvkm_object *object, void *data, u32 size)
>
>         args->v0.chipset  = device->chipset;
>         args->v0.revision = device->chiprev;
> -       if (pfb)  args->v0.ram_size = args->v0.ram_user = pfb->ram->size;
> -       else      args->v0.ram_size = args->v0.ram_user = 0;
> -       if (imem) args->v0.ram_user = args->v0.ram_user - imem->reserved;
> +       if (pfb  && pfb->ram)
> +               args->v0.ram_size = args->v0.ram_user = pfb->ram->size;
> +       else
> +               args->v0.ram_size = args->v0.ram_user = 0;
> +       if (imem)
> +               args->v0.ram_user = args->v0.ram_user - imem->reserved;
>         return 0;
>  }
>
> diff --git a/drm/nouveau/nvkm/subdev/clk/base.c b/drm/nouveau/nvkm/subdev/clk/base.c
> index b24a9cc04b73..39a83d82e0cd 100644
> --- a/drm/nouveau/nvkm/subdev/clk/base.c
> +++ b/drm/nouveau/nvkm/subdev/clk/base.c
> @@ -184,7 +184,7 @@ nvkm_pstate_prog(struct nvkm_clk *clk, int pstatei)
>         nv_debug(clk, "setting performance state %d\n", pstatei);
>         clk->pstate = pstatei;
>
> -       if (pfb->ram->calc) {
> +       if (pfb->ram && pfb->ram->calc) {
>                 int khz = pstate->base.domain[nv_clk_src_mem];
>                 do {
>                         ret = pfb->ram->calc(pfb, khz);
> diff --git a/drm/nouveau/nvkm/subdev/fb/base.c b/drm/nouveau/nvkm/subdev/fb/base.c
> index 16589fa613cd..61fde43dab71 100644
> --- a/drm/nouveau/nvkm/subdev/fb/base.c
> +++ b/drm/nouveau/nvkm/subdev/fb/base.c
> @@ -55,9 +55,11 @@ _nvkm_fb_fini(struct nvkm_object *object, bool suspend)
>         struct nvkm_fb *pfb = (void *)object;
>         int ret;
>
> -       ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend);
> -       if (ret && suspend)
> -               return ret;
> +       if (pfb->ram) {
> +               ret = nv_ofuncs(pfb->ram)->fini(nv_object(pfb->ram), suspend);
> +               if (ret && suspend)
> +                       return ret;
> +       }
>
>         return nvkm_subdev_fini(&pfb->base, suspend);
>  }
> @@ -72,9 +74,11 @@ _nvkm_fb_init(struct nvkm_object *object)
>         if (ret)
>                 return ret;
>
> -       ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram));
> -       if (ret)
> -               return ret;
> +       if (pfb->ram) {
> +               ret = nv_ofuncs(pfb->ram)->init(nv_object(pfb->ram));
> +               if (ret)
> +                       return ret;
> +       }
>
>         for (i = 0; i < pfb->tile.regions; i++)
>                 pfb->tile.prog(pfb, i, &pfb->tile.region[i]);
> @@ -91,9 +95,12 @@ _nvkm_fb_dtor(struct nvkm_object *object)
>         for (i = 0; i < pfb->tile.regions; i++)
>                 pfb->tile.fini(pfb, i, &pfb->tile.region[i]);
>         nvkm_mm_fini(&pfb->tags);
> -       nvkm_mm_fini(&pfb->vram);
>
> -       nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram);
> +       if (pfb->ram) {
> +               nvkm_mm_fini(&pfb->vram);
> +               nvkm_object_ref(NULL, (struct nvkm_object **)&pfb->ram);
> +       }
> +
>         nvkm_subdev_destroy(&pfb->base);
>  }
>
> @@ -127,6 +134,9 @@ nvkm_fb_create_(struct nvkm_object *parent, struct nvkm_object *engine,
>
>         pfb->memtype_valid = impl->memtype;
>
> +       if (!impl->ram)
> +               return 0;
> +
>         ret = nvkm_object_ctor(nv_object(pfb), NULL, impl->ram, NULL, 0, &ram);
>         if (ret) {
>                 nv_fatal(pfb, "error detecting memory configuration!!\n");
> diff --git a/drm/nouveau/nvkm/subdev/ltc/gf100.c b/drm/nouveau/nvkm/subdev/ltc/gf100.c
> index 8e7cc6200d60..7fb5ea0314cb 100644
> --- a/drm/nouveau/nvkm/subdev/ltc/gf100.c
> +++ b/drm/nouveau/nvkm/subdev/ltc/gf100.c
> @@ -136,7 +136,8 @@ gf100_ltc_dtor(struct nvkm_object *object)
>         struct nvkm_ltc_priv *priv = (void *)object;
>
>         nvkm_mm_fini(&priv->tags);
> -       nvkm_mm_free(&pfb->vram, &priv->tag_ram);
> +       if (pfb->ram)
> +               nvkm_mm_free(&pfb->vram, &priv->tag_ram);
>
>         nvkm_ltc_destroy(priv);
>  }
> @@ -149,6 +150,12 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
>         u32 tag_size, tag_margin, tag_align;
>         int ret;
>
> +       /* No VRAM, no tags for now. */
> +       if (!pfb->ram) {
> +               priv->num_tags = 0;
> +               goto mm_init;
> +       }
> +
>         /* tags for 1/4 of VRAM should be enough (8192/4 per GiB of VRAM) */
>         priv->num_tags = (pfb->ram->size >> 17) / 4;
>         if (priv->num_tags > (1 << 17))
> @@ -183,6 +190,7 @@ gf100_ltc_init_tag_ram(struct nvkm_fb *pfb, struct nvkm_ltc_priv *priv)
>                 priv->tag_base = tag_base;
>         }
>
> +mm_init:
>         ret = nvkm_mm_init(&priv->tags, 0, priv->num_tags, 1);
>         return ret;
>  }
> --
> 2.3.0
>

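Ben's suggestion amounts to basing the placement decision on the reported VRAM size instead of on the presence of pfb->ram. A rough sketch of that decision, together with the dumb-buffer size arithmetic from nouveau_display_dumb_create(), follows; the struct and enum are hypothetical stand-ins, not the real nouveau types or the NOUVEAU_GEM_DOMAIN_* values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical placement choices standing in for NOUVEAU_GEM_DOMAIN_*. */
enum sketch_domain {
	SKETCH_DOMAIN_VRAM,
	SKETCH_DOMAIN_GART,
};

/* Hypothetical subset of the device info that exposes ram_size. */
struct sketch_device_info {
	uint64_t ram_size;
};

static uint64_t sketch_roundup(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return ((x + y - 1) / y) * y;
}

/* Decide on the reported VRAM size, as suggested in the review. */
static enum sketch_domain
sketch_dumb_domain(const struct sketch_device_info *info)
{
	return info->ram_size ? SKETCH_DOMAIN_VRAM : SKETCH_DOMAIN_GART;
}

/* pitch = roundup(width * (bpp / 8), 256), as in the patch. */
static uint32_t sketch_dumb_pitch(uint32_t width, uint32_t bpp)
{
	return (uint32_t)sketch_roundup((uint64_t)width * (bpp / 8), 256);
}

/* size = roundup(pitch * height, PAGE_SIZE), as in the patch. */
static uint64_t sketch_dumb_size(uint32_t pitch, uint32_t height)
{
	return sketch_roundup((uint64_t)pitch * height, SKETCH_PAGE_SIZE);
}
```

With the nvkm_devobj_info() change in this patch, a chip without a RAM device such as GK20A reports ram_size == 0 and would therefore get GART placement, while the pitch and size computations stay unchanged. For example, a 1366x768 buffer at 32 bpp gets a 5632-byte pitch and a 4325376-byte allocation.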

* Re: [PATCH v3 4/6] instmem/gk20a: use DMA attributes
From: Ben Skeggs @ 2015-02-17 23:08 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

On Tue, Feb 17, 2015 at 5:48 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> instmem for GK20A is allocated using dma_alloc_coherent(), which
> provides us with a coherent CPU mapping that we never use because
> instmem objects are accessed through PRAMIN. Switch to
> dma_alloc_attrs() which gives us the option to dismiss that CPU mapping
> and free up some CPU virtual space.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drm/nouveau/nvkm/subdev/instmem/gk20a.c | 24 ++++++++++++++++++++----
>  lib/include/nvif/os.h                   | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> index 6176f5072496..4c8af6e3677c 100644
> --- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
> @@ -24,6 +24,10 @@
>  #include <core/mm.h>
>  #include <core/device.h>
>
> +#ifdef __KERNEL__
> +#include <linux/dma-attrs.h>
> +#endif
> +
>  #include "priv.h"
>
>  struct gk20a_instobj_priv {
> @@ -34,6 +38,7 @@ struct gk20a_instobj_priv {
>         struct nvkm_mem _mem;
>         void *cpuaddr;
>         dma_addr_t handle;
> +       struct dma_attrs attrs;
>         struct nvkm_mm_node r;
>  };
>
> @@ -91,8 +96,8 @@ gk20a_instobj_dtor(struct nvkm_object *object)
>         if (unlikely(!node->handle))
>                 return;
>
> -       dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
> -                         node->handle);
> +       dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
> +                      node->handle, &node->attrs);
>
>         nvkm_instobj_destroy(&node->base);
>  }
> @@ -126,8 +131,19 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
>
>         node->mem = &node->_mem;
>
> -       node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
> -                                          &node->handle, GFP_KERNEL);
> +       init_dma_attrs(&node->attrs);
> +       /*
> +        * We will access this memory through PRAMIN and thus do not need a
> +        * consistent CPU pointer
> +        */
> +       dma_set_attr(DMA_ATTR_NON_CONSISTENT, &node->attrs);
> +       dma_set_attr(DMA_ATTR_WEAK_ORDERING, &node->attrs);
> +       dma_set_attr(DMA_ATTR_WRITE_COMBINE, &node->attrs);
> +       dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &node->attrs);
I wonder, is it possible to have a per-priv version of this instead of
per-object?  The kernel's function prototypes aren't marked const or
anything, which gives me some doubts, but it's worth checking.

> +
> +       node->cpuaddr = dma_alloc_attrs(dev, npages << PAGE_SHIFT,
> +                                       &node->handle, GFP_KERNEL,
> +                                       &node->attrs);
>         if (!node->cpuaddr) {
>                 nv_error(priv, "cannot allocate DMA memory\n");
>                 return -ENOMEM;
> diff --git a/lib/include/nvif/os.h b/lib/include/nvif/os.h
> index f6391a58fd11..b4d307e3ac44 100644
> --- a/lib/include/nvif/os.h
> +++ b/lib/include/nvif/os.h
> @@ -683,6 +683,37 @@ dma_free_coherent(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus)
>  {
>  }
>
> +enum dma_attr {
> +       DMA_ATTR_WRITE_BARRIER,
> +       DMA_ATTR_WEAK_ORDERING,
> +       DMA_ATTR_WRITE_COMBINE,
> +       DMA_ATTR_NON_CONSISTENT,
> +       DMA_ATTR_NO_KERNEL_MAPPING,
> +       DMA_ATTR_SKIP_CPU_SYNC,
> +       DMA_ATTR_FORCE_CONTIGUOUS,
> +       DMA_ATTR_MAX,
> +};
> +
> +struct dma_attrs {
> +};
> +
> +static inline void init_dma_attrs(struct dma_attrs *attrs) {}
> +static inline void dma_set_attr(enum dma_attr attr, struct dma_attrs *attrs) {}
> +
> +static inline void *
> +dma_alloc_attrs(struct device *dev, size_t sz, dma_addr_t *hdl, gfp_t gfp,
> +               struct dma_attrs *attrs)
> +{
> +       return NULL;
> +}
> +
> +static inline void
> +dma_free_attrs(struct device *dev, size_t sz, void *vaddr, dma_addr_t bus,
> +              struct dma_attrs *attrs)
> +{
> +}
> +
> +
>  /******************************************************************************
>   * PCI
>   *****************************************************************************/
> --
> 2.3.0
>

* Re: [Nouveau] [PATCH v3 1/6] make RAM device optional
       [not found]         ` <CACAvsv5mNb3bPzbkVoSCvXKtX195w3c8OX=543V5JHnM=hsN+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-02-18  7:08           ` Alexandre Courbot
       [not found]             ` <CAAVeFuKxzz7cMeaC7wKno17aErLSbqBYGtQLN=pH1qsK5DgFkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Alexandre Courbot @ 2015-02-18  7:08 UTC
  To: Ben Skeggs
  Cc: Alexandre Courbot, linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

On Wed, Feb 18, 2015 at 8:01 AM, Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Feb 17, 2015 at 5:47 PM, Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote:
>> Having a RAM device does not make sense for chips like GK20A which have
>> no dedicated video memory. The dummy RAM device that we used so far
>> works as a temporary band-aid, but in the long-term it is desirable for
>> the driver to be able to work without any kind of VRAM.
>>
>> This patch adds a few conditionals in places where a RAM device was
>> assumed to be present and allows some more objects to be allocated from
>> the TT domain, allowing Nouveau to handle GPUs for which
>> pfb->ram == NULL.
>>
>> Signed-off-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>> ---
>>  drm/nouveau/nouveau_display.c         |  8 +++++++-
>>  drm/nouveau/nouveau_ttm.c             |  3 +++
>>  drm/nouveau/nv84_fence.c              | 14 +++++++++++---
>>  drm/nouveau/nvkm/engine/device/base.c |  9 ++++++---
>>  drm/nouveau/nvkm/subdev/clk/base.c    |  2 +-
>>  drm/nouveau/nvkm/subdev/fb/base.c     | 26 ++++++++++++++++++--------
>>  drm/nouveau/nvkm/subdev/ltc/gf100.c   | 10 +++++++++-
>>  7 files changed, 55 insertions(+), 17 deletions(-)
>>
>> diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c
>> index 860b0e2d4181..68ee0af22eea 100644
>> --- a/drm/nouveau/nouveau_display.c
>> +++ b/drm/nouveau/nouveau_display.c
>> @@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
>>                             struct drm_mode_create_dumb *args)
>>  {
>>         struct nouveau_bo *bo;
>> +       uint32_t domain;
>>         int ret;
>>
>>         args->pitch = roundup(args->width * (args->bpp / 8), 256);
>>         args->size = args->pitch * args->height;
>>         args->size = roundup(args->size, PAGE_SIZE);
>>
>> -       ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo);
>> +       if (nvxx_fb(&nouveau_drm(dev)->device)->ram)
> For these checks in the drm, it's probably better to use
> nouveau_drm(dev)->device.info.ram_size.

I wonder - in other places (e.g. clock, ltc) we don't have access to
nouveau_drm, so IIUC we need to rely on pfb->ram there. Wouldn't it be
more confusing to use two different ways to check the presence of VRAM
when we could stick to a single one?

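To make the discussion concrete, here is a user-space sketch of the dumb-buffer path touched by this hunk: pitch and size follow nouveau_display_dumb_create(), and the domain is chosen from a `ram_size` value, as Ben suggests, instead of from `pfb->ram`. The GART fallback is an assumption (the quoted hunk is cut off right after the `if`, though the commit message says more objects are allocated from the TT domain). `struct dumb_args`, `ROUNDUP()` and `dumb_prepare()` are hypothetical, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Domain flags with the values used by the nouveau uapi header. */
#define NOUVEAU_GEM_DOMAIN_VRAM (1 << 1)
#define NOUVEAU_GEM_DOMAIN_GART (1 << 2)

#define EX_PAGE_SIZE 4096u /* assumed 4 KiB pages */

/* Round x up to a multiple of y (y > 0). */
#define ROUNDUP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical subset of struct drm_mode_create_dumb. */
struct dumb_args {
	uint32_t width, height, bpp;
	uint32_t pitch;
	uint64_t size;
};

/*
 * Fill in pitch and size like nouveau_display_dumb_create() and return
 * the GEM domain to allocate from. ram_size == 0 means "no VRAM".
 */
static uint32_t dumb_prepare(struct dumb_args *args, uint64_t ram_size)
{
	/* This sketch only handles whole-byte pixel formats. */
	assert(args->bpp % 8 == 0);

	args->pitch = ROUNDUP(args->width * (args->bpp / 8), 256u);
	args->size = (uint64_t)args->pitch * args->height;
	args->size = ROUNDUP(args->size, (uint64_t)EX_PAGE_SIZE);

	/* Assumed fallback: without VRAM, place the buffer in GART (TT). */
	return ram_size ? NOUVEAU_GEM_DOMAIN_VRAM : NOUVEAU_GEM_DOMAIN_GART;
}
```

For a 1920x1080, 32 bpp buffer this yields a 7680-byte pitch and an 8294400-byte size (exactly 2025 pages); on a device reporting no VRAM the buffer goes to GART.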

* Re: [PATCH v3 4/6] instmem/gk20a: use DMA attributes
  2015-02-17 23:08       ` Ben Skeggs
@ 2015-02-18  7:19         ` Alexandre Courbot
  0 siblings, 0 replies; 13+ messages in thread
From: Alexandre Courbot @ 2015-02-18  7:19 UTC
  To: Ben Skeggs
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

On Wed, Feb 18, 2015 at 8:08 AM, Ben Skeggs <skeggsb@gmail.com> wrote:
> On Tue, Feb 17, 2015 at 5:48 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> instmem for GK20A is allocated using dma_alloc_coherent(), which
>> provides us with a coherent CPU mapping that we never use because
>> instmem objects are accessed through PRAMIN. Switch to
>> dma_alloc_attrs() which gives us the option to dismiss that CPU mapping
>> and free up some CPU virtual space.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drm/nouveau/nvkm/subdev/instmem/gk20a.c | 24 ++++++++++++++++++++----
>>  lib/include/nvif/os.h                   | 31 +++++++++++++++++++++++++++++++
>>  2 files changed, 51 insertions(+), 4 deletions(-)
>>
>> diff --git a/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
>> index 6176f5072496..4c8af6e3677c 100644
>> --- a/drm/nouveau/nvkm/subdev/instmem/gk20a.c
>> +++ b/drm/nouveau/nvkm/subdev/instmem/gk20a.c
>> @@ -24,6 +24,10 @@
>>  #include <core/mm.h>
>>  #include <core/device.h>
>>
>> +#ifdef __KERNEL__
>> +#include <linux/dma-attrs.h>
>> +#endif
>> +
>>  #include "priv.h"
>>
>>  struct gk20a_instobj_priv {
>> @@ -34,6 +38,7 @@ struct gk20a_instobj_priv {
>>         struct nvkm_mem _mem;
>>         void *cpuaddr;
>>         dma_addr_t handle;
>> +       struct dma_attrs attrs;
>>         struct nvkm_mm_node r;
>>  };
>>
>> @@ -91,8 +96,8 @@ gk20a_instobj_dtor(struct nvkm_object *object)
>>         if (unlikely(!node->handle))
>>                 return;
>>
>> -       dma_free_coherent(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
>> -                         node->handle);
>> +       dma_free_attrs(dev, node->mem->size << PAGE_SHIFT, node->cpuaddr,
>> +                      node->handle, &node->attrs);
>>
>>         nvkm_instobj_destroy(&node->base);
>>  }
>> @@ -126,8 +131,19 @@ gk20a_instobj_ctor(struct nvkm_object *parent, struct nvkm_object *engine,
>>
>>         node->mem = &node->_mem;
>>
>> -       node->cpuaddr = dma_alloc_coherent(dev, npages << PAGE_SHIFT,
>> -                                          &node->handle, GFP_KERNEL);
>> +       init_dma_attrs(&node->attrs);
>> +       /*
>> +        * We will access this memory through PRAMIN and thus do not need a
>> +        * consistent CPU pointer
>> +        */
>> +       dma_set_attr(DMA_ATTR_NON_CONSISTENT, &node->attrs);
>> +       dma_set_attr(DMA_ATTR_WEAK_ORDERING, &node->attrs);
>> +       dma_set_attr(DMA_ATTR_WRITE_COMBINE, &node->attrs);
>> +       dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &node->attrs);
> I wonder, is it possible to have a per-priv version of this instead of
> per-object?  The kernel's function prototypes aren't marked const or
> anything, which gives me some doubts, but it's worth checking.

I checked the ARM implementation of the DMA API and it seems to be
safe indeed - I will do this. Thanks!

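The per-priv arrangement agreed on here can be sketched with a user-space stand-in for the bitmap-based `struct dma_attrs` that the kernel's DMA API used at the time (filled through `init_dma_attrs()` and `dma_set_attr()`). The attribute set is built once for the whole subdev, and each object keeps only a read-only pointer to it, so no per-object initialization is needed. The attribute list is copied from the nvif/os.h stub in the patch; the `ex_*` types and functions and `gk20a_instmem_model` are hypothetical and only model the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Same attribute list as the nvif/os.h stub in the patch. */
enum dma_attr {
	DMA_ATTR_WRITE_BARRIER,
	DMA_ATTR_WEAK_ORDERING,
	DMA_ATTR_WRITE_COMBINE,
	DMA_ATTR_NON_CONSISTENT,
	DMA_ATTR_NO_KERNEL_MAPPING,
	DMA_ATTR_SKIP_CPU_SYNC,
	DMA_ATTR_FORCE_CONTIGUOUS,
	DMA_ATTR_MAX,
};

/* User-space model of a bitmap-based attribute set. */
struct ex_dma_attrs {
	unsigned long flags;
};

static void ex_init_dma_attrs(struct ex_dma_attrs *attrs)
{
	attrs->flags = 0;
}

static void ex_dma_set_attr(enum dma_attr attr, struct ex_dma_attrs *attrs)
{
	assert(attr < DMA_ATTR_MAX);
	attrs->flags |= 1UL << attr;
}

static bool ex_dma_get_attr(enum dma_attr attr, const struct ex_dma_attrs *attrs)
{
	assert(attr < DMA_ATTR_MAX);
	return (attrs->flags & (1UL << attr)) != 0;
}

/* Hypothetical per-subdev private data: one attribute set for all objects. */
struct gk20a_instmem_model {
	struct ex_dma_attrs attrs;
};

/* Hypothetical instance object: borrows the subdev's attributes. */
struct ex_instobj {
	const struct ex_dma_attrs *attrs;
};

static void gk20a_instmem_model_init(struct gk20a_instmem_model *priv)
{
	ex_init_dma_attrs(&priv->attrs);
	/* Objects are accessed through PRAMIN: no consistent CPU mapping. */
	ex_dma_set_attr(DMA_ATTR_NON_CONSISTENT, &priv->attrs);
	ex_dma_set_attr(DMA_ATTR_WEAK_ORDERING, &priv->attrs);
	ex_dma_set_attr(DMA_ATTR_WRITE_COMBINE, &priv->attrs);
	ex_dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &priv->attrs);
}

static void ex_instobj_ctor(struct ex_instobj *obj,
			    const struct gk20a_instmem_model *priv)
{
	/* No per-object init_dma_attrs()/dma_set_attr() calls needed. */
	obj->attrs = &priv->attrs;
}
```

The `const` qualifiers make the read-only contract explicit in the model. The kernel prototypes of the time took a non-const `struct dma_attrs *`, which is why the sharing had to be checked against the DMA API implementation rather than assumed.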

* Re: [PATCH v3 1/6] make RAM device optional
       [not found]             ` <CAAVeFuKxzz7cMeaC7wKno17aErLSbqBYGtQLN=pH1qsK5DgFkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-02-19  2:20               ` Ben Skeggs
       [not found]                 ` <CACAvsv4uj92K+eCMCq=3+EyrL3JFFH6Mtf9G6nCpb55HhkhFnA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Skeggs @ 2015-02-19  2:20 UTC
  To: Alexandre Courbot
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs


On 18 Feb 2015 17:08, "Alexandre Courbot" <gnurou-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Wed, Feb 18, 2015 at 8:01 AM, Ben Skeggs <skeggsb-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Tue, Feb 17, 2015 at 5:47 PM, Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote:
> >> Having a RAM device does not make sense for chips like GK20A which have
> >> no dedicated video memory. The dummy RAM device that we used so far
> >> works as a temporary band-aid, but in the long-term it is desirable for
> >> the driver to be able to work without any kind of VRAM.
> >>
> >> This patch adds a few conditionals in places where a RAM device was
> >> assumed to be present and allows some more objects to be allocated from
> >> the TT domain, allowing Nouveau to handle GPUs for which
> >> pfb->ram == NULL.
> >>
> >> Signed-off-by: Alexandre Courbot <acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> >> ---
> >>  drm/nouveau/nouveau_display.c         |  8 +++++++-
> >>  drm/nouveau/nouveau_ttm.c             |  3 +++
> >>  drm/nouveau/nv84_fence.c              | 14 +++++++++++---
> >>  drm/nouveau/nvkm/engine/device/base.c |  9 ++++++---
> >>  drm/nouveau/nvkm/subdev/clk/base.c    |  2 +-
> >>  drm/nouveau/nvkm/subdev/fb/base.c     | 26 ++++++++++++++++++--------
> >>  drm/nouveau/nvkm/subdev/ltc/gf100.c   | 10 +++++++++-
> >>  7 files changed, 55 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c
> >> index 860b0e2d4181..68ee0af22eea 100644
> >> --- a/drm/nouveau/nouveau_display.c
> >> +++ b/drm/nouveau/nouveau_display.c
> >> @@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
> >>                             struct drm_mode_create_dumb *args)
> >>  {
> >>         struct nouveau_bo *bo;
> >> +       uint32_t domain;
> >>         int ret;
> >>
> >>         args->pitch = roundup(args->width * (args->bpp / 8), 256);
> >>         args->size = args->pitch * args->height;
> >>         args->size = roundup(args->size, PAGE_SIZE);
> >>
> >> -       ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo);
> >> +       if (nvxx_fb(&nouveau_drm(dev)->device)->ram)
> > For these checks in the drm, it's probably better to use
> > nouveau_drm(dev)->device.info.ram_size.
>
> I wonder - in other places (e.g. clock, ltc) we don't have access to
> nouveau_drm, so IIUC we need to rely on pfb->ram there.
Correct.

> Wouldn't it be
> more confusing to use two different ways to check the presence of VRAM
> when we could stick to a single one?
It's best to think of nvkm/ as a separate entity, and it will be at some
point (drm load on its own, inside a vm), and drm might not be able to
access its internal structures.

That's not the case now, so the code is fine as-is for the moment. But it's
worth keeping in mind.

Thanks,
Ben.

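Ben's point about treating nvkm/ as a separate entity can be sketched as follows: nvkm is the only code that looks at `pfb->ram`, and it publishes a plain `ram_size` value (standing in for `device.info.ram_size`) that the drm side consults. All names here are hypothetical simplifications, not the real nvkm or nvif structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* nvkm side (hypothetical): the RAM device may be absent, as on GK20A. */
struct ex_nvkm_ram {
	uint64_t size;
};

struct ex_nvkm_fb {
	struct ex_nvkm_ram *ram; /* NULL when there is no dedicated VRAM */
};

/* Plain data handed across the nvkm/drm boundary. */
struct ex_device_info {
	uint64_t ram_size;
};

/* nvkm fills in the info; only nvkm dereferences pfb->ram. */
static void ex_nvkm_fill_info(const struct ex_nvkm_fb *pfb,
			      struct ex_device_info *info)
{
	assert(pfb != NULL && info != NULL);
	info->ram_size = pfb->ram ? pfb->ram->size : 0;
}

/* drm side: decides from the copied value alone. */
static bool ex_drm_has_vram(const struct ex_device_info *info)
{
	return info->ram_size != 0;
}
```

Because the drm-side check depends only on the copied value, it would keep working if that value came from an nvkm instance whose internal structures drm cannot reach, which is the situation Ben anticipates.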

* Re: [PATCH v3 1/6] make RAM device optional
       [not found]                 ` <CACAvsv4uj92K+eCMCq=3+EyrL3JFFH6Mtf9G6nCpb55HhkhFnA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-02-19  9:12                   ` Alexandre Courbot
  0 siblings, 0 replies; 13+ messages in thread
From: Alexandre Courbot @ 2015-02-19  9:12 UTC
  To: Ben Skeggs
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs

On Thu, Feb 19, 2015 at 11:20 AM, Ben Skeggs <skeggsb@gmail.com> wrote:
> On 18 Feb 2015 17:08, "Alexandre Courbot" <gnurou@gmail.com> wrote:
>>
>> On Wed, Feb 18, 2015 at 8:01 AM, Ben Skeggs <skeggsb@gmail.com> wrote:
>> > On Tue, Feb 17, 2015 at 5:47 PM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >> Having a RAM device does not make sense for chips like GK20A which have
>> >> no dedicated video memory. The dummy RAM device that we used so far
>> >> works as a temporary band-aid, but in the long-term it is desirable for
>> >> the driver to be able to work without any kind of VRAM.
>> >>
>> >> This patch adds a few conditionals in places where a RAM device was
>> >> assumed to be present and allows some more objects to be allocated from
>> >> the TT domain, allowing Nouveau to handle GPUs for which
>> >> pfb->ram == NULL.
>> >>
>> >> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> >> ---
>> >>  drm/nouveau/nouveau_display.c         |  8 +++++++-
>> >>  drm/nouveau/nouveau_ttm.c             |  3 +++
>> >>  drm/nouveau/nv84_fence.c              | 14 +++++++++++---
>> >>  drm/nouveau/nvkm/engine/device/base.c |  9 ++++++---
>> >>  drm/nouveau/nvkm/subdev/clk/base.c    |  2 +-
>> >>  drm/nouveau/nvkm/subdev/fb/base.c     | 26 ++++++++++++++++++--------
>> >>  drm/nouveau/nvkm/subdev/ltc/gf100.c   | 10 +++++++++-
>> >>  7 files changed, 55 insertions(+), 17 deletions(-)
>> >>
>> >> diff --git a/drm/nouveau/nouveau_display.c b/drm/nouveau/nouveau_display.c
>> >> index 860b0e2d4181..68ee0af22eea 100644
>> >> --- a/drm/nouveau/nouveau_display.c
>> >> +++ b/drm/nouveau/nouveau_display.c
>> >> @@ -869,13 +869,19 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
>> >>                             struct drm_mode_create_dumb *args)
>> >>  {
>> >>         struct nouveau_bo *bo;
>> >> +       uint32_t domain;
>> >>         int ret;
>> >>
>> >>         args->pitch = roundup(args->width * (args->bpp / 8), 256);
>> >>         args->size = args->pitch * args->height;
>> >>         args->size = roundup(args->size, PAGE_SIZE);
>> >>
>> >> -       ret = nouveau_gem_new(dev, args->size, 0, NOUVEAU_GEM_DOMAIN_VRAM, 0, 0, &bo);
>> >> +       if (nvxx_fb(&nouveau_drm(dev)->device)->ram)
>> > For these checks in the drm, it's probably better to use
>> > nouveau_drm(dev)->device.info.ram_size.
>>
>> I wonder - in other places (e.g. clock, ltc) we don't have access to
>> nouveau_drm, so IIUC we need to rely on pfb->ram there.
> Correct.
>
>> Wouldn't it be
>> more confusing to use two different ways to check the presence of VRAM
>> when we could stick to a single one?
> It's best to think of nvkm/ as a separate entity, and it will be at some
> point (drm load on its own, inside a vm), and drm might not be able to
> access its internal structures.
>
> That's not the case now, so the code is fine as-is for the moment. But it's
> worth keeping in mind.

Thanks for clarifying! I will update according to your suggestion.

end of thread, other threads:[~2015-02-19  9:12 UTC | newest]

Thread overview: 13+ messages
2015-02-17  7:47 [PATCH v3 0/6] nouveau/gk20a: RAM device removal & IOMMU support Alexandre Courbot
     [not found] ` <1424159284-19920-1-git-send-email-acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2015-02-17  7:47   ` [PATCH v3 1/6] make RAM device optional Alexandre Courbot
     [not found]     ` <1424159284-19920-2-git-send-email-acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2015-02-17 23:01       ` [Nouveau] " Ben Skeggs
     [not found]         ` <CACAvsv5mNb3bPzbkVoSCvXKtX195w3c8OX=543V5JHnM=hsN+w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-18  7:08           ` Alexandre Courbot
     [not found]             ` <CAAVeFuKxzz7cMeaC7wKno17aErLSbqBYGtQLN=pH1qsK5DgFkQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-19  2:20               ` Ben Skeggs
     [not found]                 ` <CACAvsv4uj92K+eCMCq=3+EyrL3JFFH6Mtf9G6nCpb55HhkhFnA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-02-19  9:12                   ` Alexandre Courbot
2015-02-17  7:48   ` [PATCH v3 2/6] instmem/gk20a: move memory allocation to instmem Alexandre Courbot
2015-02-17  7:48   ` [PATCH v3 3/6] gk20a: remove RAM device Alexandre Courbot
2015-02-17  7:48   ` [PATCH v3 4/6] instmem/gk20a: use DMA attributes Alexandre Courbot
     [not found]     ` <1424159284-19920-5-git-send-email-acourbot-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2015-02-17 23:08       ` Ben Skeggs
2015-02-18  7:19         ` Alexandre Courbot
2015-02-17  7:48   ` [PATCH v3 5/6] platform: probe IOMMU if present Alexandre Courbot
2015-02-17  7:48   ` [PATCH v3 6/6] instmem/gk20a: add IOMMU support Alexandre Courbot
