All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] Radeon memory management improvements
@ 2014-02-24 15:20 Marek Olšák
  2014-02-24 15:20 ` [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains Marek Olšák
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

This series improves performance for the cases when there is not enough VRAM for all buffers.

First of all, I'd like to mention that if you set both VRAM and GTT domains for a buffer, you pretty much say you don't care where the buffer ends up. It usually makes the performance even worse.

This work was largely benchmark-driven and I tried a lot of ideas before I found out which ones work. The patches describe what they do and they're quite simple, so I'll just share the results here.


Card: Evergreen Redwood (HD 5670), 512 MB of VRAM
Test: Unigine Heaven 4.0, High settings

1) 1280x720, 4x MSAA, need 525 MB of VRAM

Without patches: 16.6 FPS
With patches: 16.6 FPS
Improvement: 0 %

2) 1600x900, 4x MSAA, need 642 MB of VRAM

Without patches: 7.1 FPS
With patches: 9.7 FPS
Improvement: 36 %

3) 1920x1080, 4x MSAA, need 743 MB of VRAM

Without patches: 3.7 FPS
With patches: 5.6 FPS
Improvement: 51 %

4) 1600x900, 8x MSAA, need 838 MB of VRAM
Without patches: 2.9 FPS
With patches: 4.6 FPS
Improvement: 58 %

These results don't change if you run the benchmark several times, which proves the improvement is stable.


To conclude this, here are ideas for future work:

1) Add virtual memory support for VRAM. Our GPUs support virtual memory, which not only solves fragmentation issues, but it also allows each buffer to be partially in VRAM and partially in GTT, which becomes more important with large buffers like 100 MB. Moving whole buffers back and forth between VRAM and GTT is inefficient if you can do it at page granularity. Also, due to fragmentation, we can never really use all of VRAM, but only about 90-95%.

2) Add support for uncached GTT. I think it should improve performance for dGPUs under memory pressure, but some testing needs to be done to confirm that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said to be working on some later chips.


The patches for Mesa will follow later today. Please review.

Marek

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 16:17   ` Christian König
  2014-02-24 15:20 ` [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves Marek Olšák
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

When passing buffers between processes, the receiving process needs to know
the original buffer domain, so that it doesn't accidentally move the buffer.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  3 +++
 drivers/gpu/drm/radeon/radeon_drv.c    |  3 ++-
 drivers/gpu/drm/radeon/radeon_gem.c    | 30 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c    |  1 +
 drivers/gpu/drm/radeon/radeon_object.c |  3 +++
 include/uapi/drm/radeon_drm.h          | 12 ++++++++++++
 6 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a415f8e..3f10782 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -454,6 +454,7 @@ struct radeon_bo {
 	/* Protected by gem.mutex */
 	struct list_head		list;
 	/* Protected by tbo.reserved */
+	u32				initial_domain;
 	u32				placements[3];
 	struct ttm_placement		placement;
 	struct ttm_buffer_object	tbo;
@@ -2114,6 +2115,8 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *filp);
 int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *filp);
+int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *filp);
 int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
 int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 84a1bbb7..4392b7c 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -79,9 +79,10 @@
  *   2.35.0 - Add CIK macrotile mode array query
  *   2.36.0 - Fix CIK DCE tiling setup
  *   2.37.0 - allow GS ring setup on r6xx/r7xx
+ *   2.38.0 - RADEON_GEM_OP (GET_INITIAL_DOMAIN, SET_INITIAL_DOMAIN)
  */
 #define KMS_DRIVER_MAJOR	2
-#define KMS_DRIVER_MINOR	37
+#define KMS_DRIVER_MINOR	38
 #define KMS_DRIVER_PATCHLEVEL	0
 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
 int radeon_driver_unload_kms(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index b96c819..d7890f2 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -533,6 +533,36 @@ out:
 	return r;
 }
 
+int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *filp)
+{
+	struct drm_radeon_gem_op *args = data;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		return -ENOENT;
+	}
+	robj = gem_to_radeon_bo(gobj);
+
+	switch (args->op) {
+	case RADEON_GEM_OP_GET_INITIAL_DOMAIN:
+		args->value = robj->initial_domain;
+		break;
+	case RADEON_GEM_OP_SET_INITIAL_DOMAIN:
+		robj->initial_domain = args->value & (RADEON_GEM_DOMAIN_VRAM |
+						      RADEON_GEM_DOMAIN_GTT |
+						      RADEON_GEM_DOMAIN_CPU);
+		break;
+	default:
+		; /* do nothing */
+	}
+
+	drm_gem_object_unreference_unlocked(gobj);
+	return 0;
+}
+
 int radeon_mode_dumb_create(struct drm_file *file_priv,
 			    struct drm_device *dev,
 			    struct drm_mode_create_dumb *args)
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index baff98b..0b631eb 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -814,5 +814,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 08595cf..dd12bb4 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -145,6 +145,9 @@ int radeon_bo_create(struct radeon_device *rdev,
 	bo->surface_reg = -1;
 	INIT_LIST_HEAD(&bo->list);
 	INIT_LIST_HEAD(&bo->va);
+	bo->initial_domain = domain & (RADEON_GEM_DOMAIN_VRAM |
+	                               RADEON_GEM_DOMAIN_GTT |
+	                               RADEON_GEM_DOMAIN_CPU);
 	radeon_ttm_placement_from_domain(bo, domain);
 	/* Kernel allocation are uninterruptible */
 	down_read(&rdev->pm.mclk_lock);
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 1cf18b4..cb5c93a 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_GEM_OP		0x2c
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -884,6 +886,16 @@ struct drm_radeon_gem_pwrite {
 	uint64_t data_ptr;
 };
 
+/* Sets or returns a value associated with a buffer. */
+struct drm_radeon_gem_op {
+	uint32_t	handle; /* buffer */
+	uint32_t	op;     /* RADEON_GEM_OP_* */
+	uint64_t	value;  /* input or return value */
+};
+
+#define RADEON_GEM_OP_GET_INITIAL_DOMAIN	0
+#define RADEON_GEM_OP_SET_INITIAL_DOMAIN	1
+
 #define RADEON_VA_MAP			1
 #define RADEON_VA_UNMAP			2
 
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
  2014-02-24 15:20 ` [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 16:20   ` Christian König
  2014-02-24 15:20 ` [PATCH 3/6] drm/radeon: deduplicate code in radeon_gem_busy_ioctl Marek Olšák
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

The statistics are:
- VRAM usage in bytes
- GTT usage in bytes
- number of bytes moved by TTM

The last one is actually a counter, so you need to sample it before and after
command submission and take the difference.

This is useful for finding performance bottlenecks. Userspace queries are
also added.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  5 +++++
 drivers/gpu/drm/radeon/radeon_device.c |  1 +
 drivers/gpu/drm/radeon/radeon_kms.c    | 15 ++++++++++++++
 drivers/gpu/drm/radeon/radeon_object.c | 38 +++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/radeon/radeon_object.h |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c    | 10 ++++++++-
 include/uapi/drm/radeon_drm.h          |  3 +++
 7 files changed, 71 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 3f10782..d37a57a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2307,6 +2307,11 @@ struct radeon_device {
 	/* virtual memory */
 	struct radeon_vm_manager	vm_manager;
 	struct mutex			gpu_clock_mutex;
+	/* memory stats */
+	struct mutex			memory_stats_mutex;
+	uint64_t			vram_usage;
+	uint64_t			gtt_usage;
+	uint64_t			num_bytes_moved;
 	/* ACPI interface */
 	struct radeon_atif		atif;
 	struct radeon_atcs		atcs;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index b012cbb..6564af7 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1184,6 +1184,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->gem.mutex);
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->gpu_clock_mutex);
+	mutex_init(&rdev->memory_stats_mutex);
 	mutex_init(&rdev->srbm_mutex);
 	init_rwsem(&rdev->pm.mclk_lock);
 	init_rwsem(&rdev->exclusive_lock);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 0b631eb..ddc8c74 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -486,6 +486,21 @@ static int radeon_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 	case RADEON_INFO_VCE_FB_VERSION:
 		*value = rdev->vce.fb_version;
 		break;
+	case RADEON_INFO_NUM_BYTES_MOVED:
+		value = (uint32_t*)&value64;
+		value_size = sizeof(uint64_t);
+		value64 = rdev->num_bytes_moved;
+		break;
+	case RADEON_INFO_VRAM_USAGE:
+		value = (uint32_t*)&value64;
+		value_size = sizeof(uint64_t);
+		value64 = rdev->vram_usage;
+		break;
+	case RADEON_INFO_GTT_USAGE:
+		value = (uint32_t*)&value64;
+		value_size = sizeof(uint64_t);
+		value64 = rdev->gtt_usage;
+		break;
 	default:
 		DRM_DEBUG_KMS("Invalid request %d\n", info->request);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index dd12bb4..d676ee2 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -56,11 +56,38 @@ static void radeon_bo_clear_va(struct radeon_bo *bo)
 	}
 }
 
+static void radeon_update_memory_usage(struct radeon_bo *bo,
+				       unsigned mem_type, int sign)
+{
+	struct radeon_device *rdev = bo->rdev;
+	u64 size = (u64)bo->tbo.num_pages << PAGE_SHIFT;
+
+	mutex_lock(&rdev->memory_stats_mutex);
+	switch (mem_type) {
+	case TTM_PL_TT:
+		if (sign > 0)
+			rdev->gtt_usage += size;
+		else
+			rdev->gtt_usage -= size;
+		break;
+	case TTM_PL_VRAM:
+		if (sign > 0)
+			rdev->vram_usage += size;
+		else
+			rdev->vram_usage -= size;
+		break;
+	}
+	mutex_unlock(&rdev->memory_stats_mutex);
+}
+
 static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo)
 {
 	struct radeon_bo *bo;
 
 	bo = container_of(tbo, struct radeon_bo, tbo);
+
+	radeon_update_memory_usage(bo, bo->tbo.mem.mem_type, -1);
+
 	mutex_lock(&bo->rdev->gem.mutex);
 	list_del_init(&bo->list);
 	mutex_unlock(&bo->rdev->gem.mutex);
@@ -567,14 +594,23 @@ int radeon_bo_check_tiling(struct radeon_bo *bo, bool has_moved,
 }
 
 void radeon_bo_move_notify(struct ttm_buffer_object *bo,
-			   struct ttm_mem_reg *mem)
+			   struct ttm_mem_reg *new_mem)
 {
 	struct radeon_bo *rbo;
+
 	if (!radeon_ttm_bo_is_radeon_bo(bo))
 		return;
+
 	rbo = container_of(bo, struct radeon_bo, tbo);
 	radeon_bo_check_tiling(rbo, 0, 1);
 	radeon_vm_bo_invalidate(rbo->rdev, rbo);
+
+	/* update statistics */
+	if (!new_mem)
+		return;
+
+	radeon_update_memory_usage(rbo, bo->mem.mem_type, -1);
+	radeon_update_memory_usage(rbo, new_mem->mem_type, 1);
 }
 
 int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 209b111..a9a8c11 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -151,7 +151,7 @@ extern void radeon_bo_get_tiling_flags(struct radeon_bo *bo,
 extern int radeon_bo_check_tiling(struct radeon_bo *bo, bool has_moved,
 				bool force_drop);
 extern void radeon_bo_move_notify(struct ttm_buffer_object *bo,
-					struct ttm_mem_reg *mem);
+				  struct ttm_mem_reg *new_mem);
 extern int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
 extern int radeon_bo_get_surface_reg(struct radeon_bo *bo);
 
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 77f5b0c..7e2e833 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -406,8 +406,16 @@ static int radeon_bo_move(struct ttm_buffer_object *bo,
 	if (r) {
 memcpy:
 		r = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
+		if (r) {
+			return r;
+		}
 	}
-	return r;
+
+	/* update statistics */
+	mutex_lock(&rdev->memory_stats_mutex);
+	rdev->num_bytes_moved += (u64)bo->num_pages << PAGE_SHIFT;
+	mutex_unlock(&rdev->memory_stats_mutex);
+	return 0;
 }
 
 static int radeon_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index cb5c93a..aefa2f6 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -1004,6 +1004,9 @@ struct drm_radeon_cs {
 #define RADEON_INFO_VCE_FW_VERSION	0x1b
 /* version of VCE feedback */
 #define RADEON_INFO_VCE_FB_VERSION	0x1c
+#define RADEON_INFO_NUM_BYTES_MOVED	0x1d
+#define RADEON_INFO_VRAM_USAGE		0x1e
+#define RADEON_INFO_GTT_USAGE		0x1f
 
 
 struct drm_radeon_info {
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] drm/radeon: deduplicate code in radeon_gem_busy_ioctl
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
  2014-02-24 15:20 ` [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains Marek Olšák
  2014-02-24 15:20 ` [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 15:20 ` [PATCH 4/6] drm/radeon: add buffers to the LRU list from smallest to largest Marek Olšák
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon_gem.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index d7890f2..ccb7d0f 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -344,18 +344,7 @@ int radeon_gem_busy_ioctl(struct drm_device *dev, void *data,
 	}
 	robj = gem_to_radeon_bo(gobj);
 	r = radeon_bo_wait(robj, &cur_placement, true);
-	switch (cur_placement) {
-	case TTM_PL_VRAM:
-		args->domain = RADEON_GEM_DOMAIN_VRAM;
-		break;
-	case TTM_PL_TT:
-		args->domain = RADEON_GEM_DOMAIN_GTT;
-		break;
-	case TTM_PL_SYSTEM:
-		args->domain = RADEON_GEM_DOMAIN_CPU;
-	default:
-		break;
-	}
+	args->domain = radeon_mem_type_to_domain(cur_placement);
 	drm_gem_object_unreference_unlocked(gobj);
 	r = radeon_gem_handle_lockup(rdev, r);
 	return r;
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] drm/radeon: add buffers to the LRU list from smallest to largest
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
                   ` (2 preceding siblings ...)
  2014-02-24 15:20 ` [PATCH 3/6] drm/radeon: deduplicate code in radeon_gem_busy_ioctl Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 15:20 ` [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace Marek Olšák
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index f28a8d8..d49a3f7 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -24,6 +24,7 @@
  * Authors:
  *    Jerome Glisse <glisse@freedesktop.org>
  */
+#include <linux/list_sort.h>
 #include <drm/drmP.h>
 #include <drm/radeon_drm.h>
 #include "radeon_reg.h"
@@ -290,6 +291,16 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	return 0;
 }
 
+static int cmp_size_smaller_first(void *priv, struct list_head *a,
+				  struct list_head *b)
+{
+	struct radeon_bo_list *la = list_entry(a, struct radeon_bo_list, tv.head);
+	struct radeon_bo_list *lb = list_entry(b, struct radeon_bo_list, tv.head);
+
+	/* Sort A before B if A is smaller. */
+	return (int)la->bo->tbo.num_pages - (int)lb->bo->tbo.num_pages;
+}
+
 /**
  * cs_parser_fini() - clean parser states
  * @parser:	parser structure holding parsing context.
@@ -303,6 +314,18 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 	unsigned i;
 
 	if (!error) {
+		/* Sort the buffer list from the smallest to largest buffer,
+		 * which affects the order of buffers in the LRU list.
+		 * This assures that the smallest buffers are added first
+		 * to the LRU list, so they are likely to be later evicted
+		 * first, instead of large buffers whose eviction is more
+		 * expensive.
+		 *
+		 * This slightly lowers the number of bytes moved by TTM
+		 * per frame under memory pressure.
+		 */
+		list_sort(NULL, &parser->validated, cmp_size_smaller_first);
+
 		ttm_eu_fence_buffer_objects(&parser->ticket,
 					    &parser->validated,
 					    parser->ib.fence);
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
                   ` (3 preceding siblings ...)
  2014-02-24 15:20 ` [PATCH 4/6] drm/radeon: add buffers to the LRU list from smallest to largest Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 16:27   ` Christian König
  2014-02-24 15:20 ` [PATCH 6/6] drm/radeon: limit how much memory TTM can move per IB according to VRAM usage Marek Olšák
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

Userspace should set the first 4 bits of drm_radeon_cs_reloc::flags to
a number from 0 to 15. The higher the number, the higher the priority,
which means a buffer with a higher number will be validated sooner.

The old behavior is preserved: Buffers used for write are prioritized over
read-only buffers if the userspace doesn't set the number.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  2 +-
 drivers/gpu/drm/radeon/radeon_cs.c     | 53 ++++++++++++++++++++++++++++++++--
 drivers/gpu/drm/radeon/radeon_object.c | 10 -------
 drivers/gpu/drm/radeon/radeon_object.h |  2 --
 4 files changed, 51 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index d37a57a..f7a3174 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -481,7 +481,7 @@ struct radeon_bo_list {
 	struct ttm_validate_buffer tv;
 	struct radeon_bo	*bo;
 	uint64_t		gpu_offset;
-	bool			written;
+	unsigned		priority;
 	unsigned		domain;
 	unsigned		alt_domain;
 	u32			tiling_flags;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index d49a3f7..1ba1a48 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -31,6 +31,41 @@
 #include "radeon.h"
 #include "radeon_trace.h"
 
+/* This is a variation of the bucket sort with O(n) time complexity.
+ * The relocations are sorted from the highest to the lowest priority. */
+static void sort_relocs_for_validation(struct list_head *list)
+{
+	struct list_head bucket[17], *it, *tmp;
+	unsigned i, priority;
+
+	for (i = 0; i < 17; i++)
+		INIT_LIST_HEAD(&bucket[i]);
+
+	/* Move the elements into buckets. An i-th bucket only contains
+	 * elements with priorities i*2 and i*2+1. Odd numbers are added
+	 * at the head of a bucket and even numbers are added at the tail,
+	 * therefore all buckets are always sorted. */
+	list_for_each_safe(it, tmp, list) {
+		priority = list_entry(it, struct radeon_bo_list,
+				      tv.head)->priority;
+		i = priority / 2;
+		i = min(i, 16u) ;
+
+		if (priority % 2 == 1) {
+			list_move(it, &bucket[i]);
+		} else {
+			list_move_tail(it, &bucket[i]);
+		}
+	}
+
+	INIT_LIST_HEAD(list);
+
+	/* connect the sorted buckets */
+	for (i = 0; i < 17; i++) {
+		list_splice(&bucket[i], list);
+	}
+}
+
 static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 {
 	struct drm_device *ddev = p->rdev->ddev;
@@ -80,7 +115,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		p->relocs_ptr[i] = &p->relocs[i];
 		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
 		p->relocs[i].lobj.bo = p->relocs[i].robj;
-		p->relocs[i].lobj.written = !!r->write_domain;
+
+		/* The userspace buffer priorities are from 0 to 15. A higher
+		 * number means the buffer is more important.
+		 * Also, the buffers used for write have a higher priority than
+		 * the buffers used for read only, which doubles the range
+		 * to 0 to 31. Numbers 32 and 33 are reserved for the kernel
+		 * driver.
+		 */
+		p->relocs[i].lobj.priority = (r->flags & 0xf) * 2 + !!r->write_domain;
 
 		/* the first reloc of an UVD job is the msg and that must be in
 		   VRAM, also but everything into VRAM on AGP cards to avoid
@@ -94,6 +137,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 			p->relocs[i].lobj.alt_domain =
 				RADEON_GEM_DOMAIN_VRAM;
 
+			/* prioritize this over any other relocation */
+			p->relocs[i].lobj.priority = 32;
 		} else {
 			uint32_t domain = r->write_domain ?
 				r->write_domain : r->read_domains;
@@ -107,9 +152,11 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
 		p->relocs[i].handle = r->handle;
 
-		radeon_bo_list_add_object(&p->relocs[i].lobj,
-					  &p->validated);
+		list_add(&p->relocs[i].lobj.tv.head, &p->validated);
 	}
+
+	sort_relocs_for_validation(&p->validated);
+
 	return radeon_bo_list_validate(&p->ticket, &p->validated, p->ring);
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index d676ee2..19042ae 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -368,16 +368,6 @@ void radeon_bo_fini(struct radeon_device *rdev)
 	arch_phys_wc_del(rdev->mc.vram_mtrr);
 }
 
-void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
-				struct list_head *head)
-{
-	if (lobj->written) {
-		list_add(&lobj->tv.head, head);
-	} else {
-		list_add_tail(&lobj->tv.head, head);
-	}
-}
-
 int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
 			    struct list_head *head, int ring)
 {
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index a9a8c11..6c3ca9e 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -138,8 +138,6 @@ extern int radeon_bo_evict_vram(struct radeon_device *rdev);
 extern void radeon_bo_force_delete(struct radeon_device *rdev);
 extern int radeon_bo_init(struct radeon_device *rdev);
 extern void radeon_bo_fini(struct radeon_device *rdev);
-extern void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
-				struct list_head *head);
 extern int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
 				   struct list_head *head, int ring);
 extern int radeon_bo_fbdev_mmap(struct radeon_bo *bo,
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] drm/radeon: limit how much memory TTM can move per IB according to VRAM usage
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
                   ` (4 preceding siblings ...)
  2014-02-24 15:20 ` [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace Marek Olšák
@ 2014-02-24 15:20 ` Marek Olšák
  2014-02-24 16:40 ` [PATCH 0/6] Radeon memory management improvements Christian König
  2014-02-25  2:52 ` Michel Dänzer
  7 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 15:20 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c     |  2 +-
 drivers/gpu/drm/radeon/radeon_object.c | 87 +++++++++++++++++++++++++++++++---
 drivers/gpu/drm/radeon/radeon_object.h |  3 +-
 3 files changed, 84 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ba1a48..2338eef 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -157,7 +157,7 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 
 	sort_relocs_for_validation(&p->validated);
 
-	return radeon_bo_list_validate(&p->ticket, &p->validated, p->ring);
+	return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
 }
 
 static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 19042ae..e1a5af7 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -368,29 +368,104 @@ void radeon_bo_fini(struct radeon_device *rdev)
 	arch_phys_wc_del(rdev->mc.vram_mtrr);
 }
 
-int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
+/* Returns how many bytes TTM can move per IB.
+ */
+static u64 radeon_bo_get_threshold_for_moves(struct radeon_device *rdev)
+{
+	u64 real_vram_size = rdev->mc.real_vram_size;
+	u64 vram_usage = rdev->vram_usage;
+
+	/* This function is based on the current VRAM usage.
+	 *
+	 * - If all of VRAM is free, allow relocating the number of bytes that
+	 *   is equal to 1/4 of the size of VRAM for this IB.
+
+	 * - If more than one half of VRAM is occupied, only allow relocating
+	 *   1 MB of data for this IB.
+	 *
+	 * - From 0 to one half of used VRAM, the threshold decreases
+	 *   linearly.
+	 *         __________________
+	 * 1/4 of -|\               |
+	 * VRAM    | \              |
+	 *         |  \             |
+	 *         |   \            |
+	 *         |    \           |
+	 *         |     \          |
+	 *         |      \         |
+	 *         |       \________|1 MB
+	 *         |----------------|
+	 *    VRAM 0 %             100 %
+	 *         used            used
+	 *
+	 * Note: It's a threshold, not a limit. The threshold must be crossed
+	 * for buffer relocations to stop, so any buffer of an arbitrary size
+	 * can be moved as long as the threshold isn't crossed before
+	 * the relocation takes place. We don't want to disable buffer
+	 * relocations completely.
+	 *
+	 * The idea is that buffers should be placed in VRAM at creation time
+	 * and TTM should only do a minimum number of relocations during
+	 * command submission. In practice, you need to submit at least
+	 * a dozen IBs to move all buffers to VRAM if they are in GTT.
+	 *
+	 * Also, things can get pretty crazy under memory pressure and actual
+	 * VRAM usage can change a lot, so playing safe even at 50% does
+	 * consistently increase performance.
+	 */
+
+	u64 half_vram = real_vram_size >> 1;
+	u64 half_free_vram = vram_usage >= half_vram ? 0 : half_vram - vram_usage;
+	u64 bytes_moved_threshold = half_free_vram >> 1;
+	return max(bytes_moved_threshold, 1024*1024ull);
+}
+
+int radeon_bo_list_validate(struct radeon_device *rdev,
+			    struct ww_acquire_ctx *ticket,
 			    struct list_head *head, int ring)
 {
 	struct radeon_bo_list *lobj;
 	struct radeon_bo *bo;
-	u32 domain;
 	int r;
+	u64 bytes_moved = 0, initial_bytes_moved;
+	u64 bytes_moved_threshold = radeon_bo_get_threshold_for_moves(rdev);
 
 	r = ttm_eu_reserve_buffers(ticket, head);
 	if (unlikely(r != 0)) {
 		return r;
 	}
+
 	list_for_each_entry(lobj, head, tv.head) {
 		bo = lobj->bo;
 		if (!bo->pin_count) {
-			domain = lobj->domain;
-			
+			u32 domain = lobj->domain;
+			u32 current_domain =
+				radeon_mem_type_to_domain(bo->tbo.mem.mem_type);
+
+			/* Check if this buffer will be moved and don't move it
+			 * if we have moved too many buffers for this IB already.
+			 *
+			 * Note that this allows moving at least one buffer of
+			 * any size, because it doesn't take the current "bo"
+			 * into account. We don't want to disallow buffer moves
+			 * completely.
+			 */
+			if (current_domain != RADEON_GEM_DOMAIN_CPU &&
+			    (domain & current_domain) == 0 && /* will be moved */
+			    bytes_moved > bytes_moved_threshold) {
+				/* don't move it */
+				domain = current_domain;
+			}
+
 		retry:
 			radeon_ttm_placement_from_domain(bo, domain);
 			if (ring == R600_RING_TYPE_UVD_INDEX)
 				radeon_uvd_force_into_uvd_segment(bo);
-			r = ttm_bo_validate(&bo->tbo, &bo->placement,
-						true, false);
+
+			initial_bytes_moved = rdev->num_bytes_moved;
+			r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
+			bytes_moved += rdev->num_bytes_moved - initial_bytes_moved;
+
 			if (unlikely(r)) {
 				if (r != -ERESTARTSYS && domain != lobj->alt_domain) {
 					domain = lobj->alt_domain;
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 6c3ca9e..7dff64d 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -138,7 +138,8 @@ extern int radeon_bo_evict_vram(struct radeon_device *rdev);
 extern void radeon_bo_force_delete(struct radeon_device *rdev);
 extern int radeon_bo_init(struct radeon_device *rdev);
 extern void radeon_bo_fini(struct radeon_device *rdev);
-extern int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
+extern int radeon_bo_list_validate(struct radeon_device *rdev,
+				   struct ww_acquire_ctx *ticket,
 				   struct list_head *head, int ring);
 extern int radeon_bo_fbdev_mmap(struct radeon_bo *bo,
 				struct vm_area_struct *vma);
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains
  2014-02-24 15:20 ` [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains Marek Olšák
@ 2014-02-24 16:17   ` Christian König
  2014-02-26  0:44     ` Marek Olšák
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2014-02-24 16:17 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Am 24.02.2014 16:20, schrieb Marek Olšák:
> From: Marek Olšák <marek.olsak@amd.com>
>
> When passing buffers between processes, the receiving process needs to know
> the original buffer domain, so that it doesn't accidentally move the buffer.
>
> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
> ---
>   drivers/gpu/drm/radeon/radeon.h        |  3 +++
>   drivers/gpu/drm/radeon/radeon_drv.c    |  3 ++-
>   drivers/gpu/drm/radeon/radeon_gem.c    | 30 ++++++++++++++++++++++++++++++
>   drivers/gpu/drm/radeon/radeon_kms.c    |  1 +
>   drivers/gpu/drm/radeon/radeon_object.c |  3 +++
>   include/uapi/drm/radeon_drm.h          | 12 ++++++++++++
>   6 files changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index a415f8e..3f10782 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -454,6 +454,7 @@ struct radeon_bo {
>   	/* Protected by gem.mutex */
>   	struct list_head		list;
>   	/* Protected by tbo.reserved */
> +	u32				initial_domain;
>   	u32				placements[3];
>   	struct ttm_placement		placement;
>   	struct ttm_buffer_object	tbo;
> @@ -2114,6 +2115,8 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
>   			      struct drm_file *filp);
>   int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
>   			  struct drm_file *filp);
> +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *filp);
>   int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>   int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
>   				struct drm_file *filp);
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index 84a1bbb7..4392b7c 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -79,9 +79,10 @@
>    *   2.35.0 - Add CIK macrotile mode array query
>    *   2.36.0 - Fix CIK DCE tiling setup
>    *   2.37.0 - allow GS ring setup on r6xx/r7xx
> + *   2.38.0 - RADEON_GEM_OP (GET_INITIAL_DOMAIN, SET_INITIAL_DOMAIN)
>    */
>   #define KMS_DRIVER_MAJOR	2
> -#define KMS_DRIVER_MINOR	37
> +#define KMS_DRIVER_MINOR	38
>   #define KMS_DRIVER_PATCHLEVEL	0
>   int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
>   int radeon_driver_unload_kms(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index b96c819..d7890f2 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -533,6 +533,36 @@ out:
>   	return r;
>   }
>   
> +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *filp)
> +{
> +	struct drm_radeon_gem_op *args = data;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		return -ENOENT;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +

When the comment is correct and initial_domain is protected by reserving 
the BO you should actually reserve/unreserve it here. On the other hand 
locking might not be necessary.

> +	switch (args->op) {
> +	case RADEON_GEM_OP_GET_INITIAL_DOMAIN:
> +		args->value = robj->initial_domain;
> +		break;
> +	case RADEON_GEM_OP_SET_INITIAL_DOMAIN:
> +		robj->initial_domain = args->value & (RADEON_GEM_DOMAIN_VRAM |
> +						      RADEON_GEM_DOMAIN_GTT |
> +						      RADEON_GEM_DOMAIN_CPU);
> +		break;
> +	default:
> +		; /* do nothing */

Should probably return -EINVAL here.

> +	}
> +
> +	drm_gem_object_unreference_unlocked(gobj);
> +	return 0;
> +}
> +
>   int radeon_mode_dumb_create(struct drm_file *file_priv,
>   			    struct drm_device *dev,
>   			    struct drm_mode_create_dumb *args)
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index baff98b..0b631eb 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -814,5 +814,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   };
>   int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
> index 08595cf..dd12bb4 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -145,6 +145,9 @@ int radeon_bo_create(struct radeon_device *rdev,
>   	bo->surface_reg = -1;
>   	INIT_LIST_HEAD(&bo->list);
>   	INIT_LIST_HEAD(&bo->va);
> +	bo->initial_domain = domain & (RADEON_GEM_DOMAIN_VRAM |
> +	                               RADEON_GEM_DOMAIN_GTT |
> +	                               RADEON_GEM_DOMAIN_CPU);
>   	radeon_ttm_placement_from_domain(bo, domain);
>   	/* Kernel allocation are uninterruptible */
>   	down_read(&rdev->pm.mclk_lock);
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 1cf18b4..cb5c93a 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -510,6 +510,7 @@ typedef struct {
>   #define DRM_RADEON_GEM_GET_TILING	0x29
>   #define DRM_RADEON_GEM_BUSY		0x2a
>   #define DRM_RADEON_GEM_VA		0x2b
> +#define DRM_RADEON_GEM_OP		0x2c
>   
>   #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
>   #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
> @@ -552,6 +553,7 @@ typedef struct {
>   #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
>   #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
>   #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> +#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
>   
>   typedef struct drm_radeon_init {
>   	enum {
> @@ -884,6 +886,16 @@ struct drm_radeon_gem_pwrite {
>   	uint64_t data_ptr;
>   };
>   
> +/* Sets or returns a value associated with a buffer. */
> +struct drm_radeon_gem_op {
> +	uint32_t	handle; /* buffer */
> +	uint32_t	op;     /* RADEON_GEM_OP_* */
> +	uint64_t	value;  /* input or return value */
> +};
> +
> +#define RADEON_GEM_OP_GET_INITIAL_DOMAIN	0
> +#define RADEON_GEM_OP_SET_INITIAL_DOMAIN	1
> +
>   #define RADEON_VA_MAP			1
>   #define RADEON_VA_UNMAP			2
>   

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves
  2014-02-24 15:20 ` [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves Marek Olšák
@ 2014-02-24 16:20   ` Christian König
  2014-02-26 17:56     ` Marek Olšák
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2014-02-24 16:20 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Am 24.02.2014 16:20, schrieb Marek Olšák:
> From: Marek Olšák <marek.olsak@amd.com>
>
> The statistics are:
> - VRAM usage in bytes
> - GTT usage in bytes
> - number of bytes moved by TTM
>
> The last one is actually a counter, so you need to sample it before and after
> command submission and take the difference.
>
> This is useful for finding performance bottlenecks. Userspace queries are
> also added.
>
> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
> ---
>   drivers/gpu/drm/radeon/radeon.h        |  5 +++++
>   drivers/gpu/drm/radeon/radeon_device.c |  1 +
>   drivers/gpu/drm/radeon/radeon_kms.c    | 15 ++++++++++++++
>   drivers/gpu/drm/radeon/radeon_object.c | 38 +++++++++++++++++++++++++++++++++-
>   drivers/gpu/drm/radeon/radeon_object.h |  2 +-
>   drivers/gpu/drm/radeon/radeon_ttm.c    | 10 ++++++++-
>   include/uapi/drm/radeon_drm.h          |  3 +++
>   7 files changed, 71 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 3f10782..d37a57a 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2307,6 +2307,11 @@ struct radeon_device {
>   	/* virtual memory */
>   	struct radeon_vm_manager	vm_manager;
>   	struct mutex			gpu_clock_mutex;
> +	/* memory stats */
> +	struct mutex			memory_stats_mutex;
> +	uint64_t			vram_usage;
> +	uint64_t			gtt_usage;
> +	uint64_t			num_bytes_moved;

As far as I can see you could make those tree values atomic64_t instead 
and avoid the mutex.

>   	/* ACPI interface */
>   	struct radeon_atif		atif;
>   	struct radeon_atcs		atcs;
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index b012cbb..6564af7 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1184,6 +1184,7 @@ int radeon_device_init(struct radeon_device *rdev,
>   	mutex_init(&rdev->gem.mutex);
>   	mutex_init(&rdev->pm.mutex);
>   	mutex_init(&rdev->gpu_clock_mutex);
> +	mutex_init(&rdev->memory_stats_mutex);
>   	mutex_init(&rdev->srbm_mutex);
>   	init_rwsem(&rdev->pm.mclk_lock);
>   	init_rwsem(&rdev->exclusive_lock);
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 0b631eb..ddc8c74 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -486,6 +486,21 @@ static int radeon_info_ioctl(struct drm_device *dev, void *data, struct drm_file
>   	case RADEON_INFO_VCE_FB_VERSION:
>   		*value = rdev->vce.fb_version;
>   		break;
> +	case RADEON_INFO_NUM_BYTES_MOVED:
> +		value = (uint32_t*)&value64;
> +		value_size = sizeof(uint64_t);
> +		value64 = rdev->num_bytes_moved;
> +		break;
> +	case RADEON_INFO_VRAM_USAGE:
> +		value = (uint32_t*)&value64;
> +		value_size = sizeof(uint64_t);
> +		value64 = rdev->vram_usage;
> +		break;
> +	case RADEON_INFO_GTT_USAGE:
> +		value = (uint32_t*)&value64;
> +		value_size = sizeof(uint64_t);
> +		value64 = rdev->gtt_usage;
> +		break;
>   	default:
>   		DRM_DEBUG_KMS("Invalid request %d\n", info->request);
>   		return -EINVAL;
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
> index dd12bb4..d676ee2 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -56,11 +56,38 @@ static void radeon_bo_clear_va(struct radeon_bo *bo)
>   	}
>   }
>   
> +static void radeon_update_memory_usage(struct radeon_bo *bo,
> +				       unsigned mem_type, int sign)
> +{
> +	struct radeon_device *rdev = bo->rdev;
> +	u64 size = (u64)bo->tbo.num_pages << PAGE_SHIFT;
> +
> +	mutex_lock(&rdev->memory_stats_mutex);
> +	switch (mem_type) {
> +	case TTM_PL_TT:
> +		if (sign > 0)
> +			rdev->gtt_usage += size;
> +		else
> +			rdev->gtt_usage -= size;
> +		break;
> +	case TTM_PL_VRAM:
> +		if (sign > 0)
> +			rdev->vram_usage += size;
> +		else
> +			rdev->vram_usage -= size;
> +		break;
> +	}
> +	mutex_unlock(&rdev->memory_stats_mutex);
> +}
> +
>   static void radeon_ttm_bo_destroy(struct ttm_buffer_object *tbo)
>   {
>   	struct radeon_bo *bo;
>   
>   	bo = container_of(tbo, struct radeon_bo, tbo);
> +
> +	radeon_update_memory_usage(bo, bo->tbo.mem.mem_type, -1);
> +
>   	mutex_lock(&bo->rdev->gem.mutex);
>   	list_del_init(&bo->list);
>   	mutex_unlock(&bo->rdev->gem.mutex);
> @@ -567,14 +594,23 @@ int radeon_bo_check_tiling(struct radeon_bo *bo, bool has_moved,
>   }
>   
>   void radeon_bo_move_notify(struct ttm_buffer_object *bo,
> -			   struct ttm_mem_reg *mem)
> +			   struct ttm_mem_reg *new_mem)
>   {
>   	struct radeon_bo *rbo;
> +
>   	if (!radeon_ttm_bo_is_radeon_bo(bo))
>   		return;
> +
>   	rbo = container_of(bo, struct radeon_bo, tbo);
>   	radeon_bo_check_tiling(rbo, 0, 1);
>   	radeon_vm_bo_invalidate(rbo->rdev, rbo);
> +
> +	/* update statistics */
> +	if (!new_mem)
> +		return;
> +
> +	radeon_update_memory_usage(rbo, bo->mem.mem_type, -1);
> +	radeon_update_memory_usage(rbo, new_mem->mem_type, 1);
>   }
>   
>   int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
> diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
> index 209b111..a9a8c11 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.h
> +++ b/drivers/gpu/drm/radeon/radeon_object.h
> @@ -151,7 +151,7 @@ extern void radeon_bo_get_tiling_flags(struct radeon_bo *bo,
>   extern int radeon_bo_check_tiling(struct radeon_bo *bo, bool has_moved,
>   				bool force_drop);
>   extern void radeon_bo_move_notify(struct ttm_buffer_object *bo,
> -					struct ttm_mem_reg *mem);
> +				  struct ttm_mem_reg *new_mem);
>   extern int radeon_bo_fault_reserve_notify(struct ttm_buffer_object *bo);
>   extern int radeon_bo_get_surface_reg(struct radeon_bo *bo);
>   
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 77f5b0c..7e2e833 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -406,8 +406,16 @@ static int radeon_bo_move(struct ttm_buffer_object *bo,
>   	if (r) {
>   memcpy:
>   		r = ttm_bo_move_memcpy(bo, evict, no_wait_gpu, new_mem);
> +		if (r) {
> +			return r;
> +		}
>   	}
> -	return r;
> +
> +	/* update statistics */
> +	mutex_lock(&rdev->memory_stats_mutex);
> +	rdev->num_bytes_moved += (u64)bo->num_pages << PAGE_SHIFT;
> +	mutex_unlock(&rdev->memory_stats_mutex);
> +	return 0;
>   }
>   
>   static int radeon_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct ttm_mem_reg *mem)
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index cb5c93a..aefa2f6 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -1004,6 +1004,9 @@ struct drm_radeon_cs {
>   #define RADEON_INFO_VCE_FW_VERSION	0x1b
>   /* version of VCE feedback */
>   #define RADEON_INFO_VCE_FB_VERSION	0x1c
> +#define RADEON_INFO_NUM_BYTES_MOVED	0x1d
> +#define RADEON_INFO_VRAM_USAGE		0x1e
> +#define RADEON_INFO_GTT_USAGE		0x1f
>   
>   
>   struct drm_radeon_info {

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace
  2014-02-24 15:20 ` [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace Marek Olšák
@ 2014-02-24 16:27   ` Christian König
  0 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2014-02-24 16:27 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Am 24.02.2014 16:20, schrieb Marek Olšák:
> From: Marek Olšák <marek.olsak@amd.com>
>
> Userspace should set the first 4 bits of drm_radeon_cs_reloc::flags to
> a number from 0 to 15. The higher the number, the higher the priority,
> which means a buffer with a higher number will be validated sooner.

Assuming that we only have 32 different priorities it would probably be 
better to add the buffers to 32 different lists while evaluating the 
priorities in radeon_cs_parser_relocs and then concatenate all 32 lists 
at the end instead of sorting them.

>
> The old behavior is preserved: Buffers used for write are prioritized over
> read-only buffers if the userspace doesn't set the number.
>
> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
> ---
>   drivers/gpu/drm/radeon/radeon.h        |  2 +-
>   drivers/gpu/drm/radeon/radeon_cs.c     | 53 ++++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/radeon/radeon_object.c | 10 -------
>   drivers/gpu/drm/radeon/radeon_object.h |  2 --
>   4 files changed, 51 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index d37a57a..f7a3174 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -481,7 +481,7 @@ struct radeon_bo_list {
>   	struct ttm_validate_buffer tv;
>   	struct radeon_bo	*bo;
>   	uint64_t		gpu_offset;
> -	bool			written;
> +	unsigned		priority;
>   	unsigned		domain;
>   	unsigned		alt_domain;
>   	u32			tiling_flags;
> diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
> index d49a3f7..1ba1a48 100644
> --- a/drivers/gpu/drm/radeon/radeon_cs.c
> +++ b/drivers/gpu/drm/radeon/radeon_cs.c
> @@ -31,6 +31,41 @@
>   #include "radeon.h"
>   #include "radeon_trace.h"
>   
> +/* This is a variation of the bucket sort with O(n) time complexity.
> + * The relocations are sorted from the highest to the lowest priority. */
> +static void sort_relocs_for_validation(struct list_head *list)
> +{
> +	struct list_head bucket[17], *it, *tmp;
> +	unsigned i, priority;
> +
> +	for (i = 0; i < 17; i++)
> +		INIT_LIST_HEAD(&bucket[i]);
> +
> +	/* Move the elements into buckets. An i-th bucket only contains
> +	 * elements with priorities i*2 and i*2+1. Odd numbers are added
> +	 * at the head of a bucket and even numbers are added at the tail,
> +	 * therefore all buckets are always sorted. */
> +	list_for_each_safe(it, tmp, list) {
> +		priority = list_entry(it, struct radeon_bo_list,
> +				      tv.head)->priority;
> +		i = priority / 2;
> +		i = min(i, 16u) ;
> +
> +		if (priority % 2 == 1) {
> +			list_move(it, &bucket[i]);
> +		} else {
> +			list_move_tail(it, &bucket[i]);
> +		}
> +	}
> +
> +	INIT_LIST_HEAD(list);
> +
> +	/* connect the sorted buckets */
> +	for (i = 0; i < 17; i++) {
> +		list_splice(&bucket[i], list);
> +	}
> +}
> +
>   static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>   {
>   	struct drm_device *ddev = p->rdev->ddev;
> @@ -80,7 +115,15 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>   		p->relocs_ptr[i] = &p->relocs[i];
>   		p->relocs[i].robj = gem_to_radeon_bo(p->relocs[i].gobj);
>   		p->relocs[i].lobj.bo = p->relocs[i].robj;
> -		p->relocs[i].lobj.written = !!r->write_domain;
> +
> +		/* The userspace buffer priorities are from 0 to 15. A higher
> +		 * number means the buffer is more important.
> +		 * Also, the buffers used for write have a higher priority than
> +		 * the buffers used for read only, which doubles the range
> +		 * to 0 to 31. Numbers 32 and 33 are reserved for the kernel
> +		 * driver.
> +		 */
> +		p->relocs[i].lobj.priority = (r->flags & 0xf) * 2 + !!r->write_domain;
>   
>   		/* the first reloc of an UVD job is the msg and that must be in
>   		   VRAM, also but everything into VRAM on AGP cards to avoid
> @@ -94,6 +137,8 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>   			p->relocs[i].lobj.alt_domain =
>   				RADEON_GEM_DOMAIN_VRAM;
>   
> +			/* prioritize this over any other relocation */
> +			p->relocs[i].lobj.priority = 32;
>   		} else {
>   			uint32_t domain = r->write_domain ?
>   				r->write_domain : r->read_domains;
> @@ -107,9 +152,11 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
>   		p->relocs[i].lobj.tv.bo = &p->relocs[i].robj->tbo;
>   		p->relocs[i].handle = r->handle;
>   
> -		radeon_bo_list_add_object(&p->relocs[i].lobj,
> -					  &p->validated);
> +		list_add(&p->relocs[i].lobj.tv.head, &p->validated);
>   	}
> +
> +	sort_relocs_for_validation(&p->validated);
> +
>   	return radeon_bo_list_validate(&p->ticket, &p->validated, p->ring);
>   }
>   
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
> index d676ee2..19042ae 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -368,16 +368,6 @@ void radeon_bo_fini(struct radeon_device *rdev)
>   	arch_phys_wc_del(rdev->mc.vram_mtrr);
>   }
>   
> -void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
> -				struct list_head *head)
> -{
> -	if (lobj->written) {
> -		list_add(&lobj->tv.head, head);
> -	} else {
> -		list_add_tail(&lobj->tv.head, head);
> -	}
> -}
> -
>   int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
>   			    struct list_head *head, int ring)
>   {
> diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
> index a9a8c11..6c3ca9e 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.h
> +++ b/drivers/gpu/drm/radeon/radeon_object.h
> @@ -138,8 +138,6 @@ extern int radeon_bo_evict_vram(struct radeon_device *rdev);
>   extern void radeon_bo_force_delete(struct radeon_device *rdev);
>   extern int radeon_bo_init(struct radeon_device *rdev);
>   extern void radeon_bo_fini(struct radeon_device *rdev);
> -extern void radeon_bo_list_add_object(struct radeon_bo_list *lobj,
> -				struct list_head *head);
>   extern int radeon_bo_list_validate(struct ww_acquire_ctx *ticket,
>   				   struct list_head *head, int ring);
>   extern int radeon_bo_fbdev_mmap(struct radeon_bo *bo,

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
                   ` (5 preceding siblings ...)
  2014-02-24 15:20 ` [PATCH 6/6] drm/radeon: limit how much memory TTM can move per IB according to VRAM usage Marek Olšák
@ 2014-02-24 16:40 ` Christian König
  2014-02-24 19:23   ` Alex Deucher
  2014-02-24 19:39   ` Marek Olšák
  2014-02-25  2:52 ` Michel Dänzer
  7 siblings, 2 replies; 23+ messages in thread
From: Christian König @ 2014-02-24 16:40 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Hi Marek,

Some minor comments on patch 1, 2 and 5, but nothing serious. Patch 3, 4 
and 6 are Reviewed-by: christian König <christian.koenig@amd.com>

See below for a few in line comments.

Am 24.02.2014 16:20, schrieb Marek Olšák:
> This series improves performance for the cases when there is not enough VRAM for all buffers.
>
> First of all, I'd like to mention that if you set both VRAM and GTT domains for a buffer, you pretty much say you don't care where the buffer ends up. It usually makes the performance even worse.
>
> This work was largely benchmark-driven and I tried a lot of ideas before I found out which ones work. The patches describe what they do and they're quite simple, so I'll just share the results here.
>
>
> Card: Evergreen Redwood (HD 5670), 512 MB of VRAM
> Test: Unigine Heaven 4.0, High settings
>
> 1) 1280x720, 4x MSAA, need 525 MB of VRAM
>
> Without patches: 16.6 FPS
> With patches: 16.6 FPS
> Improvement: 0 %
>
> 2) 1600x900, 4x MSAA, need 642 MB of VRAM
>
> Without patches: 7.1 FPS
> With patches: 9.7 FPS
> Improvement: 36 %
>
> 3) 1920x1080, 4x MSAA, need 743 MB of VRAM
>
> Without patches: 3.7 FPS
> With patches: 5.6 FPS
> Improvement: 51 %
>
> 4) 1600x900, 8x MSAA, need 838 MB of VRAM
> Without patches: 2.9 FPS
> With patches: 4.6 FPS
> Improvement: 58 %
>
> These results don't change if you run the benchmark several times, which proves the improvement is stable.
>
>
> To conclude this, here are ideas for future work:
>
> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory, which not only solves fragmentation issues, but it also allows each buffer to be partially in VRAM and partially in GTT, which becomes more important with large buffers like 100 MB. Moving whole buffers back and forth between VRAM and GTT is inefficient if you can do it at page granularity. Also, due to fragmentation, we can never really use all of VRAM, but only about 90-95%.

Yeah, I'm also thinking about this for quite some time now. The basic 
problem is that while our GPUs support VM they don't support faulting 
pages in and continuing (at least nobody got that working reliable so 
far). E.g. when you hit a page fault you can't relocate the page and 
then continue.

Support for partially resident textures on newer hardware currently 
works by splitting the buffer up into smaller buffers in userspace and 
then actively checking in the shader if we hit a buffer that's not 
currently in memory, but that's not really applicable in the general use 
case (to much shader overhead).

> 2) Add support for uncached GTT. I think it should improve performance for dGPUs under memory pressure, but some testing needs to be done to confirm that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said to be working on some later chips.

Did you try to make the whole GTT uncached or just evicted BOs? Making 
the whole GTT uncached probably won't work out of the box, but avoiding 
setting the "SNOOPED" flag on those pages might get us better 
performance while swapping them into VRAM again.

Christian.

>
> The patches for Mesa will follow later today. Please review.
>
> Marek
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 16:40 ` [PATCH 0/6] Radeon memory management improvements Christian König
@ 2014-02-24 19:23   ` Alex Deucher
  2014-02-24 19:39   ` Marek Olšák
  1 sibling, 0 replies; 23+ messages in thread
From: Alex Deucher @ 2014-02-24 19:23 UTC (permalink / raw)
  To: Christian König; +Cc: Maling list - DRI developers

On Mon, Feb 24, 2014 at 11:40 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Hi Marek,
>
> Some minor comments on patch 1, 2 and 5, but nothing serious. Patch 3, 4 and
> 6 are Reviewed-by: christian König <christian.koenig@amd.com>
>
> See below for a few in line comments.
>
> Am 24.02.2014 16:20, schrieb Marek Olšák:
>
>> This series improves performance for the cases when there is not enough
>> VRAM for all buffers.
>>
>> First of all, I'd like to mention that if you set both VRAM and GTT
>> domains for a buffer, you pretty much say you don't care where the buffer
>> ends up. It usually makes the performance even worse.
>>
>> This work was largely benchmark-driven and I tried a lot of ideas before I
>> found out which ones work. The patches describe what they do and they're
>> quite simple, so I'll just share the results here.
>>
>>
>> Card: Evergreen Redwood (HD 5670), 512 MB of VRAM
>> Test: Unigine Heaven 4.0, High settings
>>
>> 1) 1280x720, 4x MSAA, need 525 MB of VRAM
>>
>> Without patches: 16.6 FPS
>> With patches: 16.6 FPS
>> Improvement: 0 %
>>
>> 2) 1600x900, 4x MSAA, need 642 MB of VRAM
>>
>> Without patches: 7.1 FPS
>> With patches: 9.7 FPS
>> Improvement: 36 %
>>
>> 3) 1920x1080, 4x MSAA, need 743 MB of VRAM
>>
>> Without patches: 3.7 FPS
>> With patches: 5.6 FPS
>> Improvement: 51 %
>>
>> 4) 1600x900, 8x MSAA, need 838 MB of VRAM
>> Without patches: 2.9 FPS
>> With patches: 4.6 FPS
>> Improvement: 58 %
>>
>> These results don't change if you run the benchmark several times, which
>> proves the improvement is stable.
>>
>>
>> To conclude this, here are ideas for future work:
>>
>> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
>> which not only solves fragmentation issues, but it also allows each buffer
>> to be partially in VRAM and partially in GTT, which becomes more important
>> with large buffers like 100 MB. Moving whole buffers back and forth between
>> VRAM and GTT is inefficient if you can do it at page granularity. Also, due
>> to fragmentation, we can never really use all of VRAM, but only about
>> 90-95%.
>
>
> Yeah, I'm also thinking about this for quite some time now. The basic
> problem is that while our GPUs support VM they don't support faulting pages
> in and continuing (at least nobody got that working reliable so far). E.g.
> when you hit a page fault you can't relocate the page and then continue.
>

Well, for non-scanout buffers we can do scatter gather for vram pages
rather than using contiguous buffers.  That would at least avoid low
memory situations where there is enough vram, just not contiguous.
Another option would be to write a ttm defragmentor, but I think we'd
have to fix the synchronization issues with bo moves first.

Alex

> Support for partially resident textures on newer hardware currently works by
> splitting the buffer up into smaller buffers in userspace and then actively
> checking in the shader if we hit a buffer that's not currently in memory,
> but that's not really applicable in the general use case (to much shader
> overhead).
>
>
>> 2) Add support for uncached GTT. I think it should improve performance for
>> dGPUs under memory pressure, but some testing needs to be done to confirm
>> that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
>> to be working on some later chips.
>
>
> Did you try to make the whole GTT uncached or just evicted BOs? Making the
> whole GTT uncached probably won't work out of the box, but avoiding setting
> the "SNOOPED" flag on those pages might get us better performance while
> swapping them into VRAM again.
>
> Christian.
>
>
>>
>> The patches for Mesa will follow later today. Please review.
>>
>> Marek
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 16:40 ` [PATCH 0/6] Radeon memory management improvements Christian König
  2014-02-24 19:23   ` Alex Deucher
@ 2014-02-24 19:39   ` Marek Olšák
  2014-02-25 10:11     ` Christian König
  2014-02-27  1:17     ` Jerome Glisse
  1 sibling, 2 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-24 19:39 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, Feb 24, 2014 at 5:40 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 24.02.2014 16:20, schrieb Marek Olšák:
>> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
>> which not only solves fragmentation issues, but it also allows each buffer
>> to be partially in VRAM and partially in GTT, which becomes more important
>> with large buffers like 100 MB. Moving whole buffers back and forth between
>> VRAM and GTT is inefficient if you can do it at page granularity. Also, due
>> to fragmentation, we can never really use all of VRAM, but only about
>> 90-95%.
>
>
> Yeah, I'm also thinking about this for quite some time now. The basic
> problem is that while our GPUs support VM they don't support faulting pages
> in and continuing (at least nobody got that working reliable so far). E.g.
> when you hit a page fault you can't relocate the page and then continue.
>
> Support for partially resident textures on newer hardware currently works by
> splitting the buffer up into smaller buffers in userspace and then actively
> checking in the shader if we hit a buffer that's not currently in memory,
> but that's not really applicable in the general use case (to much shader
> overhead).

I was thinking of splitting buffers into smaller chunks and treating
them like independent TTM buffers, i.e. one radeon_bo would contain an
array of TTM buffers which would be validated independently of each
other. The chunks would only be mapped together to make them look like
one buffer. This would be hidden from userspace and there would only
be one GEM handle for the whole buffer, so that DRI2 sharing works.

>
>
>> 2) Add support for uncached GTT. I think it should improve performance for
>> dGPUs under memory pressure, but some testing needs to be done to confirm
>> that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
>> to be working on some later chips.
>
>
> Did you try to make the whole GTT uncached or just evicted BOs? Making the
> whole GTT uncached probably won't work out of the box, but avoiding setting
> the "SNOOPED" flag on those pages might get us better performance while
> swapping them into VRAM again.

I made the whole GTT uncached.

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
                   ` (6 preceding siblings ...)
  2014-02-24 16:40 ` [PATCH 0/6] Radeon memory management improvements Christian König
@ 2014-02-25  2:52 ` Michel Dänzer
  7 siblings, 0 replies; 23+ messages in thread
From: Michel Dänzer @ 2014-02-25  2:52 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On Mon, 2014-02-24 at 16:20 +0100, Marek Olšák wrote:
> 
> 2) Add support for uncached GTT. I think it should improve performance
> for dGPUs under memory pressure, but some testing needs to be done to
> confirm that. Uncached GTT doesn't seem to work for me on Evergreen,
> but it's said to be working on some later chips.

FWIW, I think the issue is more likely with the code or your system than
with the GPU: fglrx has always been using unsnooped GPU mappings with
write-combined CPU mappings with PCIe as well. I'm not sure how they
choose that vs. snooped and cacheable mappings though.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 19:39   ` Marek Olšák
@ 2014-02-25 10:11     ` Christian König
  2014-02-25 23:29       ` Marek Olšák
  2014-02-27  1:17     ` Jerome Glisse
  1 sibling, 1 reply; 23+ messages in thread
From: Christian König @ 2014-02-25 10:11 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Am 24.02.2014 20:39, schrieb Marek Olšák:
> On Mon, Feb 24, 2014 at 5:40 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 24.02.2014 16:20, schrieb Marek Olšák:
>>> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
>>> which not only solves fragmentation issues, but it also allows each buffer
>>> to be partially in VRAM and partially in GTT, which becomes more important
>>> with large buffers like 100 MB. Moving whole buffers back and forth between
>>> VRAM and GTT is inefficient if you can do it at page granularity. Also, due
>>> to fragmentation, we can never really use all of VRAM, but only about
>>> 90-95%.
>>
>> Yeah, I'm also thinking about this for quite some time now. The basic
>> problem is that while our GPUs support VM they don't support faulting pages
>> in and continuing (at least nobody got that working reliable so far). E.g.
>> when you hit a page fault you can't relocate the page and then continue.
>>
>> Support for partially resident textures on newer hardware currently works by
>> splitting the buffer up into smaller buffers in userspace and then actively
>> checking in the shader if we hit a buffer that's not currently in memory,
>> but that's not really applicable in the general use case (to much shader
>> overhead).
> I was thinking of splitting buffers into smaller chunks and treating
> them like independent TTM buffers, i.e. one radeon_bo would contain an
> array of TTM buffers which would be validated independently of each
> other. The chunks would only be mapped together to make them look like
> one buffer. This would be hidden from userspace and there would only
> be one GEM handle for the whole buffer, so that DRI2 sharing works.

My thoughts where more to to this on the userspace side, but doing it in 
the kernel indeed avoids a bunch of problems with sharing the buffer.

Sounds like a plan to me. The only thing I can currently see missing is 
handling of scanout buffers. Do we have a flag for this while creating 
the buffer?

>>
>>> 2) Add support for uncached GTT. I think it should improve performance for
>>> dGPUs under memory pressure, but some testing needs to be done to confirm
>>> that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
>>> to be working on some later chips.
>>
>> Did you try to make the whole GTT uncached or just evicted BOs? Making the
>> whole GTT uncached probably won't work out of the box, but avoiding setting
>> the "SNOOPED" flag on those pages might get us better performance while
>> swapping them into VRAM again.
> I made the whole GTT uncached.

I'm not sure if this will work, and at least for the ring buffers it's 
probably also not a good idea.

Christian.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-25 10:11     ` Christian König
@ 2014-02-25 23:29       ` Marek Olšák
  0 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-25 23:29 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, Feb 25, 2014 at 11:11 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 24.02.2014 20:39, schrieb Marek Olšák:
>
>> On Mon, Feb 24, 2014 at 5:40 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> Am 24.02.2014 16:20, schrieb Marek Olšák:
>>>>
>>>> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
>>>> which not only solves fragmentation issues, but it also allows each
>>>> buffer
>>>> to be partially in VRAM and partially in GTT, which becomes more
>>>> important
>>>> with large buffers like 100 MB. Moving whole buffers back and forth
>>>> between
>>>> VRAM and GTT is inefficient if you can do it at page granularity. Also,
>>>> due
>>>> to fragmentation, we can never really use all of VRAM, but only about
>>>> 90-95%.
>>>
>>>
>>> Yeah, I'm also thinking about this for quite some time now. The basic
>>> problem is that while our GPUs support VM they don't support faulting
>>> pages
>>> in and continuing (at least nobody got that working reliable so far).
>>> E.g.
>>> when you hit a page fault you can't relocate the page and then continue.
>>>
>>> Support for partially resident textures on newer hardware currently works
>>> by
>>> splitting the buffer up into smaller buffers in userspace and then
>>> actively
>>> checking in the shader if we hit a buffer that's not currently in memory,
>>> but that's not really applicable in the general use case (to much shader
>>> overhead).
>>
>> I was thinking of splitting buffers into smaller chunks and treating
>> them like independent TTM buffers, i.e. one radeon_bo would contain an
>> array of TTM buffers which would be validated independently of each
>> other. The chunks would only be mapped together to make them look like
>> one buffer. This would be hidden from userspace and there would only
>> be one GEM handle for the whole buffer, so that DRI2 sharing works.
>
>
> My thoughts where more to to this on the userspace side, but doing it in the
> kernel indeed avoids a bunch of problems with sharing the buffer.
>
> Sounds like a plan to me. The only thing I can currently see missing is
> handling of scanout buffers. Do we have a flag for this while creating the
> buffer?

We have a flag in Mesa, but not in the kernel. We can put it in
drm_radeon_gem_create::flags though.

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains
  2014-02-24 16:17   ` Christian König
@ 2014-02-26  0:44     ` Marek Olšák
  2014-02-26  9:39       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Olšák @ 2014-02-26  0:44 UTC (permalink / raw)
  To: dri-devel

From: Marek Olšák <marek.olsak@amd.com>

When passing buffers between processes, the receiving process needs to know
the original buffer domain, so that it doesn't accidentally move the buffer.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  3 +++
 drivers/gpu/drm/radeon/radeon_drv.c    |  3 ++-
 drivers/gpu/drm/radeon/radeon_gem.c    | 36 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c    |  1 +
 drivers/gpu/drm/radeon/radeon_object.c |  3 +++
 include/uapi/drm/radeon_drm.h          | 12 ++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index a415f8e..3f10782 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -454,6 +454,7 @@ struct radeon_bo {
 	/* Protected by gem.mutex */
 	struct list_head		list;
 	/* Protected by tbo.reserved */
+	u32				initial_domain;
 	u32				placements[3];
 	struct ttm_placement		placement;
 	struct ttm_buffer_object	tbo;
@@ -2114,6 +2115,8 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
 			      struct drm_file *filp);
 int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
 			  struct drm_file *filp);
+int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *filp);
 int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
 int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 84a1bbb7..4392b7c 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -79,9 +79,10 @@
  *   2.35.0 - Add CIK macrotile mode array query
  *   2.36.0 - Fix CIK DCE tiling setup
  *   2.37.0 - allow GS ring setup on r6xx/r7xx
+ *   2.38.0 - RADEON_GEM_OP (GET_INITIAL_DOMAIN, SET_INITIAL_DOMAIN)
  */
 #define KMS_DRIVER_MAJOR	2
-#define KMS_DRIVER_MINOR	37
+#define KMS_DRIVER_MINOR	38
 #define KMS_DRIVER_PATCHLEVEL	0
 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
 int radeon_driver_unload_kms(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index b96c819..9863ca7 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -533,6 +533,42 @@ out:
 	return r;
 }
 
+int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *filp)
+{
+	struct drm_radeon_gem_op *args = data;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+	int r;
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		return -ENOENT;
+	}
+	robj = gem_to_radeon_bo(gobj);
+	r = radeon_bo_reserve(robj, false);
+	if (unlikely(r))
+		goto out;
+
+	switch (args->op) {
+	case RADEON_GEM_OP_GET_INITIAL_DOMAIN:
+		args->value = robj->initial_domain;
+		break;
+	case RADEON_GEM_OP_SET_INITIAL_DOMAIN:
+		robj->initial_domain = args->value & (RADEON_GEM_DOMAIN_VRAM |
+						      RADEON_GEM_DOMAIN_GTT |
+						      RADEON_GEM_DOMAIN_CPU);
+		break;
+	default:
+		r = -EINVAL;
+	}
+
+	radeon_bo_unreserve(robj);
+out:
+	drm_gem_object_unreference_unlocked(gobj);
+	return r;
+}
+
 int radeon_mode_dumb_create(struct drm_file *file_priv,
 			    struct drm_device *dev,
 			    struct drm_mode_create_dumb *args)
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index baff98b..0b631eb 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -814,5 +814,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 08595cf..dd12bb4 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -145,6 +145,9 @@ int radeon_bo_create(struct radeon_device *rdev,
 	bo->surface_reg = -1;
 	INIT_LIST_HEAD(&bo->list);
 	INIT_LIST_HEAD(&bo->va);
+	bo->initial_domain = domain & (RADEON_GEM_DOMAIN_VRAM |
+	                               RADEON_GEM_DOMAIN_GTT |
+	                               RADEON_GEM_DOMAIN_CPU);
 	radeon_ttm_placement_from_domain(bo, domain);
 	/* Kernel allocation are uninterruptible */
 	down_read(&rdev->pm.mclk_lock);
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 1cf18b4..cb5c93a 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_GEM_OP		0x2c
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -884,6 +886,16 @@ struct drm_radeon_gem_pwrite {
 	uint64_t data_ptr;
 };
 
+/* Sets or returns a value associated with a buffer. */
+struct drm_radeon_gem_op {
+	uint32_t	handle; /* buffer */
+	uint32_t	op;     /* RADEON_GEM_OP_* */
+	uint64_t	value;  /* input or return value */
+};
+
+#define RADEON_GEM_OP_GET_INITIAL_DOMAIN	0
+#define RADEON_GEM_OP_SET_INITIAL_DOMAIN	1
+
 #define RADEON_VA_MAP			1
 #define RADEON_VA_UNMAP			2
 
-- 
1.8.3.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains
  2014-02-26  0:44     ` Marek Olšák
@ 2014-02-26  9:39       ` Christian König
  2014-02-26 10:58         ` Marek Olšák
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2014-02-26  9:39 UTC (permalink / raw)
  To: Marek Olšák, dri-devel

Am 26.02.2014 01:44, schrieb Marek Olšák:
> From: Marek Olšák <marek.olsak@amd.com>
>
> When passing buffers between processes, the receiving process needs to know
> the original buffer domain, so that it doesn't accidentally move the buffer.
>
> Signed-off-by: Marek Olšák <marek.olsak@amd.com>

This patch is: Reviewed-by: Christian König <christian.koenig@amd.com>

Where is the rest of the series? At least I didn't got it in my mailbox, 
and I would like to apply it to drm-next-3.15.

Cheers,
Christian.

> ---
>   drivers/gpu/drm/radeon/radeon.h        |  3 +++
>   drivers/gpu/drm/radeon/radeon_drv.c    |  3 ++-
>   drivers/gpu/drm/radeon/radeon_gem.c    | 36 ++++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/radeon/radeon_kms.c    |  1 +
>   drivers/gpu/drm/radeon/radeon_object.c |  3 +++
>   include/uapi/drm/radeon_drm.h          | 12 ++++++++++++
>   6 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index a415f8e..3f10782 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -454,6 +454,7 @@ struct radeon_bo {
>   	/* Protected by gem.mutex */
>   	struct list_head		list;
>   	/* Protected by tbo.reserved */
> +	u32				initial_domain;
>   	u32				placements[3];
>   	struct ttm_placement		placement;
>   	struct ttm_buffer_object	tbo;
> @@ -2114,6 +2115,8 @@ int radeon_gem_wait_idle_ioctl(struct drm_device *dev, void *data,
>   			      struct drm_file *filp);
>   int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
>   			  struct drm_file *filp);
> +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *filp);
>   int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp);
>   int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
>   				struct drm_file *filp);
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index 84a1bbb7..4392b7c 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -79,9 +79,10 @@
>    *   2.35.0 - Add CIK macrotile mode array query
>    *   2.36.0 - Fix CIK DCE tiling setup
>    *   2.37.0 - allow GS ring setup on r6xx/r7xx
> + *   2.38.0 - RADEON_GEM_OP (GET_INITIAL_DOMAIN, SET_INITIAL_DOMAIN)
>    */
>   #define KMS_DRIVER_MAJOR	2
> -#define KMS_DRIVER_MINOR	37
> +#define KMS_DRIVER_MINOR	38
>   #define KMS_DRIVER_PATCHLEVEL	0
>   int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
>   int radeon_driver_unload_kms(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index b96c819..9863ca7 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -533,6 +533,42 @@ out:
>   	return r;
>   }
>   
> +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *filp)
> +{
> +	struct drm_radeon_gem_op *args = data;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	int r;
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		return -ENOENT;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +	r = radeon_bo_reserve(robj, false);
> +	if (unlikely(r))
> +		goto out;
> +
> +	switch (args->op) {
> +	case RADEON_GEM_OP_GET_INITIAL_DOMAIN:
> +		args->value = robj->initial_domain;
> +		break;
> +	case RADEON_GEM_OP_SET_INITIAL_DOMAIN:
> +		robj->initial_domain = args->value & (RADEON_GEM_DOMAIN_VRAM |
> +						      RADEON_GEM_DOMAIN_GTT |
> +						      RADEON_GEM_DOMAIN_CPU);
> +		break;
> +	default:
> +		r = -EINVAL;
> +	}
> +
> +	radeon_bo_unreserve(robj);
> +out:
> +	drm_gem_object_unreference_unlocked(gobj);
> +	return r;
> +}
> +
>   int radeon_mode_dumb_create(struct drm_file *file_priv,
>   			    struct drm_device *dev,
>   			    struct drm_mode_create_dumb *args)
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index baff98b..0b631eb 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -814,5 +814,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   };
>   int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
> diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
> index 08595cf..dd12bb4 100644
> --- a/drivers/gpu/drm/radeon/radeon_object.c
> +++ b/drivers/gpu/drm/radeon/radeon_object.c
> @@ -145,6 +145,9 @@ int radeon_bo_create(struct radeon_device *rdev,
>   	bo->surface_reg = -1;
>   	INIT_LIST_HEAD(&bo->list);
>   	INIT_LIST_HEAD(&bo->va);
> +	bo->initial_domain = domain & (RADEON_GEM_DOMAIN_VRAM |
> +	                               RADEON_GEM_DOMAIN_GTT |
> +	                               RADEON_GEM_DOMAIN_CPU);
>   	radeon_ttm_placement_from_domain(bo, domain);
>   	/* Kernel allocation are uninterruptible */
>   	down_read(&rdev->pm.mclk_lock);
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 1cf18b4..cb5c93a 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -510,6 +510,7 @@ typedef struct {
>   #define DRM_RADEON_GEM_GET_TILING	0x29
>   #define DRM_RADEON_GEM_BUSY		0x2a
>   #define DRM_RADEON_GEM_VA		0x2b
> +#define DRM_RADEON_GEM_OP		0x2c
>   
>   #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
>   #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
> @@ -552,6 +553,7 @@ typedef struct {
>   #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
>   #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
>   #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> +#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
>   
>   typedef struct drm_radeon_init {
>   	enum {
> @@ -884,6 +886,16 @@ struct drm_radeon_gem_pwrite {
>   	uint64_t data_ptr;
>   };
>   
> +/* Sets or returns a value associated with a buffer. */
> +struct drm_radeon_gem_op {
> +	uint32_t	handle; /* buffer */
> +	uint32_t	op;     /* RADEON_GEM_OP_* */
> +	uint64_t	value;  /* input or return value */
> +};
> +
> +#define RADEON_GEM_OP_GET_INITIAL_DOMAIN	0
> +#define RADEON_GEM_OP_SET_INITIAL_DOMAIN	1
> +
>   #define RADEON_VA_MAP			1
>   #define RADEON_VA_UNMAP			2
>   

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains
  2014-02-26  9:39       ` Christian König
@ 2014-02-26 10:58         ` Marek Olšák
  0 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-26 10:58 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

I'll send the other patches today.

Marek

On Wed, Feb 26, 2014 at 10:39 AM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 26.02.2014 01:44, schrieb Marek Olšák:
>
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> When passing buffers between processes, the receiving process needs to
>> know
>> the original buffer domain, so that it doesn't accidentally move the
>> buffer.
>>
>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>
>
> This patch is: Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Where is the rest of the series? At least I didn't got it in my mailbox, and
> I would like to apply it to drm-next-3.15.
>
> Cheers,
> Christian.
>
>
>> ---
>>   drivers/gpu/drm/radeon/radeon.h        |  3 +++
>>   drivers/gpu/drm/radeon/radeon_drv.c    |  3 ++-
>>   drivers/gpu/drm/radeon/radeon_gem.c    | 36
>> ++++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/radeon/radeon_kms.c    |  1 +
>>   drivers/gpu/drm/radeon/radeon_object.c |  3 +++
>>   include/uapi/drm/radeon_drm.h          | 12 ++++++++++++
>>   6 files changed, 57 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>> b/drivers/gpu/drm/radeon/radeon.h
>> index a415f8e..3f10782 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -454,6 +454,7 @@ struct radeon_bo {
>>         /* Protected by gem.mutex */
>>         struct list_head                list;
>>         /* Protected by tbo.reserved */
>> +       u32                             initial_domain;
>>         u32                             placements[3];
>>         struct ttm_placement            placement;
>>         struct ttm_buffer_object        tbo;
>> @@ -2114,6 +2115,8 @@ int radeon_gem_wait_idle_ioctl(struct drm_device
>> *dev, void *data,
>>                               struct drm_file *filp);
>>   int radeon_gem_va_ioctl(struct drm_device *dev, void *data,
>>                           struct drm_file *filp);
>> +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
>> +                       struct drm_file *filp);
>>   int radeon_cs_ioctl(struct drm_device *dev, void *data, struct drm_file
>> *filp);
>>   int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
>>                                 struct drm_file *filp);
>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c
>> b/drivers/gpu/drm/radeon/radeon_drv.c
>> index 84a1bbb7..4392b7c 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -79,9 +79,10 @@
>>    *   2.35.0 - Add CIK macrotile mode array query
>>    *   2.36.0 - Fix CIK DCE tiling setup
>>    *   2.37.0 - allow GS ring setup on r6xx/r7xx
>> + *   2.38.0 - RADEON_GEM_OP (GET_INITIAL_DOMAIN, SET_INITIAL_DOMAIN)
>>    */
>>   #define KMS_DRIVER_MAJOR      2
>> -#define KMS_DRIVER_MINOR       37
>> +#define KMS_DRIVER_MINOR       38
>>   #define KMS_DRIVER_PATCHLEVEL 0
>>   int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
>>   int radeon_driver_unload_kms(struct drm_device *dev);
>> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c
>> b/drivers/gpu/drm/radeon/radeon_gem.c
>> index b96c819..9863ca7 100644
>> --- a/drivers/gpu/drm/radeon/radeon_gem.c
>> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
>> @@ -533,6 +533,42 @@ out:
>>         return r;
>>   }
>>   +int radeon_gem_op_ioctl(struct drm_device *dev, void *data,
>> +                       struct drm_file *filp)
>> +{
>> +       struct drm_radeon_gem_op *args = data;
>> +       struct drm_gem_object *gobj;
>> +       struct radeon_bo *robj;
>> +       int r;
>> +
>> +       gobj = drm_gem_object_lookup(dev, filp, args->handle);
>> +       if (gobj == NULL) {
>> +               return -ENOENT;
>> +       }
>> +       robj = gem_to_radeon_bo(gobj);
>> +       r = radeon_bo_reserve(robj, false);
>> +       if (unlikely(r))
>> +               goto out;
>> +
>> +       switch (args->op) {
>> +       case RADEON_GEM_OP_GET_INITIAL_DOMAIN:
>> +               args->value = robj->initial_domain;
>> +               break;
>> +       case RADEON_GEM_OP_SET_INITIAL_DOMAIN:
>> +               robj->initial_domain = args->value &
>> (RADEON_GEM_DOMAIN_VRAM |
>> +
>> RADEON_GEM_DOMAIN_GTT |
>> +
>> RADEON_GEM_DOMAIN_CPU);
>> +               break;
>> +       default:
>> +               r = -EINVAL;
>> +       }
>> +
>> +       radeon_bo_unreserve(robj);
>> +out:
>> +       drm_gem_object_unreference_unlocked(gobj);
>> +       return r;
>> +}
>> +
>>   int radeon_mode_dumb_create(struct drm_file *file_priv,
>>                             struct drm_device *dev,
>>                             struct drm_mode_create_dumb *args)
>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c
>> b/drivers/gpu/drm/radeon/radeon_kms.c
>> index baff98b..0b631eb 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>> @@ -814,5 +814,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
>>         DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING,
>> radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>         DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl,
>> DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>         DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl,
>> DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>> +       DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl,
>> DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>>   };
>>   int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
>> diff --git a/drivers/gpu/drm/radeon/radeon_object.c
>> b/drivers/gpu/drm/radeon/radeon_object.c
>> index 08595cf..dd12bb4 100644
>> --- a/drivers/gpu/drm/radeon/radeon_object.c
>> +++ b/drivers/gpu/drm/radeon/radeon_object.c
>> @@ -145,6 +145,9 @@ int radeon_bo_create(struct radeon_device *rdev,
>>         bo->surface_reg = -1;
>>         INIT_LIST_HEAD(&bo->list);
>>         INIT_LIST_HEAD(&bo->va);
>> +       bo->initial_domain = domain & (RADEON_GEM_DOMAIN_VRAM |
>> +                                      RADEON_GEM_DOMAIN_GTT |
>> +                                      RADEON_GEM_DOMAIN_CPU);
>>         radeon_ttm_placement_from_domain(bo, domain);
>>         /* Kernel allocation are uninterruptible */
>>         down_read(&rdev->pm.mclk_lock);
>> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
>> index 1cf18b4..cb5c93a 100644
>> --- a/include/uapi/drm/radeon_drm.h
>> +++ b/include/uapi/drm/radeon_drm.h
>> @@ -510,6 +510,7 @@ typedef struct {
>>   #define DRM_RADEON_GEM_GET_TILING     0x29
>>   #define DRM_RADEON_GEM_BUSY           0x2a
>>   #define DRM_RADEON_GEM_VA             0x2b
>> +#define DRM_RADEON_GEM_OP              0x2c
>>     #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE +
>> DRM_RADEON_CP_INIT, drm_radeon_init_t)
>>   #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE +
>> DRM_RADEON_CP_START)
>> @@ -552,6 +553,7 @@ typedef struct {
>>   #define DRM_IOCTL_RADEON_GEM_GET_TILING       DRM_IOWR(DRM_COMMAND_BASE
>> + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
>>   #define DRM_IOCTL_RADEON_GEM_BUSY     DRM_IOWR(DRM_COMMAND_BASE +
>> DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
>>   #define DRM_IOCTL_RADEON_GEM_VA               DRM_IOWR(DRM_COMMAND_BASE
>> + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
>> +#define DRM_IOCTL_RADEON_GEM_OP                DRM_IOWR(DRM_COMMAND_BASE
>> + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
>>     typedef struct drm_radeon_init {
>>         enum {
>> @@ -884,6 +886,16 @@ struct drm_radeon_gem_pwrite {
>>         uint64_t data_ptr;
>>   };
>>   +/* Sets or returns a value associated with a buffer. */
>> +struct drm_radeon_gem_op {
>> +       uint32_t        handle; /* buffer */
>> +       uint32_t        op;     /* RADEON_GEM_OP_* */
>> +       uint64_t        value;  /* input or return value */
>> +};
>> +
>> +#define RADEON_GEM_OP_GET_INITIAL_DOMAIN       0
>> +#define RADEON_GEM_OP_SET_INITIAL_DOMAIN       1
>> +
>>   #define RADEON_VA_MAP                 1
>>   #define RADEON_VA_UNMAP                       2
>>
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves
  2014-02-24 16:20   ` Christian König
@ 2014-02-26 17:56     ` Marek Olšák
  2014-02-26 18:26       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Marek Olšák @ 2014-02-26 17:56 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, Feb 24, 2014 at 5:20 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 24.02.2014 16:20, schrieb Marek Olšák:
>
>> From: Marek Olšák <marek.olsak@amd.com>
>>
>> The statistics are:
>> - VRAM usage in bytes
>> - GTT usage in bytes
>> - number of bytes moved by TTM
>>
>> The last one is actually a counter, so you need to sample it before and
>> after
>> command submission and take the difference.
>>
>> This is useful for finding performance bottlenecks. Userspace queries are
>> also added.
>>
>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>> ---
>>   drivers/gpu/drm/radeon/radeon.h        |  5 +++++
>>   drivers/gpu/drm/radeon/radeon_device.c |  1 +
>>   drivers/gpu/drm/radeon/radeon_kms.c    | 15 ++++++++++++++
>>   drivers/gpu/drm/radeon/radeon_object.c | 38
>> +++++++++++++++++++++++++++++++++-
>>   drivers/gpu/drm/radeon/radeon_object.h |  2 +-
>>   drivers/gpu/drm/radeon/radeon_ttm.c    | 10 ++++++++-
>>   include/uapi/drm/radeon_drm.h          |  3 +++
>>   7 files changed, 71 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>> b/drivers/gpu/drm/radeon/radeon.h
>> index 3f10782..d37a57a 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -2307,6 +2307,11 @@ struct radeon_device {
>>         /* virtual memory */
>>         struct radeon_vm_manager        vm_manager;
>>         struct mutex                    gpu_clock_mutex;
>> +       /* memory stats */
>> +       struct mutex                    memory_stats_mutex;
>> +       uint64_t                        vram_usage;
>> +       uint64_t                        gtt_usage;
>> +       uint64_t                        num_bytes_moved;
>
>
> As far as I can see you could make those tree values atomic64_t instead and
> avoid the mutex.

I'm afraid I cannot use atomic64_t. It doesn't work on x86 32-bit.
This seems to be a no-op:

u64 size = (u64)bo->num_pages << PAGE_SHIFT;
atomic64_add(size, &rdev->num_bytes_moved);

Marek
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves
  2014-02-26 17:56     ` Marek Olšák
@ 2014-02-26 18:26       ` Christian König
  2014-02-27  0:39         ` Marek Olšák
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2014-02-26 18:26 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

Am 26.02.2014 18:56, schrieb Marek Olšák:
> On Mon, Feb 24, 2014 at 5:20 PM, Christian König
> <deathsimple@vodafone.de> wrote:
>> Am 24.02.2014 16:20, schrieb Marek Olšák:
>>
>>> From: Marek Olšák <marek.olsak@amd.com>
>>>
>>> The statistics are:
>>> - VRAM usage in bytes
>>> - GTT usage in bytes
>>> - number of bytes moved by TTM
>>>
>>> The last one is actually a counter, so you need to sample it before and
>>> after
>>> command submission and take the difference.
>>>
>>> This is useful for finding performance bottlenecks. Userspace queries are
>>> also added.
>>>
>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>> ---
>>>    drivers/gpu/drm/radeon/radeon.h        |  5 +++++
>>>    drivers/gpu/drm/radeon/radeon_device.c |  1 +
>>>    drivers/gpu/drm/radeon/radeon_kms.c    | 15 ++++++++++++++
>>>    drivers/gpu/drm/radeon/radeon_object.c | 38
>>> +++++++++++++++++++++++++++++++++-
>>>    drivers/gpu/drm/radeon/radeon_object.h |  2 +-
>>>    drivers/gpu/drm/radeon/radeon_ttm.c    | 10 ++++++++-
>>>    include/uapi/drm/radeon_drm.h          |  3 +++
>>>    7 files changed, 71 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>> b/drivers/gpu/drm/radeon/radeon.h
>>> index 3f10782..d37a57a 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -2307,6 +2307,11 @@ struct radeon_device {
>>>          /* virtual memory */
>>>          struct radeon_vm_manager        vm_manager;
>>>          struct mutex                    gpu_clock_mutex;
>>> +       /* memory stats */
>>> +       struct mutex                    memory_stats_mutex;
>>> +       uint64_t                        vram_usage;
>>> +       uint64_t                        gtt_usage;
>>> +       uint64_t                        num_bytes_moved;
>>
>> As far as I can see you could make those tree values atomic64_t instead and
>> avoid the mutex.
> I'm afraid I cannot use atomic64_t. It doesn't work on x86 32-bit.
> This seems to be a no-op:
>
> u64 size = (u64)bo->num_pages << PAGE_SHIFT;
> atomic64_add(size, &rdev->num_bytes_moved);

Are you sure about this? Haven't tested x86 32-bit in a long time, but 
we use atomic64 all around the place and they usually work perfectly.

Christian.

> Marek

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves
  2014-02-26 18:26       ` Christian König
@ 2014-02-27  0:39         ` Marek Olšák
  0 siblings, 0 replies; 23+ messages in thread
From: Marek Olšák @ 2014-02-27  0:39 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

Dammit. I renamed the RADEON_INFO definitions for the new queries to
0xd, e, f in the kernel tree, but I forgot to update the Mesa code,
which used 0xc, d, e. Sorry.

Marek

On Wed, Feb 26, 2014 at 7:26 PM, Christian König
<deathsimple@vodafone.de> wrote:
> Am 26.02.2014 18:56, schrieb Marek Olšák:
>
>> On Mon, Feb 24, 2014 at 5:20 PM, Christian König
>> <deathsimple@vodafone.de> wrote:
>>>
>>> Am 24.02.2014 16:20, schrieb Marek Olšák:
>>>
>>>> From: Marek Olšák <marek.olsak@amd.com>
>>>>
>>>> The statistics are:
>>>> - VRAM usage in bytes
>>>> - GTT usage in bytes
>>>> - number of bytes moved by TTM
>>>>
>>>> The last one is actually a counter, so you need to sample it before and
>>>> after
>>>> command submission and take the difference.
>>>>
>>>> This is useful for finding performance bottlenecks. Userspace queries
>>>> are
>>>> also added.
>>>>
>>>> Signed-off-by: Marek Olšák <marek.olsak@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/radeon/radeon.h        |  5 +++++
>>>>    drivers/gpu/drm/radeon/radeon_device.c |  1 +
>>>>    drivers/gpu/drm/radeon/radeon_kms.c    | 15 ++++++++++++++
>>>>    drivers/gpu/drm/radeon/radeon_object.c | 38
>>>> +++++++++++++++++++++++++++++++++-
>>>>    drivers/gpu/drm/radeon/radeon_object.h |  2 +-
>>>>    drivers/gpu/drm/radeon/radeon_ttm.c    | 10 ++++++++-
>>>>    include/uapi/drm/radeon_drm.h          |  3 +++
>>>>    7 files changed, 71 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>>> b/drivers/gpu/drm/radeon/radeon.h
>>>> index 3f10782..d37a57a 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>>> @@ -2307,6 +2307,11 @@ struct radeon_device {
>>>>          /* virtual memory */
>>>>          struct radeon_vm_manager        vm_manager;
>>>>          struct mutex                    gpu_clock_mutex;
>>>> +       /* memory stats */
>>>> +       struct mutex                    memory_stats_mutex;
>>>> +       uint64_t                        vram_usage;
>>>> +       uint64_t                        gtt_usage;
>>>> +       uint64_t                        num_bytes_moved;
>>>
>>>
>>> As far as I can see you could make those tree values atomic64_t instead
>>> and
>>> avoid the mutex.
>>
>> I'm afraid I cannot use atomic64_t. It doesn't work on x86 32-bit.
>> This seems to be a no-op:
>>
>> u64 size = (u64)bo->num_pages << PAGE_SHIFT;
>> atomic64_add(size, &rdev->num_bytes_moved);
>
>
> Are you sure about this? Haven't tested x86 32-bit in a long time, but we
> use atomic64 all around the place and they usually work perfectly.
>
> Christian.
>
>> Marek
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Radeon memory management improvements
  2014-02-24 19:39   ` Marek Olšák
  2014-02-25 10:11     ` Christian König
@ 2014-02-27  1:17     ` Jerome Glisse
  1 sibling, 0 replies; 23+ messages in thread
From: Jerome Glisse @ 2014-02-27  1:17 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On Mon, Feb 24, 2014 at 08:39:07PM +0100, Marek Olšák wrote:
> On Mon, Feb 24, 2014 at 5:40 PM, Christian König
> <deathsimple@vodafone.de> wrote:
> > Am 24.02.2014 16:20, schrieb Marek Olšák:
> >> 1) Add virtual memory support for VRAM. Our GPUs support virtual memory,
> >> which not only solves fragmentation issues, but it also allows each buffer
> >> to be partially in VRAM and partially in GTT, which becomes more important
> >> with large buffers like 100 MB. Moving whole buffers back and forth between
> >> VRAM and GTT is inefficient if you can do it at page granularity. Also, due
> >> to fragmentation, we can never really use all of VRAM, but only about
> >> 90-95%.
> >
> >
> > Yeah, I'm also thinking about this for quite some time now. The basic
> > problem is that while our GPUs support VM they don't support faulting pages
> > in and continuing (at least nobody got that working reliable so far). E.g.
> > when you hit a page fault you can't relocate the page and then continue.
> >
> > Support for partially resident textures on newer hardware currently works by
> > splitting the buffer up into smaller buffers in userspace and then actively
> > checking in the shader if we hit a buffer that's not currently in memory,
> > but that's not really applicable in the general use case (to much shader
> > overhead).
> 
> I was thinking of splitting buffers into smaller chunks and treating
> them like independent TTM buffers, i.e. one radeon_bo would contain an
> array of TTM buffers which would be validated independently of each
> other. The chunks would only be mapped together to make them look like
> one buffer. This would be hidden from userspace and there would only
> be one GEM handle for the whole buffer, so that DRI2 sharing works.

This is a bad idea you will waste a lot of memory for all the ttm objects.
I think you should just decouple that from ttm. TTM placement would be a
hint ie if ttm placement is VRAM than radeon code should try to put as
much as possible of it in VRAM.

radeon would manage chunk of VRAM with lightweight structure. Of course if
such thing is also usefull for nvidia then it would make sense to do that
in ttm.

For the scanout buffer this can be done when the buffer is bound to display
at which point a flag is set thus we would be backward compatible with old
userspace.

Cheers,
Jérôme

> 
> >
> >
> >> 2) Add support for uncached GTT. I think it should improve performance for
> >> dGPUs under memory pressure, but some testing needs to be done to confirm
> >> that. Uncached GTT doesn't seem to work for me on Evergreen, but it's said
> >> to be working on some later chips.
> >
> >
> > Did you try to make the whole GTT uncached or just evicted BOs? Making the
> > whole GTT uncached probably won't work out of the box, but avoiding setting
> > the "SNOOPED" flag on those pages might get us better performance while
> > swapping them into VRAM again.
> 
> I made the whole GTT uncached.
> 
> Marek
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2014-02-27  1:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-24 15:20 [PATCH 0/6] Radeon memory management improvements Marek Olšák
2014-02-24 15:20 ` [PATCH 1/6] drm/radeon: add a way to get and set initial buffer domains Marek Olšák
2014-02-24 16:17   ` Christian König
2014-02-26  0:44     ` Marek Olšák
2014-02-26  9:39       ` Christian König
2014-02-26 10:58         ` Marek Olšák
2014-02-24 15:20 ` [PATCH 2/6] drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves Marek Olšák
2014-02-24 16:20   ` Christian König
2014-02-26 17:56     ` Marek Olšák
2014-02-26 18:26       ` Christian König
2014-02-27  0:39         ` Marek Olšák
2014-02-24 15:20 ` [PATCH 3/6] drm/radeon: deduplicate code in radeon_gem_busy_ioctl Marek Olšák
2014-02-24 15:20 ` [PATCH 4/6] drm/radeon: add buffers to the LRU list from smallest to largest Marek Olšák
2014-02-24 15:20 ` [PATCH 5/6] drm/radeon: validate relocations in the order determined by userspace Marek Olšák
2014-02-24 16:27   ` Christian König
2014-02-24 15:20 ` [PATCH 6/6] drm/radeon: limit how much memory TTM can move per IB according to VRAM usage Marek Olšák
2014-02-24 16:40 ` [PATCH 0/6] Radeon memory management improvements Christian König
2014-02-24 19:23   ` Alex Deucher
2014-02-24 19:39   ` Marek Olšák
2014-02-25 10:11     ` Christian König
2014-02-25 23:29       ` Marek Olšák
2014-02-27  1:17     ` Jerome Glisse
2014-02-25  2:52 ` Michel Dänzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.