* [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
@ 2014-04-20 17:29 Lauri Kasanen
2014-04-20 17:41 ` Christian König
0 siblings, 1 reply; 5+ messages in thread
From: Lauri Kasanen @ 2014-04-20 17:29 UTC (permalink / raw)
To: dri-devel
This function was originally un-inlined by Andi Kleen in 2011, citing size concerns.
Indeed, a first attempt at inlining it grew radeon.ko by 7%.

However, 2% of CPU time is spent in this function. Simply inlining it gave 1% more fps
in Urban Terror.

v2: We know the minimum MMIO size. Adding it to the if condition lets the compiler
optimize the branch out for constant register offsets below that size, improving both
performance and size.

The v2 patch decreases radeon.ko size by 2%. I didn't re-benchmark, but I expect
the performance gain to now exceed 1%.

v3: Also inline _wreg, and make the threshold a define.

Inlining _wreg increased the size a bit compared to v2, so now radeon.ko
is only 1% smaller.

Signed-off-by: Lauri Kasanen <cand@gmx.com>
---
drivers/gpu/drm/radeon/r100.c | 33 ---------------------------------
drivers/gpu/drm/radeon/radeon.h | 40 ++++++++++++++++++++++++++++++++++++----
2 files changed, 36 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index b6c3264..a4e7871 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -4086,39 +4086,6 @@ int r100_init(struct radeon_device *rdev)
 	return 0;
 }
 
-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
-		      bool always_indirect)
-{
-	if (reg < rdev->rmmio_size && !always_indirect)
-		return readl(((void __iomem *)rdev->rmmio) + reg);
-	else {
-		unsigned long flags;
-		uint32_t ret;
-
-		spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-		writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-		ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-		spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-
-		return ret;
-	}
-}
-
-void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
-		  bool always_indirect)
-{
-	if (reg < rdev->rmmio_size && !always_indirect)
-		writel(v, ((void __iomem *)rdev->rmmio) + reg);
-	else {
-		unsigned long flags;
-
-		spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
-		writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
-		writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
-		spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
-	}
-}
-
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg)
 {
 	if (reg < rdev->rio_mem_size)
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index f21db7a..a749b6c 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2328,10 +2328,42 @@ int radeon_device_init(struct radeon_device *rdev,
 void radeon_device_fini(struct radeon_device *rdev);
 int radeon_gpu_wait_for_idle(struct radeon_device *rdev);
 
-uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
-		      bool always_indirect);
-void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
-		  bool always_indirect);
+#define RADEON_MIN_MMIO_SIZE 0x10000
+
+static inline uint32_t r100_mm_rreg(struct radeon_device *rdev, uint32_t reg,
+				    bool always_indirect)
+{
+	/* The mmio size is 64kb at minimum. Allows the if to be optimized out. */
+	if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect)
+		return readl(((void __iomem *)rdev->rmmio) + reg);
+	else {
+		unsigned long flags;
+		uint32_t ret;
+
+		spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+		writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+		ret = readl(((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+		spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+
+		return ret;
+	}
+}
+
+static inline void r100_mm_wreg(struct radeon_device *rdev, uint32_t reg, uint32_t v,
+				bool always_indirect)
+{
+	if ((reg < rdev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) && !always_indirect)
+		writel(v, ((void __iomem *)rdev->rmmio) + reg);
+	else {
+		unsigned long flags;
+
+		spin_lock_irqsave(&rdev->mmio_idx_lock, flags);
+		writel(reg, ((void __iomem *)rdev->rmmio) + RADEON_MM_INDEX);
+		writel(v, ((void __iomem *)rdev->rmmio) + RADEON_MM_DATA);
+		spin_unlock_irqrestore(&rdev->mmio_idx_lock, flags);
+	}
+}
+
 u32 r100_io_rreg(struct radeon_device *rdev, u32 reg);
 void r100_io_wreg(struct radeon_device *rdev, u32 reg, u32 v);
 
--
1.8.3.1
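The v2 note in the patch description (constant offsets below the minimum MMIO size let the compiler drop the indirect branch) can be illustrated with a small userspace sketch. This is a model under stated assumptions, not driver code: `struct fake_dev`, `use_direct_path()` and `fake_rreg()` are hypothetical names, and the return values are placeholders standing in for the real `readl()` and the locked `RADEON_MM_INDEX`/`RADEON_MM_DATA` sequence. It only shows which path the v3 condition selects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Threshold taken from the patch: every board maps at least 64 KiB of MMIO. */
#define RADEON_MIN_MMIO_SIZE 0x10000u
static_assert(RADEON_MIN_MMIO_SIZE == 64 * 1024, "threshold is 64 KiB");

/* Hypothetical stand-in for struct radeon_device with only the fields used. */
struct fake_dev {
	uint32_t rmmio_size;      /* mapped MMIO size, only known at runtime */
	unsigned indirect_calls;  /* how often the slow path was taken */
};

/*
 * Same condition as the v3 r100_mm_rreg()/r100_mm_wreg(). If reg and
 * always_indirect are compile-time constants with reg < 0x10000 and
 * always_indirect == false, the expression is constant true, so once the
 * function is inlined the compiler can discard the indirect branch.
 */
static inline bool use_direct_path(const struct fake_dev *dev, uint32_t reg,
				   bool always_indirect)
{
	return (reg < dev->rmmio_size || reg < RADEON_MIN_MMIO_SIZE) &&
	       !always_indirect;
}

/* Hypothetical read helper: return values are placeholders, not real I/O. */
static inline uint32_t fake_rreg(struct fake_dev *dev, uint32_t reg,
				 bool always_indirect)
{
	if (use_direct_path(dev, reg, always_indirect))
		return 0xD1;      /* stands in for readl(rmmio + reg) */
	dev->indirect_calls++;  /* stands in for the locked MM_INDEX/MM_DATA pair */
	return 0x1D;
}
```

Because the `RADEON_MIN_MMIO_SIZE` comparison needs no runtime data, a call such as `fake_rreg(dev, 0x1000, false)` can be resolved to the direct path at compile time; how much code is actually removed still depends on the compiler and optimization level.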
* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
2014-04-20 17:29 [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3 Lauri Kasanen
@ 2014-04-20 17:41 ` Christian König
2014-07-10 8:48 ` Lauri Kasanen
0 siblings, 1 reply; 5+ messages in thread
From: Christian König @ 2014-04-20 17:41 UTC (permalink / raw)
To: Lauri Kasanen, dri-devel
On 20.04.2014 19:29, Lauri Kasanen wrote:
> Signed-off-by: Lauri Kasanen <cand@gmx.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
2014-04-20 17:41 ` Christian König
@ 2014-07-10 8:48 ` Lauri Kasanen
2014-07-10 8:55 ` Christian König
0 siblings, 1 reply; 5+ messages in thread
From: Lauri Kasanen @ 2014-07-10 8:48 UTC (permalink / raw)
To: Christian König, Dave Airlie; +Cc: dri-devel
On Sun, 20 Apr 2014 19:41:11 +0200
Christian König <deathsimple@vodafone.de> wrote:
> On 20.04.2014 19:29, Lauri Kasanen wrote:
> > Signed-off-by: Lauri Kasanen <cand@gmx.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
Ping. Although it was reviewed in April, it seems this wasn't applied to any tree?
- Lauri
* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
2014-07-10 8:48 ` Lauri Kasanen
@ 2014-07-10 8:55 ` Christian König
2014-07-10 17:17 ` Alex Deucher
0 siblings, 1 reply; 5+ messages in thread
From: Christian König @ 2014-07-10 8:55 UTC (permalink / raw)
To: Lauri Kasanen; +Cc: Alex Deucher, dri-devel
On 10.07.2014 10:48, Lauri Kasanen wrote:
> On Sun, 20 Apr 2014 19:41:11 +0200
> Christian König <deathsimple@vodafone.de> wrote:
>
>> On 20.04.2014 19:29, Lauri Kasanen wrote:
>>> Signed-off-by: Lauri Kasanen <cand@gmx.com>
>> Reviewed-by: Christian König <christian.koenig@amd.com>
> Ping. Although it was reviewed in April, it seems this wasn't applied to any tree?
Sorry, looks like I missed it. Alex, can you pull that into your 3.17 branch?
Thanks,
Christian.
>
> - Lauri
* Re: [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3
2014-07-10 8:55 ` Christian König
@ 2014-07-10 17:17 ` Alex Deucher
0 siblings, 0 replies; 5+ messages in thread
From: Alex Deucher @ 2014-07-10 17:17 UTC (permalink / raw)
To: Christian König
Cc: Alex Deucher, Lauri Kasanen, Mailing list - DRI developers
On Thu, Jul 10, 2014 at 4:55 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 10.07.2014 10:48, Lauri Kasanen wrote:
>
>> On Sun, 20 Apr 2014 19:41:11 +0200
>> Christian König <deathsimple@vodafone.de> wrote:
>>
>>> On 20.04.2014 19:29, Lauri Kasanen wrote:
>>>> Signed-off-by: Lauri Kasanen <cand@gmx.com>
>>>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>
>> Ping. Although it was reviewed in April, it seems this wasn't applied to any tree?
>
>
> Sorry, looks like I missed it. Alex, can you pull that into your 3.17 branch?
Applied, thanks.
Alex
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
end of thread, other threads:[~2014-07-10 17:17 UTC | newest]
2014-04-20 17:29 [PATCH] drm/radeon: Inline r100_mm_rreg, -wreg, v3 Lauri Kasanen
2014-04-20 17:41 ` Christian König
2014-07-10 8:48 ` Lauri Kasanen
2014-07-10 8:55 ` Christian König
2014-07-10 17:17 ` Alex Deucher