* [PATCH] block/ps3: Fix slow VRAM IO
@ 2009-10-19 19:58 Geoff Levand
  2009-10-19 20:03 ` Jim Paris
  2009-11-03  8:23 ` Andrew Morton
  0 siblings, 2 replies; 7+ messages in thread
From: Geoff Levand @ 2009-10-19 19:58 UTC
  To: Jens Axboe
  Cc: Jim Paris, Cell Broadband Engine OSS Development,
	Geert Uytterhoeven, linux-kernel


From: Hideyuki Sasaki <Hideyuki_Sasaki@hq.scei.sony.co.jp>

The current PS3 VRAM driver uses msleep() to wait for completion
of RSX DMA transfers between system memory and VRAM.  Depending
on the system timing, the processing delay and overhead of this
msleep() call can significantly impact VRAM driver IO.

To avoid the condition, add a short duration (200 usec max)
udelay() polling loop before entering the msleep() polling
loop.

Signed-off-by: Hideyuki Sasaki <xhide@rd.scei.sony.co.jp>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
---

 drivers/block/ps3vram.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

--- a/drivers/block/ps3vram.c
+++ b/drivers/block/ps3vram.c
@@ -123,7 +123,15 @@ static int ps3vram_notifier_wait(struct 
 {
 	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
 	u32 *notify = ps3vram_get_notifier(priv->reports, NOTIFIER);
-	unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
+	unsigned long timeout;
+
+	for (timeout = 20; timeout; timeout--) {
+		if (!notify[3])
+			return 0;
+		udelay(10);
+	}
+
+	timeout = jiffies + msecs_to_jiffies(timeout_ms);
 
 	do {
 		if (!notify[3])
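The diff above is cut off at the start of the original msleep() loop. The following userspace sketch models the resulting two-phase wait so it can be compiled and run outside the kernel. It is an illustration under assumptions, not the driver code: `now_us()`, `spin_us()` and `sleep_ms()` are hypothetical stand-ins for jiffies, udelay() and msleep(), the polled `flag` plays the role of notify[3], and the elided tail of the loop is assumed to sleep 1 ms per iteration and return -ETIMEDOUT when the timeout expires.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for jiffies: monotonic time in microseconds. */
static uint64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical stand-in for udelay(): busy-wait without sleeping. */
static void spin_us(unsigned int us)
{
	uint64_t end = now_us() + us;

	while (now_us() < end)
		;
}

/* Hypothetical stand-in for msleep(): give up the CPU for ms milliseconds. */
static void sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
}

/*
 * Two-phase wait: up to 20 polls 10 us apart (about 200 us of spinning),
 * then 1 ms sleeps until timeout_ms has elapsed. Returns 0 as soon as
 * *flag reads zero, or -ETIMEDOUT if it never does.
 */
static int notifier_wait(const volatile uint32_t *flag, unsigned int timeout_ms)
{
	uint64_t deadline;
	int i;

	/* Phase 1: cheap busy-wait for transfers that finish quickly. */
	for (i = 0; i < 20; i++) {
		if (!*flag)
			return 0;
		spin_us(10);
	}

	/* Phase 2: as in the patch, the deadline starts after the spin phase. */
	deadline = now_us() + (uint64_t)timeout_ms * 1000u;
	do {
		if (!*flag)
			return 0;
		sleep_ms(1);
	} while (now_us() < deadline);

	return -ETIMEDOUT;
}
```

A transfer that completes within the first ~200 usec never reaches the sleeping loop, which is where the patch gets its speedup; only slow transfers pay the sleep overhead.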



* Re: [PATCH] block/ps3: Fix slow VRAM IO
  2009-10-19 19:58 [PATCH] block/ps3: Fix slow VRAM IO Geoff Levand
@ 2009-10-19 20:03 ` Jim Paris
  2009-11-03  8:23 ` Andrew Morton
  1 sibling, 0 replies; 7+ messages in thread
From: Jim Paris @ 2009-10-19 20:03 UTC
  To: Geoff Levand
  Cc: Jens Axboe, Cell Broadband Engine OSS Development,
	Geert Uytterhoeven, linux-kernel

Geoff Levand wrote:
> 
> From: Hideyuki Sasaki <Hideyuki_Sasaki@hq.scei.sony.co.jp>
> 
> The current PS3 VRAM driver uses msleep() to wait for completion
> of RSX DMA transfers between system memory and VRAM.  Depending
> on the system timing, the processing delay and overhead of this
> msleep() call can significantly impact VRAM driver IO.
> 
> To avoid the condition, add a short duration (200 usec max)
> udelay() polling loop before entering the msleep() polling
> loop.
> 
> Signed-off-by: Hideyuki Sasaki <xhide@rd.scei.sony.co.jp>
> Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>

Acked-by: Jim Paris <jim@jtan.com>

Thanks for tracking this down.

-jim

> ---
> 
>  drivers/block/ps3vram.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> --- a/drivers/block/ps3vram.c
> +++ b/drivers/block/ps3vram.c
> @@ -123,7 +123,15 @@ static int ps3vram_notifier_wait(struct 
>  {
>  	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>  	u32 *notify = ps3vram_get_notifier(priv->reports, NOTIFIER);
> -	unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> +	unsigned long timeout;
> +
> +	for (timeout = 20; timeout; timeout--) {
> +		if (!notify[3])
> +			return 0;
> +		udelay(10);
> +	}
> +
> +	timeout = jiffies + msecs_to_jiffies(timeout_ms);
>  
>  	do {
>  		if (!notify[3])


* Re: [PATCH] block/ps3: Fix slow VRAM IO
  2009-10-19 19:58 [PATCH] block/ps3: Fix slow VRAM IO Geoff Levand
  2009-10-19 20:03 ` Jim Paris
@ 2009-11-03  8:23 ` Andrew Morton
  2009-11-09  6:40   ` [Cbe-oss-dev] " Akira Tsukamoto
  1 sibling, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2009-11-03  8:23 UTC
  To: Geoff Levand
  Cc: Jens Axboe, Jim Paris, Cell Broadband Engine OSS Development,
	Geert Uytterhoeven, linux-kernel

On Mon, 19 Oct 2009 12:58:27 -0700 Geoff Levand <geoffrey.levand@am.sony.com> wrote:

> 
> From: Hideyuki Sasaki <Hideyuki_Sasaki@hq.scei.sony.co.jp>
> 
> The current PS3 VRAM driver uses msleep() to wait for completion
> of RSX DMA transfers between system memory and VRAM.  Depending
> on the system timing, the processing delay and overhead of this
> msleep() call can significantly impact VRAM driver IO.
> 
> To avoid the condition, add a short duration (200 usec max)
> udelay() polling loop before entering the msleep() polling
> loop.
> 

When raising a performance-based patch, please always try to include
before-and-after performance measurements in the changelog.  People
want to know the magnitude of the improvement.

> 
>  drivers/block/ps3vram.c |   10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> --- a/drivers/block/ps3vram.c
> +++ b/drivers/block/ps3vram.c
> @@ -123,7 +123,15 @@ static int ps3vram_notifier_wait(struct 
>  {
>  	struct ps3vram_priv *priv = ps3_system_bus_get_drvdata(dev);
>  	u32 *notify = ps3vram_get_notifier(priv->reports, NOTIFIER);
> -	unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms);
> +	unsigned long timeout;
> +
> +	for (timeout = 20; timeout; timeout--) {

	for (timeout = 0; timeout < 20; timeout++) {

would be simpler.

> +		if (!notify[3])
> +			return 0;
> +		udelay(10);
> +	}

You might as well do a udelay(1) here.  The additional cost will be
negligible, and it will reduce latency.

> +	timeout = jiffies + msecs_to_jiffies(timeout_ms);

The maximum latency is now timeout_ms + 200usec.

That's OK with the current constants, but if someone later changes a
constant, the error could become significant.

Perhaps that isn't worth bothering about though.
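The latency bound described here can be made concrete with a small simulation. In this sketch a hypothetical `struct sim` replaces real time: each `spin_step_us` increment models udelay(), each 1000 us increment models msleep(1), and `done_at_us` is when the simulated DMA completes. The `deadline_first` flag switches between starting the timeout after the spin phase (as the patch does) and starting it before, so that spinning counts against `timeout_ms`. This is an illustration of the concern, not a proposed patch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical simulated clock, standing in for jiffies/udelay()/msleep(). */
struct sim {
	unsigned long now_us;     /* current simulated time */
	unsigned long done_at_us; /* time at which the DMA "completes" */
};

static bool dma_done(const struct sim *s)
{
	return s->now_us >= s->done_at_us;
}

static int wait_sim(struct sim *s, unsigned int timeout_ms,
		    unsigned int spin_iters, unsigned int spin_step_us,
		    bool deadline_first)
{
	unsigned long deadline = 0;
	unsigned int i;

	/* Optionally charge the spin phase against the overall timeout. */
	if (deadline_first)
		deadline = s->now_us + timeout_ms * 1000UL;

	for (i = 0; i < spin_iters; i++) {
		if (dma_done(s))
			return 0;
		s->now_us += spin_step_us;	/* udelay(spin_step_us) */
	}

	/* The patch's behaviour: the timeout only starts after spinning. */
	if (!deadline_first)
		deadline = s->now_us + timeout_ms * 1000UL;

	do {
		if (dma_done(s))
			return 0;
		s->now_us += 1000;		/* msleep(1) */
	} while (s->now_us < deadline);

	return -ETIMEDOUT;
}
```

If the spin phase were enlarged to, say, 20 x 500 usec with a 10 ms timeout, a transfer that never completes is abandoned at 20 ms in the patch's arrangement but at 11 ms when the deadline is taken first.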

>  	do {
>  		if (!notify[3])




* Re: [Cbe-oss-dev] [PATCH] block/ps3: Fix slow VRAM IO
  2009-11-03  8:23 ` Andrew Morton
@ 2009-11-09  6:40   ` Akira Tsukamoto
  2009-11-13  2:03     ` Akira Tsukamoto
  2009-11-28 22:50     ` [Cbe-oss-dev] " Siarhei Siamashka
  0 siblings, 2 replies; 7+ messages in thread
From: Akira Tsukamoto @ 2009-11-09  6:40 UTC
  To: Andrew Morton
  Cc: Geoff Levand, Geert Uytterhoeven, linux-kernel,
	Cell Broadband Engine OSS Development, Jim Paris, Jens Axboe

Thank you for the review!

> > The current PS3 VRAM driver uses msleep() to wait for completion
> > of RSX DMA transfers between system memory and VRAM.  Depending
> > on the system timing, the processing delay and overhead of this
> > msleep() call can significantly impact VRAM driver IO.
> > 
> > To avoid the condition, add a short duration (200 usec max)
> > udelay() polling loop before entering the msleep() polling
> > loop.
> > 
> 
> When raising a performance-based patch, please always try to include
> before-and-after performance measurements in the changelog.  People
> want to know the magnitude of the improvement.

No problem; we will add the improvement figures to the changelog.
Here are the results. Pretty impressive.
Before
  Reading:  33MB/s 
  Writing:  16MB/s
After
  Reading: 370MB/s
  Writing: 238MB/s
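For reference, the speedups implied by these figures can be computed directly; the helper below is only arithmetic on the numbers quoted above, not an additional measurement.

```c
#include <assert.h>

/* Ratio of post-patch to pre-patch throughput (both in MB/s). */
static double speedup(double before_mbps, double after_mbps)
{
	return after_mbps / before_mbps;
}
```

That is 370/33, roughly 11.2x for reads, and 238/16, roughly 14.9x for writes.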

> > +		if (!notify[3])
> > +			return 0;
> > +		udelay(10);
> > +	}
> 
> You might as well do a udelay(1) here.  The additional cost will be
> negligible, and it will reduce latency.

Do you mean adding a udelay(1) between the udelay polling loop
and the msleep polling loop? Or do you mean changing udelay(10) to
udelay(1) inside the udelay polling loop?

The former is no problem, but the latter would hurt the performance of
the PS3 system.
The Cell/B.E. (which consists of the PPE and SPE cores) and the GPU are
connected by a ring bus called the EIB, and every read of notify[3] to
check the VRAM DMA status generates a data transfer on that bus.
There is only one EIB in the PS3, and other devices on the bus, such as
the SPEs, will be affected if the bus is occupied by many notify[3]
reads; as a result, overall system performance will decrease.

udelay(10) was the most reasonable interval: it does not overcrowd the
bus, and it does not wait too long between checks of the VRAM DMA.
We tried udelay(5), but it did not improve the VRAM IO speed.
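This trade-off can be put in numbers. The sketch below counts how many notify[3] reads a polling window issues for a given udelay() interval. It ignores the time each read itself takes, so it gives an upper bound, not a measurement of EIB load; the function names are hypothetical.

```c
#include <assert.h>

/*
 * Upper bound on notify[3] reads issued while polling for window_us
 * microseconds with a udelay(interval_us) between reads (read cost ignored).
 */
static unsigned int polls_in_window(unsigned int window_us, unsigned int interval_us)
{
	return window_us / interval_us;
}

/* Upper bound on reads per second if a CPU polled continuously. */
static unsigned long polls_per_second(unsigned int interval_us)
{
	return 1000000UL / interval_us;
}
```

If the loop were lengthened to keep the same 200 usec window, udelay(10) would issue at most 20 reads, udelay(5) 40 and udelay(1) 200, so switching to udelay(1) would multiply the polling traffic on the bus by ten.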

> > +	timeout = jiffies + msecs_to_jiffies(timeout_ms);
> 
> The maximum latency is now timeout_ms + 200usec.
> 
> That's OK with the current constants, but if someone later changes a
> constant, the error could become significant.

Yes, I think so too. Reworking the design entirely around usec instead of
msec might be ideal, but adding the 200usec loop fixes the currently slow
VRAM driver, so I thought it was an acceptable workaround.

> Perhaps that isn't worth bothering about though.
> 
> >  	do {
> >  		if (!notify[3])

-- 
Akira Tsukamoto
Sony Computer Entertainment Inc. 
Architecture Lab.
Japan



* Re: [PATCH] block/ps3: Fix slow VRAM IO
  2009-11-09  6:40   ` [Cbe-oss-dev] " Akira Tsukamoto
@ 2009-11-13  2:03     ` Akira Tsukamoto
  2009-11-13  7:20       ` Jens Axboe
  2009-11-28 22:50     ` [Cbe-oss-dev] " Siarhei Siamashka
  1 sibling, 1 reply; 7+ messages in thread
From: Akira Tsukamoto @ 2009-11-13  2:03 UTC
  To: Andrew Morton
  Cc: Geoff Levand, Geert Uytterhoeven, linux-kernel,
	Cell Broadband Engine OSS Development, Jim Paris, Jens Axboe

Hello Andrew Morton,

Ping?

This patch is quite important for PS3 performance.
I would really appreciate a reply.

Thanks,

Akira

On Mon, 09 Nov 2009 15:40:42 +0900, 
Akira Tsukamoto <akirat@rd.scei.sony.co.jp> mentioned: 
> Thank you for the review!
> 
> > > The current PS3 VRAM driver uses msleep() to wait for completion
> > > of RSX DMA transfers between system memory and VRAM.  Depending
> > > on the system timing, the processing delay and overhead of this
> > > msleep() call can significantly impact VRAM driver IO.
> > > 
> > > To avoid the condition, add a short duration (200 usec max)
> > > udelay() polling loop before entering the msleep() polling
> > > loop.
> > > 
> > 
> > When raising a performance-based patch, please always try to include
> > before-and-after performance measurements in the changelog.  People
> > want to know the magnitude of the improvement.
> 
> No problem; we will add the improvement figures to the changelog.
> Here are the results. Pretty impressive.
> Before
>   Reading:  33MB/s 
>   Writing:  16MB/s
> After
>   Reading: 370MB/s
>   Writing: 238MB/s
> 
> > > +		if (!notify[3])
> > > +			return 0;
> > > +		udelay(10);
> > > +	}
> > 
> > You might as well do a udelay(1) here.  The additional cost will be
> > negligible, and it will reduce latency.
> 
> Do you mean adding a udelay(1) between the udelay polling loop
> and the msleep polling loop? Or do you mean changing udelay(10) to
> udelay(1) inside the udelay polling loop?
>
> The former is no problem, but the latter would hurt the performance of
> the PS3 system.
> The Cell/B.E. (which consists of the PPE and SPE cores) and the GPU are
> connected by a ring bus called the EIB, and every read of notify[3] to
> check the VRAM DMA status generates a data transfer on that bus.
> There is only one EIB in the PS3, and other devices on the bus, such as
> the SPEs, will be affected if the bus is occupied by many notify[3]
> reads; as a result, overall system performance will decrease.
>
> udelay(10) was the most reasonable interval: it does not overcrowd the
> bus, and it does not wait too long between checks of the VRAM DMA.
> We tried udelay(5), but it did not improve the VRAM IO speed.
> 
> > > +	timeout = jiffies + msecs_to_jiffies(timeout_ms);
> > 
> > The maximum latency is now timeout_ms + 200usec.
> > 
> > That's OK with the current constants, but if someone later changes a
> > constant, the error could become significant.
> 
> Yes, I think so too. Reworking the design entirely around usec instead of
> msec might be ideal, but adding the 200usec loop fixes the currently slow
> VRAM driver, so I thought it was an acceptable workaround.
> 
> > Perhaps that isn't worth bothering about though.
> > 
> > >  	do {
> > >  		if (!notify[3])
> 
> -- 
> Akira Tsukamoto
> Sony Computer Entertainment Inc. 
> Architecture Lab.
> Japan
> 

-- 
Akira Tsukamoto
Sony Computer Entertainment Inc. 
Architecture Lab.
Japan



* Re: [PATCH] block/ps3: Fix slow VRAM IO
  2009-11-13  2:03     ` Akira Tsukamoto
@ 2009-11-13  7:20       ` Jens Axboe
  0 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2009-11-13  7:20 UTC
  To: Akira Tsukamoto
  Cc: Andrew Morton, Geoff Levand, Geert Uytterhoeven, linux-kernel,
	Cell Broadband Engine OSS Development, Jim Paris

On Fri, Nov 13 2009, Akira Tsukamoto wrote:
> Hello Andrew Morton,
> 
> Ping?
> 
> This patch is quite important for PS3 performance.
> I would really appreciate a reply.

I queued it up for 2.6.33 some time ago, so it's not lost.

-- 
Jens Axboe



* Re: [Cbe-oss-dev] [PATCH] block/ps3: Fix slow VRAM IO
  2009-11-09  6:40   ` [Cbe-oss-dev] " Akira Tsukamoto
  2009-11-13  2:03     ` Akira Tsukamoto
@ 2009-11-28 22:50     ` Siarhei Siamashka
  1 sibling, 0 replies; 7+ messages in thread
From: Siarhei Siamashka @ 2009-11-28 22:50 UTC
  To: cbe-oss-dev
  Cc: Akira Tsukamoto, Andrew Morton, linux-kernel, Jim Paris,
	Jens Axboe, Geert Uytterhoeven,
	Cell Broadband Engine OSS Development

On Monday 09 November 2009, Akira Tsukamoto wrote:
> Thank you for the review!
>
> > > The current PS3 VRAM driver uses msleep() to wait for completion
> > > of RSX DMA transfers between system memory and VRAM.  Depending
> > > on the system timing, the processing delay and overhead of this
> > > msleep() call can significantly impact VRAM driver IO.
> > >
> > > To avoid the condition, add a short duration (200 usec max)
> > > udelay() polling loop before entering the msleep() polling
> > > loop.
> >
> > When raising a performance-based patch, please always try to include
> > before-and-after performance measurements in the changelog.  People
> > want to know the magnitude of the improvement.
>
> No problem; we will add the improvement figures to the changelog.
> Here are the results. Pretty impressive.
> Before
>   Reading:  33MB/s
>   Writing:  16MB/s
> After
>   Reading: 370MB/s
>   Writing: 238MB/s
>
> > > +		if (!notify[3])
> > > +			return 0;
> > > +		udelay(10);
> > > +	}
> >
> > You might as well do a udelay(1) here.  The additional cost will be
> > negligible, and it will reduce latency.
>
> Do you mean adding a udelay(1) between the udelay polling loop
> and the msleep polling loop? Or do you mean changing udelay(10) to
> udelay(1) inside the udelay polling loop?
>
> The former is no problem, but the latter would hurt the performance of
> the PS3 system.
> The Cell/B.E. (which consists of the PPE and SPE cores) and the GPU are
> connected by a ring bus called the EIB, and every read of notify[3] to
> check the VRAM DMA status generates a data transfer on that bus.
> There is only one EIB in the PS3, and other devices on the bus, such as
> the SPEs, will be affected if the bus is occupied by many notify[3]
> reads; as a result, overall system performance will decrease.
>
> udelay(10) was the most reasonable interval: it does not overcrowd the
> bus, and it does not wait too long between checks of the VRAM DMA.
> We tried udelay(5), but it did not improve the VRAM IO speed.
>
> > > +	timeout = jiffies + msecs_to_jiffies(timeout_ms);
> >
> > The maximum latency is now timeout_ms + 200usec.
> >
> > That's OK with the current constants, but if someone later changes a
> > constant, the error could become significant.
>
> Yes, I think so too. Reworking the design entirely around usec instead of
> msec might be ideal, but adding the 200usec loop fixes the currently slow
> VRAM driver, so I thought it was an acceptable workaround.

Thanks for the detailed explanations. I wonder if it makes sense to change
the 200usec magic number to something more flexible. If I understand it
correctly, 200usec is about twice the time needed to transfer a 256KiB
ps3vram internal cache page to or from VRAM via DMA over an otherwise idle
EIB bus. I guess this is so that the msleep path is only ever reached when
the EIB bus is heavily overloaded. Reaching msleep in the code means falling
back to the same 33MB/s or 16MB/s ps3vram performance.

If somebody tries tweaking ps3vram constants like CACHE_PAGE_SIZE, the
magic 200usec delay may need to be changed to something more appropriate,
but right now that is not obvious from the patch description or from the
comments in the code.

So perhaps a constant based on the DMA throughput and the cache page size
would be better here?
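One way to act on this suggestion is sketched below. The names `CACHE_PAGE_BYTES`, `DMA_BYTES_PER_US`, `SPIN_MARGIN` and `POLL_INTERVAL_US` are hypothetical, and the throughput figure is not a measured value: it is back-derived from the estimate above that a 256KiB page takes roughly 100usec over an idle EIB, so that doubling it reproduces the patch's 20 x 10usec.

```c
#include <assert.h>

/* Hypothetical constants; the throughput is an assumption, not a measurement. */
#define CACHE_PAGE_BYTES  (256u * 1024u) /* 256 KiB cache page */
#define DMA_BYTES_PER_US  2622u          /* ~2.6 GB/s, i.e. ~100 us per page */
#define SPIN_MARGIN       2u             /* spin for about twice the transfer time */
#define POLL_INTERVAL_US  10u            /* udelay() between notifier reads */

/* Integer division rounded up, like the kernel's DIV_ROUND_UP(). */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/* Busy-wait budget: margin times the expected transfer time of one page. */
static unsigned int spin_budget_us(unsigned int page_bytes,
				   unsigned int bytes_per_us, unsigned int margin)
{
	return margin * div_round_up(page_bytes, bytes_per_us);
}

/* Number of udelay(POLL_INTERVAL_US) iterations needed to cover the budget. */
static unsigned int spin_iterations(unsigned int page_bytes)
{
	return div_round_up(spin_budget_us(page_bytes, DMA_BYTES_PER_US,
					   SPIN_MARGIN),
			    POLL_INTERVAL_US);
}
```

With these assumed values a 256KiB page gives a 200usec budget (20 iterations), and halving the page size halves the loop to 10 iterations, so the spin phase tracks the cache page size instead of being a fixed magic number.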

-- 
Best regards,
Siarhei Siamashka


Thread overview: 7+ messages
2009-10-19 19:58 [PATCH] block/ps3: Fix slow VRAM IO Geoff Levand
2009-10-19 20:03 ` Jim Paris
2009-11-03  8:23 ` Andrew Morton
2009-11-09  6:40   ` [Cbe-oss-dev] " Akira Tsukamoto
2009-11-13  2:03     ` Akira Tsukamoto
2009-11-13  7:20       ` Jens Axboe
2009-11-28 22:50     ` [Cbe-oss-dev] " Siarhei Siamashka
