linux-kernel.vger.kernel.org archive mirror
* [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
@ 2011-08-08  0:22 Per Forlin
  2011-08-08  9:40 ` Michal Nazarewicz
  2011-08-08 14:34 ` Alan Stern
  0 siblings, 2 replies; 10+ messages in thread
From: Per Forlin @ 2011-08-08  0:22 UTC (permalink / raw)
  To: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev
  Cc: Per Forlin

FSG_NUM_BUFFERS is set to 2 as default.
Usually 2 buffers are enough to establish a good
double buffering pipeline. But when dealing with expensive
request preparation (i.e. dma_map) there may be benefits of
increasing the number of buffers. There is an extra cost for
every first request, the others are prepared in parallell with
an ongoing transfer. Every time all buffers are consumed there is
an additional cost for the next first request. Increasing the number
of buffers decreases the risk of running out of buffers.

Test setup
 * Running dd on the host reading from the mass storage device
 * cmdline: dd if=/dev/sdb of=/dev/null bs=4k count=$((256*100))
 * Caches are dropped on the host and on the device before each run

Measurements on a Snowball board with the ondemand governor active.

FSG_NUM_BUFFERS 2
104857600 bytes (105 MB) copied, 5.62173 s, 18.7 MB/s
104857600 bytes (105 MB) copied, 5.61811 s, 18.7 MB/s
104857600 bytes (105 MB) copied, 5.57817 s, 18.8 MB/s
104857600 bytes (105 MB) copied, 5.57769 s, 18.8 MB/s
104857600 bytes (105 MB) copied, 5.59654 s, 18.7 MB/s
104857600 bytes (105 MB) copied, 5.58948 s, 18.8 MB/s

FSG_NUM_BUFFERS 4
104857600 bytes (105 MB) copied, 5.26839 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.2691 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.2711 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.27174 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.27261 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.27135 s, 19.9 MB/s
104857600 bytes (105 MB) copied, 5.27249 s, 19.9 MB/s

There may not be one optimal number for all boards. That is the
reason for adding the number to Kconfig.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
---
 drivers/usb/gadget/Kconfig          |   15 +++++++++++++++
 drivers/usb/gadget/storage_common.c |    4 ++--
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
index 029e288..bbd17f3 100644
--- a/drivers/usb/gadget/Kconfig
+++ b/drivers/usb/gadget/Kconfig
@@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
 	   This value will be used except for system-specific gadget
 	   drivers that have more specific information.
 
+config USB_GADGET_STORAGE_NUM_BUFFERS
+	int "Number of storage pipline buffers"
+	range 2 64
+	default 2
+	help
+	  Usually 2 buffers are enough to establish a good
+	  double buffering pipeline. But when dealing with expensive
+	  request preparation (i.e. dma_map) there may be benefits of
+	  increasing the number of buffers. There is an extra cost for
+	  every first request, the others are prepared in parallell with
+	  an ongoing transfer. Every time all buffers are consumed there is
+	  an additional cost for the next first request. Increasing the number
+	  of buffers decreases the risk of running out of buffers.
+	  If unsure, say 2.
+
 config	USB_GADGET_SELECTED
 	boolean
 
diff --git a/drivers/usb/gadget/storage_common.c b/drivers/usb/gadget/storage_common.c
index 1fa4f70..512d9cf 100644
--- a/drivers/usb/gadget/storage_common.c
+++ b/drivers/usb/gadget/storage_common.c
@@ -262,8 +262,8 @@ static struct fsg_lun *fsg_lun_from_dev(struct device *dev)
 #define EP0_BUFSIZE	256
 #define DELAYED_STATUS	(EP0_BUFSIZE + 999)	/* An impossibly large value */
 
-/* Number of buffers we will use.  2 is enough for double-buffering */
-#define FSG_NUM_BUFFERS	2
+/* Number of buffers we will use.  2 is usually enough for double-buffering */
+#define FSG_NUM_BUFFERS	CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS
 
 /* Default size of buffer length. */
 #define FSG_BUFLEN	((u32)16384)
-- 
1.7.4.1



* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08  0:22 [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size Per Forlin
@ 2011-08-08  9:40 ` Michal Nazarewicz
  2011-08-08 10:50   ` Per Forlin
  2011-08-08 14:34 ` Alan Stern
  1 sibling, 1 reply; 10+ messages in thread
From: Michal Nazarewicz @ 2011-08-08  9:40 UTC (permalink / raw)
  To: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel,
	linaro-dev, Per Forlin

On Mon, 08 Aug 2011 02:22:14 +0200, Per Forlin <per.forlin@linaro.org>  
wrote:
> FSG_NUM_BUFFERS is set to 2 as default.
> Usually 2 buffers are enough to establish a good
> double buffering pipeline. But when dealing with expensive
> request preparation (i.e. dma_map) there may be benefits of
> increasing the number of buffers. There is an extra cost for
> every first request, the others are prepared in parallell with
> an ongoing transfer. Every time all buffers are consumed there is
> an additional cost for the next first request. Increasing the number
> of buffers decreases the risk of running out of buffers.

[... benchmark snip ...]

> There may not be one optimal number for all boards. That is the
> reason for adding the number to Kconfig.

It could actually be turned into a run-time configuration, but
that's a bit more intrusive, so adding Kconfig should be enough
for now.

> Signed-off-by: Per Forlin <per.forlin@linaro.org>

Acked-by: Michal Nazarewicz <mina86@mina86.com>


Some not very relevant comments below:

> diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
> index 029e288..bbd17f3 100644
> --- a/drivers/usb/gadget/Kconfig
> +++ b/drivers/usb/gadget/Kconfig
> @@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
>  	   This value will be used except for system-specific gadget
>  	   drivers that have more specific information.
> +config USB_GADGET_STORAGE_NUM_BUFFERS
> +	int "Number of storage pipline buffers"

“pipeline”, missing “e”.

> +	range 2 64
> +	default 2
> +	help
> +	  Usually 2 buffers are enough to establish a good
> +	  double buffering pipeline. But when dealing with expensive

s/double//.  By definition, “double” buffering uses two buffers, doesn't it?
So if we change number of buffers, it's no longer “double”.

> +	  request preparation (i.e. dma_map) there may be benefits of
> +	  increasing the number of buffers. There is an extra cost for
> +	  every first request, the others are prepared in parallell with

“parallel”, too many “l”s.

> +	  an ongoing transfer. Every time all buffers are consumed there is
> +	  an additional cost for the next first request. Increasing the number

“Next first request” sounds a bit illogical to me.

> +	  of buffers decreases the risk of running out of buffers.
> +	  If unsure, say 2.
> +
>  config	USB_GADGET_SELECTED
>  	boolean
> diff --git a/drivers/usb/gadget/storage_common.c  
> b/drivers/usb/gadget/storage_common.c
> index 1fa4f70..512d9cf 100644
> --- a/drivers/usb/gadget/storage_common.c
> +++ b/drivers/usb/gadget/storage_common.c
> @@ -262,8 +262,8 @@ static struct fsg_lun *fsg_lun_from_dev(struct  
> device *dev)
>  #define EP0_BUFSIZE	256
>  #define DELAYED_STATUS	(EP0_BUFSIZE + 999)	/* An impossibly large value  
> */
> -/* Number of buffers we will use.  2 is enough for double-buffering */
> -#define FSG_NUM_BUFFERS	2
> +/* Number of buffers we will use.  2 is usually enough for  
> double-buffering */

s/double-buffering/good buffering pipeline/?

> +#define FSG_NUM_BUFFERS	CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS
> /* Default size of buffer length. */
>  #define FSG_BUFLEN	((u32)16384)

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--


* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08  9:40 ` Michal Nazarewicz
@ 2011-08-08 10:50   ` Per Forlin
  2011-08-08 11:59     ` Michal Nazarewicz
  0 siblings, 1 reply; 10+ messages in thread
From: Per Forlin @ 2011-08-08 10:50 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

Hi Michal,

2011/8/8 Michal Nazarewicz <mina86@mina86.com>:
> On Mon, 08 Aug 2011 02:22:14 +0200, Per Forlin <per.forlin@linaro.org>
> wrote:
>>
>> FSG_NUM_BUFFERS is set to 2 as default.
>> Usually 2 buffers are enough to establish a good
>> double buffering pipeline. But when dealing with expensive
>> request preparation (i.e. dma_map) there may be benefits of
>> increasing the number of buffers. There is an extra cost for
>> every first request, the others are prepared in parallell with
>> an ongoing transfer. Every time all buffers are consumed there is
>> an additional cost for the next first request. Increasing the number
>> of buffers decreases the risk of running out of buffers.
>
> [... benchmark snip ...]
>
>> There may not be one optimal number for all boards. That is the
>> reason for adding the number to Kconfig.
>
> It could actually be turned into a run-time configuration, but
> that's a bit more intrusive,
I considered this too but couldn't see how this would be done nicely.

>> Signed-off-by: Per Forlin <per.forlin@linaro.org>
>
> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>
>
> Some not very relevant comments below:
Thanks for these pointers. I'll update.

>
>> diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
>> index 029e288..bbd17f3 100644
>> --- a/drivers/usb/gadget/Kconfig
>> +++ b/drivers/usb/gadget/Kconfig
>> @@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
>>           This value will be used except for system-specific gadget
>>           drivers that have more specific information.
>> +config USB_GADGET_STORAGE_NUM_BUFFERS
>> +       int "Number of storage pipline buffers"
>
> “pipeline”, missing “e”.
Will fix.

>
>> +       range 2 64
>> +       default 2
>> +       help
>> +         Usually 2 buffers are enough to establish a good
>> +         double buffering pipeline. But when dealing with expensive
>
> s/double//.  By definition, “double” buffering uses two buffers, isn't it?
> So if we change number of buffers, it's no longer “double”.
>
Good point!

>> +         request preparation (i.e. dma_map) there may be benefits of
>> +         increasing the number of buffers. There is an extra cost for
>> +         every first request, the others are prepared in parallell with
>
> “parallel”, too many “l”s.
>
Will fix

>> +         an ongoing transfer. Every time all buffers are consumed there
>> is
>> +         an additional cost for the next first request. Increasing the
>> number
>
> “Next first request” sounds a bit illogical to me.
>
I'll remove "next" to avoid confusion.

>> +         of buffers decreases the risk of running out of buffers.
>> +         If unsure, say 2.
>> +
>>  config USB_GADGET_SELECTED
>>        boolean
>> diff --git a/drivers/usb/gadget/storage_common.c
>> b/drivers/usb/gadget/storage_common.c
>> index 1fa4f70..512d9cf 100644
>> --- a/drivers/usb/gadget/storage_common.c
>> +++ b/drivers/usb/gadget/storage_common.c
>> @@ -262,8 +262,8 @@ static struct fsg_lun *fsg_lun_from_dev(struct device
>> *dev)
>>  #define EP0_BUFSIZE    256
>>  #define DELAYED_STATUS (EP0_BUFSIZE + 999)     /* An impossibly large
>> value */
>> -/* Number of buffers we will use.  2 is enough for double-buffering */
>> -#define FSG_NUM_BUFFERS        2
>> +/* Number of buffers we will use.  2 is usually enough for
>> double-buffering */
>
> s/double-buffering/good buffering pipeline/?
>
Will fix.

>> +#define FSG_NUM_BUFFERS        CONFIG_USB_GADGET_STORAGE_NUM_BUFFERS
>> /* Default size of buffer length. */
>>  #define FSG_BUFLEN     ((u32)16384)
>
> --
> Best regards,                                         _     _
> .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
> ..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
> ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--
>


* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 10:50   ` Per Forlin
@ 2011-08-08 11:59     ` Michal Nazarewicz
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Nazarewicz @ 2011-08-08 11:59 UTC (permalink / raw)
  To: Per Forlin
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

>> On Mon, 08 Aug 2011 02:22:14 +0200, Per Forlin wrote:
>>> There may not be one optimal number for all boards. That is the
>>> reason for adding the number to Kconfig.

> 2011/8/8 Michal Nazarewicz <mina86@mina86.com>:
>> It could actually be turned into a run-time configuration, but
>> that's a bit more intrusive,

On Mon, 08 Aug 2011 12:50:01 +0200, Per Forlin wrote:
> I considered this too but couldn't see how this would be done nicely.


In f_mass_storage.c, the buffers are stored in the fsg_common structure
as a fixed-size array.  Instead, this could be replaced by a pointer
and a buffhds_count field holding the number of buffers.

The buffhds_count would have to be initialised in fsg_common_init() from
a (new) field stored in the fsg_config structure, which in turn in most
cases is filled with values from fsg_module_parameters, which correspond
to module parameters defined by the FSG_MODULE_PARAMETERS() macro.

The buffers are allocated in fsg_common_init() (see line 2815).  All
that would need to change is to use the value stored in fsg_common
instead of FSG_NUM_BUFFERS and first allocate space for the
common->buffhds array.

Similarly, fsg_common_release() would have to get the number of
buffers from the fsg_common structure and free common->buffhds after
freeing each individual buffer.

As far as I can tell, other places where FSG_NUM_BUFFERS is used
would be a simple s/FSG_NUM_BUFFERS/common->buffhds_count/g.
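
A rough sketch of the shape this would take (hand-written here for
illustration, not a tested patch; the struct and function names follow
f_mass_storage.c, but the num_buffers field in fsg_config is an
invented placeholder):

struct fsg_common {
	/* ... */
	struct fsg_buffhd	*buffhds;	/* was: buffhds[FSG_NUM_BUFFERS] */
	unsigned int		buffhds_count;	/* new: number of buffers */
	/* ... */
};

/* in fsg_common_init(), before the loop that allocates each bh->buf: */
common->buffhds_count = cfg->num_buffers;	/* invented fsg_config field */
common->buffhds = kcalloc(common->buffhds_count,
			  sizeof(*common->buffhds), GFP_KERNEL);
if (!common->buffhds)
	goto error_release;	/* assumes the existing error path */

/* in fsg_common_release(), after each bh->buf has been freed: */
kfree(common->buffhds);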


If you also care about file_storage.c, it's almost the same, except the
module parameter juggling is a bit easier.


But as I've said previously, it's not like I'm saying that's the way
it should be done right now.  I'm just describing how it could be
done should someone need to be able to change the number of buffers via
a module parameter rather than recompiling the module.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
ooo +-----<email/xmpp: mnazarewicz@google.com>-----ooO--(_)--Ooo--


* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08  0:22 [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size Per Forlin
  2011-08-08  9:40 ` Michal Nazarewicz
@ 2011-08-08 14:34 ` Alan Stern
  2011-08-08 18:11   ` Per Forlin
  1 sibling, 1 reply; 10+ messages in thread
From: Alan Stern @ 2011-08-08 14:34 UTC (permalink / raw)
  To: Per Forlin
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On Mon, 8 Aug 2011, Per Forlin wrote:

> FSG_NUM_BUFFERS is set to 2 as default.
> Usually 2 buffers are enough to establish a good
> double buffering pipeline. But when dealing with expensive
> request preparation (i.e. dma_map) there may be benefits of
> increasing the number of buffers. There is an extra cost for
> every first request, the others are prepared in parallell with
> an ongoing transfer. Every time all buffers are consumed there is
> an additional cost for the next first request. Increasing the number
> of buffers decreases the risk of running out of buffers.

The reasoning here is not clear.  Throughput is limited on the device
side mostly by the time required to prepare the buffer, transfer the
data, and copy the data to/from the backing storage.  Using more
buffers won't affect this directly.  Putting this another way, if the
time required to prepare a buffer and copy the data is longer than the
time needed to transfer the data over the USB link, the throughput
won't be optimal no matter how many buffers you use.

On the other hand, adding buffers will smooth out irregularities and
latencies caused by scheduling delays when other tasks are running at
the same time.  Still, I would expect the benefit to be fairly small
and to diminish rapidly as more buffers are added.
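
As a rough illustration of both points, here is a toy user-space model
(every constant below is invented; nothing here is derived from the
gadget code): each buffer takes a fixed time to fill and a fixed time
to transmit, and every 10th fill stalls for an extra while to mimic
bursty VFS reads.

#include <stdio.h>

int main(void)
{
	const long nreq  = 25600;	/* buffer-sized chunks to move */
	const long fill  = 600;		/* usual fill time per buffer, us */
	const long burst = 1500;	/* extra stall on every 10th fill, us */
	const long xfer  = 800;		/* USB transfer time per buffer, us */

	for (int nbuf = 2; nbuf <= 8; nbuf *= 2) {
		long slot_free[8] = { 0 };	/* when each buffer slot is reusable */
		long fill_done = 0, usb_free = 0;

		for (long i = 0; i < nreq; i++) {
			long t = fill + (i % 10 == 9 ? burst : 0);
			/* a fill starts once its slot is free; fills are serial */
			long fs = fill_done > slot_free[i % nbuf]
				? fill_done : slot_free[i % nbuf];
			fill_done = fs + t;
			/* a transfer needs both the data and a free link */
			long xs = fill_done > usb_free ? fill_done : usb_free;
			usb_free = xs + xfer;
			slot_free[i % nbuf] = usb_free;
		}
		printf("%d buffers: %.2f s\n", nbuf, usb_free / 1e6);
	}
	return 0;
}

With these made-up numbers the average fill rate is faster than the
link, so extra buffers only need to absorb the periodic stalls: going
from 2 to 4 buffers recovers most of the loss, and 8 buys almost
nothing beyond 4.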

> Test setup
>  * Running dd on the host reading from the mass storage device
>  * cmdline: dd if=/dev/sdb of=/dev/null bs=4k count=$((256*100))
>  * Caches are dropped on the host and on the device before each run
> 
> Measurements on a Snowball board with the ondemand governor active.
> 
> FSG_NUM_BUFFERS 2
> 104857600 bytes (105 MB) copied, 5.62173 s, 18.7 MB/s
> 104857600 bytes (105 MB) copied, 5.61811 s, 18.7 MB/s
> 104857600 bytes (105 MB) copied, 5.57817 s, 18.8 MB/s
> 104857600 bytes (105 MB) copied, 5.57769 s, 18.8 MB/s
> 104857600 bytes (105 MB) copied, 5.59654 s, 18.7 MB/s
> 104857600 bytes (105 MB) copied, 5.58948 s, 18.8 MB/s
> 
> FSG_NUM_BUFFERS 4
> 104857600 bytes (105 MB) copied, 5.26839 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.2691 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.2711 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.27174 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.27261 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.27135 s, 19.9 MB/s
> 104857600 bytes (105 MB) copied, 5.27249 s, 19.9 MB/s

Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
or 8 buffers?  I bet going beyond 4 makes very little difference.

> There may not be one optimal number for all boards. That is the
> reason for adding the number to Kconfig.
> 
> Signed-off-by: Per Forlin <per.forlin@linaro.org>
> ---
>  drivers/usb/gadget/Kconfig          |   15 +++++++++++++++
>  drivers/usb/gadget/storage_common.c |    4 ++--
>  2 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
> index 029e288..bbd17f3 100644
> --- a/drivers/usb/gadget/Kconfig
> +++ b/drivers/usb/gadget/Kconfig
> @@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
>  	   This value will be used except for system-specific gadget
>  	   drivers that have more specific information.
>  
> +config USB_GADGET_STORAGE_NUM_BUFFERS
> +	int "Number of storage pipline buffers"
> +	range 2 64

64 is definitely way into the overkill range.  I'd be very tempted to
leave the maximum set at 4.  You might even consider setting it to 4
always -- although each buffer requires 16 KB of contiguous kernel
memory so minimizing the number of buffers is always desirable.

Alan Stern



* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 14:34 ` Alan Stern
@ 2011-08-08 18:11   ` Per Forlin
  2011-08-08 18:45     ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Per Forlin @ 2011-08-08 18:11 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On 8 August 2011 16:34, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>
>> FSG_NUM_BUFFERS is set to 2 as default.
>> Usually 2 buffers are enough to establish a good
>> double buffering pipeline. But when dealing with expensive
>> request preparation (i.e. dma_map) there may be benefits of
>> increasing the number of buffers. There is an extra cost for
>> every first request, the others are prepared in parallell with
>> an ongoing transfer. Every time all buffers are consumed there is
>> an additional cost for the next first request. Increasing the number
>> of buffers decreases the risk of running out of buffers.
>
> The reasoning here is not clear.  Throughput is limited on the device
> side mostly by the time required to prepare the buffer, transfer the
> data, and copy the data to/from the backing storage.  Using more
> buffers won't affect this directly.  Putting this another way, if the
> time required to prepare a buffer and copy the data is longer than the
> time needed to transfer the data over the USB link, the throughput
> won't be optimal no matter how many buffers you use.
>
> On the other hand, adding buffers will smooth out irregularities and
> latencies caused by scheduling delays when other tasks are running at
> the same time.  Still, I would expect the benefit to be fairly small
> and to diminish rapidly as more buffers are added.
>
>> Test setup
>>  * Running dd on the host reading from the mass storage device
>>  * cmdline: dd if=/dev/sdb of=/dev/null bs=4k count=$((256*100))
>>  * Caches are dropped on the host and on the device before each run
>>
>> Measurements on a Snowball board with the ondemand governor active.
>>
>> FSG_NUM_BUFFERS 2
>> 104857600 bytes (105 MB) copied, 5.62173 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.61811 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.57817 s, 18.8 MB/s
>> 104857600 bytes (105 MB) copied, 5.57769 s, 18.8 MB/s
>> 104857600 bytes (105 MB) copied, 5.59654 s, 18.7 MB/s
>> 104857600 bytes (105 MB) copied, 5.58948 s, 18.8 MB/s
>>
>> FSG_NUM_BUFFERS 4
>> 104857600 bytes (105 MB) copied, 5.26839 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.2691 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.2711 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27174 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27261 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27135 s, 19.9 MB/s
>> 104857600 bytes (105 MB) copied, 5.27249 s, 19.9 MB/s
>
> Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
> or 8 buffers?  I bet going beyond 4 makes very little difference.
>
On my board 4 buffers are enough. More buffers will make no difference.

Background study
I started by running dd to measure performance on my target side,
simply to measure what would be the maximum bandwidth: 20 MiB/s on my
board. Then I started the gadget mass storage on the device and ran
the same test from the PC host side: 18.7 MiB/s. I guessed this might
be due to serialized cache handling. I tested removing the dma_map
call (replaced it with virt_to_dma). This is just a dummy test to see
if this is causing the performance drop. Without dma_map I get
20 MiB/s. It appears that the loss is due to the dma_map call. The dma_map
only adds latency for the first request.
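
(For reference, the substitution was roughly of this shape; it is only
a measurement hack, since on ARM dma_map_single() also performs the
cache maintenance that virt_to_dma() skips. Here req stands for the
usb_request being queued and dev for the controller's device; both are
named purely for illustration:

	/* normal path: cache ops + address translation, done serially
	   before the transfer can start */
	req->dma = dma_map_single(dev, req->buf, req->length, DMA_TO_DEVICE);

	/* dummy test: address translation only, no cache ops */
	req->dma = virt_to_dma(dev, req->buf);
)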

Studying the flow of buffers I can see that both buffers are filled with
data from vfs, then both are transmitted over USB, burst like
behaviour. More buffers will add "smoothness" and prevent the bursts
from affecting performance negatively. With 4 buffers, there were very few
cases when all 4 buffers were empty during a transfer.

>> There may not be one optimal number for all boards. That is the
>> reason for adding the number to Kconfig.
>>
>> Signed-off-by: Per Forlin <per.forlin@linaro.org>
>> ---
>>  drivers/usb/gadget/Kconfig          |   15 +++++++++++++++
>>  drivers/usb/gadget/storage_common.c |    4 ++--
>>  2 files changed, 17 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
>> index 029e288..bbd17f3 100644
>> --- a/drivers/usb/gadget/Kconfig
>> +++ b/drivers/usb/gadget/Kconfig
>> @@ -96,6 +96,21 @@ config USB_GADGET_VBUS_DRAW
>>          This value will be used except for system-specific gadget
>>          drivers that have more specific information.
>>
>> +config USB_GADGET_STORAGE_NUM_BUFFERS
>> +     int "Number of storage pipline buffers"
>> +     range 2 64
>
> 64 is definitely way into the overkill range.  I'd be very tempted to
> leave the maximum set at 4.  You might even consider setting it to 4
> always -- although each buffer requires 16 KB of contiguous kernel
> memory so minimizing the number of buffers is always desirable.
I agree, 64 is not a realistic number. Max 4 is fine with me.

>
> Alan Stern
>
>


* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 18:11   ` Per Forlin
@ 2011-08-08 18:45     ` Alan Stern
  2011-08-08 19:34       ` Per Forlin
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2011-08-08 18:45 UTC (permalink / raw)
  To: Per Forlin
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On Mon, 8 Aug 2011, Per Forlin wrote:

> > Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
> > or 8 buffers?  I bet going beyond 4 makes very little difference.
> >
> On my board 4 buffers are enough. More buffers will make no difference.
> 
> Background study
> I started by running dd to measure performance on my target side,
> simply to measure what would be the maximum bandwidth: 20 MiB/s on my
> board. Then I started the gadget mass storage on the device and ran
> the same test from the PC host side: 18.7 MiB/s. I guessed this might
> be due to serialized cache handling. I tested removing the dma_map
> call (replaced it with virt_to_dma). This is just a dummy test to see
> if this is causing the performance drop. Without dma_map I get
> 20 MiB/s. It appears that the loss is due to the dma_map call. The dma_map
> only adds latency for the first request.

What exactly do you mean by "first request"?  g_usb_storage uses two
usb_request structures for bulk-IN transfers; are they what you're
talking about?  Or are you talking about calls to usb_ep_queue()?  Or
something else?

Your test showed an extra latency of about 0.4 seconds when using 
dma_map.  All that latency comes from a single request (the first one)?  
That doesn't sound right.

> Studying the flow of buffers I can see that both buffers are filled with
> data from vfs, then both are transmitted over USB, burst like
> behaviour.

Actually the first buffer is filled from VFS, then it is transmitted 
over USB.  While the transmission is in progress, the second buffer is 
filled from VFS.  Each time a transmission ends, the next buffer is 
queued for transmission -- unless it hasn't been filled yet, in which 
case it is queued when it gets filled.

What do you mean by "burst like behavior"?

>  More buffers will add "smoothness" and prevent the bursts
> from affecting performance negatively. With 4 buffers, there were very few
> cases when all 4 buffers were empty during a transfer.

In what sense do the bursts affect performance?  It makes sense that
having all the buffers empty from time to time would reduce throughput.  
But as long as each buffer gets filled and queued before the previous
buffer has finished transmitting, it shouldn't matter how "bursty" the
behavior is.

Alan Stern



* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 18:45     ` Alan Stern
@ 2011-08-08 19:34       ` Per Forlin
  2011-08-08 20:23         ` Alan Stern
  0 siblings, 1 reply; 10+ messages in thread
From: Per Forlin @ 2011-08-08 19:34 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On 8 August 2011 20:45, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>
>> > Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
>> > or 8 buffers?  I bet going beyond 4 makes very little difference.
>> >
>> On my board 4 buffers are enough. More buffers will make no difference.
>>
>> Background study
>> I started by running dd to measure performance on my target side,
>> simply to measure what would be the maximum bandwidth: 20 MiB/s on my
>> board. Then I started the gadget mass storage on the device and ran
>> the same test from the PC host side: 18.7 MiB/s. I guessed this might
>> be due to serialized cache handling. I tested removing the dma_map
>> call (replaced it with virt_to_dma). This is just a dummy test to see
>> if this is causing the performance drop. Without dma_map I get
>> 20 MiB/s. It appears that the loss is due to the dma_map call. The dma_map
>> only adds latency for the first request.
>
> What exactly do you mean by "first request"?
When both buffers are empty. The first request is filled up by vfs
and then prepared. This first time there is no ongoing transfer over
USB, therefore this costs more, since it can't run in parallel with an
ongoing transfer. Every time the two buffers run empty there is an
extra cost for the first request in the next series of requests. The
reason for not getting data from vfs in time I don't know.

> Your test showed an extra latency of about 0.4 seconds when using
> dma_map.  All that latency comes from a single request (the first one)?
> That doesn't sound right.
>
This "first request scenario" I refer to happens very often.

>> Studying the flow of buffers I can see that both buffers are filled with
>> data from vfs, then both are transmitted over USB, burst like
>> behaviour.
>
> Actually the first buffer is filled from VFS, then it is transmitted
> over USB.  While the transmission is in progress, the second buffer is
> filled from VFS.  Each time a transmission ends, the next buffer is
> queued for transmission -- unless it hasn't been filled yet, in which
> case it is queued when it gets filled.
>
> What do you mean by "burst like behavior"?
>
Instead of getting refills from vfs smoothly, the refills come in
bursts. VFS fills up the two buffers, then USB manages to transmit
both buffers before VFS refills new data. Roughly 20% of the time VFS
fills up data in time before USB has consumed it all; 80% of the time
USB manages to consume all data before VFS refills the buffers.
The burst-like effects are probably due to power save,
which adds latency in the system.

>>  More buffers will add "smoothness" and prevent the bursts
>> from affecting performance negatively. With 4 buffers, there were very few
>> cases when all 4 buffers were empty during a transfer.
>
> In what sense do the bursts affect performance?  It makes sense that
> having all the buffers empty from time to time would reduce throughput.
> But as long as each buffer gets filled and queued before the previous
> buffer has finished transmitting, it shouldn't matter how "bursty" the
> behavior is.
>
This is true. In my system the buffer doesn't get filled up before the
previous has finished. With 4 buffers it works fine.

Thanks,
Per


* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 19:34       ` Per Forlin
@ 2011-08-08 20:23         ` Alan Stern
  2011-08-08 20:45           ` Per Forlin
  0 siblings, 1 reply; 10+ messages in thread
From: Alan Stern @ 2011-08-08 20:23 UTC (permalink / raw)
  To: Per Forlin
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On Mon, 8 Aug 2011, Per Forlin wrote:

> On 8 August 2011 20:45, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 8 Aug 2011, Per Forlin wrote:
> >
> >> > Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
> >> > or 8 buffers?  I bet going beyond 4 makes very little difference.
> >> >
> >> On my board 4 buffers are enough. More buffers will make no difference.
> >>
> >> Background study
> >> I started by running dd to measure performance on my target side,
> >> simply to measure what would be the maximum bandwidth: 20 MiB/s on my
> >> board. Then I started the gadget mass storage on the device and ran
> >> the same test from the PC host side: 18.7 MiB/s. I guessed this might
> >> be due to serialized cache handling. I tested removing the dma_map
> >> call (replaced it with virt_to_dma). This is just a dummy test to see
> >> if this is causing the performance drop. Without dma_map I get
> >> 20 MiB/s. It appears that the loss is due to the dma_map call. The dma_map
> >> only adds latency for the first request.
> >
> > What exactly do you mean by "first request"?
> When both buffers are empty. The first request is filled up by vfs
> and then prepared. This first time there is no ongoing transfer over
> USB, therefore this costs more, since it can't run in parallel with an
> ongoing transfer. Every time the two buffers run empty there is an
> extra cost for the first request in the next series of requests. The
> reason for not getting data from vfs in time I don't know.

Okay, that makes sense.  But what connection does this have with
dma_map?  Don't you also have situations where both buffers are empty
when you replace dma_map with virt_to_dma?

> Instead of getting refills from vfs smoothly, the refills come in
> bursts. VFS fills up the two buffers, then USB manages to transmit
> both buffers before VFS refills new data. Roughly 20% of the time VFS
> fills up data in time before USB has consumed it all; 80% of the time
> USB manages to consume all data before VFS refills the buffers.
> The burst-like effects are probably due to power save,
> which adds latency in the system.

> This is true. In my system the buffer doesn't get filled up before the
> previous has finished. With 4 buffers it works fine.

Adding buffers to smooth out the bursty VFS behavior is a reasonable
thing to do.  But you should improve the patch description and the
comments; the real reason for reduced throughput is VFS's behavior.  
Something like what you just wrote here would be fine.

And reduce the maximum number of buffers to 4.  When you've made those 
changes, I'll Ack the patch.

Alan Stern



* Re: [PATCH] usb: gadget: storage_common: make FSG_NUM_BUFFERS variable size
  2011-08-08 20:23         ` Alan Stern
@ 2011-08-08 20:45           ` Per Forlin
  0 siblings, 0 replies; 10+ messages in thread
From: Per Forlin @ 2011-08-08 20:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Felipe Balbi, Greg Kroah-Hartman, linux-usb, linux-kernel, linaro-dev

On 8 August 2011 22:23, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 8 Aug 2011, Per Forlin wrote:
>
>> On 8 August 2011 20:45, Alan Stern <stern@rowland.harvard.edu> wrote:
>> > On Mon, 8 Aug 2011, Per Forlin wrote:
>> >
>> >> > Okay, 6% is a worthwhile improvement, though not huge.  Did you try 6
>> >> > or 8 buffers?  I bet going beyond 4 makes very little difference.
>> >> >
>> >> On my board 4 buffers are enough. More buffers will make no difference.
>> >>
>> >> Background study
>> >> I started by running dd to measure performance on my target side,
>> >> simply to measure what would be the maximum bandwidth: 20 MiB/s on my
>> >> board. Then I started the gadget mass storage on the device and ran
>> >> the same test from the PC host side: 18.7 MiB/s. I guessed this might
>> >> be due to serialized cache handling. I tested removing the dma_map
>> >> call (replaced it with virt_to_dma). This is just a dummy test to see
>> >> if this is causing the performance drop. Without dma_map I get
>> >> 20 MiB/s. It appears that the loss is due to the dma_map call. The dma_map
>> >> only adds latency for the first request.
>> >
>> > What exactly do you mean by "first request"?
>> When both buffers are empty. The first request is filled up by vfs
>> and then prepared. This first time there is no ongoing transfer over
>> USB, therefore this costs more, since it can't run in parallel with an
>> ongoing transfer. Every time the two buffers run empty there is an
>> extra cost for the first request in the next series of requests. The
>> reason for not getting data from vfs in time I don't know.
>
> Okay, that makes sense.  But what connection does this have with
> dma_map?  Don't you also have situations where both buffers are empty
> when you replace dma_map with virt_to_dma?
>
All I do here is remove the cost of dma_map in order to verify whether
dma_map is the one responsible for the delay. If the performance is
good without dma_map, I know the extra cost is due to dma_map running
in serial instead of in parallel. If the performance were bad even
without dma_map, something else would be causing it. It doesn't affect how
the VFS feeds data to the buffers.

>> Instead of getting refills from vfs smoothly, the refills come in
>> bursts. VFS fills up the two buffers, then USB manages to transmit
>> both buffers before VFS refills new data. Roughly 20% of the time VFS
>> fills up data in time before USB has consumed it all; 80% of the time
>> USB manages to consume all data before VFS refills the buffers.
>> The burst-like effects are probably due to power save,
>> which adds latency in the system.
>
>> This is true. In my system the buffer doesn't get filled up before the
>> previous has finished. With 4 buffers it works fine.
>
> Adding buffers to smooth out the bursty VFS behavior is a reasonable
> thing to do.  But you should improve the patch description and the
> comments; the real reason for reduced throughput is VFS's behavior.
> Something like what you just wrote here would be fine.
I agree. The main reason is to compensate for bursty VFS behavior.

>
> And reduce the maximum number of buffers to 4.  When you've made those
> changes, I'll Ack the patch.
I'll do.

Thanks for your time and patience,
Per

