linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC v4 0/3] Re-introduce TX FIFO resize for larger EP bursting
@ 2020-06-24  2:28 Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Wesley Cheng @ 2020-06-24  2:28 UTC (permalink / raw)
  To: agross, bjorn.andersson, balbi, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp, Wesley Cheng

Changes in V4:
 - Removed struct dwc3* as an argument for dwc3_gadget_resize_tx_fifos()
 - Removed WARN_ON(1) in case we run out of fifo space
 
Changes in V3:
 - Removed "Reviewed-by" tags
 - Renamed series back to RFC
 - Modified logic to ensure that fifo_size is reset if we pass the minimum
   threshold.  Tested with binding multiple FDs requesting 6 FIFOs.

Changes in V2:
 - Modified TXFIFO resizing logic to ensure that each EP is reserved a
   FIFO.
 - Removed dev_dbg() prints and fixed typos from patches
 - Added some more description on the dt-bindings commit message

Currently, there is no functionality to allow for resizing the TXFIFOs, and
relying on the HW default setting for the TXFIFO depth.  In most cases, the
HW default is probably sufficient, but for USB compositions that contain
multiple functions that require EP bursting, the default settings
might not be enough.  Also to note, the current SW will assign an EP to a
function driver w/o checking to see if the TXFIFO size for that particular
EP is large enough. (this is a problem if there are multiple HW defined
values for the TXFIFO size)

It is mentioned in the SNPS databook that a minimum of TX FIFO depth = 3
is required for an EP that supports bursting.  Otherwise, there may be
frequent occurences of bursts ending.  For high bandwidth functions,
such as data tethering (protocols that support data aggregation), mass
storage, and media transfer protocol (over FFS), the bMaxBurst value can be
large, and a bigger TXFIFO depth may prove to be beneficial in terms of USB
throughput. (which can be associated to system access latency, etc...)  It
allows for a more consistent burst of traffic, w/o any interruptions, as
data is readily available in the FIFO.

With testing done using the mass storage function driver, the results show
that with a larger TXFIFO depth, the bandwidth increased significantly.

Test Parameters:
 - Platform: Qualcomm SM8150
 - bMaxBurst = 6
 - USB req size = 256kB
 - Num of USB reqs = 16
 - USB Speed = Super-Speed
 - Function Driver: Mass Storage (w/ ramdisk)
 - Test Application: CrystalDiskMark

Results:

TXFIFO Depth = 3 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     | 
Read      |9 loops    | 193.60
	  |           | 195.86
          |           | 184.77
          |           | 193.60
-------------------------------------------

TXFIFO Depth = 6 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     | 
Read      |9 loops    | 287.35
	  |           | 304.94
          |           | 289.64
          |           | 293.61
-------------------------------------------

Wesley Cheng (3):
  usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic
  dt-bindings: usb: dwc3: Add entry for tx-fifo-resize

 .../devicetree/bindings/usb/dwc3.txt          |   2 +-
 arch/arm64/boot/dts/qcom/sm8150.dtsi          |   1 +
 drivers/usb/dwc3/core.c                       |   2 +
 drivers/usb/dwc3/core.h                       |   8 ++
 drivers/usb/dwc3/ep0.c                        |  37 +++++-
 drivers/usb/dwc3/gadget.c                     | 115 ++++++++++++++++++
 6 files changed, 163 insertions(+), 2 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-06-24  2:28 [RFC v4 0/3] Re-introduce TX FIFO resize for larger EP bursting Wesley Cheng
@ 2020-06-24  2:28 ` Wesley Cheng
  2020-08-10 12:27   ` Felipe Balbi
  2020-08-12  2:22   ` Peter Chen
  2020-06-24  2:28 ` [RFC v4 2/3] arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 3/3] dt-bindings: usb: dwc3: Add entry for tx-fifo-resize Wesley Cheng
  2 siblings, 2 replies; 13+ messages in thread
From: Wesley Cheng @ 2020-06-24  2:28 UTC (permalink / raw)
  To: agross, bjorn.andersson, balbi, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp, Wesley Cheng

Some devices have USB compositions which may require multiple endpoints
that support EP bursting.  HW defined TX FIFO sizes may not always be
sufficient for these compositions.  By utilizing flexible TX FIFO
allocation, this allows for endpoints to request the required FIFO depth to
achieve higher bandwidth.  With some higher bMaxBurst configurations, using
a larger TX FIFO size results in better TX throughput.

Ensure that one TX FIFO is reserved for every IN endpoint.  This allows for
the FIFO logic to prevent running out of FIFO space.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
---
 drivers/usb/dwc3/core.c   |   2 +
 drivers/usb/dwc3/core.h   |   8 +++
 drivers/usb/dwc3/ep0.c    |  37 +++++++++++-
 drivers/usb/dwc3/gadget.c | 115 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 161 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index edc17155cb2b..cca555493929 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1304,6 +1304,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
 				&tx_thr_num_pkt_prd);
 	device_property_read_u8(dev, "snps,tx-max-burst-prd",
 				&tx_max_burst_prd);
+	dwc->needs_fifo_resize = device_property_read_bool(dev,
+				"tx-fifo-resize");
 
 	dwc->disable_scramble_quirk = device_property_read_bool(dev,
 				"snps,disable_scramble_quirk");
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 4c171a8e215f..ce0bf288b6ac 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -675,6 +675,7 @@ struct dwc3_event_buffer {
  *		isochronous START TRANSFER command failure workaround
  * @start_cmd_status: the status of testing START TRANSFER command with
  *		combo_num = 'b00
+ * @fifo_depth: allocated TXFIFO depth
  */
 struct dwc3_ep {
 	struct usb_ep		endpoint;
@@ -727,6 +728,7 @@ struct dwc3_ep {
 	/* For isochronous START TRANSFER workaround only */
 	u8			combo_num;
 	int			start_cmd_status;
+	int			fifo_depth;
 };
 
 enum dwc3_phy {
@@ -1004,6 +1006,7 @@ struct dwc3_scratchpad_array {
  * 	1	- utmi_l1_suspend_n
  * @is_fpga: true when we are using the FPGA board
  * @pending_events: true when we have pending IRQs to be handled
+ * @needs_fifo_resize: not all users might want fifo resizing, flag it
  * @pullups_connected: true when Run/Stop bit is set
  * @setup_packet_pending: true when there's a Setup Packet in FIFO. Workaround
  * @three_stage_setup: set if we perform a three phase setup
@@ -1044,6 +1047,8 @@ struct dwc3_scratchpad_array {
  * @dis_metastability_quirk: set to disable metastability quirk.
  * @imod_interval: set the interrupt moderation interval in 250ns
  *                 increments or 0 to disable.
+ * @last_fifo_depth: total TXFIFO depth of all enabled USB IN/INT endpoints
+ * @num_ep_resized: the number of TX FIFOs that have already been resized
  */
 struct dwc3 {
 	struct work_struct	drd_work;
@@ -1204,6 +1209,7 @@ struct dwc3 {
 	unsigned		is_utmi_l1_suspend:1;
 	unsigned		is_fpga:1;
 	unsigned		pending_events:1;
+	unsigned		needs_fifo_resize:1;
 	unsigned		pullups_connected:1;
 	unsigned		setup_packet_pending:1;
 	unsigned		three_stage_setup:1;
@@ -1236,6 +1242,8 @@ struct dwc3 {
 	unsigned		dis_metastability_quirk:1;
 
 	u16			imod_interval;
+	int			last_fifo_depth;
+	int			num_ep_resized;
 };
 
 #define INCRX_BURST_MODE 0
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 6dee4dabc0a4..76db9b530861 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -601,8 +601,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
 {
 	enum usb_device_state state = dwc->gadget.state;
 	u32 cfg;
-	int ret;
+	int ret, num, size;
 	u32 reg;
+	struct dwc3_ep *dep;
 
 	cfg = le16_to_cpu(ctrl->wValue);
 
@@ -611,6 +612,40 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
 		return -EINVAL;
 
 	case USB_STATE_ADDRESS:
+		/*
+		 * If tx-fifo-resize flag is not set for the controller, then
+		 * do not clear existing allocated TXFIFO since we do not
+		 * allocate it again in dwc3_gadget_resize_tx_fifos
+		 */
+		if (dwc->needs_fifo_resize) {
+			/* Read ep0IN related TXFIFO size */
+			dep = dwc->eps[1];
+			size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
+			if (dwc3_is_usb31(dwc))
+				dep->fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
+			else
+				dep->fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
+
+			dwc->last_fifo_depth = dep->fifo_depth;
+			/* Clear existing TXFIFO for all IN eps except ep0 */
+			for (num = 3; num < min_t(int, dwc->num_eps,
+				DWC3_ENDPOINTS_NUM); num += 2) {
+				dep = dwc->eps[num];
+				/* Don't change TXFRAMNUM on usb31 version */
+				size = dwc3_is_usb31(dwc) ?
+					dwc3_readl(dwc->regs,
+						   DWC3_GTXFIFOSIZ(num >> 1)) &
+						   DWC31_GTXFIFOSIZ_TXFRAMNUM :
+						   0;
+
+				dwc3_writel(dwc->regs,
+					    DWC3_GTXFIFOSIZ(num >> 1),
+					    size);
+				dep->fifo_depth = 0;
+			}
+			dwc->num_ep_resized = 0;
+		}
+
 		ret = dwc3_ep0_delegate_req(dwc, ctrl);
 		/* if the cfg matches and the cfg is non zero */
 		if (cfg && (!ret || (ret == USB_GADGET_DELAYED_STATUS))) {
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 00746c2848c0..777badf3e85d 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -540,6 +540,117 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
 	return 0;
 }
 
+/*
+ * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
+ * @dwc: pointer to our context structure
+ *
+ * This function will a best effort FIFO allocation in order
+ * to improve FIFO usage and throughput, while still allowing
+ * us to enable as many endpoints as possible.
+ *
+ * Keep in mind that this operation will be highly dependent
+ * on the configured size for RAM1 - which contains TxFifo -,
+ * the amount of endpoints enabled on coreConsultant tool, and
+ * the width of the Master Bus.
+ *
+ * In general, FIFO depths are represented with the following equation:
+ *
+ * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
+ *
+ * Conversions can be done to the equation to derive the number of packets that
+ * will fit to a particular FIFO size value.
+ */
+static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep)
+{
+	struct dwc3 *dwc = dep->dwc;
+	int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
+	int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;
+
+	if (!dwc->needs_fifo_resize)
+		return 0;
+
+	/* resize IN endpoints except ep0 */
+	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
+		return 0;
+
+	/* Don't resize already resized IN endpoint */
+	if (dep->fifo_depth)
+		return 0;
+
+	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
+	mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
+	/* MDWIDTH is represented in bits, we need it in bytes */
+	mdwidth >>= 3;
+
+	if (((dep->endpoint.maxburst > 1) &&
+			usb_endpoint_xfer_bulk(dep->endpoint.desc))
+			|| usb_endpoint_xfer_isoc(dep->endpoint.desc))
+		mult = 3;
+
+	if ((dep->endpoint.maxburst > 6) &&
+			usb_endpoint_xfer_bulk(dep->endpoint.desc)
+			&& dwc3_is_usb31(dwc))
+		mult = 6;
+
+	/* FIFO size for a single buffer */
+	fifo = (max_packet + mdwidth)/mdwidth;
+	fifo++;
+
+	/* Calculate the number of remaining EPs w/o any FIFO */
+	num_in_ep = dwc->num_eps/2;
+	num_in_ep -= dwc->num_ep_resized;
+	/* Ignore EP0 IN */
+	num_in_ep--;
+
+	/* Reserve at least one FIFO for the number of IN EPs */
+	min_depth = num_in_ep * (fifo+1);
+	remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
+
+	/* We've already reserved 1 FIFO per EP, so check what we can fit in
+	 * addition to it.  If there is not enough remaining space, allocate
+	 * all the remaining space to the EP.
+	 */
+	fifo_size = (mult-1) * fifo;
+	if (remaining < fifo_size) {
+		if (remaining > 0)
+			fifo_size = remaining;
+		else
+			fifo_size = 0;
+	}
+
+	fifo_size += fifo;
+	fifo_size++;
+	dep->fifo_depth = fifo_size;
+
+	/* Check if TXFIFOs start at non-zero addr */
+	tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
+	fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
+
+	fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
+	if (dwc3_is_usb31(dwc))
+		dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
+	else
+		dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
+
+	/* Check fifo size allocation doesn't exceed available RAM size. */
+	if (dwc->last_fifo_depth >= ram1_depth) {
+		dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
+				(dwc->last_fifo_depth * mdwidth), ram1_depth,
+				dep->endpoint.name, fifo_size);
+		if (dwc3_is_usb31(dwc))
+			fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
+		else
+			fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
+		dwc->last_fifo_depth -= fifo_size;
+		dep->fifo_depth = 0;
+		return -ENOMEM;
+	}
+
+	dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
+	dwc->num_ep_resized++;
+	return 0;
+}
+
 static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
 {
 	const struct usb_ss_ep_comp_descriptor *comp_desc;
@@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
 	int			ret;
 
 	if (!(dep->flags & DWC3_EP_ENABLED)) {
+		ret = dwc3_gadget_resize_tx_fifos(dep);
+		if (ret)
+			return ret;
+
 		ret = dwc3_gadget_start_config(dep);
 		if (ret)
 			return ret;
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v4 2/3] arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic
  2020-06-24  2:28 [RFC v4 0/3] Re-introduce TX FIFO resize for larger EP bursting Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
@ 2020-06-24  2:28 ` Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 3/3] dt-bindings: usb: dwc3: Add entry for tx-fifo-resize Wesley Cheng
  2 siblings, 0 replies; 13+ messages in thread
From: Wesley Cheng @ 2020-06-24  2:28 UTC (permalink / raw)
  To: agross, bjorn.andersson, balbi, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp, Wesley Cheng

Enable the flexible TX FIFO resize logic on SM8150.  Using a larger TX FIFO
SZ can help account for situations when system latency is greater than the
USB bus transmission latency.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/sm8150.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/qcom/sm8150.dtsi b/arch/arm64/boot/dts/qcom/sm8150.dtsi
index a36512d1f6a1..c28523313955 100644
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi
+++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi
@@ -708,6 +708,7 @@ usb_1_dwc3: dwc3@a600000 {
 				interrupts = <GIC_SPI 133 IRQ_TYPE_LEVEL_HIGH>;
 				snps,dis_u2_susphy_quirk;
 				snps,dis_enblslpm_quirk;
+				tx-fifo-resize;
 				phys = <&usb_1_hsphy>, <&usb_1_ssphy>;
 				phy-names = "usb2-phy", "usb3-phy";
 			};
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC v4 3/3] dt-bindings: usb: dwc3: Add entry for tx-fifo-resize
  2020-06-24  2:28 [RFC v4 0/3] Re-introduce TX FIFO resize for larger EP bursting Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
  2020-06-24  2:28 ` [RFC v4 2/3] arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic Wesley Cheng
@ 2020-06-24  2:28 ` Wesley Cheng
  2 siblings, 0 replies; 13+ messages in thread
From: Wesley Cheng @ 2020-06-24  2:28 UTC (permalink / raw)
  To: agross, bjorn.andersson, balbi, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp,
	Wesley Cheng, Rob Herring

Re-introduce the comment for the tx-fifo-resize setting for the DWC3
controller.  This allows for vendors to control if they require the TX FIFO
resizing logic on their HW, as the default FIFO size configurations may
already be sufficient.

Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
Acked-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/usb/dwc3.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 9946ff9ba735..489f5da83744 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -105,7 +105,7 @@ Optional properties:
 			1-16 (DWC_usb31 programming guide section 1.2.3) to
 			enable periodic ESS TX threshold.
 
- - <DEPRECATED> tx-fifo-resize: determines if the FIFO *has* to be reallocated.
+ - tx-fifo-resize: determines if the FIFO *has* to be reallocated.
  - snps,incr-burst-type-adjustment: Value for INCR burst type of GSBUSCFG0
 			register, undefined length INCR burst type enable and INCRx type.
 			When just one value, which means INCRX burst mode enabled. When
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
@ 2020-08-10 12:27   ` Felipe Balbi
  2020-08-11  5:10     ` Wesley Cheng
  2020-08-12  2:22   ` Peter Chen
  1 sibling, 1 reply; 13+ messages in thread
From: Felipe Balbi @ 2020-08-10 12:27 UTC (permalink / raw)
  To: Wesley Cheng, agross, bjorn.andersson, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp, Wesley Cheng

[-- Attachment #1: Type: text/plain, Size: 7794 bytes --]

Wesley Cheng <wcheng@codeaurora.org> writes:

Hi,

> Some devices have USB compositions which may require multiple endpoints
> that support EP bursting.  HW defined TX FIFO sizes may not always be
> sufficient for these compositions.  By utilizing flexible TX FIFO
> allocation, this allows for endpoints to request the required FIFO depth to
> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
> a larger TX FIFO size results in better TX throughput.

how much better? What's the impact? Got some real numbers of this
running with upstream kernel? I guess mass storage gadget is the
simplest one to test.

> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index 6dee4dabc0a4..76db9b530861 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -601,8 +601,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>  {
>  	enum usb_device_state state = dwc->gadget.state;
>  	u32 cfg;
> -	int ret;
> +	int ret, num, size;

no, no. Please one declaration per line.

>  	u32 reg;
> +	struct dwc3_ep *dep;

Keep reverse xmas tree order.

> @@ -611,6 +612,40 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>  		return -EINVAL;
>  
>  	case USB_STATE_ADDRESS:
> +		/*
> +		 * If tx-fifo-resize flag is not set for the controller, then
> +		 * do not clear existing allocated TXFIFO since we do not
> +		 * allocate it again in dwc3_gadget_resize_tx_fifos
> +		 */
> +		if (dwc->needs_fifo_resize) {
> +			/* Read ep0IN related TXFIFO size */
> +			dep = dwc->eps[1];
> +			size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
> +			if (dwc3_is_usb31(dwc))
> +				dep->fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
> +			else
> +				dep->fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
> +
> +			dwc->last_fifo_depth = dep->fifo_depth;
> +			/* Clear existing TXFIFO for all IN eps except ep0 */
> +			for (num = 3; num < min_t(int, dwc->num_eps,
> +				DWC3_ENDPOINTS_NUM); num += 2) {
> +				dep = dwc->eps[num];
> +				/* Don't change TXFRAMNUM on usb31 version */
> +				size = dwc3_is_usb31(dwc) ?
> +					dwc3_readl(dwc->regs,
> +						   DWC3_GTXFIFOSIZ(num >> 1)) &
> +						   DWC31_GTXFIFOSIZ_TXFRAMNUM :
> +						   0;
> +
> +				dwc3_writel(dwc->regs,
> +					    DWC3_GTXFIFOSIZ(num >> 1),
> +					    size);
> +				dep->fifo_depth = 0;
> +			}
> +			dwc->num_ep_resized = 0;

care to move this into a helper that you call unconditionally and the
helper returns early is !needs_fifo_resize?

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 00746c2848c0..777badf3e85d 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -540,6 +540,117 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
>  	return 0;
>  }
>  
> +/*
> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
> + * @dwc: pointer to our context structure
> + *
> + * This function will a best effort FIFO allocation in order
> + * to improve FIFO usage and throughput, while still allowing
> + * us to enable as many endpoints as possible.
> + *
> + * Keep in mind that this operation will be highly dependent
> + * on the configured size for RAM1 - which contains TxFifo -,
> + * the amount of endpoints enabled on coreConsultant tool, and
> + * the width of the Master Bus.
> + *
> + * In general, FIFO depths are represented with the following equation:
> + *
> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
> + *
> + * Conversions can be done to the equation to derive the number of packets that
> + * will fit to a particular FIFO size value.
> + */
> +static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep)
> +{
> +	struct dwc3 *dwc = dep->dwc;
> +	int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
> +	int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;

one declaration per line, also make sure you order it in reverse xmas
tree order.

> +	if (!dwc->needs_fifo_resize)
> +		return 0;
> +
> +	/* resize IN endpoints except ep0 */
> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
> +		return 0;
> +
> +	/* Don't resize already resized IN endpoint */
> +	if (dep->fifo_depth)

using fifo_depth as a flag seems flakey to me. What happens when someone
in the future changes the behavior below and this doesn't apply anymore?

Also, why is this procedure called more than once for the same endpoint?
Does that really happen?

> +		return 0;
> +
> +	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
> +	mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
> +	/* MDWIDTH is represented in bits, we need it in bytes */
> +	mdwidth >>= 3;
> +
> +	if (((dep->endpoint.maxburst > 1) &&
> +			usb_endpoint_xfer_bulk(dep->endpoint.desc))
> +			|| usb_endpoint_xfer_isoc(dep->endpoint.desc))
> +		mult = 3;
> +
> +	if ((dep->endpoint.maxburst > 6) &&
> +			usb_endpoint_xfer_bulk(dep->endpoint.desc)
> +			&& dwc3_is_usb31(dwc))
> +		mult = 6;
> +
> +	/* FIFO size for a single buffer */
> +	fifo = (max_packet + mdwidth)/mdwidth;

add spaces around integer division operator

> +	fifo++;
> +
> +	/* Calculate the number of remaining EPs w/o any FIFO */
> +	num_in_ep = dwc->num_eps/2;
> +	num_in_ep -= dwc->num_ep_resized;
> +	/* Ignore EP0 IN */
> +	num_in_ep--;
> +
> +	/* Reserve at least one FIFO for the number of IN EPs */
> +	min_depth = num_in_ep * (fifo+1);
> +	remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
> +
> +	/* We've already reserved 1 FIFO per EP, so check what we can fit in

comment style is wrong

> +	 * addition to it.  If there is not enough remaining space, allocate
> +	 * all the remaining space to the EP.
> +	 */
> +	fifo_size = (mult-1) * fifo;

spaces around subtraction

> +	if (remaining < fifo_size) {
> +		if (remaining > 0)
> +			fifo_size = remaining;
> +		else
> +			fifo_size = 0;
> +	}
> +
> +	fifo_size += fifo;
> +	fifo_size++;

why the increment?

> +	dep->fifo_depth = fifo_size;
> +
> +	/* Check if TXFIFOs start at non-zero addr */
> +	tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
> +	fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
> +
> +	fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
> +	if (dwc3_is_usb31(dwc))
> +		dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +	else
> +		dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +
> +	/* Check fifo size allocation doesn't exceed available RAM size. */
> +	if (dwc->last_fifo_depth >= ram1_depth) {
> +		dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
> +				(dwc->last_fifo_depth * mdwidth), ram1_depth,
> +				dep->endpoint.name, fifo_size);
> +		if (dwc3_is_usb31(dwc))
> +			fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +		else
> +			fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +		dwc->last_fifo_depth -= fifo_size;
> +		dep->fifo_depth = 0;
> +		return -ENOMEM;
> +	}
> +
> +	dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
> +	dwc->num_ep_resized++;

add a blank line here

> +	return 0;
> +}
> +
>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>  {
>  	const struct usb_ss_ep_comp_descriptor *comp_desc;
> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>  	int			ret;
>  
>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
> +		ret = dwc3_gadget_resize_tx_fifos(dep);
> +		if (ret)
> +			return ret;

doesn't it look odd that you're resizing every fifo every time a new
endpoint is enabled? Is there a better way to achieve this?

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-10 12:27   ` Felipe Balbi
@ 2020-08-11  5:10     ` Wesley Cheng
  2020-08-11  7:12       ` Felipe Balbi
  0 siblings, 1 reply; 13+ messages in thread
From: Wesley Cheng @ 2020-08-11  5:10 UTC (permalink / raw)
  To: Felipe Balbi, agross, bjorn.andersson, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp



On 8/10/2020 5:27 AM, Felipe Balbi wrote:
> Wesley Cheng <wcheng@codeaurora.org> writes:
> 
> Hi,
> 
>> Some devices have USB compositions which may require multiple endpoints
>> that support EP bursting.  HW defined TX FIFO sizes may not always be
>> sufficient for these compositions.  By utilizing flexible TX FIFO
>> allocation, this allows for endpoints to request the required FIFO depth to
>> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
>> a larger TX FIFO size results in better TX throughput.
> 
> how much better? What's the impact? Got some real numbers of this
> running with upstream kernel? I guess mass storage gadget is the
> simplest one to test.
> 
Hi Felipe,

Thanks for the input.

Sorry for not including the numbers in the patch itself, but I did
mention the set of mass storage tests I ran w/ the upstream kernel on
SM8150 in the cover letter.  Let me just share that here:

Test Parameters:
 - Platform: Qualcomm SM8150
 - bMaxBurst = 6
 - USB req size = 256kB
 - Num of USB reqs = 16
 - USB Speed = Super-Speed
 - Function Driver: Mass Storage (w/ ramdisk)
 - Test Application: CrystalDiskMark

Results:

TXFIFO Depth = 3 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     |
Read      |9 loops    | 193.60
	  |           | 195.86
          |           | 184.77
          |           | 193.60
-------------------------------------------

TXFIFO Depth = 6 max packets

Test Case | Data Size | AVG tput (in MB/s)
-------------------------------------------
Sequential|1 GB x     |
Read      |9 loops    | 287.35
	  |           | 304.94
          |           | 289.64
          |           | 293.61
-------------------------------------------

>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>> index 6dee4dabc0a4..76db9b530861 100644
>> --- a/drivers/usb/dwc3/ep0.c
>> +++ b/drivers/usb/dwc3/ep0.c
>> @@ -601,8 +601,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>>  {
>>  	enum usb_device_state state = dwc->gadget.state;
>>  	u32 cfg;
>> -	int ret;
>> +	int ret, num, size;
> 
> no, no. Please one declaration per line.
> 
Got it.
>>  	u32 reg;
>> +	struct dwc3_ep *dep;
> 
> Keep reverse xmas tree order.
> 
Understood.
>> @@ -611,6 +612,40 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>>  		return -EINVAL;
>>  
>>  	case USB_STATE_ADDRESS:
>> +		/*
>> +		 * If tx-fifo-resize flag is not set for the controller, then
>> +		 * do not clear existing allocated TXFIFO since we do not
>> +		 * allocate it again in dwc3_gadget_resize_tx_fifos
>> +		 */
>> +		if (dwc->needs_fifo_resize) {
>> +			/* Read ep0IN related TXFIFO size */
>> +			dep = dwc->eps[1];
>> +			size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
>> +			if (dwc3_is_usb31(dwc))
>> +				dep->fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
>> +			else
>> +				dep->fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
>> +
>> +			dwc->last_fifo_depth = dep->fifo_depth;
>> +			/* Clear existing TXFIFO for all IN eps except ep0 */
>> +			for (num = 3; num < min_t(int, dwc->num_eps,
>> +				DWC3_ENDPOINTS_NUM); num += 2) {
>> +				dep = dwc->eps[num];
>> +				/* Don't change TXFRAMNUM on usb31 version */
>> +				size = dwc3_is_usb31(dwc) ?
>> +					dwc3_readl(dwc->regs,
>> +						   DWC3_GTXFIFOSIZ(num >> 1)) &
>> +						   DWC31_GTXFIFOSIZ_TXFRAMNUM :
>> +						   0;
>> +
>> +				dwc3_writel(dwc->regs,
>> +					    DWC3_GTXFIFOSIZ(num >> 1),
>> +					    size);
>> +				dep->fifo_depth = 0;
>> +			}
>> +			dwc->num_ep_resized = 0;
> 
> care to move this into a helper that you call unconditionally and the
> helper returns early is !needs_fifo_resize?
> 
Sure, I'll include that in the next revision.
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 00746c2848c0..777badf3e85d 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -540,6 +540,117 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
>>  	return 0;
>>  }
>>  
>> +/*
>> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
>> + * @dwc: pointer to our context structure
>> + *
>> + * This function will a best effort FIFO allocation in order
>> + * to improve FIFO usage and throughput, while still allowing
>> + * us to enable as many endpoints as possible.
>> + *
>> + * Keep in mind that this operation will be highly dependent
>> + * on the configured size for RAM1 - which contains TxFifo -,
>> + * the amount of endpoints enabled on coreConsultant tool, and
>> + * the width of the Master Bus.
>> + *
>> + * In general, FIFO depths are represented with the following equation:
>> + *
>> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
>> + *
>> + * Conversions can be done to the equation to derive the number of packets that
>> + * will fit to a particular FIFO size value.
>> + */
>> +static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep)
>> +{
>> +	struct dwc3 *dwc = dep->dwc;
>> +	int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
>> +	int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;
> 
> one declaration per line, also make sure you order it in reverse xmas
> tree order.
> 
Will make sure to fix this in further declarations.
>> +	if (!dwc->needs_fifo_resize)
>> +		return 0;
>> +
>> +	/* resize IN endpoints except ep0 */
>> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
>> +		return 0;
>> +
>> +	/* Don't resize already resized IN endpoint */
>> +	if (dep->fifo_depth)
> 
> using fifo_depth as a flag seems flakey to me. What happens when someone
> in the future changes the behavior below and this doesn't apply anymore?
> 
> Also, why is this procedure called more than once for the same endpoint?
> Does that really happen?
> 
I guess it can be considered a bug elsewhere (ie usb gadget or function
driver) if this happens twice.  Plus, if we decide to keep this in the
dwc3 enable endpoint path, the DWC3_EP_ENABLED flag will ensure it's
called only once as well.  Its probably overkill to check fifo_depth here.
>> +		return 0;
>> +
>> +	ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
>> +	mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
>> +	/* MDWIDTH is represented in bits, we need it in bytes */
>> +	mdwidth >>= 3;
>> +
>> +	if (((dep->endpoint.maxburst > 1) &&
>> +			usb_endpoint_xfer_bulk(dep->endpoint.desc))
>> +			|| usb_endpoint_xfer_isoc(dep->endpoint.desc))
>> +		mult = 3;
>> +
>> +	if ((dep->endpoint.maxburst > 6) &&
>> +			usb_endpoint_xfer_bulk(dep->endpoint.desc)
>> +			&& dwc3_is_usb31(dwc))
>> +		mult = 6;
>> +
>> +	/* FIFO size for a single buffer */
>> +	fifo = (max_packet + mdwidth)/mdwidth;
> 
> add spaces around integer division operator
> 
>> +	fifo++;
>> +
>> +	/* Calculate the number of remaining EPs w/o any FIFO */
>> +	num_in_ep = dwc->num_eps/2;
>> +	num_in_ep -= dwc->num_ep_resized;
>> +	/* Ignore EP0 IN */
>> +	num_in_ep--;
>> +
>> +	/* Reserve at least one FIFO for the number of IN EPs */
>> +	min_depth = num_in_ep * (fifo+1);
>> +	remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
>> +
>> +	/* We've already reserved 1 FIFO per EP, so check what we can fit in
> 
> comment style is wrong
> 
>> +	 * addition to it.  If there is not enough remaining space, allocate
>> +	 * all the remaining space to the EP.
>> +	 */
>> +	fifo_size = (mult-1) * fifo;
> 
> spaces around subtraction
> 
Will fix the formatting comments above.
>> +	if (remaining < fifo_size) {
>> +		if (remaining > 0)
>> +			fifo_size = remaining;
>> +		else
>> +			fifo_size = 0;
>> +	}
>> +
>> +	fifo_size += fifo;
>> +	fifo_size++;
> 
> why the increment?
> 
This is to account for the last +1 in the equation from the DWC3 databook:
fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1 <- this one

>> +	dep->fifo_depth = fifo_size;
>> +
>> +	/* Check if TXFIFOs start at non-zero addr */
>> +	tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
>> +	fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
>> +
>> +	fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
>> +	if (dwc3_is_usb31(dwc))
>> +		dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
>> +	else
>> +		dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
>> +
>> +	/* Check fifo size allocation doesn't exceed available RAM size. */
>> +	if (dwc->last_fifo_depth >= ram1_depth) {
>> +		dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
>> +				(dwc->last_fifo_depth * mdwidth), ram1_depth,
>> +				dep->endpoint.name, fifo_size);
>> +		if (dwc3_is_usb31(dwc))
>> +			fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
>> +		else
>> +			fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
>> +		dwc->last_fifo_depth -= fifo_size;
>> +		dep->fifo_depth = 0;
>> +		return -ENOMEM;
>> +	}
>> +
>> +	dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
>> +	dwc->num_ep_resized++;
> 
> add a blank line here
> 
>> +	return 0;
>> +}
>> +
>>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>>  {
>>  	const struct usb_ss_ep_comp_descriptor *comp_desc;
>> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>>  	int			ret;
>>  
>>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
>> +		ret = dwc3_gadget_resize_tx_fifos(dep);
>> +		if (ret)
>> +			return ret;
> 
> doesn't it look odd that you're resizing every fifo every time a new
> endpoint is enabled? Is there a better way to achieve this?
> 
We're only resizing a single fifo per call, and clearing the previous
fifo configuration upon receiving the set address.  In the past, I know
the change was to resize all fifos after receiving the set configuration
packet.  With that approach, I believe we saw issues with some function
drivers that immediately queued a USB request during their set_alt()
routine, followed by the dwc3 ep0 driver calling the TX fifo resize
API.(as the tx fifo resize was executed after we delegated the set
config packet to the USB composite)

Thanks
Wesley

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-11  5:10     ` Wesley Cheng
@ 2020-08-11  7:12       ` Felipe Balbi
  2020-08-11 13:44         ` Alan Stern
  2020-08-12 18:34         ` Wesley Cheng
  0 siblings, 2 replies; 13+ messages in thread
From: Felipe Balbi @ 2020-08-11  7:12 UTC (permalink / raw)
  To: Wesley Cheng, agross, bjorn.andersson, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp

[-- Attachment #1: Type: text/plain, Size: 5472 bytes --]


Hi,

Wesley Cheng <wcheng@codeaurora.org> writes:
> On 8/10/2020 5:27 AM, Felipe Balbi wrote:
>> Wesley Cheng <wcheng@codeaurora.org> writes:
>> 
>> Hi,
>> 
>>> Some devices have USB compositions which may require multiple endpoints
>>> that support EP bursting.  HW defined TX FIFO sizes may not always be
>>> sufficient for these compositions.  By utilizing flexible TX FIFO
>>> allocation, this allows for endpoints to request the required FIFO depth to
>>> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
>>> a larger TX FIFO size results in better TX throughput.
>> 
>> how much better? What's the impact? Got some real numbers of this
>> running with upstream kernel? I guess mass storage gadget is the
>> simplest one to test.
>> 
> Hi Felipe,
>
> Thanks for the input.
>
> Sorry for not including the numbers in the patch itself, but I did
> mention the set of mass storage tests I ran w/ the upstream kernel on
> SM8150 in the cover letter.  Let me just share that here:
>
> Test Parameters:
>  - Platform: Qualcomm SM8150
>  - bMaxBurst = 6
>  - USB req size = 256kB
>  - Num of USB reqs = 16
>  - USB Speed = Super-Speed
>  - Function Driver: Mass Storage (w/ ramdisk)
>  - Test Application: CrystalDiskMark
>
> Results:
>
> TXFIFO Depth = 3 max packets
>
> Test Case | Data Size | AVG tput (in MB/s)
> -------------------------------------------
> Sequential|1 GB x     |
> Read      |9 loops    | 193.60
> 	  |           | 195.86
>           |           | 184.77
>           |           | 193.60
> -------------------------------------------
>
> TXFIFO Depth = 6 max packets
>
> Test Case | Data Size | AVG tput (in MB/s)
> -------------------------------------------
> Sequential|1 GB x     |
> Read      |9 loops    | 287.35
> 	  |           | 304.94
>           |           | 289.64
>           |           | 293.61
> -------------------------------------------

awesome, thanks a lot for this :-) It's a considerable increase in your
setup. My only fear here is that we may end up creating a situation
where we can't allocate enough FIFO for all endpoints. This is, of
course, a consequence of the fact that we enable one endpoint at a
time.

Perhaps we could envision a way where function driver requests endpoints
in bulk, i.e. combines all endpoint requirements into a single method
call for gadget framework and, consequently, for UDC.

>>> +	if (!dwc->needs_fifo_resize)
>>> +		return 0;
>>> +
>>> +	/* resize IN endpoints except ep0 */
>>> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
>>> +		return 0;
>>> +
>>> +	/* Don't resize already resized IN endpoint */
>>> +	if (dep->fifo_depth)
>> 
>> using fifo_depth as a flag seems flakey to me. What happens when someone
>> in the future changes the behavior below and this doesn't apply anymore?
>> 
>> Also, why is this procedure called more than once for the same endpoint?
>> Does that really happen?
>> 
> I guess it can be considered a bug elsewhere (ie usb gadget or function
> driver) if this happens twice.  Plus, if we decide to keep this in the
> dwc3 enable endpoint path, the DWC3_EP_ENABLED flag will ensure it's
> called only once as well.  Its probably overkill to check fifo_depth here.

We could add a dev_WARN_ONCE() just to catch any possible bugs elsewhere.

>>> +	if (remaining < fifo_size) {
>>> +		if (remaining > 0)
>>> +			fifo_size = remaining;
>>> +		else
>>> +			fifo_size = 0;
>>> +	}
>>> +
>>> +	fifo_size += fifo;
>>> +	fifo_size++;
>> 
>> why the increment?
>> 
> This is to account for the last +1 in the equation from the DWC3 databook:
> fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1 <- this one

great, could you add this detail as a comment so it doesn't look as
cryptic? :-)

>>> +	return 0;
>>> +}
>>> +
>>>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>>>  {
>>>  	const struct usb_ss_ep_comp_descriptor *comp_desc;
>>> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>>>  	int			ret;
>>>  
>>>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
>>> +		ret = dwc3_gadget_resize_tx_fifos(dep);
>>> +		if (ret)
>>> +			return ret;
>> 
>> doesn't it look odd that you're resizing every fifo every time a new
>> endpoint is enabled? Is there a better way to achieve this?
>> 
> We're only resizing a single fifo per call, and clearing the previous
> fifo configuration upon receiving the set address.  In the past, I know
> the change was to resize all fifos after receiving the set configuration
> packet.  With that approach, I believe we saw issues with some function
> drivers that immediately queued a USB request during their set_alt()
> routine, followed by the dwc3 ep0 driver calling the TX fifo resize
> API.(as the tx fifo resize was executed after we delegated the set
> config packet to the USB composite)

I don't remember seeing such an issue. Allocating FIFOs after we know
the entire requirements would avoid another possible situation, that of
dwc3 exausting FIFO space before it knows there are more enpdoints to
enable.

One possibility around this was suggested above, something along the
lines of:

	usb_gadget_ep_enable_bulk(struct usb_gadget *, struct
		usb_ep_alloc_desc *alloc_desc)

(please think of better names, I'm hopeless haha)

-- 
balbi

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-11  7:12       ` Felipe Balbi
@ 2020-08-11 13:44         ` Alan Stern
  2020-08-12 18:34         ` Wesley Cheng
  1 sibling, 0 replies; 13+ messages in thread
From: Alan Stern @ 2020-08-11 13:44 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Wesley Cheng, agross, bjorn.andersson, gregkh, robh+dt,
	linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp

On Tue, Aug 11, 2020 at 10:12:41AM +0300, Felipe Balbi wrote:
> One possibility around this was suggested above, something along the
> lines of:
> 
> 	usb_gadget_ep_enable_bulk(struct usb_gadget *, struct
> 		usb_ep_alloc_desc *alloc_desc)
> 
> (please think of better names, I'm hopeless haha)

How about usb_gadget_enable_endpoints()?

Alan Stern

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
  2020-08-10 12:27   ` Felipe Balbi
@ 2020-08-12  2:22   ` Peter Chen
  2020-08-12  7:00     ` Wesley Cheng
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Chen @ 2020-08-12  2:22 UTC (permalink / raw)
  To: Wesley Cheng
  Cc: agross, bjorn.andersson, balbi, Greg Kroah-Hartman, robh+dt,
	lkml, devicetree, linux-arm-msm, USB list, jackp

On Wed, Jun 24, 2020 at 10:31 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
>
> Some devices have USB compositions which may require multiple endpoints
> that support EP bursting.  HW defined TX FIFO sizes may not always be
> sufficient for these compositions.  By utilizing flexible TX FIFO
> allocation, this allows for endpoints to request the required FIFO depth to
> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
> a larger TX FIFO size results in better TX throughput.
>
> Ensure that one TX FIFO is reserved for every IN endpoint.  This allows for
> the FIFO logic to prevent running out of FIFO space.
>

You may do this for only allocated endpoints, but you need override
default .match_ep
API. See cdns3/gadget.c and cdns3/ep0.c as an example.

Peter

> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org>
> ---
>  drivers/usb/dwc3/core.c   |   2 +
>  drivers/usb/dwc3/core.h   |   8 +++
>  drivers/usb/dwc3/ep0.c    |  37 +++++++++++-
>  drivers/usb/dwc3/gadget.c | 115 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 161 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
> index edc17155cb2b..cca555493929 100644
> --- a/drivers/usb/dwc3/core.c
> +++ b/drivers/usb/dwc3/core.c
> @@ -1304,6 +1304,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
>                                 &tx_thr_num_pkt_prd);
>         device_property_read_u8(dev, "snps,tx-max-burst-prd",
>                                 &tx_max_burst_prd);
> +       dwc->needs_fifo_resize = device_property_read_bool(dev,
> +                               "tx-fifo-resize");
>
>         dwc->disable_scramble_quirk = device_property_read_bool(dev,
>                                 "snps,disable_scramble_quirk");
> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
> index 4c171a8e215f..ce0bf288b6ac 100644
> --- a/drivers/usb/dwc3/core.h
> +++ b/drivers/usb/dwc3/core.h
> @@ -675,6 +675,7 @@ struct dwc3_event_buffer {
>   *             isochronous START TRANSFER command failure workaround
>   * @start_cmd_status: the status of testing START TRANSFER command with
>   *             combo_num = 'b00
> + * @fifo_depth: allocated TXFIFO depth
>   */
>  struct dwc3_ep {
>         struct usb_ep           endpoint;
> @@ -727,6 +728,7 @@ struct dwc3_ep {
>         /* For isochronous START TRANSFER workaround only */
>         u8                      combo_num;
>         int                     start_cmd_status;
> +       int                     fifo_depth;
>  };
>
>  enum dwc3_phy {
> @@ -1004,6 +1006,7 @@ struct dwc3_scratchpad_array {
>   *     1       - utmi_l1_suspend_n
>   * @is_fpga: true when we are using the FPGA board
>   * @pending_events: true when we have pending IRQs to be handled
> + * @needs_fifo_resize: not all users might want fifo resizing, flag it
>   * @pullups_connected: true when Run/Stop bit is set
>   * @setup_packet_pending: true when there's a Setup Packet in FIFO. Workaround
>   * @three_stage_setup: set if we perform a three phase setup
> @@ -1044,6 +1047,8 @@ struct dwc3_scratchpad_array {
>   * @dis_metastability_quirk: set to disable metastability quirk.
>   * @imod_interval: set the interrupt moderation interval in 250ns
>   *                 increments or 0 to disable.
> + * @last_fifo_depth: total TXFIFO depth of all enabled USB IN/INT endpoints
> + * @num_ep_resized: the number of TX FIFOs that have already been resized
>   */
>  struct dwc3 {
>         struct work_struct      drd_work;
> @@ -1204,6 +1209,7 @@ struct dwc3 {
>         unsigned                is_utmi_l1_suspend:1;
>         unsigned                is_fpga:1;
>         unsigned                pending_events:1;
> +       unsigned                needs_fifo_resize:1;
>         unsigned                pullups_connected:1;
>         unsigned                setup_packet_pending:1;
>         unsigned                three_stage_setup:1;
> @@ -1236,6 +1242,8 @@ struct dwc3 {
>         unsigned                dis_metastability_quirk:1;
>
>         u16                     imod_interval;
> +       int                     last_fifo_depth;
> +       int                     num_ep_resized;
>  };
>
>  #define INCRX_BURST_MODE 0
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index 6dee4dabc0a4..76db9b530861 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -601,8 +601,9 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>  {
>         enum usb_device_state state = dwc->gadget.state;
>         u32 cfg;
> -       int ret;
> +       int ret, num, size;
>         u32 reg;
> +       struct dwc3_ep *dep;
>
>         cfg = le16_to_cpu(ctrl->wValue);
>
> @@ -611,6 +612,40 @@ static int dwc3_ep0_set_config(struct dwc3 *dwc, struct usb_ctrlrequest *ctrl)
>                 return -EINVAL;
>
>         case USB_STATE_ADDRESS:
> +               /*
> +                * If tx-fifo-resize flag is not set for the controller, then
> +                * do not clear existing allocated TXFIFO since we do not
> +                * allocate it again in dwc3_gadget_resize_tx_fifos
> +                */
> +               if (dwc->needs_fifo_resize) {
> +                       /* Read ep0IN related TXFIFO size */
> +                       dep = dwc->eps[1];
> +                       size = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
> +                       if (dwc3_is_usb31(dwc))
> +                               dep->fifo_depth = DWC31_GTXFIFOSIZ_TXFDEP(size);
> +                       else
> +                               dep->fifo_depth = DWC3_GTXFIFOSIZ_TXFDEP(size);
> +
> +                       dwc->last_fifo_depth = dep->fifo_depth;
> +                       /* Clear existing TXFIFO for all IN eps except ep0 */
> +                       for (num = 3; num < min_t(int, dwc->num_eps,
> +                               DWC3_ENDPOINTS_NUM); num += 2) {
> +                               dep = dwc->eps[num];
> +                               /* Don't change TXFRAMNUM on usb31 version */
> +                               size = dwc3_is_usb31(dwc) ?
> +                                       dwc3_readl(dwc->regs,
> +                                                  DWC3_GTXFIFOSIZ(num >> 1)) &
> +                                                  DWC31_GTXFIFOSIZ_TXFRAMNUM :
> +                                                  0;
> +
> +                               dwc3_writel(dwc->regs,
> +                                           DWC3_GTXFIFOSIZ(num >> 1),
> +                                           size);
> +                               dep->fifo_depth = 0;
> +                       }
> +                       dwc->num_ep_resized = 0;
> +               }
> +
>                 ret = dwc3_ep0_delegate_req(dwc, ctrl);
>                 /* if the cfg matches and the cfg is non zero */
>                 if (cfg && (!ret || (ret == USB_GADGET_DELAYED_STATUS))) {
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 00746c2848c0..777badf3e85d 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -540,6 +540,117 @@ static int dwc3_gadget_start_config(struct dwc3_ep *dep)
>         return 0;
>  }
>
> +/*
> + * dwc3_gadget_resize_tx_fifos - reallocate fifo spaces for current use-case
> + * @dwc: pointer to our context structure
> + *
> + * This function will a best effort FIFO allocation in order
> + * to improve FIFO usage and throughput, while still allowing
> + * us to enable as many endpoints as possible.
> + *
> + * Keep in mind that this operation will be highly dependent
> + * on the configured size for RAM1 - which contains TxFifo -,
> + * the amount of endpoints enabled on coreConsultant tool, and
> + * the width of the Master Bus.
> + *
> + * In general, FIFO depths are represented with the following equation:
> + *
> + * fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1
> + *
> + * Conversions can be done to the equation to derive the number of packets that
> + * will fit to a particular FIFO size value.
> + */
> +static int dwc3_gadget_resize_tx_fifos(struct dwc3_ep *dep)
> +{
> +       struct dwc3 *dwc = dep->dwc;
> +       int ram1_depth, mdwidth, fifo_0_start, tmp, num_in_ep;
> +       int min_depth, remaining, fifo_size, mult = 1, fifo, max_packet = 1024;
> +
> +       if (!dwc->needs_fifo_resize)
> +               return 0;
> +
> +       /* resize IN endpoints except ep0 */
> +       if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
> +               return 0;
> +
> +       /* Don't resize already resized IN endpoint */
> +       if (dep->fifo_depth)
> +               return 0;
> +
> +       ram1_depth = DWC3_RAM1_DEPTH(dwc->hwparams.hwparams7);
> +       mdwidth = DWC3_MDWIDTH(dwc->hwparams.hwparams0);
> +       /* MDWIDTH is represented in bits, we need it in bytes */
> +       mdwidth >>= 3;
> +
> +       if (((dep->endpoint.maxburst > 1) &&
> +                       usb_endpoint_xfer_bulk(dep->endpoint.desc))
> +                       || usb_endpoint_xfer_isoc(dep->endpoint.desc))
> +               mult = 3;
> +
> +       if ((dep->endpoint.maxburst > 6) &&
> +                       usb_endpoint_xfer_bulk(dep->endpoint.desc)
> +                       && dwc3_is_usb31(dwc))
> +               mult = 6;
> +
> +       /* FIFO size for a single buffer */
> +       fifo = (max_packet + mdwidth)/mdwidth;
> +       fifo++;
> +
> +       /* Calculate the number of remaining EPs w/o any FIFO */
> +       num_in_ep = dwc->num_eps/2;
> +       num_in_ep -= dwc->num_ep_resized;
> +       /* Ignore EP0 IN */
> +       num_in_ep--;
> +
> +       /* Reserve at least one FIFO for the number of IN EPs */
> +       min_depth = num_in_ep * (fifo+1);
> +       remaining = ram1_depth - min_depth - dwc->last_fifo_depth;
> +
> +       /* We've already reserved 1 FIFO per EP, so check what we can fit in
> +        * addition to it.  If there is not enough remaining space, allocate
> +        * all the remaining space to the EP.
> +        */
> +       fifo_size = (mult-1) * fifo;
> +       if (remaining < fifo_size) {
> +               if (remaining > 0)
> +                       fifo_size = remaining;
> +               else
> +                       fifo_size = 0;
> +       }
> +
> +       fifo_size += fifo;
> +       fifo_size++;
> +       dep->fifo_depth = fifo_size;
> +
> +       /* Check if TXFIFOs start at non-zero addr */
> +       tmp = dwc3_readl(dwc->regs, DWC3_GTXFIFOSIZ(0));
> +       fifo_0_start = DWC3_GTXFIFOSIZ_TXFSTADDR(tmp);
> +
> +       fifo_size |= (fifo_0_start + (dwc->last_fifo_depth << 16));
> +       if (dwc3_is_usb31(dwc))
> +               dwc->last_fifo_depth += DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +       else
> +               dwc->last_fifo_depth += DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +
> +       /* Check fifo size allocation doesn't exceed available RAM size. */
> +       if (dwc->last_fifo_depth >= ram1_depth) {
> +               dev_err(dwc->dev, "Fifosize(%d) > RAM size(%d) %s depth:%d\n",
> +                               (dwc->last_fifo_depth * mdwidth), ram1_depth,
> +                               dep->endpoint.name, fifo_size);
> +               if (dwc3_is_usb31(dwc))
> +                       fifo_size = DWC31_GTXFIFOSIZ_TXFDEP(fifo_size);
> +               else
> +                       fifo_size = DWC3_GTXFIFOSIZ_TXFDEP(fifo_size);
> +               dwc->last_fifo_depth -= fifo_size;
> +               dep->fifo_depth = 0;
> +               return -ENOMEM;
> +       }
> +
> +       dwc3_writel(dwc->regs, DWC3_GTXFIFOSIZ(dep->number >> 1), fifo_size);
> +       dwc->num_ep_resized++;
> +       return 0;
> +}
> +
>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>  {
>         const struct usb_ss_ep_comp_descriptor *comp_desc;
> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>         int                     ret;
>
>         if (!(dep->flags & DWC3_EP_ENABLED)) {
> +               ret = dwc3_gadget_resize_tx_fifos(dep);
> +               if (ret)
> +                       return ret;
> +
>                 ret = dwc3_gadget_start_config(dep);
>                 if (ret)
>                         return ret;
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-12  2:22   ` Peter Chen
@ 2020-08-12  7:00     ` Wesley Cheng
  2020-08-12 11:01       ` Peter Chen
  0 siblings, 1 reply; 13+ messages in thread
From: Wesley Cheng @ 2020-08-12  7:00 UTC (permalink / raw)
  To: Peter Chen
  Cc: agross, bjorn.andersson, balbi, Greg Kroah-Hartman, robh+dt,
	lkml, devicetree, linux-arm-msm, USB list, jackp



On 8/11/2020 7:22 PM, Peter Chen wrote:
> On Wed, Jun 24, 2020 at 10:31 AM Wesley Cheng <wcheng@codeaurora.org> wrote:
>>
>> Some devices have USB compositions which may require multiple endpoints
>> that support EP bursting.  HW defined TX FIFO sizes may not always be
>> sufficient for these compositions.  By utilizing flexible TX FIFO
>> allocation, this allows for endpoints to request the required FIFO depth to
>> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
>> a larger TX FIFO size results in better TX throughput.
>>
>> Ensure that one TX FIFO is reserved for every IN endpoint.  This allows for
>> the FIFO logic to prevent running out of FIFO space.
>>
> 
> You may do this for only allocated endpoints, but you need override
> default .match_ep
> API. See cdns3/gadget.c and cdns3/ep0.c as an example.
> 
> Peter
> 

Hi Peter,

Thank you for your input.  I've actually considered doing some
matching/resizing in the .match_ep route as well, but it doesn't work
well for situations where multiple configurations are in play. The
reason being that if you look at the epautoconf APIs, the configfs
driver will use the usb_ep_autoconfig_reset() to reset the endpoints
claimed between initialization of each configuration.  This means that
the epautoconf driver expects to re-use the usb_endpoints:

static int configfs_composite_bind(struct usb_gadget *gadget,
	struct usb_gadget_driver *gdriver)
{
...

/* Go through all configs, attach all functions */
list_for_each_entry(c, &gi->cdev.configs, list) {
...
list_for_each_entry_safe(f, tmp, &cfg->func_list, list) {
	list_del(&f->list);
	ret = usb_add_function(c, f);
	if (ret) {
		list_add(&f->list, &cfg->func_list);
		goto err_purge_funcs;
	}
}
usb_ep_autoconfig_reset(cdev->gadget);
}

So in this situation, I wouldn't want the dwc3 gadget driver to assign a
different dwc3 ep for endpoints in each configuration, when we know that
only one set of EPs will be active when the host chooses.  I hope I
understood your feedback correctly, and definitely appreciate the input!

Thanks
Wesley

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-12  7:00     ` Wesley Cheng
@ 2020-08-12 11:01       ` Peter Chen
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Chen @ 2020-08-12 11:01 UTC (permalink / raw)
  To: Wesley Cheng, Peter Chen
  Cc: agross, bjorn.andersson, balbi, Greg Kroah-Hartman, robh+dt,
	lkml, devicetree, linux-arm-msm, USB list, jackp

 
 
> 
> Thank you for your input.  I've actually considered doing some matching/resizing in
> the .match_ep route as well, but it doesn't work well for situations where multiple
> configurations are in play. The reason being that if you look at the epautoconf APIs,
> the configfs driver will use the usb_ep_autoconfig_reset() to reset the endpoints
> claimed between initialization of each configuration.  This means that the
> epautoconf driver expects to re-use the usb_endpoints:
> 
> static int configfs_composite_bind(struct usb_gadget *gadget,
> 	struct usb_gadget_driver *gdriver)
> {
> ...
> 
> /* Go through all configs, attach all functions */ list_for_each_entry(c, &gi-
> >cdev.configs, list) { ...
> list_for_each_entry_safe(f, tmp, &cfg->func_list, list) {
> 	list_del(&f->list);
> 	ret = usb_add_function(c, f);
> 	if (ret) {
> 		list_add(&f->list, &cfg->func_list);
> 		goto err_purge_funcs;
> 	}
> }
> usb_ep_autoconfig_reset(cdev->gadget);
> }
> 
> So in this situation, I wouldn't want the dwc3 gadget driver to assign a different
> dwc3 ep for endpoints in each configuration, when we know that only one set of
> EPs will be active when the host chooses.  I hope I understood your feedback
> correctly, and definitely appreciate the input!
> 
 
Thanks for mention that, we didn't consider multiple configurations use case, it needs
the UDC driver to record the configuration information, it is too complex at current framework.

I think your solution is OK, reserving one packet for each IN endpoint to avoid running out of
FIFO for later endpoints and fit the first endpoints with larger FIFO room to get the best
performance, it could use as many as FIFOs the device owns.

Peter


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-11  7:12       ` Felipe Balbi
  2020-08-11 13:44         ` Alan Stern
@ 2020-08-12 18:34         ` Wesley Cheng
  2020-08-18 19:58           ` Wesley Cheng
  1 sibling, 1 reply; 13+ messages in thread
From: Wesley Cheng @ 2020-08-12 18:34 UTC (permalink / raw)
  To: Felipe Balbi, agross, bjorn.andersson, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp



On 8/11/2020 12:12 AM, Felipe Balbi wrote:
> 
> Hi,
> 
> Wesley Cheng <wcheng@codeaurora.org> writes:
>> On 8/10/2020 5:27 AM, Felipe Balbi wrote:
>>> Wesley Cheng <wcheng@codeaurora.org> writes:
>>>
>>> Hi,
>>>
>>>> Some devices have USB compositions which may require multiple endpoints
>>>> that support EP bursting.  HW defined TX FIFO sizes may not always be
>>>> sufficient for these compositions.  By utilizing flexible TX FIFO
>>>> allocation, this allows for endpoints to request the required FIFO depth to
>>>> achieve higher bandwidth.  With some higher bMaxBurst configurations, using
>>>> a larger TX FIFO size results in better TX throughput.
>>>
>>> how much better? What's the impact? Got some real numbers of this
>>> running with upstream kernel? I guess mass storage gadget is the
>>> simplest one to test.
>>>
>> Hi Felipe,
>>
>> Thanks for the input.
>>
>> Sorry for not including the numbers in the patch itself, but I did
>> mention the set of mass storage tests I ran w/ the upstream kernel on
>> SM8150 in the cover letter.  Let me just share that here:
>>
>> Test Parameters:
>>  - Platform: Qualcomm SM8150
>>  - bMaxBurst = 6
>>  - USB req size = 256kB
>>  - Num of USB reqs = 16
>>  - USB Speed = Super-Speed
>>  - Function Driver: Mass Storage (w/ ramdisk)
>>  - Test Application: CrystalDiskMark
>>
>> Results:
>>
>> TXFIFO Depth = 3 max packets
>>
>> Test Case | Data Size | AVG tput (in MB/s)
>> -------------------------------------------
>> Sequential|1 GB x     |
>> Read      |9 loops    | 193.60
>> 	  |           | 195.86
>>           |           | 184.77
>>           |           | 193.60
>> -------------------------------------------
>>
>> TXFIFO Depth = 6 max packets
>>
>> Test Case | Data Size | AVG tput (in MB/s)
>> -------------------------------------------
>> Sequential|1 GB x     |
>> Read      |9 loops    | 287.35
>> 	  |           | 304.94
>>           |           | 289.64
>>           |           | 293.61
>> -------------------------------------------
> 
> awesome, thanks a lot for this :-) It's a considerable increase in your
> setup. My only fear here is that we may end up creating a situation
> where we can't allocate enough FIFO for all endpoints. This is, of
> course, a consequence of the fact that we enable one endpoint at a
> time.
> 
> Perhaps we could envision a way where function driver requests endpoints
> in bulk, i.e. combines all endpoint requirements into a single method
> call for gadget framework and, consequently, for UDC.
> 
Hi Felipe,

I agree...Resizing the txfifo is not as straightforward as it sounds :).
 Would be interesting to see how this affects tput on other platforms as
well.  We had a few discussions within our team, and came up with the
logic implemented in this patch to reserve at least 1 txfifo per
endpoint. Then we allocate any additional fifo space requests based on
the remaining space left.  That way we could avoid over allocating, but
the trade off is that we may have unused EPs taking up fifo space.

I didn't consider branching out to changing the gadget framework, so let
me take a look at your suggestion to see how it turns out.

>>>> +	if (!dwc->needs_fifo_resize)
>>>> +		return 0;
>>>> +
>>>> +	/* resize IN endpoints except ep0 */
>>>> +	if (!usb_endpoint_dir_in(dep->endpoint.desc) || dep->number <= 1)
>>>> +		return 0;
>>>> +
>>>> +	/* Don't resize already resized IN endpoint */
>>>> +	if (dep->fifo_depth)
>>>
>>> using fifo_depth as a flag seems flakey to me. What happens when someone
>>> in the future changes the behavior below and this doesn't apply anymore?
>>>
>>> Also, why is this procedure called more than once for the same endpoint?
>>> Does that really happen?
>>>
>> I guess it can be considered a bug elsewhere (ie usb gadget or function
>> driver) if this happens twice.  Plus, if we decide to keep this in the
>> dwc3 enable endpoint path, the DWC3_EP_ENABLED flag will ensure it's
>> called only once as well.  Its probably overkill to check fifo_depth here.
> 
> We could add a dev_WARN_ONCE() just to catch any possible bugs elsewhere.
> 

OK, I can add that.

>>>> +	if (remaining < fifo_size) {
>>>> +		if (remaining > 0)
>>>> +			fifo_size = remaining;
>>>> +		else
>>>> +			fifo_size = 0;
>>>> +	}
>>>> +
>>>> +	fifo_size += fifo;
>>>> +	fifo_size++;
>>>
>>> why the increment?
>>>
>> This is to account for the last +1 in the equation from the DWC3 databook:
>> fifo_size = mult * ((max_packet + mdwidth)/mdwidth + 1) + 1 <- this one
> 
> great, could you add this detail as a comment so it doesn't look as
> cryptic? :-)
> 

Sure, of course.

>>>> +	return 0;
>>>> +}
>>>> +
>>>>  static int dwc3_gadget_set_ep_config(struct dwc3_ep *dep, unsigned int action)
>>>>  {
>>>>  	const struct usb_ss_ep_comp_descriptor *comp_desc;
>>>> @@ -620,6 +731,10 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
>>>>  	int			ret;
>>>>  
>>>>  	if (!(dep->flags & DWC3_EP_ENABLED)) {
>>>> +		ret = dwc3_gadget_resize_tx_fifos(dep);
>>>> +		if (ret)
>>>> +			return ret;
>>>
>>> doesn't it look odd that you're resizing every fifo every time a new
>>> endpoint is enabled? Is there a better way to achieve this?
>>>
>> We're only resizing a single fifo per call, and clearing the previous
>> fifo configuration upon receiving the set address.  In the past, I know
>> the change was to resize all fifos after receiving the set configuration
>> packet.  With that approach, I believe we saw issues with some function
>> drivers that immediately queued a USB request during their set_alt()
>> routine, followed by the dwc3 ep0 driver calling the TX fifo resize
>> API.(as the tx fifo resize was executed after we delegated the set
>> config packet to the USB composite)
> 
> I don't remember seeing such an issue. Allocating FIFOs after we know
> the entire requirements would avoid another possible situation, that of
> dwc3 exausting FIFO space before it knows there are more enpdoints to
> enable.
> 
> One possibility around this was suggested above, something along the
> lines of:
> 
> 	usb_gadget_ep_enable_bulk(struct usb_gadget *, struct
> 		usb_ep_alloc_desc *alloc_desc)
> 
> (please think of better names, I'm hopeless haha)
> 
Sorry forgot to mention that this is something we caught internally with
our testing using the older txfifo resizing change.  As mentioned above,
let me dive into this suggestion a bit more.

Thanks
Wesley

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements
  2020-08-12 18:34         ` Wesley Cheng
@ 2020-08-18 19:58           ` Wesley Cheng
  0 siblings, 0 replies; 13+ messages in thread
From: Wesley Cheng @ 2020-08-18 19:58 UTC (permalink / raw)
  To: Felipe Balbi, agross, bjorn.andersson, gregkh, robh+dt
  Cc: linux-kernel, devicetree, linux-arm-msm, linux-usb, jackp



On 8/12/2020 11:34 AM, Wesley Cheng wrote:
>>
>> awesome, thanks a lot for this :-) It's a considerable increase in your
>> setup. My only fear here is that we may end up creating a situation
>> where we can't allocate enough FIFO for all endpoints. This is, of
>> course, a consequence of the fact that we enable one endpoint at a
>> time.
>>
>> Perhaps we could envision a way where function driver requests endpoints
>> in bulk, i.e. combines all endpoint requirements into a single method
>> call for gadget framework and, consequently, for UDC.
>>
> Hi Felipe,
> 
> I agree...Resizing the txfifo is not as straightforward as it sounds :).
>  Would be interesting to see how this affects tput on other platforms as
> well.  We had a few discussions within our team, and came up with the
> logic implemented in this patch to reserve at least 1 txfifo per
> endpoint. Then we allocate any additional fifo space requests based on
> the remaining space left.  That way we could avoid over allocating, but
> the trade off is that we may have unused EPs taking up fifo space.
> 
> I didn't consider branching out to changing the gadget framework, so let
> me take a look at your suggestion to see how it turns out.
> 

Hi Felipe,

Instead of catching the out of FIFO memory issue during the ep enable
stage, I was thinking if we could do it somewhere during the bind.  Then
this would allow for at least failing the bind instead of having an
enumerated device which doesn't work. (will happen if we bail out during
ep enable phase)  The idea I had was the following:

Introduce a new USB gadget function pointer, say
usb_gadget_check_config(struct usb_gadget *gadget, unsigned long ep_map)

The purpose for the ep_map is to carry information about the endpoints
the configuration requires, since each function driver will define the
endpoint descriptor(s) it will advertise to the host.  We have access to
these ep desc after the bind() routine is executed for the function
driver, so we can update this map after every bind.  The configfs driver
will call the check config API every time a configuration is added.

static int configfs_composite_bind(struct usb_gadget *gadget,
		struct usb_gadget_driver *gdriver)
{
...
  /* Go through all configs, attach all functions */
  list_for_each_entry(c, &gi->cdev.configs, list) {
  ...
    list_for_each_entry_safe(f, tmp, &cfg->func_list, list) {
    ...
      	if (f->ss_descriptors) {
	  struct usb_descriptor_header **descriptors;
	  descriptors = f->ss_descriptors;
	  for (; *descriptors; ++descriptors) {
	    struct usb_endpoint_descriptor *ep;
	    int addr;
		
	    if ((*descriptors)->bDescriptorType != USB_DT_ENDPOINT)
		continue;
		
	    ep = (struct usb_endpoint_descriptor *)*descriptors;
	    addr = ((ep->bEndpointAddress & 0x80) >> 3)
	    |  (ep->bEndpointAddress & 0x0f);
	    set_bit(addr, &ep_map);
	  }
	}
    usb_gadget_check_config(cdev->gadget, ep_map);

What it'll allow us to do is to decode the ep_map in the dwc3/udc driver
to determine if we have enough fifo space. Also, if we wanted to utilize
this ep map for the actual resizing stage, we could eliminate the issue
of not knowing how many EPs will be enabled, and allocating potentially
unused fifos due to unused eps.


Thanks
Wesley

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-18 19:59 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24  2:28 [RFC v4 0/3] Re-introduce TX FIFO resize for larger EP bursting Wesley Cheng
2020-06-24  2:28 ` [RFC v4 1/3] usb: dwc3: Resize TX FIFOs to meet EP bursting requirements Wesley Cheng
2020-08-10 12:27   ` Felipe Balbi
2020-08-11  5:10     ` Wesley Cheng
2020-08-11  7:12       ` Felipe Balbi
2020-08-11 13:44         ` Alan Stern
2020-08-12 18:34         ` Wesley Cheng
2020-08-18 19:58           ` Wesley Cheng
2020-08-12  2:22   ` Peter Chen
2020-08-12  7:00     ` Wesley Cheng
2020-08-12 11:01       ` Peter Chen
2020-06-24  2:28 ` [RFC v4 2/3] arm64: boot: dts: qcom: sm8150: Enable dynamic TX FIFO resize logic Wesley Cheng
2020-06-24  2:28 ` [RFC v4 3/3] dt-bindings: usb: dwc3: Add entry for tx-fifo-resize Wesley Cheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).