Linux-USB Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC
@ 2019-12-20 13:38 Tejas Joglekar
  2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Tejas Joglekar @ 2019-12-20 13:38 UTC (permalink / raw)
  To: Felipe Balbi, Tejas Joglekar, linux-usb, devicetree, Rob Herring,
	Mathias Nyman, Mark Rutland
  Cc: John Youn

The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
for HS. The controller loads and updates the TRB cache from the
transfer ring in system memory whenever the driver issues a start
transfer or update transfer command.

For chained TRBs, the Synopsys xHC requires that the total amount of
bytes for all TRBs loaded in the TRB cache be greater than or equal to
1 MPS. Or the chain ends within the TRB cache (with a last TRB).

If this requirement is not met, the controller will not be able to
send or receive a packet and it will hang causing a driver timeout and
error.

This patch set adds logic to the XHCI driver to detect and prevent this
from happening along with the quirk to enable this logic for Synopsys
HAPS platform.


Tejas Joglekar (4):
  usb: xhci: Synopsys xHC consolidate TRBs
  usb: dwc3: Add device property consolidate-trbs
  usb: xhci: Set quirk for XHCI_CONSOLIDATE_TRBS
  dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs

 Documentation/devicetree/bindings/usb/dwc3.txt     |   6 +
 Documentation/devicetree/bindings/usb/usb-xhci.txt |   1 +
 drivers/usb/dwc3/core.c                            |   2 +
 drivers/usb/dwc3/core.h                            |   2 +
 drivers/usb/dwc3/dwc3-haps.c                       |   1 +
 drivers/usb/dwc3/host.c                            |   3 +
 drivers/usb/host/xhci-pci.c                        |   3 +
 drivers/usb/host/xhci-plat.c                       |   3 +
 drivers/usb/host/xhci-ring.c                       | 278 ++++++++++++++++++++-
 drivers/usb/host/xhci.c                            |   3 +
 drivers/usb/host/xhci.h                            |  23 ++
 11 files changed, 317 insertions(+), 8 deletions(-)

-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2019-12-20 13:38 [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC Tejas Joglekar
@ 2019-12-20 13:39 ` Tejas Joglekar
  2020-01-02  9:08   ` David Laight
  2020-01-03 16:44   ` Mathias Nyman
  2019-12-20 13:39 ` [RFC PATCH 2/4] usb: dwc3: Add device property consolidate-trbs Tejas Joglekar
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Tejas Joglekar @ 2019-12-20 13:39 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
for HS. The controller loads and updates the TRB cache from the transfer
ring in system memory whenever the driver issues a start transfer or
update transfer command.

For chained TRBs, the Synopsys xHC requires that the total amount of
bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
MPS. Or the chain ends within the TRB cache (with a last TRB).

If this requirement is not met, the controller will not be able to send
or receive a packet and it will hang causing a driver timeout and error.

This can be a problem if a class driver queues SG requests with many
small-buffer entries. The XHCI driver will create a chained TRB for each
entry which may trigger this issue.

This patch adds logic to the XHCI driver to detect and prevent this from
happening.

For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
the TRBs and if the chain continues and we don't make up at least 1 MPS,
we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
into the last TRB.

We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
be a link and/or event data TRB that take up to 2 of the cache entries.
And we consolidate the next (4 * MPS) to improve performance.

We discovered this issue with devices on other platforms but have not
yet come across any device that triggers this on Linux. But it could be
a real problem now or in the future. All it takes is N number of small
chained TRBs. And other instances of the Synopsys IP may have smaller
values for the TRB_CACHE_SIZE which would exacerbate the problem.

We verified this patch using our internal driver and testing framework.

Signed-off-by: Tejas Joglekar <joglekar@synopsys.com>
---
 drivers/usb/host/xhci-ring.c | 278 +++++++++++++++++++++++++++++++++++++++++--
 drivers/usb/host/xhci.c      |   3 +
 drivers/usb/host/xhci.h      |  23 ++++
 3 files changed, 296 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d23f7408c81f..94e4ed5a17c0 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -690,6 +690,65 @@ static void xhci_unmap_td_bounce_buffer(struct xhci_hcd *xhci,
 	seg->bounce_offs = 0;
 }
 
+static void xhci_unmap_consolidated_trbs(struct xhci_hcd *xhci,
+					 struct xhci_ring *ring,
+					 struct xhci_td *td)
+{
+	struct device *sysdev = xhci_to_hcd(xhci)->self.sysdev;
+	struct bounce_trb *bounce_trb;
+	struct bounce_trb *tmp;
+	struct urb *urb = td->urb;
+	size_t len;
+	unsigned long flags;
+	unsigned int max_pkt;
+	enum dma_data_direction direction;
+
+	if (!ring || !urb)
+		return;
+
+	max_pkt = usb_endpoint_maxp(&urb->ep->desc);
+
+	if (usb_urb_dir_out(urb))
+		direction = DMA_TO_DEVICE;
+	else
+		direction = DMA_FROM_DEVICE;
+
+	spin_lock_irqsave(&td->lock, flags);
+	list_for_each_entry_safe(bounce_trb, tmp,
+				 &td->bounce_trb_list,
+				 entry) {
+		if (bounce_trb->len > max_pkt)
+			dma_unmap_single(sysdev, bounce_trb->dma,
+					 ring->bounce_buf_len * 4,
+					 direction);
+		else
+			dma_unmap_single(sysdev, bounce_trb->dma,
+					 ring->bounce_buf_len,
+					 direction);
+
+		if (usb_urb_dir_in(urb)) {
+			/*
+			 * For IN direction we need to copy
+			 * the data from bounce to sg
+			 */
+			len = sg_pcopy_from_buffer(urb->sg,
+						   urb->num_sgs,
+						   bounce_trb->buf,
+						   bounce_trb->len,
+						   bounce_trb->offs);
+			if (len != bounce_trb->len)
+				xhci_warn(xhci,
+					  "WARN Wrong buffer read length: %zu != %d\n",
+					  len, bounce_trb->len);
+		}
+		list_del(&bounce_trb->entry);
+		kfree(bounce_trb->buf);
+		kfree(bounce_trb);
+	}
+
+	spin_unlock_irqrestore(&td->lock, flags);
+}
+
 /*
  * When we get a command completion for a Stop Endpoint Command, we need to
  * unlink any cancelled TDs from the ring.  There are two ways to do that:
@@ -821,6 +880,9 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 		 * just overwrite it (because the URB has been unlinked).
 		 */
 		ep_ring = xhci_urb_to_transfer_ring(xhci, cur_td->urb);
+		if (xhci->quirks & XHCI_CONSOLIDATE_TRBS)
+			xhci_unmap_consolidated_trbs(xhci, ep_ring,
+						     cur_td);
 		xhci_unmap_td_bounce_buffer(xhci, ep_ring, cur_td);
 		inc_td_cnt(cur_td->urb);
 		if (last_td_in_urb(cur_td))
@@ -847,8 +909,10 @@ static void xhci_kill_ring_urbs(struct xhci_hcd *xhci, struct xhci_ring *ring)
 		if (!list_empty(&cur_td->cancelled_td_list))
 			list_del_init(&cur_td->cancelled_td_list);
 
+		if (xhci->quirks & XHCI_CONSOLIDATE_TRBS)
+			xhci_unmap_consolidated_trbs(xhci, ring,
+						     cur_td);
 		xhci_unmap_td_bounce_buffer(xhci, ring, cur_td);
-
 		inc_td_cnt(cur_td->urb);
 		if (last_td_in_urb(cur_td))
 			xhci_giveback_urb_in_irq(xhci, cur_td, -ESHUTDOWN);
@@ -1905,6 +1969,10 @@ static int xhci_td_cleanup(struct xhci_hcd *xhci, struct xhci_td *td,
 	/* Clean up the endpoint's TD list */
 	urb = td->urb;
 
+	/* If TRB consolidation has happened then unmap consolidated buffer */
+	if (xhci->quirks & XHCI_CONSOLIDATE_TRBS)
+		xhci_unmap_consolidated_trbs(xhci, ep_ring, td);
+
 	/* if a bounce buffer was used to align this td then unmap it */
 	xhci_unmap_td_bounce_buffer(xhci, ep_ring, td);
 
@@ -3008,6 +3076,7 @@ static int prepare_transfer(struct xhci_hcd *xhci,
 
 	INIT_LIST_HEAD(&td->td_list);
 	INIT_LIST_HEAD(&td->cancelled_td_list);
+	INIT_LIST_HEAD(&td->bounce_trb_list);
 
 	if (td_index == 0) {
 		ret = usb_hcd_link_urb_to_ep(bus_to_hcd(urb->dev->bus), urb);
@@ -3252,11 +3321,160 @@ static int xhci_align_td(struct xhci_hcd *xhci, struct urb *urb, u32 enqd_len,
 	seg->bounce_len = new_buff_len;
 	seg->bounce_offs = enqd_len;
 
-	xhci_dbg(xhci, "Bounce align, new buff len %d\n", *trb_buff_len);
+	return 1;
+}
+
+/*
+ * This function performs TD alignment or is used to consolidate the TRBs up to
+ * 4 * MPS or remaining data size if remaining transfer size is less than 4 *
+ * MPS. We are consolidating (4 * MPS) data when the buffer sizes spread across
+ * the TRB cache is not at least MPS.
+ *
+ * @xhci: xHC driver pointer
+ * @urb: current URB
+ * @enqd_len: total transfer size enqueued in the TRBs
+ * @cache_enq_len: total buffer size spread across the trb cache
+ * @trb_buff_len: current trb buffer length
+ * @td: current transfer descriptor
+ * @align_td: do TD alignment
+ */
+static int xhci_consolidate_trbs(struct xhci_hcd *xhci, struct urb *urb,
+				 u32 enqd_len, u32 cache_enq_len,
+				 u32 *trb_buff_len, struct xhci_td *td,
+				 bool align_td)
+{
+	struct device *sysdev = xhci_to_hcd(xhci)->self.sysdev;
+	bool is_mps;
+	unsigned int unalign;
+	unsigned int max_pkt;
+	unsigned int buffer_length;
+	unsigned long flags;
+	u32 new_buff_len;
+	size_t len;
+	struct bounce_trb *bounce_trb;
+
+	max_pkt = usb_endpoint_maxp(&urb->ep->desc);
+
+	if (!align_td) {
+		is_mps = (cache_enq_len + *trb_buff_len) >= max_pkt;
+
+		/* TRB_CACHE_SIZE trbs has buffer size more than mps */
+		if (is_mps)
+			return 0;
+
+		/*
+		 * We consolidate 4 * MPS or the remaining data size
+		 * if buffers remaining are less than 4 * MPS size.
+		 * This 4 * MPS consolidation gives good performance
+		 * when the SG buffer sizes are of few bytes each.
+		 */
+		buffer_length = 4 * max_pkt;
+	} else {
+		unalign = (enqd_len + *trb_buff_len) % max_pkt;
+		/*
+		 * We got lucky, last normal TRB data on
+		 * segment is packet aligned
+		 *
+		 */
+		if (unalign == 0) {
+			xhci_dbg(xhci, "We got lucky\n");
+			return 0;
+		}
+		/* is the last normal TRB alignable by splitting it */
+		if (*trb_buff_len > unalign) {
+			*trb_buff_len -= unalign;
+			xhci_dbg(xhci, "split align, new buff len %d\n",
+				 *trb_buff_len);
+			return 0;
+		}
+		buffer_length = max_pkt;
+	}
+
+	bounce_trb = kzalloc_node(sizeof(*bounce_trb), GFP_ATOMIC,
+				  dev_to_node(sysdev));
+	if (!bounce_trb)
+		return 0;
+
+	bounce_trb->buf = kzalloc_node(buffer_length, GFP_ATOMIC,
+				       dev_to_node(sysdev));
+	if (!bounce_trb->buf) {
+		kfree(bounce_trb);
+		return 0;
+	}
+
+	/*
+	 * We want enqd_len + trb_buff_len to sum up to at least MPS when we are
+	 * doing TRB buffer consolidation.
+	 * When are aligning td  we want enqd_len + trb_buff_len to sum up to a
+	 * number aligned to number which is divisible by the endpoint's
+	 * wMaxPacketSize. IOW:
+	 * (size of currently enqueued TRBs + remainder) % wMaxPacketSize == 0.
+	 */
+	if (!align_td)
+		new_buff_len = buffer_length - (cache_enq_len % max_pkt);
+	else
+		new_buff_len = buffer_length - (enqd_len % max_pkt);
+
+	if (new_buff_len > (urb->transfer_buffer_length - enqd_len))
+		new_buff_len = (urb->transfer_buffer_length - enqd_len);
+
+	if (usb_urb_dir_out(urb)) {
+		len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs,
+					 bounce_trb->buf, new_buff_len,
+					 enqd_len);
+		if (len != new_buff_len)
+			xhci_warn(xhci,
+				  "WARN Wrong TRB cache buffer write length: %zu != %d\n",
+				  len, new_buff_len);
+		bounce_trb->dma = dma_map_single(sysdev, bounce_trb->buf,
+						 buffer_length, DMA_TO_DEVICE);
+	} else {
+		bounce_trb->dma = dma_map_single(sysdev, bounce_trb->buf,
+						 buffer_length,
+						 DMA_FROM_DEVICE);
+	}
+
+	if (dma_mapping_error(sysdev, bounce_trb->dma)) {
+		/* try without MPS TRB cache data */
+		kfree(bounce_trb);
+		xhci_warn(xhci, "Failed mapping bounce TRB buffer\n");
+		return 0;
+	}
+
+	*trb_buff_len = new_buff_len;
+	bounce_trb->len = new_buff_len;
+	bounce_trb->offs = enqd_len;
+
+	spin_lock_irqsave(&td->lock, flags);
+	list_add_tail(&bounce_trb->entry, &td->bounce_trb_list);
+	spin_unlock_irqrestore(&td->lock, flags);
 
 	return 1;
 }
 
+static void xhci_handle_consolidated_trbs(struct xhci_hcd *xhci,
+					  struct urb *urb,
+					  unsigned int enqd_len,
+					  unsigned int cache_enq_len,
+					  unsigned int *trb_buff_len,
+					  struct xhci_td *td,
+					  u64 *send_addr, bool align_td)
+{
+	struct bounce_trb *bounce_trb;
+	unsigned long flags;
+
+	if (xhci_consolidate_trbs(xhci, urb, enqd_len, cache_enq_len,
+				  trb_buff_len, td, align_td)) {
+		spin_lock_irqsave(&td->lock, flags);
+		bounce_trb = list_last_entry(&td->bounce_trb_list,
+					     struct bounce_trb, entry);
+		spin_unlock_irqrestore(&td->lock, flags);
+		if (bounce_trb->dma)
+			*send_addr = bounce_trb->dma;
+		return;
+	}
+}
+
 /* This is very similar to what ehci-q.c qtd_fill() does */
 int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 		struct urb *urb, int slot_id, unsigned int ep_index)
@@ -3269,12 +3487,21 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 	bool more_trbs_coming = true;
 	bool need_zero_pkt = false;
 	bool first_trb = true;
-	unsigned int num_trbs;
+	bool align_td = false;
+	unsigned int num_trbs, max_pkt, trb_cache_size;
 	unsigned int start_cycle, num_sgs = 0;
-	unsigned int enqd_len, block_len, trb_buff_len, full_len;
+	unsigned int enqd_len, block_len, trb_buff_len, full_len, cache_enq_len;
 	int sent_len, ret;
 	u32 field, length_field, remainder;
 	u64 addr, send_addr;
+	int trb_count = 0;
+	u64 count = 0;
+
+	max_pkt = usb_endpoint_maxp(&urb->ep->desc);
+	if (urb->dev->speed >= USB_SPEED_SUPER)
+		trb_cache_size = xhci->trb_cache_size_ss;
+	else
+		trb_cache_size = xhci->trb_cache_size_hs;
 
 	ring = xhci_urb_to_transfer_ring(xhci, urb);
 	if (!ring)
@@ -3315,10 +3542,10 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 	start_trb = &ring->enqueue->generic;
 	start_cycle = ring->cycle_state;
 	send_addr = addr;
-
+	cache_enq_len = 0;
 	/* Queue the TRBs, even if they are zero-length */
 	for (enqd_len = 0; first_trb || enqd_len < full_len;
-			enqd_len += trb_buff_len) {
+	     enqd_len += trb_buff_len) {
 		field = TRB_TYPE(TRB_NORMAL);
 
 		/* TRB buffer should not cross 64KB boundaries */
@@ -3341,16 +3568,39 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 		 */
 		if (enqd_len + trb_buff_len < full_len) {
 			field |= TRB_CHAIN;
-			if (trb_is_link(ring->enqueue + 1)) {
+
+			if (trb_is_link(ring->enqueue + 1))
+				align_td = true;
+			else
+				align_td = false;
+
+			/*
+			 * Check for data size in TRB cache per
+			 * TRB_CACHE_SIZE - 3.
+			 * 3 is to take one status TRB and
+			 * one link TRB into account.
+			 */
+			if ((!(xhci->quirks & XHCI_CONSOLIDATE_TRBS) ||
+			     urb->stream_id) && align_td) {
 				if (xhci_align_td(xhci, urb, enqd_len,
 						  &trb_buff_len,
 						  ring->enq_seg)) {
 					send_addr = ring->enq_seg->bounce_dma;
-					/* assuming TD won't span 2 segs */
 					td->bounce_seg = ring->enq_seg;
 				}
+			} else if ((count == trb_cache_size - 3 &&
+				    cache_enq_len < max_pkt) || align_td) {
+				xhci_handle_consolidated_trbs(xhci,
+							      urb,
+							      enqd_len,
+							      cache_enq_len,
+							      &trb_buff_len,
+							      td,
+							      &send_addr,
+							      align_td);
 			}
 		}
+
 		if (enqd_len + trb_buff_len >= full_len) {
 			field &= ~TRB_CHAIN;
 			field |= TRB_IOC;
@@ -3383,9 +3633,21 @@ int xhci_queue_bulk_tx(struct xhci_hcd *xhci, gfp_t mem_flags,
 				length_field,
 				field);
 
+		trb_count++;
 		addr += trb_buff_len;
 		sent_len = trb_buff_len;
 
+		if ((xhci->quirks & XHCI_CONSOLIDATE_TRBS) &&
+		    !urb->stream_id) {
+			count++;
+			cache_enq_len += trb_buff_len;
+			if (count == trb_cache_size - 2 || align_td) {
+				count = 0;
+				cache_enq_len = 0;
+				align_td = false;
+			}
+		}
+
 		while (sg && sent_len >= block_len) {
 			/* New sg entry */
 			--num_sgs;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index dbac0fa9748d..c9e2181f68c8 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -596,6 +596,9 @@ static int xhci_init(struct usb_hcd *hcd)
 		compliance_mode_recovery_timer_init(xhci);
 	}
 
+	xhci->trb_cache_size_hs = TRB_CACHE_SIZE_HS;
+	xhci->trb_cache_size_ss = TRB_CACHE_SIZE_SS;
+
 	return retval;
 }
 
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 13d8838cd552..1882389ca4ad 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -30,6 +30,10 @@
 /* Section 5.3.3 - MaxPorts */
 #define MAX_HC_PORTS		127
 
+/* Default TRB Cache size for SNPS xHC */
+#define TRB_CACHE_SIZE_HS	8
+#define TRB_CACHE_SIZE_SS	16
+
 /*
  * xHCI register interface.
  * This corresponds to the eXtensible Host Controller Interface (xHCI)
@@ -1519,6 +1523,14 @@ static inline const char *xhci_trb_type_string(u8 type)
 					(addr & (TRB_MAX_BUFF_SIZE - 1)))
 #define MAX_SOFT_RETRY		3
 
+struct bounce_trb {
+	dma_addr_t		dma;
+	void			*buf;
+	unsigned int		offs;
+	unsigned int		len;
+	struct list_head	entry;
+};
+
 struct xhci_segment {
 	union xhci_trb		*trbs;
 	/* private to HCD */
@@ -1534,7 +1546,10 @@ struct xhci_segment {
 struct xhci_td {
 	struct list_head	td_list;
 	struct list_head	cancelled_td_list;
+	struct list_head	bounce_trb_list;
 	struct urb		*urb;
+	/* TD spinlock for list access */
+	spinlock_t		lock;
 	struct xhci_segment	*start_seg;
 	union xhci_trb		*first_trb;
 	union xhci_trb		*last_trb;
@@ -1867,6 +1882,7 @@ struct xhci_hcd {
 #define XHCI_DEFAULT_PM_RUNTIME_ALLOW	BIT_ULL(33)
 #define XHCI_RESET_PLL_ON_DISCONNECT	BIT_ULL(34)
 #define XHCI_SNPS_BROKEN_SUSPEND    BIT_ULL(35)
+#define XHCI_CONSOLIDATE_TRBS	BIT_ULL(36)
 
 	unsigned int		num_active_eps;
 	unsigned int		limit_active_eps;
@@ -1894,6 +1910,13 @@ struct xhci_hcd {
 	void			*dbc;
 	/* platform-specific data -- must come last */
 	unsigned long		priv[0] __aligned(sizeof(s64));
+	/*
+	 * TRB cache size
+	 * These are set to reflect the TRB cache size.
+	 * Currently they are set to defaults.
+	 */
+	u8			trb_cache_size_ss;
+	u8			trb_cache_size_hs;
 };
 
 /* Platform specific overrides to generic XHCI hc_driver ops */
-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 2/4] usb: dwc3: Add device property consolidate-trbs
  2019-12-20 13:38 [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC Tejas Joglekar
  2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
@ 2019-12-20 13:39 ` Tejas Joglekar
  2019-12-20 13:40 ` [RFC PATCH 3/4] usb: xhci: Set quirk for XHCI_CONSOLIDATE_TRBS Tejas Joglekar
  2019-12-20 13:40 ` [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs Tejas Joglekar
  3 siblings, 0 replies; 11+ messages in thread
From: Tejas Joglekar @ 2019-12-20 13:39 UTC (permalink / raw)
  To: Felipe Balbi, Tejas Joglekar, linux-usb; +Cc: John Youn

This commit adds the consolidate-trbs property to enable quirk for the
XHCI driver with Synopsys xHC. This property is enabled as initial
property for the dwc3-haps driver.

Signed-off-by: Tejas Joglekar <joglekar@synopsys.com>
---
 drivers/usb/dwc3/core.c      | 2 ++
 drivers/usb/dwc3/core.h      | 2 ++
 drivers/usb/dwc3/dwc3-haps.c | 1 +
 drivers/usb/dwc3/host.c      | 3 +++
 4 files changed, 8 insertions(+)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f561c6c9e8a9..dc0cfcb05b52 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1296,6 +1296,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
 				"snps,usb3_lpm_capable");
 	dwc->usb2_lpm_disable = device_property_read_bool(dev,
 				"snps,usb2-lpm-disable");
+	dwc->consolidate_trbs = device_property_read_bool(dev,
+				"snps,consolidate-trbs");
 	device_property_read_u8(dev, "snps,rx-thr-num-pkt-prd",
 				&rx_thr_num_pkt_prd);
 	device_property_read_u8(dev, "snps,rx-max-burst-prd",
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 1c8b349379af..36b27ea1eb3c 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1000,6 +1000,7 @@ struct dwc3_scratchpad_array {
  *			not needed for DWC_usb31 version 1.70a-ea06 and below
  * @usb3_lpm_capable: set if hadrware supports Link Power Management
  * @usb2_lpm_disable: set to disable usb2 lpm
+ * @consolidate_trbs: set to enable the TRB consolidation
  * @disable_scramble_quirk: set if we enable the disable scramble quirk
  * @u2exit_lfps_quirk: set if we enable u2exit lfps quirk
  * @u2ss_inp3_quirk: set if we enable P3 OK for U2/SS Inactive quirk
@@ -1195,6 +1196,7 @@ struct dwc3 {
 	unsigned		dis_start_transfer_quirk:1;
 	unsigned		usb3_lpm_capable:1;
 	unsigned		usb2_lpm_disable:1;
+	unsigned		consolidate_trbs:1;
 
 	unsigned		disable_scramble_quirk:1;
 	unsigned		u2exit_lfps_quirk:1;
diff --git a/drivers/usb/dwc3/dwc3-haps.c b/drivers/usb/dwc3/dwc3-haps.c
index 3cecbf169452..639ef9f01926 100644
--- a/drivers/usb/dwc3/dwc3-haps.c
+++ b/drivers/usb/dwc3/dwc3-haps.c
@@ -29,6 +29,7 @@ static const struct property_entry initial_properties[] = {
 	PROPERTY_ENTRY_BOOL("snps,usb3_lpm_capable"),
 	PROPERTY_ENTRY_BOOL("snps,has-lpm-erratum"),
 	PROPERTY_ENTRY_BOOL("snps,dis_enblslpm_quirk"),
+	PROPERTY_ENTRY_BOOL("snps,consolidate-trbs"),
 	PROPERTY_ENTRY_BOOL("linux,sysdev_is_parent"),
 	{ },
 };
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 5567ed2cddbe..5e8db49370a3 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -93,6 +93,9 @@ int dwc3_host_init(struct dwc3 *dwc)
 	if (dwc->usb2_lpm_disable)
 		props[prop_idx++].name = "usb2-lpm-disable";
 
+	if (dwc->consolidate_trbs)
+		props[prop_idx++].name = "consolidate-trbs";
+
 	/**
 	 * WORKAROUND: dwc3 revisions <=3.00a have a limitation
 	 * where Port Disable command doesn't work.
-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 3/4] usb: xhci: Set quirk for XHCI_CONSOLIDATE_TRBS
  2019-12-20 13:38 [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC Tejas Joglekar
  2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
  2019-12-20 13:39 ` [RFC PATCH 2/4] usb: dwc3: Add device property consolidate-trbs Tejas Joglekar
@ 2019-12-20 13:40 ` Tejas Joglekar
  2019-12-20 13:40 ` [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs Tejas Joglekar
  3 siblings, 0 replies; 11+ messages in thread
From: Tejas Joglekar @ 2019-12-20 13:40 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

This commit enables the quirk when the consolidate_trbs property is set
for the Synopsys xHC. TRB cache errata fixes the SNPS xHC hang issue
when the data is scattered across small buffers which does not make
atleast MPS size for given TRB cache size of SNPS xHC.

Signed-off-by: Tejas Joglekar <joglekar@synopsys.com>
---
 drivers/usb/host/xhci-pci.c  | 3 +++
 drivers/usb/host/xhci-plat.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 2907fe4d78dd..74f82e67499a 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -260,6 +260,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 	     pdev->device == 0x9026)
 		xhci->quirks |= XHCI_RESET_PLL_ON_DISCONNECT;
 
+	if (pdev->vendor == PCI_VENDOR_ID_SYNOPSYS)
+		xhci->quirks |= XHCI_CONSOLIDATE_TRBS;
+
 	if (xhci->quirks & XHCI_RESET_ON_RESUME)
 		xhci_dbg_trace(xhci, trace_xhci_dbg_quirks,
 				"QUIRK: Resetting on resume");
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index d90cd5ec09cf..a6f0e7e3d660 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -289,6 +289,9 @@ static int xhci_plat_probe(struct platform_device *pdev)
 		if (device_property_read_bool(tmpdev, "quirk-broken-port-ped"))
 			xhci->quirks |= XHCI_BROKEN_PORT_PED;
 
+		if (device_property_read_bool(tmpdev, "consolidate-trbs"))
+			xhci->quirks |= XHCI_CONSOLIDATE_TRBS;
+
 		device_property_read_u32(tmpdev, "imod-interval-ns",
 					 &xhci->imod_interval);
 	}
-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs
  2019-12-20 13:38 [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC Tejas Joglekar
                   ` (2 preceding siblings ...)
  2019-12-20 13:40 ` [RFC PATCH 3/4] usb: xhci: Set quirk for XHCI_CONSOLIDATE_TRBS Tejas Joglekar
@ 2019-12-20 13:40 ` Tejas Joglekar
  2020-01-08 15:19   ` Rob Herring
  3 siblings, 1 reply; 11+ messages in thread
From: Tejas Joglekar @ 2019-12-20 13:40 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, devicetree, Rob Herring, Mark Rutland
  Cc: John Youn

This commit adds the documentation for consolidate-trbs, and
snps,consolidate-trbs property. These when set enables the quirk for
XHCI driver for consolidation of TRB's when small buffer sizes are
scattered over the TRB cache not making up to MPS or total transfer size
with Synopsys xHC.

Signed-off-by: Tejas Joglekar <joglekar@synopsys.com>
---
 Documentation/devicetree/bindings/usb/dwc3.txt     | 6 ++++++
 Documentation/devicetree/bindings/usb/usb-xhci.txt | 1 +
 2 files changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
index 66780a47ad85..9851da41a442 100644
--- a/Documentation/devicetree/bindings/usb/dwc3.txt
+++ b/Documentation/devicetree/bindings/usb/dwc3.txt
@@ -101,6 +101,12 @@ Optional properties:
 			this and tx-thr-num-pkt-prd to a valid, non-zero value
 			1-16 (DWC_usb31 programming guide section 1.2.3) to
 			enable periodic ESS TX threshold.
+ - snps,consolidate-trbs: enable TRBs consolidation - host mode only. Set this
+			to valid then for Synopsys xHC the TRBs would be
+			consolidated to at least MPS in order to prevent the
+			controller getting hang due to small buffer sizes which
+			does not make up to MPS size or total transfer size
+			across TRB cache.
 
  - <DEPRECATED> tx-fifo-resize: determines if the FIFO *has* to be reallocated.
  - snps,incr-burst-type-adjustment: Value for INCR burst type of GSBUSCFG0
diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt
index 3f378951d624..be8d5e6a6333 100644
--- a/Documentation/devicetree/bindings/usb/usb-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt
@@ -43,6 +43,7 @@ Optional properties:
   - quirk-broken-port-ped: set if the controller has broken port disable mechanism
   - imod-interval-ns: default interrupt moderation interval is 5000ns
   - phys : see usb-hcd.yaml in the current directory
+  - consolidate-trbs: indicate if you need to consolidate trbs
 
 additionally the properties from usb-hcd.yaml (in the current directory) are
 supported.
-- 
2.11.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
@ 2020-01-02  9:08   ` David Laight
  2020-01-03 16:44   ` Mathias Nyman
  1 sibling, 0 replies; 11+ messages in thread
From: David Laight @ 2020-01-02  9:08 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

From: Tejas Joglekar
> Sent: 20 December 2019 13:39
>
> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
> for HS. The controller loads and updates the TRB cache from the transfer
> ring in system memory whenever the driver issues a start transfer or
> update transfer command.
> 
> For chained TRBs, the Synopsys xHC requires that the total amount of
> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
> MPS. Or the chain ends within the TRB cache (with a last TRB).
> 
> If this requirement is not met, the controller will not be able to send
> or receive a packet and it will hang causing a driver timeout and error.

This ought to be raised with Synopsys and they should be remake their
silicon to meet the XHCI standard and then offer to fix all the
boards that contain their broken version :-)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
  2020-01-02  9:08   ` David Laight
@ 2020-01-03 16:44   ` Mathias Nyman
  2020-02-07 17:17     ` Tejas Joglekar
  1 sibling, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2020-01-03 16:44 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

On 20.12.2019 15.39, Tejas Joglekar wrote:
> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
> for HS. The controller loads and updates the TRB cache from the transfer
> ring in system memory whenever the driver issues a start transfer or
> update transfer command.
> 
> For chained TRBs, the Synopsys xHC requires that the total amount of
> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
> MPS. Or the chain ends within the TRB cache (with a last TRB).
> 
> If this requirement is not met, the controller will not be able to send
> or receive a packet and it will hang causing a driver timeout and error.
> 
> This can be a problem if a class driver queues SG requests with many
> small-buffer entries. The XHCI driver will create a chained TRB for each
> entry which may trigger this issue.
> 
> This patch adds logic to the XHCI driver to detect and prevent this from
> happening.
> 
> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
> the TRBs and if the chain continues and we don't make up at least 1 MPS,
> we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
> into the last TRB.
> 
> We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
> be a link and/or event data TRB that take up to 2 of the cache entries.
> And we consolidate the next (4 * MPS) to improve performance.
> 
> We discovered this issue with devices on other platforms but have not
> yet come across any device that triggers this on Linux. But it could be
> a real problem now or in the future. All it takes is N number of small
> chained TRBs. And other instances of the Synopsys IP may have smaller
> values for the TRB_CACHE_SIZE which would exacerbate the problem.
> 
> We verified this patch using our internal driver and testing framework.

If I understand the problem correctly you need to make sure the data pointed
to by 16 (SS) or 8 (HS) chained TRBs must be equal to, or more than max packet size.

So in theory this should only be a problem for scatter gather buffers, right?

This should already be handled by usb core unless no_sg_constraint flag is set,
usb core it makes sure each sg list entry length is max packet size divisible, also
meaning it needs to be at least max packet size. (or 0, but not an issue here)

see include/linux/usb.h: struct urb
  
* @sg: scatter gather buffer list, the buffer size of each element in
*      the list (except the last) must be divisible by the endpoint's
*      max packet size if no_sg_constraint isn't set in 'struct usb_bus'"

which is checked in drivers/usb/core/urb.c: usb_submit_urb()

for_each_sg(urb->sg, sg, urb->num_sgs - 1, i)
	if (sg->length % max)
		return -EINVAL;

So it seems all you need to do it make sure that the no_sg_constraint isn't set
for this host controller vendor.

This patch is too intrusive, its a very fine grained and complex solution to a
vendor specific issue that has not caused any real life issues in Linux.
It adds a new spinlock and list of bounce buffer to every transfer descriptor.

-Mathias


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs
  2019-12-20 13:40 ` [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs Tejas Joglekar
@ 2020-01-08 15:19   ` Rob Herring
  0 siblings, 0 replies; 11+ messages in thread
From: Rob Herring @ 2020-01-08 15:19 UTC (permalink / raw)
  To: Tejas Joglekar; +Cc: linux-usb, devicetree, Mark Rutland, John Youn

On Fri, Dec 20, 2019 at 07:10:24PM +0530, Tejas Joglekar wrote:
> This commit adds the documentation for consolidate-trbs, and
> snps,consolidate-trbs property. These when set enables the quirk for
> XHCI driver for consolidation of TRB's when small buffer sizes are
> scattered over the TRB cache not making up to MPS or total transfer size
> with Synopsys xHC.
> 
> Signed-off-by: Tejas Joglekar <joglekar@synopsys.com>
> ---
>  Documentation/devicetree/bindings/usb/dwc3.txt     | 6 ++++++
>  Documentation/devicetree/bindings/usb/usb-xhci.txt | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/usb/dwc3.txt b/Documentation/devicetree/bindings/usb/dwc3.txt
> index 66780a47ad85..9851da41a442 100644
> --- a/Documentation/devicetree/bindings/usb/dwc3.txt
> +++ b/Documentation/devicetree/bindings/usb/dwc3.txt
> @@ -101,6 +101,12 @@ Optional properties:
>  			this and tx-thr-num-pkt-prd to a valid, non-zero value
>  			1-16 (DWC_usb31 programming guide section 1.2.3) to
>  			enable periodic ESS TX threshold.
> + - snps,consolidate-trbs: enable TRBs consolidation - host mode only. Set this
> +			to valid then for Synopsys xHC the TRBs would be
> +			consolidated to at least MPS in order to prevent the
> +			controller getting hang due to small buffer sizes which
> +			does not make up to MPS size or total transfer size
> +			across TRB cache.

Why not just always enable this?

Otherwise, this should be implied by the compatible string.

>  
>   - <DEPRECATED> tx-fifo-resize: determines if the FIFO *has* to be reallocated.
>   - snps,incr-burst-type-adjustment: Value for INCR burst type of GSBUSCFG0
> diff --git a/Documentation/devicetree/bindings/usb/usb-xhci.txt b/Documentation/devicetree/bindings/usb/usb-xhci.txt
> index 3f378951d624..be8d5e6a6333 100644
> --- a/Documentation/devicetree/bindings/usb/usb-xhci.txt
> +++ b/Documentation/devicetree/bindings/usb/usb-xhci.txt
> @@ -43,6 +43,7 @@ Optional properties:
>    - quirk-broken-port-ped: set if the controller has broken port disable mechanism
>    - imod-interval-ns: default interrupt moderation interval is 5000ns
>    - phys : see usb-hcd.yaml in the current directory
> +  - consolidate-trbs: indicate if you need to consolidate trbs
>  
>  additionally the properties from usb-hcd.yaml (in the current directory) are
>  supported.
> -- 
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2020-01-03 16:44   ` Mathias Nyman
@ 2020-02-07 17:17     ` Tejas Joglekar
  2020-02-12 15:04       ` Mathias Nyman
  0 siblings, 1 reply; 11+ messages in thread
From: Tejas Joglekar @ 2020-02-07 17:17 UTC (permalink / raw)
  To: Mathias Nyman, Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

Hi,
Thanks for the review comments.
On 1/3/2020 10:14 PM, Mathias Nyman wrote:
> On 20.12.2019 15.39, Tejas Joglekar wrote:
>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
>> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
>> for HS. The controller loads and updates the TRB cache from the transfer
>> ring in system memory whenever the driver issues a start transfer or
>> update transfer command.
>>
>> For chained TRBs, the Synopsys xHC requires that the total amount of
>> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
>> MPS. Or the chain ends within the TRB cache (with a last TRB).
>>
>> If this requirement is not met, the controller will not be able to send
>> or receive a packet and it will hang causing a driver timeout and error.
>>
>> This can be a problem if a class driver queues SG requests with many
>> small-buffer entries. The XHCI driver will create a chained TRB for each
>> entry which may trigger this issue.
>>
>> This patch adds logic to the XHCI driver to detect and prevent this from
>> happening.
>>
>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
>> the TRBs and if the chain continues and we don't make up at least 1 MPS,
>> we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
>> into the last TRB.
>>
>> We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
>> be a link and/or event data TRB that take up to 2 of the cache entries.
>> And we consolidate the next (4 * MPS) to improve performance.
>>
>> We discovered this issue with devices on other platforms but have not
>> yet come across any device that triggers this on Linux. But it could be
>> a real problem now or in the future. All it takes is N number of small
>> chained TRBs. And other instances of the Synopsys IP may have smaller
>> values for the TRB_CACHE_SIZE which would exacerbate the problem.
>>
>> We verified this patch using our internal driver and testing framework.
> 
> If I understand the problem correctly you need to make sure the data pointed
> to by 16 (SS) or 8 (HS) chained TRBs must be equal to, or more than max packet size.
> 
> So in theory this should only be a problem for scatter gather buffers, right?
> 
Yes, this problem could be seen only with scatter gather buffers.

> This should already be handled by usb core unless no_sg_constraint flag is set,
> usb core it makes sure each sg list entry length is max packet size divisible, also
> meaning it needs to be at least max packet size. (or 0, but not an issue here)
> 
> see include/linux/usb.h: struct urb
>  
> * @sg: scatter gather buffer list, the buffer size of each element in
> *      the list (except the last) must be divisible by the endpoint's
> *      max packet size if no_sg_constraint isn't set in 'struct usb_bus'"
> 
> which is checked in drivers/usb/core/urb.c: usb_submit_urb()
> 
> for_each_sg(urb->sg, sg, urb->num_sgs - 1, i)
>     if (sg->length % max)
>         return -EINVAL;
> 
> So it seems all you need to do it make sure that the no_sg_constraint isn't set
> for this host controller vendor.
>
My understanding is if we don't set the no_sg_constraint and sg list entry is less
than max packet then transfer would not happen with the host controller. But the 
host controller supports all lengths of SG entries even less than MPS, only thing 
is the TRB_CACHE in the controller should have at least MPS size of data spread 
across it. So not setting up no_sg_constraint is not a useful solution. 
 
> This patch is too intrusive, its a very fine grained and complex solution to a
> vendor specific issue that has not caused any real life issues in Linux.
> It adds a new spinlock and list of bounce buffer to every transfer descriptor.
> 
I understand that in a first look it looks a complex solution, but you can suggest
the modifications/changes which would be required to make the patch more readable.
I have tried to keep the implementation similar to bounce buffer implementation 
only with addition of bounce buffer list. For the spinlock case, you can take a 
call if it is required or not.

> -Mathias
> 

Thanks & Regards,
Tejas Joglekar

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2020-02-07 17:17     ` Tejas Joglekar
@ 2020-02-12 15:04       ` Mathias Nyman
  2020-02-14  8:36         ` Tejas Joglekar
  0 siblings, 1 reply; 11+ messages in thread
From: Mathias Nyman @ 2020-02-12 15:04 UTC (permalink / raw)
  To: Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

On 7.2.2020 19.17, Tejas Joglekar wrote:
> Hi,
> Thanks for the review comments.
> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
>>> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
>>> for HS. The controller loads and updates the TRB cache from the transfer
>>> ring in system memory whenever the driver issues a start transfer or
>>> update transfer command.
>>>
>>> For chained TRBs, the Synopsys xHC requires that the total amount of
>>> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
>>> MPS. Or the chain ends within the TRB cache (with a last TRB).
>>>
>>> If this requirement is not met, the controller will not be able to send
>>> or receive a packet and it will hang causing a driver timeout and error.
>>>
>>> This can be a problem if a class driver queues SG requests with many
>>> small-buffer entries. The XHCI driver will create a chained TRB for each
>>> entry which may trigger this issue.
>>>
>>> This patch adds logic to the XHCI driver to detect and prevent this from
>>> happening.
>>>
>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
>>> the TRBs and if the chain continues and we don't make up at least 1 MPS,
>>> we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
>>> into the last TRB.
>>>
>>> We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
>>> be a link and/or event data TRB that take up to 2 of the cache entries.
>>> And we consolidate the next (4 * MPS) to improve performance.
>>>
>>> We discovered this issue with devices on other platforms but have not
>>> yet come across any device that triggers this on Linux. But it could be
>>> a real problem now or in the future. All it takes is N number of small
>>> chained TRBs. And other instances of the Synopsys IP may have smaller
>>> values for the TRB_CACHE_SIZE which would exacerbate the problem.
>>>
>>> We verified this patch using our internal driver and testing framework.
>>
>>
> I understand that in a first look it looks a complex solution, but you can suggest
> the modifications/changes which would be required to make the patch more readable.
> I have tried to keep the implementation similar to bounce buffer implementation 
> only with addition of bounce buffer list. For the spinlock case, you can take a 
> call if it is required or not.

In your case you know the need for a bounce buffer much earlier than in the
existing TD fragment case.

Have you looked into the struct hc_driver map_urb_for_dma() and
unmap_urb_for_dma() hooks? In those you could detect the need for a bounce
buffer, allocate it, and bluntly copy entire scattergather buffer to the 
bounce buffer. It should be fairly small anyway.

-Mathias

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs
  2020-02-12 15:04       ` Mathias Nyman
@ 2020-02-14  8:36         ` Tejas Joglekar
  0 siblings, 0 replies; 11+ messages in thread
From: Tejas Joglekar @ 2020-02-14  8:36 UTC (permalink / raw)
  To: Mathias Nyman, Tejas Joglekar, linux-usb, Mathias Nyman; +Cc: John Youn

Hi,
On 2/12/2020 8:34 PM, Mathias Nyman wrote:
> On 7.2.2020 19.17, Tejas Joglekar wrote:
>> Hi,
>> Thanks for the review comments.
>> On 1/3/2020 10:14 PM, Mathias Nyman wrote:
>>> On 20.12.2019 15.39, Tejas Joglekar wrote:
>>>> The Synopsys xHC has an internal TRB cache of size TRB_CACHE_SIZE for
>>>> each endpoint. The default value for TRB_CACHE_SIZE is 16 for SS and 8
>>>> for HS. The controller loads and updates the TRB cache from the transfer
>>>> ring in system memory whenever the driver issues a start transfer or
>>>> update transfer command.
>>>>
>>>> For chained TRBs, the Synopsys xHC requires that the total amount of
>>>> bytes for all TRBs loaded in the TRB cache be greater than or equal to 1
>>>> MPS. Or the chain ends within the TRB cache (with a last TRB).
>>>>
>>>> If this requirement is not met, the controller will not be able to send
>>>> or receive a packet and it will hang causing a driver timeout and error.
>>>>
>>>> This can be a problem if a class driver queues SG requests with many
>>>> small-buffer entries. The XHCI driver will create a chained TRB for each
>>>> entry which may trigger this issue.
>>>>
>>>> This patch adds logic to the XHCI driver to detect and prevent this from
>>>> happening.
>>>>
>>>> For every (TRB_CACHE_SIZE - 2) TRBs, we check the total buffer size of
>>>> the TRBs and if the chain continues and we don't make up at least 1 MPS,
>>>> we create a bounce buffer to consolidate up to the next (4 * MPS) TRBs
>>>> into the last TRB.
>>>>
>>>> We check at (TRB_CACHE_SIZE - 2) because it is possible that there would
>>>> be a link and/or event data TRB that take up to 2 of the cache entries.
>>>> And we consolidate the next (4 * MPS) to improve performance.
>>>>
>>>> We discovered this issue with devices on other platforms but have not
>>>> yet come across any device that triggers this on Linux. But it could be
>>>> a real problem now or in the future. All it takes is N number of small
>>>> chained TRBs. And other instances of the Synopsys IP may have smaller
>>>> values for the TRB_CACHE_SIZE which would exacerbate the problem.
>>>>
>>>> We verified this patch using our internal driver and testing framework.
>>>
>>>
>> I understand that in a first look it looks a complex solution, but you can suggest
>> the modifications/changes which would be required to make the patch more readable.
>> I have tried to keep the implementation similar to bounce buffer implementation 
>> only with addition of bounce buffer list. For the spinlock case, you can take a 
>> call if it is required or not.
> 
> In your case you know the need for a bounce buffer much earlier than in the
> existing TD fragment case.
> 
> Have you looked into the struct hc_driver map_urb_for_dma() and
> unmap_urb_for_dma() hooks? In those you could detect the need for a bounce
> buffer, allocate it, and bluntly copy entire scattergather buffer to the 
> bounce buffer. It should be fairly small anyway.
>
I will look into it, and get back to you. Thanks for the suggestion.
 
> -Mathias
> 

Regards,
Tejas Joglekar

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-20 13:38 [RFC PATCH 0/4] Add logic to consolidate TRBs for Synopsys xHC Tejas Joglekar
2019-12-20 13:39 ` [RFC PATCH 1/4] usb: xhci: Synopsys xHC consolidate TRBs Tejas Joglekar
2020-01-02  9:08   ` David Laight
2020-01-03 16:44   ` Mathias Nyman
2020-02-07 17:17     ` Tejas Joglekar
2020-02-12 15:04       ` Mathias Nyman
2020-02-14  8:36         ` Tejas Joglekar
2019-12-20 13:39 ` [RFC PATCH 2/4] usb: dwc3: Add device property consolidate-trbs Tejas Joglekar
2019-12-20 13:40 ` [RFC PATCH 3/4] usb: xhci: Set quirk for XHCI_CONSOLIDATE_TRBS Tejas Joglekar
2019-12-20 13:40 ` [RFC PATCH 4/4] dt-bindings: usb: Add snps,consolidate-trbs & consolidate-trbs Tejas Joglekar
2020-01-08 15:19   ` Rob Herring

Linux-USB Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-usb/0 linux-usb/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-usb linux-usb/ https://lore.kernel.org/linux-usb \
		linux-usb@vger.kernel.org
	public-inbox-index linux-usb

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-usb


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git