linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Christian Lamparter <chunkeey@googlemail.com>
To: John Youn <John.Youn@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Felipe Balbi <felipe.balbi@linux.intel.com>,
	"linux-mips@linux-mips.org" <linux-mips@linux-mips.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"a.seppala@gmail.com" <a.seppala@gmail.com>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4
Date: Wed, 18 May 2016 21:14:55 +0200	[thread overview]
Message-ID: <1569777.jHIobOl9fm@debian64> (raw)
In-Reply-To: <573BAE58.1010206@synopsys.com>

On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote:
> On 5/14/2016 6:11 AM, Christian Lamparter wrote:
> > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> >> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> >>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> >>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> >>>>>>>> Detecting the endianess of the
> >>>>>>>> device is probably the best future-proof solution, but it's also
> >>>>>>>> considerably more work to do in the driver, and comes with a
> >>>>>>>> tiny runtime overhead.
> >>>>>>>
> >>>>>>> The runtime overhead is probably non-measurable compared with the cost
> >>>>>>> of the actual MMIOs.
> >>>>>>
> >>>>>> Right. The code size increase is probably measurable (but still small),
> >>>>>> the runtime overhead is not.
> >>>>>
> >>>>> Ok, so no rebuts or complains have been posted.
> >>>>>
> >>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> >>>>> and it works: 
> >>>>>
> >>>>> Tested-by: Christian Lamparter <chunkeey@googlemail.com>
> >>>>>
> >>>>> So, how do we go from here? There is are two small issues with the
> >>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> >>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.  
> >>>>>
> >>>>> Arnd, can you please respin and post it (cc'd stable as well)?
> >>>>> So this is can be picked up? Or what's your plan?
> >>>>
> >>>> (I just realized my reply was stuck in my outbox, so the patch
> >>>> went out first)
> >>>>
> >>>> If I recall correctly, the rough consensus was to go with your longer
> >>>> patch in the future (fixed up for the comments that BenH and
> >>>> I sent), and I'd suggest basing it on top of a fixed version of
> >>>> my patch.
> >>> Well, but it comes with the "overhead"! So this was just as I said:
> >>> "Let's look at it and see if it's any good"... And I think it isn't
> >>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> >>> archs etc...
> >>
> >> I slightly prefer the more general patch for future kernel versions.
> >> The overhead will probably be negligible, but we can perform some
> >> testing to make sure.
> >>
> >> Can you resubmit with all gathered feedback?
> > 
> > Yes, here are the changes.
> > 
> > I've tested it on my MyBook Live Duo. The usbotg comes right up:
> > [12610.540004] dwc2 4bff80000.usbotg: USB bus 1 deregistered
> > [12612.513934] dwc2 4bff80000.usbotg: Specified GNPTXFDEP=1024 > 256
> > [12612.518756] dwc2 4bff80000.usbotg: EPs: 3, shared fifos, 2042 entries in SPRAM
> > [12612.530112] dwc2 4bff80000.usbotg: DWC OTG Controller
> > [12612.533948] dwc2 4bff80000.usbotg: new USB bus registered, assigned bus number 1
> > [12612.540083] dwc2 4bff80000.usbotg: irq 33, io mem 0x00000000
> > 
> > John: Can you run some perf test with it?
> > 
> > I've based this on:
> > 
> > commit 6ea2fffc9057a67df1994d85a7c085d899eaa25a
> > Author: Arnd Bergmann <arnd@arndb.de>
> > Date:   Fri May 13 15:52:27 2016 +0200
> > 
> >     usb: dwc2: fix regression on big-endian PowerPC/ARM systems
> > 
> > so naturally, it needs to be applied first.
> > Most of the conversion work was done by the attached
> > coccinelle semantic patches. 
> > 
> > I had to edit the __bic32 and __orr32 helpers by hand.
> > As well as some debugfs code and stuff in gadget.c.
> > 
> 
> Thanks Christian.
> 
> I'll keep this in our internal tree and send it to Felipe later. This
> causes a bunch of conflicts that I have to fix up and I should do a
> bit of testing as well.
> 
> And since there is a patch that fixes the regression this is can wait.
> 
> Regards,
> John
---
Hey, that's really nice of you to do that :-D. Please keep me in the
loop (Cc) for those then. 

Yes, this needs definitely testing on all the affected ARCHs. 
I've attached a diff to a updated version of the patch. It
drops the special MIPS case (as requested by Arnd).

BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm
not entirely convinced that the hardware FIFOs are actually endian
neutral. But I can't verify it since my Western Digital My Book Live
only supports the host configuration (forces host mode), so I don't
know what a device in dual-mode or peripheral do here.

The reason why I think it was broken is because there's a PIO copy
to and from the HCFIFO(x) in dwc2_hc_write_packet and
dwc2_hc_read_packet access in the hcd.c file as well... And there,
the code was using the dwc2_readl and dwc2_writel to access the data.
I added special accessors for the FIFOS now:
	dwc2_readl_rep and dwc2_writel_rep.

I went all the way and implemented the helpers to do unaligned access
if necessary (not sure if adding likely branches is a good idea, as
this could be either always true or false for a specific driver the
whole time).

NB: it also fixes a "regs variable not used in dwc2_hsotg_dump" warning
if DEBUG isn't selected.

NB2: If it you need a patch against a specific tree, please
let me know.
---
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 2fa57cd..69030bb 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -42,6 +42,7 @@
 #include <linux/usb/gadget.h>
 #include <linux/usb/otg.h>
 #include <linux/usb/phy.h>
+#include <asm/unaligned.h>
 #include "hw.h"
 
 /*
@@ -958,50 +959,6 @@ enum dwc2_halt_status {
 	DWC2_HC_XFER_URB_DEQUEUE,
 };
 
-#ifdef CONFIG_MIPS
-/*
- * There are some MIPS machines that can run in either big-endian
- * or little-endian mode and that use the dwc2 register without
- * a byteswap in both ways.
- * Unlike other architectures, MIPS apparently does not require a
- * barrier before the __raw_writel() to synchronize with DMA but does
- * require the barrier after the __raw_writel() to serialize a set of
- * writes. This set of operations was added specifically for MIPS and
- * should only be used there.
- */
-static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
-			     ptrdiff_t reg)
-{
-	const void __iomem *addr = hsotg->regs + reg;
-	u32 value = __raw_readl(addr);
-
-	/*
-	 * In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
-	 * reads or writes
-	 */
-	mb();
-	return value;
-}
-
-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
-			       ptrdiff_t reg)
-{
-	const void __iomem *addr = hsotg->regs + reg;
-	__raw_writel(value, addr);
-
-	/*
-	 * In order to preserve endianness __raw_* operation is used. Therefore
-	 * a barrier is needed to ensure IO access is not re-ordered across
-	 * reads or writes
-	 */
-	mb();
-#ifdef DWC2_LOG_WRITES
-	pr_info("INFO:: wrote %08x to %p\n", value, addr);
-#endif
-}
-#else
-/* Normal architectures just use readl/write_be */
 static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
 			     ptrdiff_t reg)
 {
@@ -1014,7 +971,8 @@ static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
 }
 
 
-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
+static inline void dwc2_writel(struct dwc2_hsotg *hsotg,
+			       const u32 value,
 			       ptrdiff_t reg)
 {
 	void __iomem *addr = hsotg->regs + reg;
@@ -1028,7 +986,103 @@ static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
 	pr_info("info:: wrote %08x to %p\n", value, addr);
 #endif
 }
-#endif
+
+static inline void dwc2_readl_rep(struct dwc2_hsotg *hsotg,
+				  ptrdiff_t reg,
+				  u32 *buf, const size_t len)
+{
+	void __iomem *addr = hsotg->regs + reg;
+	size_t i, remaining = len & ~4;
+
+	if (hsotg->is_big_endian) {
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				*buf++ = ioread32be(addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = ioread32be(addr);
+
+				put_unaligned(data, buf);
+				buf++;
+			}
+		}
+	} else {
+		/* little-endian accessors */
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				*buf++ = ioread32(addr);
+
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = ioread32be(addr);
+
+				put_unaligned(data, buf);
+				buf++;
+			}
+		}
+	}
+
+	if (unlikely(remaining)) {
+		u32 data_u32;
+		u8 *buf_u8 = (u8 *) buf;
+		u8 *data_u8 = (u8 *) &data_u32;
+
+		data_u32 = dwc2_readl(hsotg, reg);
+
+		while (remaining--)
+			*buf_u8++ = *data_u8++;
+	}
+}
+
+static inline void dwc2_writel_rep(struct dwc2_hsotg *hsotg,
+				   const ptrdiff_t reg,
+				   const u32 *buf, const size_t len)
+{
+	void __iomem *addr = hsotg->regs + reg;
+	size_t i, remaining = len & ~4;
+
+	if (hsotg->is_big_endian) {
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				iowrite32be(*buf++, addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = get_unaligned(buf);
+
+				iowrite32be(data, addr);
+				buf++;
+			}
+		}
+	} else {
+		/* little-endian accessors */
+		if (likely(IS_ALIGNED(*buf, 0x4))) {
+			for (i = len >> 2; i > 0; i--)
+				iowrite32(*buf++, addr);
+		} else {
+			/* xfer_buf is not DWORD aligned */
+			for (i = len >> 2; i > 0; i--) {
+				u32 data = get_unaligned(buf);
+
+				iowrite32(data, addr);
+				buf++;
+			}
+		}
+	}
+
+	if (unlikely(remaining)) {
+		u32 data_u32;
+		u8 *buf_u8 = (u8 *) buf;
+		u8 *data_u8 = (u8 *) &data_u32;
+
+		while (remaining--)
+			*data_u8++ = *buf_u8++;
+
+		dwc2_writel(hsotg, data_u32, reg);
+	}
+}
 
 extern int dwc2_detect_endiannes(struct dwc2_hsotg *hsotg);
 
diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index 2c687d9..531b30f 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -317,7 +317,7 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
 	u32 gnptxsts = dwc2_readl(hsotg, GNPTXSTS);
 	int buf_pos = hs_req->req.actual;
 	int to_write = hs_ep->size_loaded;
-	void *data;
+	u32 *data;
 	int can_write;
 	int pkt_round;
 	int max_transfer;
@@ -457,10 +457,9 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
 	if (periodic)
 		hs_ep->fifo_load += to_write;
 
-	to_write = DIV_ROUND_UP(to_write, 4);
 	data = hs_req->req.buf + buf_pos;
 
-	iowrite32_rep(hsotg->regs + EPFIFO(hs_ep->index), data, to_write);
+	dwc2_writel_rep(hsotg, EPFIFO(hs_ep->index), data, to_write);
 
 	return (to_write >= can_write) ? -ENOSPC : 0;
 }
@@ -1439,12 +1438,11 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 {
 	struct dwc2_hsotg_ep *hs_ep = hsotg->eps_out[ep_idx];
 	struct dwc2_hsotg_req *hs_req = hs_ep->req;
-	void __iomem *fifo = hsotg->regs + EPFIFO(ep_idx);
+	u32 *data;
 	int to_read;
 	int max_req;
 	int read_ptr;
 
-
 	if (!hs_req) {
 		u32 epctl = dwc2_readl(hsotg, DOEPCTL(ep_idx));
 		int ptr;
@@ -1455,7 +1453,7 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 
 		/* dump the data from the FIFO, we've nothing we can do */
 		for (ptr = 0; ptr < size; ptr += 4)
-			(void)dwc2_readl(hsotg, EPFIFO(ep_idx));
+			(void)__raw_readl(hsotg->regs + EPFIFO(ep_idx));
 
 		return;
 	}
@@ -1479,13 +1477,14 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
 
 	hs_ep->total_data += to_read;
 	hs_req->req.actual += to_read;
-	to_read = DIV_ROUND_UP(to_read, 4);
 
 	/*
 	 * note, we might over-write the buffer end by 3 bytes depending on
 	 * alignment of the data.
 	 */
-	ioread32_rep(fifo, hs_req->req.buf + read_ptr, to_read);
+	data = hs_req->req.buf + read_ptr;
+
+	dwc2_readl_rep(hsotg, EPFIFO(ep_idx), data, to_read);
 }
 
 /**
@@ -3411,7 +3416,6 @@ static void dwc2_hsotg_dump(struct dwc2_hsotg *hsotg)
 {
 #ifdef DEBUG
 	struct device *dev = hsotg->dev;
-	void __iomem *regs = hsotg->regs;
 	u32 val;
 	int idx;
 
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index dcd6338..8568ff4 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -567,19 +567,10 @@ u32 dwc2_calc_frame_interval(struct dwc2_hsotg *hsotg)
 void dwc2_read_packet(struct dwc2_hsotg *hsotg, u8 *dest, u16 bytes)
 {
 	u32 *data_buf = (u32 *)dest;
-	int word_count = (bytes + 3) / 4;
-	int i;
-
-	/*
-	 * Todo: Account for the case where dest is not dword aligned. This
-	 * requires reading data from the FIFO into a u32 temp buffer, then
-	 * moving it into the data buffer.
-	 */
 
 	dev_vdbg(hsotg->dev, "%s(%p,%p,%d)\n", __func__, hsotg, dest, bytes);
 
-	for (i = 0; i < word_count; i++, data_buf++)
-		*data_buf = dwc2_readl(hsotg, HCFIFO(0));
+	dwc2_readl_rep(hsotg, HCFIFO(0), data_buf, bytes);
 }
 
 /**
@@ -1236,10 +1227,8 @@ static void dwc2_set_pid_isoc(struct dwc2_host_chan *chan)
 static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
 				 struct dwc2_host_chan *chan)
 {
-	u32 i;
 	u32 remaining_count;
 	u32 byte_count;
-	u32 dword_count;
 	u32 *data_buf = (u32 *)chan->xfer_buf;
 
 	if (dbg_hc(chan))
@@ -1251,20 +1240,7 @@ static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
 	else
 		byte_count = remaining_count;
 
-	dword_count = (byte_count + 3) / 4;
-
-	if (((unsigned long)data_buf & 0x3) == 0) {
-		/* xfer_buf is DWORD aligned */
-		for (i = 0; i < dword_count; i++, data_buf++)
-			dwc2_writel(hsotg, *data_buf, HCFIFO(chan->hc_num));
-	} else {
-		/* xfer_buf is not DWORD aligned */
-		for (i = 0; i < dword_count; i++, data_buf++) {
-			u32 data = data_buf[0] | data_buf[1] << 8 |
-				   data_buf[2] << 16 | data_buf[3] << 24;
-			dwc2_writel(hsotg, data, HCFIFO(chan->hc_num));
-		}
-	}
+	dwc2_writel_rep(hsotg, HCFIFO(chan->hc_num), data_buf, byte_count);
 
 	chan->xfer_count += byte_count;
 	chan->xfer_buf += byte_count;

  reply	other threads:[~2016-05-18 19:15 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-07 22:54 usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4 Christian Lamparter
2016-05-08 10:40 ` Benjamin Herrenschmidt
2016-05-08 11:44   ` Christian Lamparter
2016-05-09  0:23     ` Benjamin Herrenschmidt
2016-05-09 10:36       ` Arnd Bergmann
2016-05-09 10:39         ` Felipe Balbi
2016-05-09 15:08           ` Arnd Bergmann
2016-05-09 19:06             ` Christian Lamparter
2016-05-09 20:10               ` Arnd Bergmann
2016-05-09 22:43               ` Benjamin Herrenschmidt
2016-05-09 22:37             ` Benjamin Herrenschmidt
2016-05-10  7:23               ` Arnd Bergmann
2016-05-12  9:58                 ` Christian Lamparter
2016-05-12 11:55                   ` Arnd Bergmann
2016-05-12 13:30                     ` Christian Lamparter
2016-05-12 18:40                       ` John Youn
2016-05-12 20:39                         ` Christian Lamparter
2016-05-12 20:50                           ` Arnd Bergmann
2016-05-12 20:55                           ` John Youn
2016-05-14 13:11                         ` Christian Lamparter
2016-05-14 19:45                           ` Arnd Bergmann
2016-05-17 23:50                           ` John Youn
2016-05-18 19:14                             ` Christian Lamparter [this message]
2016-05-18 21:09                               ` Arnd Bergmann
2016-05-19  0:36                               ` John Youn
2016-05-12 22:17                   ` Benjamin Herrenschmidt
2016-05-09 22:33           ` Benjamin Herrenschmidt
2016-05-09 14:02         ` Benjamin Herrenschmidt
2016-05-09 20:22         ` John Youn
2016-05-09 20:38           ` Arnd Bergmann
2016-05-09 21:11             ` John Youn
2016-05-09 21:30               ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1569777.jHIobOl9fm@debian64 \
    --to=chunkeey@googlemail.com \
    --cc=John.Youn@synopsys.com \
    --cc=a.seppala@gmail.com \
    --cc=arnd@arndb.de \
    --cc=benh@kernel.crashing.org \
    --cc=felipe.balbi@linux.intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@linux-mips.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).