linux-mmc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] MMCI odd buffer alignment fixes
@ 2019-11-13  7:53 Linus Walleij
  2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13  7:53 UTC (permalink / raw)
  To: Ulf Hansson, linux-mmc, linux-arm-kernel, Stephan Gerhold
  Cc: Linus Walleij, Russell King

This is an attempt to fix the three independent issues seen
by Ulf in the MMCI driver.

I am unable to test patches 2 & 3 since I can't provoke the
right traffic, but I hope Stephan can try it with his
Broadcom WiFi.

Linus Walleij (2):
  mmc: mmci: Bail out from odd DMA on Ux500
  mmc: mmci: Proper PIO residue handling

Ulf Hansson (1):
  mmc: mmci: Support odd block sizes for ux500v2 and qcom variant

 drivers/mmc/host/mmci.c | 166 ++++++++++++++++++++++++++++++++++++----
 drivers/mmc/host/mmci.h |  18 ++++-
 2 files changed, 169 insertions(+), 15 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant
  2019-11-13  7:53 [PATCH 0/3] MMCI odd buffer alignment fixes Linus Walleij
@ 2019-11-13  7:53 ` Linus Walleij
  2019-11-13 10:08   ` Ludovic BARRE
  2019-11-13 11:05   ` Ulf Hansson
  2019-11-13  7:53 ` [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500 Linus Walleij
  2019-11-13  7:53 ` [PATCH 3/3] mmc: mmci: Proper PIO residue handling Linus Walleij
  2 siblings, 2 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13  7:53 UTC (permalink / raw)
  To: Ulf Hansson, linux-mmc, linux-arm-kernel, Stephan Gerhold
  Cc: Russell King, Linus Walleij, Russell King, Srinivas Kandagatla,
	Niklas Cassel, Ludovic Barre, Brian Masney

From: Ulf Hansson <ulf.hansson@linaro.org>

This is something like the 5th time this patch is posted,
so let's try to fix this now, once and for all.

For the ux500v2 variant of the PL18x block, odd block sizes are
supported. This is necessary to support some SDIO transfers
such as single bytes. This also affects the QCOM MMCI variant.

This will work fine for PIO using IRQs: SDIO packets are
accepted down to single bytes and the transfers go through
just fine.

This patch has proven necessary for enabling SDIO for WLAN on
PostmarketOS-based Ux500 platforms.

This patch is based on Ulf Hansson's patch
http://www.spinics.net/lists/linux-mmc/msg12160.html

Ulf noted on an earlier iteration in:
https://marc.info/?l=linux-mmc&m=140845189316370&w=2

"There are some prerequisites of the data buffers to supports
any block size, at least for ux500. (...) The conclusion from
the above is that we need to adopt mmci_pio_write() to handle
corner cases."

This points back to a discussion in 2012. The main point was
made by Russell in this message:
https://marc.info/?l=linux-arm-kernel&m=135351237018301&w=2

IIUC this pertains to this code (now gone from the patch):

  if (data->sg->offset & 3) {
      dev_err(...);
      return -EINVAL;
  }

This hit Stephan as he noticed that DMA (DMA40) would not work
with the MMCI driver, so this patch combined with disabling
DMA would do the trick. That way we don't toss unaligned
accesses at the DMA engine as SDIO apparently tends to
do. (This is not a problem when writing ordinary block device
blocks as these are always 512 bytes aligned on a 4-byte
boundary.)

As Ulf notes, odd SG offsets like this should be handled
by the driver even if we run it in DMA mode. I conclude
it must be the duty of the DMA driver to say NO to SG
offsets it cannot handle, or otherwise bitstuff things
around to avoid the situation.

So as a first step make sure errors are propagated upward
from the DMA engine, and assume the DMA engine will say no
to things with weird SG offsets that it cannot handle, and
then the driver will fall back to using PIO.

It might be that some DMA engines (such as the Ux500
DMA40) do not properly say no to sglists with uneven
offsets, or ignore the offset altogether resulting in
unpredictable behavior. That is in that case a bug in the
DMA driver and needs to be fixed there. I got the impression
that the Qualcomm DMA actually can handle these odd
alignments without problems.

(Make a drive-by fix for datactrl_blocksz, misspelled.)

Cc: Ludovic Barre <ludovic.barre@st.com>
Cc: Brian Masney <masneyb@onstation.org>
Cc: Stephan Gerhold <stephan@gerhold.net>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v2->v3:
- Repost with the inclusion of other patches.
ChangeLog v1->v2:
- Specify odd blocksize field to 1 bit (:1)
- Specify that STMMC supports odd block sizes
- Collect Stephan's test tag
---
 drivers/mmc/host/mmci.c | 20 ++++++++++++++++----
 drivers/mmc/host/mmci.h |  6 +++++-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index c37e70dbe250..3ffcdf78a428 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -168,6 +168,7 @@ static struct variant_data variant_ux500 = {
 	.cmdreg_srsp		= MCI_CPSM_RESPONSE,
 	.datalength_bits	= 24,
 	.datactrl_blocksz	= 11,
+	.datactrl_odd_blocksz	= true,
 	.datactrl_mask_sdio	= MCI_DPSM_ST_SDIOEN,
 	.st_sdio		= true,
 	.st_clkdiv		= true,
@@ -201,6 +202,7 @@ static struct variant_data variant_ux500v2 = {
 	.datactrl_mask_ddrmode	= MCI_DPSM_ST_DDRMODE,
 	.datalength_bits	= 24,
 	.datactrl_blocksz	= 11,
+	.datactrl_odd_blocksz	= true,
 	.datactrl_mask_sdio	= MCI_DPSM_ST_SDIOEN,
 	.st_sdio		= true,
 	.st_clkdiv		= true,
@@ -260,6 +262,7 @@ static struct variant_data variant_stm32_sdmmc = {
 	.datacnt_useless	= true,
 	.datalength_bits	= 25,
 	.datactrl_blocksz	= 14,
+	.datactrl_odd_blocksz	= true,
 	.stm32_idmabsize_mask	= GENMASK(12, 5),
 	.init			= sdmmc_variant_init,
 };
@@ -279,6 +282,7 @@ static struct variant_data variant_qcom = {
 	.data_cmd_enable	= MCI_CPSM_QCOM_DATCMD,
 	.datalength_bits	= 24,
 	.datactrl_blocksz	= 11,
+	.datactrl_odd_blocksz	= true,
 	.pwrreg_powerup		= MCI_PWR_UP,
 	.f_max			= 208000000,
 	.explicit_mclk_control	= true,
@@ -447,10 +451,11 @@ void mmci_dma_setup(struct mmci_host *host)
 static int mmci_validate_data(struct mmci_host *host,
 			      struct mmc_data *data)
 {
+	struct variant_data *variant = host->variant;
+
 	if (!data)
 		return 0;
-
-	if (!is_power_of_2(data->blksz)) {
+	if (!is_power_of_2(data->blksz) && !variant->datactrl_odd_blocksz) {
 		dev_err(mmc_dev(host->mmc),
 			"unsupported block size (%d bytes)\n", data->blksz);
 		return -EINVAL;
@@ -515,7 +520,9 @@ int mmci_dma_start(struct mmci_host *host, unsigned int datactrl)
 		 "Submit MMCI DMA job, sglen %d blksz %04x blks %04x flags %08x\n",
 		 data->sg_len, data->blksz, data->blocks, data->flags);
 
-	host->ops->dma_start(host, &datactrl);
+	ret = host->ops->dma_start(host, &datactrl);
+	if (ret)
+		return ret;
 
 	/* Trigger the DMA transfer */
 	mmci_write_datactrlreg(host, datactrl);
@@ -872,9 +879,14 @@ int mmci_dmae_prep_data(struct mmci_host *host,
 int mmci_dmae_start(struct mmci_host *host, unsigned int *datactrl)
 {
 	struct mmci_dmae_priv *dmae = host->dma_priv;
+	int ret;
 
 	host->dma_in_progress = true;
-	dmaengine_submit(dmae->desc_current);
+	ret = dma_submit_error(dmaengine_submit(dmae->desc_current));
+	if (ret < 0) {
+		host->dma_in_progress = false;
+		return ret;
+	}
 	dma_async_issue_pending(dmae->cur);
 
 	*datactrl |= MCI_DPSM_DMAENABLE;
diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
index 833236ecb31e..c7f94726eaa1 100644
--- a/drivers/mmc/host/mmci.h
+++ b/drivers/mmc/host/mmci.h
@@ -278,7 +278,10 @@ struct mmci_host;
  * @stm32_clkdiv: true if using a STM32-specific clock divider algorithm
  * @datactrl_mask_ddrmode: ddr mode mask in datactrl register.
  * @datactrl_mask_sdio: SDIO enable mask in datactrl register
- * @datactrl_blksz: block size in power of two
+ * @datactrl_blocksz: block size in power of two
+ * @datactrl_odd_blocksz: true if block any sizes are supported, such as one
+ *		      single character, as is necessary when using some SDIO
+ *		      devices.
  * @datactrl_first: true if data must be setup before send command
  * @datacnt_useless: true if you could not use datacnt register to read
  *		     remaining data
@@ -323,6 +326,7 @@ struct variant_data {
 	unsigned int		datactrl_mask_ddrmode;
 	unsigned int		datactrl_mask_sdio;
 	unsigned int		datactrl_blocksz;
+	u8			datactrl_odd_blocksz:1;
 	u8			datactrl_first:1;
 	u8			datacnt_useless:1;
 	u8			st_sdio:1;
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500
  2019-11-13  7:53 [PATCH 0/3] MMCI odd buffer alignment fixes Linus Walleij
  2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
@ 2019-11-13  7:53 ` Linus Walleij
  2019-11-13 10:29   ` Marc Gonzalez
  2019-11-13 10:54   ` Ulf Hansson
  2019-11-13  7:53 ` [PATCH 3/3] mmc: mmci: Proper PIO residue handling Linus Walleij
  2 siblings, 2 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13  7:53 UTC (permalink / raw)
  To: Ulf Hansson, linux-mmc, linux-arm-kernel, Stephan Gerhold
  Cc: Russell King, Linus Walleij, Russell King, Niklas Cassel,
	Ludovic Barre, Brian Masney

The Ux500 (at least) can only deal with DMA transactions
starting and ending on an even 4-byte aligned address.

The problem isn't in the DMA engine of the system as such:
the problem is in the state machine of the MMCI block that
has some features to handle single bytes but it seems like
it doesn't quite work.

This problem is probably caused by most of the testing
being done on mass storage, which will be 512-bytes aligned
blocks placed neatly in pages and practically never run into
this situation.

On SDIO (for example in WiFi adapters) this situation is
common.

By avoiding any such transfers with a special vendor flag,
we can bail out to PIO when an odd transfer is detected
while keeping DMA for large transfers of evenly aligned
packages also for SDIO.

Cc: Ludovic Barre <ludovic.barre@st.com>
Cc: Brian Masney <masneyb@onstation.org>
Cc: Stephan Gerhold <stephan@gerhold.net>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v1->v3:
- New patch in v3 after discussion with Ulf
---
 drivers/mmc/host/mmci.c | 21 +++++++++++++++++++++
 drivers/mmc/host/mmci.h | 10 ++++++++++
 2 files changed, 31 insertions(+)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 3ffcdf78a428..a08cd845dddc 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -185,6 +185,7 @@ static struct variant_data variant_ux500 = {
 	.irq_pio_mask		= MCI_IRQ_PIO_MASK,
 	.start_err		= MCI_STARTBITERR,
 	.opendrain		= MCI_OD,
+	.only_long_aligned_dma	= true,
 	.init			= mmci_variant_init,
 };
 
@@ -219,6 +220,7 @@ static struct variant_data variant_ux500v2 = {
 	.irq_pio_mask		= MCI_IRQ_PIO_MASK,
 	.start_err		= MCI_STARTBITERR,
 	.opendrain		= MCI_OD,
+	.only_long_aligned_dma	= true,
 	.init			= ux500v2_variant_init,
 };
 
@@ -829,6 +831,25 @@ static int _mmci_dmae_prep_data(struct mmci_host *host, struct mmc_data *data,
 	if (data->blksz * data->blocks <= variant->fifosize)
 		return -EINVAL;
 
+	/*
+	 * Handle the variants with DMA that is broken such that start and
+	 * end address must be aligned on a long (32bit) boundary for the DMA
+	 * to work. If this occurs, fall back to PIO.
+	 */
+	if (host->variant->only_long_aligned_dma) {
+		struct scatterlist *sg;
+		int tmp;
+
+		for_each_sg(data->sg, sg, data->sg_len, tmp) {
+			/* We start in some odd place, that doesn't work */
+			if (sg->offset & 3)
+				return -EINVAL;
+			/* We end in some odd place, that doesn't work */
+			if (sg->length & 3)
+				return -EINVAL;
+		}
+	}
+
 	device = chan->device;
 	nr_sg = dma_map_sg(device->dev, data->sg, data->sg_len,
 			   mmc_get_dma_dir(data));
diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
index c7f94726eaa1..e20af17bb313 100644
--- a/drivers/mmc/host/mmci.h
+++ b/drivers/mmc/host/mmci.h
@@ -307,6 +307,15 @@ struct mmci_host;
  *	       register.
  * @opendrain: bitmask identifying the OPENDRAIN bit inside MMCIPOWER register
  * @dma_lli: true if variant has dma link list feature.
+ * @only_long_aligned_dma: it appears that the Ux500 has a broken DMA logic for
+ *	       single bytes when either the transfer starts at an odd offset or
+ *	       the final DMA burst is an odd (not divisible by 4) address.
+ *	       Reading must start and end on an even 4-byte boundary, i.e. an
+ *	       even 32bit word in memory. If this is not the case, we need to
+ *	       fall back to PIO for that request. For bulk transfers to mass
+ *	       storage we are almost exclusively dealing with 512-byte chunks
+ *	       allocated at an even address so this is usually only manifesting
+ *	       in SDIO.
  * @stm32_idmabsize_mask: stm32 sdmmc idma buffer size.
  */
 struct variant_data {
@@ -350,6 +359,7 @@ struct variant_data {
 	u32			start_err;
 	u32			opendrain;
 	u8			dma_lli:1;
+	u8			only_long_aligned_dma:1;
 	u32			stm32_idmabsize_mask;
 	void (*init)(struct mmci_host *host);
 };
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] mmc: mmci: Proper PIO residue handling
  2019-11-13  7:53 [PATCH 0/3] MMCI odd buffer alignment fixes Linus Walleij
  2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
  2019-11-13  7:53 ` [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500 Linus Walleij
@ 2019-11-13  7:53 ` Linus Walleij
  2019-11-13 13:24   ` Linus Walleij
  2019-11-13 14:03   ` Ulf Hansson
  2 siblings, 2 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13  7:53 UTC (permalink / raw)
  To: Ulf Hansson, linux-mmc, linux-arm-kernel, Stephan Gerhold
  Cc: Russell King, Linus Walleij, Russell King, Niklas Cassel,
	Ludovic Barre, Brian Masney

The MMCI PIO write function contains a bug if using any buffers
with sglist miter contents that do not add upp to even 32bit
writes. The PIO write code currently does this:

iowrite32_rep(base + MMCIFIFO, ptr, (count + 3) >> 2);

This will make sure that a single buffer passed in gets written
into the FIFO even if it is odd, i.e. not evenly divisible
by 4.

However it has the following problems:

- It will read up to three bytes beyond the end of the buffer
  and fill the FIFO with unpredictable junk and possibly cause
  segmentation violations if the read passes a page boundary
  such as at the end of an sglist buffer.

- It will be fine if a single buffer is passed in, but the code
  handles SG lists. There is a while (1) loop in mmci_pio_irq()
  which repeatedly calls mmci_pio_write() as long as the FIFO
  is not half-full, if it gets half-full it exits the IRQ and
  waits for a new IRQ, also the while loop repeatedly calls
  sg_miter_next() to consume bytes and this means that between
  subsequent calls to mmci_pio_write() some random junk can be
  inserted at the end of each call if the buffers passed in
  do not contain an number of bytes evenly divisible by 4.

Fix this by simply doing this:

iowrite32_rep(base + MMCIFIFO, ptr, count >> 2);

But since the code will then just not write enough bytes if
the count is not evenly divisible by 4, we introduce a special
residue buffer and keep track of any odd bytes from 1..3 and
just write these padded out with new data in a separate
statement the next time we get a call of the mmci_pio_write(),
or, if there is for example only 1,2 or 3 bytes in the transfer,
or we end up with an odd number of bytes in the residue,
flushi it out when we end the current data transaction to
the host.

Cc: Ludovic Barre <ludovic.barre@st.com>
Cc: Brian Masney <masneyb@onstation.org>
Cc: Stephan Gerhold <stephan@gerhold.net>
Cc: Niklas Cassel <niklas.cassel@linaro.org>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v1->v3:
- New patch in v3 after discussion with Ulf
- I'm unable to test this because I cannot provoke
  these odd reads/writes but hoping for Stephan to take it
  for a spin.
---
 drivers/mmc/host/mmci.c | 125 ++++++++++++++++++++++++++++++++++++----
 drivers/mmc/host/mmci.h |   2 +
 2 files changed, 117 insertions(+), 10 deletions(-)

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index a08cd845dddc..0d01689016f0 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -607,6 +607,7 @@ static void mmci_init_sg(struct mmci_host *host, struct mmc_data *data)
 		flags |= SG_MITER_FROM_SG;
 
 	sg_miter_start(&host->sg_miter, data->sg, data->sg_len, flags);
+	host->pio_write_residue_sz = 0;
 }
 
 static u32 mmci_get_dctrl_cfg(struct mmci_host *host)
@@ -1422,30 +1423,111 @@ static int mmci_pio_write(struct mmci_host *host, char *buffer, unsigned int rem
 	struct variant_data *variant = host->variant;
 	void __iomem *base = host->base;
 	char *ptr = buffer;
+	bool wrote_residue = false;
+	int i;
+
+	/*
+	 * This will normally not happen during block I/O, but can
+	 * happen during SDIO traffic, where odd byte chunks are
+	 * shoveled to the SDIO link. Fill up the residue buffer
+	 * and flush out.
+	 */
+	if (host->pio_write_residue_sz) {
+		int fill = 4 - host->pio_write_residue_sz;
+		u32 val = 0;
+
+		/*
+		 * If we need more padding that what we have, just stash
+		 * some more in the residue then and continue. This only
+		 * happens if we're passed e.g. 1 or 2 bytes and there is
+		 * just 1 byte residue residue already: we can't fill up!
+		 */
+		if (fill > remain) {
+			for (i = 0; i < fill; i++) {
+				host->pio_write_residue[host->pio_write_residue_sz + i] = *ptr;
+				host->pio_write_residue_sz++;
+				ptr++;
+				remain--;
+			}
+			return ptr - buffer;
+		}
+
+		/* Pack the residue into a 32bit word */
+		for (i = 0; i < host->pio_write_residue_sz; i++) {
+			val |= host->pio_write_residue[i];
+			val <<= 8;
+		}
+		/* Top up with new data */
+		for (i = 0; i < fill; i++) {
+			val |= *ptr;
+			val <<= 8;
+			ptr++;
+			remain--;
+		}
+		iowrite32(val, base + MMCIFIFO);
+		host->pio_write_residue_sz = 0;
+		wrote_residue = true;
+	}
+
+	/*
+	 * Maybe the residue plus some few bytes was exactly filling a
+	 * 32bit word.
+	 */
+	if (!remain)
+		return ptr - buffer;
 
 	do {
-		unsigned int count, maxcnt;
+		unsigned int count, maxcnt, written, residue;
 
+		/*
+		 * If the FIFO is empty just stash it full with data else
+		 * just half-fill it.
+		 */
 		maxcnt = status & MCI_TXFIFOEMPTY ?
-			 variant->fifosize : variant->fifohalfsize;
+			variant->fifosize : variant->fifohalfsize;
+
+		/* Watch it to not overfill the FIFO */
+		if (wrote_residue)
+			maxcnt -= 4;
+
 		count = min(remain, maxcnt);
 
 		/*
 		 * SDIO especially may want to send something that is
 		 * not divisible by 4 (as opposed to card sectors
-		 * etc), and the FIFO only accept full 32-bit writes.
-		 * So compensate by adding +3 on the count, a single
-		 * byte become a 32bit write, 7 bytes will be two
-		 * 32bit writes etc.
+		 * etc), but the FIFO only accepts full 32-bit writes so start by
+		 * just filling up quickly with as much as we can.
 		 */
-		iowrite32_rep(base + MMCIFIFO, ptr, (count + 3) >> 2);
+		iowrite32_rep(base + MMCIFIFO, ptr, count >> 2);
+		residue = count & 3;
 
-		ptr += count;
-		remain -= count;
+		written = count - residue;
+		ptr += written;
+		remain -= written;
 
-		if (remain == 0)
+		/* Handles the common case for block-IO */
+		if (!remain)
 			break;
 
+		/*
+		 * These were not written out since we only write 32bit
+		 * words to the FIFO. Since this function gets called
+		 * repeatedly and across page boundaries with new pointers
+		 * in *buffer we need to take special care to stash odd
+		 * bytes away and flush them out in next PIO round or at the
+		 * end of the whole data transfer.
+		 */
+		if (remain < 4) {
+			WARN_ON(remain != residue);
+			host->pio_write_residue_sz = residue;
+			for (i = 0; i < residue; i++) {
+				host->pio_write_residue[i] = *ptr;
+				ptr++;
+				remain--;
+			}
+			break;
+		}
+
 		status = readl(base + MMCISTATUS);
 	} while (status & MCI_TXFIFOHALFEMPTY);
 
@@ -1522,6 +1604,29 @@ static irqreturn_t mmci_pio_irq(int irq, void *dev_id)
 	if (host->size == 0) {
 		mmci_set_mask1(host, 0);
 		writel(readl(base + MMCIMASK0) | MCI_DATAENDMASK, base + MMCIMASK0);
+
+		if ((status & MCI_TXACTIVE) && host->pio_write_residue_sz) {
+			/*
+			 * If the last pass of the SG miter left some residue after the pio
+			 * write loop, push that out to the card. We are now at the end of
+			 * the transfer so it is OK to pad with zeroes.
+			 */
+			int fill = 4 - host->pio_write_residue_sz;
+			u32 val = 0;
+			int i;
+
+			/* Pack the residue into a 32bit word */
+			for (i = 0; i < host->pio_write_residue_sz; i++) {
+				val |= host->pio_write_residue[i];
+				val <<= 8;
+			}
+			/* Top up with zeroes */
+			for (i = 0; i < fill; i++)
+				val <<= 8;
+
+			iowrite32(val, base + MMCIFIFO);
+			host->pio_write_residue_sz = 0;
+		}
 	}
 
 	return IRQ_HANDLED;
diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
index e20af17bb313..a7690b64d4cd 100644
--- a/drivers/mmc/host/mmci.h
+++ b/drivers/mmc/host/mmci.h
@@ -425,6 +425,8 @@ struct mmci_host {
 	/* pio stuff */
 	struct sg_mapping_iter	sg_miter;
 	unsigned int		size;
+	u8			pio_write_residue[4];
+	u8			pio_write_residue_sz;
 	int (*get_rx_fifocnt)(struct mmci_host *h, u32 status, int remain);
 
 	u8			use_dma:1;
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant
  2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
@ 2019-11-13 10:08   ` Ludovic BARRE
  2019-11-13 11:05   ` Ulf Hansson
  1 sibling, 0 replies; 11+ messages in thread
From: Ludovic BARRE @ 2019-11-13 10:08 UTC (permalink / raw)
  To: Linus Walleij, Ulf Hansson, linux-mmc, linux-arm-kernel, Stephan Gerhold
  Cc: Niklas Cassel, Srinivas Kandagatla, Russell King, Russell King,
	Brian Masney

Tested-by: Ludovic Barre <ludovic.barre@st.com>

Le 11/13/19 à 8:53 AM, Linus Walleij a écrit :
> From: Ulf Hansson <ulf.hansson@linaro.org>
> 
> This is something like the 5th time this patch is posted,
> so let's try to fix this now, once and for all.
> 
> For the ux500v2 variant of the PL18x block, odd block sizes are
> supported. This is necessary to support some SDIO transfers
> such as single bytes. This also affects the QCOM MMCI variant.
> 
> This will work fine for PIO using IRQs: SDIO packets are
> accepted down to single bytes and the transfers go through
> just fine.
> 
> This patch has proven necessary for enabling SDIO for WLAN on
> PostmarketOS-based Ux500 platforms.
> 
> This patch is based on Ulf Hansson's patch
> http://www.spinics.net/lists/linux-mmc/msg12160.html
> 
> Ulf noted on an earlier iteration in:
> https://marc.info/?l=linux-mmc&m=140845189316370&w=2
> 
> "There are some prerequisites of the data buffers to supports
> any block size, at least for ux500. (...) The conclusion from
> the above is that we need to adopt mmci_pio_write() to handle
> corner cases."
> 
> This points back to a discussion in 2012. The main point was
> made by Russell in this message:
> https://marc.info/?l=linux-arm-kernel&m=135351237018301&w=2
> 
> IIUC this pertains to this code (now gone from the patch):
> 
>    if (data->sg->offset & 3) {
>        dev_err(...);
>        return -EINVAL;
>    }
> 
> This hit Stephan as he noticed that DMA (DMA40) would not work
> with the MMCI driver, so this patch combined with disabling
> DMA would do the trick. That way we don't toss unaligned
> accesses at the DMA engine as SDIO apparently tends to
> do. (This is not a problem when writing ordinary block device
> blocks as these are always 512 bytes aligned on a 4-byte
> boundary.)
> 
> As Ulf notes, odd SG offsets like this should be handled
> by the driver even if we run it in DMA mode. I conclude
> it must be the duty of the DMA driver to say NO to SG
> offsets it cannot handle, or otherwise bitstuff things
> around to avoid the situation.
> 
> So as a first step make sure errors are propagated upward
> from the DMA engine, and assume the DMA engine will say no
> to things with weird SG offsets that it cannot handle, and
> then the driver will fall back to using PIO.
> 
> It might be that some DMA engines (such as the Ux500
> DMA40) do not properly say no to sglists with uneven
> offsets, or ignore the offset altogether resulting in
> unpredictable behavior. That is in that case a bug in the
> DMA driver and needs to be fixed there. I got the impression
> that the Qualcomm DMA actually can handle these odd
> alignments without problems.
> 
> (Make a drive-by fix for datactrl_blocksz, misspelled.)
> 
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Brian Masney <masneyb@onstation.org>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Tested-by: Stephan Gerhold <stephan@gerhold.net>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v2->v3:
> - Repost with the inclusion of other patches.
> ChangeLog v1->v2:
> - Specify odd blocksize field to 1 bit (:1)
> - Specify that STMMC supports odd block sizes
> - Collect Stephan's test tag
> ---
>   drivers/mmc/host/mmci.c | 20 ++++++++++++++++----
>   drivers/mmc/host/mmci.h |  6 +++++-
>   2 files changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index c37e70dbe250..3ffcdf78a428 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -168,6 +168,7 @@ static struct variant_data variant_ux500 = {
>   	.cmdreg_srsp		= MCI_CPSM_RESPONSE,
>   	.datalength_bits	= 24,
>   	.datactrl_blocksz	= 11,
> +	.datactrl_odd_blocksz	= true,
>   	.datactrl_mask_sdio	= MCI_DPSM_ST_SDIOEN,
>   	.st_sdio		= true,
>   	.st_clkdiv		= true,
> @@ -201,6 +202,7 @@ static struct variant_data variant_ux500v2 = {
>   	.datactrl_mask_ddrmode	= MCI_DPSM_ST_DDRMODE,
>   	.datalength_bits	= 24,
>   	.datactrl_blocksz	= 11,
> +	.datactrl_odd_blocksz	= true,
>   	.datactrl_mask_sdio	= MCI_DPSM_ST_SDIOEN,
>   	.st_sdio		= true,
>   	.st_clkdiv		= true,
> @@ -260,6 +262,7 @@ static struct variant_data variant_stm32_sdmmc = {
>   	.datacnt_useless	= true,
>   	.datalength_bits	= 25,
>   	.datactrl_blocksz	= 14,
> +	.datactrl_odd_blocksz	= true,
>   	.stm32_idmabsize_mask	= GENMASK(12, 5),
>   	.init			= sdmmc_variant_init,
>   };
> @@ -279,6 +282,7 @@ static struct variant_data variant_qcom = {
>   	.data_cmd_enable	= MCI_CPSM_QCOM_DATCMD,
>   	.datalength_bits	= 24,
>   	.datactrl_blocksz	= 11,
> +	.datactrl_odd_blocksz	= true,
>   	.pwrreg_powerup		= MCI_PWR_UP,
>   	.f_max			= 208000000,
>   	.explicit_mclk_control	= true,
> @@ -447,10 +451,11 @@ void mmci_dma_setup(struct mmci_host *host)
>   static int mmci_validate_data(struct mmci_host *host,
>   			      struct mmc_data *data)
>   {
> +	struct variant_data *variant = host->variant;
> +
>   	if (!data)
>   		return 0;
> -
> -	if (!is_power_of_2(data->blksz)) {
> +	if (!is_power_of_2(data->blksz) && !variant->datactrl_odd_blocksz) {
>   		dev_err(mmc_dev(host->mmc),
>   			"unsupported block size (%d bytes)\n", data->blksz);
>   		return -EINVAL;
> @@ -515,7 +520,9 @@ int mmci_dma_start(struct mmci_host *host, unsigned int datactrl)
>   		 "Submit MMCI DMA job, sglen %d blksz %04x blks %04x flags %08x\n",
>   		 data->sg_len, data->blksz, data->blocks, data->flags);
>   
> -	host->ops->dma_start(host, &datactrl);
> +	ret = host->ops->dma_start(host, &datactrl);
> +	if (ret)
> +		return ret;
>   
>   	/* Trigger the DMA transfer */
>   	mmci_write_datactrlreg(host, datactrl);
> @@ -872,9 +879,14 @@ int mmci_dmae_prep_data(struct mmci_host *host,
>   int mmci_dmae_start(struct mmci_host *host, unsigned int *datactrl)
>   {
>   	struct mmci_dmae_priv *dmae = host->dma_priv;
> +	int ret;
>   
>   	host->dma_in_progress = true;
> -	dmaengine_submit(dmae->desc_current);
> +	ret = dma_submit_error(dmaengine_submit(dmae->desc_current));
> +	if (ret < 0) {
> +		host->dma_in_progress = false;
> +		return ret;
> +	}
>   	dma_async_issue_pending(dmae->cur);
>   
>   	*datactrl |= MCI_DPSM_DMAENABLE;
> diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
> index 833236ecb31e..c7f94726eaa1 100644
> --- a/drivers/mmc/host/mmci.h
> +++ b/drivers/mmc/host/mmci.h
> @@ -278,7 +278,10 @@ struct mmci_host;
>    * @stm32_clkdiv: true if using a STM32-specific clock divider algorithm
>    * @datactrl_mask_ddrmode: ddr mode mask in datactrl register.
>    * @datactrl_mask_sdio: SDIO enable mask in datactrl register
> - * @datactrl_blksz: block size in power of two
> + * @datactrl_blocksz: block size in power of two
> + * @datactrl_odd_blocksz: true if block any sizes are supported, such as one
> + *		      single character, as is necessary when using some SDIO
> + *		      devices.
>    * @datactrl_first: true if data must be setup before send command
>    * @datacnt_useless: true if you could not use datacnt register to read
>    *		     remaining data
> @@ -323,6 +326,7 @@ struct variant_data {
>   	unsigned int		datactrl_mask_ddrmode;
>   	unsigned int		datactrl_mask_sdio;
>   	unsigned int		datactrl_blocksz;
> +	u8			datactrl_odd_blocksz:1;
>   	u8			datactrl_first:1;
>   	u8			datacnt_useless:1;
>   	u8			st_sdio:1;
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500
  2019-11-13  7:53 ` [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500 Linus Walleij
@ 2019-11-13 10:29   ` Marc Gonzalez
  2019-11-13 10:54   ` Ulf Hansson
  1 sibling, 0 replies; 11+ messages in thread
From: Marc Gonzalez @ 2019-11-13 10:29 UTC (permalink / raw)
  To: Linus Walleij, Ulf Hansson
  Cc: Stephan Gerhold, MMC, Russell King, Linux ARM, Niklas Cassel,
	Ludovic Barre, Brian Masney

On 13/11/2019 08:53, Linus Walleij wrote:

> The Ux500 (at least) can only deal with DMA transactions
> starting and ending on an even 4-byte aligned address.
> 
> The problem isn't in the DMA engine of the system as such:
> the problem is in the state machine of the MMCI block that
> has some features to handle single bytes but it seems like
> it doesn't quite work.
> 
> This problem is probably caused by most of the testing
> being done on mass storage, which will be 512-bytes aligned
> blocks placed neatly in pages and practically never run into
> this situation.
> 
> On SDIO (for example in WiFi adapters) this situation is
> common.
> 
> By avoiding any such transfers with a special vendor flag,
> we can bail out to PIO when an odd transfer is detected
> while keeping DMA for large transfers of evenly aligned
> packages also for SDIO.
> 
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Brian Masney <masneyb@onstation.org>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v1->v3:
> - New patch in v3 after discussion with Ulf
> ---
>  drivers/mmc/host/mmci.c | 21 +++++++++++++++++++++
>  drivers/mmc/host/mmci.h | 10 ++++++++++
>  2 files changed, 31 insertions(+)
> 
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index 3ffcdf78a428..a08cd845dddc 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -185,6 +185,7 @@ static struct variant_data variant_ux500 = {
>  	.irq_pio_mask		= MCI_IRQ_PIO_MASK,
>  	.start_err		= MCI_STARTBITERR,
>  	.opendrain		= MCI_OD,
> +	.only_long_aligned_dma	= true,
>  	.init			= mmci_variant_init,
>  };
>  
> @@ -219,6 +220,7 @@ static struct variant_data variant_ux500v2 = {
>  	.irq_pio_mask		= MCI_IRQ_PIO_MASK,
>  	.start_err		= MCI_STARTBITERR,
>  	.opendrain		= MCI_OD,
> +	.only_long_aligned_dma	= true,
>  	.init			= ux500v2_variant_init,
>  };
>  
> @@ -829,6 +831,25 @@ static int _mmci_dmae_prep_data(struct mmci_host *host, struct mmc_data *data,
>  	if (data->blksz * data->blocks <= variant->fifosize)
>  		return -EINVAL;
>  
> +	/*
> +	 * Handle the variants with DMA that is broken such that start and
> +	 * end address must be aligned on a long (32bit) boundary for the DMA
> +	 * to work. If this occurs, fall back to PIO.
> +	 */

Nit: why use 'long' as a synonym for "32 bits" ?

Why not name the field "only_32b_aligned_dma" ?

(The size of C's long int is implementation-defined; most 64-bit platforms
have a 64-bit long int.)

Perhaps the ship has already sailed -- what with readl/writel...

Regards.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500
  2019-11-13  7:53 ` [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500 Linus Walleij
  2019-11-13 10:29   ` Marc Gonzalez
@ 2019-11-13 10:54   ` Ulf Hansson
  1 sibling, 0 replies; 11+ messages in thread
From: Ulf Hansson @ 2019-11-13 10:54 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Russell King, Stephan Gerhold, linux-mmc, Russell King,
	Linux ARM, Niklas Cassel, Ludovic Barre, Brian Masney

On Wed, 13 Nov 2019 at 08:53, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> The Ux500 (at least) can only deal with DMA transactions
> starting and ending on an even 4-byte aligned address.
>
> The problem isn't in the DMA engine of the system as such:
> the problem is in the state machine of the MMCI block that
> has some features to handle single bytes but it seems like
> it doesn't quite work.
>
> This problem is probably caused by most of the testing
> being done on mass storage, which will be 512-bytes aligned
> blocks placed neatly in pages and practically never run into
> this situation.
>
> On SDIO (for example in WiFi adapters) this situation is
> common.
>
> By avoiding any such transfers with a special vendor flag,
> we can bail out to PIO when an odd transfer is detected
> while keeping DMA for large transfers of evenly aligned
> packages also for SDIO.
>
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Brian Masney <masneyb@onstation.org>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v1->v3:
> - New patch in v3 after discussion with Ulf
> ---
>  drivers/mmc/host/mmci.c | 21 +++++++++++++++++++++
>  drivers/mmc/host/mmci.h | 10 ++++++++++
>  2 files changed, 31 insertions(+)
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index 3ffcdf78a428..a08cd845dddc 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -185,6 +185,7 @@ static struct variant_data variant_ux500 = {
>         .irq_pio_mask           = MCI_IRQ_PIO_MASK,
>         .start_err              = MCI_STARTBITERR,
>         .opendrain              = MCI_OD,
> +       .only_long_aligned_dma  = true,
>         .init                   = mmci_variant_init,
>  };
>
> @@ -219,6 +220,7 @@ static struct variant_data variant_ux500v2 = {
>         .irq_pio_mask           = MCI_IRQ_PIO_MASK,
>         .start_err              = MCI_STARTBITERR,
>         .opendrain              = MCI_OD,
> +       .only_long_aligned_dma  = true,
>         .init                   = ux500v2_variant_init,
>  };
>
> @@ -829,6 +831,25 @@ static int _mmci_dmae_prep_data(struct mmci_host *host, struct mmc_data *data,
>         if (data->blksz * data->blocks <= variant->fifosize)
>                 return -EINVAL;
>
> +       /*
> +        * Handle the variants with DMA that is broken such that start and
> +        * end address must be aligned on a long (32bit) boundary for the DMA
> +        * to work. If this occurs, fall back to PIO.
> +        */
> +       if (host->variant->only_long_aligned_dma) {
> +               struct scatterlist *sg;
> +               int tmp;
> +
> +               for_each_sg(data->sg, sg, data->sg_len, tmp) {
> +                       /* We start in some odd place, that doesn't work */
> +                       if (sg->offset & 3)
> +                               return -EINVAL;
> +                       /* We end in some odd place, that doesn't work */
> +                       if (sg->length & 3)
> +                               return -EINVAL;
> +               }

Assuming the data i allocated consecutive (is that a wrong assumption?)...

...then it should be sufficient to check only the first sg-element in
the list having a divisible address offset by 4 (sg->offset & 0x3) and
then check that the total request size is also divisible by 4
((data->blksz * data->blocks) & 0x3). That should give the same
result.

 In this way you don't need to iterate through the sg-list.

> +       }
> +
>         device = chan->device;
>         nr_sg = dma_map_sg(device->dev, data->sg, data->sg_len,
>                            mmc_get_dma_dir(data));
> diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
> index c7f94726eaa1..e20af17bb313 100644
> --- a/drivers/mmc/host/mmci.h
> +++ b/drivers/mmc/host/mmci.h
> @@ -307,6 +307,15 @@ struct mmci_host;
>   *            register.
>   * @opendrain: bitmask identifying the OPENDRAIN bit inside MMCIPOWER register
>   * @dma_lli: true if variant has dma link list feature.
> + * @only_long_aligned_dma: it appears that the Ux500 has a broken DMA logic for
> + *            single bytes when either the transfer starts at an odd offset or
> + *            the final DMA burst is an odd (not divisible by 4) address.
> + *            Reading must start and end on an even 4-byte boundary, i.e. an
> + *            even 32bit word in memory. If this is not the case, we need to
> + *            fall back to PIO for that request. For bulk transfers to mass
> + *            storage we are almost exclusively dealing with 512-byte chunks
> + *            allocated at an even address so this is usually only manifesting
> + *            in SDIO.
>   * @stm32_idmabsize_mask: stm32 sdmmc idma buffer size.
>   */
>  struct variant_data {
> @@ -350,6 +359,7 @@ struct variant_data {
>         u32                     start_err;
>         u32                     opendrain;
>         u8                      dma_lli:1;
> +       u8                      only_long_aligned_dma:1;
>         u32                     stm32_idmabsize_mask;
>         void (*init)(struct mmci_host *host);
>  };
> --
> 2.21.0
>

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant
  2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
  2019-11-13 10:08   ` Ludovic BARRE
@ 2019-11-13 11:05   ` Ulf Hansson
  2019-11-13 13:17     ` Linus Walleij
  1 sibling, 1 reply; 11+ messages in thread
From: Ulf Hansson @ 2019-11-13 11:05 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Russell King, Stephan Gerhold, linux-mmc, Russell King,
	Srinivas Kandagatla, Linux ARM, Niklas Cassel, Ludovic Barre,
	Brian Masney

On Wed, 13 Nov 2019 at 08:53, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> From: Ulf Hansson <ulf.hansson@linaro.org>
>
> This is something like the 5th time this patch is posted,
> so let's try to fix this now, once and for all.
>
> For the ux500v2 variant of the PL18x block, odd block sizes are
> supported. This is necessary to support some SDIO transfers
> such as single bytes. This also affects the QCOM MMCI variant.
>
> This will work fine for PIO using IRQs: SDIO packets are
> accepted down to single bytes and the transfers go through
> just fine.
>
> This patch has proven necessary for enabling SDIO for WLAN on
> PostmarketOS-based Ux500 platforms.
>
> This patch is based on Ulf Hansson's patch
> http://www.spinics.net/lists/linux-mmc/msg12160.html
>
> Ulf noted on an earlier iteration in:
> https://marc.info/?l=linux-mmc&m=140845189316370&w=2
>
> "There are some prerequisites of the data buffers to supports
> any block size, at least for ux500. (...) The conclusion from
> the above is that we need to adopt mmci_pio_write() to handle
> corner cases."
>
> This points back to a discussion in 2012. The main point was
> made by Russell in this message:
> https://marc.info/?l=linux-arm-kernel&m=135351237018301&w=2
>
> IIUC this pertains to this code (now gone from the patch):
>
>   if (data->sg->offset & 3) {
>       dev_err(...);
>       return -EINVAL;
>   }
>
> This hit Stephan as he noticed that DMA (DMA40) would not work
> with the MMCI driver, so this patch combined with disabling
> DMA would do the trick. That way we don't toss unaligned
> accesses at the DMA engine as SDIO apparently tends to
> do. (This is not a problem when writing ordinary block device
> blocks as these are always 512 bytes aligned on a 4-byte
> boundary.)
>
> As Ulf notes, odd SG offsets like this should be handled
> by the driver even if we run it in DMA mode. I conclude
> it must be the duty of the DMA driver to say NO to SG
> offsets it cannot handle, or otherwise bitstuff things
> around to avoid the situation.
>
> So as a first step make sure errors are propagated upward
> from the DMA engine, and assume the DMA engine will say no
> to things with weird SG offsets that it cannot handle, and
> then the driver will fall back to using PIO.
>
> It might be that some DMA engines (such as the Ux500
> DMA40) do not properly say no to sglists with uneven
> offsets, or ignore the offset altogether resulting in
> unpredictable behavior. That is in that case a bug in the
> DMA driver and needs to be fixed there. I got the impression
> that the Qualcomm DMA actually can handle these odd
> alignments without problems.
>
> (Make a drive-by fix for datactrl_blocksz, misspelled.)
>
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Brian Masney <masneyb@onstation.org>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Tested-by: Stephan Gerhold <stephan@gerhold.net>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

There is no need to keep my authorship of the patch, it's very much
different than the original. I would rather replace it that with a
suggested-by tag.

> ---
> ChangeLog v2->v3:
> - Repost with the inclusion of other patches.
> ChangeLog v1->v2:
> - Specify odd blocksize field to 1 bit (:1)
> - Specify that STMMC supports odd block sizes
> - Collect Stephan's test tag
> ---
>  drivers/mmc/host/mmci.c | 20 ++++++++++++++++----
>  drivers/mmc/host/mmci.h |  6 +++++-
>  2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index c37e70dbe250..3ffcdf78a428 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -168,6 +168,7 @@ static struct variant_data variant_ux500 = {
>         .cmdreg_srsp            = MCI_CPSM_RESPONSE,
>         .datalength_bits        = 24,
>         .datactrl_blocksz       = 11,
> +       .datactrl_odd_blocksz   = true,
>         .datactrl_mask_sdio     = MCI_DPSM_ST_SDIOEN,
>         .st_sdio                = true,
>         .st_clkdiv              = true,
> @@ -201,6 +202,7 @@ static struct variant_data variant_ux500v2 = {
>         .datactrl_mask_ddrmode  = MCI_DPSM_ST_DDRMODE,
>         .datalength_bits        = 24,
>         .datactrl_blocksz       = 11,
> +       .datactrl_odd_blocksz   = true,
>         .datactrl_mask_sdio     = MCI_DPSM_ST_SDIOEN,
>         .st_sdio                = true,
>         .st_clkdiv              = true,
> @@ -260,6 +262,7 @@ static struct variant_data variant_stm32_sdmmc = {
>         .datacnt_useless        = true,
>         .datalength_bits        = 25,
>         .datactrl_blocksz       = 14,
> +       .datactrl_odd_blocksz   = true,
>         .stm32_idmabsize_mask   = GENMASK(12, 5),
>         .init                   = sdmmc_variant_init,
>  };
> @@ -279,6 +282,7 @@ static struct variant_data variant_qcom = {
>         .data_cmd_enable        = MCI_CPSM_QCOM_DATCMD,
>         .datalength_bits        = 24,
>         .datactrl_blocksz       = 11,
> +       .datactrl_odd_blocksz   = true,
>         .pwrreg_powerup         = MCI_PWR_UP,
>         .f_max                  = 208000000,
>         .explicit_mclk_control  = true,
> @@ -447,10 +451,11 @@ void mmci_dma_setup(struct mmci_host *host)
>  static int mmci_validate_data(struct mmci_host *host,
>                               struct mmc_data *data)
>  {
> +       struct variant_data *variant = host->variant;
> +
>         if (!data)
>                 return 0;
> -
> -       if (!is_power_of_2(data->blksz)) {
> +       if (!is_power_of_2(data->blksz) && !variant->datactrl_odd_blocksz) {

This is too early in the series. You need to deal with the DMA and pio
issues, before releasing this constraint.

In other words, I would rather split this patch in two pieces. One
patch dealing with dma submit error path, placed first in the series
and another patch that adds the odd block sizes variant and releases
the constraint, which should comes last.

Makes sense?

>                 dev_err(mmc_dev(host->mmc),
>                         "unsupported block size (%d bytes)\n", data->blksz);
>                 return -EINVAL;
> @@ -515,7 +520,9 @@ int mmci_dma_start(struct mmci_host *host, unsigned int datactrl)
>                  "Submit MMCI DMA job, sglen %d blksz %04x blks %04x flags %08x\n",
>                  data->sg_len, data->blksz, data->blocks, data->flags);
>
> -       host->ops->dma_start(host, &datactrl);
> +       ret = host->ops->dma_start(host, &datactrl);
> +       if (ret)
> +               return ret;
>
>         /* Trigger the DMA transfer */
>         mmci_write_datactrlreg(host, datactrl);
> @@ -872,9 +879,14 @@ int mmci_dmae_prep_data(struct mmci_host *host,
>  int mmci_dmae_start(struct mmci_host *host, unsigned int *datactrl)
>  {
>         struct mmci_dmae_priv *dmae = host->dma_priv;
> +       int ret;
>
>         host->dma_in_progress = true;
> -       dmaengine_submit(dmae->desc_current);
> +       ret = dma_submit_error(dmaengine_submit(dmae->desc_current));
> +       if (ret < 0) {
> +               host->dma_in_progress = false;
> +               return ret;
> +       }
>         dma_async_issue_pending(dmae->cur);
>
>         *datactrl |= MCI_DPSM_DMAENABLE;
> diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
> index 833236ecb31e..c7f94726eaa1 100644
> --- a/drivers/mmc/host/mmci.h
> +++ b/drivers/mmc/host/mmci.h
> @@ -278,7 +278,10 @@ struct mmci_host;
>   * @stm32_clkdiv: true if using a STM32-specific clock divider algorithm
>   * @datactrl_mask_ddrmode: ddr mode mask in datactrl register.
>   * @datactrl_mask_sdio: SDIO enable mask in datactrl register
> - * @datactrl_blksz: block size in power of two
> + * @datactrl_blocksz: block size in power of two
> + * @datactrl_odd_blocksz: true if block any sizes are supported, such as one
> + *                   single character, as is necessary when using some SDIO
> + *                   devices.
>   * @datactrl_first: true if data must be setup before send command
>   * @datacnt_useless: true if you could not use datacnt register to read
>   *                  remaining data
> @@ -323,6 +326,7 @@ struct variant_data {
>         unsigned int            datactrl_mask_ddrmode;
>         unsigned int            datactrl_mask_sdio;
>         unsigned int            datactrl_blocksz;
> +       u8                      datactrl_odd_blocksz:1;
>         u8                      datactrl_first:1;
>         u8                      datacnt_useless:1;
>         u8                      st_sdio:1;
> --
> 2.21.0
>

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant
  2019-11-13 11:05   ` Ulf Hansson
@ 2019-11-13 13:17     ` Linus Walleij
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13 13:17 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Russell King, Stephan Gerhold, linux-mmc, Russell King,
	Srinivas Kandagatla, Linux ARM, Niklas Cassel, Ludovic Barre,
	Brian Masney

On Wed, Nov 13, 2019 at 12:06 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> > From: Ulf Hansson <ulf.hansson@linaro.org>
(...)
> There is no need to keep my authorship of the patch, it's very much
> different than the original. I would rather replace it that with a
> suggested-by tag.

OK

> This is too early in the series. You need to deal with the DMA and pio
> issues, before releasing this constraint.
>
> In other words, I would rather split this patch in two pieces. One
> patch dealing with dma submit error path, placed first in the series
> and another patch that adds the odd block sizes variant and releases
> the constraint, which should comes last.
>
> Makes sense?

Sure thing, I fix. We need to get the whole slew tested first then
I'll collect the Tested-by's and reorder and split.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] mmc: mmci: Proper PIO residue handling
  2019-11-13  7:53 ` [PATCH 3/3] mmc: mmci: Proper PIO residue handling Linus Walleij
@ 2019-11-13 13:24   ` Linus Walleij
  2019-11-13 14:03   ` Ulf Hansson
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Walleij @ 2019-11-13 13:24 UTC (permalink / raw)
  To: Ulf Hansson, linux-mmc, Linux ARM, Stephan Gerhold
  Cc: Niklas Cassel, Russell King, Russell King, Ludovic Barre, Brian Masney

On Wed, Nov 13, 2019 at 8:53 AM Linus Walleij <linus.walleij@linaro.org> wrote:

> +               /* Pack the residue into a 32bit word */
> +               for (i = 0; i < host->pio_write_residue_sz; i++) {
> +                       val |= host->pio_write_residue[i];
> +                       val <<= 8;
> +               }
> +               /* Top up with new data */
> +               for (i = 0; i < fill; i++) {
> +                       val |= *ptr;
> +                       val <<= 8;
> +                       ptr++;
> +                       remain--;
> +               }

I'm worried that I might have gotten this wrong.

iowrite32_rep() reads the data little-endian (native endianness)
from memory does it not?

Bytes  [0 1 2 3] end up in the FIFO like [3 2 1 0].

So it will pack the first byte into the lowest 8 bits, second byte into
bits 8-15 etc.

So I should rewrite all the loops like this:

for (i = 0; i < host->pio_write_residue_sz; i++) {
        val |= (host->pio_write_residue[i] << 24);
        val >>= 8;
}

So I shift the value down from the high bits instead of the
other way around.

This also gives a pretty plausible hint att what might be wrong with the
DMA in non-divisible by 4.

As suggested by Stephan in another context, I will try to set up my
own test rig for this.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] mmc: mmci: Proper PIO residue handling
  2019-11-13  7:53 ` [PATCH 3/3] mmc: mmci: Proper PIO residue handling Linus Walleij
  2019-11-13 13:24   ` Linus Walleij
@ 2019-11-13 14:03   ` Ulf Hansson
  1 sibling, 0 replies; 11+ messages in thread
From: Ulf Hansson @ 2019-11-13 14:03 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Russell King, Stephan Gerhold, linux-mmc, Russell King,
	Linux ARM, Niklas Cassel, Ludovic Barre, Brian Masney

On Wed, 13 Nov 2019 at 08:53, Linus Walleij <linus.walleij@linaro.org> wrote:
>
> The MMCI PIO write function contains a bug if using any buffers
> with sglist miter contents that do not add upp to even 32bit
> writes. The PIO write code currently does this:
>
> iowrite32_rep(base + MMCIFIFO, ptr, (count + 3) >> 2);
>
> This will make sure that a single buffer passed in gets written
> into the FIFO even if it is odd, i.e. not evenly divisible
> by 4.
>
> However it has the following problems:
>
> - It will read up to three bytes beyond the end of the buffer
>   and fill the FIFO with unpredictable junk and possibly cause
>   segmentation violations if the read passes a page boundary
>   such as at the end of an sglist buffer.
>
> - It will be fine if a single buffer is passed in, but the code
>   handles SG lists. There is a while (1) loop in mmci_pio_irq()
>   which repeatedly calls mmci_pio_write() as long as the FIFO
>   is not half-full, if it gets half-full it exits the IRQ and
>   waits for a new IRQ, also the while loop repeatedly calls
>   sg_miter_next() to consume bytes and this means that between
>   subsequent calls to mmci_pio_write() some random junk can be
>   inserted at the end of each call if the buffers passed in
>   do not contain an number of bytes evenly divisible by 4.

Thanks for a well description to the problem!

>
> Fix this by simply doing this:
>
> iowrite32_rep(base + MMCIFIFO, ptr, count >> 2);
>
> But since the code will then just not write enough bytes if
> the count is not evenly divisible by 4, we introduce a special
> residue buffer and keep track of any odd bytes from 1..3 and
> just write these padded out with new data in a separate
> statement the next time we get a call of the mmci_pio_write(),
> or, if there is for example only 1,2 or 3 bytes in the transfer,
> or we end up with an odd number of bytes in the residue,
> flushi it out when we end the current data transaction to
> the host.
>
> Cc: Ludovic Barre <ludovic.barre@st.com>
> Cc: Brian Masney <masneyb@onstation.org>
> Cc: Stephan Gerhold <stephan@gerhold.net>
> Cc: Niklas Cassel <niklas.cassel@linaro.org>
> Cc: Russell King <rmk+kernel@armlinux.org.uk>
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> ChangeLog v1->v3:
> - New patch in v3 after discussion with Ulf
> - I'm unable to test this because I cannot provoke
>   these odd reads/writes but hoping for Stephan to take it
>   for a spin.
> ---
>  drivers/mmc/host/mmci.c | 125 ++++++++++++++++++++++++++++++++++++----
>  drivers/mmc/host/mmci.h |   2 +
>  2 files changed, 117 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index a08cd845dddc..0d01689016f0 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -607,6 +607,7 @@ static void mmci_init_sg(struct mmci_host *host, struct mmc_data *data)
>                 flags |= SG_MITER_FROM_SG;
>
>         sg_miter_start(&host->sg_miter, data->sg, data->sg_len, flags);
> +       host->pio_write_residue_sz = 0;
>  }
>
>  static u32 mmci_get_dctrl_cfg(struct mmci_host *host)
> @@ -1422,30 +1423,111 @@ static int mmci_pio_write(struct mmci_host *host, char *buffer, unsigned int rem
>         struct variant_data *variant = host->variant;
>         void __iomem *base = host->base;
>         char *ptr = buffer;
> +       bool wrote_residue = false;
> +       int i;
> +
> +       /*
> +        * This will normally not happen during block I/O, but can
> +        * happen during SDIO traffic, where odd byte chunks are
> +        * shoveled to the SDIO link. Fill up the residue buffer
> +        * and flush out.
> +        */
> +       if (host->pio_write_residue_sz) {
> +               int fill = 4 - host->pio_write_residue_sz;
> +               u32 val = 0;
> +
> +               /*
> +                * If we need more padding that what we have, just stash
> +                * some more in the residue then and continue. This only
> +                * happens if we're passed e.g. 1 or 2 bytes and there is
> +                * just 1 byte residue residue already: we can't fill up!
> +                */
> +               if (fill > remain) {
> +                       for (i = 0; i < fill; i++) {
> +                               host->pio_write_residue[host->pio_write_residue_sz + i] = *ptr;
> +                               host->pio_write_residue_sz++;
> +                               ptr++;
> +                               remain--;
> +                       }
> +                       return ptr - buffer;
> +               }
> +
> +               /* Pack the residue into a 32bit word */
> +               for (i = 0; i < host->pio_write_residue_sz; i++) {
> +                       val |= host->pio_write_residue[i];
> +                       val <<= 8;
> +               }
> +               /* Top up with new data */
> +               for (i = 0; i < fill; i++) {
> +                       val |= *ptr;
> +                       val <<= 8;
> +                       ptr++;
> +                       remain--;
> +               }
> +               iowrite32(val, base + MMCIFIFO);
> +               host->pio_write_residue_sz = 0;
> +               wrote_residue = true;
> +       }
> +
> +       /*
> +        * Maybe the residue plus some few bytes was exactly filling a
> +        * 32bit word.
> +        */
> +       if (!remain)
> +               return ptr - buffer;
>
>         do {
> -               unsigned int count, maxcnt;
> +               unsigned int count, maxcnt, written, residue;
>
> +               /*
> +                * If the FIFO is empty just stash it full with data else
> +                * just half-fill it.
> +                */
>                 maxcnt = status & MCI_TXFIFOEMPTY ?
> -                        variant->fifosize : variant->fifohalfsize;
> +                       variant->fifosize : variant->fifohalfsize;
> +
> +               /* Watch it to not overfill the FIFO */
> +               if (wrote_residue)
> +                       maxcnt -= 4;
> +
>                 count = min(remain, maxcnt);
>
>                 /*
>                  * SDIO especially may want to send something that is
>                  * not divisible by 4 (as opposed to card sectors
> -                * etc), and the FIFO only accept full 32-bit writes.
> -                * So compensate by adding +3 on the count, a single
> -                * byte become a 32bit write, 7 bytes will be two
> -                * 32bit writes etc.
> +                * etc), but the FIFO only accepts full 32-bit writes so start by
> +                * just filling up quickly with as much as we can.
>                  */
> -               iowrite32_rep(base + MMCIFIFO, ptr, (count + 3) >> 2);
> +               iowrite32_rep(base + MMCIFIFO, ptr, count >> 2);
> +               residue = count & 3;
>
> -               ptr += count;
> -               remain -= count;
> +               written = count - residue;
> +               ptr += written;
> +               remain -= written;
>
> -               if (remain == 0)
> +               /* Handles the common case for block-IO */
> +               if (!remain)
>                         break;
>
> +               /*
> +                * These were not written out since we only write 32bit
> +                * words to the FIFO. Since this function gets called
> +                * repeatedly and across page boundaries with new pointers
> +                * in *buffer we need to take special care to stash odd
> +                * bytes away and flush them out in next PIO round or at the
> +                * end of the whole data transfer.
> +                */
> +               if (remain < 4) {
> +                       WARN_ON(remain != residue);
> +                       host->pio_write_residue_sz = residue;
> +                       for (i = 0; i < residue; i++) {
> +                               host->pio_write_residue[i] = *ptr;
> +                               ptr++;
> +                               remain--;
> +                       }
> +                       break;
> +               }
> +
>                 status = readl(base + MMCISTATUS);
>         } while (status & MCI_TXFIFOHALFEMPTY);
>
> @@ -1522,6 +1604,29 @@ static irqreturn_t mmci_pio_irq(int irq, void *dev_id)
>         if (host->size == 0) {
>                 mmci_set_mask1(host, 0);
>                 writel(readl(base + MMCIMASK0) | MCI_DATAENDMASK, base + MMCIMASK0);
> +
> +               if ((status & MCI_TXACTIVE) && host->pio_write_residue_sz) {
> +                       /*
> +                        * If the last pass of the SG miter left some residue after the pio
> +                        * write loop, push that out to the card. We are now at the end of
> +                        * the transfer so it is OK to pad with zeroes.
> +                        */
> +                       int fill = 4 - host->pio_write_residue_sz;
> +                       u32 val = 0;
> +                       int i;
> +
> +                       /* Pack the residue into a 32bit word */
> +                       for (i = 0; i < host->pio_write_residue_sz; i++) {
> +                               val |= host->pio_write_residue[i];
> +                               val <<= 8;
> +                       }
> +                       /* Top up with zeroes */
> +                       for (i = 0; i < fill; i++)
> +                               val <<= 8;
> +
> +                       iowrite32(val, base + MMCIFIFO);
> +                       host->pio_write_residue_sz = 0;
> +               }

Honestly, I am concerned about the above complex piece of code. And it
also affects the hot path, etc, but please see my comment in the end
around all this.

However, I am open to apply something along the lines of this change,
in case we can't come up with a better approach.

>         }
>
>         return IRQ_HANDLED;
> diff --git a/drivers/mmc/host/mmci.h b/drivers/mmc/host/mmci.h
> index e20af17bb313..a7690b64d4cd 100644
> --- a/drivers/mmc/host/mmci.h
> +++ b/drivers/mmc/host/mmci.h
> @@ -425,6 +425,8 @@ struct mmci_host {
>         /* pio stuff */
>         struct sg_mapping_iter  sg_miter;
>         unsigned int            size;
> +       u8                      pio_write_residue[4];
> +       u8                      pio_write_residue_sz;
>         int (*get_rx_fifocnt)(struct mmci_host *h, u32 status, int remain);
>
>         u8                      use_dma:1;
> --
> 2.21.0
>

After reviewing this, I decided to also look deeper into the
mmci_pio_read(). Unfortunately, it also doesn't cover all cases, which
I believed it did.

More precisely, mmc_pio_read() can't cope with a buffer being split
across two pages, where the first element contains 2 bytes and the
second contains 4 bytes (total size 6 bytes). The result is that we
will end up filling the buffer with corrupt data. (While filling the
first 2 bytes in the buffer, we read 4 bytes from the fifo, but throw
away the last 2 bytes while copying to the buffer. Then when filling
the last 4 bytes in the buffer, we read 4 bytes from the fifo, while
only the first 2 bytes in fifo is actually valid data...)

Now, why am I stating all this?

Because I am wondering if it's really worth to support all kinds of
different flavors of allocated buffers. It clearly introduces severe
amount of complex code, that not only is hard to test, but also
affects the "hot path" for "nicely" allocated buffers (whatever that
means to performance).

I guess what I am really questioning is, if it would be better to
limit the driver to support only "nicely" allocated buffers. Of
course, in the long run, that means we also need a way to inform upper
layer drivers/code of what that means. I guess we need to introduce
some new mmc host variables, along the lines of how host->max_seg_size
and host->max_req_size is being used.

What do you think, is it a terrible idea? :-)

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-11-13 14:03 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-13  7:53 [PATCH 0/3] MMCI odd buffer alignment fixes Linus Walleij
2019-11-13  7:53 ` [PATCH 1/3] mmc: mmci: Support odd block sizes for ux500v2 and qcom variant Linus Walleij
2019-11-13 10:08   ` Ludovic BARRE
2019-11-13 11:05   ` Ulf Hansson
2019-11-13 13:17     ` Linus Walleij
2019-11-13  7:53 ` [PATCH 2/3] mmc: mmci: Bail out from odd DMA on Ux500 Linus Walleij
2019-11-13 10:29   ` Marc Gonzalez
2019-11-13 10:54   ` Ulf Hansson
2019-11-13  7:53 ` [PATCH 3/3] mmc: mmci: Proper PIO residue handling Linus Walleij
2019-11-13 13:24   ` Linus Walleij
2019-11-13 14:03   ` Ulf Hansson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).